Optimization of Self-Organizing Polynomial Neural Networks

Author: Ivan Marić
Rudjer Boskovic Institute, Bijenicka 54, 10000 Zagreb, Croatia
Email: ivan.maric@irb.hr
Phone: +385-1-4561191

Abstract

The main disadvantage of self-organizing polynomial neural networks (SOPNN) automatically structured and trained by the group method of data handling (GMDH) algorithm is the partial optimization of the model weights, as the GMDH algorithm optimizes only the weights of the topmost (output) node. In order to estimate to what extent the approximation accuracy of the obtained model can be improved, particle swarm optimization (PSO) has been used to optimize the weights of all node polynomials. Since PSO is generally computationally expensive and time-consuming, the more efficient Levenberg-Marquardt (LM) algorithm is adapted for the optimization of the SOPNN. After being optimized by the LM algorithm, the SOPNN outperformed the corresponding models based on artificial neural networks (ANN) and the support vector method (SVM). The research is based on the meta-modeling of thermodynamic effects in fluid flow measurements under time constraints. The outstanding characteristics of the optimized SOPNN models are also demonstrated in learning the recurrence relations of multiple superimposed oscillations (MSO).

Keywords: Polynomial neural networks; GMDH; Levenberg-Marquardt algorithm; Particle swarm optimization; Time series modeling

1. Introduction

Approximation of complex multidimensional systems by SOPNN, also known as GMDH polynomial neural networks (PNN), was introduced by Ivakhnenko (1971). The SOPNN is constructed by combining low-order polynomials into multilayered polynomial structures, where the coefficients of the low-order polynomials (generally two-dimensional second-order polynomials) are obtained by polynomial regression (Chapra & Canale, 1998) with the aim of minimizing the approximation error. GMDH models may achieve reasonable approximation accuracy at low complexity and are simple to implement on digital computers (Maric & Ivek, 2011). The GMDH is resistant to over-fitting since it uses separate data sets for regression and for model selection. When applied to real-time compensation of nonlinear behavior, the self-organizing nature of GMDH may eliminate the complicated structural modeling and parameterization common to conventional modeling strategies (Iwasaki, Takei & Matsui, 2003). The performance of the SOPNN is generally evaluated by a single-parameter measure (Witten & Eibe, 2005), typically the least-squares error, which reflects the model approximation error rather than its complexity. When building models for time-constrained applications, the constraints can be efficiently embedded into the model selection metrics (Maric & Ivek, 2011). It was shown (Maric & Ivek, 2011) that raw SOPNN models (GMDH PNN) are inferior to the multilayer perceptron (MLP) when considering accuracy with respect to complexity. It was also concluded (Maric & Ivek, 2011) that the SVM is not appropriate for building low-complexity models for time-constrained applications. The SOPNN node-polynomial weights, once calculated by the regression, remain unchanged during the rest of the training process, resulting in sub-optimal SOPNN models.
The accuracy and the prediction capabilities of the models may be improved significantly when they are trained by genetic programming and back-propagation (BP). It was shown (Nikolaev & Iba, 2003) that a population-based search technique relying on genetic programming and the BP algorithm enables the identification of networks with good training as well as generalization performance. BP improves the accuracy of the model but is known to often get stuck in local minima. The idea of this paper is to adapt a more robust procedure for the optimization of the SOPNN relation with respect to its weights. PSO is a nature-inspired algorithm that optimizes the model weights by simulating the flight of a bird flock (Eberhart & Kennedy, 1995). Since PSO is simple to implement, it has been used in our experiments to estimate the approximation abilities of raw SOPNN models. After PSO had significantly improved the approximation accuracy of various SOPNN models, the more complex Levenberg-Marquardt (LM) algorithm (Levenberg, 1944; Marquardt, 1963) was adapted for the optimization of the model weights. Although the LM algorithm is widely used for the optimization of ANN, its use for the optimization of SOPNN weights has, to the best of my knowledge, not been reported in the literature. The LM algorithm converges many times faster than PSO and increases the approximation accuracy of the SOPNN model substantially. This paper describes the adaptation of the PSO and LM algorithms for the optimization of SOPNN and demonstrates how the approximation accuracy of the original GMDH model can be significantly improved by optimizing its weights.

In the following section, the GMDH, PSO and LM algorithms are described, and the PSO and LM algorithms are adapted for the optimization of SOPNN weights. Section 3 describes the procedure for estimating the execution time of SOPNN and MLP models in time-constrained applications. A procedure for the compensation of thermodynamic effects in flow rate measurements is summarized in Section 4, and the results of simulating the flow rate error compensation procedure with the surrogate models are given in Section 5. Finally, in Section 6, the outstanding performance of the SOPNN is demonstrated on the MSO task, which has been widely studied in the echo state network (ESN) literature.

2. GMDH, PSO and LM algorithm

2.1. GMDH algorithm

The GMDH algorithm (Ivakhnenko, 1971) constructs the models by combining low-order polynomials into multilayered polynomial networks. Fig. 1 illustrates a complete two-layer feed-forward SOPNN representing a 3-dimensional system, where p_{λ,i} denotes a low-order and low-dimensional polynomial corresponding to the i-th node of layer λ and x_i represents the i-th independent variable. First-order and second-order two-dimensional polynomials are preferred, as their cascading does not rapidly increase the order of the overall polynomial relation and because they are fast to process. In this paper we will restrict our attention to the complete second-order two-dimensional polynomial

p_{\lambda i} = a_{\lambda i1} + a_{\lambda i2} z_{\lambda i1} + a_{\lambda i3} z_{\lambda i2} + a_{\lambda i4} z_{\lambda i1} z_{\lambda i2} + a_{\lambda i5} z_{\lambda i1}^2 + a_{\lambda i6} z_{\lambda i2}^2 ,    (1)

where z_{λi1} and z_{λi2} represent the input variables and a_{λi1}, ..., a_{λi6} are the corresponding weights (coefficients) obtained by polynomial regression. Note that z_{λi1} and z_{λi2} can be any combination of two different variables from lower layers, including the independent variables (x_i) and the derived regression polynomials (p_{λi}). For example, the input variables of the polynomial p_{2,6} from Fig. 1 are the polynomial p_{1,1} from the first layer and the independent variable x_3.
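To make the cascading of node polynomials concrete, the following short Python sketch evaluates Eq. (1) for a single node and then composes two nodes in the way p_{2,6} in Fig. 1 combines p_{1,1} and x_3. The weight values are illustrative only; in GMDH they come from the polynomial regression on the training data.

    def node(a, z1, z2):
        """Complete second-order two-dimensional node polynomial, Eq. (1):
        p = a1 + a2*z1 + a3*z2 + a4*z1*z2 + a5*z1**2 + a6*z2**2."""
        a1, a2, a3, a4, a5, a6 = a
        return a1 + a2 * z1 + a3 * z2 + a4 * z1 * z2 + a5 * z1 ** 2 + a6 * z2 ** 2

    # Illustrative weights; in GMDH each set is fitted by regression on the training set.
    a11 = (0.10, 1.20, -0.70, 0.05, 0.30, -0.20)   # node p_{1,1}(x1, x2)
    a26 = (0.00, 0.90, 0.40, -0.10, 0.00, 0.02)    # node p_{2,6}(p_{1,1}, x3)

    x1, x2, x3 = 0.5, -1.0, 2.0
    p11 = node(a11, x1, x2)      # first-layer node output
    p26 = node(a26, p11, x3)     # second-layer node fed by p_{1,1} and x3
    print(p11, p26)

Because each node is itself a polynomial in its two inputs, composing nodes in this way yields one overall polynomial relation in the independent variables, which is the SOPNN model.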
Fig. 1. A complete two-layer feed-forward SOPNN representing a 3-dimensional system.

The GMDH algorithm assumes two independent data sets: a training set of M samples

D_t = \{ (x_{ti}, y_{ti}) \}_{i=1}^{M} ,    (2)

where each sample consists of a data vector x_{ti} = (x_{ti1}, x_{ti2}, ..., x_{tiK}), x_{ti} ∈ R^K, and the corresponding dependent variable y_{ti} ∈ R, as well as a validation data set of N samples

D_v = \{ (x_{vi}, y_{vi}) \}_{i=1}^{N} ,    (3)

with data vector x_{vi} = (x_{vi1}, x_{vi2}, ..., x_{viK}), x_{vi} ∈ R^K, and the corresponding dependent variable y_{vi} ∈ R. The algorithm uses the training data set (2) to fit the coefficients of the regression polynomial (1) and the validation set (3) to verify the approximation error of the polynomial. The polynomials are then ranked according to some predefined metrics (Maric & Ivek, 2011).

In order to calculate the coefficients a_{λi1}, ..., a_{λi6} in (1) by polynomial regression, a set of 6 simultaneous linear equations

\frac{\partial}{\partial a_{\lambda ik}} \sum_{m=1}^{M} \left( y_{tm} - p_{\lambda i}^{tm} \right)^2 = 0 , \quad k = 1, ..., 6,    (4)

must be solved, where M is the total number of training samples, y_{tm} is the m-th sample value of the dependent variable from the training data set (2), and p_{λi}^{tm} is the value of the i-th polynomial at layer λ corresponding to the m-th data vector from the training data set (t). In our implementation of the GMDH algorithm the above set of linear equations is solved by the Gauss elimination method using forward elimination, back substitution, and pivoting (Chapra & Canale, 1998).

As illustrated in Fig. 1, the total number of possible nodes in each layer increases rapidly with the layer number and can be easily calculated by the following simple iterative equation

N_{\lambda+1} = \frac{N_{\lambda} (N_{\lambda} - 1)}{2} + N_{\lambda} \sum_{i=0}^{\lambda-1} N_i ,

where λ = 0, 1, ... denotes the corresponding layer, N_λ is the total number of nodes at layer λ, and the second term in the equation equals zero for λ = 0. To prevent combinatorial explosion, the maximum number of nodes retained per layer is generally limited. The nodes retained at lower layers are combined multiple times to produce the nodes at higher layers. To speed up the algorithm the retained polynomials may be tabulated.

2.2. Particle Swarm Optimization of SOPNN Weights

Let a ∈ R^N be the vector {a_i}, i = 1, ..., N, in N-dimensional space. Let the SOPNN polynomial relation P(x, a) be the particle whose position in the space is defined by the vector a. Particle swarm optimization (Eberhart & Kennedy, 1995) minimizes the nonlinear fitness function

e(\mathbf{a}) = \sum_{k=1}^{K} \left[ y_k - P(\mathbf{x}_k, \mathbf{a}) \right]^2 ,    (5)

where K is the total number of training vectors, x_k denotes the k-th M-dimensional input training vector, y_k = y(x_k) is the corresponding training value, and a denotes the N-dimensional vector representing the coefficients of all nodes of the SOPNN polynomial P(x_k, a). Before starting the PSO, the position a and the velocity v of each particle are initialized by

a_{ki}^{0} = L_i + r_{ki} (U_i - L_i) , \quad k = 1, ..., K, \; i = 1, ..., N,    (6)

and

v_{ki}^{0} = (2 s_{ki} - 1)(U_i - L_i) , \quad k = 1, ..., K, \; i = 1, ..., N,    (7)

where a_{ki}^0 and v_{ki}^0 are the corresponding i-th components of the k-th particle's initial position and velocity, L_i and U_i are the corresponding lower and upper boundaries of the i-th dimension of the search space, and r_{ki} and s_{ki} denote random numbers from the range [0, 1]. For each particle k, the best particle position A_k and the best fitness value E_k are initialized by setting them equal to the initial position and the initial fitness value of the particle, respectively.
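As an illustration of Eqs. (5)-(7), the following Python/NumPy sketch computes the fitness and initializes a swarm for a given SOPNN relation P(x, a); the function P, the bounds L and U, and the swarm size are placeholders for illustration, not values used in the paper.

    import numpy as np

    def fitness(a, P, X, y):
        """Eq. (5): sum of squared errors of the SOPNN relation P over the training set."""
        return sum((y_k - P(x_k, a)) ** 2 for x_k, y_k in zip(X, y))

    def init_swarm(n_particles, L, U, rng=None):
        """Eqs. (6) and (7): random positions and velocities inside the box [L, U]."""
        rng = rng or np.random.default_rng()
        L, U = np.asarray(L, float), np.asarray(U, float)
        r = rng.random((n_particles, L.size))     # r_ki in [0, 1]
        s = rng.random((n_particles, L.size))     # s_ki in [0, 1]
        a = L + r * (U - L)                       # initial positions a_ki^0, Eq. (6)
        v = (2.0 * s - 1.0) * (U - L)             # initial velocities v_ki^0, Eq. (7)
        return a, v

    # Personal bests A_k and E_k start at the initial positions and fitness values;
    # alpha points to the particle with the smallest initial fitness, e.g.:
    # a, v = init_swarm(n_particles=30, L=-np.ones(N), U=np.ones(N))
    # A, E = a.copy(), np.array([fitness(a_k, P, X, y) for a_k in a])
    # alpha = E.argmin()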
The pointer to the particle α having the best fitness value, E_α, of all particles is also initialized. According to Shi and Eberhart (1998), the particles are then manipulated iteratively by using the following equations:

v_{ki}^{j+1} = w \, v_{ki}^{j} + c_1 R_{ki} \left( A_{ki} - a_{ki}^{j} \right) + c_2 S_{ki} \left( A_{\alpha i} - a_{ki}^{j} \right) , \quad k = 1, ..., K, \; i = 1, ..., N,    (8)

and

a_{ki}^{j+1} = a_{ki}^{j} + v_{ki}^{j+1} , \quad k = 1, ..., K, \; i = 1, ..., N,    (9)

where v_{ki}^j and a_{ki}^j denote the corresponding i-th components of the k-th particle's velocity and position in the j-th iteration, and w denotes the inertia weight (0.8 ...).

[...]

The following pseudo code illustrates the simplified flow diagram of the LM algorithm (Marquardt, 1963):

Begin with: i = 0, γ = γ0, e0 = e(a); Eq. (5)
1. Set γ1 = γ
2. Calculate δ1 and e1 = e(a + δ1); Eq. (15) and (11)
3. Set γ2 = γ/ν
4. Calculate δ2 and e2 = e(a + δ2); Eq. (15) and (11)
5. If e1 < [...]
6.-7. [...]
8. If γ > γM Then GoTo 13
9. Set γ1 = γ
10. Calculate δ1 and e1 = e(a + δ1); Eq. (15) and (11)
11. If e1 [...]
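To make the role of the damping parameter γ and the factor ν concrete, here is a minimal Python/NumPy sketch of a standard Levenberg-Marquardt loop in the spirit of the pseudo code above, applied to the fitness (5). It is an illustration under stated assumptions, not the paper's exact procedure: the functions residuals and jacobian and the parameter names gamma0, nu and gamma_max are assumptions, and the damped normal-equation step stands in for Eqs. (10)-(15), which are not reproduced here.

    import numpy as np

    def lm_step(a, residuals, jacobian, gamma, nu):
        """Try one LM move with the current damping gamma and with gamma/nu
        (steps 1-4 of the pseudo code) and keep whichever lowers the error."""
        r = residuals(a)                          # r_k = y_k - P(x_k, a)
        J = jacobian(a)                           # dP/da, one row per training vector
        e0 = float(r @ r)                         # fitness e(a), Eq. (5)
        JTJ, JTr = J.T @ J, J.T @ r
        best = (a, gamma, e0, False)
        for g in (gamma, gamma / nu):             # delta solves (J^T J + g I) delta = J^T r
            delta = np.linalg.solve(JTJ + g * np.eye(a.size), JTr)
            r_try = residuals(a + delta)
            e_try = float(r_try @ r_try)
            if e_try < best[2]:
                best = (a + delta, g, e_try, True)
        return best

    def lm_optimize(a, residuals, jacobian, gamma0=1e-3, nu=10.0,
                    gamma_max=1e10, max_iter=100):
        """Standard Marquardt damping schedule: accept improving steps; otherwise
        keep multiplying gamma by nu (in the spirit of steps 8-11) until an
        improvement is found or gamma exceeds gamma_max."""
        a = np.asarray(a, dtype=float)
        gamma = gamma0
        for _ in range(max_iter):
            a_new, gamma, e_new, improved = lm_step(a, residuals, jacobian, gamma, nu)
            while not improved and gamma <= gamma_max:
                gamma *= nu
                a_new, gamma, e_new, improved = lm_step(a, residuals, jacobian, gamma, nu)
            if not improved:
                break                              # damping exhausted; stop
            a = a_new
        return a

Applied to a SOPNN, the vector a concatenates the weights of all node polynomials and the Jacobian holds the partial derivatives of the network output with respect to each of those weights, so every accepted step updates all nodes simultaneously rather than only the topmost node.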