African Journal of Biotechnology Vol. 10(32), pp. 6162-6171, 4 July, 2011
Available online at http://www.academicjournals.org/AJB
DOI: 10.5897/AJB10.2086
ISSN 1684–5315 © 2011 Academic Journals

Full Length Research Paper

Comparison of ν-support vector regression and logistic equation for descriptive modeling of Lactobacillus plantarum growth

Lijing Zhang 1,3, Zhijian Song 2,3, Xiaodong Pan 3, Mingguang Feng 1 and Zhihua Jin 3*

1 Institute of Microbiology, Zhejiang University, Hangzhou, 310058, China.
2 Institute of Biomedical Informatics/Zhejiang Provincial Key Laboratory of Medical Genetics, Wenzhou Medical College, Wenzhou 325000, China.
3 Ningbo Institute of Technology, Zhejiang University, Ningbo 315104, China.

Accepted 15 April, 2011

Because bioprocesses are complex and highly non-linear, most simple mathematical models fail to describe the exact behavior of biochemical systems. As a novel learning method, support vector regression (SVR) is well suited to problems characterized by small samples, non-linearity, high dimensionality and local minima. In this paper, we developed a ν-SVR model whose parameters were pre-estimated with a genetic algorithm (GA) for Lactobacillus plantarum fermentation, and compared its predictive capability with that of the logistic model. Five-fold cross-validation was applied during SVR training to avoid over-fitting. The SVR parameters were obtained at generation 150, and the optimal values were C = 235.8935, σ = 8.3608 and ν = 0.7587. Correspondingly, the logistic model parameters $\mu_{\max}$ and $x_{\max}$ were estimated as 0.4791 and 0.3498, respectively. The experimental results demonstrated that the SVR model outperformed the logistic model in terms of the normalized root mean square error (NRMSE), the mean absolute percentage error (MAPE) and the Pearson correlation coefficient R.
We found that the ν-SVR model optimized by a genetic algorithm could be a potential monitoring method for the prediction of biomass.

Key words: Support vector regression, genetic algorithm, logistic model, prediction of biomass.

* Corresponding author. E-mail: zlj@nit.zju.edu.cn. Tel: +86 574 88229516. Fax: +86 574 88229516.

INTRODUCTION

Because the microbial fermentation process is complex and non-linear, many parameters, such as the concentrations of bacteria, substrates and products, are difficult to measure on-line (Wang et al., 2006). To optimize a bioprocess and put advanced control algorithms into practice, it is necessary to monitor and diagnose the bioprocess parameters. Kinetic models are generally experimentally derived mathematical formulas that agree well with the cultivation data (Bessaou and Siarry, 2001). Kinetic models can be linear or nonlinear, and include ordinary (or partial) differential equations, cellular automata (Xiao et al., 2005; Xiao et al., 2006) and the logistic model. The logistic model, first proposed by Verhulst in the nineteenth century (Liu et al., 2002), has been demonstrated to be the most illustrative model of microbial growth dynamics in a habitat of finite resources (Fujikawa et al., 2002; Peleg et al., 2007; Schepers et al., 2000). The logistic equation is based on the notion that the momentary growth rate of a population inoculated into a virgin environment is proportional to the momentary population size and to the fraction of resources still available in the habitat.

Recently, a formalism based on statistical learning theory, known as support vector machines (SVMs), has emerged as a powerful tool for data classification and regression, referred to as support vector classification (SVC) and support vector regression (SVR), respectively. SVMs can handle problems arising from small samples, nonlinearity, high dimensionality and local minima (Vapnik, 1995; Borin et al., 2006; Cai et al., 2003; Moreira et al., 2007).
Therefore, SVR is gradually being accepted in data-driven nonlinear modeling applications (Desai et al., 2006). However, because the generalization performance of SVR depends on its parameters, how to set these parameters for a given data set is the main issue in applying SVR. Cherkassky and Ma (2004) described the setting of meta-parameters for ε-support vector regression (ε-SVR). However, as the parameter ε influences the solution only indirectly and lacks an intuitive interpretation, it is difficult to choose an appropriate value for it. To overcome this problem, a new parameter ν was introduced to control the fit and the prediction accuracy instead of the parameter ε (Schölkopf et al., 2000). Chalimourda et al. (2004) studied the optimal ν in SVR for different noise models using data from the sine function and the Boston housing problem. Recently, the performance of SVR in bioprocesses has also been studied (Li et al., 2006a; Li and Yuan, 2006b). However, ν-SVR coupled with a genetic algorithm has not been investigated for fermentation processes.

In this study, ν-SVR was applied to determine the cell concentration from fermentation process data. We aimed to find a suitable method for simulating microbial growth by comparing different modeling strategies for process simulation and fermentation control. The organism used in the model was Lactobacillus plantarum.

MODEL STRUCTURE

Logistic model

As an unstructured kinetic model, the logistic model is the one most frequently employed for modeling microbial systems, because it is simple and good enough for technical purposes (Kiviharju et al., 2006; Mu et al., 2006). The logistic model describes the kinetics of microbial growth as follows:

$$\frac{dx}{dt} = \mu_{\max}\, x \left(1 - \frac{x}{x_{\max}}\right) \tag{1}$$

Where, $x$ (g l⁻¹) is the cell concentration, $t$ (h) is the fermentation time, $\mu_{\max}$ (h⁻¹) is the maximum specific growth rate and $x_{\max}$ (g l⁻¹) is the maximum cell concentration.
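The behavior of Equation 1 can be illustrated by integrating it numerically. The following is a minimal sketch, not the authors' procedure: it uses fixed-step fourth-order Runge-Kutta integration, takes the values of $\mu_{\max}$ and $x_{\max}$ reported in the abstract, and assumes a hypothetical initial value x0 = 0.05, which is not given in the text. The function names are illustrative only.

```python
def logistic_rate(x, mu_max, x_max):
    """Right-hand side of Equation 1: dx/dt = mu_max * x * (1 - x / x_max)."""
    return mu_max * x * (1.0 - x / x_max)


def integrate_logistic(x0, mu_max, x_max, t_end, dt=0.01):
    """Integrate Equation 1 from t = 0 to t_end with fixed RK4 steps.

    Returns the list of cell concentrations at t = 0, dt, 2*dt, ...
    """
    n_steps = int(round(t_end / dt))
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        k1 = logistic_rate(x, mu_max, x_max)
        k2 = logistic_rate(x + 0.5 * dt * k1, mu_max, x_max)
        k3 = logistic_rate(x + 0.5 * dt * k2, mu_max, x_max)
        k4 = logistic_rate(x + dt * k3, mu_max, x_max)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        trajectory.append(x)
    return trajectory


# mu_max and x_max are the estimates reported in the abstract;
# x0 = 0.05 is a hypothetical initial value chosen only for illustration.
curve = integrate_logistic(x0=0.05, mu_max=0.4791, x_max=0.3498, t_end=50.0)
print(f"x(0) = {curve[0]:.4f}, x(50 h) = {curve[-1]:.4f}")
```

Under these assumptions the trajectory rises sigmoidally from x0 and levels off at $x_{\max}$, which is the saturation behavior the logistic model is meant to capture.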
Using $x = x_0$ at $t = 0$, integration of Equation 1 gives the following equation for microbial biomass:

$$x = \frac{x_0\, x_{\max}\, e^{\mu_{\max} t}}{x_{\max} - x_0 + x_0\, e^{\mu_{\max} t}} \tag{2}$$

Support vector regression

The ν-support vector regression (ν-SVR) is a promising method for solving non-linear regression problems, and is based on the theory of support vector machines (SVMs). In this section, a brief introduction to SVR is given. Given a set of data points $T = \{(x_i, y_i)\}_{i=1}^{l} \subset \mathbb{R}^m \times \mathbb{R}$, where $x_i \in \mathbb{R}^m$ is the input vector, $y_i \in \mathbb{R}$ is the desired value and $l$ is the total number of data points, SVR estimation seeks a function of the following form:

$$y = w^{T} \cdot \Phi(x_i) + b \tag{3}$$

Where, $w$ is the weight vector, $b$ is the bias coefficient, and $\Phi(x_i)$ denotes the image of the input $x_i$ in a high-dimensional feature space, obtained by a nonlinear mapping associated with the kernel function. The coefficients $w$ and $b$ are estimated by minimizing the regularized risk function:

$$\frac{1}{2}\|w\|^2 + C\,\frac{1}{l}\sum_{i=1}^{l}\left(\zeta_i + \zeta_i^{*}\right) \tag{4}$$

Where, $\frac{1}{2}\|w\|^2$ is called the regularized term and $\frac{1}{l}\sum_{i=1}^{l}(\zeta_i + \zeta_i^{*})$ is the empirical risk measured by the ε-insensitive loss function. The slack variable $\zeta_i$ is called the upper training error and $\zeta_i^{*}$ the lower training error with respect to the "ε-radius tube", inside which $|y_i - (w \cdot \Phi(x_i) + b)| \le \varepsilon$. $C$ is the regularization factor, which determines the trade-off between the empirical risk and the regularized term. As a consequence, the ε-SVR problem can be described as the following quadratic optimization problem:

$$\min_{w,\, b,\, \zeta,\, \zeta^{*}} \; \frac{1}{2}\|w\|^2 + C \cdot \frac{1}{l}\sum_{i=1}^{l}\left(\zeta_i + \zeta_i^{*}\right) \tag{5}$$

$$\text{subject to} \quad (w \cdot \Phi(x_i) + b) - y_i \le \varepsilon + \zeta_i, \quad y_i - (w \cdot \Phi(x_i) + b) \le \varepsilon + \zeta_i^{*}, \quad \zeta_i,\, \zeta_i^{*} \ge 0, \quad i = 1, 2, \ldots, l$$

However, ε can take any real value, and its choice strongly influences the solution of the quadratic optimization problem.
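The role of the slack variables and of the ε-tube in Equations 4 and 5 can be made concrete with a small numerical sketch. The code below is illustrative only: the function names and the sample values are hypothetical, and the predictions `fs` stand in for $w \cdot \Phi(x_i) + b$ without actually training a model.

```python
def slack_variables(y, f, eps):
    """Return (zeta, zeta_star) for one sample, following the constraints of Equation 5.

    zeta      : amount by which the prediction f exceeds y + eps
    zeta_star : amount by which the prediction f falls below y - eps
    """
    zeta = max(0.0, (f - y) - eps)
    zeta_star = max(0.0, (y - f) - eps)
    return zeta, zeta_star


def empirical_risk(ys, fs, eps):
    """Mean of (zeta_i + zeta_i*), i.e. the epsilon-insensitive empirical risk."""
    total = 0.0
    for y, f in zip(ys, fs):
        zeta, zeta_star = slack_variables(y, f, eps)
        total += zeta + zeta_star
    return total / len(ys)


def regularized_risk(w_norm_sq, ys, fs, eps, C):
    """Objective of Equation 4: 0.5 * ||w||^2 + C * empirical risk."""
    return 0.5 * w_norm_sq + C * empirical_risk(ys, fs, eps)


# Hypothetical observations and predictions, for illustration only.
observed = [0.10, 0.20, 0.30]
predicted = [0.12, 0.35, 0.22]
print(empirical_risk(observed, predicted, eps=0.05))
```

In this example the first prediction lies inside the tube and contributes nothing, while the other two are penalized only by the amount by which they leave it. This is why the width ε has such a strong effect on the solution.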
Fortunately, Schölkopf et al. (2000) introduced a weight parameter ν that multiplies ε in the ε-SVR objective to control the tube size, which avoids an exhaustive search for ε over the real numbers. After this modification, the optimization problem is as follows:

$$\min_{w,\, b,\, \zeta,\, \zeta^{*},\, \varepsilon} \; \frac{1}{2}\|w\|^2 + C \cdot \left(\nu\,\varepsilon + \frac{1}{l}\sum_{i=1}^{l}\left(\zeta_i + \zeta_i^{*}\right)\right) \tag{6}$$

$$\text{subject to} \quad (w \cdot \Phi(x_i) + b) - y_i \le \varepsilon + \zeta_i, \quad y_i - (w \cdot \Phi(x_i) + b) \le \varepsilon + \zeta_i^{*}, \quad \zeta_i,\, \zeta_i^{*} \ge 0, \quad i = 1, 2, \ldots, l$$

Usually, this problem is solved in its dual form by the Lagrange multiplier approach (Vapnik, 1995; Belousov et al., 2004). Through the Karush-Kuhn-Tucker (KKT) conditions of the quadratic programming problem, the dual of Equation 6 leads to the solution by maximizing over the Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$:

$$\max_{\alpha,\, \alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)K(x_i, x_j) + \sum_{i=1}^{l} y_i\left(\alpha_i - \alpha_i^{*}\right) \tag{7}$$

$$\text{subject to} \quad \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^{*}\right) = 0, \quad 0 \le \alpha_i,\, \alpha_i^{*} \le \frac{C}{l} \;\; \forall\, i = 1, 2, \ldots, l, \quad \sum_{i=1}^{l}\left(\alpha_i + \alpha_i^{*}\right) \le C \cdot \nu$$

The multipliers satisfy the conditions $\alpha_i \times \alpha_i^{*} = 0$, $\alpha_i \ge 0$ and $\alpha_i^{*} \ge 0$, where $i = 1, 2, \ldots, l$.

Parameters determination of the ν-SVR model

To construct an efficient ν-SVR model, the SVR parameters are crucial and should be set carefully (Fujikawa et al., 2004). These parameters include:

Kernel function: The term $K(x_i, x_j)$ in Equation 7 is defined as the kernel function. As the key element of SVM theory, the kernel function is used to map the input feature data $x_i$ into a higher-dimensional feature space via a non-linear mapping (Bessaou and Siarry, 2001). Three typical examples of kernel functions are the polynomial, radial basis function (RBF) and linear kernels. The Gaussian function, a kind of RBF kernel that yields better prediction performance, was used in this study (Samanta et al., 2003).
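As an illustration of the kernel term $K(x_i, x_j)$, the sketch below evaluates a Gaussian kernel and the corresponding kernel (Gram) matrix. The paper does not state the exact kernel formula, so the common form $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ is assumed here. The sample points are hypothetical, and σ = 8.3608 is the value reported in the abstract.

```python
import math


def gaussian_kernel(xi, xj, sigma):
    """Gaussian (RBF) kernel, assuming the form exp(-||xi - xj||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, sigma):
    """Kernel matrix with entries K(x_i, x_j) for every pair of input vectors."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]


# Hypothetical two-feature input vectors, for illustration only.
samples = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
for row in gram_matrix(samples, sigma=8.3608):
    print(["%.4f" % value for value in row])
```

The matrix is symmetric with ones on the diagonal, and its entries decay as the points move apart. A larger σ makes the decay slower and the resulting regression function smoother.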
Penalty parameter C: As mentioned earlier, C determines the trade-off between margin maximization and error minimization.

Bandwidth of the kernel function ($\sigma^2$): In this paper, $\sigma^2$ denotes the variance of the Gaussian kernel function.

Parameter ν: ν represents an upper bound on the fraction of errors and a lower bound on the fraction of support vectors used to build the regression function; its range is from 0 to 1.

In general, the accuracy of a ν-SVR model depends on the hyper-parameters (C, σ, ν), so these parameters need to be selected carefully. However, obtaining the optimal parameter values with an efficient algorithm remains a complex problem. Heuristic search techniques, such as the genetic algorithm (GA) or the simulated annealing algorithm (SA), are usually used to find the optimal SVR parameter settings. Among them, GA is better suited to robust optimization in a complex search space (Potocnik and Grabec, 1999; Ustun et al., 2005). In this study, a coupled GA-SVR model was proposed to find the optimal values of the ν-SVR parameters.

GA-SVR optimization procedure

The genetic algorithm (GA), originally introduced by Holland (1975), employs the biologically inspired operations of mutation and crossover to search for optimal solutions. Figure 1 illustrates the algorithm of the GA-SVR model.

Figure 1. GA-ν-SVR model.

Our proposed GA-SVR is described in detail as follows:

Step 1. Population initialization: In this step, the three ν-SVR parameters (C, σ, ν) were converted from decimal to binary code. The chromosome X was represented as X = p1, p2, p3, where p1, p2 and p3 denote the parameters C, σ and ν, respectively. A population contains n such chromosomes and is randomly generated. In this study, we used 100 chromosomes, and when the chromosomes were passed to the SVR for training, the binary code was translated back into decimal.

Step 2.
Evaluate the fitness function: The fitness on the training data set is easy to calculate, but it is prone to over-fitting. The 5-fold cross-validation technique was therefore applied during SVR training to avoid over-fitting and to make the result general and reasonable (Holland, 1975; Browne, 2000). The fitness of a parameter set is measured by the criterion of minimizing the MAPE (mean absolute percentage error):

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{a_i - p_i}{a_i}\right| \times 100\%$$

Where, $p_i$ denotes the value predicted by the SVR model, $a_i$ represents the actual experimental observation and $n$ is the number of experimental observations. The smaller the value of MAPE in the regression, the better the prediction performance.

Step 3. Genetic operation: The genetic operators (Figure 2), which comprise selection, crossover and mutation, are described as follows:

Selection: Roulette-wheel selection is applied: a weighted wheel is spun to choose chromosomes from the current population, each with a probability given by its selection rate.

Crossover: Crossover is a typical genetic operation that allows a new population to be created in the search space. A crossover point is chosen at random in a pair of parental chromosomes, and the genes beyond that point are exchanged between the two chromosomes. Crossover always involves two chromosomes, and the crossover rate describes how frequently it occurs.

Mutation: Mutation, like crossover, is a genetic operator that alters the solution encoded by a parental chromosome. However, it acts on a single chromosome, whose genes are occasionally altered. In binary code, one mutation means flipping one bit (from 0 to 1 or from 1 to 0) at a random position.

After these operations, if the minimum fitness value of the new population is smaller than that of the old population, the offspring chromosomes replace the old chromosomes and form a new population.

Step 4.
Termination criteria: Fitness evaluation and the genetic operations were repeated until the preset number of generations was reached. The best chromosome was then taken as the solution, from which the parameter values and the best MAPE were obtained. In this study, the number of generations was set to 150.

Figure 2. Genetic operation.

MATERIALS AND METHODS

Strain

L. plantarum C263, originally isolated from dairy food, was provided by the China Center of Industrial Culture Collection (CCICC) and cultured in MRS broth (Difco). Culture stocks were inoculated into MRS medium at 5% and incubated for 24 h at 37°C. This culture was propagated twice under the same conditions.

Fermentation experiment

The optimized fermentation medium consisted of (g l⁻¹): glucose, 2; peptone, 1; beef extract, 1; yeast extract, 1; Na3C6H5O7·2H2O, 0.1; (NH4)2HPO4, 1; K2SO4, 0.2. The seed culture was inoculated at 5% into 250 ml Erlenmeyer flasks containing 150 ml MRS medium and cultured at 37°C. For batch cultures in a bioreactor, 150 ml of seed culture was added to a KF-5l fermentor (KBT, Korea) containing 3 L of production medium and cultured anaerobically at 37°C. Samples were taken at intervals and the pH was recorded. Microbial growth was measured by recording the optical density (OD) at 660 nm. The glucose concentration in the fermentation broth was determined by the DNS method (Zhang et al., 1997).

Data scaling

Because raw variables with different ranges may interfere with each other during training, which could mislead the prediction, data re-scaling plays an important role in improving the prediction accuracy. Another advantage of data scaling is that it avoids numerical difficulties during the calculation. All the data sets were scaled to the range from 0 to 1 as follows:

$$x_{N,i} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

Where, $x_i$ is the feature vector which influences the L.
plantarum growth in the batch fermentation, $x_{N,i}$ is the normalized value of the ith process variable after scaling, $x_{\max}$ is the maximum value of the feature vector $x_i$, $x_{\min}$ is the minimum value of the feature vector $x_i$ and N is the number of experimental observations.

RESULTS AND DISCUSSION

Variables selection

Variable selection is necessary for a ν-SVR model with good predictive performance. In this paper, we selected the most important features describing the fermentation process: temperature, pH value, glucose concentration and fermentation time. We set these four variables as the inputs and the cell concentration as the output.

Performance criteria

In statistical prediction, the following three cross-validation methods are often used to examine the effectiveness of a prediction model in practical application: the independent dataset test, the subsampling (K-fold cross-validation) test and the jackknife test. Among the three, the jackknife test is deemed the most objective, since it always yields a unique result for a given benchmark dataset, and it has therefore been increasingly used and widely recognized by investigators for examining the performance quality of various predictors (Chou, 2009; Lin et al., 2009; Wu et al., 2010; Xiao et al., 2008a; Xiao et al., 2008b; Xiao et al., 2009a; Xiao et al., 2009b; Xiao et al., 2009c; Xiao et al., 2010). In this work, we used 5-fold cross-validation to examine the performance quality of the two models and chose the following statistical metrics: normalized root mean square error (NRMSE), MAPE and R. Table 1 shows these performance metrics and their calculations. NRMSE and MAPE were used to measure the deviation between the measured and predicted values. The smaller the values of NRMSE and MAPE, the closer the predicted values were to the measured values. The Pearson correlation coefficient R was adopted to measure the correlation between the experimental and predicted values.

Table 1. Performance metrics and their calculations.

MAPE:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{a_i - p_i}{a_i}\right| \times 100\%$$

NRMSE:
$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(a_i - p_i\right)^2}{\sum_{i=1}^{n} a_i^2}}$$

R:
$$R = \frac{\sum_{i=1}^{n}\left(a_i - \bar{a}\right)\left(p_i - \bar{p}\right)}{\sqrt{\sum_{i=1}^{n}\left(a_i - \bar{a}\right)^2 \sum_{i=1}^{n}\left(p_i - \bar{p}\right)^2}}$$

Here $a_i$ and $p_i$ are the measured and predicted values, and $\bar{a}$ and $\bar{p}$ are their respective means.

Growth models obtained by ν-SVR

The GA was used to search for the parameter set that minimizes the MAPE. The search was run for 150 generations in total, and the genetic process was recorded. Figure 3 shows how the optimal and average fitness of the population varied during the genetic process. After about 60 generations, the optimal fitness value (MAPE) became steady. The best parameters were obtained at generation 150, and the corresponding optimal values were C = 235.8935, σ = 8.3608 and ν = 0.7587. The mean error, which describes the average MAPE over the whole chromosome population, was 0.068969. After the GA had been applied to search for the optimal parameters of ν-SVR, the prediction model for the batch fermentation of L. plantarum was constructed. The graphical comparison of the measured values and the values predicted by GA-ν-SVR is shown in Figure 4.

Growth models obtained by logistic model

The logistic model is an approximation of the microbial growth curve for the batch experiments in this study. Figure 5 shows the relationship between the cell concentration and the cultivation time, including both the experimental data and the values predicted by the logistic model. Correspondingly, the logistic model parameters $\mu_{\max}$ and $x_{\max}$ were estimated as 0.4791 and 0.3498.
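The three metrics of Table 1 can be computed with a few lines of code. The sketch below is illustrative rather than the authors' implementation. It assumes that NRMSE is the square root of the ratio of summed squared errors to summed squared observations, that R is the standard Pearson coefficient, and that MAPE is returned as a fraction (multiply by 100 for a percentage). The data values are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (x100 for percent)."""
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def nrmse(actual, predicted):
    """Assumed form: sqrt(sum((a - p)^2) / sum(a^2))."""
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum(a ** 2 for a in actual)
    return math.sqrt(num / den)


def pearson_r(actual, predicted):
    """Standard Pearson correlation coefficient between the two series."""
    a_mean = sum(actual) / len(actual)
    p_mean = sum(predicted) / len(predicted)
    cov = sum((a - a_mean) * (p - p_mean) for a, p in zip(actual, predicted))
    var_a = sum((a - a_mean) ** 2 for a in actual)
    var_p = sum((p - p_mean) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical measured and predicted OD values, for illustration only.
measured = [0.06, 0.12, 0.24, 0.33, 0.35]
estimated = [0.07, 0.11, 0.25, 0.32, 0.35]
print(mape(measured, estimated), nrmse(measured, estimated), pearson_r(measured, estimated))
```

Because MAPE divides each error by the observation, it weights errors at low cell concentrations more heavily than NRMSE does. This is one reason to report both metrics alongside R.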
Performance comparison of SVR and logistic model

As shown in Table 2, the values of MAPE and NRMSE generated by the GA-ν-SVR model were smaller than those from the logistic model, which indicates smaller deviations between the measured and predicted values. Moreover, the Pearson correlation coefficient R was higher. To illustrate the better performance of the SVR model, we compared the measured values with the predicted ones. The plots of the measured values against the predicted values are shown in Figures 6 and 7. The prediction was performed with the testing data. As Figures 6 and 7 show, the proposed GA-ν-SVR model fitted this particular data set very well (Pearson correlation coefficient of 0.9966), while two particular points lacked accuracy in the logistic model (Pearson correlation coefficient of 0.9828). Therefore, the performance of the GA-ν-SVR model was better than that of the logistic equation.

Table 2. Comparison of the predicted results from GA-ν-SVR and logistic models.

Model        R        MAPE     NRMSE
GA-ν-SVR     0.9966   0.0185   0.0209
Logistic     0.9828   0.0375   0.0465

Figure 3. The fitness alternation during the evolutionary process.

Figure 4. Measured and estimated cell concentration by ν-SVR (○, measured value; ▲, predicted value). Axes: fermentation time (h) vs. the OD660 value of L. plantarum.

Figure 5. Measured and estimated cell concentration by logistic model (○, measured value; ▲, predicted value).

Figure 6. The plots of measured cell concentration vs. predicted value by the ν-SVR model. Axes: the measured value (OD660) vs. SVR predicted value (OD660).

Figure 7. The plots of the measured cell concentration vs. predicted value by logistic model.

Conclusions

The fermentation process is very complex, and it is very difficult to obtain a complete picture of what is actually going on in a particular fermentation. To estimate the biomass on-line, the models presented in this work can be used to predict the fermentation process of microbes. In this paper, a relatively easy strategy was given based on the investigation of the ν-SVR parameters, in which the value of ν was determined beforehand and the other two parameters (C and σ) were then selected. From the MAPE, NRMSE and Pearson correlation coefficient results, we concluded that the approximation ability and generalization of the ν-SVR model were better than those of the logistic model for the on-line monitoring of biomass. SVR models could be considered to reflect intracellular metabolic pathways, which are described by four variables (temperature, pH value, glucose concentration and fermentation time), while the logistic model has only one variable to describe the biomass. In conclusion, GA-ν-SVR has a better capacity for predicting bacterial biomass than the logistic model and is a promising method for estimating microbial growth.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 20976161) and the Natural Science Foundation of Ningbo City, China (Grant No. 2009A610149).

REFERENCES

Belousov AI, Verzakov SA, Frese JV (2004). A flexible classification approach with optimal generalisation performance: support vector machines. Chemom. Intell. Lab. Syst. 64: 15–25.
Bessaou M, Siarry P (2001). A genetic algorithm with real-value coding to optimize multimodal continuous functions. Struct. Multidisc. Optim. 23: 63–74.
Borin A, Ferrao MF, Mello C, Maretto DA, Poppi RJ (2006). Least-squares support vector machines and near infrared spectroscopy for quantification of common adulterants in powdered milk. Anal. Chim. Acta. 579: 25–32.
Browne MW (2000). Cross-validation methods. J. Math. Psychol. 44: 108–132.
Cai CZ, Wang WL, Sun LZ, Chen YZ (2003). Protein function classification via support vector machine approach. Math. Biosci. 185: 111–122.
Chalimourda A, Schölkopf B, Smola AJ (2004). Experimentally optimal ν in support vector regression for different noise models and parameter settings. Neural Netw. 17: 127–141.
Cherkassky V, Ma YQ (2004). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 17: 113–126.
Chou KC (2009). Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology. Current Proteomics 6: 262–274.
Desai K, Badhe Y, Tambe SS, Kulkarni BD (2006). Soft-sensor development for fed-batch bioreactors using support vector regression. Biochem. Eng. J. 27: 225–239.
Fujikawa H, Kai A, Morozumi S (2004). A new logistic model for Escherichia coli growth at constant and dynamic temperatures. Food Microbiol. 21: 501–509.
Holland JH (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
Kiviharju K, Salonen K, Leisola M, Eerikainen T (2006). Modeling and simulation of Streptomyces peucetius var. caesius N47 cultivation and ε-rhodomycinone production with kinetic equations and neural networks. J. Biotechnol. 126: 365–373.
Li YF, Wang ZF, Yuan JQ (2006a). On-line fault detection using SVM-based dynamic MPLS for batch process. Chinese J. Chem. Eng. 14: 754–758.
Li YF, Yuan JQ (2006b). Prediction of key state variables using support vector machines in bioprocess. Chem. Eng. Technol. 29: 313–319.
Lin WZ, Xiao X, Chou KC (2009). GPCR-GIA: a web-server for identifying G-protein coupled receptors and their families with grey incidence analysis. Protein Eng. Des. Sel. 22: 699–705.
Liu JZ, Weng LP, Zhang QL, Xu H, Ji LN (2002). A mathematical model for gluconic acid fermentation by Aspergillus niger. Biochem. Eng. J. 14: 137–141.
Moreira GA, Micheloud GA, Beccaria AJ, Goicoechea HC (2007). Optimization of the Bacillus thuringiensis var. kurstaki HD-1 δ-endotoxins production by using experimental mixture design and artificial neural networks. Biochem. Eng. J. 35: 48–55.
Mu Y, Wang G, Yu HQ (2006). Kinetic modeling of batch hydrogen production process by mixed anaerobic cultures. Bioresource. Technol. 97: 1302–1307.
Peleg M, Corradini MG, Normand MD (2007). The logistic (Verhulst) model for sigmoid microbial growth curves revisited. Food Res. Int. 40: 808–818.
Potocnik P, Grabec I (1999). Empirical modeling of antibiotic fermentation process using neural networks and genetic algorithms. Math. Comput. Simul. 49: 363–379.
Samanta B, Al-Balushi KR, Al-Araimi SA (2003). Artificial neural networks and support vector machines with genetic algorithm for bearing fault detection. Eng. Appl. Artif. Intell. 16: 657–665.
Schepers AW, Thibault J, Lacroix C (2000). Comparison of simple neural networks and nonlinear regression models for descriptive modeling of Lactobacillus helveticus growth in pH-controlled batch cultures. Enzyme. Microb. Tech. 26: 431–445.
Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000). New Support Vector Algorithms. Neural Comput. 12: 1207–1245.
Ustun B, Melssen WJ, Oudenhuijzen M, Buydens LMC (2005). Determination of optimal support vector regression parameters by genetic algorithms and simplex optimization. Anal. Chim. Acta. 544: 292–305.
Vapnik V (1995). The Nature of Statistical Learning Theory. Springer, New York.
Wang JL, Yu T, Jin CY (2006). On-line estimation of biomass in fermentation process using support vector machine. Chinese J. Chem. Eng. 14: 383–388.
Wu ZC, Xiao X, Chou KC (2010). 2D-MH: A web-server for generating graphic representation of protein sequences based on the physicochemical properties of their constituent amino acids. J. Theoret. Bio. 267(1): 29–34.
Xiao X, Lin WZ (2009c). Application of protein grey incidence degree measure to predict protein quaternary structural types. Amino Acids. 37(4): 741–749.
Xiao X, Lin WZ, Chou KC (2008a). Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes. J. Comput. Chem. 29: 2018–2024.
Xiao X, Wang P, Chou KC (2008b). Predicting protein structural classes with pseudo amino acid composition: an approach using geometric moments of cellular automaton image. J. Theor. Biol. 254: 691–696.
Xiao X, Wang P, Chou KC (2009a). GPCR-CA: A cellular automaton image approach for predicting G-protein-coupled receptor functional classes. J. Comput. Chem. 30: 1414–1423.
Xiao X, Wang P, Chou KC (2010). Quat-2L: a web-server for predicting protein quaternary structural attributes. Molecular Diversity, DOI 10.1007/s11030-010-9227-8.
Xiao X, Wang P, Chou KC (2009b). Predicting the quaternary structure attribute of a protein by hybridizing functional domain composition and pseudo amino acid composition. J. Applied Crystallography, 42: 169–173.
Zhang LX, Zhang TF, Li LY (1997). Biochemical experimental method and technology. Higher Education Press, Beijing.