Data mining with various optimization methods

Vladimir Nedic a, Slobodan Cvetanovic b, Danijela Despotovic c, Milan Despotovic d,⇑, Sasa Babic e

a Faculty of Phil. and Arts, University of Kragujevac, Jovana Cvijica bb, 34000 Kragujevac, Serbia
b Faculty of Economics, University of Nis, Trg kralja Aleksandra Ujedinitelja 11, 18000 Nis, Serbia
c Faculty of Economics, University of Kragujevac, Djure Pucara Starog 3, 34000 Kragujevac, Serbia
d Faculty of Engineering, University of Kragujevac, Sestre Janjic 6, 34000 Kragujevac, Serbia
e College of Applied Mechanical Engineering, Trstenik, Serbia

Article info

Keywords: Traffic noise; Artificial intelligence; Genetic algorithm; Hooke and Jeeves; Simulated annealing; Particle swarm optimization; Software

Abstract

Road traffic is the main source of noise in urban environments, and this noise has been shown to significantly affect human mental and physical health and labour productivity. Thus, in order to control noise levels in urban areas, it is very important to develop methods for modelling road traffic noise. As observed in the literature, the models that deal with this issue are mainly based on regression analysis, while other approaches are very rare. In this paper a novel approach to modelling traffic noise, based on optimization, is presented. Four optimization techniques were used in the simulations: genetic algorithms, the Hooke and Jeeves algorithm, simulated annealing and particle swarm optimization. Two different scenarios are presented. In the first scenario the optimization methods use the whole measurement dataset to find the most suitable parameters, whereas in the second scenario the optimized parameters were found using only part of the measurement data, while the rest of the data was used to evaluate the predictive capabilities of the model.
The goodness of the model is evaluated by the coefficient of determination and other statistical parameters, and the results show a high degree of agreement between measured data and calculated values in both scenarios. In addition, the model was compared with a classical statistical model, and the superior capabilities of the proposed model were demonstrated. The simulations were done using an originally developed, user-friendly software package.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Road traffic noise, along with the noise coming from railways and industry, is a very important factor in environmental pollution in urban areas. The influence of traffic noise on human health has been studied on numerous occasions in recent years (Brink, 2011; Fyhri & Klæboe, 2009; Pirrera, De Valck, & Cluydts, 2010), showing that this kind of annoyance significantly affects both mental and physical health in many ways: it causes anxiety, stress, hearing impediments, sleep disturbance, cardiovascular problems, etc. Thus, in order to control noise levels in urban areas, it is very important to develop methods for the prediction of traffic noise.

Due to the rapid development of means of transportation and road traffic, the influence of the traffic flow structure on the level of traffic noise is an important area of research. By monitoring basic flow parameters and their trends, it is possible to predict and monitor the noise that appears in a given part of the transport network. In this way, noise reduction can be achieved through different modes of traffic management, which is particularly important for human health and environmental improvement.

The first traffic noise prediction (TNP) models date back to the early 1950s. Since then a large number of methods and models for traffic noise prediction have been developed. Critical reviews of the most widely used ones are given in Steele (2001) and Quartieri et al. (2009).
Most of the TNP models presented in the literature are based on linear regression analysis. The main limitation of those models, as concluded in Quartieri et al. (2009) and Guarnaccia, Lenza, Mastorakis, and Quartieri (2011), is “that they do not take into account the intrinsic random nature of traffic flow, in the sense that they do not take care of how vehicles really run, considering only how many they are”. More advanced models involve artificial neural networks (ANN) (Cammarata, Cavalieri, & Fichera, 1995; Givargis & Karimi, 2010) and genetic algorithms (Gündoğdu, Gökdağ, & Yüksel, 2005; Rahmani, Mousavi, & Kamali, 2011). The ANN model used in Cammarata et al. (1995) has 3 inputs: the equivalent number of vehicles (the number of cars, plus the number of motorcycles multiplied by 3, plus the number of trucks multiplied by 6), the average height of the buildings on the sides of the road, and the width of the road. In order to increase the number of inputs, the authors decomposed the equivalent number of vehicles into the number of cars, the number of motorcycles, and the number of trucks, and obtained an ANN model with 5 inputs.

Expert Systems with Applications 41 (2014) 3993–3999. http://dx.doi.org/10.1016/j.eswa.2013.12.025
⇑ Corresponding author. Tel.: +381 69 844 9679. E-mail address: mdespotovic@kg.ac.rs (M. Despotovic).
The ANN model used in Givargis and Karimi (2010) is built on the parameters of the CoRTN (Calculation of Road Traffic Noise) model (Quartieri et al., 2009), which was initially developed in 1975 by the Transport and Road Research Laboratory and the Department of Transport of the United Kingdom. It has 5 input variables: the total hourly traffic flow, the percentage of heavy vehicles, the hourly mean traffic speed, the gradient of the road, and the angle of view. The authors tested the developed model on data collected on Tehran’s roads, and found no significant differences between the outputs of the developed ANN and the calibrated CoRTN model. In Gündoğdu et al. (2005) a genetic algorithm was used to model traffic noise in relation to traffic composition (vehicles per hour), the road gradient and the ratio of building height to road width. In Rahmani et al. (2011) the proposed model is a function of total equivalent traffic flow and equivalent traffic speed. In both papers the authors used MATLAB to find the optimized values of the model parameters.

In this paper an application of four optimization techniques to the prediction of traffic noise is presented. These techniques are: genetic algorithms, the Hooke and Jeeves algorithm, simulated annealing, and particle swarm optimization. The proposed model consists of five variables: the number of light motor vehicles, the number of medium trucks, the number of heavy trucks, the number of buses and the average traffic flow speed. All optimized models are tested on data measured on a Serbian road, using an originally developed, user-friendly software package.

2. Problem formulation

The most suitable measure for describing traffic noise emission is the equivalent sound pressure level ($L_{eq}$), which is expressed in units of dB(A) and corresponds to a fictitious noise source emitting steady noise which, over a specific period of time, contains the same acoustic energy as the observed source with fluctuating noise. For a number of discrete measurements ($N$), $L_{eq}$ for a time period $T$ is expressed by the following equation:

$$L_{eq} = 10 \log_{10}\left( \frac{1}{T} \sum_{i=1}^{N} 10^{L_i/10} \right) \tag{1}$$

where $L_i$ is the sound pressure level corresponding to the $i$-th measurement.

In order to reduce the noise, it is necessary to know the functional relationship between the equivalent sound pressure level and the influential parameters. $L_{eq}$ is correlated with numerous parameters, such as the numbers and types of vehicles, their velocities, the type of road surface, the width and slope of the road, the height of the buildings facing the road, etc. As mentioned in the introduction, in this paper the following variables were considered: the number of light motor vehicles (LMV), the number of medium trucks (STV), the number of heavy trucks (TTV), the number of buses (BUS) and the average traffic flow speed ($V_{avg}$). A brief description of how these variables were measured is given in the following section.

3. Data sampling

Traffic data on the road M5 were measured with QLTC-10C automatic traffic counters, and noise was measured with a Brüel & Kjær type 2230 class 1 sound level meter. The equivalent sound pressure levels were measured over time periods of 15 min. In order to include a greater number of the scenarios that might occur in urban environments, a total of 124 such measurements were carried out, at various times, so as to capture as much of the diversity of the traffic flow as possible. Simultaneously, variations in traffic flow, traffic speed and composition of traffic flow were measured.
For that reason, each survey also recorded the following parameters: the number of light motor vehicles, the number of medium trucks, the number of heavy trucks, the number of buses, and the average traffic speed in the given time periods.

Measurements were taken in accordance with the recommendations for road traffic noise measurement: the microphone was mounted away from reflecting facades, at a height of 1.2 m above ground level and 7.5 m away from the centre line of the road. During the measurements care was taken that weather conditions were as similar as possible (no wind, no rain) in order to eliminate their influence.

4. Mathematical model and methods

The equivalent sound pressure level is assumed to be modelled by the following equation:

$$L_{eq} = N_1 \cdot \log_{10}(\mathrm{LMV}) + N_2 \cdot \log_{10}(\mathrm{STV}) + N_3 \cdot \log_{10}(\mathrm{TTV}) + N_4 \cdot \log_{10}(\mathrm{BUS}) + N_5 \cdot V_{avg}^{N_6} + N_7 \cdot \log_{10}(V_{avg}) \tag{2}$$

where $N_i$ ($i = 1, \dots, 7$) are coefficients. The problem thus reduces to finding the coefficients $N_i$ such that the assumed model best fits the experimental data. For that purpose genetic algorithms, the Hooke and Jeeves algorithm, simulated annealing, and particle swarm optimization are used. These techniques are briefly described in the following subsections.

4.1. Genetic algorithms

Genetic algorithms (Rao, 1996) are a class of evolutionary algorithms that can be used in a large number of different application areas. The principle of genetic algorithms is based on Darwin’s theory of evolution, according to which the fittest individuals have the best chances to survive. Genetic algorithms operate with a set of individuals (chromosomes) called a population. The information

Fig. 1. Flowchart of the Genetic algorithm workflow.
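The fitting problem stated in Section 4 can be illustrated with a minimal Python sketch. It is not the authors' software package. It assumes that Eq. (2) contains the term $N_5 \cdot V_{avg}^{N_6}$, that "best fit" means the smallest sum of squared errors (the text does not name the objective), and that every vehicle count and the speed are positive so the logarithms are defined. The function names `predict_leq`, `sse` and `r_squared` and the record layout are hypothetical.

```python
import math


def predict_leq(n, lmv, stv, ttv, bus, v_avg):
    """Evaluate the model of Eq. (2) for one 15-min record.

    n holds the seven coefficients N1..N7 (n[0] is N1).
    Assumption: all counts and the speed are > 0, since log10(0) is undefined.
    """
    return (n[0] * math.log10(lmv)
            + n[1] * math.log10(stv)
            + n[2] * math.log10(ttv)
            + n[3] * math.log10(bus)
            + n[4] * v_avg ** n[5]          # assumed reading: N5 * Vavg^N6
            + n[6] * math.log10(v_avg))


def sse(n, records):
    """Sum of squared errors between modelled and measured Leq.

    Each record is a tuple (lmv, stv, ttv, bus, v_avg, measured_leq).
    Assumption: this is the quantity an optimizer would minimize.
    """
    return sum((predict_leq(n, *r[:5]) - r[5]) ** 2 for r in records)


def r_squared(n, records):
    """Coefficient of determination of the model on the given records."""
    measured = [r[5] for r in records]
    mean = sum(measured) / len(measured)
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - sse(n, records) / ss_tot
```

Under these assumptions, any of the four optimization techniques would search the seven-dimensional coefficient space for the vector `n` that minimizes `sse`, and `r_squared` would then summarize the quality of the fit.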