key: cord-0057683-xxefhhl6 authors: Nayak, Archana M.; Chaubey, Nirbhay title: Predicting Passenger Flow in BTS and MTS Using Hybrid Stacked Auto-encoder and Softmax Regression date: 2020-06-08 journal: Computing Science, Communication and Security DOI: 10.1007/978-981-15-6648-6_3 sha: bcfa65c51023e1e7873ca7600f288c4733abde09 doc_id: 57683 cord_uid: xxefhhl6

In recent years, deep learning techniques have been applied effectively and have achieved remarkable results in numerous fields. Meanwhile, over the past few years the transportation industry has also been modernized under the influence of big data. Building on these two trends, this method revisits a long-standing problem in the transportation industry: passenger flow forecasting. Passenger flow is predicted for two modes of transportation, the Bus Transit System (BTS) and the Metro Transit System (MTS). The gathered passenger data are grouped by dynamic clustering into summer, winter, weekend, weekday, and public-holiday clusters. The selection of the initial cluster centroids is enhanced by a Tabu search algorithm, which further improves the performance of the dynamic clustering algorithm. After clustering, a stacked auto-encoder (SAE) with a softmax regression (SR) classifier is introduced for prediction. Finally, the Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) of the proposed Cluster-SAE-DNN method are compared with those of an SAE-DNN based prediction approach. The prediction process is implemented in Matlab. The results show that the proposed method achieves higher prediction accuracy, with a lower error rate, than SAE-DNN. The paper is structured as follows: related literature is reviewed in Sect. 2, the proposed work is presented in Sect. 3, the outcomes of the prediction process are discussed in Sect.
4, and the work is concluded in Sect. 5.

Yu et al. 2018 [20] developed a novel prediction method by investigating the relationship between the passenger flow at a station and the factors of its surrounding region. First, the city was divided into regions so that the factors affecting passenger flow could be identified accurately. Then, fuzzy processing and the membership-degree concept were introduced to resolve the problems caused by the fuzzy boundaries of the bus stops' attraction scope. Finally, an XGBoost-based prediction method was built for passenger flow. Passenger flow forecasting is an essential component of Intelligent Transportation Systems (ITS). To improve forecasting accuracy, Li et al. 2018 [21] combined symbolic regression with an ARIMA model. By drawing on the distinct strengths of each model, the hybrid captures the complex patterns underlying the data structure, and its advantage grows as the prediction step increases. Hu et al. 2019 [22] introduced a re-sample Recurrent Neural Network (RRNN) model to forecast passenger traffic flow on an MRT system. An RNN was used to predict passenger traffic, with the forecasting step recast as a classification task. Because the resulting training dataset was imbalanced, the RRNN was introduced to overcome this imbalance. Lijuan and Rung 2017 [23] developed an hourly passenger flow prediction model with the aid of deep learning. They included temporal features such as the day of the week and the hour of the day, together with inbound and outbound flow features. These features were combined and used to train separate Stacked Auto-Encoders (SAE) in the first stage. In the second stage, the pre-trained SAE was used to initialize a supervised Deep Neural Network (DNN), with the passenger flow serving as the data label.
The main objective of the proposed Cluster-SAE-DNN approach is to analyse different ways of estimating passenger flow for the Surat city data. Using historical passenger data from the Surat city dataset, the proposed method predicts the passenger flow of both BTS and MTS for summer, winter, public holidays, weekends, and weekdays. In this paper, we develop an intelligent passenger flow estimation method for public transport systems. First, we collect the passenger counts from the historical data. The proposed clustering algorithm then uses Tabu search to select the initial cluster centroids, which improves clustering performance: the clustering result depends strongly on the initial centroids, and a poor choice can easily trap the solution in a local optimum. The flow diagram of the proposed Cluster-SAE-DNN methodology is depicted in Fig. 1. Finally, an SAE with a softmax regression-based model is developed for passenger flow prediction. SAEs have recently become a common modelling choice because of their nonlinearity, their ability to map arbitrary functions, and their flexibility. Moreover, they can handle difficult nonlinear problems without any prior knowledge of the relationships between the input and output variables.

Assume we are given a set of values $X = \{x_1, x_2, x_3, \ldots, x_n\}$. The role of the clustering technique is to partition this set into $m$ disjoint subsets (clusters). Generally, clustering is accomplished by computing the squared Euclidean distance between each data point $x_i$ and the centroid (cluster centre) $M_k$ of the subset $c_k$ that contains $x_i$, and minimizing the sum of these distances. The following procedure is used to solve the clustering problem with $m$ clusters.
Start with one cluster ($k = 1$) and find its optimal position, which coincides with the centroid of the dataset $X$. Then, to solve the problem with two clusters ($k = 2$), perform $N$ executions of the dynamic algorithm from the following initial cluster-centre positions: the first cluster centre is always placed at the optimal position obtained for $k = 1$, while in the $n$-th execution the second cluster centre is placed at the position of the data point $x_n$ ($n = 1, 2, \ldots, N$). The best solution obtained from these $N$ executions of the dynamic clustering algorithm is taken as the solution of the clustering problem for $k = 2$. The Euclidean distance is calculated with Eq. (1):

$$d(x_i, M_k) = \lVert x_i - M_k \rVert. \quad (1)$$

In the dynamic clustering, the input values are the passenger flow values. Based on the distance values, each point is assigned to a cluster and the centroids are recalculated; this step is repeated until satisfactory results are obtained.

Many authors have proposed variants of the Tabu search algorithm to overcome existing problems. To make the algorithm efficient, we first define the feasible solutions of the problem precisely. Let $Z$ be a set of feasible solutions $S$, and let $F$ be a function that assigns a real value $F(S)$ to each $S$ in $Z$. The goal is to find a solution $S^{*}$ in $Z$ for which $F(S^{*})$ is minimal. Because $S^{*}$ cannot be obtained in a single step, an iterative process is applied: the search moves from a position $S$ to a new position $S'$ and repeats until a satisfactory result or region is reached. Each move to the next position is chosen by best search. Let $p' = p \oplus q$ denote the solution $p'$ obtained by introducing a modification $q$ into solution $p$, and define $Q(p)$ as the set of modifications that are acceptable at solution $p$.
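The incremental clustering procedure described above (one centre at the dataset centroid, then restarts with the new centre placed at each data point in turn, keeping the best result) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Matlab implementation: the inner "dynamic algorithm" is assumed to be ordinary k-means (Lloyd) iterations, the $k = 2$ step is assumed to repeat for $k = 3, \ldots, m$ (as in the global k-means algorithm this procedure resembles), "best" is taken to mean the lowest sum of squared Euclidean distances, and the Tabu search refinement of the initial centroids is left out. All function names are hypothetical. Data points are tuples of floats, for example one-element tuples of daily passenger counts.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty collection of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances from each point to its nearest centre."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def kmeans(points: Sequence[Point], centers: List[Point], max_iter: int = 100) -> List[Point]:
    """Lloyd iterations, an assumed stand-in for the paper's 'dynamic algorithm'."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            # Assign each point to the centre at the smallest Euclidean distance.
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old centre if a cluster became empty.
        new_centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def dynamic_clustering(points: Sequence[Point], m: int) -> List[Point]:
    """Incremental clustering into m clusters (hypothetical helper name)."""
    # k = 1: the optimal single centre is the centroid of the whole dataset.
    centers = [centroid(points)]
    for _ in range(1, m):
        best_cost, best_centers = math.inf, centers
        # N executions: the new centre starts at each data point x_n in turn,
        # while the previously found centres keep their positions.
        for x in points:
            candidate = kmeans(points, centers + [tuple(x)])
            cost = sse(points, candidate)
            if cost < best_cost:
                best_cost, best_centers = cost, candidate
        centers = best_centers
    return centers
```

Because every data point is tried as the position of the new centre, each added cluster costs $N$ runs of k-means on the full dataset.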
The neighbourhood of $p$, that is, the set of solutions reachable from $p$ through an acceptable modification in $Q(p)$, is computed as

$$N(p) = \{\, p \oplus q \mid q \in Q(p) \,\}. \quad (2)$$

From the current node $c$, the search selects the best neighbour, $n \leftarrow \mathrm{best}(N(c))$; if this neighbour is better than the current node, the search moves to it ($c \leftarrow n$). Tabu conditions are identified, and moves that meet them are disallowed. The algorithm also applies an aspiration criterion: if a disallowed move is the best one, the search moves to that position anyway. In the proposed method, Tabu search is used to enhance the initial cluster centroids.

Two auto-encoders are stacked to form the SAE [24], which carries out feature extraction for passenger flow prediction. For clarity, a single-layer auto-encoder network is explained here. It consists of an encoder and a decoder. The encoder maps an input $x_i$ to its hidden representation $h_i$ through a nonlinear mapping function, whose common form is

$$h_i = f(W_1 x_i + b_1), \quad (3)$$

where $W_1$ is the encoding weight matrix, $b_1$ is the encoding bias vector, and $f$ is a nonlinear activation function. The decoder recovers the input $x_i$ from the hidden representation $h_i$ through the transformation in Eq. (4),

$$\hat{x}_i = g(W_2 h_i + b_2), \quad (4)$$

where $W_2$ and $b_2$ are the decoding weight matrix and bias vector, and $g$ is the decoder's activation function. The auto-encoder learns a useful hidden representation by minimizing the reconstruction error. Thus, for $N$ training samples, the parameters $W_1$, $W_2$, $b_1$, and $b_2$ are found by solving the optimization problem

$$\min_{W_1, W_2, b_1, b_2} \; \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert^{2}. \quad (5)$$

Softmax regression (SR), a supervised learning model, is included in the DNN as its output layer. This type of supervised learning is widely applied to …

Fig. 2.

Input: clustered data (month, week, weekend, holidays, seasons, and normal days), with $a_1$ the input of the first neuron and $b_1$ the label for $a_1$. Output: predicted passenger flow (Fig. 3).
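The single-layer auto-encoder described above (an encoder, a decoder, and a squared reconstruction-error objective), and the stacking of two such auto-encoders, can be illustrated with the minimal pure-Python sketch below. It assumes the logistic sigmoid for both the encoder and decoder nonlinearities, small random initial weights, and per-sample gradient descent on the reconstruction error; none of these choices is stated in the text, and all function names are hypothetical. Stacking is done greedily: the second auto-encoder is trained on the hidden codes produced by the first. Inputs are assumed to be scaled to [0, 1] (for example, normalized passenger counts), because a sigmoid decoder cannot output values outside that range.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Vector, Matrix, Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer(W: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """sigmoid(W x + b); the sigmoid is an assumed choice for both f and g."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def init_layer(rows: int, cols: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Small random weights and zero biases."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)], [0.0] * rows


def reconstruction_error(samples: Sequence[Sequence[float]], params: Params) -> float:
    """(1/N) * sum_i ||x_i - x_hat_i||^2 for the given parameters."""
    W1, b1, W2, b2 = params
    total = 0.0
    for x in samples:
        x_hat = layer(W2, b2, layer(W1, b1, x))
        total += sum((xi - yi) ** 2 for xi, yi in zip(x, x_hat))
    return total / len(samples)


def train_autoencoder(samples, n_hidden: int, epochs: int = 2000,
                      lr: float = 0.5, seed: int = 0) -> Params:
    """Per-sample gradient descent on 0.5 * ||x_hat - x||^2 (an assumed training scheme)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    W1, b1 = init_layer(n_hidden, n_in, rng)  # encoder parameters
    W2, b2 = init_layer(n_in, n_hidden, rng)  # decoder parameters
    for _ in range(epochs):
        for x in samples:
            h = layer(W1, b1, x)
            x_hat = layer(W2, b2, h)
            # Error signal at the sigmoid output units.
            d2 = [(y - t) * y * (1.0 - y) for y, t in zip(x_hat, x)]
            # Back-propagate to the hidden units before W2 is modified.
            d1 = [sum(W2[k][j] * d2[k] for k in range(n_in)) * h[j] * (1.0 - h[j])
                  for j in range(n_hidden)]
            for k in range(n_in):
                for j in range(n_hidden):
                    W2[k][j] -= lr * d2[k] * h[j]
                b2[k] -= lr * d2[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= lr * d1[j] * x[i]
                b1[j] -= lr * d1[j]
    return W1, b1, W2, b2


def train_stacked(samples, hidden_sizes: Sequence[int]):
    """Greedy layer-wise stacking: each auto-encoder learns to reconstruct the
    hidden codes of the previous one. Returns the encoders and the final codes."""
    encoders, inputs = [], [list(x) for x in samples]
    for n_hidden in hidden_sizes:
        W1, b1, _, _ = train_autoencoder(inputs, n_hidden)
        encoders.append((W1, b1))
        inputs = [layer(W1, b1, x) for x in inputs]
    return encoders, inputs
```

In a full model, the final codes of the stacked encoders would feed the softmax regression output layer; that layer and any fine-tuning of the whole network are not shown here.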
Because SR is a supervised learning approach, it requires both an input $a$ and a label $b$ [25]. Given a training set of $m$ samples, $\{(a^{(1)}, b^{(1)}), (a^{(2)}, b^{(2)}), \ldots, (a^{(m)}, b^{(m)})\}$ with $b^{(i)} \in \{1, 2, \ldots, k\}$, the model estimates the probability $p(b = j \mid a)$ for each value of $j = 1, 2, \ldots, k$. The hypothesis $h_\theta(a)$ is evaluated with Eq. (6),

$$h_\theta(a^{(i)}) = \begin{bmatrix} p(b^{(i)} = 1 \mid a^{(i)}; \theta) \\ p(b^{(i)} = 2 \mid a^{(i)}; \theta) \\ \vdots \\ p(b^{(i)} = k \mid a^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{l=1}^{k} e^{\theta_l^{T} a^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} a^{(i)}} \\ e^{\theta_2^{T} a^{(i)}} \\ \vdots \\ e^{\theta_k^{T} a^{(i)}} \end{bmatrix}, \quad (6)$$

where $\theta_1, \theta_2, \ldots, \theta_k \in \mathbb{R}^{n+1}$ are the model parameters, and the factor $1 / \sum_{l=1}^{k} e^{\theta_l^{T} a^{(i)}}$ normalizes the distribution so that it sums to 1. In terms of the log-likelihood, the cost function for softmax regression is

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{b^{(i)} = j\} \log p(b^{(i)} = j \mid a^{(i)}; \theta),$$

where $1\{\cdot\}$ is the indicator function. Another form of the cost function is

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{b^{(i)} = j\} \log \frac{e^{\theta_j^{T} a^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} a^{(i)}}}. \quad (9)$$

A weight-decay term is then added to Eq. (9), which makes the model more robust to varied inputs. After adding weight decay, the cost function is

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{b^{(i)} = j\} \log \frac{e^{\theta_j^{T} a^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} a^{(i)}}} + \frac{\lambda}{2} \sum_{j=1}^{k} \sum_{d=0}^{n} \theta_{jd}^{2}, \quad (10)$$

where $\lambda > 0$ is the weight-decay parameter.

The main aim of SR here is to predict passenger flow with low complexity and high accuracy. With this method, the passenger flow of BTS and MTS is predicted for holidays, weekends, normal days, summer, winter, one month, and one year. Such predictions are very useful for the transportation industry.

In this work, passenger flow is predicted over one year for both BTS and MTS passengers. The passenger data are taken from the Surat city dataset and cover one year (June 2017 to June 2018) of BTS and MTS transportation. From these data, the passenger flow during summer, winter, weekends, weekdays, and public holidays is predicted. Two performance metrics, RMSE and MAPE, are evaluated to demonstrate the effectiveness of the proposed Cluster-SAE-DNN prediction approach; MAPE measures its prediction accuracy. The prediction process is implemented in the Matlab environment. The equations for RMSE and MAPE are given in Eqs.
(11) and (12), respectively:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(xr_i - xp_i\right)^{2}}, \quad (11)$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{xr_i - xp_i}{xr_i} \right|, \quad (12)$$

where $xr_i$ is the actual number of passengers, $xp_i$ is the predicted number of passengers, and $n$ is the number of predicted values. Both metrics are evaluated for BTS and MTS passengers to show the effectiveness of the proposed Cluster-SAE-DNN technique.

Fig. 4. Actual and predicted passenger flow for BTS: (a) weekend, (b) public holidays, (c) normal days, (d) summer, (e) winter, and (f) one month. X axis: number of days; Y axis: passenger availability.

Fig. 4(a) shows the actual and predicted BTS passenger flow for weekends (Saturday and Sunday). The passenger flow is similarly predicted for summer, weekdays (normal days), winter, one month, and one year: the actual and predicted passengers for summer and winter are given in Fig. 4(d) and (e), those for holidays and normal days in Fig. 4(b) and (c), and those for one month and one year in Fig. 4(f) and (g), respectively. The MTS passenger availability is likewise obtained for summer, weekdays (normal days), winter, one month, and one year.

Fig. 5 shows the cluster formation for BTS and MTS using the historical dataset. The centroids are calculated with the dynamic clustering algorithm and enhanced by the Tabu search algorithm. The historical BTS passenger flow data are clustered for one month, one year, weekends, weekdays, summer, winter, and public holidays; the resulting clusters are shown in Fig. 5(a), (b), (c), (d), (e), (f), and (g). The MTS passenger data are clustered in the same way by the Tabu search-based clustering process, and the MTS passenger flow is then predicted by the deep learning technique. The errors obtained when predicting BTS and MTS passenger flow with the proposed (Cluster-SAE-DNN) and existing (SAE-DNN) methods are listed in Table 1.
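The two evaluation metrics can be computed directly from Eqs. (11) and (12), with the actual counts $xr_i$ and predicted counts $xp_i$ passed as two sequences. The sketch below is illustrative only: it assumes the common percentage form of MAPE (multiplied by 100) and non-zero actual counts, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, Eq. (11): actual = xr_i, predicted = xp_i."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    return math.sqrt(sum((xr - xp) ** 2 for xr, xp in zip(actual, predicted)) / n)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, Eq. (12), in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    return 100.0 * sum(abs((xr - xp) / xr) for xr, xp in zip(actual, predicted)) / n
```

For example, with actual counts [100, 200, 400] and predictions [110, 190, 400], `rmse` returns about 8.16 and `mape` returns about 5.0 (percent). RMSE is in passenger units and penalizes large misses more heavily, while MAPE is scale-free, which makes it easier to compare BTS and MTS.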
Higher accuracy corresponds to lower prediction error. For both BTS and MTS, the accuracy of the proposed method is higher than that of the existing approach [20], which shows that the prediction error of the proposed Cluster-SAE-DNN approach is much lower than that of the existing method. Fig. 6(a) shows the actual and predicted passenger flow of BTS passengers in the summer and winter seasons. Fig. 6(b) shows the actual and predicted BTS passenger flow over one year (June 2017 to June 2018); the predicted passenger flow peaks in August 2017. The actual and predicted passenger flow for MTS transportation is shown in Fig. 7(a), (b), and (c). Fig. 7(a) shows the number of passengers predicted for MTS during the summer (March, April, and May) and winter (December, January, and February) seasons; the predicted passenger flow peaks in January 2018. The predicted and actual passenger availability for January 2018 is shown in Fig. 7(c). BTS experiences high passenger flow in summer, whereas MTS experiences high passenger flow in winter. The actual and predicted number of passengers for one year (June 2017 to May 2018) is shown in Fig. 7(b). The accuracy attained by the proposed Cluster-SAE-DNN method for BTS passenger flow on weekends, holidays, normal days, summer, winter, 1 month, and 1 year is shown in Fig. 8(a); for BTS, the accuracy is highest in the summer season. Similarly, the accuracy attained for MTS on weekends, holidays, normal days, summer, winter, 1 month, and 1 year is shown in Fig. 8(b); for MTS, the highest accuracy is attained for public holidays.
The prediction is performed over one year, in particular for winter, summer, weekends, weekdays, and public holidays. Such prediction methods are very useful in the transportation field: with these details, operators can schedule services effectively to satisfy passenger demand, which also improves the economic condition of the transportation industry. Passenger flow prediction has been in great demand in recent years. Accurate passenger flow prediction has major implications for real-time bus scheduling and is also essential for improving the reliability of bus services. The proposed method accurately predicts the passenger flow of both BTS and MTS over one year with the available dataset. The performance results show that, with this hybrid model, passenger flow prediction is more robust and accurate than with single prediction models. The hybrid model makes effective use of historical information to provide accurate predictions. Although it attains higher accuracy, the complexity of the prediction process is high; therefore, in future work we will introduce optimization-based learning approaches to reduce this complexity. In this approach, our main consideration is passenger flow prediction, but an increase in passenger count causes crowding, which reduces passenger comfort. Moreover, buses need to be scheduled for each route based on passenger availability; otherwise, improper scheduling may waste transport resources. With this in mind, deep learning-based bus scheduling for two different routes according to passenger availability will be implemented as future work.
References

[1] Using people flow technologies with public transport
[2] Effective passenger flow forecasting using STL and ESN based on two improvement strategies
[3] Predicting short-term bus passenger demand using a pattern hybrid approach
[4] Forecasting bus passenger flows by using a clustering-based support vector regression approach
[5] A multi-pattern deep fusion model for short-term bus passenger flow forecasting
[6] A new approach to the prediction of passenger flow in a transit system
[7] Multi-output bus travel time prediction with convolutional LSTM neural network
[8] Big data for social transportation
[9] Prediction of bus travel time using ANN: a case study in Delhi
[10] Modeling passenger flow distribution based on disaggregate model for urban rail transit
[11] Passenger flow forecast of Sanya airport based on ARIMA model
[12] Predicting passenger flow using different influence factors for Taipei MRT system
[13] A real-time passenger flow estimation and prediction method for urban bus transit systems
[14] DeepPF: a deep learning based architecture for metro passenger flow prediction
[15] A novel bus-dispatching model based on passenger flow and arrival time prediction
[16] The time series forecasting: from the aspect of network
[17] IIGPTS: IoT-based framework for intelligent green public transportation system
[18] Security analysis of vehicular ad hoc networks (VANETs): a comprehensive study
[19] In: IoT and Cloud Computing Advancements in Vehicular Ad-Hoc Networks
[20] Passenger flow prediction for new line using region dividing and fuzzy boundary processing
[21] Short-to-medium term passenger flow forecasting for metro stations using a hybrid model
[22] Mass rapid transit system passenger traffic forecast using a re-sample recurrent neural network
[23] A novel passenger flow prediction model using deep learning methods
[24] Deep auto-encoder based clustering
[25] Bearing fault diagnosis method based on stacked autoencoder and softmax regression