key: cord-0114628-kl7c01o6
authors: Kim, Dong-Keon; Shyn, Sung Kuk; Kim, Donghee; Jang, Seungwoo; Kim, Kwangsu
title: A Daily Tourism Demand Prediction Framework Based on Multi-head Attention CNN: The Case of The Foreign Entrant in South Korea
date: 2021-12-01
sha: 80fe5759b79fece5be7e8b62ae5cd995b276d2f7
doc_id: 114628
cord_uid: kl7c01o6

Developing an accurate tourism forecasting model is essential for making sound policy decisions in tourism management. Early studies on tourism management focus on discovering external factors related to tourism demand. Recent studies apply deep learning to demand forecasting along with these external factors. They mainly build their frameworks on recurrent neural network models such as the RNN and LSTM. However, these models are not well suited to forecasting tourism demand, because tourism demand is strongly affected by changes in various external factors, and recurrent neural network models have limitations in handling such multivariate inputs. We propose a multi-head attention CNN model (MHAC) to address these limitations. The MHAC uses a 1D convolutional neural network to analyze temporal patterns and an attention mechanism to reflect correlations between input variables. This model makes it possible to extract spatiotemporal characteristics from time-series data of various variables. We apply our forecasting framework to predict inbound tourist changes in South Korea by considering external factors such as politics, disease, season, and the attraction of Korean culture. The results of extensive experiments show that our method outperforms other deep-learning-based prediction frameworks in South Korea tourism forecasting.

As exchanges between countries increase, the tourism industry in each country becomes more important. For developing the tourism industry, understanding tourism demand is essential for establishing tourism policies, business plans, and strategy revisions.
Therefore, tourism management needs to discover external factors related to tourism demand and design an accurate prediction model. Unlike earlier studies that mainly used regression models, recent tourism demand forecasting studies [1], [2] utilize sequential deep learning models such as the RNN (Recurrent Neural Network) [3] and LSTM (Long Short-Term Memory) [4] in their prediction frameworks. The latest studies of tourism demand forecasting [5], [6] also mainly design frameworks based on these sequential neural network models. These predictive models show better accuracy than regression models.

Recurrent networks (RNN, LSTM) are widely used in tourism demand forecasting because of their strong performance, but they have the following limitations. First, recurrent network models are structured to extract temporal features of a single variable, whereas tourism demand is affected by other external factors (e.g., the number of tourists entering a country is influenced by oil prices) [2]. It is therefore difficult for recurrent models to interpret variable-wise correlations, which limits forecasting performance. In addition, RNN and LSTM models have an autoregressive structure in which predicted values are fed back as inputs. This structure yields poor long-term prediction accuracy when the data has a nonlinear trend, whereas long-term forecasting of tourism demand plays a vital role from a practical point of view. Therefore, autoregressive prediction models are not suitable for practical forecasting.

We present a Multi-Head Attention Convolutional neural network (MHAC) model for forecasting tourism demand to address the problems mentioned above. The proposed model receives historical multivariate time-series information and outputs a sequence describing how the variable of interest will change in the future.
With a multivariate time series as input, separate CNN (Convolutional Neural Network) layers independently interpret the temporal patterns of each individual variable. The model also employs an attention module to discover correlations between multiple variables. Finally, the tourism demand prediction sequence is output at once from the attention context and the temporal features.

The proposed method is designed to predict the demand of foreign entrants in South Korea. To the best of our knowledge, there are no deep-learning-based prediction framework studies suited to South Korea's tourism data. We design a deep learning model that predicts tourism demand in South Korea using a multivariate time series. As in other tourism management studies [7], we consider several extrinsic factors related to inbound tourists to South Korea. The proposed MHAC model then predicts the tourist trend by taking the considered variables and the historical tourist trend data as inputs. In the prediction experiments, our forecasting framework achieves high accuracy in forecasting the demand of tourists visiting South Korea. In particular, it shows good forecasting performance even in extreme situations (e.g., the COVID-19 pandemic).

Our contributions in this paper are as follows:
• A novel time-series forecasting framework for tourism management is proposed. This framework receives several extrinsic variables as input and outputs future sequences of the tourism demand variable.
• We introduce a multi-head attention CNN model (MHAC) that receives multivariate time-series inputs, extracts temporal characteristics, and interprets correlations between variables.
• We utilize the proposed framework to predict the demand of foreign inbound tourists in South Korea. We explore various external factors related to South Korea's tourism demand and reflect them in the forecasting model.
In early studies of tourism demand forecasting, research mainly focused on predictions for specific regions and specific tourism industry sectors [8]-[13]. Most of these studies use regression models, which are commonly used for time-series prediction, or classical machine learning techniques such as the Support Vector Machine (SVM) [14]-[16]. These studies mainly focus on discovering data from the specific tourism industries of each region or country rather than on designing a dedicated forecasting model. The regression models used as prediction models in previous studies perform well in predicting tourism demand in particular regions. However, these studies have some limitations. First, regression models such as ARMA and ARIMA, and classical machine learning models such as SVM, commonly show poor forecasting performance on non-stationary time-series data. In addition, these studies construct a prediction framework using only a single variable, such as past entrant data, so the various external factors affecting the tourism industry are not considered.

Beyond prediction frameworks built on a single time series, several recent studies have proposed multivariate prediction models using deep learning. Deep learning techniques from the recurrent neural network family, such as LSTM [4], have received attention as tourism demand prediction models. Zhang et al. use historical hotel guest data and Baidu Index data to predict the trend of hotel guests in Hunan, China, and design an LSTM-based predictive model that can reflect these multiple variables [1]. Kulshrestha et al. design a prediction framework based on the Bidirectional LSTM model to predict monthly visitors to Macau [5]. In a recent study, an LSTM model with an added attention module [17] is devised to consider the correlation between the various input variables that influence tourism demand [2].
As such, recent studies attempt to analyze various time-series patterns by designing more advanced recurrent models. They consider not only historical data of the variable to be predicted, but also various data related to the target variable (e.g., Google Trends, Baidu Index, etc.). However, the frameworks of these studies predict separately by constructing a recurrent model for each variable. Since this structure cannot interpret correlations between variables, recurrent models are restricted in precisely forecasting tourism demand that is affected by multiple factors. To handle this problem, a model with a structure other than the recurrent one is required.

Before explaining the forecasting framework, we introduce the data used to forecast South Korean tourism demand. We introduce novel variables that have not been considered in the field of Korean tourism management. For more practical tourism demand forecasting, we forecast concrete tourism demand trends over a certain period.

The proposed prediction framework is a time-series prediction model that receives multivariate time-series data as input and outputs a sequence of a single variable of interest. The multivariate time-series input $X_t$ of the prediction framework is:

$$X_t = \{x_{t-m+1}, x_{t-m+2}, \cdots, x_t\}$$

where $x_i \in \mathbb{R}^n$ indicates the $n$ multivariate inputs at time point $i$. Note that $t$ refers to a specific time point, and $m$ is the input window size. The multivariate input data has $n$ variables: the past foreign entrant data, which is the target variable, and $n - 1$ related external variables. The prediction result $\hat{Y}_t$ is as follows:

$$\hat{Y}_t = \{\hat{y}_{t+1}, \hat{y}_{t+2}, \cdots, \hat{y}_{t+k}\}$$

Here, $\hat{y}_i$ is the single target variable at time point $i$, which indicates the predicted number of foreigners entering South Korea, and $k$ refers to the output window size. The ground-truth value is $Y_t$, which has the same shape as $\hat{Y}_t$.
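The input and ground-truth windows defined above can be illustrated with a short sketch. It is not the authors' code: the function name `make_segments`, the `target` index, and the toy series are hypothetical, and the window is assumed to slide by one time step. For convenience each input window is stored with time steps as rows (m × n) rather than n × m.

```python
from typing import List, Sequence, Tuple

def make_segments(
    series: Sequence[Sequence[float]],  # one row of n variable values per time step
    m: int,                             # input window size
    k: int,                             # output window size
    target: int = 0,                    # hypothetical: column index of the target variable
) -> List[Tuple[List[List[float]], List[float]]]:
    """Cut (X_t, Y_t) pairs from a multivariate series, sliding by one time step."""
    segments = []
    for end in range(m, len(series) - k + 1):
        x = [list(row) for row in series[end - m:end]]    # m past steps, all n variables
        y = [row[target] for row in series[end:end + k]]  # next k steps, target only
        segments.append((x, y))
    return segments

# Toy series: 10 time steps, n = 2 variables (target variable first).
toy = [[float(i), float(i % 2)] for i in range(10)]
segments = make_segments(toy, m=3, k=2)
first_x, first_y = segments[0]
```

With a series of length L, this produces L − m − k + 1 pairs, each holding m input steps followed by k target values.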
In summary, $X_t$ is an $n \times m$ multivariate time-series input matrix, and $Y_t$ is a $1 \times k$ univariate time-series output vector.

This paper uses the foreign entrant data from January 1, 2010, to September 30, 2020, provided by the Korea Tourism Organization. The provided visitor data is aggregated over all provinces in South Korea. The data shows the daily number of foreigners entering South Korea from 21 countries, including China, Japan, and the United States. We define this foreign entrant variable as the main variable, which is used both as an input and as the prediction target. Figure 1 shows an overview of the given foreign entrant data.

We analyze the characteristics of the foreign entrant data to find related external factors. Time-series decomposition is performed on the foreign entrant data after summing the number of entrants over the 15 days before and after each day (i.e., a 30-day moving sum). The results of the time-series decomposition of the preprocessed foreign entrant data are shown in Figure 2. The overall trend of entrants increased until 2017. From the first half of 2017, the trend declined and then increased again. This decline is due to the decrease in Chinese tourists, who account for the largest proportion of tourists visiting South Korea. The reason for the decline in Chinese tourists is China's restrictive policy against Korean culture, which is related to the deterioration of Sino-South Korea relations. Note that the number of entrants plummeted from February 2020, when travel restrictions due to COVID-19 were imposed. Similarly, the number of tourists declined sharply during the MERS epidemic (June 2015 to August 2015), which was a short-lived epidemic in South Korea. The seasonality graph shows that more entrants usually arrive in fall than in spring. The residual graph shows that the residual is very large, unlike in other conventional time-series data.
In conclusion, the data analysis shows that the foreign entrant data has irregular, inconsistent patterns and is influenced by certain external factors. For precise prediction, extrinsic factors correlated with tourism should be considered along with the main variable. Rather than using external variables such as economic metrics or the oil price [18], [19], recent studies utilize external variables directly related to tourism demand (e.g., Google Trends [20], [21], seasonal components [22], climate data [23], etc.). In our method, external factors influencing South Korea's tourism demand are used as input variables. We select the external factors based on the results of the data analysis briefly described above. We consider Politics, Diseases, Seasons, and Attraction as representative external variables that directly affect tourism in South Korea. A summary of the input variables considered is presented in Table I. Each variable is described below.

Politics Variable. According to the provided foreign entrant data, Chinese tourists account for the largest share of visitors to South Korea (about 35%). Therefore, the demand for tourism in South Korea is greatly affected by diplomatic relations between South Korea and China. For instance, the number of foreign arrivals has changed significantly because of the deterioration of Sino-South Korea relations since 2017, as mentioned above. We use the variable "Hanhanlyeong," a sanctions policy against Korean culture in China, as an external factor representing the diplomatic and political situation of South Korea [24]. This restriction policy has been in effect since March 2017. The Hanhanlyeong variable is a dummy variable that takes the values 0 and 1. The time-series value of this variable is set to 1 from March 1, 2017, to September 30, 2020 (the dataset endpoint), when this sanctions policy against South Korea is in effect, and to 0 for other periods.
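The construction of such a daily dummy variable can be sketched as follows. This is only an illustration under stated assumptions: the helper `period_dummy` is hypothetical, and period boundaries are assumed to be inclusive. The dates are the ones given in the text.

```python
from datetime import date, timedelta
from typing import List, Tuple

def period_dummy(start: date, end: date, active: List[Tuple[date, date]]) -> List[int]:
    """Daily 0/1 series over [start, end]; 1 when the day lies in any active period (inclusive)."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [int(any(a <= d <= b for a, b in active)) for d in days]

# Dataset period and policy period as stated in the text.
dataset_start, dataset_end = date(2010, 1, 1), date(2020, 9, 30)
hanhanlyeong = period_dummy(dataset_start, dataset_end, [(date(2017, 3, 1), dataset_end)])
```

Because `active` accepts several periods, the same hypothetical helper could also encode a variable that is switched on during more than one interval.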
Diseases Variable. During an epidemic, travel and exchanges are restricted, so the number of people entering or leaving South Korea drops sharply. The number of foreign entrants decreased sharply from June to August 2015, when the MERS epidemic was prevalent in South Korea, and from February 2020, when the COVID-19 pandemic spread worldwide. Based on these observations, we create a dummy variable that reflects the duration of the epidemic outbreaks. The variable is set to 1 during the MERS outbreak period (June 1, 2015, to August 31, 2015) and the COVID-19 epidemic (February 1, 2020, to September 30, 2020), and to 0 for other periods.

Seasonal Variable. As the results of the time-series decomposition in Figure 2 show, the entrant data has seasonality. Thus, we use seasonal dummy variables to exploit the seasonality of the data in the predictive model. The seasonal variable is a dummy time series with a total of 4 channels reflecting spring, summer, autumn, and winter. Each channel is set to 1 during its season and to 0 for other periods.

Attraction Variable. In several existing studies, Google Trends and Baidu Index data related to each region are used in the tourism demand forecasting framework [21], [25], [26]. Searching a portal site for a term related to tourism can be interpreted as an indicator of interest in visiting the region or country. We also consider Google Trends, which shows the trend of search volume for keywords that are highly related to Korean tourism. We choose 'Seoul Hotel', 'Korea Tour', 'Incheon Airport', and 'Myeongdong' as keywords related to Korean tourism, which represent accommodation, tourism, aviation, and attractions, respectively [27]. Google Trends represents the relative search volume for a specific search term over a certain period of time as a real value between 0 and 100.
Our work uses Google Trends data covering the entire period of the foreign entrant data (January 1, 2010, to September 30, 2020).

We aim to design a novel neural network model that is suitable for forecasting tourism demand. In this paper, a multi-head attention model with convolutional neural network layers (MHAC) is proposed. The overall structure of the predictive model MHAC is shown in Figure 3.

We introduce a multi-head neural network structure to extract features variable-wise. As described in Section III, the input variables have different data structures (numeric type, dummy type), and the temporal features of each variable are very diverse. Feeding all variables into a single shared neural network is unsuitable for handling multiple variables with different time-series characteristics. The proposed multi-head structure extracts the temporal features of the individual variables with parallel neural network layers. This structure has the advantage that the hyper-parameters of each head layer can be tuned independently for each variable. Since there are a total of 5 types of variables, we design a forecasting model with a 5-head structure.

Several studies [28]-[31] have proposed CNN-based models to process sequential data in signal processing, time-series classification, speech recognition, etc. These previous studies demonstrate the advantage of the one-dimensional CNN in extracting temporal features from data with irregular and diverse patterns, such as the provided foreign entrant data. We add a multi-head temporal 1D-CNN layer to interpret the pattern of the time-series data variable-wise. For the $t$-th input sequence $X^{(i)}_t$ of a single variable $i$, the latent feature $Z^{(i)}_t$ is extracted through Equation (1):

$$Z^{(i)}_t = \sigma\left(f_i\left(X^{(i)}_t\right)\right) \qquad (1)$$

Here, $f_i(\cdot)$ indicates the $i$-th head of the temporal-CNN layer, and $\sigma(\cdot)$ refers to the ReLU and MaxPool1D layers. The detailed hyper-parameters of the multi-head CNN layer are described in Section V-D.
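The per-variable feature extraction of Equation (1) can be illustrated with a small pure-Python sketch. It is an illustration under assumptions rather than the authors' implementation: the functions `causal_conv1d_relu` and `multi_head`, the kernel values, and the toy inputs are hypothetical, the convolution is assumed to be causal (left-padded) with stride 1, and the pooling step is left out for brevity.

```python
from typing import List

def causal_conv1d_relu(x: List[float], kernels: List[List[float]],
                       biases: List[float]) -> List[List[float]]:
    """Apply a causal 1D convolution (stride 1) followed by ReLU to one variable.

    x: length-m sequence of a single variable.
    kernels: one weight list per output channel (all of the same width).
    Returns an m x C list, so the sequence length is preserved.
    """
    width = len(kernels[0])
    padded = [0.0] * (width - 1) + list(x)  # left padding: step j sees only steps <= j
    out = []
    for j in range(len(x)):
        window = padded[j:j + width]
        row = []
        for w, b in zip(kernels, biases):
            value = sum(wi * xi for wi, xi in zip(w, window)) + b
            row.append(max(0.0, value))     # ReLU
        out.append(row)
    return out

def multi_head(inputs: List[List[float]], heads: List[dict]) -> List[List[List[float]]]:
    """Run head i on variable i only; the heads share no weights."""
    return [causal_conv1d_relu(x, h["kernels"], h["biases"]) for x, h in zip(inputs, heads)]

# Hypothetical toy setup: 2 variables, window m = 4, 1 channel each, different kernel widths.
inputs = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
heads = [
    {"kernels": [[0.0, 0.0, 1.0]], "biases": [0.0]},  # width 3: passes the current step through
    {"kernels": [[1.0, -1.0]], "biases": [0.0]},      # width 2: previous step minus current step
]
features = multi_head(inputs, heads)
```

Keeping one kernel set per variable is what allows the kernel width of each head to be chosen independently.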
We add an attention module to the prediction model to reflect the correlation between each extrinsic factor and the entrant data. The attention module receives a Query, Key, and Value and outputs a context vector. To apply attention to the extracted features $Z_t$, the Query, Key, and Value are calculated by Equation (2). Note that $W$ indicates the weight matrix in Equation (2), and $Q_t$, $K_t$, $V_t$ are the query, key, and value obtained from the latent feature $Z_t$, respectively. The attention Score is then obtained from the Query and Key by Equation (3), where $W'$ refers to the weight matrix in Equation (3) and $b$ is the bias. Finally, Equation (4) shows how to derive the final context vector $C_t$ from the obtained $\mathrm{Score}(Q_t, K_t)$. The context vector $C_t$ obtained through the attention layer is concatenated with the latent feature $Z_t$ and fed into the last fully-connected layer.

The proposed forecasting framework outputs the prediction sequence. The experiments of Gehring et al. [32] demonstrate that weight normalization achieves better performance than conventional batch normalization in a sequence-output framework with a CNN structure. Based on this idea, we add a weight normalization layer in front of the fully-connected layer in the forecasting model. The normalized weight vector $w$ of the fully-connected layer is given by Equation (5):

$$w = \frac{g}{\lVert v \rVert} v \qquad (5)$$

where $g$ is a scalar parameter, $v$ indicates a $k$-dimensional vector, and $\lVert v \rVert$ is the Euclidean norm of $v$ [33]. By adding a weight normalization layer, the proposed model becomes more robust to the values of learning hyper-parameters such as the learning rate. Weight normalization also reduces the likelihood of converging to sharp minima, thereby improving generalization performance [34].

In our experiments, we use the daily foreign entrant data and the daily data of the extrinsic factors described in Section III. More specifically, we use five input variables: the foreign entrant data, the politics (Hanhanlyeong) dummy variable, the disease dummy variable, the seasonal dummy variable, and the attraction variable (Google Trends data for the keyword 'Seoul hotel'). The daily data covers 3926 days from January 1, 2010, to September 30, 2020. Within this period, we use data up to December 31, 2018, as training data, and data from January 1, 2019, as test data.

As explained in Section III-A, our framework predicts the trend of entrants over the next $k$ days from the past $m$ data points of $n$ variables. We set $k = 30$, $m = 30$, and $n = 5$. A window of size $m + k$ is created, and data segments are generated by shifting it by 1 time unit over the entire dataset period using a sliding window method. Therefore, $L_t - m - k + 1$ training data segments are generated for a training dataset of total length $L_t$. The first $m$ data points are the input data, and the last $k$ data points are the ground truth. Test data segments are created from the test dataset in the same way. Figure 5 shows the process of creating data segments using the sliding window method.

Figure 5: For a training dataset with a total length of $L_t$, a data segment is generated by sliding the window by 1 time unit. The blue window is an input window with a length of $m$, and the red window is a ground-truth window with a length of $k$.

The total length of the provided daily foreign entrant data is about ten years and nine months. Time-series data of this length is not sufficient to train the model. Moreover, it is difficult to train with the provided data because the patterns are very diverse relative to the data length. We therefore augment the time-series data for more stable training and higher prediction accuracy. Unlike image data, which is augmented by methods such as cropping or rotation, time-series data is mainly augmented with specific methods such as window warping [35], flipping, Fourier transform [36], and down-sampling.
Among them, a simple method of adding Gaussian noise is often used [37]. This data augmentation technique improves performance in time-series prediction models such as DeepAR [38]. In this paper, we apply a new technique to augment time-series data to fit the proposed MHAC model. The training data is augmented by generating data obtained by multiplying the existing training data by statistical noise. The method of augmenting the training data is as follows:
1) Prepare the variance vector $V = [v_1, v_2, \cdots, v_n]$ for the $n$ input variables, where $v_i$ is the variance of variable $i$. Note that $i$ is the index of the enumerated $n$ input variables.
2) Prepare one input data matrix $X_t$ and the corresponding ground-truth value $Y_t$ at time point $t$, as described in Section III-A.
3) Generate the noise for each variable $i$, which follows a lognormal distribution.
4) Create the augmented input data and the corresponding augmented ground truth by multiplying $X_t$ and $Y_t$ by the noise.
5) Repeat for every data segment.

The training data is augmented using the method above. An example of augmented data is shown in Figure 6. In this paper, the training data is augmented 9 times. The performance of the model is verified only with test data that has not taken part in data augmentation.

The learning rate is 0.001, and the optimizer is Adam [39]. The total number of training epochs is 50, and 20% of the training data is used as validation data. The number of output channels of the CNN layer is set to 4 for every input variable, and the stride is set to 1. The sizes of the 1D kernels in the CNN layers are 5, 3, 3, 3, and 5 for the Foreign entrant, Politics, Disease, Season, and Attraction variables, respectively. In the CNN layers, a causal convolution is performed in consideration of the temporal features of the input variables.
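The multiplicative-noise augmentation procedure described earlier in this section can be sketched as follows. The parameters of the lognormal distribution are not recoverable from the text, so this sketch makes loud assumptions: the underlying normal distribution has mean 0, each variable has its own hypothetical scale in `sigmas`, noise is drawn independently for every entry, and dummy variables are given a scale of 0 so that they stay unchanged. The function name `augment_segment` is also hypothetical.

```python
import random
from typing import List, Tuple

def augment_segment(
    x: List[List[float]],   # m x n input window (rows are time steps)
    y: List[float],         # k-step ground truth of the target variable
    sigmas: List[float],    # hypothetical per-variable noise scale (e.g. derived from V)
    rng: random.Random,     # explicit generator so that results are reproducible
    target: int = 0,        # hypothetical: column index of the target variable
) -> Tuple[List[List[float]], List[float]]:
    """Return a noisy copy of (x, y) using multiplicative lognormal noise."""
    x_aug = [[v * rng.lognormvariate(0.0, s) for v, s in zip(row, sigmas)] for row in x]
    y_aug = [v * rng.lognormvariate(0.0, sigmas[target]) for v in y]
    return x_aug, y_aug

rng = random.Random(0)
x = [[100.0, 1.0], [120.0, 1.0], [90.0, 0.0]]  # toy window: entrant counts and a dummy variable
y = [110.0, 95.0]
# Nine augmented copies of one segment; a scale of 0.0 leaves the dummy column untouched.
copies = [augment_segment(x, y, sigmas=[0.05, 0.0], rng=rng) for _ in range(9)]
```

Lognormal noise is strictly positive, so multiplying by it never changes the sign of a value, which suits count data such as daily entrants.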
Same-padding is applied so that the length of the latent features does not change from the input shape (i.e., $Z^{(i)}_t \in \mathbb{R}^{m \times 4}$, where 4 indicates the number of output channels of the CNN layer). The activation function of the CNN layer is ReLU. Lastly, the final fully connected layer has a 25% dropout rate. The batch size is 4, smaller than in other conventional deep learning experiments, because the training procedure is unstable with larger batch sizes. In addition, the loss function is the Mean Squared Error (MSE).

The final output is sequential data of length $k$. Since the data segments are created with a sliding window of 1 time unit, $k$ output results overlap for a single time point. We first use the evaluation method of existing tourism demand forecasting studies to inspect the concrete forecast results, so we plot the resulting graph for a single prediction time point only. In the graphs presented later, the prediction time point is one day ahead. Since our forecasting framework outputs sequential forecast results, a more detailed evaluation is needed in addition to plotting the forecast results for a single time point. Therefore, we evaluate the performance of the prediction model using forecasting performance metrics [40]. We use the Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and empirical correlation coefficient (CORR) as evaluation indicators. Lastly, to obtain reliable results, we conduct each experiment 5 times and average the results.

We conduct prediction experiments with the proposed model and other deep learning models. Bidirectional LSTM [41], CNN-LSTM [42], and 1D-CNN [43] are selected as comparison models. All of these comparison models are deep learning models that have recently been used frequently in the time-series prediction field. Each prediction model has the same input size and output size.
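The three evaluation indicators named above can be written out in a few lines. This is a minimal sketch under assumptions: MAPE is reported in percent and is undefined when a true value is zero, and CORR is taken to be the Pearson correlation between the true and predicted sequences of the single target variable. The toy values are hypothetical.

```python
import math
from typing import Sequence

def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def corr(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between the true and predicted values."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical toy values, only to exercise the functions.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
scores = (mape(y_true, y_pred), rmse(y_true, y_pred), corr(y_true, y_pred))
```

Lower is better for MAPE and RMSE, while a CORR closer to 1 indicates that the prediction follows the overall trend of the actual series.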
The prediction performance results for each model are presented in Figure 7 and Table II. The evaluation indicators for the entire test period are presented in Table II, which shows that the MHAC model has better predictive performance than the other deep learning models. In particular, our prediction model is superior to the other prediction models in the CORR result, which indicates that the MHAC model infers the overall trend better than the comparison models. Figure 7 shows each prediction model's foreign entrant prediction results over the test period, from January 1, 2019, to September 30, 2020. The blue line is the actual data, and the red line is the predicted value. According to the graphs, all models are somewhat inaccurate in predicting very detailed patterns; however, the proposed MHAC model is superior to the other models in following complex patterns and trends. In particular, from February 1, 2020, to September 30, 2020, during the COVID-19 outbreak, our predictive model infers the number of foreign entrants better than the comparison models.

We design in-depth experiments on the proposed methods. We conduct experiments on the effect of the external factors used, the effect of data augmentation, and the batch size.

Effect of the External Factors. We perform an experiment to compare the prediction results with and without each external factor. In each experiment, a single external factor is removed. Fine-tuning is performed separately, with the number of heads of the MHAC model set to 4. The results of the prediction experiment are presented in Table III. We observe that the prediction results when any variable is removed are worse than the original ones. In particular, it appears that the performances are much worse when Disease

Prediction experiments are conducted according to the amount of augmented data.
Experiments with 13-time augmentation, 9-time augmentation, 5-time augmentation, 1-time augmentation, and no augmentation are performed. Note that "9-time augmented" means that the entire data was augmented 9 times by the proposed method, which is the setting of the original experiment. The forecasting results for each experiment are shown in Table IV. As the experimental results show, the performance improves as the amount of data increases. However, there is no significant difference in performance between the 9-time and 13-time augmentation experiments. We assume that when the augmented data becomes too large, data redundancy occurs and the performance no longer improves significantly.

Study about Batch Size. We design experiments with different batch sizes. Experiments are performed for batch sizes of 1, 2, 4, 8, 16, 32, and 64. The performance results of tuning the batch size are presented in Table V. As described in Section V-D, the prediction accuracy drops sharply as the batch size increases. On the other hand, there is no significant improvement in prediction performance when the batch size is extremely small (1, 2). We see that the performance of our framework is sensitive to the batch size.

We additionally conduct ablation experiments to determine how much the weight normalization, attention module, and multi-head structure each contribute to performance. As shown in Table VI, the predictive model's performance worsens when weight normalization is removed. In particular, the RMSE differs greatly, because weight normalization allows the forecasting model to interpret detailed patterns better, while the MHAC model without weight normalization shows a large error due to unstable learning.

Attention Module. We compare the original MHAC model with the MHAC model from which only the attention module is removed. All other conditions remain the same except for the attention module between the convolutional and fully connected layers.
The prediction results for the ablation experiment of the attention module are presented in Table VI. The results are better when the attention module is included in the predictive model than when it is not. Extracting the correlation between the variables by applying the attention module to the features gathered by the convolutional layers contributes to a good forecasting result.

Multi-Head Structure. This experiment replaces the multi-head CNN structure of the MHAC model with a single CNN layer. Input variables of the same input size are fed to the model channel-wise. The attention module and the fully connected layer of the prediction model are the same. The experimental results for the multi-head structure are presented in Table VI. As shown in Table VI, the multi-head CNN structure and the single CNN structure differ greatly in their prediction results. In addition, the prediction performance differs greatly in the CORR metric, which shows that the multi-head CNN structure is more advantageous than the single CNN structure in inferring the long-term trend.

We propose a multi-head attention CNN model to predict the foreign entrants of South Korea with high accuracy. Rather than using only past foreign entrant data, we additionally utilize various external factors related to Korean tourism as input variables of the prediction model. A comparative experiment with other deep learning prediction models shows that the proposed prediction model has higher accuracy in predicting South Korea's tourism demand. As future work, we will explore whether the proposed model can be used for forecasting tourism demand in other countries and in other multivariate time-series forecasting fields.
[1] Forecasting hotel accommodation demand based on LSTM model incorporating internet search index
[2] Tourism demand forecasting: A deep learning approach
[3] Learning representations by back-propagating errors
[4] Long short-term memory
[5] Bayesian BiLSTM approach for tourism demand forecasting
[6] A method based on GA-CNN-LSTM for daily tourist flow prediction at scenic spots
[7] A review of research on tourism demand forecasting: Launching the Annals of Tourism Research curated collection on tourism demand forecasting
[8] Persistence in the short- and long-term tourist arrivals to Australia
[9] Comparing the Box-Jenkins approach with the exponentially smoothed forecasting model application to Hawaii tourists
[10] Using a neural network to forecast visitor behavior
[11] A note on forecasting international tourism demand in Spain
[12] Forecasting tourism: A sine wave time series regression approach
[13] Forecasting tourist arrivals in Barbados
[14] Support vector machines
[15] Support vector regression with genetic algorithms in forecasting tourism demand
[16] A novel approach to model selection in tourism demand modeling
[17] Attention is all you need
[18] An econometric study of tourist arrivals in Aruba and its implications
[19] The impact of the financial and economic crisis on European tourism
[20] Google Trends and tourists' arrivals: Emerging biases and proposed corrections
[21] Forecasting tourism demand with Google Trends for a major European city destination
[22] Big data analytics for forecasting tourism destination arrivals with the applied vector autoregression model
[23] Relative climate index and its effect on seasonal tourism demand
[24] China's South Korea travel ban: What you need to know
[25] Can Google data improve the forecasting performance of tourist arrivals? Mixed-data sampling approach
[26] Predicting the present with Google Trends
[27] Seoul Tourism Global Search Trend Report. Seoul Tourism Organization
[28] WaveNet: A generative model for raw audio
[29] Divide and conquer-based 1D CNN human activity recognition using test data sharpening
[30] Convolutional neural networks for time series classification
[31] An empirical evaluation of generic convolutional and recurrent networks for sequence modeling
[32] Convolutional sequence to sequence learning
[33] Weight normalization: A simple reparameterization to accelerate training of deep neural networks
[34] Micro-batch training with batch-channel normalization and weight standardization
[35] Multi-scale convolutional neural networks for time series classification
[36] Feature representation and data augmentation for human activity classification based on wearable IMU sensor data using a deep LSTM neural network
[37] Time series data augmentation for deep learning: A survey
[38] DeepAR: Probabilistic forecasting with autoregressive recurrent networks
[39] Adam: A method for stochastic optimization
[40] Temporal pattern attention for multivariate time series forecasting
[41] Parallel architecture of convolutional bi-directional LSTM neural networks for network-wide metro ridership prediction
[42] A CNN-LSTM model for gold price time-series forecasting
[43] Deep learning and time series-to-image encoding for financial forecasting