key: cord-1053967-3gtobe4s
authors: Abbasimehr, Hossein; Paki, Reza
title: Prediction of COVID-19 Confirmed Cases Combining Deep Learning Methods and Bayesian Optimization
date: 2020-11-28
journal: Chaos Solitons Fractals
DOI: 10.1016/j.chaos.2020.110511
sha: a338873ed9a763725d0544862554e0985666cc04
doc_id: 1053967
cord_uid: 3gtobe4s
The COVID-19 virus has confronted people around the world with numerous problems. Given the negative impacts of COVID-19 on all aspects of people's lives, especially health and the economy, accurately forecasting the number of infected cases can help governments make well-informed decisions on the interventions that must be taken. In this study, we propose three hybrid approaches for forecasting COVID-19 time series, each combining a deep learning model, namely multi-head attention, long short-term memory (LSTM), or a convolutional neural network (CNN), with the Bayesian optimization algorithm. All models are designed according to the multiple-output forecasting strategy, which allows several time points to be forecast at once. The Bayesian optimization method automatically selects the best hyperparameters for each model and enhances forecasting performance. Using the publicly available epidemiological data from Johns Hopkins University's Coronavirus Resource Center, we conducted our experiments and evaluated the proposed models against a benchmark model. The experimental results show the superiority of the deep learning models over the benchmark model for both short-term and long-horizon forecasting. In particular, the mean SMAPE of the best deep learning model is 0.25 for short-term forecasting (10 days ahead). For long-horizon forecasting, the best deep learning model obtains a mean SMAPE of 2.59.
The coronavirus disease 2019 (COVID-19) pandemic [1] has spread from Wuhan, China to other countries in the world.
It has high viral infectivity and a rapid rate of spread compared to prior infectious diseases, which makes it hard to control [2]. Since its emergence, COVID-19 has confronted people around the world with many problems. It has harmed people's health and disrupted the economy. As a result, many countries have implemented strong interventions to control the spread of the epidemic and to reduce the negative effects of the disease [3]. Although the interventions vary between countries, the commonly adopted ones are social distancing, border closures, school closures, lockdowns, travel bans, and bans on public events [4]. The effectiveness of interventions across 11 European countries was investigated by Flaxman et al. [4], who concluded that the adopted interventions were effective in reducing the transmission rate of the COVID-19 epidemic. To evaluate the success of controlling the epidemic, it is vital to accurately monitor and report the number of infected cases [2]. Making the confirmed-case data of countries public allows academics to model the data and gain useful knowledge about the trend of the disease. Johns Hopkins University's Coronavirus Resource Center [5] has collected and published data on COVID-19 confirmed cases, which scholars use to model the spread of the disease and perform data analysis. Given the negative impacts of COVID-19, accurately forecasting the number of infected cases is a vital task for revealing the trend of the disease and thereby helping governments take preventive measures [6]. Previous research on COVID-19 time series forecasting has adopted mathematical and computational intelligence models to forecast the number of confirmed cases. In [7], the adaptive neuro-fuzzy inference system (ANFIS) was employed to forecast the number of infected cases in China.
In [3], mathematical and computational models such as the logistic model, the Gompertz model, and artificial neural networks (ANNs) were applied to model the number of cases in Mexico. Castillo and Melin [8] proposed a new approach combining the fractal dimension and fuzzy logic to predict the number of confirmed COVID-19 cases in 10 countries. Also, in [9], a new ensemble approach based on ANNs and fuzzy aggregation was proposed; evaluated on the COVID-19 time series of Mexico and 12 of its states, it showed a significant improvement over a single ANN. In recent studies [2, 10-12], deep learning methods such as LSTM and bidirectional LSTM (BiLSTM) have been utilized for COVID-19 time series forecasting. The results indicated that LSTM and its variants perform well in predicting COVID-19 time series. In the literature review section, we give a comprehensive review of studies related to COVID-19 time series forecasting. Although LSTM has recently been applied to COVID-19 infection forecasting, the predictive power of other deep learning methods that are suitable for sequence processing problems has not been explored in the COVID-19 forecasting context. Therefore, in this paper, in addition to LSTM [13], we focus on other deep learning models, including multi-head attention [14] and CNNs [15], to forecast the number of COVID-19 cases. Furthermore, the performance of deep learning methods is mainly influenced by hyperparameter tuning [16]. Several hyperparameters must be specified when employing a deep learning model. Previous studies on COVID-19 forecasting using the LSTM method have not exploited an optimization method to identify the optimal hyperparameters. Most of those studies (e.g., [2, 10, 12]) have implemented models using hand-tuned hyperparameters. As another contribution, in this study, we utilize the Bayesian optimization method [17] to optimize the hyperparameters of the multi-head attention, LSTM, and CNN models.
Besides, the design of the proposed methods is based on the multiple-output approach, which allows forecasting the number of cases for several upcoming days. Overall, the main contributions of this study are as follows:
1- Adopting deep learning models to predict the number of daily COVID-19 infected cases.
2- Exploiting Bayesian optimization for optimal hyperparameter selection.
3- Adopting a multiple-output modeling approach: the models are designed to be multi-output to predict the next few days. The usual approach to multi-step-ahead prediction is iterated one-step-ahead forecasting, in which the forecast for the next n steps is obtained by n successive single-step-ahead forecasts. Multi-output forecasting is an effective choice for long-horizon forecasting [18].
The deep learning models are applied to the COVID-19 data of the top 10 countries with the highest number of infections. To evaluate the performance of the proposed models, we perform two sets of experiments. The first set of experiments explores the effectiveness of the proposed models in short-term forecasting and compares their performance with the results of the fuzzy fractal model presented in [8]. The results indicate that the deep models achieve better performance than the fuzzy fractal model across all countries. The second set of experiments is conducted to investigate the predictive power of the devised models over a wider forecasting window. The results can help governments in long-term decision making to control the pandemic.
The rest of this paper is organized as follows. In Section 2, we provide a comprehensive literature review on models and methods proposed for COVID-19 time series forecasting. Section 3 describes the structure of the proposed models. In Section 4, we describe the data, provide the detailed results of the proposed models, and compare their performance to the benchmark model. Section 5 concludes the paper and outlines future work.
In this section, we summarize the previous studies in the context of COVID-19 time series prediction. Since the publicly available COVID-19 data contains daily statistics of the confirmed cases, it can be treated as time series data, and time series forecasting techniques can be applied to it. Table 1 summarizes the studies on COVID-19 time series forecasting. The table highlights the modeling techniques, the countries, and the time period of the data utilized in each study. As Table 1 indicates, various types of methods, including mathematical, statistical, machine and deep learning, and fuzzy logic-based techniques, have been employed for COVID-19 time series forecasting. Among mathematical models, the Gompertz and logistic models have been used in several studies (e.g., [3, 19, 20]). Among statistical methods, the Auto-Regressive Integrated Moving Average (ARIMA) approach has been employed in some studies such as [2, 6, 11]. Besides, machine and deep learning techniques such as ANN and LSTM have exhibited improvements in COVID-19 time series forecasting studies (e.g., [2, 10, 12]). Some methods based on fuzzy logic have also been proposed in the literature (e.g., [7, 8]). As the literature review indicates, the exploitation of deep learning models has led to improvements in the prediction of COVID-19 cases [2, 10-12]. Since COVID-19 time series forecasting is a kind of sequence processing task, other deep learning models can be adopted to forecast the COVID-19 time series [12]. The remarkable characteristic of machine and deep learning methods is their ability to capture nonlinear patterns [21], which makes them suitable for modeling complex time series.
In recent years, in addition to the LSTM model, other types of deep learning models, such as methods based on attention mechanisms and convolutional neural networks, have demonstrated promising results in many application areas, including natural language processing (NLP) [22] and stock market price forecasting [21]. Investigating the literature on COVID-19 forecasting reveals that the attention mechanism and the convolutional neural network have not been employed for COVID-19 prediction. Therefore, this study aims to propose deep learning models based on these methods and to evaluate their effectiveness in forecasting COVID-19 infected cases.
In this study, we consider three different deep learning methods to predict the cumulative number of cases. The three proposed methods are the multi-head attention-based method (ATT_BO), the CNN-based method (CNN_BO), and the LSTM-based method (LSTM_BO). As illustrated in Figure 1, all proposed methods are combined with the Bayesian optimization algorithm, and the Bayesian optimizer [23] accomplishes the task of identifying the optimal hyperparameter values. A common alternative to Bayesian optimization is grid search, which is a time-consuming method. The reasons for choosing Bayesian optimization are: (1) the superiority of Bayesian optimization over grid search has been demonstrated in previous studies [24]; (2) unlike grid search, Bayesian optimization can efficiently find the optimal hyperparameters with fewer iterations [25]. In the following subsections, we describe the structure of the proposed models.
Recently, attention mechanisms have been employed successfully in sequence processing tasks, especially in natural language processing applications [21, 22]. The study of Vaswani et al. [26] demonstrated the effectiveness of the attention mechanism for processing sequence data.
In this study, we propose a multi-head attention-based model for COVID-19 forecasting using the multi-head attention mechanism developed in [26] (Figure 2). An attention function takes a query $Q$ and a set of key-value pairs $(K, V)$ and computes the output $O$. This procedure is often called Scaled Dot-Product Attention. Multi-head attention is a set of multiple heads that jointly learn different representations at every position in the sequence [14]. The proposed attention method (ATT_BO) has three main parts: the multi-head attention layer, the flatten layer, and the fully connected layer. After the input data is preprocessed and the instances are created, the multi-head attention layer computes a new representation of the input data that is more informative than the raw input. The output of the multi-head attention layer is reshaped by the flatten layer, and finally the outputs are produced by the fully connected layer. The strength of the proposed model is attributed to the multi-head attention layer, which can capture the most important input features and assign higher weights to them.
Deep learning methods such as RNNs are suitable for sequence processing because they consider the temporal behavior of a given time series [21]. However, the main shortcoming of RNNs is the vanishing/exploding gradient problem, which makes them difficult to train [27]. To overcome this problem, LSTM, a kind of gated RNN, is often employed [28]. The structure of an LSTM block is depicted in Figure 3. Each LSTM block consists of a memory cell along with three gates, the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$, which regulate the flow of information to its cell state $c_t$. Each of the three gates accomplishes a different operation [29]:
• The forget gate determines which information is discarded.
• The input gate decides which information is input to the cell state.
• The output gate regulates the outgoing information of the LSTM cell.
The architecture of the proposed LSTM-based method (LSTM_BO) is depicted in Figure 4. This method consists of three main parts: the LSTM layer, the flatten layer, and the fully connected layer. The input time series is first preprocessed and then fed into the LSTM layer, which learns a new representation of the data that takes the dependencies among data points into account. Afterward, the output of the LSTM layer is reshaped into a suitable format by a flatten layer and then fed into a fully connected layer. Finally, the fully connected layer produces multiple outputs.
CNNs are quite successful in machine vision problems [15]. In this study, we implement a CNN for COVID-19 time series forecasting. The convolutional layers in a CNN take input data and apply the convolution operation to it using convolution kernels. A convolution kernel is a small window that slides over the input data and performs convolution operations to extract new features [30]. The features derived by the convolution operation are usually more discriminative than the raw input data and therefore improve the forecasting. The architecture of the proposed CNN-based model (CNN_BO) is described in Figure 5. CNN_BO contains three main parts: the convolution layer, the flatten layer, and the fully connected layer. After the input data is preprocessed, features are extracted from the input time series by the convolution layer; the flatten layer then reshapes the data into a format that can be used by the fully connected layer, which generates the multiple outputs.
The data utilized in this study was obtained from the Humanitarian Data Exchange (HDX) [31]. We perform two sets of experiments using two different datasets, Dataset 1 and Dataset 2, which are described in Table 2.
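To make the sliding convolution kernel described for CNN_BO more concrete, here is a minimal pure-Python sketch of a one-dimensional convolution without padding. It is an illustration only, not the authors' Keras implementation: the function name `conv1d`, the toy series, and the fixed kernel weights are all hypothetical, and in a trained CNN the kernel weights are learned rather than chosen by hand. As in most deep learning libraries, the kernel is applied without flipping.

```python
def conv1d(series, kernel):
    """Slide `kernel` over `series` (no padding) and return one feature per position."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, series[i:i + k]))
        for i in range(len(series) - k + 1)
    ]

# Toy cumulative counts (hypothetical values, not taken from the datasets).
series = [1, 2, 4, 7, 11]
# A hand-picked kernel that computes day-to-day differences.
kernel = [-1, 1]

features = conv1d(series, kernel)
print(features)  # [1, 2, 3, 4]
```

With this particular kernel the extracted feature is the daily increase, which shows how a small window can turn a raw cumulative series into a more informative representation.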
The first set of experiments examines the usefulness of the proposed deep learning models in a shorter, 10-day window. To perform the first set of experiments, we utilize Dataset 1, which contains the data used in [8]. To compare the results of the proposed methods, we choose the fuzzy fractal method proposed by Castillo and Melin [8] as the benchmark. To evaluate the performance of the three proposed models in long-horizon forecasting, we use Dataset 2, which includes the updated data of COVID-19 cases until 3 August. Similar to Dataset 1, Dataset 2 contains data for the ten countries with the highest number of cases. In selecting the top ten countries of Dataset 2, we first aggregate the data of all cities for each country.
In this study, as the architectures of the three proposed models indicate, we design the models following the multi-output forecasting strategy, which allows forecasting multiple time steps rather than the single time step produced by the single-output strategy. The proposed models require the input to be instances (data objects) in input-output format, so the input time series must be converted into this format. Given the input size L (Lag), which is the length of the input window, and the output size O, which is the length of the output window, subsequences of length L + O are extracted from the series. The first L points of a subsequence are taken as the input, and the last O points are taken as the output values. For example, Figure 6 depicts how the instances are generated iteratively with input size L = 3 and output size O = 2. In this study, we combine the proposed methods with the Bayesian optimization algorithm to identify the optimal hyperparameter values. The proposed methods are implemented using the Keras library in Python [32].
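The construction of input-output instances described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `make_instances` and the toy series are hypothetical, and the window is assumed to advance one time step at a time.

```python
def make_instances(series, lag, out):
    """Cut `series` into (input, output) pairs of lengths `lag` and `out`."""
    inputs, outputs = [], []
    # Each subsequence covers lag + out consecutive points; the window moves one step at a time.
    for start in range(len(series) - lag - out + 1):
        window = series[start:start + lag + out]
        inputs.append(window[:lag])    # first L points: model input
        outputs.append(window[lag:])   # last O points: values to forecast
    return inputs, outputs

# Toy cumulative case counts (hypothetical values, not taken from the datasets).
series = [10, 12, 15, 19, 24, 30]
X, y = make_instances(series, lag=3, out=2)
print(X)  # [[10, 12, 15], [12, 15, 19]]
print(y)  # [[19, 24], [24, 30]]
```

With L = 3 and O = 2, a series of six points yields two instances, since each instance needs five consecutive points.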
To prevent the methods from overfitting and to improve their generalization to new data, we use early stopping [33]. To employ early stopping, we set the epoch limit to 500. To utilize the Bayesian optimizer, the ranges of the hyperparameters must be specified. One important hyperparameter, which significantly impacts time series forecasting accuracy, is the size of the input window (Lag). The range of Lag is set to (10, 11, 12, 13, 14, 15) for all proposed methods. Table 3 provides the ranges of the hyperparameters utilized throughout the experiments. Because the fully connected and output layers follow the main layer in every proposed method, we use identical hyperparameter ranges for these layers across all deep learning models. To limit the search space of the Bayesian optimization algorithm, for these layers, we include their activation functions in the hyperparameter selection process. For both layers, the "ReLU" and "Linear" activation functions [15] are utilized. Also, the range of the learning rate for all models is set to (0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05).
In this section, we give the results of the experiments conducted on the two datasets. In the analysis of the first set of experiments, we consider the results of the fuzzy fractal model proposed in [8]. The main reason for choosing the fuzzy fractal method as the benchmark is that it was comprehensively evaluated in the recent study by Castillo and Melin [8] using Dataset 1. In the second set of experiments, we explore the performance of our models on a wider forecasting window by adopting a multi-output forecasting strategy. To make the forecasts comparable with the results of the fuzzy fractal model [8], for Dataset 1, we consider the last 10 days as the test points. The results of the proposed models as well as the benchmark model on Dataset 1 are given in Tables 4, 5, and 6.
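The evaluation in this paper relies on SMAPE, MAPE, and RMSE, but their formulas are not given in this text. The sketch below therefore assumes the common textbook definitions, with SMAPE and MAPE expressed in percent; the variant actually used by the authors may differ, for example in the SMAPE denominator. The function names and the toy values are hypothetical.

```python
import math

def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed form: 2|p - a| / (|a| + |p|)); undefined if a pair is (0, 0)."""
    terms = [2 * abs(p - a) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return 100 * sum(terms) / len(terms)

def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if an actual value is 0."""
    terms = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(terms) / len(terms)

def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Toy values (hypothetical, not taken from the paper's tables).
actual = [100, 200]
predicted = [110, 190]
scores = (smape(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
print(scores)
```

SMAPE and MAPE are scale-free, which makes them comparable across countries with very different case counts, whereas RMSE grows with the magnitude of the series.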
As the results indicate, in terms of SMAPE ( Table 7. The mean SMAPEs of the three deep learning models are significantly lower than that of the fuzzy fractal model (mean SMAPE = 0.7052), as seen in Table 7. Furthermore, the ATT_BO and CNN_BO models outperform the fuzzy fractal model in terms of Rank SMAPE. To illustrate the performance of the methods, Figure 8 shows the forecasted values for the UK, where the difference between the deep learning model and the benchmark model is apparent. Figure 9 similarly illustrates the predicted values for Turkey, where the forecasts of both the deep learning model and the benchmark model are very close to the real values. Figure 10 plots the forecasted values for Spain, where the benchmark model predicts slightly better than the deep learning model. Figures 11, 12, and 13 show the predicted values for Mexico, Iran, and Italy, respectively, where the forecasts of the deep learning method are very close to the actual values. The plots for Germany and France are given in Figures 14 and 15, respectively, and indicate that the fuzzy fractal model predicted slightly better than the deep learning model.
After validating the effectiveness of the deep learning-based models on a shorter-window forecasting task, in this section, we perform the second set of experiments on Dataset 2 to examine the performance of the proposed models in longer-horizon forecasting. Longer-horizon forecasting reveals the trend of the pandemic in the long term and thus helps governments make appropriate decisions. To conduct experiments on Dataset 2, we adopt the hold-out method and split each COVID-19 time series into two parts: a train set (80%) and an out-of-sample test set (20%). The model building process is carried out on the train set. The test set is used for evaluating the obtained models throughout the experiments.
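The hold-out scheme used for Dataset 2 (80% train, 20% test, and 20% of the train set reserved for validation) can be sketched as follows. This is only an illustration: the function name `holdout_split` is hypothetical, and the sketch assumes that both the test set and the validation set are taken from the end of their respective ranges so that chronological order is preserved, which the text does not state explicitly.

```python
def holdout_split(series, test_frac=0.2, val_frac=0.2):
    """Chronological split into train, validation, and test parts (no shuffling)."""
    n_test = round(len(series) * test_frac)
    split = len(series) - n_test
    train_full, test = series[:split], series[split:]
    # Assumption: the validation set is the most recent part of the train set.
    n_val = round(len(train_full) * val_frac)
    cut = len(train_full) - n_val
    train, val = train_full[:cut], train_full[cut:]
    return train, val, test

# A toy series of 100 daily observations (hypothetical).
series = list(range(100))
train, val, test = holdout_split(series)
print(len(train), len(val), len(test))  # 64 16 20
```

Keeping the split chronological avoids evaluating a model on observations that precede its training data.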
Also, for each time series, 20% of the train set is used as the validation set in the hyperparameter identification process. As mentioned before, we adopt a multi-output forecasting strategy and set the output size to 7, so the proposed models forecast the number of cases for the next 7 days. The results of the experiments in terms of SMAPE are provided in Table 8. For Dataset 2, ATT_BO achieves the best SMAPE for the US, South Africa, and Chile. LSTM_BO performs strongly and obtains the best SMAPE for 6 countries: India, Russia, Mexico, Peru, Colombia, and Iran. CNN_BO performs worst among the three methods and obtains the best performance only for Brazil. Table 9 shows the results of the experiments with respect to the MAPE measure. Similar to the results given in Table 8, LSTM_BO, ATT_BO, and CNN_BO achieve the best performance in 6, 3, and 1 countries, respectively. The results of the models in terms of RMSE are given in Table 10. Regarding RMSE, LSTM_BO achieves the lowest value in 5 cases. The second-best performing method is ATT_BO, which obtains the lowest RMSE in 3 countries. CNN_BO obtains the best forecasts only for Brazil. To gain a better understanding of the overall performance of the proposed methods and their ranks across all countries, we calculate Mean SMAPE, Mean MAPE, Mean RMSE, Rank SMAPE, Rank MAPE, and Rank RMSE over the data of all 10 countries (as seen in Table 11). The results demonstrate that the LSTM_BO method outperforms ATT_BO and CNN_BO in terms of all overall performance measures and is a suitable choice for a longer-horizon forecasting task.
In this study, three methods that combine deep learning models, namely multi-head attention, CNN, and LSTM, with the Bayesian optimization algorithm were developed to forecast COVID-19 time series data. The main advantage of the proposed methods is their ability to process sequence data.
As another advantage, the design of the devised models is based on the multi-output forecasting strategy, which allows forecasting several upcoming days. The proposed methods were applied to the COVID-19 time series data in two settings: short-term forecasting and long-horizon forecasting. For short-term forecasting, we adopted the fuzzy fractal method as the benchmark model. The best deep learning model outperforms the fuzzy fractal model in 6 out of 10 countries. The significant result is that, in terms of all overall measures (Mean SMAPE, Rank SMAPE, Mean MAPE, Rank MAPE, Mean RMSE, and Rank RMSE), the three proposed methods perform significantly better than the benchmark model. Also, as long-horizon forecasting is beneficial for long-term decision making on COVID-19 interventions, we explored the ability of the proposed methods on a longer forecasting horizon. The results of the experiments indicated that, among the three proposed models, LSTM_BO achieves the best SMAPE in 6 countries. Besides, in terms of the performance measures computed across all countries, LSTM_BO outperformed ATT_BO and CNN_BO. Moreover, visualizing the actual and forecasted values demonstrated the effectiveness of the proposed methods in COVID-19 time series forecasting. As future work, we aim to extend the proposed methods by extracting informative features from the time series and incorporating them into the deep learning models.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
Comparative analysis and forecasting of COVID-19 cases in various European countries with ARIMA, NARNN and LSTM approaches
Modeling and prediction of COVID
Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe
CRITICAL TRENDS: TRACKING CRITICAL DATA
Exponentially Increasing Trend of Infected Patients with COVID-19 in Iran: A Comparison of Neural Network and ARIMA Forecasting Models
Optimization method for forecasting confirmed cases of COVID-19 in China
Forecasting of COVID-19 Time Series for Countries in the World based on a Hybrid Approach Combining the Fractal Dimension and Fuzzy Logic
Multiple Ensemble Neural Network Models with Fuzzy Response Aggregation for Predicting COVID-19 Time Series: The Case of Mexico
Prediction and analysis of COVID-19 positive cases using deep learning models: A descriptive case study of India
Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM
Time series prediction for the epidemic trends of COVID-19 using the improved LSTM deep learning method: Case studies in Russia, Peru and Iran
Understanding LSTM networks
Multi-Head Attention with Disagreement Regularization. Conference on Empirical Methods in Natural Language Processing
Deep learning
LSTM: A Search Space Odyssey
Practical Bayesian support vector regression for financial time series prediction and market condition change detection
Multiple-output modeling for multi-step-ahead time series forecasting
Prediction and analysis of Coronavirus Disease
Data analysis on Coronavirus spreading by macroscopic growth laws
Feature engineering for mid-price prediction with deep learning
Sentiment analysis of student feedback using multi-head attention fusion model of word and context embedding for LSTM
A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
Bayesian optimization of a hybrid system for robust ocean wave features prediction
A hybrid short-term load forecasting model based on variational mode decomposition and long short-term memory networks considering relevant factors with Bayesian optimization algorithm
Attention is all you need. Advances in Neural Information Processing Systems, 2017
Long short-term memory
An optimized model using LSTM network for demand forecasting
Generating sequences with recurrent neural networks
Deep convolutional neural networks for image classification: A comprehensive review
Novel Coronavirus (COVID-19) Cases Data
Early Stopping - But When?