key: cord-0879081-tnqkl8pi
authors: Abbasimehr, Hossein; Paki, Reza; Bahrini, Aram
title: A novel approach based on combining deep learning models with statistical methods for COVID-19 time series forecasting
date: 2021-10-10
journal: Neural Comput Appl
DOI: 10.1007/s00521-021-06548-9
sha: 1ae251727997cc10f8534119d9389ac7badbd5e
doc_id: 879081
cord_uid: tnqkl8pi

The COVID-19 pandemic has disrupted the economy and businesses and impacted all facets of people's lives. It is critical to forecast the number of infected cases in order to make accurate decisions on the measures needed to control the outbreak. While deep learning models have proved to be effective in this context, time series augmentation can improve their performance. In this paper, we use time series augmentation techniques to create new time series that take into account the characteristics of the original series, which we then use to generate enough samples to fit deep learning models properly. The proposed method is applied in the context of COVID-19 time series forecasting using three deep learning techniques: (1) the long short-term memory, (2) gated recurrent units, and (3) convolutional neural network. In terms of symmetric mean absolute percentage error and root mean square error, the proposed method significantly improves the performance of the long short-term memory and convolutional neural network models, while the improvement for the gated recurrent units is moderate. Finally, we present a summary of the top augmentation model as well as a visual representation of the actual and forecasted data for each country.

Temporary interventions such as social distancing, self-isolation, quarantine, and shutting down nonessential activities have been the main strategies of governments to prevent the virus from spreading. It is essential to forecast the number of infected cases using different data types to notify public health decision-makers, estimate the likely impact of the COVID-19 pandemic, and plan accordingly [1-4]. Deep learning models have demonstrated successful performance in language and image processing tasks [6-8]. They have also exhibited state-of-the-art performance in forecasting complex time series data [5, 9-12]. The main advantage of deep learning models is their ability to learn representations from raw input data. Among the most popular deep learning algorithms, the long short-term memory (LSTM) and bidirectional LSTM (Bi-LSTM) [13] have been used in [14-17], with significant results in COVID-19 forecasting. LSTM is a special type of recurrent neural network (RNN) developed to learn temporal information from sequential data [18]. Although deep learning algorithms can reach acceptable performance in time series forecasting, particularly in COVID-19 forecasting applications, their forecasting capability depends primarily on the amount of data available to fit their parameters appropriately [12, 19]. Another challenge with deep learning for time series forecasting is that, even when adequate data samples are available, data from the distant past are typically less useful for forecasting [12]. In other words, recent observations of an individual series are more valuable for prediction, which may be due to shifts in the patterns that formerly occurred in the series.
To overcome the aforementioned issue and increase the performance of deep learning models in time series forecasting, we propose exploiting time series augmentation techniques [19-21] to generate new series with temporal dependencies similar to those of the original series. We then extract new samples from the augmented time series to enhance model training. Three deep learning models based on the LSTM, gated recurrent units (GRU) [22], and convolutional neural network (CNN) [23] are used to assess whether the proposed approach is useful. A multi-step-ahead forecasting strategy [24] is used to develop the models, allowing them to predict the number of cases for the next few days; it is a preferable alternative to single-step-ahead forecasting for long-horizon forecasting [25]. The proposed models are applied to COVID-19 data from the top 10 countries with the most reported confirmed cases from January 20, 2020, until March 28, 2021. We show that the proposed method significantly improves the performance of the LSTM-based and CNN-based models but yields only a moderate improvement for the GRU. To evaluate the effectiveness of the proposed model, we visualize the forecasting results and provide statistical characteristics of the data to enable governments to make long-term decisions on how to deal with the pandemic. The remainder of this paper is organized as follows. Section 2 provides a brief review of COVID-19 time series forecasting and describes the employed deep learning methods. In Sect. 3, we present the proposed approach and the architectures of the designed models. Section 4 assesses the usefulness of the proposed method via the experimental study. Discussions are provided in Sect. 5, and finally, the paper concludes in Sect. 6 with some suggestions for future work in this area. This section first presents a review of COVID-19 time series forecasting methods and then describes the models utilized throughout the study. Various approaches, mostly mathematical, statistical, machine learning, and deep learning models, have been utilized in previous studies [3, 4, 14, 16, 17, 26]. Rahimi et al. [27] provided a review of widely used forecasting models on COVID-19 data. Here, we concentrate mainly on COVID-19 time series forecasting studies and present a brief review in this context. Al-Qaness et al. [28] presented an improved adaptive neuro-fuzzy inference system (ANFIS) that uses a flower pollination algorithm (FPA) enhanced by the salp swarm algorithm (SSA) to forecast the COVID-19 cases in China. Their model performed better in terms of mean absolute percentage error (MAPE), root mean squared relative error (RMSRE), coefficient of determination, and computing time. Torrealba-Rodriguez et al. [3] used Gompertz, logistic, and artificial neural network (ANN) models; their results on the infected cases in Mexico showed a high coefficient of determination between the studied data and those obtained by the proposed models. Similar studies that considered Gompertz and logistic models can be found in [3, 29-32]. Castillo and Melin [26] studied an approach based on fuzzy fractals for data from 10 countries by combining (1) the fractal dimension, to evaluate the complexity of the dynamics in the time series, and (2) fuzzy logic, to reflect the uncertainty in forecasting. Melin et al. [33] introduced a multiple ensemble neural network model with fuzzy logic response aggregation.
Their experiments on the data of infected cases in Mexico show the superiority of their proposed model over a single ANN. Kırbaş et al. [15] used autoregressive integrated moving average (ARIMA), nonlinear autoregression neural network (NARNN), and LSTM approaches to study the data of 8 European countries. Shahid et al. [16] proposed forecast models with ARIMA, support vector regression (SVR), LSTM, and Bi-LSTM for 10 significantly affected countries. Leila et al. [34] applied ANN and ARIMA models, and Petropoulos and Makridakis [35] implemented exponential smoothing forecasting to predict the infected cases. Arora et al. [14] utilized recurrent neural network (RNN)-based variants such as deep LSTM, convolutional LSTM, and Bi-LSTM for the cases in India. For Russia, Peru, and Iran, Wang et al. [17] used LSTM networks and a rolling updating mechanism to feed new forecasting outcomes into model training for the next iteration. The study of Hasan [36] suggested a hybrid model consisting of ensemble empirical mode decomposition (EEMD) and an artificial neural network (ANN), which outperformed conventional statistical analysis. Machine learning algorithms were used by Li et al. [37] to predict mortality in confirmed cases of COVID-19. Their results indicated that the gradient boosting decision tree (GBDT) outperforms logistic regression (LR) models, that the performance comparison appeared to be independent of disease severity, and that the 5-index LR (LR-5) model is powerful in death prediction with a high area under the curve (AUC). Reviewing the previous studies indicates that computational intelligence methods, and especially deep learning methods, have attracted growing attention in COVID-19 time series forecasting. Even though deep neural networks have performed reasonably well when applied to COVID-19 time series data, in this study, we aim to enhance their predictive power by feeding them with more data. In general, the performance of the generated model in a deep learning task is largely determined by the number of samples used in the model training phase. The inherent problem in time series forecasting is that time series are often short, and accordingly, the number of extracted samples is small. To address this problem, we propose to generate a new time series with characteristics similar to those of the original time series using statistical data augmentation methods. The series obtained via the augmentation approach is used to create new samples. In this way, a sufficient number of instances are provided for model learning. We use RNNs for the sequence processing task because, unlike ANNs, they can capture the temporal dependencies in a sequence. However, the key issue with RNNs is the gradient vanishing/exploding problem, which makes them difficult to train. Two architectures with gating mechanisms, the LSTM [38] and the GRU [39], have been proposed to solve this problem. In addition, we will use the CNN, which is briefly discussed here, as another deep learning unit in our experiments. In this section, we explain the structure and mechanism of the LSTM unit. As illustrated in Fig. 1, each LSTM unit is comprised of a memory cell $c$, an input gate $i$, an output gate $o$, and a forget gate $f$. The learning procedure of the LSTM is described below using the following parameters: $b_i$, $b_o$, $b_f$, and $b_c$ are the bias vectors of the input gate, output gate, forget gate, and memory cell; $W_i$, $W_o$, $W_f$, and $W_c$ are the corresponding weight matrices; and $U_i$, $U_o$, $U_f$, and $U_c$ are the corresponding recurrent weight matrices.
The output $h_t$ of the LSTM unit is computed as

$$h_t = o_t \odot \tanh(c_t),$$

where $o_t$ is the output gate that regulates the outgoing information of the LSTM unit and $c_t$ is the memory. $o_t$ is computed by

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o),$$

where $\sigma$ is the logistic sigmoid and $h_{t-1}$ is the output vector (hidden state) at time $t-1$. The memory cell is updated as follows:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,$$

where $\tilde{c}_t$, the newly computed memory, is obtained as

$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c).$$

In fact, the memory cell $c_t$ is a combination of the previous memory, multiplied by the forget gate $f_t$, and the new memory $\tilde{c}_t$, regulated by the input gate $i_t$. $f_t$ and $i_t$ are computed as follows:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \qquad i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i).$$

The GRU is another variant of the RNN that uses a gating mechanism to regulate the flow of information inside the unit. Unlike the LSTM, the GRU does not contain a memory cell. As Fig. 2 portrays, the GRU has two gates, a reset gate $r_t$ and an update gate $z_t$. The reset gate decides how to combine the new input $x_t$ with the previous hidden state $h_{t-1}$, while the update gate determines how much the unit updates its hidden state. The CNN has shown promise in a variety of fields, including machine vision [23]. A CNN's convolutional layers take input data and extract new features by performing convolution operations on it with convolution kernels. Each convolutional layer contains a convolution kernel (i.e., a small window) that slides over the input data and performs convolution operations to generate new features, as shown in Fig. 3 [40]. The features obtained by the convolution technique are typically more discriminative than the raw input data, resulting in better forecasting. Deep learning methods such as the LSTM, CNN, and GRU have been applied successfully in the time series forecasting context. Their performance mainly depends on having enough data to fit their parameters suitably [12]. The number of samples extracted from a short time series may be insufficient to achieve an optimal model [19], and these methods should be appropriately regularized to prevent them from overfitting. Another difficulty with time series forecasting is that, even if the series is long and adequate data are available, observations from the far past usually contribute less to prediction. In other words, recent observations of an individual series are more useful for forecasting, which may be because of changes in the patterns that existed in the series. The commonly used procedure of data preparation for a time series forecasting task is illustrated in Fig. 4. As shown, a given time series is divided into in-samples and out-samples according to a certain ratio, for example, 80/20. The out-sample part (test data) $\{t_{m+1}, t_{m+2}, \ldots, t_n\}$ is used to evaluate the obtained model, while the in-sample part is divided into the train data $\{t_1, t_2, \ldots, t_k\}$ and the validation data $\{t_{k+1}, t_{k+2}, \ldots, t_m\}$. The validation data are utilized to tune the model's hyperparameters and to evaluate a model fit on the train data. Selecting separate validation data excludes the most recent observations from the train data, so the recent patterns that exist in the data will not be captured. One simple solution to this problem is to include the validation data in model training; however, overfitting may then occur, which usually leads to a loss of accuracy on the test data.
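As an illustration of this conventional holdout procedure, the following is a minimal sketch assuming an 80/20 in-sample/out-sample split and a further 80/20 train/validation split of the in-sample part; both ratios are example values from the text, not the settings tuned in the experiments.

```python
# Minimal sketch of the holdout split of Fig. 4; the 80/20 ratios are illustrative.
import numpy as np

series = np.arange(1, 101)  # stands in for the observations t1 ... t100

# in-sample / out-sample (test) split
split_point = int(0.8 * len(series))
in_sample, out_sample = series[:split_point], series[split_point:]

# train / validation split of the in-sample part
k = int(0.8 * len(in_sample))
train, validation = in_sample[:k], in_sample[k:]

# The validation part holds the most recent in-sample observations, so a model
# tuned this way never trains on them -- the issue that the augmented validation
# set proposed in this paper is meant to address.
print(len(train), len(validation), len(out_sample))  # 64 16 20
```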
In this study, we propose to use time series augmentation methods to avoid model overfitting and improve accuracy. Specifically, we utilize a time series augmentation technique to create new series with the same temporal dependencies that exist in the original series. The augmented time series is used to create a new validation set. The overall procedure of the proposed idea is illustrated in Fig. 5: first, the time series augmentation technique is applied, and then the sample generation procedure is accomplished. In the modeling phase, the deep learning models are employed on the generated samples, and the best model is obtained. In the model training process, we adopt the Bayesian optimization algorithm to fine-tune the hyperparameters of each model. To explain our proposal, we describe its procedure in Algorithm 1. To augment a time series, we apply the method proposed in [20]. This algorithm first applies the Box-Cox transformation to the series and then decomposes the series into trend, seasonal, and remainder components using STL (seasonal-trend decomposition based on Loess) [41]. It then bootstraps the remainder using the moving block bootstrap (MBB) [42], adds the bootstrapped remainder back to the trend and seasonal components, and finally applies the inverse Box-Cox transformation. As illustrated in Algorithm 1, lines 1-7 show the procedure of computing the bootstrapped series. In lines 8-10, the bootstrapped series are aggregated, and then, for the original series and the augmented series, instances in input-output format are created considering a lag and an output window (Output_Window). Line 11 concatenates the two validation sets. In lines 12-18, the benchmarking models are trained and evaluated, and the best model in terms of RMSE is returned. Three state-of-the-art deep learning models are employed to explore whether the forecasting performance of the proposed scheme is better than that of the regular approach. The list of benchmarking models along with their architectures is provided in Table 1, and Fig. 6 illustrates the full architectures of the proposed methods. As can be observed from the figure, the dense and output layers are the same for the LSTM, CNN, and GRU. Every model learns a representation (a feature vector) of the input data and feeds it into the fully connected (dense) layer; afterward, the predictions are computed using the output layer. The choice of optimal hyperparameters is essential in obtaining a forecasting model with high accuracy [43]. Deep learning-based models usually contain several hyperparameters. Although grid search is a popular strategy for finding optimal hyperparameters, it exhaustively evaluates all parameter combinations and therefore requires considerable computational time and resources, especially in the case of deep learning. The main reason for using Bayesian hyperparameter optimization is that it does not evaluate all hyperparameter combinations, so less training time and fewer resources are needed. Bayesian hyperparameter optimization uses Bayesian models based on Gaussian processes to predict good tuning parameters [43]. The study of Wu et al. [43] indicated that a Bayesian optimization-based method can find the optimal hyperparameters for popular machine learning algorithms. In line with [11, 12, 43], the Bayesian optimization technique [44, 45] is used to tune the hyperparameters in all of the experiments in this study. The Bayesian optimization algorithm uses the error on the validation data to determine the appropriateness of each model.
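To make the augmentation step of Algorithm 1 concrete, the following is a hedged Python sketch of generating one bootstrapped series, assuming statsmodels' STL, scipy's Box-Cox utilities, a weekly seasonal period, and a block length of 14. The paper performs this step with the R forecast package, so the library calls and parameter values here are illustrative assumptions rather than the exact implementation.

```python
# Sketch of the Box-Cox + STL + moving block bootstrap augmentation of [20].
# Library calls, seasonal period, and block length are assumptions for illustration only.
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox
from statsmodels.tsa.seasonal import STL

def moving_block_bootstrap(residuals, block_size):
    """Resample overlapping blocks of the remainder and stitch them to the original length."""
    n = len(residuals)
    blocks = [residuals[i:i + block_size] for i in range(n - block_size + 1)]
    picks = np.random.randint(len(blocks), size=int(np.ceil(n / block_size)))
    return np.concatenate([blocks[i] for i in picks])[:n]

def augment_series(series, period=7, block_size=14):
    series = np.asarray(series, dtype=float) + 1.0        # shift to keep values positive
    transformed, lam = boxcox(series)                      # Box-Cox transformation
    decomposition = STL(transformed, period=period).fit()  # trend / seasonal / remainder
    boot_remainder = moving_block_bootstrap(decomposition.resid, block_size)
    recomposed = decomposition.trend + decomposition.seasonal + boot_remainder
    return inv_boxcox(recomposed, lam) - 1.0               # inverse Box-Cox and shift back

# Example: one augmented copy of a toy daily-cases series
daily_cases = np.abs(np.cumsum(np.random.randn(300))) * 100
augmented = augment_series(daily_cases)
```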
In this study, we use the R forecast package (version 8.14) to generate the augmentation of each time series, and the deep learning models are implemented with Keras [46], the Python deep learning library. The Humanitarian Data Exchange (HDX) [47] is the source of the data utilized in this study. In Tables 2 and 3, we report statistical properties of the data for the aforementioned ten countries with the highest numbers of COVID-19 cases to better interpret the dataset. The sample size refers to the number of observations included in the experiment for each country; it is not necessarily equal across countries because it is counted from the day the first COVID-19 cases were reported. The mean, or average, of the data is the most popular and well-known measure of central tendency and is equal to the sum of all the values in the dataset divided by the number of observations. It is worth noting that the total cases during the study period can be obtained by multiplying the sample size by the sample mean. Besides the mean, two other measures of central tendency are the median and the mode. The median is the middle value of the dataset after it has been arranged in order of magnitude; an essential property of the median is that it is less affected by (i.e., robust to) outliers and skewed data. The mode, on the other hand, is the most frequent number of daily cases in our dataset. It does not give a fair measure of central tendency when compared to the median and mean [50]. The obtained mode for most of the countries is zero; the reason could be days without any new cases or failure to report instances due to holidays or weekends. The square root of the sample variance, known as the standard deviation, is a measure of the amount of variation or dispersion of the dataset, expressed in the same unit as the mean. A small standard deviation implies that the values tend to be close to the mean of the dataset, whereas a high standard deviation suggests that the values are spread out over a wider range [51]. Besides the standard deviation, two other measures describing the shape of the distribution are (1) skewness, which measures the degree of asymmetry, and (2) kurtosis, which determines the heaviness of the distribution tails, also known as the "tailedness" or "peakedness." For a country dataset with one mode (uni-modal), a positive skewness shows that the data are asymmetric and skewed to the right, a negative skewness shows that the data are asymmetric and skewed to the left, and a symmetric dataset always has zero skewness. To provide a comparison with the standard normal distribution, it is common to use an adjusted version known as the excess kurtosis, which is the kurtosis minus 3. A dataset with zero excess kurtosis is called "mesokurtic"; one with positive excess kurtosis is called "leptokurtic," indicating heavy tails with large outliers; and one with negative excess kurtosis is called "platykurtic," which has a flatter peak and is more dispersed [52]. A Z-score for skewness and kurtosis can be obtained by dividing the skew value or the excess kurtosis by its standard error; these are shown as Z_Skewness and Z_Kurtosis in Tables 2 and 3. As the studied sample sizes of the countries are large, either an absolute skew value larger than 2 or an absolute kurtosis larger than 7 can be used as a reference value for determining significant nonnormality [53].
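As a small illustration of how these descriptive statistics can be computed for a country's daily-cases series, the following sketch uses pandas and scipy; the placeholder data, variable names, and the large-sample standard-error approximations used for the Z-scores are assumptions, not the exact procedure used to produce Tables 2 and 3.

```python
# Illustrative computation of the statistics reported in Tables 2 and 3 for one country.
import numpy as np
import pandas as pd
from scipy import stats

daily_cases = pd.Series(np.random.poisson(lam=5000, size=400))  # placeholder daily new cases

n = daily_cases.size
summary = {
    "sample size": n,
    "mean": daily_cases.mean(),
    "median": daily_cases.median(),
    "mode": daily_cases.mode().iloc[0],
    "standard deviation": daily_cases.std(),
    "skewness": stats.skew(daily_cases),
    "excess kurtosis": stats.kurtosis(daily_cases),  # Fisher definition: kurtosis minus 3
    "range": daily_cases.max() - daily_cases.min(),
}
# Approximate large-sample standard errors for the Z-scores
summary["Z_skewness"] = summary["skewness"] / np.sqrt(6.0 / n)
summary["Z_kurtosis"] = summary["excess kurtosis"] / np.sqrt(24.0 / n)
print(summary)
```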
It is worth mentioning that the models utilized in this study are based on neural networks and deep learning; these methods are nonparametric and model the data without prior assumptions about its distribution [54]. Finally, the range for each country is the difference between the dataset's largest and smallest observations, which expresses the dispersion of a country's dataset. The two forecasting performance measures used in the comparison are (1) the symmetric mean absolute percentage error (SMAPE), defined as

$$\mathrm{SMAPE} = \frac{100}{n}\sum_{t=1}^{n}\frac{|f_t - y_t|}{(|y_t| + |f_t|)/2},$$

and (2) the root mean square error (RMSE), obtained by

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(f_t - y_t)^2},$$

where $f_t$ and $y_t$ are the predicted and observed values at time point $t$, respectively, and $n$ is the number of forecasted points. Table 4 lists the domains of all hyperparameters utilized in the implemented models. The lag hyperparameter, exploited in transforming an input time series into samples suitable for deep learning techniques, has a significant impact on obtaining models that can forecast future values with minimum error [49]. Another important hyperparameter is the learning rate, which regulates how the weights are adjusted during model training. Additionally, the models utilized throughout this study have different key hyperparameters that influence the forecasting accuracy of the obtained models; the hyperparameters specific to each model are provided in Table 4. Also, as outlined in the previous section (see Table 1), each of the utilized deep learning techniques contains a dense layer that follows the sequence-capturing layer (e.g., LSTM, CNN, or GRU) and an output layer, which produces the outputs. These layers are common to all the utilized models, and their ranges are also given in Table 4. According to the methodology shown in Fig. 5 and the procedure described in Algorithm 1, firstly, a new time series is generated via augmentation. Next, the original and the augmented series are transformed into samples in the input-output format. Then, the resulting samples are split into a train set, a validation set, and a test set following the holdout procedure. Finally, a new validation set is created by concatenating the validation samples corresponding to the original series and the augmented one. It should be noted that this study adopts multi-output forecasting, and the sample generation process is performed using the lag (the size of the input window) and the output window. In all experiments, the output window is set to 7 days. Following the multi-output forecasting strategy, for a time series $T: t_1, t_2, t_3, t_4, t_5, t_6, t_7, t_8, \ldots, t_n$ with Lag = 5 and Output_window = 2, the created instances are shown in Table 5.
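The following sketch restates the two error measures and the lag/output-window sample generation of Table 5 in Python; the factor of 100 in SMAPE follows the common definition of the measure, and the function and variable names are illustrative assumptions.

```python
# Sketch of the SMAPE and RMSE measures and of the lag/output-window sample generation.
import numpy as np

def smape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 / len(y_true) * np.sum(
        np.abs(y_pred - y_true) / ((np.abs(y_true) + np.abs(y_pred)) / 2.0))

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def make_samples(series, lag, output_window):
    """Slide a window over the series to create (input, output) pairs."""
    x, y = [], []
    for i in range(len(series) - lag - output_window + 1):
        x.append(series[i:i + lag])
        y.append(series[i + lag:i + lag + output_window])
    return np.array(x), np.array(y)

# Reproduces the structure of Table 5: the first sample maps (t1, ..., t5) to (t6, t7)
series = np.arange(1, 11)  # stands in for t1 ... t10
X, Y = make_samples(series, lag=5, output_window=2)
```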
In this section, we investigate whether our proposed approach is able to enhance the forecasting accuracy of the deep learning models based on the LSTM, CNN, and GRU. We run our experiments on the data of the aforementioned ten countries. All experiments are repeated ten times, and the average performance measures are reported. Table 6 shows the results obtained using the deep learning model based on the LSTM. As can be seen, the model obtained using the proposed approach (LSTM_Aug) leads to a lower error in terms of SMAPE and RMSE for eight countries: the USA, Brazil, India, France, Russia, the UK, Spain, and Turkey. The mean SMAPE for LSTM_Aug is 0.82, which is lower than that of the LSTM (1.30), and the mean RMSE for LSTM_Aug is likewise significantly lower than that of the LSTM. The experiments indicate that the results of LSTM_Aug are excellent and that the proposed approach significantly improves the performance of the LSTM. The results of the experiments using the deep learning model based on the CNN are given in Table 7, with the best values shown in boldface. In terms of SMAPE, the CNN_Aug model achieves better performance in 9 countries out of 10; in terms of RMSE, we see a similar picture, with CNN_Aug beating the CNN in 9 cases. To give a comprehensive report on the performance of the models, the mean SMAPE and mean RMSE measures are also computed. The mean SMAPE for CNN_Aug is 0.63, which is lower than that of the CNN (0.73), and the same holds for the mean RMSE, where CNN_Aug achieves a lower error than the CNN. The results indicate that CNN_Aug outperforms the CNN and that using the proposed data preparation strategy considerably enhances the accuracy of CNN-based deep learning models. Table 8 provides the results of the experiments using the deep learning method based on the GRU. As with the previously mentioned models, we compare the model obtained using the regular experimental setting (GRU) with the model obtained using the proposed augmentation approach (GRU_Aug). GRU_Aug and GRU perform similarly, as each of them achieves the minimum error in terms of SMAPE and RMSE in 5 out of the 10 countries. This can be attributed to the fact that the GRU uses a different gating structure than the LSTM. To provide an overall description of the results, we summarize the experiments and show the top augmentation model for each country in Table 9. As can be seen from the table, for all ten countries, the models based on the proposed augmentation approach show the top accuracy in terms of both SMAPE and RMSE. This demonstrates the effectiveness of the proposed approach in increasing the forecasting accuracy of the deep learning methods. CNN_Aug performs excellently and is the best model for eight countries: the USA, Brazil, France, Russia, Italy, Spain, Turkey, and Germany. LSTM_Aug achieves the best accuracy for the remaining two countries, and, as illustrated in Table 9, GRU_Aug is not superior in any country. To further demonstrate the forecasting ability of the obtained models, the actual and forecasted values for each country are visualized in the corresponding figures. As Fig. 14 shows, the inaccuracy is rather substantial at various time points for Spain, which is primarily due to the noise in the country's input data. In this study, we proposed a method that uses augmentation techniques to enhance time series forecasting. To conduct the experimental study and test the effectiveness of the proposed idea, we selected three deep learning methods, the LSTM, GRU, and CNN. Furthermore, due to the importance of accurate forecasting of COVID-19 infections, data from the ten most affected countries were used. Similar to any time series forecasting task, in this study, we utilize the series' past values to train the models, and we assume that optimal hyperparameters for the utilized models have been chosen. Unlike one-step-ahead forecasting, where a forecasting model uses the previous observations to predict a single time step, the multi-step-ahead forecasting strategy [24], which was used in this study, allows forecasting two or more steps. In COVID-19 forecasting, multi-step-ahead forecasting is attractive to policymakers: a longer forecasting window uncovers the trend of the pandemic more effectively and is thus more valuable for governments. Also, in terms of SMAPE, the models generated following the proposed idea demonstrate excellent performance.
The mean SMAPE values for LSTM_Aug, CNN_Aug, and GRU_Aug are 0.82, 0.63, and 1.01, respectively, indicating the forecasting power of the proposed method. As mentioned previously, in this study, we formulate forecasting the number of infected cases as a time series forecasting problem in which the past observations of a series are used to predict future time points. The proposed models forecast the number of infected cases over a longer horizon with lower error than their regular counterparts, and the forecasts can be utilized by governments to make appropriate decisions for controlling the pandemic. In this study, we did not have access to other sources of information, such as the interventions implemented by each country or COVID-19 vaccination data; the models were trained only on the time series of infections. Another limitation of this study is related to hyperparameter selection for the deep learning methods. As these methods have complex architectures, they require more computation, so investigating every hyperparameter configuration, as done in the grid search method, may not be practicable. Therefore, in this study, we used the Bayesian optimization algorithm to search for the optimal hyperparameters. A new scheme based on time series augmentation was suggested in this study to improve the performance of deep learning techniques in time series forecasting. The main idea of the proposed method is to use a time series augmentation technique to create a new time series with the same properties as the original series; the generated series is then used to obtain enough samples to train the deep learning methods optimally. The proposed method was implemented in the context of COVID-19 time series forecasting on data of the 10 most affected countries using the LSTM, GRU, and CNN models. According to the findings of the experiments, in the majority of countries, the LSTM_Aug model outperformed the standard LSTM model, and the CNN_Aug model achieved significantly better performance than the regular CNN. In addition, GRU_Aug obtained an average performance when compared to the regular GRU. Overall, the performance of the models following the proposed idea is excellent and significantly improves on the regular models. As future work, we intend to evaluate the proposed method using other time series augmentation approaches, such as dynamic time warping barycentric averaging.
Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (2020) The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
Forecasting and planning during a pandemic: COVID-19 growth rates, supply chain disruptions, and governmental decisions
Modeling and prediction of COVID-19 in Mexico applying mathematical and computational models
Prediction of COVID-19 confirmed cases combining deep learning methods and Bayesian optimization
Improving the performance of deep learning models using statistical features: The case study of COVID-19 forecasting
Emerging Research in Data Engineering Systems and Computer Communications
Triage of potential covid-19 patients from chest x-ray images using hierarchical convolutional networks
COVIDScreen: Explainable deep learning framework for differential diagnosis of COVID-19 using chest X-Rays
An optimized model using LSTM network for demand forecasting
Feature engineering for mid-price prediction with deep learning
Improving time series forecasting using LSTM and attention models
Forecasting across time series databases using recurrent neural networks on groups of similar series: a clustering approach
Understanding LSTM networks
Prediction and analysis of COVID-19 positive cases using deep learning models: a descriptive case study of India
Comparative analysis and forecasting of COVID-19 cases in various European countries with ARIMA, NARNN and LSTM approaches
Predictions for COVID-19 with deep learning models of LSTM
Time series prediction for the epidemic trends of COVID-19 using the improved LSTM deep learning method: case studies in Russia, Peru and Iran
Temporal attention-augmented bilinear network for financial time-series data analysis
Toward automatic time-series forecasting using neural networks
Bagging exponential smoothing methods using STL decomposition and Box-Cox transformation
Improving the Accuracy of Global Forecasting Models using
Empirical evaluation of gated recurrent neural networks on sequence modeling
Deep learning
Multi-step-ahead time series prediction using multiple-output support vector regression
Multiple-output modeling for multi-step-ahead time series forecasting
Forecasting of COVID-19 Time Series for Countries in the World based on a Hybrid Approach Combining the Fractal Dimension and Fuzzy Logic
A review on COVID-19 forecasting models
Abd El Aziz M (2020) Optimization method for forecasting confirmed cases of COVID-19 in China
Prediction and analysis of Coronavirus Disease
Data analysis on Coronavirus spreading by macroscopic growth laws
Real-time forecasts of the COVID-19 epidemic in China from
Modeling and forecasting trend of COVID-19 epidemic in Iran until May 13, 2020
Multiple ensemble neural network models with fuzzy response aggregation for predicting COVID-19 time series: the case of Mexico
Exponentially Increasing Trend of Infected Patients with COVID-19 in Iran: A Comparison of Neural Network and ARIMA Forecasting Models
Forecasting the novel coronavirus COVID-19
A methodological approach for predicting COVID-19 epidemic using EEMD-ANN hybrid model
Development and external evaluation of predictions models for mortality of COVID-19 patients using machine learning method
Long short-term memory
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Deep convolutional neural networks for image classification: a comprehensive review
STL: A seasonal-trend decomposition
Resampling methods for dependent data
Hyperparameter optimization for machine learning models based on Bayesian optimization
A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
Bayesian optimization for learning gaits under uncertainty
Keras
Novel Coronavirus (COVID-19) Cases Data
Tsang IR. Lag selection for time series forecasting using Particle Swarm Optimization
Essentials of statistics for the behavioral sciences
Measurement error
Measuring skewness and kurtosis
Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis
Evaluation of statistical and machine learning models for time series prediction: identifying the state-of-the-art and the best conditions for the use of each model

Author Contributions: HA performed conceptualization, methodology design, software development, validation, writing the original draft, and writing, reviewing, and editing. RP contributed to software development, data curation, and visualization. AB took part in writing, reviewing, and editing.
Availability of data and material: The data are publicly available at the Humanitarian Data Exchange (HDX), https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases.
Conflict of interest: The authors declare that they have no conflict of interest.