key: cord-0833130-yafppblg authors: Eledum, H.; Yagoub, R. title: Modeling of the COVID-19 Cases in Gulf Cooperation Council (GCC) countries using ARIMA and MA-ARIMA models. date: 2021-05-29 journal: nan DOI: 10.1101/2021.05.27.21257916 sha: 2b7f661cdd397ced6c066054d9a120e0c5d36a3c doc_id: 833130 cord_uid: yafppblg
Coronavirus disease 2019 (COVID-19) remains a major pandemic that is still spreading around the world. In the Gulf Cooperation Council (GCC) countries, there were 1015269 confirmed COVID-19 cases, 969424 recoveries, and 9328 deaths as of Nov 30, 2020. This paper therefore fits several statistical models, namely classical ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA, to the daily reported counts of these three variables in order to study the trend and to provide long-term forecasts of the confirmed, recovery, and death cases of COVID-19 in the GCC countries. The data analyzed cover the period from the first case of coronavirus reported in each GCC country to Nov 30, 2020. To compute the best parameter estimates, each model was fitted to 90% of the available data in each country (the in-sample or training data), and the remaining 10% was used for out-of-sample forecasting (the testing data). The AIC was applied to the training data as the criterion for selecting the best model, the RMSE was computed on the testing data, and the model with the minimum AIC and minimum RMSE was selected. The main finding is that, in general, the WMA-ARIMA and EWMA-ARIMA models, together with the cubic linear regression model, gave better in-sample and out-of-sample forecasts than the classical ARIMA models for the confirmed and recovery cases, whereas no specific model type stood out for the death cases.
The main objective of this article is to model the confirmed, recovery, and death cases of COVID-19 in the GCC countries using classical ARIMA and three types of kth Moving Average-ARIMA (kth MA-ARIMA) models: kth Simple Moving Average-ARIMA (kth SMA-ARIMA), kth Weighted Moving Average-ARIMA (kth WMA-ARIMA), and kth Exponential Weighted Moving Average-ARIMA (kth EWMA-ARIMA). The study period runs from the first case of coronavirus reported in each GCC country to Nov 30, 2020. The article's main contribution is that it is considered the only study to have used classical ARIMA together with kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA to model all three variables (confirmed, recovery, and death cases of COVID-19) in the GCC countries. The paper is organized as follows. Section 2 describes the study area and data collection. Section 3 outlines the methodology used in the study. Section 4 presents the results and discussion, and Section 5 concludes.
To achieve the study's objectives, all six GCC countries were included (Saudi Arabia, United Arab Emirates, Qatar, Kuwait, Bahrain, and Oman). The sample data consist of the daily reported COVID-19 counts of three variables (confirmed, recovery, and death cases) in each country, covering the period from the first confirmed case of COVID-19 reported in each country to Nov 30, 2020. The data were extracted from the WHO situation reports, the Sehhty website, and Wikipedia.
This section describes each of the models used (classical ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA), covering both model building and model evaluation.
The ARIMA model, developed by Box and Jenkins (1994), is a statistical model that uses time series data to study the trend and to generate forecasts of future values.
This version posted May 29, 2021. The copyright holder has placed this preprint (which was not certified by peer review) in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors.
For a given non-stationary time series $y_t$, the classical ARIMA$(p, d, q)$ model is defined as
$$\phi_p(B)\,(1 - B)^d\, y_t = \theta_q(B)\,\varepsilon_t \qquad (1)$$
where $B$ is the backward shift operator, $(1 - B)^d$ is the difference filter, $d$ is the number of times the data must be differenced to become stationary, $p$ is the order of the autoregression, $q$ is the order of the moving average, $\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$, $\theta_q(B) = 1 + \theta_1 B + \theta_2 B^2 + \dots + \theta_q B^q$, and $\varepsilon_t \sim (0, 1)$ is the error term.
The ARIMA model is a generalized model that integrates the autoregressive model AR$(p)$ and the moving average model MA$(q)$. ARIMA models that do not require differencing ($d = 0$) are ARMA models; model (1) can therefore be expressed as an autoregressive model AR$(p)$, a moving average model of the residuals MA$(q)$, or a combination of the two, ARMA$(p, q)$:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t \qquad (2)$$
$$y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} \qquad (3)$$
$$y_t = \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \qquad (4)$$
The kth SMA-ARIMA process of a time series and its corresponding back-shift operator are defined by Eqs. (5) and (6), the kth WMA-ARIMA process and its back-shift operator by Eqs. (7) and (8), and the kth EWMA-ARIMA process and its back-shift operator by Eqs. (9) and (10), respectively. In each case, the original series is first smoothed over a window of k observations, an ARIMA model is fitted to the smoothed series, and the back-shift operator converts forecasts of the smoothed series back to the scale of the original series.
Model selection criteria are rules used to select a statistical model from a set of candidate models based on the observed data. The Akaike information criterion (AIC) is a widely used model selection tool because of its computational simplicity and effective performance in many modeling frameworks. The AIC is given as (Akaike, 1974)
$$\mathrm{AIC} = -2 \log L + 2m \qquad (11)$$
where $L$ is the likelihood of the model and $m$ is the total number of estimated parameters in the model.
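Returning to the kth moving-average transforms named earlier in this section, the following is a minimal sketch of how the three smoothed series and the SMA back-shift could be computed. It assumes a commonly used formulation of kth moving averages (equal weights for SMA, linearly increasing weights 1..k for WMA, and exponentially decaying weights with smoothing factor 2/(k + 1) for EWMA); the exact weights used in this study may differ. All function names are hypothetical.

```python
def kth_sma(x, k):
    """Equal-weight average of the k most recent observations (assumed form)."""
    return [sum(x[t - k + 1:t + 1]) / k for t in range(k - 1, len(x))]


def kth_wma(x, k):
    """Weights 1..k, with the newest observation weighted most (assumed form)."""
    total = k * (k + 1) / 2
    return [
        sum((j + 1) * x[t - k + 1 + j] for j in range(k)) / total
        for t in range(k - 1, len(x))
    ]


def kth_ewma(x, k):
    """Exponentially decaying weights; alpha = 2 / (k + 1) is an assumption."""
    alpha = 2 / (k + 1)
    weights = [(1 - alpha) ** (k - 1 - j) for j in range(k)]  # oldest -> newest
    total = sum(weights)
    return [
        sum(w * x[t - k + 1 + j] for j, w in enumerate(weights)) / total
        for t in range(k - 1, len(x))
    ]


def sma_back_shift(smoothed_value, recent, k):
    """Invert the SMA: recover the newest original value from its smoothed value.

    `recent` must end with the k - 1 original observations that precede it.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    return k * smoothed_value - sum(recent[-(k - 1):])
```

In this sketch, an ARIMA forecast of the smoothed series would be passed to `sma_back_shift` together with the last k - 1 observed values to obtain a forecast on the original scale; the WMA and EWMA inversions follow the same idea with their own weights.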
A good model is the one with the minimum AIC among all candidate models. A widely used measure of forecast accuracy for univariate time series data is the Root Mean Square Error (RMSE), discussed by Hyndman and Koehler (2006). The RMSE is computed as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2} \qquad (12)$$
where $y_t$ and $\hat{y}_t$ are the actual and predicted values at time $t$, respectively, and $t = 1, \dots, n$ is the sequence of time points. A lower RMSE indicates better calibration and, therefore, better performance.
After the ARIMA or kth MA-ARIMA model considered most appropriate among the alternatives has been fitted, it can be tested for goodness of fit by examining its residuals. The model is assumed to be a good fit if the residuals behave approximately as white noise. The essential tools are the plots of the ACF and PACF. The Box-Ljung test is a diagnostic tool for testing the lack of fit of a time series model. It is applied to the residuals after fitting an ARIMA or kth MA-ARIMA model to the data and examines their autocorrelations. The null and alternative hypotheses for this test are
$H_0$: The model does not exhibit a lack of fit; there is no serial correlation among the lags, so the residuals are approximately white noise.
$H_1$: The model exhibits a lack of fit; the residuals are serially correlated.
This section first presents summary statistics for the three variables (confirmed, recovery, and death cases) in each GCC country, then reports and discusses the results obtained from applying the ARIMA and kth MA-ARIMA models to these variables.
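The two evaluation measures described above, the RMSE of Eq. (12) and the Box-Ljung statistic, can be sketched in a few lines. This is an illustrative implementation using only the standard library; the function names are hypothetical, and the Q statistic uses the standard form $Q = n(n+2)\sum_{k=1}^{h} \hat{\rho}_k^2 / (n - k)$, which is compared against a chi-square distribution (the p-value step is omitted here).

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag of the residual series."""
    n = len(residuals)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(residuals) - 1")
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    q = 0.0
    for lag in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at this lag.
        acf = sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        q += acf * acf / (n - lag)
    return n * (n + 2) * q
```

A large Q relative to the chi-square reference value points to remaining serial correlation, that is, to rejecting the hypothesis that the residuals are white noise.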
Table 1 shows summary statistics, including the mean and standard deviation of the confirmed, recovery, and death cases of COVID-19 in the GCC countries. Table 1 also shows the prevalence of confirmed cases per 100000 population for the first four weeks. For confirmed cases, the country with the highest mean (standard deviation 1210.86) is followed by UAE, Kuwait, and Oman, while Bahrain has the lowest mean (306.18, standard deviation 215.08). For recovery cases, KSA has the highest mean, followed by UAE, Kuwait, Qatar, and Oman, while Bahrain has the lowest. KSA also has the highest mean of reported death cases, followed by Oman and Kuwait, whereas Qatar has the lowest. In the first four weeks of the COVID-19 outbreak, Qatar and Bahrain had the highest prevalence of confirmed cases, at 18 and 16 infected persons per 1000000, respectively, whereas UAE and Oman had the lowest, at 1 and 1.1 per 1000000, respectively (see Figure 1).
This paper uses the time series of daily COVID-19 confirmed, recovery, and death cases in each GCC country. Each series can therefore be written as $\{y_t\} = \{y_1, y_2, \dots, y_n\}$, where $y_t$ represents the confirmed, recovery, or death cases on day $t$ and $t = 1$ denotes the date of the first case of COVID-19 detected in the given country. The time-series plots of the daily COVID-19 confirmed, recovery, and death cases for the GCC countries are presented in Figure 2, Figure 3, and Figure 4, respectively.
To compute the best parameter estimates of the ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA models, each model was fitted to 90% of the available data in each country (the in-sample or training data), and the remaining 10% was used for out-of-sample forecasting (the testing data). The AIC of Eq. (11) was applied to the training data as the criterion for selecting the best model. The RMSE of Eq. (12) was computed on the testing data, and the model with the minimum AIC and minimum RMSE was selected. The calculations were performed using RStudio version 1.2.5033 and EViews 10.
To check whether the daily COVID-19 confirmed, recovery, and death case series in each country were stationary, we carried out the ADF unit root test. The results are shown in Table A.1 in the Appendix. Based on Table A.1, we conclude that all variables are stationary with constant and trend at first differences throughout the study period; therefore, the ARIMA model can be applied. Once the stationarity of the confirmed, recovery, and death case series in each country had been established, the ARIMA model that best fitted each of the three variables, with the minimum AIC on the training data and the lowest RMSE, was selected. Table 2 summarizes the best ARIMA model for the confirmed, recovery, and death cases in each country, with the corresponding RMSE and AIC. Each entry of Table 2 can be interpreted in the same manner.
The process of developing the kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA models can be summarized as follows:
1. Transforming the original time series into the new kth SMA, kth WMA, and kth EWMA series for k = 2, 3, ..., 5 by using Eq. (5), Eq. (7), and Eq. (9), respectively.
2. Checking the stationarity of each transformed series using the ACF, and differencing until stationarity is achieved.
3. Applying the classical ARIMA(p, d, q) model to the transformed series, with d as determined in step 2 and p + q ≤ 5.
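The selection procedure above (90/10 split, candidate orders with p + q ≤ 5, minimum AIC) can be sketched as follows. This is an illustrative outline only: the helper names are hypothetical, the candidate grid assumes that p and q may each be zero, and the model fitting itself (which would supply each log-likelihood) is left to whatever ARIMA routine is used.

```python
def train_test_split_series(series, train_fraction=0.9):
    """Split a series chronologically into training and testing parts."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def candidate_orders(d, max_sum=5):
    """All (p, d, q) with p, q >= 0 and p + q <= max_sum (assumed grid)."""
    return [
        (p, d, q)
        for p in range(max_sum + 1)
        for q in range(max_sum + 1 - p)
    ]


def aic(log_likelihood, n_params):
    """AIC of Eq. (11): -2 log L + 2 m."""
    return -2.0 * log_likelihood + 2.0 * n_params


def select_min_aic(fits):
    """Pick the order with the smallest AIC.

    `fits` is an iterable of (order, log_likelihood, n_params) tuples
    produced by fitting each candidate on the training data.
    """
    return min(fits, key=lambda fit: aic(fit[1], fit[2]))[0]
```

With d = 1 this grid contains 21 candidate orders per series; the chosen model would then be scored on the held-out 10% with the RMSE.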
After taking first differences of the transformed data to make them stationary, we fitted 72 models for each of the three types of kth MA-ARIMA model (6 countries × 3 variables × 4 values of k, k = 2, 3, 4, 5). The best 18 of the 72 combinations of kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA models fitted to the confirmed, recovery, and death cases of COVID-19, with the corresponding RMSE and AIC for each country, are presented in Table 3, Table 4, and Table 5, respectively.
Based on the results in Table 3, the 2nd SMA-ARIMA(2,1,3), 2nd SMA-ARIMA(2,1,2), and 2nd SMA-ARIMA(3,1,1) were selected as the best models for the confirmed, recovery, and death cases of COVID-19 in Saudi Arabia, respectively. The remaining results of Table 3 and the outputs in Table 4 and Table 5 can be interpreted in the same manner. Table 6 lists the best models among the kth MA-ARIMA models based on the smallest RMSE, while Table 7 shows the best models among classical ARIMA and the kth MA-ARIMA models based on the smallest RMSE.
After identifying the best model among the classical ARIMA and kth MA-ARIMA models for the confirmed, recovery, and death cases in each country (see Table 7), the next step is to check the pattern of the residuals from each selected model by plotting the ACF of the residuals and conducting the Box-Ljung test to examine the goodness of fit of each model. Figures 5.a1 to 5.c6 show the ACF plots for all the best models listed in Table 7, while Table 8 shows the outputs of the Box-Ljung test.
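The model counts quoted above (72 fits per kth MA-ARIMA type, of which 18 are retained) follow directly from the design of the experiment. The short sketch below enumerates that grid; the variable names are illustrative.

```python
from itertools import product

countries = ["Saudi Arabia", "United Arab Emirates", "Qatar",
             "Kuwait", "Bahrain", "Oman"]
variables = ["confirmed", "recovery", "death"]
k_values = [2, 3, 4, 5]

# One fitted model per (country, variable, k) for each kth MA-ARIMA type.
grid = list(product(countries, variables, k_values))
n_fitted = len(grid)  # 6 * 3 * 4

# One best model is kept per (country, variable) pair.
n_best = len({(country, variable) for country, variable, _ in grid})  # 6 * 3
```

Here `n_fitted` is 72 and `n_best` is 18, matching the counts reported for each of Tables 3, 4, and 5.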
Looking at the ACF plots in the sub-figures of Figure 5, for the first 30 lags most of the autocorrelations lie inside the 95% confidence bounds, indicating that the residuals are approximately white noise, except in Figure 5.a2 and Figure 5.a4, where they deviate slightly from this pattern. The outputs of the Ljung-Box test in Table 8 confirm that no autocorrelation is left in the residuals of the models in Table 7, except for the two models for confirmed cases in UAE and Qatar. For the remaining models, the null hypothesis that the residuals are white noise was not rejected, and they therefore exhibit a good fit. Thus, each model in Table 7 has passed the required checks and is ready for forecasting, except the 5th WMA-ARIMA(2,1,3) and ARIMA(2,1,3) models for the confirmed cases in UAE and Qatar, respectively.
F statistic = 150.3***. Signif. codes: "***" p < 0.001, "**" p < 0.01, "*" p < 0.05.
Therefore, the forecast values of confirmed cases in UAE and Qatar shown in Table 9 were computed based on the cubic linear regression model.
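As a rough illustration of the cubic linear regression fallback mentioned above, the sketch below fits a cubic polynomial in time by least squares and extrapolates it. It assumes the model has the form $y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + e_t$ with the day index as the only regressor, which is an assumption about the specification rather than a detail confirmed in the text; the function names are hypothetical.

```python
import numpy as np


def fit_cubic_trend(y):
    """Least-squares fit of a cubic polynomial in the day index t = 1..n."""
    t = np.arange(1, len(y) + 1, dtype=float)
    coeffs = np.polyfit(t, np.asarray(y, dtype=float), deg=3)  # highest power first
    return np.poly1d(coeffs)


def forecast_cubic(model, n_obs, horizon):
    """Evaluate the fitted cubic at days n_obs + 1 .. n_obs + horizon."""
    future_t = np.arange(n_obs + 1, n_obs + horizon + 1, dtype=float)
    return model(future_t)
```

Polynomial extrapolation can diverge quickly outside the fitted range, so forecasts from such a model are usually restricted to short horizons.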
Four models, classical ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA, were considered for predicting the confirmed, recovery, and death cases of the novel COVID-19 pandemic in the GCC countries. The models were applied to the daily data from the first case reported in each country until Nov 30, 2020. To compute the best parameter estimates, each model was fitted to 90% of the available data in each country (the in-sample or training data), and the remaining 10% was used for out-of-sample forecasting (the testing data). The AIC was applied to the training data as the criterion for selecting the best model, the RMSE was computed on the testing data, and the model with the minimum AIC and minimum RMSE was selected. The main finding is that, in general, the WMA-ARIMA and EWMA-ARIMA models, together with the cubic linear regression model, gave better in-sample and out-of-sample forecasts than the classical ARIMA models for the confirmed and recovery cases, whereas no specific model type stood out for the death cases.
No written consent has been obtained from the patients, as no patient-identifiable data are included in this study.
The data that support the findings of this study are openly available at the following web addresses:
https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
https://sehhty.com/
https://en.wikipedia.org/wiki/COVID-19_pandemic
The authors declare that they have no conflicts of interest.
Coronavirus disease of 2019 (COVID-19) in the Gulf Cooperation Council (GCC) countries: Current status and management practices
Coronavirus disease-19 spread in the Eastern Mediterranean Region, updates and prediction of disease progression in Kingdom of Saudi Arabia, Iran, and Pakistan
Comparison of COVID-19 pandemic dynamics in Asian countries with statistical modeling. Computational and mathematical methods in medicine
Analyzing and forecasting COVID-19 pandemic in the Kingdom of Saudi Arabia using ARIMA and SIR models
Modeling Nigerian Covid-19 cases: A comparative analysis of models and estimators
Mathematical modeling of the COVID-19 prevalence in Saudi Arabia. medRxiv
Forecasting daily confirmed COVID-19 cases in Malaysia using ARIMA models
Brief Analysis of the ARIMA model on the COVID-19 in Italy. medRxiv
Forecasting of COVID19 per regions using ARIMA models and polynomial functions
Modeling Palestinian COVID-19 Cumulative Confirmed Cases: A Comparative Study
Predicting the Pandemic COVID-19 Using ARIMA Model
Spatial prediction of COVID-19 epidemic using ARIMA techniques in India. Modeling earth systems and environment
Forecasting the covid-19 outbreak: an application of arima and fuzzy time series models
A weighted moving average process for forecasting
K-th moving, weighted and exponential moving average for time series forecasting models
A new look at the statistical model identification
Another look at measures of forecast accuracy
Time series analysis forecasting and control