key: cord-0787829-d5be5q16 authors: Biswas, Santanu title: Forecasting and comparative analysis of Covid-19 cases in India and US date: 2022-03-19 journal: Eur Phys J Spec Top DOI: 10.1140/epjs/s11734-022-00536-3 sha: 795bcc8372ff18dbb3d5bb0eff3ec28832968474 doc_id: 787829 cord_uid: d5be5q16
The devastating waves of Covid-19 have wreaked havoc on the world, particularly in India and the US. This article aims to produce real-time forecasts of Covid-19 confirmed cases for India and the US. To this end, ARIMA- and NNAR-based models are applied to the daily new Covid-19 confirmed cases. The proposed hybrid models are: (i) ARIMA-NNAR model, (ii) NNAR-ARIMA model, (iii) ARIMA-Wavelet ARIMA model, (iv) ARIMA-Wavelet ANN model, (v) NNAR-Wavelet ANN model, and (vi) NNAR-Wavelet ARIMA model. The models are used to predict the next 45 days of daily new cases. These forecasts can help governments anticipate the behavior of Covid-19 and make people aware of the upcoming third wave of Covid-19. Our results suggest that hybrid models perform better than single models. We also show that our wavelet-based hybrid models can outperform previously defined hybrid models in terms of accuracy measures (MAE and RMSE). We have also estimated the time-dependent reproduction number for India and the US to observe the present situation.
The total number of Covid-19 confirmed cases worldwide had exceeded 180 million, with more than 3.9 million deaths. Like several other parts of the world, India and the US have witnessed a tremendous surge of Covid-19 cases. Both countries withstood the first wave of Covid-19 without any significant mortality or morbidity through several protection measures, but the second wave overwhelmed the existing health systems. With a huge population and an inadequate health system, India may lose many lives to Covid-19 [1].
In recent years, mathematical modelling, artificial intelligence, machine learning and network modelling have been applied to screening, prediction, forecasting, contact tracing, and drug development for Covid-19 [2] [3] [4] [5]. Only a few studies have modeled the Covid-19 pandemic in India: (a) deep learning-based models have been applied to predict positive reported cases of Covid-19 in India [6, 7]; (b) the SSA technique is another option for forecasting daily confirmed Covid-19 cases in the USA, India, Brazil, Russia, France, Spain, Argentina, Colombia, UK, and Mexico [8]; (c) the fractal interpolation method for predicting the third wave of Covid-19 has also been discussed [9, 10], etc.
Corresponding author e-mail: Santanubiswas1988@gmail.com
Presently, the US and India are the leading countries in terms of daily confirmed cases. The peak of India's second Covid-19 wave was recorded on May 8, 2021, with more than 0.4 million cases. While we are not yet in the clear, medical experts have warned of an upcoming wave of Covid-19. Because of the deadly impact of Covid-19 on people's lives, policymakers are most concerned about the third wave of the pandemic. In this article, we attempt to predict the course of the Covid-19 outbreak using the proposed hybrid models. We forecast the confirmed Covid-19 cases over a sufficiently long horizon by using time series data. Time series analysis captures the past relationships among the variables by applying various modelling methodologies. ARIMA is a frequently used statistical model for time series analysis [11], but recent research on the application of wavelet analysis to nonstationary time series shows very promising results. The neural network model deals with complex nonlinear relationships between the response variable and its predictors.
Thus, combining the ARIMA or neural network model with wavelet techniques may reduce the bias and variance of the prediction error of the component models. The main aim of this paper is to assess the relative predictive capabilities of the hybrid models and to determine which sequence of ARIMA and NNAR is better for constructing series hybrid models for Covid-19 time series forecasting. In addition, real-time forecasts of Covid-19 confirmed cases will help government officials and policymakers allocate adequate health care resources for the coming days. In the absence of medicine or antiviral drugs for Covid-19, these estimates will provide insight into resource allocation for severely affected countries like India to keep the epidemic under control.
The rest of this paper is structured as follows. Sect. 2 describes the formulation of the hybrid models and describes the component models individually. In Sect. 3, we discuss the data sets used in the article, the results of the forecasting models applied in the study, and their forecasting performance. In the next section, basic and effective reproduction numbers are analyzed to observe the present epidemic scenario in India and the US. We end our study with Sect. 5, reserved for the discussion and conclusion.
In this section, we briefly describe the various models that have been implemented here for data analysis.
ARIMA is one of the most common methods used for time series analysis. The ARIMA model is specified by three non-negative integers: p (the order of the AR model), d (the degree of differencing) and q (the order of the MA model). Any ARIMA model can be expressed as ARIMA(p,d,q). Building an ARIMA model consists of three phases: model identification, parameter estimation, and diagnostic checking of the model.
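To make the roles of p and d concrete, the following is a minimal pure-Python sketch of an ARIMA(1,1,0)-style forecast: the series is differenced once (d = 1), an AR(1) model (p = 1) is fitted to the differences by least squares, and the forecasts are integrated back to the original scale. It omits the MA(q) terms and the identification and diagnostic phases, and it is not the estimation procedure used in this study. The function names and the toy series are hypothetical.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' part of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1} (an AR(1) model)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx if sxx else 0.0
    c = my - phi * mx
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast with AR(1) on the once-differenced series (no MA terms)."""
    diffs = difference(series, d=1)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    out = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # one-step AR(1) forecast of the difference
        level += last_diff               # undo the differencing
        out.append(level)
    return out


cases = [10, 12, 15, 19, 24, 30, 37, 45]  # toy series, not real data
print(forecast_arima_110(cases, 3))
```

In practice the orders would be chosen in the identification phase and the fit checked in the diagnostic phase; the sketch fixes them only to keep the example short.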
This modelling approach assumes that the observation $z_t$ at time t is a function of p lagged values. The NNAR(p,k) model is a neural network with k hidden nodes, where $k = [(p+1)/2]$, and the past p lagged values as input, i.e. $z_{t-1}, z_{t-2}, \dots, z_{t-p}$. A "learning algorithm" is used to obtain the weights of the inputs. The logistic activation function is applied iteratively to forecast the time series.
Wavelet-based forecasting methods can be used for nonstationary data analysis. The multi-resolution and localization ability of wavelets in both the time and frequency domains has drawn the attention of researchers to wavelet analysis [12]. Several types of mother wavelets are available in the literature [13]. The decomposition level for wavelet analysis is $\operatorname{int}(\log n)$, where n is the length of the time series. The wavelet-based forecasting (WBF) model transforms the time series data by using a hybrid maximal overlap discrete wavelet transform (MODWT) algorithm with a "haar" filter. First, the Daubechies wavelet transformation is used to remove the high-frequency components from the nonstationary time series data. Next, the time series is reconstructed using a denoising method. Finally, ARIMA or ANN methods are applied to the reconstructed series to forecast the time series.
Modeling the linear and nonlinear parts individually and then combining them to forecast the data set can be advantageous for modeling purposes [14]. Six possible hybrid models, based on the sequence in which the ARIMA and NNAR models are used in series combination, can be presented as: (i) ARIMA-NNAR, (ii) NNAR-ARIMA, (iii) ARIMA-Wavelet ARIMA, (iv) ARIMA-Wavelet ANN, (v) NNAR-Wavelet ANN, and (vi) NNAR-Wavelet ARIMA.
The ARIMA models can be used to capture linear trends in the data set, while the NNAR models can be used for nonlinear time series prediction. A hybrid strategy that combines linear and nonlinear modelling approaches can be a better alternative than single models. The hybrid model $H_t$ can be represented as $H_t = L_t + N_t$, where $L_t$ represents the linear part and $N_t$ represents the nonlinear part of the model. Both $L_t$ and $N_t$ are estimated from the data set.
If $\hat{L}_t$ is the value of $L_t$ predicted by the ARIMA model and $\epsilon_t$ is the residual part of the time series, then $H_t = \hat{L}_t + \epsilon_t$. Next, the residual part is captured by the NNAR model as follows: $\hat{N}_t = g(\epsilon_{t-1}, \epsilon_{t-2}, \dots, \epsilon_{t-p})$, where g is a nonlinear function modeled by the NNAR approach. The hybrid time series model can be defined as $\hat{H}_t = \hat{L}_t + \hat{N}_t$, where $\hat{N}_t$ represents the forecast value of the NNAR model. In summary, the proposed hybrid ARIMA-NNAR model works in two stages. In the first stage, an ARIMA model is applied to analyze the linear part of the model. In the next stage, an NNAR model is employed to model the residuals of the ARIMA model. The hybrid model also reduces the model uncertainty that occurs in inferential statistics and time series forecasting.
The proposed NNAR-ARIMA model works in two phases. In the first phase, a nonlinear NNAR model is used to model the data set, so that the residual part of the NNAR model contains only a linear structure. In the next phase, the linear part (the residual part of the NNAR model) is modeled by ARIMA.
The proposed ARIMA-Wavelet ARIMA model works in two parts. In the first part, an ARIMA model is built to model the linear components of the time series, and a set of out-of-sample forecasts is generated. In the second part, the ARIMA residuals (an oscillatory residual series) are modeled using the WBF model (2.3) with the ARIMA method to forecast the time series.
The proposed ARIMA-Wavelet ANN model uses the same methodology as the ARIMA-Wavelet ARIMA model, but in the second step the ARIMA residuals are remodeled using the WBF model (2.3) with the ANN method.
The proposed NNAR-Wavelet ARIMA model uses the same methodology as the ARIMA-Wavelet ARIMA model, except that NNAR is used as the base model.
The proposed NNAR-Wavelet ANN model uses the same methodology as the ARIMA-Wavelet ANN model, except that NNAR is used as the base model.
In this section, the Covid-19 confirmed cases are fitted with our models. We also check the accuracy of our models using two metrics, namely MAE and RMSE.
The US and India have produced approximately one-third of the total Covid-19 confirmed cases worldwide (Fig. 1). In this article, two different data sets of Covid-19 confirmed cases are used for prediction purposes. For India and the US, we consider the daily reported confirmed cases from 14 March 2020 to 30 January 2022 and from 22 January 2020 to 30 January 2022, respectively. The data sets consist of 688 and 741 observations, respectively. The data sources are https://api.covid19india.org/ and the Johns Hopkins GitHub data.
To evaluate the performance of our models, we have used two metrics, namely MAE and RMSE: $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - z_i|$ and $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - z_i)^2}$. Here, $z_i$ is the value predicted by our model, $y_i$ is the target output, and i indexes the data points from 1 to n.
In this section, we discuss the results for the training and testing data sets. We have divided the data set in two ways: (i) the last 8 days of data are used for model testing and the remaining samples are used for model building (Experiment-1); (ii) the last 15 days of data are used for model testing and the remaining samples are used as the training data set (Experiment-2). Tables 1, 2, 3 and 4 report the accuracy of our models for the daily new Covid-19 data sets of India and the US. Tables 1 and 3 compare the results of the evaluated models on the training samples, whereas Tables 2 and 4 report the outcome for the testing data sets. All the models (Hybrid 1, Hybrid 2, Hybrid 3, Hybrid 4, Hybrid 5 and Hybrid 6) have been used to forecast the data sets of India and the US for the next 15 days. Further, the ARIMA residuals ($R_1$) are modeled with an NNAR(p, k) model with a pre-defined Box-Cox transformation set to λ = 0 to ensure that the forecast values stay positive, and the residual part ($R_2$) of the NNAR model is modeled by ARIMA, to obtain the ARIMA-NNAR and NNAR-ARIMA models, respectively. We add the linear and nonlinear forecasts to obtain the final forecast results.
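The evaluation workflow described above (hold out the last observations, fit a first-stage model, model its residuals with a second-stage model, add the two forecasts, and score the result with MAE and RMSE) can be sketched as follows. This is a minimal pure-Python illustration under strong assumptions, not the models used in this study: a least-squares linear trend stands in for ARIMA, a one-nearest-neighbour lookup on the previous residual stands in for NNAR, no Box-Cox transformation is applied, and the series is synthetic. All function names are hypothetical.

```python
import math


def mae(y, z):
    """Mean absolute error between targets y and predictions z."""
    return sum(abs(a - b) for a, b in zip(y, z)) / len(y)


def rmse(y, z):
    """Root mean squared error between targets y and predictions z."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, z)) / len(y))


def fit_trend(y):
    """Stage-1 stand-in for ARIMA: least-squares line y_t ~ a + b*t."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    b = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / stt
    return y_mean - b * t_mean, b


def residual_forecast(res, steps):
    """Stage-2 stand-in for NNAR: the next residual is taken to be the one
    that followed the most similar earlier residual (1-nearest neighbour)."""
    hist, out = list(res), []
    for _ in range(steps):
        last = hist[-1]
        j = min(range(len(hist) - 1), key=lambda i: abs(hist[i] - last))
        hist.append(hist[j + 1])  # iterate: the forecast feeds the next step
        out.append(hist[-1])
    return out


def hybrid_forecast(train, steps):
    """Sum of the stage-1 (linear) and stage-2 (residual) forecasts."""
    a, b = fit_trend(train)
    n = len(train)
    residuals = [v - (a + b * t) for t, v in enumerate(train)]
    linear = [a + b * t for t in range(n, n + steps)]
    nonlinear = residual_forecast(residuals, steps)
    return [l + r for l, r in zip(linear, nonlinear)]


# Synthetic daily series: linear growth plus a weekly pattern (not real data).
weekly = [0, 8, 15, 11, 4, -12, -26]
series = [200 + 3 * t + weekly[t % 7] for t in range(70)]

# Experiment-1 style split: the last 8 observations are held out for testing.
train, test = series[:-8], series[-8:]
pred = hybrid_forecast(train, len(test))
print("MAE :", mae(test, pred))
print("RMSE:", rmse(test, pred))
```

Swapping the order of the two stages (residual model first, trend second) would give the analogue of the NNAR-ARIMA sequence, so the same scaffold can be used to compare the two orderings on held-out data.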
The predicted values of the proposed hybrid models on the training and testing data sets are compared for performance assessment. We compare the accuracy of the proposed models using the RMSE and MAE metrics. Experiment-1 suggests that Hybrid 1 (ARIMA-NNAR) and Hybrid 5 (NNAR-Wavelet ANN) are the best-fitting models for the training data of India and the US, respectively.
From the disease transmission perspective, the effective reproduction number is considered useful for computing the transmission probability. The basic reproduction number ($R_0$) can be defined as the average number of secondary infections produced by a primary infected case in a completely susceptible population, but in reality the entire population cannot be susceptible at a given time. The effective reproduction number ($R_t$), on the other hand, can be defined as the realistic average number of secondary cases per primary case at a given time t > 0 [15]. The number of infected cases increases when $R_t$ is greater than 1 and decreases when $R_t$ is less than 1. The R package EpiEstim was used to estimate $R_t$ for India and the US. We assume that the serial interval follows a gamma distribution with mean 4 days and standard deviation 2 days [16]. Figure 7 shows the effective reproduction number for India and the US, respectively. Initially, $R_t$ is greater than 1, but it decreases gradually because of the strict lockdown. Our estimates indicate that it took more than 100 days for $R_t$ to fall below 1 after the second wave of the epidemic. The effective reproduction number is higher for the present Omicron wave in India and the US than for the earlier waves.
In this paper, hybrid models are developed and their predictive capabilities are investigated using past data on daily new Covid-19 cases, and a 15-day-ahead prediction of daily new confirmed cases is made for India and the US. The wavelet decomposition method was combined with the ARIMA and NNAR models to develop a better hybrid model to forecast future cases accurately.
In this article, we have proposed six hybrid models, and the accuracy of the hybrid models is compared using training and testing data sets. Tables 1, 2, 3 and 4 indicate the best-fitting model for the training and testing data samples among the models ARIMA, NNAR, ARIMA-NNAR (Hybrid 1), NNAR-ARIMA (Hybrid 2), ARIMA-Wavelet ARIMA (Hybrid 3), ARIMA-Wavelet ANN (Hybrid 4), NNAR-Wavelet ANN (Hybrid 5) and NNAR-Wavelet ARIMA (Hybrid 6). In general, the results show that series hybrid models perform at least better than one of their component models; that is, the results of series hybrid models are generally not worse than those of all their components. However, when the series hybrid models are compared among themselves, the results indicate that the NNAR-based hybrid models overall outperform the ARIMA-based hybrid models.
[1] Second wave of COVID-19 pandemic in India: barriers to effective governmental response
[2] Mathematical prediction of the time evolution of the COVID-19 pandemic in Italy by a Gauss error function and Monte Carlo simulations
[3] Describing the COVID-19 outbreak during the lockdown: fitting modified SIR models to data
[4] COVID-19 prediction using AI analytics for South Korea
[5] A gradient boosting machine learning approach in modeling the impact of temperature and humidity on the transmission rate of COVID-19 in India
[6] Prediction and analysis of COVID-19 positive cases using deep learning models: a descriptive case study of India
[7] Time series forecasting of Covid-19 using deep learning models: India-USA comparative case study
[8] Forecasting COVID-19 pandemic using optimal singular spectrum analysis
[9] The second and third waves in India: when will the pandemic be culminated?
[10] An exploration of fractal-based prognostic model and comparative analysis for second wave of COVID-19 diffusion
[11] Forecasting dengue epidemics using a hybrid methodology
[12] Data driven estimation of novel COVID-19 transmission risks through hybrid softcomputing techniques
[13] Modelling and forecasting of COVID-19 spread using wavelet-coupled random vector functional link networks
[14] A comparative study of series arima/mlp hybrid models for stock price forecasting
[15] A simplified estimate of the effective reproduction number Rt using its relation with the doubling time and application to Italian COVID-19 data
[16] Basic and effective reproduction numbers of COVID-19 cases in South Korea excluding Sincheonji cases
[17] Can India develop herd immunity against COVID-19?