key: cord-0868915-qtoa38qi authors: Singh, Sarbjit; Parmar, Kulwinder Singh; Kumar, Jatinder; Makkhan, Sidhu Jitendra Singh title: Development of New Hybrid Model of Discrete Wavelet Decomposition and Autoregressive Integrated Moving Average (ARIMA) Models in Application to One Month Forecast the Casualties Cases of COVID-19 date: 2020-05-11 journal: Chaos Solitons Fractals DOI: 10.1016/j.chaos.2020.109866 sha: c2867bdcd5e3c8f8d0d8e154e0ad0969b48c1937 doc_id: 868915 cord_uid: qtoa38qi

Everywhere around the globe, the hot topic of discussion today is the ongoing and fast-spreading coronavirus disease (COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). First detected in Wuhan, Hubei province, China, in December 2019, the deadly virus engulfed China and some neighboring countries and claimed thousands of lives in February 2020. The proposed hybrid methodology applies discrete wavelet decomposition to the dataset of deaths due to COVID-19, splitting the input data into component series, and then applies an appropriate econometric model to each component series to predict future death cases. ARIMA models are well-known econometric forecasting models capable of generating accurate forecasts when applied to wavelet-decomposed time series. The input dataset consists of daily death cases from the five countries most affected by COVID-19; it is given to the hybrid model for validation and to make a one-month-ahead prediction of death cases. These predictions are compared with those obtained from an ARIMA model to estimate prediction performance. The predictions indicate a sharp rise in death cases despite the various precautionary measures taken by the governments of these countries.

In December 2019, Wuhan, China, witnessed the start of an epidemic which, in a period of just two months, overpowered the entire world and took the form of a pandemic named COVID-19 , Shen et al. 
2020, Zhu et al. 2020. The novel coronavirus disease pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has engulfed the entire world within a short period of time (Lai et al. 2020; Yang and Wang, 2020; WHO 2020). Being highly contagious, it poses a massive threat to people's health: as of 10:00 CET on 30 March 2020, a total of 693282 confirmed cases and 33106 deaths had been reported globally by the World Health Organization (WHO) (Lai et al. 2020; Wang et al. 2020; WHO 2020; Yang and Wang, 2020). The outbreak has created an emergency that raises many important questions about its transmission dynamics, mitigation, and control measures. Researchers are turning to mathematical modeling to answer such urgent questions (Chen et al., 2020). For instance, containment strategies such as social distancing, quarantine and contact tracing of infected or suspected people, complete lockdown of affected areas or countries, and screening of international travelers are the results of model predictions (Hellewell et al. 2020; Prem et al. 2020; Mandal et al. 2019; Choi and Ki, 2020; Yuan et al. 2020). Early modeling results by Kucharski et al., based on a stochastic transmission model, described the variation of COVID-19 transmission over time and the probability of an outbreak in areas outside Wuhan, and showed a decline in the reproduction number from 2.35 to 1.05 after the introduction of travel restrictions. In another study, Chen et al. (2020) developed a Bats-Hosts-Reservoir-People transmission network model to simulate the probable transmission from bats to human beings. They also simplified this model and found that transmission occurred mainly from person to person, with the reproduction number estimated as 3.58 (Prem et al. 2020). Li et al. 
also provided evidence of the person-to-person transmission route in Wuhan, China, and calculated that the number of infections doubled every 7.4 days. Similarly, several other models have been used to assess the outbreak characteristics (Binti Hamzah et al. 2020; Lin et al. 2020). Moreover, once a vaccine is made available, its effective distribution could be guided by mathematical modeling, as suggested in the literature for such infections (Abdirizak et al. 2019). However, the severity of the pandemic and the rapidly changing size of the infected population demand constant data analysis.

Time series analysis and forecasting deal with understanding the past relationships among variables by using various modeling techniques, with the ultimate goal of obtaining accurate predictions of future values. The Box-Jenkins ARIMA (Autoregressive Integrated Moving Average) model is a widely used statistical model in time series analysis; it covers a wide variety of patterns, ranging from stationary to non-stationary and seasonal (periodic) time series (Melard and Pasteels, 2000; Valenzuela et al., 2008). However, the Box-Jenkins methodology is inappropriate for non-linear situations in which the data is not a linear function of time (Box & Jenkins, 1976; Kantz & Schreiber, 1997; Melard & Pasteels, 2000; Babu et al. 2014). For accurate forecasting of non-linear data, wavelet analysis is a powerful tool capable of diagnosing high-frequency components in time series data (Mallat 1989; Daubechies 1992; Meyer and Coifman 1997; Percival and Walden 2000; Freire et al. 2019). The discrete wavelet transform decomposes a time series at different scales, and each component series can then be treated separately for forecasting purposes (Torrence and Compo 1998; Ramsey 2002; Parmar et al. 2014, 2015; Soni et al. 2014, 2015, 2016; Kumar et al. 2015). 
The use of wavelets for forecasting offers a degree of refinement and flexibility that traditional methods cannot afford (Diebold 1998; Yousefi et al. 2005; Huang et al. 2011; Yeap et al. 2017; Jeddi 2020). The present study develops a hybrid model for predicting death cases due to COVID-19 by capturing the dynamic nature of the transmission of the virus. Hybrid modeling can prove to be a vital tool in such a situation, as it allows the transmission potential and growth of the virus to be studied in the long run (Ma et al. 2004; Zhao et al. 2020b). For this purpose, we consider the dataset of daily deaths due to COVID-19 in the five most affected countries of the world, namely Italy, Spain, France, the United Kingdom (UK), and the United States of America (USA) (Data Source: World Health Organization).

Wavelets are localized functions with zero mean and compact support that are capable of analyzing non-periodic and transient signals (Saadaoui et al. 2014; Davidson et al. 1998). A function $\psi(t)$ is a wavelet if it satisfies the admissibility condition (1),

$$C_{\psi} = \int_{-\infty}^{\infty} \frac{|\hat{\psi}(\omega)|^{2}}{|\omega|}\, d\omega < \infty, \tag{1}$$

where $\hat{\psi}(\omega)$ denotes the Fourier transform of $\psi(t)$. A family of functions is generated by translation and dilation of a single function known as the "mother wavelet". The mother wavelet generates a family of functions of the form

$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t-b}{a}\right), \quad a \neq 0, \tag{2}$$

where $a$ is a scaling parameter, which determines the expansion or compactness of the signal, and $b$ is a translation or shifting parameter, which determines the location of the wavelet. For discrete wavelet decomposition of a time series $\{x_t\}$, the mother wavelet function $\psi_{j,k}$ and the father wavelet function $\phi_{j,k}$ are defined respectively by equations (3) and (4):

$$\psi_{j,k}(t) = 2^{-j/2}\, \psi\!\left(2^{-j}t - k\right), \tag{3}$$

$$\phi_{j,k}(t) = 2^{-j/2}\, \phi\!\left(2^{-j}t - k\right). \tag{4}$$

The approximation coefficients $a_{J,k}$ are obtained by convolving the signal $x(t)$ with the scaling function $\phi_{J,k}$, and convolving $x(t)$ with the wavelet function $\psi_{j,k}$ gives the detail coefficients $d_{j,k}$; these are given by equations (5) and (6):

$$a_{J,k} = \int x(t)\, \phi_{J,k}(t)\, dt, \tag{5}$$

$$d_{j,k} = \int x(t)\, \psi_{j,k}(t)\, dt, \quad j = 1, 2, \dots, J. \tag{6}$$

Using integrals (5) and (6), the decomposition of a continuous-time series is given by (7):

$$x(t) = \sum_{k} a_{J,k}\, \phi_{J,k}(t) + \sum_{j=1}^{J} \sum_{k} d_{j,k}\, \psi_{j,k}(t). \tag{7}$$
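As a concrete illustration of the approximation and detail coefficients described above, the following minimal sketch implements a multi-level discrete wavelet transform in plain Python. It rests on several assumptions that are not taken from the study: it uses the Haar wavelet (the simplest member of the Daubechies family) instead of a higher-order Daubechies wavelet, it requires the series length to be divisible by 2 to the power of the number of levels, and the helper names `haar_step`, `haar_inverse_step`, `haar_decompose`, and `haar_reconstruct` are hypothetical, not part of any library.

```python
import math

S = 1.0 / math.sqrt(2.0)  # weight of the orthonormal Haar filters


def haar_step(signal):
    """One analysis level: return (approximation, detail) coefficients."""
    approx = [S * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [S * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """One synthesis level: exactly undoes haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.append(S * (a + d))
        out.append(S * (a - d))
    return out


def haar_decompose(signal, levels):
    """Return [a_J, d_J, ..., d_1] for J = levels."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels")
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.insert(0, detail)  # coarser details go to the front
    coeffs.insert(0, approx)
    return coeffs


def haar_reconstruct(coeffs):
    """Rebuild the series from [a_J, d_J, ..., d_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx


# Made-up example series (not data from the study).
series = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0, 21.0, 28.0]
coeffs = haar_decompose(series, levels=3)
print("a3:", coeffs[0])
print("d3, d2, d1:", coeffs[1:])
print("reconstructed:", haar_reconstruct(coeffs))
```

Because the Haar filters are orthonormal, the transform preserves the sum of squares of the series, and the inverse transform recovers the input up to floating-point rounding.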
Since the time series under study is discrete and of finite length $N$, the discretized decomposition is given by (8):

$$x_t = A_J(t) + \sum_{j=1}^{J} D_j(t), \quad t = 1, 2, \dots, N. \tag{8}$$

The decomposition of $x_t$ into approximation and detail components is also illustrated in figure 1 (Soni et al. 2017).

[...] plots. These plots are also helpful in identifying the values of the parameters $p$ and $q$ (Chatfield 1996; Brockwell and Davis 2002). The parameters of the selected model are estimated by maximum likelihood, a commonly used estimation method. Finally, the overall adequacy of the model is checked with the Ljung-Box test, to confirm that no further modeling of the time series is required (McNeil et al. 2006; Peng et al. 2014). An ARIMA model using the lag operator $L$ is expressed as

$$\left(1 - \sum_{i=1}^{p} \varphi_i L^{i}\right)(1 - L)^{d}\, x_t = \left(1 + \sum_{j=1}^{q} \theta_j L^{j}\right)\varepsilon_t, \tag{9}$$

where the non-negative integers $p$ and $q$ are the orders of the autoregressive and moving average polynomials, respectively; $d$ is the order of non-seasonal differencing required to make the data stationary; $x_t$ is the observed value and $\varepsilon_t$ is a random error at time $t$; and $\varphi_i$ and $\theta_j$ are the coefficients.

The ARIMA model and the wavelet decomposition method deal differently with the linear and non-linear features of data, so the coupled model proposed in this study forecasts with ARIMA models on time series data refined by wavelet decomposition. The coupled model can thus improve forecasting performance by modeling both the linear and non-linear components of the data (Salazar et al. 2019). In the wavelet decomposition method, the time series is first decomposed into approximation ($A$) and detail ($D$) coefficients (Section 2.1), which can be used as separate series for prediction; each of these series is then modeled and forecasted with an appropriate ARIMA model. 
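The identification and adequacy checks mentioned in the ARIMA discussion above (inspecting the autocorrelation function, differencing, and the Ljung-Box test) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names `difference`, `acf`, and `ljung_box_q` are hypothetical, the example series is made up, and only the Ljung-Box statistic Q is computed. In practice, Q for the residuals of a fitted ARIMA(p, d, q) model is compared with a chi-squared distribution with h - p - q degrees of freedom, where h is the number of lags tested; that comparison is left out here.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def acf(series, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k rho_k^2 / (n - k)."""
    n = len(residuals)
    rho = acf(residuals, max_lag)
    return n * (n + 2) * sum(rho[k] ** 2 / (n - k) for k in range(1, max_lag + 1))


# Made-up cumulative-style series (not data from the study).
series = [float(t * t) for t in range(30)]
print("ACF of the raw series:", [round(r, 3) for r in acf(series, 5)])
print("ACF after d=2:", [round(r, 3) for r in acf(difference(series, 1), 5)])
print("Ljung-Box Q (5 lags):", round(ljung_box_q(difference(series, 1), 5), 3))
```

A slowly decaying autocorrelation in the first printed line is the usual sign that differencing is needed before an ARMA model is fitted.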
The predicted approximation ($\hat{A}$) and detail ($\hat{D}$) coefficients so obtained are summed to obtain the forecasted data ($\hat{x}$), expressed as

$$\hat{x}_t = \hat{A}_J(t) + \sum_{j=1}^{J} \hat{D}_j(t), \tag{10}$$

where $J$ is the level of decomposition.

In this paper, the dataset of COVID-19 death cases in five countries, namely Italy, Spain, France, the United Kingdom, and the United States of America, is used as input to the hybrid model, and the prediction results so obtained are compared with those of the ARIMA model. The dataset consists of 82 daily observations from 21 January 2020 to 11 April 2020, of which 66 data points are used for modeling and the remaining 16 are kept for testing, to validate the model.

The first step in many time series methods is to check the stationarity of the data. Quick changes in time series data indicate non-stationarity, which can be checked with autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. A slowly decaying ACF plot indicates that the time series is non-stationary; the non-stationarity is removed by a differencing transformation to obtain stationary data (Chatfield 1996). After checking stationarity, the next step is to determine the order of the ARIMA model, which can be determined from the ACF plot of the differenced time series. Then, an appropriate ARIMA model is fitted to the data and used to generate future values of the time series.

Wavelet decomposition is an excellent method for extracting different frequency components from a signal and exploring its important features. When applying wavelet decomposition to a time series, the choice of mother wavelet, its order, and the level of decomposition is very important. There are several families of wavelets available for decomposition, and the Daubechies family is one of the important ones, with its own advantages. An accurate forecasting system is built on an appropriate order and level of decomposition of the input signal. 
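The overall workflow of the hybrid approach (decompose the series into additive components, forecast each component separately, then add the component forecasts) can be sketched in plain Python as below. This is a heavily simplified sketch under stated assumptions, not the model used in the study: the Haar wavelet stands in for the Daubechies wavelet, a least-squares AR(1) model with intercept stands in for a properly identified ARIMA model, the series length must be a multiple of 2 to the power of the number of levels, the training series is synthetic, and all function names (`additive_components`, `ar1_forecast`, `hybrid_forecast`) are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # weight of the orthonormal Haar filters


def haar_step(x):
    """One Haar analysis step: scaled pairwise sums and differences."""
    a = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    d = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return a, d


def haar_inverse_step(a, d):
    """Invert haar_step."""
    out = []
    for ai, di in zip(a, d):
        out += [S * (ai + di), S * (ai - di)]
    return out


def additive_components(x, levels):
    """Split x into [A_J, D_J, ..., D_1], each as long as x, summing to x."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels")
    coeffs, a = [], list(x)
    for _ in range(levels):
        a, d = haar_step(a)
        coeffs.insert(0, d)
    coeffs.insert(0, a)  # coeffs = [a_J, d_J, ..., d_1]
    parts = []
    for keep in range(len(coeffs)):
        # Zero every coefficient set except one, then invert the transform.
        rec = coeffs[0] if keep == 0 else [0.0] * len(coeffs[0])
        for idx in range(1, len(coeffs)):
            d = coeffs[idx] if idx == keep else [0.0] * len(coeffs[idx])
            rec = haar_inverse_step(rec, d)
        parts.append(rec)
    return parts


def ar1_forecast(series, steps):
    """Stand-in forecaster: fit y_t = c + phi * y_{t-1} by least squares."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx > 0 else 0.0
    c = my - phi * mx
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last  # iterate the fitted recursion forward
        out.append(last)
    return out


def hybrid_forecast(series, steps, levels=3):
    """Forecast each additive component, then sum the component forecasts."""
    parts = additive_components(series, levels)
    forecasts = [ar1_forecast(p, steps) for p in parts]
    return [sum(f[h] for f in forecasts) for h in range(steps)]


# Synthetic, made-up series of 64 points (NOT the WHO data).
train = [0.05 * t * t + 3.0 * math.sin(t / 3.0) for t in range(64)]
print([round(v, 2) for v in hybrid_forecast(train, steps=16)])
```

The components are built by inverting the transform with all but one coefficient set zeroed, so they add back to the original series by linearity; that is what makes summing the component forecasts meaningful.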
The COVID-19 death cases data is decomposed using the Daubechies wavelet of order 8 at level 3; the resulting components are shown in figure 2. The approximation part is the low-frequency part, showing the trend, and the detail parts represent the high-frequency parts. The approximation A3 and the details D1, D2, D3 are separately modeled with an appropriate ARIMA model to obtain the predicted components. The predicted outputs $\hat{A}_3$, $\hat{D}_1$, $\hat{D}_2$, $\hat{D}_3$ are finally summed to obtain the forecasts of the death cases data, as given in equation (11),

$$\hat{x}_t = \hat{A}_3(t) + \hat{D}_1(t) + \hat{D}_2(t) + \hat{D}_3(t), \tag{11}$$

where the hat symbol ($\hat{\ }$) denotes predicted values.

In this section, wavelet decomposition, together with an ARIMA model, is applied to the COVID-19 death cases dataset to obtain accurate prediction results. Autoregressive Integrated Moving Average (ARIMA) is an appropriate econometric model used to generate future values independently as well as jointly with wavelet decomposition (Guerrero et al. 1991; Bianchi et al. 1998; Akrami et al. 2014). In the case of a hybrid model, the data is [...] The hybrid model of discrete [...]

In this paper, a hybrid Wavelet-ARIMA model is developed, and the accuracy of the proposed model is investigated using the past 66 days of data on COVID-19 death cases. A 16-day-ahead prediction of death cases was made within sample, and the model was then used to predict one-month-ahead out-of-sample death cases in the five most affected countries of the world.

[Forecast table or figure; only its labels survive: dates 12-04-2020 to 11-05-2020; Italy, Spain, France, USA, UK]

Evaluating the potential impact of targeted vaccination strategies against severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) outbreaks in the healthcare setting Rainfall data analyzing using moving average (MA) 
model and wavelet multi-resolution intelligent model for noise evaluation to improve the forecasting accuracy A moving-average filter-based hybrid ARIMA-ANN model for forecasting time series data Mitigating the COVID Economic Crisis: Act Fast and Do Whatever It Takes Improving forecasting for centers by ARIMA modeling with intervention CoronaTracker: Worldwide COVID-19 Outbreak Data Analysis and Prediction Time series analysis, forecasting and control Introduction to Time Series and Forecasting The Analysis of Time Series: An Introduction A mathematical model for simulating the phase-based transmissibility of a novel coronavirus Estimating the reproductive number and the outbreak size of Novel Coronavirus disease (COVID-19) using mathematical model in Republic of Korea Venezuelan migrants "struggling to survive" amid COVID-19 Wavelet analysis of commodity price behaviour Orthonormal bases of compactly supported wavelets Ten lectures on wavelets Analysis of the use of discrete wavelet transforms coupled with ANN for short-term streamflow forecasting ARIMA forecasts with restrictions derived from a structural change Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China Forecasting stock indices with wavelet domain kernel partial least square regressions A hybrid wavelet decomposer and GMDH-ELM ensemble model for Network function virtualization workload forecasting in cloud computing Nonlinear time series analysis Early dynamics of transmission and control of COVID-19: a mathematical modelling study Forecasting the Time Series Data Using ARIMA with Wavelet Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China 
with individual reaction and governmental action Improving real time flood forecasting using Fuzzy Inference system Spatiotemporal fluctuation scaling law and metapopulation modeling of the novel coronavirus (COVID-19) and SARS outbreaks Mathematical modeling and research of infectious disease dynamics Prudent public health intervention strategies to control the coronavirus disease 2019 transmission in India: A mathematical model-based approach Automatic ARIMA modeling including interventions, using time series expert software A Theory for Multiresolution Signal Decomposition: The Wavelet Representation Quantitative risk management: concepts, techniques, and tools Wavelets Breaking down of healthcare system Mathematical modeling for controlling the novel coronavirus (COVID-19) outbreak in Wuhan Water quality management using statistical and time series prediction model Wavelet and statistical analysis of river water quality parameters Statistical, Time Series and Fractal Analysis of Full Stretch of River Yamuna (India) for Water Quality Management A novel hybridization of echo state networks and multiplicative seasonal ARIMA model for mobile communication traffic series forecasting Wavelet methods for time series analysis The effect of control trategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study Wavelets in Economics and Finance: Past and Future Successful containment of COVID-19: the WHO-Report on the COVID-19 outbreak in China Modeling the epidemic trend of the 2019 novel coronavirus outbreak in China A wavelet-based multiscale vector-ANN model to predict co-movement of econophysical systems Predicting hourly ozone concentrations using wavelets and ARIMA models Statistical Variability Comparison in MODIS and AERONET Derived Aerosol Optical Depth Over Indo-Gangetic Plains Using Time Series Time series model prediction and trend variability of aerosol optical depth over coal mines in India Statistical Analysis of 
Aerosols over the Gangetic-Himalayan region using ARIMA model based on longterm MODIS observations Modeling of Air Pollution in Residential and Industrial Sites by Integrating Statistical and Daubechies Wavelet (Level 5) Analysis A practical guide to wavelet analysis Hybridization of intelligent techniques and ARIMA models for time series prediction A review of the 2019 Novel Coronavirus (COVID-19) based on current evidence Coronavirus disease 2019 (COVID-19) Situation Report -70 Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study Real time tentative assessment of the epidemiological characteristics of novel coronavirus infections in Wuhan, China, as at 22 COVID-19: a new challenge for human beings Analysis and validation of wavelet transform based DC fault detection in HVDC system A simple model to assess Wuhan lock-down effect and region efforts during COVID-19 epidemic in China Mainland Wavelet-based prediction of oil prices Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the early phase of the outbreak Estimating the unreported number of novel coronavirus (2019-nCoV) cases in China in the first half of January 2020: a data-driven Modelling analysis of the early outbreak A pneumonia outbreak associated with a new coronavirus of probable bat origin A novel coronavirus from patients with pneumonia in China

There is no conflict of interest between the authors.

Availability of Data and Materials: All data are publicly available with WHO.