key: cord-311054-dwns5l64
authors: Rafiq, Danish; Suhail, Suhail Ahmad; Bazaz, Mohammad Abid
title: Evaluation and prediction of COVID-19 in India: a case study of worst hit states
date: 2020-06-19
journal: Chaos Solitons Fractals
DOI: 10.1016/j.chaos.2020.110014
sha:
doc_id: 311054
cord_uid: dwns5l64

In this manuscript, system modeling and identification techniques are applied to develop a prognostic yet deterministic model to forecast the spread of COVID-19 in India. The model is verified against historical data, and a 30-day-ahead forecast is presented for the 10 most affected states of India. The major results suggest that the model captures the disease variations with high accuracy. Results also show a steep rise in the total cumulative cases and deaths in the coming weeks.

The advent and spread of the 2019 novel coronavirus (SARS-CoV-2) has posed a global health crisis, with a sharp rise in cases and deaths since its first detection in Wuhan, China, in December 2019. The infection causes illness ranging from the common cold to extreme respiratory disease and death [1]. Currently, the prime epidemiological risk factor for the 2019 novel coronavirus disease is close contact with infected individuals, with an incubation period of 2 to 14 days [2]. The case mortality rate is projected to range from 2 to 3% [3]. Various drugs are being assessed in line with previous research into therapeutic treatments for SARS and MERS; however, there is no robust evidence of any significantly improved clinical outcome [4]. The apparent risk of acquiring the disease has led many governments to institute a variety of control procedures such as quarantine, isolation, and lock-down measures. Despite rigorous global containment measures, the frequency of the novel coronavirus disease continues to rise, with over 4.5 million confirmed cases and over 300,000 deaths worldwide as of 17 May 2020 [5].
Corresponding author: Danish Rafiq (email: danish_pha2007@nitsri.net)

Although countries around the world have strengthened their laboratory systems and response procedures, there is still a need for proper disease surveillance systems. Comprehending the initial transmission of the virus and analyzing the effectiveness of control measures are crucial in assessing the prospects for continued transmission in newer locations. This necessitates tracking the course of the pandemic in order to foresee its emergence and respond better. Prospective studies on modeling and forecasting of the epidemic have been carried out to provide analytical predictions of the size and end phase of the spread. Wu et al. [6] used a susceptible-exposed-infectious-recovered (SEIR) meta-population model to simulate the epidemic across all major cities in China. The early dynamics of transmission and control of COVID-19 within and outside Wuhan have also been studied using a stochastic transmission dynamic model [7]. Another study used the SEIR compartmental model to assess the feasibility of conducting the 2020 Summer Olympics in Japan [8]. Similarly, Abdullah et al. [9] presented a stochastic SIR model to predict the spread of COVID-19 in Kuwait. A classical SEIR-type mathematical model is also presented in [10] to study the qualitative dynamics of COVID-19 in India. Further work has been carried out in [11], with a special focus on the transmissibility of super-spreader individuals in Wuhan, China. Besides the above-mentioned compartmental models, other methods have been used to model and forecast the spread of COVID-19. For example, in [12], a data-driven estimation method, long short-term memory (LSTM), is used to predict the total number of COVID-19 cases in India over a 30-day-ahead prediction window.
In [13], daily forecasts of COVID-19 activity from the global epidemic and mobility model (GLEAM), an agent-based mechanistic model, are used as one of the inputs to produce stable and accurate forecasts two days ahead of the current time. Harun et al. [14] used Box-Jenkins (ARIMA) and Brown/Holt linear exponential smoothing methods to estimate and forecast the number of COVID-19 cases in the G8 countries. Al-qaness et al. [15] incorporated a modified version of the flower pollination algorithm (FPA) coupled with the salp swarm algorithm (SSA) to forecast the number of confirmed cases of COVID-19 in China over ten days.

As of 17 May 2020, India had recorded a total of 90,927 cases with 2,872 deaths [16, 17]. The very first case was reported on 30 January 2020 in the coastal state of Kerala (southern India), when a student returned from Wuhan, China. Subsequently, the number of positive cases in India rose rapidly due to the arrival of many passengers by air [18]. An overview of the spread of COVID-19 in India is shown in figure (1). The virus has spread across the entire country, with the worst hit states being Maharashtra (30,706 cases), Gujarat (10,988), Tamil Nadu (10,588), Delhi (9,333), Rajasthan (4,960), and Madhya Pradesh (4,789). Figures (2) and (3) show the trend of rising new cases and deaths in India.

This manuscript demonstrates a control-theoretic, data-driven estimation technique to derive a time-series model from the historical data collected from [5, 16] up to 17 May 2020. The model is then used to predict the total number of cases and deaths in the most affected states of India for the next 30 days. The paper is organized as follows: Section (2) describes the system identification method employed. Section (3) presents the predicted cases and deaths along with some discussion. Finally, conclusions are presented in section (4).
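The counts quoted above for 17 May 2020 can be put in proportion with a few lines of arithmetic. The sketch below computes each listed state's share of the national total and the crude ratio of deaths to confirmed cases. The variable and function names are illustrative assumptions, and the crude ratio is only a rough indicator because it ignores the lag between confirmation and outcome.

```python
# Figures exactly as quoted in the text for 17 May 2020.
national_cases = 90_927
national_deaths = 2_872
state_cases = {
    "Maharashtra": 30_706,
    "Gujarat": 10_988,
    "Tamil Nadu": 10_588,
    "Delhi": 9_333,
    "Rajasthan": 4_960,
    "Madhya Pradesh": 4_789,
}


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Crude national ratio of deaths to confirmed cases, in percent.
crude_death_ratio = percent(national_deaths, national_cases)

# Share of the national case count carried by each listed state, in percent.
state_share = {name: percent(count, national_cases)
               for name, count in state_cases.items()}

# Combined share of the six listed states.
combined_share = percent(sum(state_cases.values()), national_cases)

for name, share in state_share.items():
    print(f"{name}: {share:.1f}% of national cases")
print(f"Six listed states combined: {combined_share:.1f}%")
print(f"Crude deaths/cases ratio: {crude_death_ratio:.2f}%")
```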
To estimate the spread of COVID-19 in India, we used a prediction error minimization (PEM) based system identification technique to identify a discrete-time, single-input, single-output (SISO) model [19] [20] [21]. Different models were identified for different states based on the data collected. The models were then verified on the testing data and, upon validation, were used to predict the total number of cases and deaths for the next 30 days in the 10 worst hit states in India. The identified discrete-time model can be realized in state-space form, where $y(t)$ represents the total number of cases or deaths in a particular area, which is proportional to the system state vector $x(t) \in \mathbb{R}^{n}$, $u(t)$ is the time-series input, and $T_s$ is the sampling interval. Here, the unknowns to be identified are $A \in \mathbb{R}^{n \times n}$, $K \in \mathbb{R}^{n \times 1}$ and $C \in \mathbb{R}^{1 \times n}$, which are in canonical form, and $n$ is the dimension of the state-space model.

The identification problem can thus be posed as selecting a model set $M(\theta)$ (indexed by a finite-dimensional parameter vector $\theta$) and evaluating the member of the set that best describes the recorded input-output relation according to a given criterion. One such criterion, given in [22], is defined in terms of the prediction error $\varepsilon(t, \theta) = (y_0 - \hat{y}_0, \ldots, y_N - \hat{y}_N)$, where $l(\cdot)$ is a scalar measure of fit, $z(t) = [y^{T}(t), u^{T}(t)]$, and $N$ is the length of the data set. Typical choices of $l(t, \theta, \varepsilon)$ can be seen in [22]. The identified model thus minimizes the one-step-ahead prediction error, and the error $\varepsilon(t, \theta)$ between the measured values $y(t)$ and the predicted values $\hat{y}(t)$ is used to make future predictions about the system. The prediction error identification estimate is thus the parameter vector $\theta$ that minimizes this criterion. Here, we have taken:

Figures (4)-(13) show the dynamics of the forecasted response for the most infected states of India along with a 10-step predicted response comparison with the validation data. Further results are presented in table (1).
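The identification and forecasting steps described above can be illustrated with a minimal sketch. It assumes the standard innovations form of a time-series state-space model, x(t+T_s) = A x(t) + K e(t), y(t) = C x(t) + e(t), reduced to first order (n = 1, C = 1), together with a quadratic measure of fit l(ε) = ε². These choices, the function names, the brute-force grid search standing in for a gradient-based solver, and the synthetic series are all assumptions made for illustration. None of them is taken from the paper.

```python
# Sketch of prediction-error minimization (PEM) for a first-order model.
# Assumed model (n = 1, canonical form with C = 1):
#     x(t+1) = a*x(t) + k*e(t),    y(t) = x(t) + e(t)


def one_step_errors(y, a, k):
    """Run the one-step-ahead predictor; return (errors, next_state)."""
    x = y[0]  # assumption: initialise the state at the first observation
    errors = []
    for obs in y:
        e = obs - x        # prediction error: measured minus predicted
        errors.append(e)
        x = a * x + k * e  # state update driven by the prediction error
    return errors, x


def pem_cost(y, a, k):
    """Quadratic criterion: mean of the squared one-step prediction errors."""
    errors, _ = one_step_errors(y, a, k)
    return sum(e * e for e in errors) / len(errors)


def fit_pem(y, a_grid, k_grid):
    """Return the (a, k) pair on the grid with the smallest criterion value."""
    candidates = ((a, k) for a in a_grid for k in k_grid)
    return min(candidates, key=lambda p: pem_cost(y, p[0], p[1]))


def forecast(y, a, k, steps):
    """Predict `steps` samples past the data with future errors set to zero."""
    _, x = one_step_errors(y, a, k)
    predictions = []
    for _ in range(steps):
        predictions.append(x)
        x = a * x
    return predictions


# Synthetic cumulative series growing 5% per step (not real case counts).
series = [100.0 * 1.05 ** t for t in range(30)]
a_grid = [0.90 + 0.001 * i for i in range(301)]  # candidate values for a
k_grid = [0.05 * j for j in range(41)]           # candidate values for k

a_hat, k_hat = fit_pem(series, a_grid, k_grid)
pred = forecast(series, a_hat, k_hat, steps=30)
print(f"estimated a = {a_hat:.3f}, first forecast = {pred[0]:.2f}")
```

On noise-free data such as this, the gain k is not identifiable because every prediction error vanishes once a matches the growth factor. With real counts, k controls how strongly each new observation corrects the state.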
As seen from table (1), Maharashtra has recorded the highest number of COVID-19 cases, accounting for 36% of the country's total caseload. It has also witnessed the sharpest rise in COVID-19 deaths, with Mumbai being the epicenter of the pandemic in India. The constant influx of tourists, reliance on public transportation, and population density have cumulatively made the metropolitan city hospitable to the coronavirus. Even though the state is conducting more tests, the violation of physical distancing rules by individuals, particularly in containment zones, results in the mixing of the infected with the healthy population. Moreover, unlike other red zones of Maharashtra, Mumbai faces a shortage of ICU beds and dedicated COVID-19 hospitals. According to the prediction made herein, Mumbai and its suburbs will inevitably continue to see an upsurge in the number of cases and deaths until at least 17 June 2020.

Gujarat has recorded the second highest COVID-19 mortality rate in the country in spite of reporting its first case as late as 20 March. The COVID-19 mortality rate of Ahmedabad city is 6.8%, which is double the national average. Officials acknowledge that while Gujarat had its guard up sufficiently fast, there was a delay in testing. Even by mid-March, the daily average was as low as 15 tests per day, going up to 200 per day by the end of March. According to the data-driven identification scheme employed herein, the mortality rate in Gujarat may rise to as high as 15.2% by 17 June 2020.

Tamil Nadu, although the third worst hit Indian state in terms of COVID-19 cases, has witnessed the lowest mortality, with 1 among 143 positive cases succumbing to the disease (see figure (6)). This is attributed to its credibility as a trusted medical center of the country. Chennai has the highest medical tourism in India, with the state's average being above the national average in the health sector.
This may be the reason that the predicted mortality rate of Tamil Nadu projected in this study is the lowest among the states under consideration (see table (1)).

As per our prediction based on data up to 17 May 2020, Delhi, along with other states, will continue to see a marginal surge in the number of COVID-19 cases owing to the relaxation of lock-down measures. The impact of removing the curbs will be more evident by mid-June 2020. The under-funding of the healthcare system, the paucity of testing labs, violations of the lock-down protocols, and inadequate quarantine facilities arranged by states and union territories are the biggest hurdles in combating the spread.

The study concerns the spread of COVID-19 in India. A control-theoretic approach is used to develop an epidemic model to simulate and predict the disease variations in the 10 most affected states of India. Results depict a rapid increase in the number of cases in the coming days. However, it is pertinent to mention that the future estimate provided is subject to certain system parameters and can vary based on external inputs such as lock-down measures, social distancing, vaccine/drug development, and rapid testing. Information provided by our model could help establish a realistic assessment of the situation, both at present and in the near future, in order to apply the appropriate public health measures.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Carlos, Mild or Moderate Covid-19
Three months of COVID-19: A systematic review and meta-analysis
Coronavirus: covid-19 has killed more people than SARS and MERS combined, despite lower case fatality rate
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China, The Lancet
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Early dynamics of transmission and control of COVID-19: a mathematical modelling study
Prediction of the epidemic peak of coronavirus disease in Japan
Forecasting the Spread of COVID-19 in Kuwait Using Compartmental and Logistic Regression Models
A model based study on the dynamics of COVID-19: Prediction and control
Mathematical Modeling of COVID-19 Transmission Dynamics with a Case Study of Wuhan
Prediction for the spread of COVID-19 in India and effectiveness of preventive measures
A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models
Modeling and Forecasting for the number of cases of the COVID-19 pandemic with the Curve Estimation Models, the Box-Jenkins and Exponential Smoothing Methods
Optimization method for forecasting confirmed cases of COVID-19 in China
On The Consistency of Prediction Error Identification Methods
Dynamical effects of overparametrization in nonlinear models
Improved structure selection for nonlinear models based on term clustering
System Identification - Theory For the User, Appendix 4A
Method for the Solution of Certain Problems in Least-Squares
Algorithm for Least-squares Estimation of Nonlinear Parameters
The Levenberg-Marquardt Algorithm: Implementation and Theory, Numerical Analysis
On the decay rate of Hankel singular values and related issues

Ministry of Human Resource Development (MHRD), New Delhi, India, is duly acknowledged.
2. Author 1 would like to thank Asiya Batool for fruitful discussions.

The authors declare no potential conflicts of interest regarding the publication of this paper.