key: cord-311183-5blzw9oy
authors: Malavika, B.; Marimuthu, S.; Joy, Melvin; Nadaraj, Ambily; Asirvatham, Edwin Sam; Jeyaseelan, L.
title: Forecasting COVID-19 epidemic in India and high incidence states using SIR and logistic growth models
date: 2020-06-27
journal: Clin Epidemiol Glob Health
DOI: 10.1016/j.cegh.2020.06.006
sha:
doc_id: 311183
cord_uid: 5blzw9oy

BACKGROUND: Ever since the Coronavirus disease (COVID-19) outbreak emerged in China, there have been several attempts to predict the epidemic across the world, with varying degrees of accuracy and reliability. This paper aims to carry out a short-term projection of new cases; forecast the maximum number of active cases for India and selected high-incidence states; and evaluate the impact of the three-week lockdown period, using different models.

METHODS: We used the logistic growth curve model for short-term prediction; SIR models to forecast the cumulative number of cases, the maximum number of active cases and the peak time; and an interrupted time series regression model to evaluate the impact of lockdown and other interventions.

RESULTS: The predicted cumulative number of cases for India was 58,912 (95% CI: 57,960, 59,853) by May 08, 2020, and the observed number of cases was 59,695. The model predicts a cumulative number of 1,02,974 (95% CI: 1,01,987, 1,03,904) cases by May 22, 2020. As per the SIR model, the maximum number of active cases is projected to be 57,449 on May 18, 2020. The interrupted time series regression model indicates a decrease of 149 daily new cases after the lockdown period, which is not statistically significant.

CONCLUSION: The logistic growth curve model accurately predicts the short-term scenario for India and the high-incidence states. The predictions from the SIR model may be used for planning and for preparing the health systems. The study also suggests that there is no evidence that the lockdown had a positive impact in terms of a reduction in new cases.
Since the beginning of the COVID-19 epidemic, there have been several mathematical and statistical modelling efforts that have predicted the global and national epidemic with varying degrees of accuracy and reliability [7, 8]. The accuracy of a prediction and its uncertainty depend on the assumptions and on the availability and quality of data [9]. The results can vary significantly if the assumptions or the values of the input parameters differ. During a pandemic like COVID-19, the availability and quality of data keep improving as the epidemic progresses, which makes predictions uncertain in the early stages and likely to improve in the later stages. Moreover, an epidemic may not always behave in the same manner, as pathogens are likely to behave differently over time [10].

For COVID-19, different models are used to estimate the key features of the disease such as the incubation period, transmissibility, asymptomaticity, severity, and the likely impact of different public health interventions. Among these, Susceptible-Exposed-Infectious-Recovered (SEIR) and Susceptible-Infectious-Recovered (SIR) models, agent-based models, and curve-fitting or extrapolation models such as logistic growth models are commonly adopted, each relying on different biological and social processes [7, 11-17]. Because the epidemic grows exponentially in its early phase, logistic growth models are a preferred option in this scenario. Choudhary (2020) predicted the number of cases up to April 7, 2020 at a very early stage, using time series models [18]. However, the prediction was found to be a gross underestimation.
In spite of these limitations, given the unprecedented nature of the pandemic, the uncertainties about the disease and the need for urgent but appropriate social, economic and public health responses, accurate forecasting of the size, severity and duration of the epidemic is critical to inform policies, programmes and strategies. This paper aims to carry out a short-term projection of new cases using the logistic growth curve model; forecast the maximum number of active cases for India and selected high-burden states using the SIR model with a correction factor based on China, Italy and South Korea; and evaluate the impact of lockdown and other interventions on the incidence of daily cases.

Logistic growth is characterized by increasing growth in the beginning but decreasing growth at a later stage, as the curve approaches its maximum. In COVID-19, the maximum limit will be the total population, and the growth will necessarily come down when a greater proportion of the population is sick. The reason for using logistic growth to model the Coronavirus outbreak is the evidence that the epidemic follows exponential growth in the early stages and is expected to slow down during the later stages. The modified logistic growth model [19, 20] is presented as follows:

$$y(t) = \frac{C}{1 + a e^{-bt}}$$

where
y(t) is the number of cases at any given time t,
C is the limiting value, the maximum capacity for y,
a = (C / y_0) - 1, where y_0 is the number of cases at t = 0,
b is the rate of change.

• the number of cases at the beginning, also called the initial value, is C / (1 + a)
• the maximum growth rate is at t = ln(a) / b

In differential form the model reads $dy/dt = b\,y\,(1 - y/C)$. When y is equal to C (that is, the population is at maximum size), y/C will be 1. Therefore, (1 - (y/C)) will be 0 and hence the growth will be 0.

In the SIR model, the population is divided into susceptible (S), infected (I) and recovered (R) compartments. β is the transmission parameter, which is the average number of individuals that one infected individual will infect per time unit. It is determined by the chance of contact and the probability of disease transmission. γ is the rate of recovery in a specific period.
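The logistic growth model described above can be illustrated with a minimal sketch. It assumes the limiting value C is held fixed and estimates a and b through the linearisation ln(C/y - 1) = ln(a) - b t. This is a simplification chosen for illustration and not necessarily the estimation procedure used in the study. The function names and the synthetic numbers are hypothetical and are not the study data.

```python
import math

def logistic(t, C, a, b):
    """Cumulative cases y(t) = C / (1 + a * exp(-b * t))."""
    return C / (1.0 + a * math.exp(-b * t))

def fit_a_b_given_C(ts, ys, C):
    """Least-squares estimates of a and b for a fixed limiting value C.

    Uses ln(C / y - 1) = ln(a) - b * t, so a straight-line fit gives
    intercept ln(a) and slope -b. Requires 0 < y < C for every point.
    """
    zs = [math.log(C / y - 1.0) for y in ys]
    n = len(ts)
    t_mean = sum(ts) / n
    z_mean = sum(zs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
    slope = sxz / sxx
    intercept = z_mean - slope * t_mean
    return math.exp(intercept), -slope

# Synthetic illustration only (hypothetical values, not the study data).
C_true, y0, b_true = 100000.0, 50.0, 0.15
a_true = C_true / y0 - 1.0                  # a = (C / y_0) - 1
days = list(range(60))
cases = [logistic(t, C_true, a_true, b_true) for t in days]

a_hat, b_hat = fit_a_b_given_C(days, cases, C_true)
t_peak = math.log(a_hat) / b_hat            # time of maximum growth rate
print(a_hat, b_hat, t_peak)
```

At t = ln(a)/b the curve reaches C/2, which is where daily growth is largest. In practice C is unknown and would have to be estimated together with a and b, for example by nonlinear least squares.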
D, the average time period during which an infected individual remains infectious, is derived from γ: $D = 1/\gamma$. The ratio $R_0 = \beta/\gamma$ is the basic reproduction number. $R_0$ is the average number of people infected by an infected individual over the disease infectivity period, in a totally susceptible population. In order to fit the SIR model, the parameters were obtained by minimizing the residual sum of squares between the observed cumulative active cases and the predicted cumulative infected cases. We fixed $R_0$ and D at 2.5 and 7 days respectively [24, 25]. Therefore, γ is 0.14 and β is 0.36. The data for India were taken from a crowd-sourced data source.

Invariably, the SIR model overestimates the number of active cases. In order to quantify the overestimation, the actual number of reported cases from China was obtained up to April 5, 2020 and used to estimate the maximum number of active cases in China. Subsequently, the ratio of the maximum (peak) active cases projected by the model to the observed peak active cases was computed. A similar estimation was done for Italy and South Korea. In order to choose the correction factor most appropriate for India, we compared the age and gender distribution of the population of these three countries with that of India. The China correction factor was applied to states such as Maharashtra, Rajasthan and Tamil Nadu. As the population of Delhi is small, about four to five times smaller than that of the other states, the SIR model was not fitted for Delhi. The data used in the modelling are presented in the appendix.

Interrupted time series regression analysis [26] was done to assess the impact of the 3-week lockdown on the incidence of new cases. A dummy variable was introduced at April 15, 2020. The hypothesis was that there would be a decline in the incidence of new cases after the lockdown period, that is, after April 14, 2020. In other words, the regression coefficient would be significant and negative in direction.
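The SIR set-up described above can be sketched as follows, assuming the standard SIR equations dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI. The values R0 = 2.5 and D = 7 days come from the text. The population size, the initial number of infected, the value of the correction factor and the way it is applied (dividing the model peak by the ratio) are hypothetical assumptions made for illustration only.

```python
import math

def sir_rhs(s, i, beta, gamma, n):
    """Right-hand side of the standard SIR equations: (dS/dt, dI/dt, dR/dt)."""
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return -new_infections, new_infections - recoveries, recoveries

def simulate_sir(n, i0, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with classical RK4; returns (t, S, I, R) rows."""
    s, i, r = float(n - i0), float(i0), 0.0
    path = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        k1 = sir_rhs(s, i, beta, gamma, n)
        k2 = sir_rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1], beta, gamma, n)
        k3 = sir_rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1], beta, gamma, n)
        k4 = sir_rhs(s + dt * k3[0], i + dt * k3[1], beta, gamma, n)
        s += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        i += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        r += dt * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]) / 6.0
        path.append((step * dt, s, i, r))
    return path

# Values fixed in the text: R0 = 2.5 and D = 7 days.
R0, D = 2.5, 7.0
gamma = 1.0 / D          # recovery rate, about 0.14
beta = R0 * gamma        # transmission parameter, about 0.36

# Hypothetical population and seed, for illustration only.
N, I0 = 1_000_000, 1
path = simulate_sir(N, I0, beta, gamma, days=200)
peak_day, _, peak_active, _ = max(path, key=lambda row: row[2])

# Hypothetical correction factor: ratio of model peak to observed peak in a
# reference country. Dividing by it is an assumption about how it is applied.
correction_factor = 4.0
corrected_peak = peak_active / correction_factor
print(round(gamma, 2), round(beta, 2), peak_day, peak_active, corrected_peak)
```

The uncorrected peak can be checked against the closed-form SIR result I_max = I_0 + S_0 - N/R_0 - (N/R_0) ln(S_0 R_0 / N), which is a useful sanity check on the numerical integration.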
As there were only 3 cases reported from Jan 30 to March 01, 2020, we excluded these time points from the analysis. Table 1 is presented in Figure 1a & 1b. The Rajasthan and Tamil Nadu, it will be 5,089, 3,324 and 3,221 respectively. The corresponding peak times were expected to be June 10, 2020, June 6, 2020 and June 21, 2020 respectively. The diagrammatic representation of the trend is presented in Figure 2. The results of the interrupted time series regression analyses are presented in Table 4. The model indicates a decrease of 149 daily new cases after April 14, 2020, that is, after the 3 weeks of lockdown, which is not statistically significant.

There have been several studies forecasting the incident cases of COVID-19 in various countries. However, there are few peer-reviewed articles about India. Forecasting COVID-19 through appropriate models can help us understand the possible spread across the population, so that appropriate measures can be taken to prevent further transmission and to prepare the health systems for medical management of the disease. It is also essential to evaluate the effectiveness of interventions so that appropriate and timely programmatic changes can be made to mitigate the epidemic. We forecast the cumulative number of cases for India and four high-incidence states using the logistic growth model, whose projections were very close to the observed cases. This model is based on the current trends of the cumulative cases in India and the specific states. We used the logistic growth model, rather than a pure exponential model, because the growth of the epidemic is exponential at first and eventually stabilises. 7, 11- end of May, 2020. However, the total number of cases had already crossed 20,000 by April 22, 2020, which shows that this was a gross underestimation [27]. The SIR model with the correction factor predicted a maximum of 57,450 active cases by May 18, 2020.
However, the peak time gets pushed to June in the other states. When we fitted the SIR model using the reported cases from China, South Korea and Italy, we found that the model predicted more active cases than were observed up to the time point for which the data were analysed. In order to address the overestimation, we formulated a correction factor, which is essential to predict the epidemic accurately. Besides, as suggested by Ranjan (2020), the SIR model depends heavily on the size of the susceptible population. Therefore, it may overestimate the maximum number of cases when the epidemic is not generalized in the population. The projection could nevertheless be considered a warning signal for preparing the health systems in terms of planning treatment facilities and other interventions.

In the COVID-19 epidemic, assessing the effectiveness of lockdown is one of the key areas of interest. India had a head start in imposing the lockdown relatively early, in addition to strong public health measures to mitigate the spread of the epidemic. This raises the question of whether the lockdown has really affected the number of incident cases. Several studies have assessed the effectiveness of interventions, with varying results [28, 29]. Our interrupted time series analyses suggested no significant decline in the number of daily cases immediately after the lockdown. On the contrary, there was an increase in the number of daily cases immediately after the 3-week lockdown period. This indicates that the lockdown and other interventions did not have any impact on reducing the number of daily cases after a certain period. It may be due to the fact that the number of tests done has increased significantly over time. However, the model needs to be revised every week as more data accumulate.
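The interrupted time series regression discussed above can be sketched as an ordinary least-squares fit of daily new cases on a time trend and a post-lockdown dummy variable. This is a minimal illustration that assumes a level-change-only specification (intercept, trend, dummy). The exact specification used in the study is not given in the text, and the function names, break index and data below are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_level_change(daily_cases, break_index):
    """OLS fit of y_t = b0 + b1 * t + b2 * post_t, where post_t = 1 for t >= break_index.

    b2 is the estimated change in level of daily new cases after the interruption.
    """
    rows = [[1.0, float(t), 1.0 if t >= break_index else 0.0]
            for t in range(len(daily_cases))]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, daily_cases)) for i in range(3)]
    return solve_linear(xtx, xty)

# Synthetic, noise-free series (hypothetical): a rising trend with a drop of
# 20 daily cases from day 30 onwards. Not the study data.
series = [10.0 + 5.0 * t - (20.0 if t >= 30 else 0.0) for t in range(45)]
b0, b1, b2 = fit_level_change(series, break_index=30)
print(b0, b1, b2)
```

A negative b2 corresponds to a drop in daily new cases after the interruption. Judging whether it is statistically significant additionally requires its standard error, which a statistical package would report alongside the coefficient.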
Limitations: As in any other projection using models, the limitation is that each model behaves differently, not merely because of differences in underlying assumptions but also because of differences in population density, the existing capacity of the health systems, the current level of interventions and the socio-demographic and economic situation across and within the states and districts. Therefore, district-level projections may be required, which would account for the variations between and within the states. In COVID-19, there has been a high level of uncertainty about the number of reported confirmed cases, owing to varying testing strategies, the proportion of asymptomatic cases and the effective transmission rate. Because of this, a significant number of cases may be missing from the reported confirmed cases, which may affect the accuracy of any model.

In conclusion, the short-term projection from the logistic growth model agrees closely with the observed number of cases in India and in the other states. The findings from the SIR model may be used for planning interventions and for preparing the health systems for better clinical management of the infected in the country and the respective states.

None of the authors have conflicts of interest to report.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Not required

Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Geneva: World Health Organization
Mass testing, school closings, lockdowns: Countries pick tactics in 'war' against coronavirus
The impact of social distancing and epicenter lockdown on the COVID-19 epidemic in mainland China: A data-driven SEIQR model study
New Delhi: Ministry of Health and Family Welfare, Government of India
Why Is It Difficult to Accurately Predict the COVID-19 Epidemic? Infect Dis Model
When will the coronavirus outbreak peak?
Nature
Epidemic Forecasting is Messier Than Weather Forecasting: The Role of Human Behavior and Internet Data Streams in Epidemic Forecast
Managing epidemics: Key facts about major deadly diseases
Modeling and Predictions for COVID 19 Spread in India
Age-structured impact of social distancing on the COVID-19 epidemic in India
SEIR and Regression Model based COVID-19 outbreak predictions in India
Mathematical modeling of the spread of the coronavirus disease 2019 (COVID-19) taking into account the undetected infections. The case of China
Prudent public health intervention strategies to control the coronavirus disease 2019 transmission in India: A mathematical model-based approach
Epidemic situation and forecasting of COVID-19 in and outside China
COVID-19 in India: Predictions, Reproduction Number and Public Health Preparedness
Forecasting COVID-19 cases in India. Towards Data Science
Analysis of a Modified Logistic Model for Describing the Growth of Durable Customer Goods in China
Modeling Logistic Growth
COVID-19 Growth Modeling and Forecasting with Prophet
An introduction to compartmental modeling for the budding infectious disease modeler
Application of the susceptible-infected-recovered deterministic model in a GII.P17 emergent norovirus strain outbreak in Romania in 2015
The reproductive number of COVID-19 is higher compared to SARS coronavirus
Pattern of early human-to-human transmission of Wuhan
Segmented regression analysis of interrupted time series studies in medication use research
Predictions for COVID-19 outbreak in India using Epidemiological models
Predictions, role of interventions and effects of a historic national lockdown in India's response to the COVID-19 pandemic: data science call to arms
COVID-19: Mathematical Modelling and Predictions

03-Feb-2020 3
6 04-Feb-2020 3
7 05-Feb-2020 3
8 06-Feb-2020 3
9 07-Feb-2020 3
10 08-Feb-2020 3
11 09-Feb-2020 3
12 10-Feb-2020 3
13 11-Feb-2020 3
14 12-Feb-2020 3
15 13-Feb-2020 3
16 14-Feb-2020 3
17 15-Feb-2020 3
18 16-Feb-2020 3
19 17-Feb-2020 3
20 18-Feb-2020 3
21 19-Feb-2020 3
22 20-Feb-2020 3
23 21-Feb-2020 3
24 22-Feb-2020 3
25 23-Feb-2020 3
26 24-Feb-2020 3
27 25-Feb-2020 3
28 26-Feb-2020 3
29 27-Feb-2020 3
30 28-Feb-2020 3
31 29-Feb-2020 3
32 01-Mar-2020 3
33 02-Mar-2020 5 1
34 03-Mar-2020 6 1 1
35 04-Mar-2020 28 1 2
36 05-Mar-2020 30 2 2
37 06-Mar-2020 31 3 2
38 07-Mar-2020 34 3 2
39 08-Mar-2020 39 3 2
40 09-Mar-2020 48 2 4 2
41 10-Mar-2020 63 5 4 3
42 11-Mar-2020 71 11 5 3
43 12-Mar-2020 81 14 6 3
44 13-Mar-2020 91 17 7 3
45 14-Mar-2020 102 26 7 4
46 15