key: cord-0063264-l87lvp5e authors: Majumder, Debjit; Mazumder, Sougata; Ghosal, Prasun title: CARD Predictive Modeling and SEI Formulation: COVID-19 Statistics in India date: 2021-05-20 journal: J DOI: 10.1007/s40031-021-00608-3 sha: ff14544f5150c8f7fe5da05b5f73ed3890b58e61 doc_id: 63264 cord_uid: l87lvp5e

The COVID-19 pandemic has been a focal point for researchers, healthcare professionals, and statisticians. The volume of data generated shows that statistics has been crucial both in researching and in predicting the COVID-19 scenario across the globe. This paper deals with extensive data collection and predictive modeling to derive a CARD model using statistical tools such as regression curve fitting. The exponential growth model has been prevalent in live updates on COVID-19 dashboards maintained by organizations such as WHO, Johns Hopkins University, and the Indian Council of Medical Research. In a similar vein, the paper discusses a time-varying exponential growth model specific to Indian conditions. A generic model has also been derived for use by researchers in other countries. The accuracy of the model is found to be satisfactory. Moreover, a State-wise Evaluation Indexing has been performed considering parameters such as sanitation, population below the poverty line, literacy rate, and population density. The results are visualized through cartograms. The conclusions are noteworthy, and the CARD model can be trained and developed with better accuracy using machine learning and deep learning, given the huge amount of data being generated every day all over the world.

With the recent outbreak of the COVID-19 pandemic, every country has taken significant steps to combat this widely spreading threat to the human community. Mathematical and statistical models have been produced to forecast the future of COVID-19 [15].
Relative to countries in the EU, the spread in India has been smaller, which can be attributed to the early implementation of lockdown throughout the country. However, the numbers of confirmed cases and deaths have been steadily increasing in the country, as shown in Figs. 1 and 2. Both show increasing trends and are treated as major parameters in world COVID-19 dashboards maintained by organizations such as the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [4] and the World Health Organization (WHO) [5]. This situation emphasizes the need for a reliable and accurate statistical model that not only fits the existing data well but can also predict with considerable accuracy. The World Health Organization has published predictive models [2] based on susceptible, exposed, infected, and removed (SEIR) cases, and many leading research organizations have adapted them by altering various parameters. In our research, we decided to produce a predictive growth model using statistical tools and to generalize it for various countries.

This study comprises obtaining the specific growth model of the Indian COVID-19 scenario, covering the period from 30th January 2020 (when the first confirmed case was detected in India, taken as $t = 1$) to 13th May 2020 ($t = 104$). However, this predictive empirical model can be applied country-wise, using the data available. The growth model had a very high $R^2$ value, and the deviations of the forecasted data from the actual data were considerably small. The model can predict the future course of COVID-19 in India and can help policy-makers, researchers, scientists, and medical professionals combat this deadly virus and control its spread. The forecasted values can be considered reliable and can be used to improve hospitalization planning in India.
Based on this, future research can be carried out on how to control the reproduction number and doubling time and how to improve tests per million, keeping in mind the socio-demographic diversity of a country like India. The data collection methodology, statistical analysis, growth models, and data validation are presented in this paper.

On 11th February 2020, the World Health Organization announced severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) as the virus responsible for the COVID-19 pandemic [21]. To support a sound ecosystem of research during this pandemic, WHO agreed to work with other countries. It was directed that countries affected by this pandemic should implement proper measures to detect, isolate, and treat potential cases, trace contacts, incorporate social distancing measures, and take crucial steps during the early spread of the disease [22, 25-27, 31]. By 11th March 2020, more than 118,000 cases had reportedly been recorded in 114 countries, and 4291 people had lost their lives. In view of this and the alarming levels of spread, WHO declared COVID-19 a global pandemic [23]. The monthly situation reports released by WHO further emphasize the need for this model at a stage of ever-increasing COVID-19 cases in India and abroad [24]. A joint mission was launched between WHO and China for better knowledge sharing and for combating this pandemic. Updates regarding this mission helped countries monitor how the problems were dealt with [28-30]. As of 13th May 2020, 4,170,424 cumulative cases and 287,399 deaths had been recorded globally [32], highlighting that models such as SEIR, SIQR, and CARD can be of great value in predicting the future of COVID-19. The spread of this disease should be tackled by following the latest guidelines from WHO.
In a similar vein to the SEIR model released in the WHO bulletin, other models such as the susceptible-infectious-quarantined-recovered (SIQR) model have been derived [17]. Indicators were compared with those of other countries such as Italy and Brazil. However, a purely statistical model for the specific case of India has been missing. Thus, there is a pressing need for the CARD model, which can be used for improved governance and for counteracting the threat of SARS-CoV-2, the novel virus responsible for the rapid spread of COVID-19 around the world [8, 9]. Many statisticians and mathematicians are publishing models online, and the race to establish a particular model is intense. However, most of these models have loopholes because they do not take the dynamic nature of COVID-19 into consideration. Forecasts and nowcasts are being highlighted, but most of the models published online do not go through peer review and thus give erroneous predictions [14]. The CARD model and the SEI take into consideration the actual statistical modeling parameters that genuinely contribute to the spread of this deadly virus, along with a comparative study. Such an established model is the need of the hour. It can be a great help to researchers (not only in India) in drawing up strategic combat plans as data keep increasing. This predictive model can also be trained to give more accurate results in the near future [3]. The challenges addressed through this model include the following.
• To represent the current scenario of COVID-19. Models produced earlier have a limited data set for operation, whereas this CARD model incorporates data up to 13th May 2020. Moreover, it can help control new outbreaks of this pandemic, which could result in further waves [13, 18].
The only limitation of this model is that the effect of the partial lifting of lockdown in India on community spread has not been considered.
However, we are already carrying out further research to estimate the effect of the partial lifting of lockdown on COVID-19 statistics in the Indian context.

The novelties of this work are manifold and may be summarized as follows.
• A well-established growth model (CARD) is developed, which can be used to show the relationship between time and COVID-19 cases in India.
• A generic relation is established between time and COVID-19 cases, which can be applied to any country in the world.
• The CARD model can act as a foundation stone for statisticians and scientists to develop a better-trained and more advanced model.
• In contrast to other mathematical models, the CARD model is purely statistical in nature and is based on reliable data, which is updated daily by credible sources such as WHO, ICMR, and MHA.
• 37 states and union territories of India are given index points based on their performance on parameters such as population BPL, sanitation, literacy rate, and population density. This variation is shown using a cartogram, which can help decision-makers improve on these parameters and act judiciously.

The rest of the paper is organized as follows. Section 1.2 discusses the background of the present work and the related state-of-the-art research. It highlights recent work on modeling and emphasizes the need for this study, along with its limitations, so that future research can improve the CARD model by feeding more data into it. Section 1.3 lists the novel contributions of this work and its impact on future research, the decision-making processes of policymakers, and socio-economic parameters, as well as the pressing need for it to reach researchers. Section 2 summarizes the time-varying growth models for India and the deduction of a generic version for the world.
Section 3 deals with the comparative study between the actual and derived data, showcasing the accuracy of this CARD model along with the deviations, which give a very small mean standard error for the data set of the last ten days. The accuracy can be further increased as data for the next 1-2 months become available, and a study can be done to analyze the statistical differences in COVID-19 after the partial lockdown relaxation in India starting from 17th May 2020. Section 4 supplements the CARD model in its approach towards helping stakeholders in their decision-making process. The State-wise Evaluation Index (SEI) has been formed to rate the states based on fixed parameters such as sanitation, population below poverty line, literacy rate, and population density. Most prior studies [11, 12, 19] highlighted the concepts of reproduction number, doubling time, temperature, humidity, etc. The SEI aims to bridge this gap, and those studies can now be combined with it to determine more accurately the states' performance in terms of their response to COVID-19 and the factors that should be improved. Visualization at a glance is provided using a cartogram [4, 5]. The paper concludes in Sect. 5 with some noteworthy suggestions, followed by the references. An overview of the workflow adopted across the entire work is depicted in the flowchart (see Fig. 3).

Raw data are one of the most significant parts of this study. The data that support the findings of this study are available in Statista at www.statista.com [7]. A series of regressions was performed on this data, and ultimately a particular solution was reached. Considerable trial and error was carried out with various linear and nonlinear regression models; it was observed that most of the models were exponential growth models, which were back-validated with our trial-and-error method. The models obtained have been tabulated in Table 1.
where $C(t)$, $R(t)$, and $D(t)$ represent the time-varying functions of confirmed, recovered, and death cases (specifically for India), respectively; RSS denotes the residual sum of squares, which should be minimized to obtain an optimized equation; and $R^2$ indicates the extent of curve fitting, which should ideally be 1.

The total number of confirmed cases to date is the sum of the total numbers of active, recovered, and death cases to date:

$$C(t) = A(t) + R(t) + D(t) \tag{1}$$

Thus, the total number of active cases to date can be found by subtracting the sum of the total numbers of recovered and death cases from the total number of confirmed cases:

$$A(t) = C(t) - \left[R(t) + D(t)\right] \tag{2}$$

where $A(t)$ denotes the time-varying function of active cases.

A more generic version of this model, applicable to every country in the world, can also be obtained, in which the time-varying equations of confirmed, recovered, and death cases change in the following way:

$$R(t) = \mu^{t} \tag{4}$$

Therefore, using equation (2) above, the generic form of $A(t)$ follows by substitution. The values of $\kappa$, $\alpha$, $\mu$, $\lambda$, and $\beta$ are country-specific and can be found by carrying out a series of regressions on the COVID-19 data of that particular country.

Using the equations in Table 1, obtained from the regression analysis, a comparative study between the actual data and the derived data has been carried out. Due to inconsistency in the data from 30th January 2020 ($t = 1$) to 17th March 2020 ($t = 48$), there was an anomaly in the forecasting. This anomaly is attributed to the poor quality of the data, and this period has therefore been excluded. Although the regression has been done for the entire data set, only the results from 18th March 2020 ($t = 49$) to 13th May 2020 ($t = 104$) are shown in Table 2 for confirmed, recovered, death, and active cases, respectively. Deviations from the actual data are also shown to emphasize the accuracy of this model.
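The regression step and the fit indicators described above can be illustrated with a minimal sketch. It assumes a two-parameter exponential form $y = a\,e^{bt}$ fitted by ordinary least squares on log-transformed counts, which is a simplification: it minimizes the residuals in log space and requires strictly positive counts. The function names, the functional form, and the synthetic series are assumptions for illustration only; they are not the fitted equations of Table 1 and not real COVID-19 data.

```python
import math


def fit_exponential(ts, ys):
    """Fit y = a * exp(b * t) by least squares on log(y) (assumed form)."""
    logs = [math.log(y) for y in ys]  # requires every y > 0
    n = len(ts)
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    b = sxy / sxx                       # growth rate (slope in log space)
    a = math.exp(l_mean - b * t_mean)   # scale factor (intercept in log space)
    return a, b


def predict(a, b, t):
    """Evaluate the fitted curve at time index t."""
    return a * math.exp(b * t)


def rss_and_r2(ys, fitted):
    """Residual sum of squares and R^2 in the original (not log) scale."""
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    y_mean = sum(ys) / len(ys)
    tss = sum((y - y_mean) ** 2 for y in ys)
    return rss, 1.0 - rss / tss


def active_cases(confirmed, recovered, deaths):
    """A(t) = C(t) - [R(t) + D(t)], as stated in the text."""
    return confirmed - (recovered + deaths)


# Synthetic, noise-free series used only to exercise the functions.
ts = list(range(1, 11))
ys = [2.0 * math.exp(0.1 * t) for t in ts]
a, b = fit_exponential(ts, ys)
fitted = [predict(a, b, t) for t in ts]
rss, r2 = rss_and_r2(ys, fitted)
```

On real counts, a nonlinear least-squares routine that minimizes RSS directly in the original scale may give different coefficients from this log-linear fit, so the two should not be treated as interchangeable.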
The mean standard error of each parameter has been calculated for a sample of the last 10 days, i.e., from 3rd May 2020 to 13th May 2020, and is shown in Table 3. The accuracy of this confirmed-active-recovered-death (CARD) model is demonstrated by the parity between the actual data and the derived data through curve fitting (see Fig. 4).

A state-wise rating has been formulated to calculate the index of every state based on some fixed basic parameters, which can presumably play an important role in the fight against COVID-19. The parameters taken for index calculation and the corresponding scales are given in Table 4. All states and union territories have been considered, using the latest data updated in 2020. The sources of the data are given below, along with the reason each parameter was chosen for indexing.

The data (for sanitation and population below poverty line of every state and union territory) that support this study's findings are openly available in the Handbook of Urban Statistics 2019, Ministry of Housing and Urban Affairs, Government of India [16]. Sanitation denotes the percentage of the population having the necessary hand-washing facilities. As per the Interim Guidance Report released by WHO on 23rd April 2020 [20], water, sanitation, and hygiene (WASH) play a considerable part in this context. Therefore, it is crucial to evaluate states based on sanitation levels. Population below poverty line (expressed as a percentage of the population) is essential for highlighting the effect of COVID-19 on cash liquidity at the bottom of the economic pyramid and hence on financial stability. Literacy rate (expressed as a percentage of the total population) can be related to the general population's understanding of, and subsequent reaction to, this pandemic, state-wise.
In general, responding to awareness campaigns and being socially aware can be considered an essential yardstick for fighting COVID-19 and preventing community spread. Population density (expressed as population per square kilometer) is an indicator of the expected rise in community spread and of the correlation among all the above parameters. The data (for literacy rate and population density of every state and union territory) that support this study's findings are openly available in Wikipedia under "List of states and union territories of India by population" [10], which provides data updated to 2020. Table 5 shows the Indian state-wise data along with the index calculated using the rating system discussed above. For better visualization of the index calculated for every state and union territory, Datawrapper has been used to create a cartogram, which supports state-wise analysis in India based on the four parameters. It is shown in Fig. 5. Credit for the map tiles used for cartography goes to OSM and its contributors [1, 6] [© OpenStreetMap contributors, more details here: https://www.openstreetmap.org/copyright].

We have developed a CARD model using statistical tools and predictive modeling, which emphasizes the need for a time-variant growth model focused on the effect of COVID-19 under Indian conditions. A generic equation has also been derived for further studies specific to other countries. The model is quite accurate, as indicated by the mean standard error of the sample data. Curve fitting further highlights the parity between the actual and derived data, and the model can be a real asset for predicting the future of COVID-19 in India, provided the current circumstances remain constant. The model is applicable to the lockdown condition and does not consider the dynamic parameters that play a role once the lockdown is lifted.
In that condition, the model does not perform well and shows a significant deviation between the actual and derived curves. Potential dynamic factors include mobility and migration. With more data being generated every day, this model can be trained and its accuracy increased using machine learning and deep learning techniques, which is part of the authors' planned future work.

From the cartogram (Fig. 5) and the SEI: the states with a rating of more than 35 have set themselves apart in dealing appropriately with the COVID-19 pandemic. For the states with ratings between 30 and 35, such as Maharashtra, the index scores are considerably good, but there are shortcomings in other dynamic parameters. Maharashtra's location and constant exposure to international travelers through airports are critical factors contributing to its recent worsening condition. The remaining states, whose index numbers are below 30, lack planning and implementation. Improvement in factors such as tests per million, reproduction number, doubling time, hospitalization, and availability of beds would benefit the country in this fight against COVID-19.
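The three rating bands used in this discussion (more than 35, between 30 and 35, below 30) can be expressed as a small grouping routine. This is a minimal sketch only: the treatment of the boundary values 30 and 35 as belonging to the middle band is an assumption, and the state names and index values below are placeholders, not entries from Table 5.

```python
def sei_band(index):
    """Map an SEI value to one of the three bands discussed in the text.

    Boundary handling (30 and 35 fall in the middle band) is an assumption.
    """
    if index > 35:
        return "above 35"
    if index >= 30:
        return "30-35"
    return "below 30"


def group_by_band(indices):
    """Group state names by SEI band, preserving input order."""
    groups = {"above 35": [], "30-35": [], "below 30": []}
    for state, value in indices.items():
        groups[sei_band(value)].append(state)
    return groups


# Placeholder names and values for illustration only (not Table 5 data).
example = {"State A": 37, "State B": 32, "State C": 28, "State D": 35}
bands = group_by_band(example)
```

Keeping the band thresholds in one function makes it straightforward to revise the grouping if the index scales in Table 4 are changed.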
Worldwide COVID-19 Outbreak Data Analysis and Prediction
Studying COVID-19 in light of critical approaches to risk and uncertainty research pathways, conceptual tools, and some magic from Mary Douglas
OpenStreetMap: user-generated street maps
Covid-19 Daily Confirmed Recovered Death Cases
Early dynamics of transmission and control of COVID-19: a mathematical modelling study
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
List of states and union territories of India by population (For data: Literacy Rate and Population Density)
The reproductive number of COVID-19 is higher compared to SARS coronavirus
The role of absolute humidity on transmission rates of the COVID-19 outbreak
Covert coronavirus infections could be seeding new outbreaks
A model society: maths, models and expertise in viral outbreaks
Real-time forecasts of the COVID-19 epidemic in China from
The Handbook of Urban Statistics
Modelling and analysis of COVID-19 epidemic in India
A novel coronavirus outbreak of global health concern
Temperature significant change COVID-19 Transmission in 429 cities
Water Sanitation Hygiene and Waste Management for Covid-19
World Health Organization: naming the coronavirus disease (COVID-19) and the virus that causes it
Emergency committee regarding the outbreak of novel coronavirus (2019-ncov
World Health Organization: Director-general's opening remarks at the media briefing on COVID-19
Coronavirus disease (COVID-2019) Situation reports
Coronavirus disease 2019 (COVID-19) situation report-24
Coronavirus disease 2019 (COVID-19) situation report-28
Coronavirus disease 2019 (COVID-19) situation report-31
World Health Organization: report of the WHO-China Joint Mission on Coronavirus Disease 2019
World Health Organization: rolling updates in coronavirus disease COVID-19
World Health Organization: report of the WHO-China joint mission on coronavirus disease
Coronavirus disease 2019 (COVID-19) situation report-25
Coronavirus disease 2019 (COVID-19) situation report-114