key: cord-1035937-jb8b1k2w authors: Alenezi, Mohammed N.; Al-Anzi, Fawaz S.; Alabdulrazzaq, Haneen title: Building a Sensible SIR Estimation Model for COVID-19 Outspread in Kuwait date: 2021-02-04 journal: nan DOI: 10.1016/j.aej.2021.01.025 sha: c2b6820f6c70d38fef859abaf6998536072788db doc_id: 1035937 cord_uid: jb8b1k2w
The Susceptible-Infected-Recovered (SIR) model is used in this research to analyze and predict the outbreak of coronavirus (COVID-19) in Kuwait. The time-dependent SIR model is used to model the growth of COVID-19 and to predict future values of infection and recovery rates. This research presents an analysis of the impact of the preventive measures taken by Kuwait’s local authorities to control the spread. It also empirically examines the validity of various values of R0 ranging from 2 to 5.2. The proposed model is built using Python language modules and simulated using official data of Kuwait for the period from February 24th to May 28th, 2020. Our results show that the SIR model closely fits the actual confirmed cases of both infection and recovery for values of R0 ranging from 3 to 4. The results indicate the COVID-19 peak infection rates and their anticipated dates for Kuwait. The obtained predictions show that if preventive measures are not strictly followed, the infection numbers will grow exponentially.
The world currently faces an unparalleled challenge in the spread of an infectious virus known as coronavirus or COVID-19. The family of coronaviruses consists of various viruses that cause illnesses ranging from a simple cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The recently discovered COVID-19 virus belongs to this family of viruses. The first case of COVID-19 was found and reported in the city of Wuhan, China, on the 31st of December 2019, hence the name COVID-19.
Following the rapid spread of this epidemic, the Chinese government introduced several measures in late January to hinder the outbreak, such as locking down the city of Wuhan and closing all routes that lead to it [1]. At the time of writing, the pandemic has been officially acknowledged in around 213 countries and territories around the world. The first reported case of COVID-19 outside China was in Thailand on January 13th, 2020 [2]. On May 28th, 2020, around 5.76 million confirmed cases, 2.39 million recovered cases, and 358 thousand deaths were registered worldwide [3]. The World Health Organization (WHO) declared COVID-19 an epidemic and a Global Public Health Emergency on January 30th, 2020 [4]. On that date, the total number of infected cases was 8,096. The WHO declared COVID-19 a global pandemic on March 11th, 2020 because it posed a threat to the whole world [5]. More than 40% of the world's population has been placed under lockdowns issued by various governments in order to reduce the spread of COVID-19. Social distancing is a widely agreed-upon measure and is known to be among the most viable ways to reduce the spread of this novel coronavirus.
The rapid and uncontrollable increase in the total number of cases globally has created a worldwide public health issue. This pandemic has created significant challenges not only in public health but also in various other areas, including politics, economics, education, and social behavior. Furthermore, this epidemic has intensified poverty and unemployment globally. Due to its strongly infectious nature, extended incubation period, difficulty of detection, and uncertainty about its modes of transmission, COVID-19 is becoming a very difficult disease to control. Countries around the world have joined in tremendous efforts to decrease or prevent the spread of COVID-19 [6]. Many health and pharmaceutical research centers and companies are racing against time to develop a vaccine or cure for COVID-19.
However, none of these efforts have been successful thus far. The virus has taken its toll on the world's economy, and many countries are undergoing major economic crises.
Like the rest of the world, Kuwait is currently facing the many challenges of COVID-19. The first five cases of COVID-19 infection were reported on February 24th, 2020 and were associated with citizens returning from travel abroad. This sparked the threat of local transmission in the country. Following the rapid increase in the number of confirmed infected cases, the Kuwaiti authorities started to take preventive measures including quarantine, banning in-bound flights from various countries, closing down retail shops, and issuing a public holiday from March 12th, 2020 onward. The government issued a partial curfew on the 22nd of March 2020 from 5:00 PM to 4:00 AM daily, then amended the timings twice, first to 5:00 PM to 6:00 AM and then to 4:00 PM to 8:00 AM, and finally implemented a full lockdown from May 10th until May 30th, 2020 [7]. On May 28th, 2020, the total number of confirmed infections was reported to be 24,112, of which 8,698 had recovered and 15,229 were in treatment. The overall death toll was 185, and 185 of the cases were considered to be in critical condition.
As of June 1st, 2020, the total population of the State of Kuwait is 4,776,407 based on data from the Public Authority for Civil Information (PACI) [8]. Kuwait's primary healthcare provider is the Ministry of Health (MOH), and there is a small number of private sector hospitals and clinics. The total bed capacity of MOH is estimated to be around 7,118, while the private sector's capacity is estimated at 1,082 beds [9]. In light of the uncertainties facing the world today, it is becoming essential for decision makers to have good estimates of the damage COVID-19 has inflicted so far and of what it will cause in the near future.
Estimating the spread of the epidemic would help the official authorities to ensure effective prevention measures and to prepare for care and treatment actions. Although a precise estimate of any pandemic is considered unachievable, scientists and researchers can attempt to make rough estimates by using proven, scientifically based prediction techniques. Based on such estimates, official authorities can make good and informed decisions on how to proceed in their efforts to control the damage caused by COVID-19.
One of the simple yet effective mathematical models for predicting a pandemic is a popular model called the SIR model. The name SIR comes from Susceptible-Infected-Recovered [10]. In an SIR model, the entire population is placed in one of three categories: Susceptible, Infected, or Recovered. People who are not yet infected with the disease are considered susceptible. Those who are confirmed to be infected and capable of transmitting the disease are placed in the infected category. People who have recovered from the disease or are deceased are considered recovered. These three categories represent the progressive stages of a contagious disease. The SIR model is well suited to predicting the number of people who would need medical care during the spread of an epidemic. The SIR model assumes that a person who has recovered from a disease attains lifetime immunity against it and will never be infected again. SIR models are simple mathematical models, yet they have been known to predict a pandemic with adequate accuracy [11].
In this paper, we use the SIR model to estimate and analyze the spread of the COVID-19 pandemic in Kuwait, under the assumption that the population size of the country remains almost constant. That is, deaths, births, and migration are assumed to cancel each other out in the total population count during our analysis because of the narrow time window.
Our predictions of daily confirmed cases, cumulative cases, and recovered cases are estimated with different values of the basic reproduction number R0, which are described in the next sections. The data processed in this paper is based on the daily confirmed cases from the 24th of February until the 28th of May 2020 and is retrieved from the official website for COVID-19 in Kuwait [7]. Our research results show a promising prediction of the peak and end dates of the COVID-19 outbreak for Kuwait.
The UN Secretary-General Antonio Guterres said that the world is currently facing its most challenging crisis since World War II. People all over the world are in a state of panic due to the new coronavirus outbreak. This pandemic has had severe effects not only on public health, but also on social, economic, and political aspects of life. COVID-19 has forced countries to impose travel bans and restrictions and to promote large-scale quarantines worldwide in an attempt to restrain its spread. Several studies by different researchers have been carried out to forecast and model the COVID-19 epidemic. These studies were performed to track the spread of the epidemic and to estimate its infection rate and expected ending. Various researchers have utilized different models for analyzing and forecasting COVID-19, such as the logistic growth model, deterministic compartmental models (DCM), and agent-based models (ABM) [12].
Statistical models are well suited to predicting pandemics. Regression analysis is mainly used to analyze and predict the relationship of a dependent or target variable with an independent variable, which predicts the target variable based on previous values. It can be used to analyze the relationships between two or more dependent and independent variables. A regression is said to be linear if it is linear in its parameters. Therefore, polynomial regressions are also considered linear regressions.
The dependent variable in the second-order polynomial regression with one variable can be calculated using equation 1. The independent variable (x) can be the number of tests performed, gender, age, region, etc., whereas the dependent variable (y) can be the number of confirmed cases, recovered cases, death cases, etc. β0 is the intercept, also called the bias, and β1 and β2 are the weights (slopes). In real-life situations, a regression model cannot predict a dependent variable exactly from an independent variable, so there is some error, which is represented by ε. It adds noise to the relationship between the dependent and independent variables. The relationship between the dependent and independent variables with two explanatory variables is given by equation 2. If x_j = x^j for j = 1, 2, ..., n, then the polynomial regression can be viewed as a linear regression with multiple independent variables, as in equation 3. The second-order polynomial regression is also known as a second-order model.
The Generalized Linear Model (GLM) [13] is a regression model used to estimate or analyze the effect of various continuous independent variables on different dependent variables. In GLM, the data is modeled as the sum of the generated model and a possible error. It can also be viewed as a flexible generalization of linear or polynomial regression models. It mainly consists of a random component, a linear predictor, and a link function g(·). The random component represents the conditional distribution of the dependent variable Y_i given the independent variables. The linear predictor is a linear function of the independent variables, given in equation 4, where i = 1, 2, ..., n. The link function is an invertible function that defines how the mean, E(Y_i) = µ_i, depends on the linear predictor.
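
The point that a second-order polynomial regression is still linear in its parameters can be illustrated with a minimal sketch. It assumes the usual one-variable form y = β0 + β1 x + β2 x² + ε and fits it by ordinary least squares on the columns [1, x, x²]. The function names `fit_quadratic` and `solve_linear_system` and the synthetic data are illustrative assumptions and are not taken from the paper.

```python
def solve_linear_system(A, c):
    """Solve A b = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    # Augmented matrix [A | c]; copy rows so the inputs are left untouched.
    M = [row[:] + [rhs] for row, rhs in zip(A, c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    # Back-substitution on the upper-triangular system.
    b = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * b[k] for k in range(i + 1, n))
        b[i] = (M[i][n] - tail) / M[i][i]
    return b


def fit_quadratic(xs, ys):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x + b2*x**2."""
    # Treat 1, x and x^2 as three separate regressors (x_j = x^j).
    rows = [[1.0, x, x * x] for x in xs]
    # Normal equations: (X^T X) b = X^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    c = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(A, c)


# Synthetic, noise-free data generated from known coefficients (hypothetical values).
xs = [0, 1, 2, 3, 4, 5]
ys = [2 + 0.5 * x + 0.25 * x * x for x in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
print(b0, b1, b2)
```

With noise-free data the fit recovers the generating coefficients up to rounding; with real case counts the residuals would play the role of the error term ε.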
Locally Weighted Linear Regression (loess) [14] is a non-parametric regression model that is mainly used to smooth the curve of a volatile time series using a scatter plot to obtain the best-fitting data. It is applied to local subsets of the data to smooth their values. The loess method first identifies a smoothing parameter, then selects the k nearest neighbors of x0, the point to be smoothed. The loess algorithm then assigns a weight to each of the nearest neighbors of x0.
Poisson regression [15] is a regression model used to estimate a discrete dependent or response variable; it assumes that the response variables are positive counts that follow the Poisson distribution. Logistic regression is similar to Poisson regression. Poisson regression is mainly used to analyze rates whose values are positive counts. In contrast, logistic regression is mainly used to calculate ratios whose values lie between 0 and 1.
The logistic regression model [16] is a regression model used to predict or estimate a dependent variable based on the independent variable under consideration. It is well suited to analyzing the growth of epidemic diseases. The model assumes that an epidemic grows exponentially in its initial stage, then increases steadily, and finally declines in growth rate. C represents the number of infected cases, r is the rate of infection, and K is the final epidemic size. The number of infected cases is calculated by equation 5. The change in the number of infections at time t is defined by equation 6 [16]. The maximum growth rate, occurring at time t_p, is calculated using equation 7. The number of cases at the peak and the maximum growth rate are defined by equations 8 and 9, respectively. For fitting the regression model to the actual confirmed infected population, the estimate for i ranging from 1 to n is given in equation 10.
Deterministic compartmental models (DCM) are nonlinear models used to estimate the spread of an epidemic.
In DCM, differential equations are used to model the epidemic spread. Susceptible-Infected-Recovered (SIR), Susceptible-Exposed-Infected-Recovered (SEIR), and Autoregressive Integrated Moving Average (ARIMA) are the three main models used for estimating the epidemic spread. The SIR model has been used in the past to estimate many diseases, including HIV and Ebola [17, 18]. SIR considers the total population as a combination of three compartments: Susceptible (S), Infected (I), and Recovered (R) [19]. Susceptible holds the part of the population that is healthy but at risk of being infected. Infected is the number of mildly or severely infected people. Recovered is the total number of people who have recovered from the epidemic and attained immunity, and it also includes those who are deceased [20]. The total population, N, is given by equation 11 [10]. In the SIR model, the population is assumed to be constant, with no deaths or births during the period of epidemic prediction. The model calculates the changes in S, I, and R using the differential equations 12, 13, and 14, respectively [21, 22]. β and γ are deterministic parameters that reflect the infection rate, at which the susceptible population is infected per day, and the recovery rate, at which infected people recover with immunity [23]. The basic reproduction number of the disease can be calculated using equation 15.
The Residual Sum of Squares (RSS) is a statistical measure of the variance in a dataset that is not explained by the proposed model. It quantifies the error between the dataset and the estimation model. In the SIR-based model, RSS is used to find the optimal values of β and γ by calculating the error of the model for the given infection and recovery rates. RSS is computed using equation 16.
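
The use of RSS to select β and γ can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the common textbook form dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI, advances the system with a one-day step, and picks the (β, γ) pair with the smallest RSS from a coarse grid. The function names, the grid, and the synthetic "observed" series are all hypothetical.

```python
def sir_derivatives(s, i, r, beta, gamma):
    """Daily changes (dS, dI, dR) under the assumed standard SIR equations."""
    n = s + i + r                      # constant total population
    new_infections = beta * s * i / n  # flow from S to I
    recoveries = gamma * i             # flow from I to R
    return -new_infections, new_infections - recoveries, recoveries


def simulate_infected(beta, gamma, s0, i0, rec0, days):
    """Infected counts for day 0..days using a one-day explicit step."""
    s, i, r = float(s0), float(i0), float(rec0)
    infected = [i]
    for _ in range(days):
        ds, di, dr = sir_derivatives(s, i, r, beta, gamma)
        s, i, r = s + ds, i + di, r + dr
        infected.append(i)
    return infected


def rss(observed, predicted):
    """Residual sum of squares between two equally long series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Synthetic observations generated with known rates (hypothetical values),
# standing in for the reported infection counts.
observed = simulate_infected(0.30, 0.10, 9995, 5, 0, 30)

# Coarse grid of candidate (beta, gamma) pairs; a real fit would refine this
# or use a numerical optimizer.
candidates = [(b / 100, g / 100) for b in range(10, 51, 5) for g in range(5, 21, 5)]
best_beta, best_gamma = min(
    candidates,
    key=lambda p: rss(observed, simulate_infected(p[0], p[1], 9995, 5, 0, 30)),
)
print(best_beta, best_gamma, best_beta / best_gamma)  # last value is the implied R0
```

Because the synthetic series was generated by the same simulator, the grid search recovers the generating rates; with reported data the minimum RSS would be positive and the chosen pair would only be the best candidate on the grid.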
The coefficient of determination (R²) is another statistical goodness-of-fit measure, defined as the percentage of the variance in the dependent variable that can be explained by the independent variables. It indicates the strength of the relationship between the estimation model and the dependent variable. Normally, its value ranges from 0 to 100% (0 to 1). R² can be measured using equation 17. TSS is the total sum of squares, which is the sum of the squared differences of the dependent variable (y_i, i ≤ n) from its overall mean (ȳ) and can be calculated using equation 18. SIR estimates the outbreak based on initial parameters such as the initial susceptible population (S(0)), the initial infected population (I(0)), and the initial recovered population (R(0)).
The SEIR model is an extension of the SIR model in which the total population is divided into four compartments instead of three. The SEIR model assumes that the entire population is susceptible [24]. Exposed (E) refers to the people who have been exposed to an infected person and have become infected, but are not yet infectious. The total population in the SEIR model is represented by equation 19. The SEIR model also assumes no deaths or births during the estimation period. The changes at each time t in the SEIR model are given in equations 20, 21, 22, and 23 [25]. α is known as the incubation rate, the rate at which an individual becomes infectious [2]. The values of β, α, and γ are calculated using β = R0/T_i, α = 1/T_l, and γ = 1/T_i, where T_i and T_l are the serial interval and the incubation period, respectively [25].
ARIMA is a widely used statistical model for analyzing time-series data, predicting periodic changes, and forecasting future values [26]. AR in ARIMA refers to AutoRegression, which is a model used to identify the relationship of an observation with other lagged observations.
Integrated (I) is a pre-processing step that makes the time series stationary by differencing the observations. Moving Average (MA) is applied to lagged observations using the dependency between an observation and the residual error. The ARIMA model, expressed using lag polynomials, is given by equations 24 and 25 [27]. The values of p, d, and q must be greater than or equal to 0. The model ARIMA(p, 0, q) is known as the ARMA(p, q) model, ARIMA(p, 0, 0) is an AR(p) model, and ARIMA(0, 0, q) is an MA(q) model. In most cases, the time-series data is differenced once (the value of d is 1). The Random Walk model is a special case of the ARIMA model with p = 0, q = 0, and d = 1, in which case y_t can be calculated using equation 26 [27].
Atangana also mentioned the detrimental effects of inadequate testing.
Coronavirus has already spread to 213 countries around the world, including Kuwait. The SIR model is one of the most widely used models when a prediction of a disease outbreak is required. The SIR model has been applied by various researchers to estimate the COVID-19 outbreak in different countries around the world. Brauer indicates that "In order to prevent a disease from becoming endemic it is necessary to reduce the basic reproduction number R 0 below one. This may sometimes be achieved by immunization" [33]. Since a vaccine is not currently available, it becomes essential to give a reasonable estimate for R0 so that healthcare decision makers can take any measures needed to contain the spread of COVID-19. For SIR model estimation, the time-varying infection rate (β(t)) and recovery rate (γ(t)) are the two important parameters used. This research focuses on an SIR-based estimation using the confirmed cases in Kuwait for the period from February 24th, 2020 to May 28th, 2020. The SIR model is simulated using the Python programming language with the help of predefined Python modules or tools such as sklearn, matplotlib, xlrd, xlsxwriter, and math [34].
The sklearn module is an effective machine learning platform built on NumPy, SciPy, and matplotlib; it is used for error calculation using RMSE and R². The graphs are plotted with the help of matplotlib. The data is collected from recognized sources such as the Kuwaiti government's official COVID-19 website [7] and the WHO. The collected data on infections and recoveries are plotted against time and shown in figure 1. The values of S, I, and R at any time (t+1) are calculated from the values of these populations at time t, as given by equations 29, 30, and 31, respectively [35]. The values of dS/dt, dI/dt, and dR/dt are calculated using equations 12, 13, and 14, respectively. Based on the collected data, the cumulative counts for the infection and [36].
The value of R0 determines whether or not a disease will become an outbreak. If the value of R0 is greater than one, the disease will grow exponentially and affect a significant portion of the entire population [36, 22]. R0 reflects that an infected person recovers in 1/γ days and has an average of β contacts [37]. If the value of R0 is below 1, there will not be an outbreak and the infection will decline. This study is conducted using different R0 values. Our estimation is based on the confirmed cases from the 24th of February 2020 to the 28th of May 2020. February 24th, 2020 is when the first five cases were reported. The values of β and γ change over time. In the early stages, the number of infected cases increased slowly, and the recovery rate was zero for the initial cases. Based on this estimation, the infection reaches its peak value between the 23rd of July 2020 and the 22nd of August 2020. The growth rate is observed to be slow in the early stages and to increase gradually. It then increases exponentially until it reaches its peak. After this estimated period, the infection rate starts to decrease and declines gradually.
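
The step-by-step update of S, I, and R and the role of R0 as an outbreak threshold can be illustrated with a short sketch. It is not the authors' code and does not reproduce the reported peak dates: it assumes the standard SIR equations with a one-day step, a constant β = R0·γ, and a hypothetical recovery rate γ = 1/14 per day. The population figure and the five initial cases come from the text; the function names are illustrative.

```python
def run_sir(r_naught, gamma, population, initial_infected, days):
    """Iterate S, I, R one day at a time; returns a list of (S, I, R) tuples."""
    beta = r_naught * gamma  # follows from R0 = beta / gamma
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        # Value at t+1 = value at t + daily change.
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        history.append((s, i, r))
    return history


def peak_day(history):
    """Index of the day on which the infected compartment is largest."""
    infected = [i for _, i, _ in history]
    return max(range(len(infected)), key=infected.__getitem__)


POPULATION = 4_776_407  # PACI population figure quoted in the text
INITIAL_INFECTED = 5    # first reported cases on February 24th, 2020
GAMMA = 1 / 14          # hypothetical recovery rate, assumed for illustration only

for r_naught in (0.9, 2.0, 3.0, 4.0, 5.2):
    history = run_sir(r_naught, GAMMA, POPULATION, INITIAL_INFECTED, days=365)
    day = peak_day(history)
    print(f"R0={r_naught}: peak on day {day}, infected at peak ~ {history[day][1]:.0f}")
```

With R0 below one the infected count is largest on day 0 and only declines, while larger R0 values produce an earlier and higher peak. A time-dependent fit, as used in this study, would replace the constant β and γ with values re-estimated over time.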
In the future, as the period under consideration becomes longer, better estimates may become possible. It should be noted that the data used does not capture external factors that may affect the resulting numbers. One such factor, for example, is the policy to increase the number of COVID-19 tests conducted on random members of the population. At this time, the nature of this pandemic, as well as the responsive measures taken by the government to contain it, are continually changing.
Why is it difficult to accurately predict the COVID-19 epidemic?
Coronatracker: World-wide COVID-19 outbreak data analysis and prediction
WHO, Coronavirus disease (COVID-19) situation report
A review on corona virus
Predicting turning point, duration and attack rate of COVID-19 outbreaks in major Western countries
On predicting the novel COVID-19 human infections by using infectious disease modelling method in the Indian state of Tamil Nadu during 2020, medRxiv
Corona virus COVID-19 updates
The Public Authority for Civil Information PACI official website
Hospital bed occupancy and utilization: Is Kuwait on the right track?
COVID-19 spread: Reproduction of data and prediction using a SIR model on Euclidean network
Agent-based models of malaria transmission: A systematic review
Generalized Linear Models
The Oxford Handbook of Quantitative Methods
Poisson regression, The Southwest Respiratory and Critical Care Chronicles
Estimation of the final size of the coronavirus epidemic by the logistic model (Update 3), medRxiv
Analysis of prediction models in spread of Ebola virus disease
A SIR epidemic model for HIV/AIDS infection
The SIR model and the foundations of public health
Impact of control strategies on COVID-19 pandemic and the SIR model based forecasting in Bangladesh, medRxiv
Epidemic situation and forecasting of COVID-19 in and outside China
An introduction to the basic reproduction number in mathematical epidemiology
COVID-19 pandemic scenario in India compared to China and rest of the world: a data driven and model analysis, medRxiv
SEIR transmission dynamics model of 2019 nCoV coronavirus with considering the weak infectious ability and changes in latency duration
COVID-19 outbreak progression in Italian regions: Approaching the peak by the end of March in northern Italy and first week of April in southern Italy
Estimation of COVID-19 prevalence in Italy, Spain, and France
An introductory study on time series modeling and forecasting
Forecasting the spread of COVID-19 in Kuwait using compartmental and logistic regression models
Estimation of the final size of the COVID-19 epidemic in Pakistan, medRxiv
Estimation of the final size of the coronavirus epidemic by the SIR model, medRxiv
SIR epidemic model with Mittag-Leffler fractional derivative
Modelling the spread of COVID-19 with new fractal-fractional operators: Can the lockdown save mankind before vaccination?
Compartmental models in epidemiology
Learning Python: Powerful Object-Oriented Programming
Observation and model error effects on parameter estimates in susceptible-infected-recovered epidemic model
Inferring R0 in emerging epidemics: the effect of common population structure is small
A time-dependent SIR model for COVID-19 with undetectable infected persons