key: cord-0618605-6yn1cimi
authors: Cintra, P. H. P.; Citeli, M. F.; Fontinele, F. N.
title: Mathematical Models for Describing and Predicting the COVID-19 Pandemic Crisis
date: 2020-06-03
journal: nan
DOI: nan
sha: b5f49677a944a9c74d907ec1d2e9d7079b88e690
doc_id: 618605
cord_uid: 6yn1cimi

The present article studies the extension of two deterministic models for describing the novel coronavirus pandemic crisis, the SIR model and the SEIR model. The models were studied and compared to real data in order to support the validity of each description and extract important information regarding the pandemic, such as the basic reproductive number R0, which might provide useful information concerning the rate of increase of the pandemic predicted by each model. We next proceed to making predictions and comparing more complex models derived from the SEIR model with the SIRD model, in order to find the most suitable one for describing and predicting the pandemic crisis. Aiming to answer the question if the simple SIRD model is able to make reliable predictions and deliver suitable information compared to more complex models.

Back on 2015, a group of researches described the potential for a SARS coronavirus circulating inside bats to mutate to humans [1] . Early on 2020 the world suffers from an new pandemic crisis caused by the novel coronavirus, SARS-CoV-2, belonging to the Betacoronavirus genus and with probable origin on bats [2] . The first cases of the novel virus date back to December 2019 at the Food Market of Wuhan, China [3] , where bats are sold among other exotic animals, since then the virus has been spreading throughout China, later Asia, Europe, Africa and America, causing a global scale economic crisis and being notified by the World Health Organization as an pademic on March 11th. COVID-19 is a respiratory disease caused by the Sars-Cov-2 virus, currently in human-to-human sustained transmission [4] . We know that the contamination mainly occurs by a close interaction with infected individuals, as the viral charge is carried by respiratory droplets that can remain in suspension in air or deposited on surfaces of common contact. As a novel strain of the Coronaviridae family, it is not expected that any individual has antibodies against it, which causes the entire population to be susceptible to infection. As an individual is exposed to the virus, the incubation period begins, with no symptoms and a small chance of contaminating others. When the virus is onset, the infected individual show symptoms in a varied range of intensities and may develop severe acute respiratory syndrome. COVID-19 has a general mortality rate bellow 5% [5] , with an average of 2.3%. The behavior of the disease is age dependent, with the higher risk group being older populations, that present a mortality rate of 8% for individuals between 70-79 years and 14.8% for people older than 80 years [6] . However, even with a low mortality rate, the number of hospitalizations is quite high, with 5% of the cases being critical and 14% being severe [7] , presenting an challenge to health care systems of some countries.

Mathematical models predicted the potential for an international outbreak early on [8] and described how Wuhan became the center of an epidemic crisis on China. The outbreak quickly spread throughout mainland China and other countries. Although the pandemic crisis began on Wuhan, today the United States of America is the epicenter of the pandemic crisis on the world.

Many models are widely used from scenario prediction [9, 10, 11, 12 ] to data analysis [13] . Among all those models, the most used ones are the SIR and SEIR models [14] , including their modifications to include hospitalizations, asymptomatic cases and other compartments [15] . We develop here a comparative study between modified SIR and SEIR models; in order to provide support for the accuracy of both models, we compare the differences on the fitting process between the simple SIRD and simple SEIRD models and the accuracy of prediction generated by the simple SIRD and the SEIRD model with age division, using data from Germany and the Republic of Korea. The choice was based on the accuracy of the data for representing the true scale and dynamics of the pandemic, other countries who presented a much lower testing rate such as Brazil [16] are not reliable sources for testing models describing the disease. There are also countries such as Taiwan and Iceland, which kept track of the disease; however the number of cases in those countries were much lower, escaping the deterministic nature of the models described here.

Mathematical models for disease epidemic are either deterministic or stochastic [17] , where the first may be considered some sort of thermodynamic limit of the second. An analogy made with thermodynamics, where given a big enough number of particles in a gas, for example, you find deterministic equations to describe the behavior of the gas given by the laws of thermodynamics without the need to know the exact behavior of each particle. Otherwise, when your number of molecules is low or you try to compute too many interactions between particles of your system, the random behavior and fluctuations start taking place and you get to a stochastic model.

We describe here a simple extensions of models constantly used on literature [18, 19, 10] and with it show some possible behaviors of a disease outbreak.

A simple mathematical model for disease epidemic can been built dividing the population in 3 groups: susceptible individuals (S), infected individuals (I) and recovered individuals (R). The model composed by these groups is called the SIR model. In this article, however, we consider also individuals who have died by the disease, denote by D. Following the same arguments of the SIR model, the SIRD model can be described by the set of four differential equations:

Last equation is easily understood by thinking that the variation of the number of deaths may be proportional to the infected individuals, where the proportionality constant is denoted by µ. The constants γ and β are, respectively, the recovery rate and the number of infected, where µ and γ are given in terms of the infection fatality rate (IFR) or the case fatality rate (CFR); that is, the number of people who contracted the disease and died according to the total number of infections (IFR) or the registered number of infections (CFR), and the average time taken from symptoms onset to recovery, τ r , or death τ d , formally µ = P CF R /τ d and γ = (1 − P CF R )/τ r . The equations are simply a mathematical way to describe how individuals passes from one group to the other according to the following chain of events: A susceptible individual becomes infected by the virus, and from this point, it either dies or recovers ( Figure 1 ).

Infected I(t) Summing the four equations we get

where the constant may represent the total number of individuals, N . Before proceeding, we propose the initial condition that, when t goes to zero, I(t) = I 0 , R(t) = D(t) = 0, and, therefore,

Such an assumption is based on the fact that the entire population is susceptible to the SARS-CoV-2 virus.

Since R(t) and D(t) are both data updated day by day in Germany and Korea, it would be helpful to write I(t) as function of them so as to predict its behavior, obtaining, for example, the maximum number of infected individuals. For reasons that may be clear soon, we first write I(t) in terms of S(t). An intuitive step is to divide equation (2) by (1). Thus,

where k = (γ + µ)/β. Eliminating the temporal dependence, we get a separable differential equation, that is,

which the solution is easily verified to be

Applying the initial condition, we obtain

Hence, equation (8) may be written as

We can visualize here, that depending on the combination of γ and µ, I reaches 0 before the entire population S becomes infected ( Figure 2 ). Next, we may write S(t) in terms of R(t) and D(t). For this purpose, we begin by dividing equation (1) by (3) and (1) by (4),

Adding these two equations and writing S(R, D) as S(R, D) = f (R)g(D), we get

Since (13) is a separable equation, the well-known solution is given by

Therefore, S(t) can be written as

where we absorbed both constants A and B into C. By the initial condition, we find that C = N − I 0 . To find a and b, we must derive (16) in time under the condition that it may return to equation (1) . In this way, we see that

By the other hand, substituting equations (14) and (15) in (13), we get

Solving this system,

Hence, I(t) can be finally written as

With this equation, see that as t → ∞, I does not approaches N necessarily, depending on the recovery and death rates, I does not reach N .

The last important quantity extracted from this model is the basic reproduction number R 0 , given by [14] :

This quantity, is of vital importance of the study of a disease outbreak.

Another deterministic mathematical model possible is the SEIRD model, in which we consider the population N of a given region as divided in 5 groups. At time t, there are those who are susceptible to get infected S(t), the ones who have already been exposed the virus but does not present symptoms yet E(t), people who are already infected and present the symptoms I(t), the ones that have already recovered from the disease R(t) and those who are dead due to the infection D(t). This model is a good approximation to a short epidemic, so the population of a region is roughly constant throughout the epidemic period. Also, since this is a deterministic model, we assume N to be a big number compared to the number of people associated with the infection of a single person. The final consideration is that we also assume that people that are recovered from the disease acquire immensity and does not become susceptible to become infected again.

The rate of infection λ is proportional to the number of people infected, λ(t) = βI(t), where the constant β represents the effectiveness of the infection, the rate of cure γ = P :) τ −1 r , where P :) is the probability of recovery and τ r is the average time taken for an infected person to recover. Similarly the rate of death is µ = P CF R τ −1 d , where P CF R = 1 − P :) is the probability of death, given by the CFR and τ d is the average time taken for an infected person to die. Figure 3 carries an visual representation of the SEIRD model.

Infected I(t) Figure 3 : Representation of a SEIRD model, a susceptible person gets exposed to the virus, being infected afterwards and either dies or recovers from the disease.

The differential equations representing the evolution of the populations are given by

We first turn our attention to the construction of an appropriate formula for calculating R 0 with this model. For that we follow the method derived on [20] . The study develops a mathematical generalization for writing R 0 depending on the type of epidemiological model. R 0 is defined as

where ρ(X) means the spectral radius of the matrix X, that is, the largest absolute eigenvalue. Both F and V are the matrices of the derivatives of the functions defining the behavior of the disease population, with respect to each population compartment.

To get to these matrices, we first note that the set of equations regarding the dynamics of the SEIRD model can be expressed as follow: Consider x the vector of populations, that is

Analogously, d x/dt is the vector of the first derivatives. Then, we can write the dynamics of the populations as

where F is the vector that relates the appearance of new infections on the disease populations due to contamination, and V is the input and output of members in all populations due to all other causes, such as recovery from the disease, development of symptoms after an incubation period, etc. In our case, since all newly infected members go to the E population

Now, we know that the situation of a disease free equilibrium (DFE), meaning no disease is happening, is achived by the vector x 0 = (0, 0, S 0 , 0, 0), where S 0 = N . According now to [20] we can calculate F and V as

being x j the vector components of x related to the populations with the disease, in our case E and I, and m is the number of populations related to infectious beings. Here m = 2. Performing the derivatives, we conclude

The next step is to find the inverse matrix of V , fortunately V is a 2x2 matrix and the formula for it's inverse is straightforward

and we proceed to the last step of combining F V −1 in order to retrieve ρ(F V −1 ) and find R 0 .

therefore, by computing the eingenvalues of F V −1 we find

Having R 0 in our hands, we continue to the study of some behaviors of this model.

The set of equations describing the model is subjected to the initial conditions. When t → 0,

where I 0 is the initial number of infected, E 0 is the initial number of exposed in the population and no deaths or recoveries are assumed at t = 0.

Without vaccines or efficient medicine against the disease, non-pharmaceutical interventions are the only effective way to prevent further increase of the pandemic [13] . These interventions take different approaches such as social distancing, social isolation and lockdown of the population. Despite the differences, they all carry the same objective, decreasing the infection rate β. It is convenient to implement the effect of these interventions on the model, when making predictions. Here, we model this effect by a logistic function, where β starts at a initial value β i and at some critical time t c a intervention is imposed and beta decreases to β f = P dec β i , where P dec is the fraction of β i decreased by the intervention. In France, studies estimate that the intervention decreased β i by 77% [21] , therefore P dec = 0.77 in France. 

Since the case fatality rate (CFR) of COVID-19 is different among age groups [6, 7, 22] , we propose here a modification on both models, including the age distribution of the population and the social aspects of close contact between members of the population. The modification is describe as follow: Each compartment is divided into M age groups, where each i-th group has a P CF Ri associated to it, that is, the probability of death associated to the i-th age group. The β parameter is now described as the average number of daily contacts between a member belonging to the i-th age group to the j-th age group, multiplied by the infection probability P inf c

where C ij is called the social contact matrix and we included I j and E j inside β now to place everything on the same sum. The age distribution among the population is retrieved from the UN prospects [23] and the social contact matrix for those countries was measured on previous studies [24, 25] . The specific contact matrix for the Republic of Korea was not found, however, [26] finds evidences of cultural clusters in the world, where countries belonging to the same cluster share cultural similarities; thus, we use this fact to justify the use of Hong Kong's social contact matrix to describe the Republic of Korea. That way, we include cultural and population aspects for each of those countries, increasing the odds of a successful prediction. This type of model was used recently to describe the coronavirus outbreak on large cities in Brazil [27] .

To test the SIRD and SEIRD model we first compare them to the pandemic crisis on the Republic of Korea, running a numerical solution for the differential equations (23) - (27) we adjust the general behavior of the populations to Korean data acquired from [28] since 15/02/2020. The data from the Republic of Korea consists of the infection curve and death curve. To prevent problems with initial guess on the fitting process, both models used the same values for the initial guess, except E 0 , which is found only on the SEIRD model. S 0 = N was also left as a free parameter of the adjustment instead of set to the total population of the country, which is justified by a limitation in both models, where the population is assumed homogeneously spread, which does not correspond to reality. Thus, N does not represent the total population, instead it represents an effective population smaller than the total population, due to non-homogeneous distribution throughout the territory, the interpretation of N as the disease evolves is discussed on the discussion session. The parameter was chosen to be k = 0.44β, we considered a study which estimated that presyntomatic cases caused 44% of infections [29] , while for c we used an average of several clinical studies shown on Since the Korean government did not impose a lockdown or social isolation, we set P dec = 0 in both models. Figures 5  and 6 show the result of the fitting process and table 2 includes the acquired values for all parameters for each model. 

0.478 ± 0.004 0.513 ± 0.008 N 11035 ± 57 11218 ± 48 Table 2 : Parameters found by the adjustment with both models

The recovery time on both models is close to 8 days, while other studies such as [34] found 10 days. The time from symptoms onset to death was in both models close to 30 days, being 1 day shorter with the SEIRD model, [35] and [36] found τ d = 18 or 11 days.

Comparing the accuracy of the fitting with the data, both models resulted the same value of χ 2 . The parameter E 0 presents a large margin of error, which is expected given the lack of real data concerning the exposed population.

Proceeding to the calculation of R for both models, using equations (39) and (22) we found R SEIRD = 1.92 ± 0.07

The value for R 0 according to other studies ranges from 2 to 3 [37, 38, 39, 40] , therefore, both models yield acceptable values for R being the one predicted by the SEIRD model lower.

We now proceed to test the prediction accuracy of both models. We used the first third of the data for fitting both models and extracting parameters, after having the parameters, we compare the prediction for the next days with these parameters with the rest of the dataset.

For Germany, the fitting data corresponds to the cases and deaths until 17th March. However, until the peak is reached, both models find presents very large margin of error for N , to avoid this problem, we varied N manually, from 0 to 10% of the local population; which was taken from a united nation prospect for the year of 2020 [23] , at steps of 0.05%. At each step, we fit the initial data and reject the fitting if the χ 2 value is lower than 0.995, this χ 2 method for validating the goodness of a data adjustment had already been used for epidemiological models [41] . We than plotted the maximum and minimum acceptable fits to generate the margin of prediction, comparing it with the complete dataset. We also decided to use τ r and τ d according to clinical studies when performing the prediction, instead of leave them as free parameters for the fitting, I 0 was also determined a priori according to the first registered number on 15/02/2020. The resulting free parameters for fitting the training set are P inf c and E 0 .

With the SEIRD model, we found the maximum and minimum values of N to be N min = 0.25% of the German population and N max = 0.40% of the German population. The P inf c parameter varied from 15.5% to 16.2%

Using the SIRD, leaving β and I 0 as free parameters. The limit values of N were N min = 0.15% and N max = 0.2%, while β varied from 0.247 to 0.279 and I 0 from 11 to 30. Figures 7 and 8 show the result of both simulations, with the maximum N max and minimum N min curves. The shaded region is the region between N max and N min . 

The training set consisted of 20 days, corresponding to the infections from 15/02 to 06/03. The SEIRD model found N min = 0.03% and N max = 0.04% of the total Korean population, while P inf c varied between 80 to 85%. The simple SIRD model found N min = 0.02% and N max = 0.03%, β went from 0.345 to 0.436, and I 0 was between 23 to 65. Figures 9 and 10 present the result for prediction of both models. 

When concerning the adjustment process for acquisition of parameters with both models, there were no difference on the accuracy of the fit, and both models yielded very close values for the parameters. However, τ d is super estimated in both models, being slightly lower on the SEIRD model. The value of τ r is acceptable inside the variation of clinical measures.

The SEIRD model yields a slower growth rate than the SIRD model, that might happen due to the incubation period on the SEIRD model, which slows down the propagation of the virus towards other individuals. The main difference between the growth rate predicted by both models is better visualized by figure 11 , the action of the incubation period slows down the rate of infection, as seen by the adjustments, but also decreases the peak of infections. However, the cumulative numbers of infection, deaths and recoveries are the same. N could be understood as the population susceptible to the first pandemic wave, due to the non-homogeneous distribution of the population, not everyone is susceptible to the disease right at the start. With such an interpretation, N tends to increase with time and approach the total population, here. Comparing predictions generated by the SIRD model with the SEIRD model with age division, the SEIRD model becomes a little more precise, although both simulations fail to predict the slower decrease of Korean data, that might be explained by an increase on N as time passes, resulting in new cases registered and therefore, slowing down the rate of decrease. Such hypothesis is well acceptable since the Republic of Korea did not adopt any lockdown or social isolation measure, making the disease able to propagate towards other regions, increasing N with time. Even with better prediction, the SEIRD model is far more complicated than the SIRD model and the use of the later should probably not compromise any data analysis. The same must hold true for simple SIR and SEIR models, when deaths are not a population to be accounted for, instead are just represented with a rate of removal for individuals. The social isolation model developed here shows good results on the predictions, indicating that the description of β should be close to reality. Here we find a huge advantage of the SEIRD model with age division in comparison with the SIRD model; by including age division, it is possible to simulate the effect of specific non-pharmaceutical interventions, such as school closure, which in principle would decrease β i I and β i E for the age groups between 0 to 19 years only. Another possibility is to include isolation of only elderly individuals. Several non-pharmaceutical measures have been already described in literature [42] , other studies show how the total number of infected might be changed due to the efficiency of non-pharmaceutical measures [43] .

Other models might present more complete analysis of the disease, including hospitalizations and even asymptomatic cases, which are difficult to track and seem to vary a lot from place to place, the Diamond Princess cruise ship found 17.9% of asymptomatic infections [44] , while an airplane flight found 11.2% of cases being asymptomatic. An Italian village presented 50 to 70% of cases being asymptomatic [45] . There are yet the problem of assuring that the asymptomatic cases registered on studies are really asymptomatic and not presymtomatic, that is, are people still on the incubation period.

Of course, any mathematical model is only as good as the data allows, using mathematical models to describe the disease on countries with low testing rates might yield unrealistic predictions. For example, [46] estimates 86% of infections being undocumented on China, at the early stages of the outbreak.

Another consideration we did not take, was the possibility of reinfection, where individuals leave the recovered group and re-enter the susceptible compartment. However, since other coronaviruses belonging to the same genus betacoronavirus such as the SARS-CoV and the MERS-CoV does not present a high enough mutation rate to cause reinfection in short term [47] , the only cause of reinfection would be the loss of antibodies to fight the virus; nevertheless, on both diseases, the infected person acquires antibodies enough to prevent reinfection for a period of 2 -3 years [48] . With those considerations, we did not assume reinfection was probable on short-term. Future studies may be conducted to study the possibility of reinfection of individuals on the long-term.

Mathematical models of a disease outbreak such as the COVID-19 are able to predict the behavior of the infection. Both models have proved to be efficient tools for acquiring data and forecast the future situation. Despite the limitations, the models made it possible to achieve a value of R 0 in good agreement with other studies, providing evidence in favor of the validity of the model. However, the present models do not take into consideration the spatial distribution of the population, reflecting on some uncertainties that made the window of prediction larger.

A sars-like cluster of circulating bat coronaviruses shows potential for human emergence

Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding

Early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia

World Health Organization et al. Coronavirus disease 2019 (covid-19): situation report

Features, evaluation and treatment coronavirus (covid-19)

The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (covid-19)-china

Characteristics of and important lessons from the coronavirus disease 2019 (covid-19) outbreak in china: summary of a report of 72 314 cases from the chinese center for disease control and prevention

Nowcasting and forecasting the potential domestic and international spread of the 2019-ncov outbreak originating in wuhan, china: a modelling study

Estimating and simulating a sird model of covid-19 for many countries, states, and cities

A simulation on potential secondary spread of novel coronavirus in an exported country using a stochastic epidemic seir model

Projecting demand for critical care beds during covid-19 outbreaks in canada

The global impact of covid-19 and strategies for mitigation and suppression. WHO Collaborating Centre for Infectious Disease Modelling, MRC Centre for Global Infectious Disease Analysis

Inferring change points in the spread of covid-19 reveals the effectiveness of interventions

Athanasios Tsakris, and Constantinos Siettos. Data-based analysis, modelling and forecasting of the covid-19 outbreak

Estimating the reproductive number and the outbreak size of covid-19 in korea

Estimative of real number of infections by covid-19 on brazil and possible scenarios. medRxiv

On a stochastic epidemic seihr model and its diffusion approximation

Estimations of the coronavirus epidemic dynamics in south korea with the use of sir model

Statistics based predictions of coronavirus 2019-ncov spreading in mainland china

Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission

Estimating the burden of sars-cov-2 in france

Characteristics of covid-19 infection in beijing

Department of Economic and Social Affairs

Social contacts and mixing patterns relevant to the spread of infectious diseases

Social contact patterns relevant to the spread of respiratory infectious diseases in hong kong

Cultural clusters: Methodology and findings

Expected impact of covid-19 outbreak in a major metropolitan area in brazil. medRxiv

Temporal dynamics in viral shedding and transmissibility of covid-19

Incubation period of 2019 novel coronavirus (2019-ncov) infections among travellers from wuhan, china

Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data

Clinical characteristics of coronavirus disease 2019 in china

The incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application

Chest ct findings in coronavirus disease-19 (covid-19): relationship to duration of infection

Clinical predictors of mortality due to covid-19 based on an analysis of data of 150 patients from wuhan, china

Characteristics of sars-cov-2 patients dying in italy

Preliminary estimation of the basic reproduction number of novel coronavirus (2019-ncov) in china, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak

The reproductive number of covid-19 is higher compared to sars coronavirus

Estimation of the reproductive number of novel coronavirus (covid-19) and the probable outbreak size on the diamond princess cruise ship: A data-driven analysis

Novel coronavirus 2019-ncov: early estimation of epidemiological parameters and epidemic predictions. medRxiv

Prediction modeling with data fusion and prevention strategy analysis for the covid-19 outbreak

An optimal control theory approach to non-pharmaceutical interventions

Impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand

Estimating the asymptomatic proportion of coronavirus disease 2019 (covid-19) cases on board the diamond princess cruise ship, yokohama, japan

Covid-19: identifying and isolating asymptomatic people helped eliminate virus in italian village

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov2)

Moderate mutation rate in the sars coronavirus genome and its implications

Two-year prospective study of the humoral immune response of patients with severe acute respiratory syndrome

The age division does not change the prediction drastically, suggesting that in the case of a simple prediction or analysis, SIRD models are useful. The age division SEIRD model provides an advantage when requiring specific simulations on specific groups of the population.