key: cord-0985196-le7mxh9g
authors: Ananthakrishna, G.; Kumar, J.
title: A reductive analysis of a compartmental model for COVID-19: data assimilation and forecasting for the United Kingdom
date: 2020-05-29
journal: nan
DOI: 10.1101/2020.05.27.20114868
sha: a0a605b90bcaf66965b5ade77fc94b88742d0855
doc_id: 985196
cord_uid: le7mxh9g

We introduce a deterministic model that partitions the total population into the susceptible, infected, quarantined, those traced after exposure, recovered, and deceased. We introduce the concept of 'accessible population for transmission of the disease', which can be a small fraction of the total population, for instance when interventions are in force. This assumption, together with the structure of the set of coupled nonlinear ordinary differential equations for the populations, allows us to decouple the equations into just two equations. These further reduce to a logistic type of equation for the total infected population. The equation can be solved analytically and therefore allows for a clear interpretation of the growth and inhibiting factors in terms of the parameters in the full model. The validity of the 'accessible population' assumption and the efficacy of the reduced logistic model are demonstrated by the ease of fitting the United Kingdom data for the total number of infected cases. The model can also be used to forecast further progression of the disease. The approach further helps us to analyze the original model equations. We show that the original model equations provide a very good fit with the United Kingdom data for the cumulative number of infections. The active infected population of the model is seen to exhibit a turning point around mid-May, suggesting the beginning of a slow-down in the spread of infections. However, the rate of slowing down beyond the turning point is small and therefore the cumulative number of infections is likely to saturate to about $3.8 \times 10^5$ only towards the end of July or beginning of August, provided the lock-down conditions continue to prevail. Noting that the fits obtained from the reduced logistic equation and the full model equations are equally good, the underlying causes for the limited forecasting ability of the reduced logistic equation are elucidated. 
The model and the procedure adopted here are expected to be useful in fitting the data for other countries and forecasting the progression of the disease.

The highly contagious SARS-CoV-2 has infected more than five million people worldwide since its first detection in China on December 31 [1]. The novel coronavirus is the fourth wave in the class of coronaviruses. In less than two months, the virus has spread all over the world, posing serious threats to health care systems and economies. The alarming speed of transmission, the virulence of the disease, and the unprecedented high proportion of fatalities even in countries with high healthcare indices have raised questions about what kind of interventions are appropriate for a given setting. The wide variability in infected numbers and fatalities in different countries and settings has also brought into sharp focus a debate about the underlying causes of the variability. In the absence of any treatment for the disease and with no vaccines available in the near future, policy makers have resorted to standard epidemiological interventions, such as social distancing, isolation, contact tracing, and quarantining, and more recently a complete lock-down.

At a basic level, the purpose of all non-pharmacological interventions is to control disease transmission by limiting the proportion of population exposed to the virus as much as possible. Furthermore, inherent in the process of implementation of these interventions are delays at each stage. The delay time-scales are specific to the particular intervention.

The importance of mathematical models describing the spreading dynamics of infectious diseases has been recognized since early days [2]. In particular, it is well recognized that timely models that include realistic features have often been helpful in decision making on health care issues [2] [3] [4]. In the short period since the emergence of the coronavirus, there have been several mathematical models [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17], to name a few. Several of these models attempt to evaluate the contribution from different interventions, such as contact tracing and isolation [10, 11, 18], travel restrictions [16, 17], social distancing [19, 20], lock-down measures [16, 21, 22], and a combination of several of these interventions [12, 23, 24]. These models broadly fall into three categories: deterministic, stochastic, and simulations. Several new mathematical techniques used in different disciplines have been employed to gain insights that would not be possible with the traditional approaches in the field. These include the human mobility model [16], differential evolution [19], heuristic optimization technique [19], stochastic agent-based discrete time simulation [25], supply chain risk simulations [26], etc.

One class of epidemiological models attempts to describe the transmission dynamics by partitioning the population into smaller subsets based on the disease status, such as the susceptible, exposed, infected, quarantined, recovered, etc. [5] [6] [7] [8] [9]. Most models of this kind ignore age-dependent infection and fatality rates, and the heterogeneous spatial distribution of the population. In a sense, these models describe the evolution of the mean response of each type of population. Despite these limitations, these models have the ability to include several realistic features.

In compartment-type models, the disease status of individuals changes with the development of the disease, i.e., transitions occur between two compartments either due to interaction of the infected with the susceptible or due to interventional actions. These models include delay time-scales inherent in the dynamics of transmission, for instance, the period spent in quarantine and the time required for tracing individuals exposed to the infected. These models have the ability to include several realistic features, such as the response of the population to interventional measures. However, inclusion of more and more realistic features generally requires a larger number of partitions. Then, the number of differential equations increases and so does the number of parameters, making calibration of the parameters difficult [5] [6] [7] [8] [9] [13].

Motivated by the complexity of such models, we have devised a compartment-based model having susceptible, infected, quarantined, traced, recovered, and deceased populations. The susceptible and the infected form the core populations in the sense that it is through these two populations that inward/outward transitions occur with other populations. We introduce the concept of 'accessible population for infection', assumed to be a small fraction of the total population. The validity of this assumption can be seen by noting that the purpose of interventions is to minimize the exposure of the population to virus transmission, thereby limiting the spread of infection. We further assume that the order of magnitude of the accessible population is similar to that of the infected population. This assumption is made more quantitative later. Together with the structure of the model equations, it allows us to decouple them into two equations. These two equations further reduce to a logistic type of equation for the total infected population with well-defined parameters, namely, the 'testing rate' and 'contact rate' transmission parameters [27, 28]. The equation can be solved analytically, thereby allowing for a clear interpretation of the parameters controlling the growth and inhibiting factors. The validity of the 'accessible population' assumption and the efficacy of the reduced model are demonstrated by the ease of fitting the cumulative number of infections for the United Kingdom (UK). The procedure further allows us to forecast the progression of the disease. Using this information and calibrating the relative importance of various transition rates (equivalently the associated parameters), we optimize the parameter values specific to the UK. Using these, we numerically solve the full model equations. The calculated total infected population fits very well with the available data for the UK [29]. (The UK does not publish data on the recovered and the active populations.) 
The model exhibits a turning point in the active infected population around May 15. However, since the rate of slowing down beyond the turning point is poor, the projected end time of the epidemic would be around late July or early August and the predicted saturation level of the total number of infections is $\sim 3.8 \times 10^5$, assuming lock-down conditions continue.

The total population $N$ is partitioned into the susceptible $S$, active infected $I$, quarantined $Q$, those traced $T$ after being exposed to the infected, recovered $R$, and the deceased $D$. The respective populations are denoted by $N_s$, $N_i$, $N_q$, $N_t$, $N_r$ and $N_d$.

Testing is one of the standard protocols used for identifying the infected. If $\alpha_s$ is the rate of testing per day per million and $p_s$ is the probability of testing positive, then $\alpha_s p_s N_s$ is the transition rate from $S$ to $I$. Infected individuals coming into contact with the susceptible class can transmit the virus. If $\beta_i$ is the transmission rate per contact, $p_i$ is the probability of transmission of the disease, $F(d_i)$ is a distance-dependent interaction, and $f_i$ is the proportion of the susceptible coming in contact with the infected, then $f_i p_i \beta_i F(d_i) N_i N_s$ is the transition rate from $S$ to $I$. Since one of the primary routes of transmission is through airborne aerosols generated by the infected, a larger separation is known to reduce the risk of transmission [19, 20, 30]. This distance dependence enters through the factor $F(d_i)$.

However, in the present context, where we deal with a lock-down situation for most of the progression of the disease, we set $F(d_i) = 1$.

During testing, some individuals would always exhibit mild or ambiguous symptoms. These are identified as pre-symptomatic. If the probability of finding the pre-symptomatic is $p_q$, then $\alpha_s p_q N_s$ is the transition rate from $S$ to $Q$. Subsequently, when tested again, say after a quarantine duration [5, 31, 32], some of them may test positive with a probability $q_1$ or negative with a probability $(1 - q_1)$. If positive, the transition rate out of $Q$ (to $I$) is $q_1 \lambda_q N_q$. Here, $1/\lambda_q$ is the quarantine duration, usually of the order of the incubation period [31, 32]. Similarly, if tested negative, the transition rate out of $Q$ into $S$ is $(1 - q_1)\lambda_q N_q$. The total loss rate in $\dot{N}_q$ is $\lambda_q N_q$. Tracing those exposed to the infected, and testing to find whether they are infected, are important steps in controlling the spread of the disease. Inherent in tracing such individuals are delays, which cause increased transmission of the disease. If $p_t$ is the probability of tracing such individuals, then $\alpha_t p_t N_s$ is the transition rate from $S$ to $T$. Subsequently, individuals testing positive move to the infected compartment $I$ with a probability $q_2$ and the rest, with a probability $(1 - q_2)$, move to $S$. The total transition rate out of $T$ is $\lambda_t N_t$, where $1/\lambda_t$ is the time taken to trace the individuals. (There is also another possibility, namely, some individuals may show mild symptoms. Then, there would be a transition into $Q$. For the sake of simplicity, we have ignored this route.) Finally, the outward transitions from $I$ are the recovery and death rates, $\gamma_r N_i$ and $\kappa_d N_i$, respectively.

Collecting these terms, we have the following set of coupled nonlinear ordinary differential equations
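Assembling the transition rates described above term by term suggests the following system. This is a reconstruction from the verbal description, not a copy of the published equations; the numbering is chosen to match the later references to Eqs. (1-6), and $F(d_i)$ is already set to unity.

```latex
\begin{align}
\dot{N}_s &= -\alpha_s p_s N_s - f_i p_i \beta_i N_i N_s - \alpha_s p_q N_s - \alpha_t p_t N_s
             + (1-q_1)\lambda_q N_q + (1-q_2)\lambda_t N_t, \tag{1}\\
\dot{N}_i &= \alpha_s p_s N_s + f_i p_i \beta_i N_i N_s + q_1 \lambda_q N_q + q_2 \lambda_t N_t
             - (\gamma_r + \kappa_d) N_i, \tag{2}\\
\dot{N}_q &= \alpha_s p_q N_s - \lambda_q N_q, \tag{3}\\
\dot{N}_t &= \alpha_t p_t N_s - \lambda_t N_t, \tag{4}\\
\dot{N}_r &= \gamma_r N_i, \tag{5}\\
\dot{N}_d &= \kappa_d N_i. \tag{6}
\end{align}
```

In this form the six right-hand sides sum to zero, so the total population $N$ is conserved, which is a useful consistency check on the reconstruction.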

(Here, we have suppressed the $F(d_i)$ factor since it has been set equal to unity.) Note that the total infected population is given by

To begin with, we highlight a few features of the model equations. Our model, much like other compartment-type models, has several parameters. However, several of these are directly measurable and therefore can be obtained from the literature. A few others are related to testing protocols and again can be obtained from the literature or from relevant open sources [29]. For instance, $\alpha_s p_s$, $\alpha_s p_q$ and $\alpha_t p_t$ are directly related to testing rates and are therefore known for a given situation. A few other rate parameters, such as $\lambda_q$, $\lambda_t$, $\gamma_r$ and $\kappa_d$, are inversely related to measurable time-scales, namely the duration of quarantine $\tau_q$, the time required for tracing $\tau_t$, the time for recovery starting from illness $\tau_r$, and the time from illness to death $\tau_d$, respectively [32, 33].

medRxiv preprint doi: https://doi.org/10.1101/2020.05.27.20114868; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

The present model includes two delay loops defined by Eqs. (3) and (4). These delays are natural to the implementation of the protocols. For instance, once individuals are quarantined, subsequent tests are conducted after the quarantine duration to identify whether they test positive or negative. Similarly, delays in tracing individuals are common. A more transparent way to describe these delay loops is through the integral representation of Eqs. (3) and (4), which forms the definitions of the two populations $N_q$ and $N_t$, respectively. For instance,

When the kernel $K(t)$ is modeled using an exponential form with a single time scale $1/\lambda_q$, i.e., $K(t) = e^{-\lambda_q t}$, one can easily verify that differentiating Eq. (7) (using the Leibniz rule) leads to Eq. (3). The convolution form of the integral physically implies that those quarantined earlier will leave the quarantine sooner than those quarantined later.
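The step from the integral form to the rate equation can be written out explicitly. The sketch below assumes that the integrand is the inflow $\alpha_s p_q N_s$ into $Q$ and that $N_q(0) = 0$, which is what the description of Eq. (3) requires; the published Eq. (7) may be written differently.

```latex
\begin{align}
N_q(t) &= \int_0^t K(t-t')\,\alpha_s p_q N_s(t')\,dt', \qquad K(t) = e^{-\lambda_q t},\\
\dot{N}_q(t) &= K(0)\,\alpha_s p_q N_s(t)
               + \int_0^t \frac{\partial K(t-t')}{\partial t}\,\alpha_s p_q N_s(t')\,dt'\\
             &= \alpha_s p_q N_s(t) - \lambda_q N_q(t).
\end{align}
```

The first term comes from the upper limit of the integral in the Leibniz rule, and the second uses $\partial_t K(t-t') = -\lambda_q K(t-t')$. The same argument with $\alpha_t p_t$ and $\lambda_t$ would give the tracing loop.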

Equations (1-6) constitute a set of coupled nonlinear differential equations. A standard procedure for further analysis of such equations is through numerical integration. Recall that our model is devised in such a way that there are two main populations, namely, the susceptible (Eq. 1) and the infected (Eq. 2). Furthermore, Eqs. (5, 6) are essentially decoupled from the rest (transitions to $R$ and $D$ are from $I$). These two features suggest that Eqs. (1, 2) can be decoupled from the rest of the equations. We refer to the decoupled equations as the reduced model equations. Since the two equations can be further reduced to a logistic-type equation (referred to as the reduced logistic equation), they can be solved analytically. As we shall see, analysis of this equation provides insights that prove to be useful for the analysis of the full model Eqs. (1-6). (We shall often refer to Eqs. (1-6) as full model equations to avoid confusion.)
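As an illustration of the numerical route, the following is a minimal sketch of a fixed-step fourth-order Runge-Kutta integration of the six compartments. The right-hand side is assembled from the transition rates described in the text with $F(d_i) = 1$; every numerical value, the dictionary keys, and the definition of the cumulative infected as $N_i + N_r + N_d$ are assumptions made for illustration and are not the paper's Table I values.

```python
# Sketch: fixed-step RK4 integration of the six-compartment model.
# All parameter values are illustrative placeholders, NOT the paper's values.

def rhs(y, p):
    """Time derivatives of (N_s, N_i, N_q, N_t, N_r, N_d), with F(d_i) = 1."""
    n_s, n_i, n_q, n_t, n_r, n_d = y
    test = p["a_ps"] * n_s             # alpha_s p_s N_s          : S -> I
    contact = p["fpb"] * n_i * n_s     # f_i p_i beta_i N_i N_s   : S -> I
    to_q = p["a_pq"] * n_s             # alpha_s p_q N_s          : S -> Q
    to_t = p["a_pt"] * n_s             # alpha_t p_t N_s          : S -> T
    out_q = p["lam_q"] * n_q           # total outflow from Q
    out_t = p["lam_t"] * n_t           # total outflow from T
    removal = (p["gam_r"] + p["kap_d"]) * n_i
    return [
        -test - contact - to_q - to_t
        + (1 - p["q1"]) * out_q + (1 - p["q2"]) * out_t,
        test + contact + p["q1"] * out_q + p["q2"] * out_t - removal,
        to_q - out_q,
        to_t - out_t,
        p["gam_r"] * n_i,
        p["kap_d"] * n_i,
    ]

def rk4_step(y, h, p):
    """Advance the state y by one classical RK4 step of size h (days)."""
    def shifted(k, s):
        return [a + s * b for a, b in zip(y, k)]
    k1 = rhs(y, p)
    k2 = rhs(shifted(k1, 0.5 * h), p)
    k3 = rhs(shifted(k2, 0.5 * h), p)
    k4 = rhs(shifted(k3, h), p)
    return [a + h / 6.0 * (b + 2 * c + 2 * d + e)
            for a, b, c, d, e in zip(y, k1, k2, k3, k4)]

def integrate(y0, p, days, h=0.05):
    """Return the list of states from t = 0 to t = days."""
    traj = [list(y0)]
    for _ in range(int(round(days / h))):
        traj.append(rk4_step(traj[-1], h, p))
    return traj

# Hypothetical parameter set and initial condition (assumptions).
params = {
    "a_ps": 1.0e-3, "a_pq": 1.0e-3, "a_pt": 1.0e-3,
    "fpb": 3.3e-7,
    "lam_q": 1 / 14, "lam_t": 1 / 5,
    "q1": 0.3, "q2": 0.3,
    "gam_r": 1 / 20, "kap_d": 0.005,
}
y0 = [2.7e5, 10.0, 0.0, 0.0, 0.0, 0.0]
traj = integrate(y0, params, days=120)

# Assumed definition of the cumulative infected: active + recovered + deceased.
cumulative_infected = [y[1] + y[4] + y[5] for y in traj]
```

Because the right-hand sides sum to zero, the total population is conserved along the trajectory, which gives a quick sanity check on any implementation of this kind.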

We now introduce the concept of 'accessible population for transmission of the disease'. To appreciate this concept, consider the spreading dynamics of a contagious disease in the absence of any interventions. Then, in principle, the entire population is exposed to the disease, and it may spread to the entire population (barring the possibility of population acquiring herd immunity). In this case, the entire population is the accessible population. However, since no Government would like to see the entire population infected, interventional measures are enforced precisely to mitigate the risk of transmission and limit the population exposed to the disease to a minimum. In this case, the accessible population is expected to be a small fraction of the total population.

Consider dropping all terms except $\alpha_s p_s N_s$ and $f_i p_i \beta_i N_i N_s$ in Eqs. (1, 2). Then, these two equations get decoupled from the rest of the equations. Further, because all other inward/outward transitions are removed, the character of the compartment $I$ changes from the active infected to the cumulative infected $I_t$, with $N_t$ denoting the corresponding population (in the reduced model, $N_t$ denotes this cumulative infected population, not the traced population of Eq. (4)). Then, we have

Noting that

we have $N_t + N_s = \text{constant}$. Without loss of generality, we set $N_t + N_s = N_s(0)$, the total population. Then, we get a single equation governing the cumulative infected population, given by

Equation (11) has the well-known form of the logistic equation extensively studied in the context of population dynamics [34], with a notable difference, namely, the parameters $a$, $b$, and $c$ have a well-defined interpretation as discussed above. We refer to Eq. (11) as the reduced logistic equation. (For brevity we often refer to $\alpha_s p_s$ and $f_i p_i \beta_i$ as testing and contact transmission rates, respectively.)
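Written out, the reduction can be sketched as follows. The sketch assumes that the retained terms are the testing and contact terms of Eqs. (1, 2), and it infers the coefficients $a$, $b$, $c$ from the relations $b \approx f_i p_i \beta_i N_s(0)$, $c/b = \alpha_s p_s / f_i p_i \beta_i$ and $b/a = N_s(0)$ used in the text; the equation numbers follow the text's references to Eqs. (8-9) and (11).

```latex
\begin{align}
\dot{N}_s &= -\alpha_s p_s N_s - f_i p_i \beta_i N_t N_s, \tag{8}\\
\dot{N}_t &= \alpha_s p_s N_s + f_i p_i \beta_i N_t N_s, \tag{9}\\
\dot{N}_t &= \left(\alpha_s p_s + f_i p_i \beta_i N_t\right)\left(N_s(0) - N_t\right)
           = c + b N_t - a N_t^2, \tag{11}\\
a &= f_i p_i \beta_i, \qquad
b = f_i p_i \beta_i N_s(0) - \alpha_s p_s \approx f_i p_i \beta_i N_s(0), \qquad
c = \alpha_s p_s N_s(0).
\end{align}
```

The approximation for $b$ holds when $\alpha_s p_s \ll f_i p_i \beta_i N_s(0)$, which is the regime the quoted relations imply.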

We begin with a few observations on the relative magnitudes of the model parameters in the absence and presence of interventions. Consider a situation when there are no constraints. Then, one should expect the testing rate ($\alpha_s p_s$) to be low owing to the absence of any guidelines from policy makers. Similarly, since infected individuals carry on with their routine activity, the number of contact transmissions is high and hence the contact transmission rate ($f_i p_i \beta_i$) is expected to be high (compared to when interventions are in place). Then, the total accessible population, denoted by $N_a(0)$, is the entire population of the region or the country, i.e., $N_a(0) = N_s(0)$. In contrast, when interventions are in place, testing rates are high to ensure identification of the infected, and therefore $\alpha_s p_s$ is high. In this situation, since the mobility of individuals is restricted, the number of contacts is severely limited, i.e., $f_i p_i \beta_i$ will be small. Therefore, the accessible population $N_a(0)$ is expected to be small compared to the total population $N_s(0)$. These qualitative statements about the accessible population will be made quantitative by carrying out a detailed analysis of Eq. (11).

Consider the initial growth of Eq. (11) by dropping the quadratic term. Then, we have

The solution is given by

where $N_t(0)$ is the initial number of infections. As can be seen, the growth rate given by $b \approx f_i p_i \beta_i N_s(0)$ depends on $N_s(0)$, the total population. Therefore, the growth rate can be high. In addition, the prefactor for the exponential growth term (in Eq. 16) depends not only on $N_t(0)$ but also on $c/b = \alpha_s p_s / f_i p_i \beta_i$. Thus, the initial growth depends on the relative magnitudes of $N_t(0)$ and $\alpha_s p_s / f_i p_i \beta_i$.
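For reference, the linearised equation and its solution can be written as below. This is a reconstruction consistent with the growth rate $b$ and the prefactor $N_t(0) + c/b$ described in the text; only the tag (16) is taken from the text's own references.

```latex
\begin{align}
\dot{N}_t &= c + b N_t,\\
N_t(t) &= \left(N_t(0) + \frac{c}{b}\right) e^{bt} - \frac{c}{b}. \tag{16}
\end{align}
```

Setting $t = 0$ returns $N_t(0)$, and for $bt \gg 1$ the constant $-c/b$ becomes negligible next to the exponential term.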

It is straightforward to obtain the solution of Eq. (11) . (See Appendix for details.) Here it is adequate to consider the solution in terms of the parameters a, b, and c, given by

We now examine two limiting cases. For short times, $N_t$ tends to $(N_t(0) + c/b)\,e^{bt}$ (since the denominator is dominated by $b/a = N_s(0)$), consistent with the short time solution given by Eq. (16). For long times, however, $N_t$ tends to $b/a = N_s(0)$, the total population.
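The two limits can be checked numerically. The sketch below assumes the factorised form $\dot{N}_t = (\alpha_s p_s + f_i p_i \beta_i N_t)(N_s(0) - N_t)$ of the reduced logistic equation and writes its solution in terms of the two roots of the quadratic; the function names and the way the solution is arranged are illustrative and need not match the paper's Eq. (17).

```python
import math

def reduced_logistic(t, n0, fpb, a_ps, n_total):
    """Closed-form N_t(t) for dN/dt = (a_ps + fpb * N) * (n_total - N), N(0) = n0.

    fpb stands for f_i p_i beta_i and a_ps for alpha_s p_s.
    """
    n_plus = n_total                  # stable fixed point, N_s(0)
    n_minus = -a_ps / fpb             # second (negative) root of the quadratic
    rate = fpb * n_total + a_ps       # equals fpb * (n_plus - n_minus)
    u0 = (n0 - n_minus) / (n_plus - n0)
    w = math.exp(-rate * t) / u0      # decays to zero, so no overflow for t >= 0
    return (n_plus + n_minus * w) / (1.0 + w)

def early_growth(t, n0, fpb, a_ps, n_total):
    """Short-time solution (N(0) + c/b) e^{bt} - c/b of the linearised equation."""
    b = fpb * n_total - a_ps
    c = a_ps * n_total
    return (n0 + c / b) * math.exp(b * t) - c / b

# Illustrative values only (assumptions, chosen for the check below).
args = (13.0, 3.3913e-7, 1.0e-4, 2.8e5)
short_time_pair = (early_growth(5.0, *args), reduced_logistic(5.0, *args))
long_time_value = reduced_logistic(1000.0, *args)
```

With these inputs the two entries of `short_time_pair` agree closely, and `long_time_value` approaches the last argument, which is the self-limiting behaviour discussed in the text.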

The self-limiting nature of Eq. (17), a characteristic feature of logistic equations, is evident from the fact that $N_t$ tends to $N_s(0)$. In other words, the entire population becomes accessible for transmission of the disease. Clearly, the situation can only represent the growth of infection in the absence of any kind of interventions.

On the other hand, the effect of all interventions is to limit the rate of transmission, thereby limiting the proportion of the population exposed to the disease to a small fraction. It is this that we call the accessible population. In other words, the accessible population $N_a(0)$ is of the same order as the infected population. This can be written as $N_a(0) = F N_s(0)$, where $F$ is a small fraction.

However, within the scope of the reduced logistic model, the evolution of $N_t$ is independent of the values of the parameters $\alpha_s p_s$ and $f_i p_i \beta_i$ during the absence or presence of interventions. As a consequence, the asymptotic value of the cumulative infected population is always $N_t = N_s(0)$, the entire population. Therefore, demonstrating that the accessible population is a small fraction of the total population is outside the scope of Eq. (11) and the full model Eqs. (1-6). An independent way of demonstrating $N_a(0) = F N_s(0)$ is desirable.

A. Quantitative estimate of the accessible population

Since the factor $F$ is not well determined, there is a necessity to get a better estimate of $N_a(0)$. This is done by numerical evaluation of the dependence of $N_t$ on the parameters $N_a(0)$, $\alpha_s p_s$, and $f_i p_i \beta_i$. Given the fact that the disease evolves, $N_a(0)$ also evolves with time. This can be seen by the fact that in the early stages of evolution, $N_a(0)$ will be small, even in the absence of interventions.

Consider the dependence of $N_t$ on $N_a(0)$, keeping $\alpha_s p_s$ and $f_i p_i \beta_i$ fixed. In addition, since the disease evolves with time, the accessible population also evolves with time. We find that for relatively large values of $N_a(0)$, $N_t$ grows exponentially; for intermediate values, a near saturation value is reached in a relatively short duration of 10-15 days; and for small values, the saturation value is not reached even after 100 days. These features are illustrated in Fig. 1 in plots (i-iii) for $N_a(0) = 8 \times 10^5$, $2.8 \times 10^5$ and $1.45 \times 10^5$, respectively, keeping $f_i p_i \beta_i = 3.3913 \times 10^{-7}$ and $\alpha_s p_s = 1.0 \times 10^{-4}$. We have also examined the influence of $f_i p_i \beta_i$, keeping $N_a(0) = 2.8 \times 10^5$ and $\alpha_s p_s = 1.1 \times 10^{-3}$. We find that for smaller values of $f_i p_i \beta_i$, it takes a longer time for the infection ($N_t$) to grow. This feature can be seen from the curves (iv) for $f_i p_i \beta_i = 4.7522 \times 10^{-7}$ and (iii) for $f_i p_i \beta_i = 3.3913 \times 10^{-7}$. We have also examined the growth dependence of $N_t$ on $\alpha_s p_s$, keeping the other two parameters fixed. The dependence of $N_t$ on this parameter is similar to that on $f_i p_i \beta_i$. The curve (v) taken together with (ii) shows that increasing $\alpha_s p_s$ also leads to faster initial growth of $N_t$. In the same plot, we have also plotted the total number of infected cases (•) for the UK.
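The qualitative dependence on $N_a(0)$ can be explored with a short calculation. The sketch below assumes the factorised reduced logistic equation with $N_s(0)$ replaced by $N_a(0)$ and inverts its closed-form solution for the time needed to reach a given fraction of saturation. The starting count `N0` is an assumption (the initial condition used for Fig. 1 is not stated here), so the resulting times need not match the figure.

```python
import math

def days_to_fraction(frac, n0, fpb, a_ps, n_a):
    """Days for N_t to reach frac * n_a under dN/dt = (a_ps + fpb * N) * (n_a - N)."""
    n_minus = -a_ps / fpb                       # negative root of the quadratic
    rate = fpb * n_a + a_ps                     # exponential rate of u(t)
    u0 = (n0 - n_minus) / (n_a - n0)            # u = (N - n_minus) / (n_a - N) at t = 0
    u_target = (frac * n_a - n_minus) / ((1.0 - frac) * n_a)
    return math.log(u_target / u0) / rate

FPB = 3.3913e-7    # f_i p_i beta_i quoted for curves (i)-(iii)
A_PS = 1.0e-4      # alpha_s p_s quoted for curves (i)-(iii)
N0 = 5687.0        # ASSUMED starting count, not taken from Fig. 1

sweep = {n_a: days_to_fraction(0.9, N0, FPB, A_PS, n_a)
         for n_a in (8.0e5, 2.8e5, 1.45e5)}
for n_a, days in sweep.items():
    print(f"N_a(0) = {n_a:.3g}: 90% of saturation after about {days:.0f} days")
```

The ordering of the three times reflects the trend described in the text: the smaller the accessible population, the longer the approach to saturation for fixed rate parameters.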

A careful scrutiny of the total coronavirus cases (•) in the UK shows that it is similar both in magnitude and shape to the plot of $N_t$ corresponding to $N_a(0) = 0.28 \times 10^6$, marked (ii) in Fig. 1. This similarity suggests two important points. First, noting that the UK is under lock-down, one expects that the accessible population is a small fraction of the total population, and therefore we see that the order of magnitude of the accessible population $N_a(0)$ used is comparable to that of the infected population $N_t$ shown in curve (ii). The figure also shows that, just as all populations evolve dynamically during the development of the pandemic, $N_a(0)$ also evolves with time. Second, the similarity in shape of the UK data (•) with the sigmoidal shape of the logistic solution raises the question of whether the similarity is accidental. If not, can this be used to fit the UK data?

However, considering the complex dynamics of the highly contagious virus and the fact that logistic equation can at best represent simple situations, any attempt to fit the data appears ambitious. Even so, it is tempting to examine if Eq. (17) could be used to fit the coronavirus data for some country/region. To do this, we first note that the reduced model equation contains just three parameters and the dependence of N t on these parameters has already been examined [see Fig. 1 ].

In most countries, the development of the disease falls into two phases, namely, the initial period when Governmental constraints are absent, referred to as phase one and the period beyond the lock-down date, called phase two. In the case of the UK, the first case was reported on January 31, 2020. Subsequently, the lock-down was imposed on March 23. Thus, we need to fit the data for the period January 31 to March 23 and then the rest.

Consider the period between January 31 and March 23, 2020. Briefly, the fitting procedure adopted here is to equate the initial growth rate of infections obtained from the coronavirus data with the model growth rate given by Eq. (16) (or Eq. 17). Using the fact that the accessible population is of the order of the total number of infections, we use a trial value of $N_a(0)$ (assumed to be a few times larger than the infected population) to fix the parameter $\beta_i$. Then, the correct value of $N_a(0)$ that provides the best fit for the entire data is found iteratively by decreasing $N_a(0)$ so as to fit an increasing number of data points. The procedure is illustrated below.

Here, we use the analytical solution given by Eq. (17) (or solve Eqs. 8-9) with parameters and initial conditions appropriate for the unconstrained growth. Recall that the testing rate parameter $\alpha_s p_s$ is low during the initial period and the contact transmission rate parameter $f_i p_i \beta_i$ would be high. The values of these two parameters in the lock-down period are just the opposite. Consider the first phase, where virus transmission is unconstrained. A careful perusal of the UK data shows that a smooth increase in the infected numbers starts on Feb. 26, 2020, when the number infected was $N_t = 13$. The local growth rate obtained from the data over 7 days was found to be 0.4820/day. Equating this with the model growth rate given by $f_i p_i \beta_i N_a(0)$ (in Eq. 16), with a trial value of $N_a(0) = 4.0 \times 10^5$, fixes a value of $\beta_i = 1.205 \times 10^{-5}$. The solution of Eq. (17) (or Eqs. 8-9), obtained using the initial condition $N_t = 13$ and keeping $\alpha_s p_s = 0$, passes through several more data points than 7. In the next iterations, we reduce $N_a(0)$, keeping in mind that the solution should pass through a larger number of data points. In addition, since the initial growth rate (Eq. 16) also depends on $c/b = \alpha_s p_s / f_i p_i \beta_i$, a proper value of $\alpha_s p_s$ is required for a good fit. For the initial phase (of short duration), we find that just one iteration of reducing $N_a(0)$ to $N_a(0) = 1.86 \times 10^5$ with $\alpha_s p_s = 9 \times 10^{-6}$ fits the data well for the period from Feb. 27 to March 23, 2020, as shown in the inset of Fig. 2.
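The first step of the procedure, turning a local growth rate into a contact transmission parameter, is simple arithmetic and can be sketched as follows. The log-linear least-squares estimator is an assumption about how a "local growth rate" might be obtained, and the series used to exercise it is synthetic, not the UK data. Dividing the quoted growth rate by the trial $N_a(0)$ gives the composite $f_i p_i \beta_i$; the quoted $\beta_i = 1.205 \times 10^{-5}$ is ten times this composite, which would be consistent with $f_i p_i = 0.1$, although that factor is an inference and is not stated in this section.

```python
import math

def log_linear_growth_rate(counts):
    """Least-squares slope of ln(count) against the day index (per day)."""
    n = len(counts)
    xs = range(n)
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

def composite_contact_rate(growth_rate, n_accessible):
    """f_i p_i beta_i implied by growth rate = f_i p_i beta_i * N_a(0)."""
    return growth_rate / n_accessible

# Synthetic 7-day series (NOT the UK data): exact exponential growth from 13 cases.
synthetic = [13.0 * math.exp(0.4820 * day) for day in range(7)]
rate = log_linear_growth_rate(synthetic)

# Composite rate for the quoted growth rate and trial accessible population.
fpb_trial = composite_contact_rate(0.4820, 4.0e5)
print(f"recovered rate = {rate:.4f}/day, f_i p_i beta_i = {fpb_trial:.4e}")
```

Subsequent iterations keep this rate parameter fixed and lower $N_a(0)$, so only the saturation level and the effective growth rate $f_i p_i \beta_i N_a(0)$ change between trial curves.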

Fitting the data for the second phase follows the same iterative procedure, except that the number of iterations is greater owing to the large number of data points. The number of infections on March 23 stood at $N_t = 5687$. This number matches the predicted value of $N_t$ on March 23, 2020, obtained from Eq. (17) for the first phase. (See the inset in Fig. 2.) The local slope over 13 points from the lock-down day is 0.16383/day. This slope is equated with the model growth rate using a trial value of $N_a(0) = 5.0 \times 10^5$ ($\alpha_s p_s = 0$) to obtain $\beta_i = 3.2766 \times 10^{-6}$. Using the initial condition $N_t = 5687$ in Eq. (17) (or solving Eqs. 8-9), we find that the solution (i) (with $\alpha_s p_s = 0$) passes through a few more than 13 points. In the next iteration, we reduce $N_a(0)$ to $4.0 \times 10^5$ and compute the solution taking into account the contribution from $\alpha_s p_s = 1 \times 10^{-3}$. The solution (ii) passes through several more data points. Two further iterations for successively smaller values, $N_a(0) = 3.00 \times 10^5$ and $N_a(0) = 2.70 \times 10^5$, are used to obtain the solutions marked (iii) and (iv), respectively. (The corresponding values are $\alpha_s p_s = 2.8 \times 10^{-3}$ and $\alpha_s p_s = 2.8 \times 10^{-3}$, respectively.) This is shown in Fig. 2. As is clear from Fig. 2, solutions (ii) and (iii) pass through a successively larger number of points. Surprisingly, the solution curve labeled (iv) with $N_a(0) = 2.70 \times 10^5$ fits the entire data very closely. (The overall accuracy of the fit is not less than 99.95%.) Note the increasing trend of the values of $\alpha_s p_s$ for successive iterations. This feature is consistent with the steadily increasing testing rates routinely used for proper enforcement of lock-down. This feature is easily incorporated by parameterizing $\alpha_s p_s$ with time.
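One way a time-dependent testing rate could be incorporated is sketched below. The linear ramp, its 30-day duration, and the function names are hypothetical choices made for illustration; the text does not specify the functional form. The composite rate $f_i p_i \beta_i$ is taken as the quoted slope divided by the trial $N_a(0)$, and the reduced equation is integrated with a scalar RK4 scheme because the closed-form solution no longer applies once $\alpha_s p_s$ varies with time.

```python
def alpha_ramp(t, alpha_max=2.8e-3, ramp_days=30.0):
    """HYPOTHETICAL linear ramp of alpha_s p_s from 0 to alpha_max over ramp_days."""
    return alpha_max * min(t / ramp_days, 1.0)

def rhs(t, n, fpb, n_a):
    """Reduced equation dN/dt = (alpha_s p_s(t) + f_i p_i beta_i N) * (N_a(0) - N)."""
    return (alpha_ramp(t) + fpb * n) * (n_a - n)

def integrate(n0, fpb, n_a, days, h=0.1):
    """Scalar RK4 integration; returns N_t sampled every h days."""
    t, n, out = 0.0, n0, [n0]
    for _ in range(int(round(days / h))):
        k1 = rhs(t, n, fpb, n_a)
        k2 = rhs(t + 0.5 * h, n + 0.5 * h * k1, fpb, n_a)
        k3 = rhs(t + 0.5 * h, n + 0.5 * h * k2, fpb, n_a)
        k4 = rhs(t + h, n + h * k3, fpb, n_a)
        n += h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
        out.append(n)
    return out

FPB = 0.16383 / 5.0e5     # composite rate from the quoted slope and trial N_a(0)
curve = integrate(n0=5687.0, fpb=FPB, n_a=2.70e5, days=120)
```

The resulting curve still saturates at $N_a(0)$; the ramp changes only how quickly the early part of the curve rises, which is where the successive iterations in the text differ most.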

Unexpectedly, apart from providing a good fit for the entire data, the method appears to have predictive power, as is clear from curve (iv), which shows that the growth in the total number of infections is slowing down. The predicted saturation value is $\sim 2.75 \times 10^5$. A near saturation value is likely to be seen by the first week of June. These results suggest that the reduced logistic model can be used for obtaining a fit for the COVID-19 data for other countries as well. The good fit, however, is attributable to the fact that the total infected population $N_t$ does not carry any information about the recovered and the dead. On the other hand, Eq. (17) does not include outward transitions (the recovered and the dead), and the inward quarantine and tracing transitions. Therefore, the estimated saturation value and the projected future development should be taken with some reservation. This will be clear once the full model equations are analyzed and a fit with COVID-19 data for the UK is accomplished. Despite these limitations, because the reduced logistic equation retains the basic growth contributions to the cumulative infected $N_t$, the fit with the data appears reasonable.

There are attempts to use logistic equations to get insights into the dynamics of COVID-19 transmission [27, 28] . For instance, a five-parameter hierarchical logistic model has been used to fit the observed data to project the cumulative number of cases for several countries [28] . The parameters entering in the model are determined by the fitting procedure.

One of the challenges of compartmental models is the difficulty in making accurate predictions, mainly attributable to the uncertainties in obtaining proper estimates of the parameters [13-15, 32, 33]. For the same reason, forecasting is even more challenging. Often, several factors may also contribute to the same parameter, making proper interpretation difficult. In our model, however, several parameters in Eqs. (1-6) are related to measurable quantities. For instance, the parameters $\alpha_s p_s$, $\alpha_s p_q$ and $\alpha_t p_t$ respectively represent the rate of testing positive, the rate of being identified as pre-symptomatic, and the rate of tracing those exposed to the infected. Similarly, the parameters $\lambda_q$, $\lambda_t$, $\gamma_r$ and $\kappa_d$ are inversely related to the quarantine duration $\tau_q = 1/\lambda_q$, the time required for tracing $\tau_t = 1/\lambda_t$, the time from illness to recovery $\tau_r = 1/\gamma_r$, and the time from illness to death $\tau_d = 1/\kappa_d$. Though these quantities are country/region-specific, their values have been estimated in the literature [5, 6, 8, 32, 33, 35, 36]. Some values are also available in the public domain [29, 37]. One parameter that is hard to estimate is the contact transmission rate $\beta_i$, which has already been estimated in the context of the reduced logistic equation. [...] (1-6) is necessarily complex. Therefore, in the absence of appropriate values relevant for the country/region, a systematic method of finding optimized values of parameters that fit the data under consideration requires calibration of all the parameters in the model. Following the method developed recently in the area of plasticity [38] [39] [40], we investigate the influence of the parameters on the growth of $N_i$ to identify the relative importance of the transition rates. Since it is a multi-parameter space, we vary each parameter, keeping all other parameters fixed at a reference set of values listed in Table I. The results are illustrated using plots of the active infected population $N_i$. The dotted curve shown in Fig. 3 is the reference curve corresponding to the reference set of parameters given in Table I. As in the case of the reduced logistic model, the growth of $N_i$ depends sensitively on $f_i p_i \beta_i$. Even a 20% increase induces a substantial increase in the peak height and position, as is clear from curve (i). A similar effect is seen when the testing rate $\alpha_s p_s$ is increased by a factor of two, as seen in (ii). In contrast, an increase in the quarantining rate by a factor of two decreases the peak height marginally, as shown in curve (iii). We have also investigated the dependence of $N_i$ on the recovery ($\gamma_r$) and death rate ($\kappa_d$) parameters. An increase in the death rate by 30% decreases the peak height marginally, as is clear from (iv). A similar effect is seen when the recovery rate $\gamma_r$ is increased (see (v)). We have also investigated the influence of other parameters and find that $N_i$ is relatively insensitive to them. Noting that any change in the parameter values relative to those corresponding to the reference curve changes the peak position and height, we conclude that the parameters listed in Table I are close to the optimized values.

Having demonstrated that the two direct transition rates $f_i p_i \beta_i$ and $\alpha_s p_s$ are the dominant contributions to the growth of $N_i$, and having assessed the relative importance of the other transitions, we now consider the solution of the full model, Eqs. (1-6), with a view to obtaining the best possible fit to the COVID-19 data for the United Kingdom. An attempt will also be made to forecast the future progression of the disease.

Recall that the spread of coronavirus in the UK falls into two phases of development. During the first phase, prior to the lock-down on March 23, 2020, there were no constraints and the disease transmission was free. After the lock-down date, the transmission is restricted. Therefore, the model parameters and the initial conditions relevant for the two phases are different. As in the reduced model, we assume that the dynamics of the disease transmission is limited by the accessible population $N_a(0)$ and not by the total population $N_s(0)$, i.e., $N_a(0) \approx F N_s(0)$.

[Figure caption fragment: Other parameter values are the same as given in Table I.]

Consider the period between January 31 and March 23 corresponding to the initial phase. For further analysis, it is useful to begin with a few observations. First, recall that our analysis in the previous Section showed that the parameters corresponding to the reference curve $N_i$ (dotted curve) in Fig. 3 are close to the optimized values (listed in Table I). In addition, the parameters $f_i p_i \beta_i$ and $\alpha_s p_s$ are different in the two phases. Second, prior to the lock-down date, there would be no quarantining and tracing procedures. Considering this, it is adequate to solve Eqs. (1,2,5,6) for the first phase (ignoring delayed inward transitions into I). Furthermore, in the first few days of the development of the disease, we may assume that the total number of infected cases $N_t$ is equal to the active infections $N_i$. Finally, since we plan to fit the model solution to the UK data [29], publicly available coronavirus data for the total number infected, active infected, recovered and dead would be useful in further optimizing the parameters. Unfortunately, however, only the total numbers of the infected and the dead are made available in the UK.

Now we are in a position to solve the relevant equations for the first phase. As discussed in Section III B, we use February 27, 2020 as the starting day for the first-phase evolution of Eqs. (1,2,5,6). The local growth rate (on the starting day) over 12 days, obtained from the log-linear plot of the cumulative infected cases for the UK, is equated with the model growth rate $f_i p_i \beta_i N_a(0)$ to fix $\beta_i = 6.118 \times 10^{-6}$, using the initial value $N_a(0) = 4.0 \times 10^5$. Further, using the initial conditions $N_t(0) = N_i(0) = 13$, $N_r(0) = 0$ and $N_d(0) = 0$, we solve Eqs. (1,2,5,6) from February 27 to March 23, choosing the value of $\alpha_s p_s$ that gives the best fit to the data for the period.
(Here, $\alpha_s p_s = 7.2 \times 10^{-7}$ and the values of the other relevant parameters are those listed in Table I.) The model-predicted total infected population $N_t$ (continuous curve) along with the data points (•) is shown in the inset of Fig. 4. Clearly, the match is very good. Also shown is a plot of the active infections $N_i$ (dotted curve). Equations (1,2,5,6) also provide the values of $N_i$, $N_r$ and $N_d$ on March 23, 2020. These are $N_i = 5407$, $N_r = 400$ and $N_d = 285$.

Now we consider the solution of Eqs. (1-6) in an effort to obtain the best fit to the UK data for the period starting from March 23, 2020. Here again, we first find the growth rate from the data and equate it with the model growth rate. Using the 13-point slope in the log-linear plot, we get a rate of 0.1638/day. Equating this with $f_i p_i \beta_i N_a(0)$ and using $N_a(0) = 4 \times 10^5$, we get $\beta_i = 2.3014 \times 10^{-6}$. The initial values used for evolving Eqs. (1-6) are $N_i(0) = 5407$, $N_q(0) = 0$, $N_t(0) = 0$, $N_r(0) = 0$ and $N_d(0) = 0$. (The reason for using zero initial conditions for $N_q(0)$, $N_t(0)$, $N_r(0)$ and $N_d(0)$ is that the initial values would not be recorded during the first phase. However, using the values obtained from the first phase for $N_r$ and $N_d$ makes little difference. Note that $N_i(0) = 5407$ is smaller than the total number of infected cases. Again, using $N_i(0) = 5687$ does not alter the results.) The values of the parameters are those listed in Table I. Figure 4(a) shows plots of the calculated total infected population $N_t$ and the total infected cases in the UK (•). Clearly, the fit is very good. Also shown is the active infected population $N_i$, labeled (ii).
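The growth-rate matching described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the case counts are synthetic (generated with the quoted rate of 0.1638/day), the helper names `log_linear_slope` and `contact_rate` are hypothetical, and the product $f_i p_i \approx 0.178$ is back-computed here from the quoted rate, $N_a(0)$ and $\beta_i$ rather than read from Table I.

```python
import math

def log_linear_slope(days, cumulative_cases):
    """Least-squares slope of ln(cases) versus day (the log-linear growth rate)."""
    logs = [math.log(c) for c in cumulative_cases]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return sxy / sxx

def contact_rate(growth_rate, fi_pi, accessible_population):
    """Solve growth_rate = f_i p_i beta_i N_a(0) for beta_i."""
    return growth_rate / (fi_pi * accessible_population)

# Synthetic 13-point window (NOT the UK data): pure exponential growth
# at the rate quoted in the text for the lock-down phase.
days = list(range(13))
cases = [5407.0 * math.exp(0.1638 * t) for t in days]

FI_PI = 0.178   # assumption: back-computed from the quoted numbers, not from Table I
N_A0 = 4.0e5    # accessible population quoted in the text

rate = log_linear_slope(days, cases)
beta = contact_rate(rate, FI_PI, N_A0)
print(f"growth rate = {rate:.4f} per day, beta_i = {beta:.4e}")
```

With real data the window would hold the reported cumulative counts, and the fitted slope would replace the synthetic rate.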
To the best of our knowledge, no other model fits the COVID-19 data over such a long period (with the intention of forecasting the future) for any country, as has been done here, although there have been some efforts to fit data for initial periods [8, 21, 22, 24, 35, 41]. More importantly, the plot of the model-predicted active infected population $N_i$ (ii) shows a peak around May 15. The subsequent decrease in $N_i$ is seen to be slow. At this rate of slowing down, the model predicts that a near-saturation value (of $3.8 \times 10^5$) would only be reached by the beginning of August. Strictly, the end time of the epidemic, i.e., the time with no new infected cases, appears to be even farther away. Further, the model can be used to fit the COVID-19 data for other countries and also to forecast the progression of the disease.

Within the scope of the model, the pace of slowing down is captured by the relative magnitudes of the contact transmission rate parameter $\beta_i$ before and after the lock-down date. The value of $\beta_i$ prior to the lock-down period ($\beta_i = 6.118 \times 10^{-6}$) is just 2.65 times that during the lock-down period ($\beta_i = 2.3014 \times 10^{-6}$). These numbers have been obtained purely by fitting the initial growth rate for the two phases, as explained earlier. However, an independent estimate obtained for the Wuhan case shows that this factor should be close to 5 [6]. If we take the small ratio of 2.65 seriously (which is questionable), it might indicate that the lock-down efforts have not been fully effective. However, a similar independent estimate of the contact transmission rate parameter for the unconstrained and constrained growths is not available for the UK.
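As a quick check of the ratio discussed above, the two fitted values of the contact transmission rate can be divided directly. The superscripts "pre" and "lock" are labels introduced here for the two phases and are not the authors' notation.

```latex
\frac{\beta_i^{\mathrm{pre}}}{\beta_i^{\mathrm{lock}}}
  = \frac{6.118 \times 10^{-6}}{2.3014 \times 10^{-6}}
  \approx 2.66
```

This agrees with the quoted factor of 2.65 up to rounding, and is roughly half the factor of about 5 reported for Wuhan [6].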

V. SUMMARY, DISCUSSION AND CONCLUSIONS

Recent literature has focused on abstracting the effect of various types of interventions through epidemiological models to make projections of how the disease progresses under different conditions. Recall that one limitation particularly applicable to deterministic compartmental models is the difficulty of getting proper estimates of the parameters, particularly when the number of compartments is large. In this respect, simpler models with fewer compartments have an advantage. However, several factors may contribute to a single parameter, and therefore the ability of such parameters to represent the mitigating efficacy of interventions appears limited. Furthermore, the number of parameters in such models is not small, often making numerical solution the only choice. Therefore, any method, whether mathematical or conceptual, that simplifies analysis and eases interpretation is welcome.

Motivated by this, we have introduced the concept of an accessible population for transmission of the virus, which is taken to be a small fraction of the total population. Indeed, the effect of lock-down intervention is evident in all countries where the disease has been controlled or restricted. At the mathematical level, we introduce a decoupling scheme that aids mathematical analysis and also eases interpretation. The model equations have been devised in such a way that the susceptible and active infected populations form the main populations. The decoupling is effected by dropping all inward and outward transitions except the direct transitions ($f_i p_i \beta_i N_a(0)$ and $\alpha_s p_s N_a(0)$). Because all outward transitions from I are ignored under this decoupling, the active infected population $N_i$ takes the role of the cumulative infected population $N_t$. The simplicity of the reduced logistic equation (11) allows easy identification of the growth and inhibiting factors in terms of the dominant growth factors (direct inward transitions or parameters). Surprisingly, this simple equation provides a good fit to the reported cumulative number of infections for the United Kingdom, as is clear in Fig. 2. The fits for the period till March 23 and thereafter are clearly good.

The full model, Eqs. (1-6), contains several parameters whose ranges have been estimated in a number of studies [8, 21, 22, 24, 35, 41]. However, when it comes to explaining or capturing the growth characteristics for a specific country, optimized parameters suitable for the situation are required. Following [38-40], we have determined the relative importance of the various transition rates (equivalently, the associated parameters) subject to the constraint that the parameter values provide the best fit to the given data. In this work, we have made use of publicly available data on the total infected cases for the United Kingdom.

Figure 4(a) shows the fit obtained for the period till March 23, 2020 (shown in the inset) and for the period beyond. Clearly, the fit is very good for both the initial period till the lock-down date and the period thereafter. Comparing Fig. 4(a) with Fig. 2 for the reduced logistic equation, we see that while the fit in both cases is equally good, the projections of the future evolution are significantly different. The saturation value predicted by the full model (shown in Fig. 4(a)) is close to $3.8 \times 10^5$, whereas that predicted by the reduced logistic equation in Fig. 2 is $\sim 2.75 \times 10^5$. Conventionally, the end time of an epidemic is defined as the day on which no new infections are reported. However, the approach to the end point is generally slow. For this reason, we use a working definition of the end time of the epidemic as the time required to come within 5% of the saturation level. Then, the end time of the epidemic predicted by the full model turns out to be late July or early August (see Fig. 4(a)). In contrast, the end time predicted by the reduced logistic model is late June. Clearly, the results obtained from the full model emphasize the limitations of the reduced model. A natural question is: what are the underlying causes?
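The working definition of the end time can be illustrated with a short Python sketch. It reads the definition as the first time at which the cumulative count reaches 95% of its saturation value, which is an interpretation rather than something the text states explicitly. The function name `end_time` and the synthetic logistic curve (with made-up values `K`, `N0`, `R`) are hypothetical and stand in for a model-generated curve of $N_t$.

```python
import math

def end_time(times, values, saturation, tolerance=0.05):
    """Return the first sampled time at which values come within
    `tolerance` (as a fraction) of the saturation level, or None."""
    threshold = (1.0 - tolerance) * saturation
    for t, v in zip(times, values):
        if v >= threshold:
            return t
    return None

# Synthetic logistic curve sampled once per day (hypothetical numbers,
# chosen only to exercise the function).
K, N0, R = 100.0, 1.0, 0.5
days = list(range(60))
curve = [K / (1.0 + (K - N0) / N0 * math.exp(-R * d)) for d in days]

print("end time (day index):", end_time(days, curve, K))
```

For a saturation value of $3.8 \times 10^5$, the corresponding threshold under this reading would be $0.95 \times 3.8 \times 10^5 = 3.61 \times 10^5$ cumulative cases.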

The fact that the reduced logistic model provides a good fit also means that the major contributing factors for the growth of infection are included in Eq. (2). To see this, consider Eqs. (1-6). The growth of $N_i(t)$ has two types of inward transitions, namely, direct and delayed. Note that the direct transition from S to I given by $f_i p_i \beta_i N_s(t)$ controls the growth rate of $N_i$. Because of the presence of $N_s$, the growth rate parameter can be large, at least during the initial period, and $N_i$ is therefore a fast mode [42], meaning that the growth rate of $N_i$ is faster than the growth of the other populations. This is also physically clear. The other transition $\alpha_s p_s N_s$ into I contributes only to the pre-exponential factor (see Eq. (16)). Now, consider the delayed inward transitions to I coming from Q and T. These transitions are smaller in magnitude and contribute to sub-exponential growth of $N_i$ in time. More importantly, the turning point in $N_i$ is due to a competition between the growth factors (all inward transitions) and the outward transitions (recovery and fatality terms). Further, since the time evolution beyond the turning point is controlled by outward transitions, the approach towards the state of no infections, or the saturation value of $N_t$, is slow. These features are clear from Fig. 4(a). Note that the fit till May 25 extends just beyond the turning point of $N_i$, and the epidemic has a long way to evolve to its end point. Therefore, it would be interesting to see whether the long-term prediction of the model agrees with the further evolution of the pandemic, assuming the present lock-down continues.

These arguments explain two features of the data fit obtained using the reduced logistic equation. Because the total number of infected cases $N_t$ does not carry any information about the recovered and deceased but retains the dominant growth contributions, the good fit is not surprising. On the other hand, the growth dynamics beyond the turning point (of $N_i$) is controlled by a balance between growth factors (all inward transitions) and inhibiting factors (the rates of recovery and death). However, these competing time scales are absent in the logistic equation. Therefore, the projected saturation value of $N_t$ and the end time of the epidemic are not well captured.

In conclusion, the simple compartmental model not only provides a good fit to the United Kingdom coronavirus data but also makes concrete long-term predictions for the future. We believe that these results have been made possible by the reductive approach adopted here.

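Recall that the equation governing the cumulative infected population $N_t(t)$, obtained from Eqs. (8)-(9), is a first-order equation whose right-hand side is quadratic in $N_t$, with coefficients $a$, $b$ and $c$. Let $\alpha_1$ and $\alpha_2$ be the roots of the quadratic equation. Then, in terms of $a$, $b$ and $c$, the two roots can be written as $\alpha_1 \sim b/a = N_s(0)$ and $\alpha_2 \sim -ac/b < 0$, which is small compared to $b$. Then the solution is given by

$$N_t(t) = \frac{A \alpha_1 e^{a(\alpha_1 - \alpha_2)t} - \alpha_2}{A e^{a(\alpha_1 - \alpha_2)t} - 1} \approx \frac{A \alpha_1 e^{bt} - \alpha_2}{A e^{bt} - 1}. \qquad \text{(A.5)}$$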
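The closed form (A.5) can be checked numerically. The sketch below assumes the governing equation has the factored form $dN_t/dt = -a (N_t - \alpha_1)(N_t - \alpha_2)$, which is the equation that (A.5) solves, and fixes the constant $A$ from the initial condition as $A = (N_t(0) - \alpha_2)/(N_t(0) - \alpha_1)$. All numerical values are hypothetical and chosen only for illustration, and the hand-written Runge-Kutta integrator is a stand-in for whatever solver one prefers.

```python
import math

def closed_form(t, n0, a, alpha1, alpha2):
    """Closed-form solution of dN/dt = -a (N - alpha1)(N - alpha2), N(0) = n0."""
    big_a = (n0 - alpha2) / (n0 - alpha1)          # constant A from the initial condition
    growth = math.exp(a * (alpha1 - alpha2) * t)
    return (big_a * alpha1 * growth - alpha2) / (big_a * growth - 1.0)

def rk4(n0, a, alpha1, alpha2, t_end, dt=0.01):
    """Integrate the same equation with a classical fourth-order Runge-Kutta scheme."""
    def rhs(n):
        return -a * (n - alpha1) * (n - alpha2)
    n = n0
    for _ in range(round(t_end / dt)):
        k1 = rhs(n)
        k2 = rhs(n + 0.5 * dt * k1)
        k3 = rhs(n + 0.5 * dt * k2)
        k4 = rhs(n + dt * k3)
        n += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n

# Hypothetical parameters (not fitted values): a large positive root and a
# small negative root, mirroring the structure described in the text.
A_COEF, ALPHA1, ALPHA2, N0 = 1.0e-6, 4.0e5, -10.0, 13.0

for day in (0.0, 10.0, 30.0, 60.0):
    exact = closed_form(day, N0, A_COEF, ALPHA1, ALPHA2)
    numeric = rk4(N0, A_COEF, ALPHA1, ALPHA2, day)
    print(f"t = {day:5.1f}  closed form = {exact:12.2f}  RK4 = {numeric:12.2f}")
```

At long times the closed form approaches the larger root $\alpha_1$, as the limiting behaviour of (A.5) requires.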

The constant $A$ is fixed by the initial condition $N_t(0)$. For short times, $N_t$ tends to $(N_t(0) + c/b) e^{bt}$ (since the denominator is dominated by $b/a = N_s(0)$), consistent with Eq. (16), the short-time solution. For long times, however, $N_t$ tends to $b/a = N_s(0)$, the total population.

[4] Egger M, Johnson L, Althaus C, Schöni A, Salanti G,

Developing WHO guidelines: Time to formally include evidence from mathematical modelling studies

Community-based measures for mitigating the 2009 H1N1 pandemic in China

Estimation of the Transmission Risk of the 2019-nCoV and Its Implication for Public Health Interventions

A conceptual model for the coronavirus disease

China with individual reaction and governmental action

Modeling the epidemic dynamics and control of COVID-19 outbreak in China

A mathematical model for simulating the phasebased transmissibility of a novel coronavirus

Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing

Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts

Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand

Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions

Estimating the burden of SARS-CoV-2 in France

Tracking and tracing in the UK: a dynamic causal modelling study

Mobility traces and spreading of COVID-19

The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak

Mathematical models of isolation and quarantine

Coronavirus Covid-19 spreading in Italy: optimizing an epidemiological model with dynamic social distancing through Differential Evolution

Data-driven modeling reveals a universal dynamic underlying the COVID-19 pandemic under social distancing

COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: A data driven model approach

Evaluation of the lockdowns for the SARS-CoV-2 epidemic in Italy and Spain after one month follow up

Impact of Non-pharmaceutical Interventions (NPIs) to Reduce COVID-19 Mortality and Healthcare Demand

Mathematical modelling of COVID-19 transmission and mitigation strategies in the population of Ontario

Modelling transmission and control of the COVID-19 pandemic in Australia

Predicting the impacts of epidemic outbreaks on global supply chains: A simulation-based analysis on the coronavirus outbreak

A phenomenological approach to COVID-19 spread in a population

COVID-19) Case Growth with a Hierarchical Logistic Model

Aerodynamic analysis of SARS-CoV-2 in two Wuhan hospitals

The incubation period of 2019-nCoV infections among travellers from Wuhan, China

Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data

Estimates of the severity of coronavirus disease 2019: a model-based analysis

The refractory model: the logistic curve and the history of population ecology

Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures

Real-Time Estimation of the Risk of Death from Novel Coronavirus (COVID-19) Infection: Inference Using Exported Cases

Dynamical approach to displacement jumps nanoindentation

An alternate framework for indentation size effect based on residual plastic depth: A dislocation dynamical approach, G. Ananthakrishna and Srikanth K

Dislocation mechanisms based model for Portevin-Le Chatelier like instability in microindentation of dilute alloys

Potential short-term outcome of an uncontrolled COVID-19 epidemic in

Nonlinear dynamics and chaos