key: cord-0638923-7e1wn7bg
authors: Mathur, Nilmani; Shaw, Gargi
title: An empirical model on the dynamics of Covid-19 spread in human population
date: 2020-08-13
journal: nan
DOI: nan
sha: ca38e405545dd2b2abd6d3ea697672915deec654
doc_id: 638923
cord_uid: 7e1wn7bg

We propose a mathematical model to analyze the time evolution of the total number of people infected with Covid-19 in a region during the ongoing pandemic. Using the available data on Covid-19 infections in various countries, we formulate a model which can successfully track the time evolution from the early days to the saturation period in a given wave of this infectious disease. It involves a set of effective parameters which can be extracted from the available data. Using those parameters, the future trajectories of the disease spread can also be projected. A set of differential equations is also proposed whose solutions are these time evolution trajectories. Using this formalism we project the future time evolution trajectories of infection spread for a number of countries where the Covid-19 infection is still rising rapidly.

Currently a pandemic caused by a contagious respiratory disease, called Covid-19, is ongoing throughout the world. The pathogen of this respiratory disease is a novel coronavirus, named SARS-CoV-2 [1]. It started in China and subsequently spread to most countries, and as of August 12, 2020, it has infected more than 20.2 million people worldwide, causing more than 740 thousand deaths [1] [2] [3]. Though the spread of infection has substantially reduced in several countries, particularly in China and Europe, in many countries with a large population, such as the USA, Brazil and India, the pandemic is surging prominently at this time. It is also not clear whether this respiratory disease will be seasonal and whether a second wave will come later. There is no clear consensus in the scientific community on the possible future evolution of this disease, and naturally there is no consensus on the ideal intervention strategy for a government to minimize the number of fatalities while allowing economic and social activities. Many mathematical models have been put forward to track the time evolution of the disease spread, to understand its dynamics, and to provide feasible guidance to governments around the world to control this pandemic.

As in ecology and population growth, in epidemiology too the time evolution of virus growth in a population is of fundamental importance. In general, one builds a model of disease transmission using a system of differential equations with an assumption of initial exponential growth [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45]. A standard way to analyze the dynamics of infection spread mathematically is to adopt one of a variety of compartmental models originating primarily from the so-called SIR model [46] [47] [48], which incorporates susceptible (S), infectious (I) and recovered (R) populations. For the Covid-19 disease growth, one of these models, the so-called SEIR model, and its extensions have been utilized extensively [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [31]. In this scheme of models, one divides the total population, N, of an infected region (e.g., city, country) into susceptible (S), exposed (E), infected (I) and recovered (R) populations with the constraint S + E + I + R = N. A set of differential equations incorporating these correlated compartments then provides the time evolution of disease spread. The successes and problems of this model have been discussed in detail over the last few months, and there is no clear consensus on whether this type of model can predict the spread of Covid-19 with reasonable accuracy [33, 34]. There is an alternate view that models based on integral equations could be more effective than the differential-equation models mentioned above in describing the dynamics of epidemics [30, 49, 50].

Another approach to studying the dynamics of an epidemic is to employ data-driven phenomenological statistical models [24] [25] [26] [27] [28] [29], where one constructs a mathematical description utilizing the existing data on the epidemic. For example, the numbers of total infections, day-to-day infections, fatalities etc. can be utilized to construct a model with a number of parameters, and those parameters can then be constrained with the data. Of course, these models do not include any microscopic parameters but can describe the data in an effective way. Once the model parameters are fixed, it is possible in principle to project the future dynamics of the infection spread.

In this work, adopting the phenomenological approach of the statistical models, we propose a mathematical model for the time evolution of the number of infected population. Rather than proposing a microscopic model, by analyzing the available data on various countries and cities, we develop an effective time evolution trajectory for the total number of infections which can be employed in Covid-19-affected regions. The reason behind adopting such an approach is that, since the Covid-19 disease is contagious and assuming a region to be a closed system, the number of infected people today is directly correlated with the number of infected people in the past, and moreover, today's number will also determine how many people will be infected in the near future. The parameters of the model can be thought of as effective mean-field type parameters, which can be constrained with the available data and later employed for the future time evolution of the disease spread. We find that there is a clear common pattern in the initial growth, mitigation and saturation periods of the time evolution of the number of infected population in various Covid-19-affected regions. The only difference in describing the Covid-19 spread between various affected populations is the difference between the parameter sets of these regions. However, they are all confined within a small subspace of the parameter space. The differences between the parameters of different regions are possibly due to differences in microscopic factors such as total population, density, mobility, age and gender distributions, testing facilities, lockdown effects and social distancing. Of course, it will be interesting to find mathematical correlations between the effective parameters and the microscopic parameters.

Using the proposed model, we also formulate a set of differential equations which can equally well describe the Covid-19 spread in an affected population from the initial days to the saturation period. These differential equations can be evolved in time with any good first-order differential equation solver, matching the boundary conditions between the different periods of infection. We find that by solving these differential equations it is possible to find a common trajectory that can describe the available data on the total number of infected population for all periods of Covid-19 spread. The parameters inside these differential equations can be tuned with the available data. Instead of finding a single trajectory, we use the available data set with error bars, which yields a set of parameters and hence a set of trajectories within the allowed error bars. The use of error bars on the available data is justified, as the exact number of infections is unknown due to the lack of adequate testing facilities and often for social as well as political reasons. We use a larger error (mostly within 10%, and in a few cases up to a maximum of 20%) for the first 14 days (considering the incubation period of the SARS-CoV-2 virus) and later reduce it to less than a percent in the saturation period. This is a reasonable assumption: in the initial days the testing facilities are in general severely inadequate and the symptoms are not well recognized in the population and hence under-reported, so the reported infected numbers could be well below the true numbers. In the later days, these bottlenecks in general reduce substantially and the number of reported cases becomes less erroneous. We adopt a χ² minimization procedure to incorporate these errors, and this also helps to get a band of trajectories around the mean values (reported numbers). In this way, the onset of the saturation period of infection can also span a few days (or weeks) rather than a single day, which is also what we observe in various affected regions.

In this work, our main objective is to build a model, utilizing the available data on Covid-19 spread in various regions, which can track the time evolution of the number of infected population and also project the possible future trajectory. As it is data-driven, the dynamics of our model and its predictive power depend on the correctness of the source data, and hence the model parameters and projections can change if the data are erroneous.

We use data mainly from the coronavirus resources of Wikipedia [3], W.H.O. [1] as well as of local governments. We assume each region to be a closed system and all conditions during the disease spread to remain more or less the same, so that a time correlation of infection can be built. If the prevailing measures against the disease spread under which the data were collected, such as the effectiveness of lockdown, social distancing, contact tracing and quarantine, preventive mask-wearing, or forbidding large public gatherings, change substantially, then the parameters and hence the trajectories will also change. However, this model can be progressively improved with more data, and projections for a few weeks to a month or more can also be made.

We organize the article as below. In section II, we detail our model and also elaborate on the set of differential equations corresponding to this model. In section III, we provide results with numerical details. First, we validate the model by analyzing data on various European countries and New York City. Then to demonstrate the predictive ability of this model we show how a subset of data can help to predict the future time evolution trajectory of the infection at a region. Next, we proceed to analyze data on Russia, Brazil, India and the USA where Covid-19 infections are still increasing rapidly. For India we separately analyze the data for its two biggest cities: Mumbai and Delhi. For each of the regions, within this model, we show the projections of the future time evolution of the infection with the most probable time-scale for the onset of saturation along with the cumulative number of infection. At the end, we discuss our results and conclude.

The basis of building a phenomenological data-driven model is to analyze the available data on the targeted problem and formulate a mathematical framework to represent the data, with a minimal set of parameters, in a consistent plausible way. This mathematical model can also predict (project) the dynamics in a domain where data is not available.

With this in mind we analyze the cumulative number of people infected with Covid-19 for a number of countries and cities. By observing the progress of the Covid-19 spread in various countries, as shown in Fig. 1, we find that the time evolution of the cumulative infected population shows more or less a common pattern, irrespective of the different conditions prevailing in the affected regions. We find that the total infection time can be divided broadly into the following three periods:

(i) early rise: t_0 ≤ t < t_m,
(ii) mitigation: t_m ≤ t < t_s^i,
(iii) saturation: t_s^i ≤ t ≤ t_s^f.

Here t_0, t_m, t_s^i and t_s^f are the starting times of infection, mitigation and saturation, and the ending time of the infection, respectively. Of course, the transitions from one period to the next are not sharp; rather, they are cross-overs between two regions and can span a couple of days or weeks. For example, owing to the latent factor, t_0 may not be a particular day. Similarly t_m, t_s^i and t_s^f may also span a couple of days or weeks. Hence, if there are different mathematical forms for describing the time evolution in different periods, the functional forms should be continuous from one period to the next, as will be explained later. The transition times from one period to another can be determined from the data, for example by minimizing the χ² of a particular model against the data. Naturally, the transition times will differ between countries, since the conditions responsible for the early infection rate, the decrease of that rate in the mitigation period, and the onset of the saturation period are different for different countries.
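The idea of fixing a transition time by minimizing χ² can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two functional forms (an exponential before the break and a power law after it), the log-space straight-line fits, the 5% errors and the synthetic data are all hypothetical stand-ins, not the paper's equations or data.

```python
import numpy as np

def chi2(observed, model, sigma):
    """Weighted sum of squared residuals."""
    return float(np.sum(((observed - model) / sigma) ** 2))

def fit_two_periods(t, n, sigma, t_break):
    """Total chi-square for an exponential before t_break and a power law after.

    Both forms are illustrative stand-ins fitted as straight lines in log
    space; they are assumptions of this sketch, not the paper's equations.
    """
    early = t < t_break
    late = ~early
    # Exponential period: log n = intercept + slope * t
    slope_e, icpt_e = np.polyfit(t[early], np.log(n[early]), 1)
    # Power-law period: log n = intercept + slope * log t
    slope_p, icpt_p = np.polyfit(np.log(t[late]), np.log(n[late]), 1)
    model = np.where(early,
                     np.exp(icpt_e + slope_e * t),
                     np.exp(icpt_p) * t ** slope_p)
    return chi2(n, model, sigma)

def scan_transition(t, n, sigma, candidates):
    """Return the candidate break day with the smallest total chi-square."""
    totals = [fit_two_periods(t, n, sigma, tb) for tb in candidates]
    return candidates[int(np.argmin(totals))], totals

# Synthetic, noise-free data with a known break at day 30 (hypothetical numbers)
t = np.arange(1.0, 81.0)
n_break = 10.0 * np.exp(0.15 * 30.0)
n = np.where(t < 30, 10.0 * np.exp(0.15 * t), n_break * (t / 30.0) ** 2)
sigma = 0.05 * n
best_day, totals = scan_transition(t, n, sigma, list(range(10, 71)))
print(best_day)
```

Because the synthetic curve is continuous at the break, a candidate one day either side of the true break can fit equally well, which mirrors the remark above that a transition is a cross-over rather than a single day.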

To show the above findings more conclusively, we take England as a representative case and fit our model to the available data with a minimum acceptable χ², which we will discuss later in detail. For England, fixing the initial day of infection (t_0) as February 24, 2020, we find that the transition from the early rise to the mitigation period occurred at around t = 47 days, and the saturation period set in at around t = 93 days. We will discuss this in more detail in section III, where all other results are presented.

By analyzing the available data for various countries, we find empirically that a possible way to track the trajectory of the number of infected population, within the above-mentioned three periods, starting from the early days (t_0) to the saturation period (t_s^f), is through the following equations:

where the functional form of the fast increment period could be an exponential rise followed by a power-law-rise of the form:

and in some cases, simply be the following power-rise form

Here A, B, C are constants and mainly depend on the latent population and the density of population, while α_i^e and α_i govern the rate of infection in a given population. In reality, α_i^e and α_i are time-dependent variables and depend on many factors, such as the density of population, the total population, and the effectiveness of the preventive measures against the disease spread. In this study we assume that these variables can be taken as constants in the sense of an effective mean-field approximation: α_i^e(t) = α_i^e and α_i(t) = α_i. However, if there are rapid mutations of the pathogen, migration between different populations in a short interval, or rapid changes in environmental factors, these assumptions may not be valid. For a given population, depending on its density, immunity, social distancing or any other factor preventing the disease spread, one of the above forms drives the infection to a large number in a short period of time. By tracking the infected population in the early rise period one can fix one of these forms to explain the rise of the infected population. Such a power-law rise was also observed in Ref. [24].

In the second period (the so-called mitigation period, with t_m ≤ t < t_s^i), we find that the time evolution trajectory for the total number of infected population can be tracked with a combination of rising and damping factors through the following equation:

where α_m is related to the rate of infection in this period, while the parameters λ and γ determine how fast the saturation period can be reached. The decrease of the early infection rate may happen for various reasons, for example government-imposed preventive measures, development of immunity in the community, availability of drugs etc. Eventually the infection rate saturates and then decreases depending on the effectiveness of these external factors.

For a given wave of infection, we observe that the increment of infection in the saturation/decline period (t_s^i ≤ t ≤ t_s^f) can be tracked through the following equation:

where E is a constant, α_s is the infection rate in this period, and these constants depend on the population density, herd immunity factor, availability of remedy through drugs etc.

However, in this period, a second wave may start at any time if there is a large unaffected population and the restrictions to contain the virus are relaxed substantially. If the virus is seasonal, on which there is no consensus yet, it can also come back to the unaffected population. In the case of the USA, such a second rise of infection is quite prominent (and probably also for Spain, albeit slower), which we will discuss later.

Note that rather than using a single mathematical formula for tracking the infection throughout the contamination period, from the early rise to the saturation period, as used in Refs. [24, 28, 29], we use three different functional forms. The reason behind such a formulation is that the dynamics of disease spread in the three periods are different, due to changes in various external conditions as the infection progresses (this is quite apparent in Fig. 2). A single functional form, as used in Refs. [24, 28, 29], is thus not suitable to follow the time evolution dynamics throughout the contamination period. As mentioned earlier, in our case the model trajectories obtained from the different functional forms are matched at the boundaries between two periods so that no discontinuity arises.

Though these data-driven mathematical trajectories can be effective in projecting the future evolution of the disease, it is important to find a set of differential equations whose solutions are these trajectories. That may help to better understand the dynamics of infection as well as the origin of such equations and their parameters. To achieve such a formulation it is worthwhile to look towards the growth equations used in ecology and epidemic studies. In fact, in epidemic studies one can use the growth equations to estimate the infection rate, the cumulative number of cases, the peak number of infected population, the future dynamics of the epidemic etc. One such approach is the standard logistic model [52], where population growth is determined through an exponential growth constrained by the size of the population, as below:

where the first term αN (α > 0) implies an initial exponential growth with an exponent α, while the second term, known as the bottleneck factor with the parameter K, the carrying capacity, adjusts the growth with critical resources in the population. A situation with maturity of population arrives as the competition between two terms reduces the combined growth rate, until the infected population saturates. Since the environmental conditions influence the carrying capacity (K), as a consequence it can be time-varying (K(t) > 0), leading to the following mathematical model of growth:

This can be associated with a more general growth model, the so-called generalized Richards model [53], which was originally introduced in the context of ecological population growth, as below:

where the additional parameter ν > 0 determines the asymptote of maximum growth. This model has a solution (when K is independent of time)

with t_tp as the turning point where the growth rate becomes maximum. This model has already been employed for real-time prediction in epidemiology, for example in Refs. [54, 55].
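As a concrete illustration of the growth models discussed above, the following Python sketch integrates the textbook Richards equation dN/dt = αN[1 − (N/K)^ν] numerically and compares it with its closed-form solution for constant K, N(t) = K[1 + ν exp(−αν(t − t_tp))]^(−1/ν). These are the standard forms of the model and are assumed here to correspond to the equations referred to in the text; the parameter values are hypothetical. Setting ν = 1 recovers the logistic model.

```python
import numpy as np

def richards_rhs(n, alpha, k, nu):
    """dN/dt = alpha * N * (1 - (N / K)**nu); nu = 1 gives the logistic model."""
    return alpha * n * (1.0 - (n / k) ** nu)

def richards_closed_form(t, alpha, k, nu, t_tp):
    """Closed-form solution for constant K, with the turning point at t_tp."""
    return k * (1.0 + nu * np.exp(-alpha * nu * (t - t_tp))) ** (-1.0 / nu)

def integrate_rk4(rhs, n0, t_grid, *params):
    """Classical fourth-order Runge-Kutta on a fixed time grid."""
    n = np.empty(len(t_grid))
    n[0] = n0
    for i in range(len(t_grid) - 1):
        h = t_grid[i + 1] - t_grid[i]
        k1 = rhs(n[i], *params)
        k2 = rhs(n[i] + 0.5 * h * k1, *params)
        k3 = rhs(n[i] + 0.5 * h * k2, *params)
        k4 = rhs(n[i] + h * k3, *params)
        n[i + 1] = n[i] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n

# Hypothetical parameter values, chosen only for illustration
alpha, k, nu, t_tp = 0.2, 1.0e5, 0.5, 40.0
t = np.linspace(0.0, 120.0, 1201)
exact = richards_closed_form(t, alpha, k, nu, t_tp)
numeric = integrate_rk4(richards_rhs, exact[0], t, alpha, k, nu)

# The growth rate dN/dt should peak at the turning point t_tp
daily = np.gradient(exact, t)
t_peak = t[int(np.argmax(daily))]
print(t_peak)
```

The numerical and closed-form curves agree closely, and the growth rate peaks at the turning point, which is the property used in real-time prediction studies.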

At this point, we would like to correlate our observed empirical growth model (Eq.(1)), with Richards Model (Eq. (8)). However, once a government introduces a measure, such as lockdown and/or introduction of preventive drugs, to reduce the disease spread, the mitigation period also reduces. If one considers a time-dependent log-kill factor, c(t)N (t), as in reference [24] , the above equation modifies to

If Eq.(1b) is a theoretical model for infectious growth with preventive measures against infection spread, and Eq.(10) can also explain the same growth, then Eq.(1b) could well be a solution of Eq. (10) . Following the similar strategy as in Ref. [24] , we find that for t > t 0 , Eq.(1b) is a solution for Eq.(10) with the following set of parameters:

and,

With this parameterization, we arrive at the following dynamical evolution equation of the infection growth for Covid-19 with preventive measures (such as lockdown, social distancing, introduction of preventive drugs etc.)

The above equation has five parameters: α, α_m, β, λ and γ. They depend on the particular virus that is spreading, the infection rate in a particular community, and externally imposed conditions such as the lockdown measures to which the population is subjected (t_m is determined by varying it dynamically and requiring a minimum χ²). With the available data these parameters can be constrained, and Eq. (12) can then be utilized for future time evolution. Note that with α_m = 1 and γ = 2, Eq. (1b) reduces to the Covid-19 model of Ref. [24], which can be thought of as a special case of Eq. (1b).

Combining all the terms, the final set of equations that we propose for the time evolution of the cumulative infected population in the different periods is the following:

These equations can be solved numerically for a given set of parameters, and by matching the boundary conditions at t_m and t_s^i one can get a time evolution trajectory for the whole infection period. The parameters of Eq. (13) can be fixed through a χ² fit of the numerical trajectory to the available data. The transition times can be evaluated dynamically by requiring a minimum χ². Here one can introduce an error σ_i on each day's data, which will then generate a set of trajectories allowed within those errors.
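The boundary matching described above can be sketched as follows. This is a minimal illustration, not the paper's Eq. (13): the two right-hand sides (exponential growth followed by a logistic-type damped growth), the transition day and all numerical values are hypothetical placeholders. The point shown is only that each period starts from the value reached at the end of the previous one, so the trajectory stays continuous.

```python
import numpy as np

def rk4_step(rhs, t, n, h):
    """One classical Runge-Kutta step for dN/dt = rhs(t, N)."""
    k1 = rhs(t, n)
    k2 = rhs(t + 0.5 * h, n + 0.5 * h * k1)
    k3 = rhs(t + 0.5 * h, n + 0.5 * h * k2)
    k4 = rhs(t + h, n + h * k3)
    return n + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def evolve_piecewise(periods, n0, t_start, t_end, substeps=10):
    """Evolve N day by day; periods is a list of (start_day, rhs) sorted by day.

    The value at the end of one period is the initial value of the next,
    so no discontinuity is introduced at the transition days.
    """
    days = np.arange(t_start, t_end + 1)
    traj = np.empty(len(days))
    traj[0] = n0
    h = 1.0 / substeps
    for i, day in enumerate(days[:-1]):
        # Use the right-hand side of the period that contains this day
        rhs = [f for start, f in periods if start <= day][-1]
        n = traj[i]
        for s in range(substeps):
            n = rk4_step(rhs, day + s * h, n, h)
        traj[i + 1] = n
    return days, traj

def early_rise(t, n):
    """Hypothetical exponential-growth period."""
    return 0.2 * n

def mitigation(t, n):
    """Hypothetical damped-growth period with a ceiling of 5e4."""
    return 0.1 * n * (1.0 - n / 5.0e4)

days, traj = evolve_piecewise([(0, early_rise), (30, mitigation)], 10.0, 0, 150)
print(traj[30], traj[-1])
```

In a fit, the rate constants inside each right-hand side and the transition days would be varied, and each resulting trajectory would be compared with the data through χ².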

The onset of saturation (plateauing), that is t_s^i, can be determined by finding the maximum of N(t) at the end of the mitigation period. Taking the first derivative one arrives at

a positive-time solution of which provides t_s^i. One can also find the inflection point, that is the onset of the approach towards t_s^i, by taking the second derivative, which yields

and then finding a solution of this equation.
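The two conditions above (vanishing first derivative for the plateau, vanishing second derivative for the inflection point) can also be located numerically on a grid. The sketch below does this for an illustrative rise-times-damping curve, c t^a exp(−(t/λ)^g), which is an assumption made for this example and should not be read as the paper's Eq. (1b); all parameter values are hypothetical.

```python
import numpy as np

def rise_times_damping(t, c, a, lam, g):
    """Illustrative curve c * t**a * exp(-(t / lam)**g); an assumption of this sketch."""
    return c * t ** a * np.exp(-(t / lam) ** g)

def plateau_and_inflection(t, n):
    """Locate the maximum of n and the preceding maximum of dn/dt on a grid."""
    i_max = int(np.argmax(n))                     # dn/dt = 0
    slope = np.gradient(n, t)
    i_infl = int(np.argmax(slope[: i_max + 1]))   # d2n/dt2 = 0 before the maximum
    return t[i_max], t[i_infl]

t = np.linspace(1.0, 200.0, 19901)
n = rise_times_damping(t, c=1.0, a=4.0, lam=25.0, g=1.0)
t_plateau, t_inflection = plateau_and_inflection(t, n)
print(t_plateau, t_inflection)
```

For this particular curve the maximum can be checked by hand: setting the derivative to zero gives t = λ (a/g)^(1/g), which is 100 for the values used, and for g = 1 the earlier inflection point is at λ (a − √a) = 50.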

We use both Eqs. (1) and (13) to generate the time evolution trajectories, and the parameters are fixed using the available data. The full trajectories, covering the whole contamination period, are then generated, which also project the disease spread at future times. Results obtained for various affected regions are presented in the next section.

In this section we present the results obtained through our proposed model. First, we will check whether the proposed model is able to successfully track the available data on the cumulative number of infections in various Covid-19-affected regions. Then we will probe the predictive ability of our model by using a subset of the data and reproducing the later trajectory. Finally, we will project the future time evolution of the number of infected population for a few countries where the infection is still rising rapidly.

To validate our model we use the available data on Covid-19-infected population of the following countries: England, Germany, Italy, France, Spain and also for New York City.

These data are taken from Refs. [56] [57] [58] [59] [60] [61]. We choose these countries and this city as they are already inside the saturation period (after t_s^i) in the ongoing wave of the Covid-19 disease. This will enable us to test our model from the early rise to the saturation period.

After demonstrating the usefulness of the proposed model we proceed to show its predictive ability by analyzing subsets of the data for Italy and New York City. As will be evident later, this model can predict the future time evolution trajectory for a few weeks to months. Then we will show our results on the total number of infected population of the USA, Brazil, Russia and India (and separately for Mumbai and Delhi), where the number of cases is still rapidly increasing. This will enable us to project the time evolution trajectories of these countries in the mitigation period, until they reach the beginning of their respective plateaus (starting at t_s^i).

We fit the data to Eq. (1). We vary the extent of the different time periods dynamically and choose the t_m and t_s^i that minimize the total χ², which is a sum over individual terms, i.e., χ² = χ²_i + χ²_m + χ²_s for the initial, mitigation and saturation periods, each term being of the standard form Σ [N(t_i) − N_fit(t_i)]² / σ_N(t_i)² summed over the days t_i in that period. Here N(t_i) is the cumulative number of infected population on the i-th day, N_fit(t_i) is the corresponding model value, and σ_N(t_i) is the error on N(t_i). We assign this error σ_N(t_i) to each day's number, with a maximum of 20% for the first 14 days (the incubation period; in most cases it is 10%) and less than a percent as the fit approaches the saturation period. In between, an error decreasing gradually from about 5% to 1% with the number of days is imposed. We mentioned this earlier and elaborate on it here: at the onset of infection, the number of initial cases is not well determined, as the number of tests performed could well be too low and there is a good probability of underestimation. We keep the maximum error for the first 14 days, which is typically the incubation period for this respiratory disease. The testing procedure, capacity and reliability improve over time, and hence the reported number of Covid-positive cases becomes less erroneous over time. The timing of a transition from one period to another, as mentioned at the beginning of section II, is chosen dynamically so that the total χ² is minimum. To avoid a sharp transition between two periods (for example, from early rise to mitigation, that is at points just before and after t_m) we interpolate the results between the adjacent points obtained from the two different fitting forms. This is justifiable since, as mentioned earlier, the change from one period to another does not happen in a single day but as a smooth cross-over over several days. With this procedure we fit the data for the above-mentioned countries uniformly. We first show our results for the first two time periods (initial and mitigation, until the onset of saturation) with fits to Eqs. (1a) and (1b). This will show the validity of our model in these two time periods. Later we also show our results for the entire time period (initial, mitigation, and saturation). We show it separately as Eq. (1c) is the ideal representation of the saturation period with the same environmental factors as in the mitigation period. However, in this period, due to a possible easing of various environmental factors and with a large unaffected population, a second wave of infection may emerge, which needs to be dealt with separately. We will discuss that for the case of the USA later.
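The day-dependent error assignment and the resulting χ² can be sketched as below. Several details are assumptions of this sketch rather than the paper's prescription: the intermediate error is taken to fall linearly from 5% to 1% between day 14 and a hypothetical saturation onset day `t_sat` (assumed later than day 14), the late error is set to 0.5%, and the errors are taken relative to the reported numbers.

```python
import numpy as np

def relative_error(days, t_sat, early=0.10, late=0.005):
    """Fractional error per day: large at first, small near saturation.

    Assumptions of this sketch: `early` applies to the first 14 days, the
    error then falls linearly from 5% to 1% up to t_sat (t_sat > 14), and
    `late` applies from t_sat onward.
    """
    days = np.asarray(days, dtype=float)
    middle = np.interp(days, [14.0, t_sat], [0.05, 0.01])
    return np.where(days < 14, early, np.where(days < t_sat, middle, late))

def chi_square(observed, model, days, t_sat):
    """Sum of squared residuals weighted by the day-dependent errors."""
    observed = np.asarray(observed, dtype=float)
    model = np.asarray(model, dtype=float)
    sigma = relative_error(days, t_sat) * observed
    return float(np.sum(((observed - model) / sigma) ** 2))

# Hypothetical example: a model that is 10% too high on day 0 only
obs = np.array([100.0, 200.0, 400.0])
model = obs * np.array([1.1, 1.0, 1.0])
print(chi_square(obs, model, [0, 1, 2], t_sat=90))  # close to 1
```

With a 10% error assigned to the early days, a 10% deviation on a single early day contributes about one unit to χ², whereas the same relative deviation near saturation would be penalized far more heavily.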

In Fig. 3 we show the cumulative Covid-positive cases for the above-mentioned countries for the first two time periods (from the initial period to the onset of saturation, before t_s^i), as determined by our fit to Eqs. (1a) and (1b) dynamically with a minimum χ² = χ²_i + χ²_m. The red points are the actual data points as obtained from Refs. [56] [57] [58] [59] [60] [61], while the black lines are the fitted trajectories with the time evolution form as in Eqs. (1a) and (1b). In Table I we show the mean values of the fitted parameters. As one can observe, the fitted trajectories obtained through Eqs. (1a) and (1b) describe the actual trajectories of the time evolution quite well, from the onset of the disease to the onset of the saturation period as it approaches the plateau region. It is interesting to see that the onset of the saturation period for all countries happened when the power γ reached a value close to 1. In Fig. 4, we show the per-day infections as a function of days for the above-mentioned countries. The red points are the actual data points as obtained from Refs. [56] [57] [58] [59] [60] [61], while the black lines are results from Eq. (1).

[Table I: country and city, initial time, and fitted parameters.]

[...] is unsuitable to track the progression of the disease. At that point the third phase of the time evolution with Eq. (1c) becomes effective. We choose the onset t_s^i dynamically such that χ² = χ²_i + χ²_m + χ²_s gives the best acceptable χ²/d.o.f. with t_s^i as high as possible. With that method we fit the available full data set up to July 31, 2020. Since the data for an individual day fluctuate and sometimes extra data for a day get reported on the next day (since the official announcement comes at a certain time), it may be worthwhile to study the same data averaged over a few days. That will also check the consistency of the fits used to extract the parameters of Eq. (1). We perform such a study for England's data, averaging over 5 days (one can also choose a different bin size, but it should not be too large). We fit the data with the same error bars as for each day's data in the corresponding periods. The results are shown in Fig. 7, both for the per-day data and for the average over 5 days. As one can see, there is no significant difference between the results obtained with each day's data (magenta line) and with the averaged data (blue line).
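The averaging over a few days can be sketched with a simple binning helper. The function name, the choice to drop an incomplete trailing bin, and the example numbers are assumptions of this sketch.

```python
import numpy as np

def bin_average(values, bin_size=5):
    """Average consecutive values in bins of bin_size, dropping an incomplete last bin."""
    values = np.asarray(values, dtype=float)
    n_full = (len(values) // bin_size) * bin_size
    return values[:n_full].reshape(-1, bin_size).mean(axis=1)

# Hypothetical example: 12 daily values give two complete 5-day bins
days = np.arange(1, 13)
binned_days = bin_average(days, bin_size=5)
print(binned_days)
```

Applying the same helper to the day index and to the reported numbers gives one averaged point per bin, located at the bin center, which can then be fitted in the same way as the daily data.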

Next, we proceed to solve Eq. (13) [...] numbers as mentioned earlier. Note that solving these equations with a given error bar results in a band of trajectories rather than just one. However, that is more realistic, since different sets of parameters, each with a minimum χ², generate a band of trajectories which can spread over a few days, which is also what we observe in reality. At the boundaries we match the solutions without any discontinuity, that is, the solutions are propagated continuously. For example, the initial value of N(t) at t = t_m + 1 with Eq. (13b) is generated from the solution of Eq. (13a) at t = t_m. In this way, we match the boundary conditions of the solutions between different regions. A trajectory generated by these equations is accepted or rejected based on the minimum χ² within the error bars. The trajectories obtained through these differential equations for Italy and New York City are shown in [...] can also track the disease spread in the saturation period, assuming the same environmental conditions prevail throughout this period. Though we show results for only two regions, data for other regions can also be tracked with equal success through the proposed differential equations (Eq. (13)). [...] For Italy this projection is for 25 days [...] t_s^i (= 85 − 60). We find that this projection can be progressively improved by gradually including more data. It would be interesting to see how far into the future this model can make a projection, and we will discuss that in the next subsection while considering the disease spread for countries where the infection is still increasing substantially.

Having validated the model and shown its predictive ability, we proceed to study the data on other countries, Russia, Brazil, India and the USA, where the number of Covid-positive cases is still rising substantially. For India, we also choose two of its biggest cities, Mumbai and Delhi. We first constrain the parameters by fitting the available data as detailed above, both for Eq. (1) and for the differential equations (Eq. (13)). We then proceed to predict the future time-evolution trajectories up to the onset of saturation.

In Fig. 10 we plot the cumulative number of Covid-19 positive cases for Russia [62] with the effective starting date (t_0) as March 03, 2020. The time trajectory of the data shows the expected exponential rise followed by a power-law growth as in Eqs. (1a) and (1b). We fit the data with the combined equations and find α_i^e = 0.2239 and α_i = 4.886. It is interesting to note that this α_i lies towards the higher end of the values shown in Table I for other countries in Europe. From the fitted results we find that the transition from the rising to the mitigation period happened around time t_m = 67, which corresponds to around May 10, 2020.

It is interesting to note that the mitigation period of Russia is considerably longer than that of other European countries, perhaps due to its larger population. Using the available data up to August 10 we extract the parameters and use them to predict the time evolution trajectory until the time t = 200 − 215, which, we believe, signals the onset of the saturation period (t_s^i) with fewer than 300 infections per day. We also use the proposed differential equations for the time evolution (Eq. (13)) and constrain the parameter set using data up to August 10. The corresponding time evolution trajectories (both cumulative and differential) are shown in Fig. 10. The blue bands are obtained using the constrained parameter set, while the red solid lines are the actual data. The projected results, using both Eq. (1) and Eq. (13), show that, with the same prevailing environmental conditions as now, the onset of saturation (t_s^i) will most probably start between t = 190 − 215. That is, by late September to the middle of October the number of infections will reduce substantially (below 300 per day), with a total cumulative infected number between 0.95 and 1.1 million. With sustained measures against disease spread, a gradual slow increase is expected thereafter. It will be interesting to find out whether the true data follow the trajectory projected in Fig. 10.

The second country whose data we study, and where Covid-19 infection is increasing rapidly, is Brazil. It has a large population and high population density in a number of big cities. It is natural to expect that without the strict regulations on human to human contact and proper mechanisms to tackle a pandemic, a contagious disease like Covid-19 will spread rapidly to a high number, and both the rising as well as mitigation periods will be prolonged.

In fact, in a short span of time the total number of infections has reached a large number, and Brazil is now the second most affected country, with more than 3 million total infections and more than one hundred thousand deaths (Ref. [63], till August 10). The infection is still increasing strongly and one would expect that a large population will be infected further in the foreseeable future. It is thus quite important to project the probable time evolution of the total number of infections. Our results are shown in Fig. 11 . The red solid line is the available data, the black solid lines are obtained through Eq. (1) and the blue band represents the projection through the differential equations in Eq (13) . Our results suggest that the current wave of infection, with the prevailing measures against disease spread, will progress for another 3-4 months and will then reach the beginning of the saturation period by the end of November, with fewer than 2500 infections per day. The cumulative number of infections by then will reach 5.5-6.5 million. However, with more stringent measures to prevent disease spread, or if an effective vaccine becomes available soon, the mitigation period could be shortened.

On the other hand, if the prevailing preventive measures are relaxed and no vaccine is available, then a second wave can start from this mitigation period with a larger number of infections, as is happening now in the USA. There is still a large uncertainty in this projection. As mentioned earlier, this projection can be progressively improved by including more data as the disease progresses further. Updated results will be posted at the URL mentioned later 4 .

The analysis of Covid-19 data on India provides an ideal platform as well as a challenge to In Fig. 12 , we plot the cumulative number of confirmed cases [64, 65] with the effective starting date (t 0 ) as March 02, 2020. Again we find an exponential increment followed by a power-law-rise growth as in Eq. (1a). We fit the data with the combined equations. It is interesting to note that α e i and α i , till the middle of June, were on the lower side compared to those of other countries with smaller populations, which signifies the effect of preventive measures in the initial days. However, α i has started to increase since then (the higher percentage of infection is also correlated with the number of tests, which has substantially increased during this period). The two bands show 68% and 95% confidence intervals.

As is evident from the recent data, the disease spread is surging ahead and there is no sign of the infection slowing down even after more than 5 months, by which time many countries were in either the mitigation or the saturation period of the first wave of infection. This is precisely due to the large size and density of the population, for which strict lockdown measures cannot be maintained indefinitely given economic, social and political factors. Considering these external factors, it is expected that the rising as well as the mitigation periods of a contagious disease will be considerably longer (and most probably the longest) compared to those of all other countries. Given that no effective vaccine will be available in the immediate future and that it is not feasible to impose a prolonged lockdown, other possible measures (strict social distancing, contact tracing, mask-wearing, informing all levels of the population about disease transmission, etc.) against this contagious disease, which could well be airborne [1], must be strictly followed to avoid an avalanche of infections in the next few months.

On Indian data, the main observation is that the mitigation period has not started yet.

Hence, at this moment, it is not possible to project the time evolution trajectory far into the future through our model. However, considering the current rate of infection, we project a possible scenario for the total number of infections over the next three weeks. Fits to the model suggest that, with the current trend, the cumulative number of infections will cross 3.5 million at the end of August (Fig. 12) . On a positive note, only very recently the increment coefficient, α m (of Eq. (1b)), is found to be reducing, albeit slowly, with its current value at 4.58. It will be interesting to monitor whether α m continues to reduce further and the infection enters the mitigation period. We will continuously monitor the disease spread and will project the time evolution more accurately later, as more data become available.

Along with the whole Indian population, we also analyze data for its two biggest cities, Mumbai and Delhi. Below we elaborate our analysis for these two cities and subsequently project the most probable trajectory of infection for the next few months.

Mumbai is the largest city in India and is also one of the most densely populated cities in the world (> 25000 per sq km with a total population of more than 20 million [71]). Moreover, the city has clusters of densely populated areas, and a large amount of migration happens very frequently. Naturally it is intriguing to analyze the time evolution trajectory of Covid-19 infection in such a city. The first reported case in the city of Mumbai was on March 11, 2020; subsequently the infected number has increased substantially, and as of August 10, 2020, the total infected number is more than 124 thousand with a fatality rate of 5.55% [67]. As in other parts of India, Mumbai was also under lockdown, effectively from March 20, and gradual un-lockdown has been under way since the beginning of June, with lockdown measures kept in place in containment zones.

In Fig. 13 , we plot the cumulative number of confirmed Covid-19-infected patients with the effective starting date (t 0 ) as March 14, 2020 [67] . This data also shows an exponential increase followed by a power-law-rise growth in the initial days as in Eq. (1a). We fit the data with the combined form and find α e i = 0.176 and α i = 3.436, which are again smaller than those of many other places. Perhaps this is due to strict lockdown measures in the initial days. The transition from the rising to the mitigation period is found to be at time t m = 81, which corresponds to around June 03, 2020. Due to its huge population density as well as its total population, it is expected that the mitigation period will be much longer. With the existing data we track the time evolution with Eq. (1), and the fitted result is shown with the black solid line in Fig. 13 . We also use Eq. Similar to our study on England's data, we also perform a study on the 5-day average data of Mumbai. The results are shown in Fig. 14. As one can see, there is effectively no difference between the two results (magenta and blue lines), except for a very small difference in the saturation period. This shows that our method for obtaining the parameters of Eq. (1) is robust.
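A 5-day average of a daily series can be formed as in the sketch below. The text does not specify whether the window is trailing or centred, so the trailing window used here is an assumption, and the function name `trailing_average` is hypothetical.

```python
def trailing_average(values, window=5):
    """Trailing moving average over `window` days.

    The first window - 1 points are averaged over the days available so far,
    so the output has the same length as the input.
    """
    averaged = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day leaving the window
        averaged.append(running / min(i + 1, window))
    return averaged


# Synthetic daily counts (illustration only).
print(trailing_average([1, 2, 3, 4, 5, 6, 7]))
```

Fitting the smoothed and the raw series with the same procedure, and comparing the extracted parameters, is the kind of robustness check described above.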

Delhi is another of the most highly populated cities in the world. Like Mumbai, Delhi also has a number of densely populated areas, and migration to and from Delhi happens every day. It is thus also interesting to analyze the Covid-19 data on Delhi and compare it with that of Mumbai. Below we elaborate our analysis on Delhi.

The first reported case of Covid-19 infection in the city of Delhi was on March 2. Subsequently the infected number has increased, and as of August 10 the total reported infected number is more than 146 thousand with a mortality rate of 2.82% [68] , which is considerably lower than that of Mumbai. As in other parts of India, Delhi was also under lockdown from March 24, and gradual un-lockdown was initiated from the beginning of June. However, unlike in Mumbai, a sudden jump in the number of infections was observed around the fourth week of June, perhaps due to the easing of lockdown and a failure to maintain other preventive measures. Since then further measures against the disease spread have been taken and the number of infections per day has come down. In Fig. 15 , we plot the cumulative number of infected population [68] with the effective starting date (t 0 ) as March 2, 2020. This data also shows an exponential followed by a power-law-rise form as in Eq. (1). We fit the data with the combined form and find α e i = 0.186 and α i = 4.82, which are larger than the corresponding exponents for Mumbai. The transition from the rising to the mitigation period is found to be at time t m = 103, which again is larger than that of Mumbai. This transition time corresponds to June 25, 2020. Due to its huge population density as well as its total population, we again expect that the mitigation period will be longer. With the existing data, we track the time evolution with Eq. (1), and the result is shown with the black solid line in Fig. 15 . We also use Eq Indeed data after that can be fitted easily again with a power-law-rise form (Eq. (1a)).

Because of the above-mentioned unique feature of the trajectory, we use a different analysis strategy for this data. First, we try to find the maximum time up to which the total number of infections can be analyzed like that of any other affected region, incorporating an early exponential-cum-power-law-rise followed by a mitigation period. We call that part of the data W a. The maximum of W a is chosen dynamically with the usual condition of minimum

). Then we introduce another exponential-cum-power-law-rise followed by a mitigation period, as if there were a second wave. We call this part of the data W b. We then dynamically choose the time of transition from W a to W b so that the total χ 2 = χ 2 (W a) + χ 2 (W b) becomes minimum within the acceptable limit. We find that the minimum χ 2 fit leads to t m (W a) = 45, a transition from W a to W b at around time t tr = 103, and t m (W b) = 129 − 140. Interestingly, in this way we can track the time evolution of the total number of infections from the initial period to date very well. The result using Eqs. (1a) and (1b) is shown in Fig. 16 with the black solid line, whereas the red solid line represents the actual data [1, 69] . Note that the power-law-rise exponent α i in W b (5.594) is much bigger than its corresponding value in W a (2.744). In fact, it is the biggest of all the power-law-rise exponents for the affected regions that we study. This is a clear signal that the infection spread very rapidly in the early period of W b, and unless adequate measures are imposed against the disease spread, a substantial increase in the number of infections is expected in the foreseeable future. However, on a positive note, it is worthwhile to mention that our model also allows the possibility that another mitigation period has started, albeit slowly, at around t = 129 − 140, i.e., around the second week of July. With this slow rate of mitigation, and with no change in existing conditions, the disease can keep rising for another 2.5-3 months, infecting about 6.5-7 million people (shown by the black line in Fig. 16 ), before reaching the saturation period with fewer than about 3000 infections per day by the end of October to the beginning of November. Thereafter the onset of the saturation period can possibly be achieved by maintaining the same restrictions against the infection spread.
We project this as the ideal-case scenario, with continuation of the existing restrictions and the current trend of reduction. Although this is an ideal situation, and in reality the restrictions and hence this projection will deviate, it can still be taken as a possible scenario. However, if the current percentage of per-day increment is sustained and does not reduce further in the next 2 weeks, this situation can change. We discuss such a possible situation next, using a set of speculated data.
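The dynamic choice of the transition time between W a and W b can be sketched as a scan over candidate split points, keeping the one with the smallest total χ 2 . The sketch below is not the fitting code used for the results above: as a stand-in for the Eq. (1) fits it uses a straight-line least-squares fit with unit errors on each segment, and the names `linear_chi2` and `best_transition` as well as the synthetic data are assumptions introduced for illustration.

```python
def linear_chi2(ts, ys):
    """Sum of squared residuals of a least-squares straight line (unit errors).

    Stand-in for the chi^2 of a fit of the model function to one segment.
    """
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in ts)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / s_tt
    intercept = mean_y - slope * mean_t
    return sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))


def best_transition(ts, ys, min_points=3):
    """Scan split indices k and minimize chi2(ts[:k]) + chi2(ts[k:])."""
    best = None
    for k in range(min_points, len(ts) - min_points + 1):
        total = linear_chi2(ts[:k], ys[:k]) + linear_chi2(ts[k:], ys[k:])
        if best is None or total < best[1]:
            best = (k, total)
    return best


# Synthetic two-regime data: slope 1 for t < 10, slope 5 from t = 10 onwards.
ts = list(range(20))
ys = [t if t < 10 else 20 + 5 * (t - 10) for t in ts]
k, total = best_transition(ts, ys)
print("second segment starts at t =", ts[k], "with total chi2 =", total)
```

In an actual analysis each segment would be fitted with the full model and the reported errors, but the scan over the transition time has the same structure.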

Before that, one point is worth mentioning: had the infection trajectory followed a pattern similar to that of most other countries (i.e., without W b, as shown by the blue line in Fig. 16 ), the onset of the saturation period would have started at around 150 days (i.e., by now) with a much reduced number of infections (about 3000 per day). This strongly indicates that by maintaining the measures against the infection spread for a longer time, as in the early days of W a, the disease spread could perhaps have been substantially contained by now, and perhaps the onset of the saturation period could have happened as well.

4.1 A speculative but possible scenario:

Finally, we discuss a speculative scenario for the disease spread in the next few months, based on a speculative set of data for the next 2 weeks. We make the following two assumptions: i) in the next two weeks the per-day increase in the number of infections does not exceed 1.25% of the previous day's number (the current rate is about 1.1-1.2%) and ii) it also does not fall below 0.9%. With these assumptions we generate a set of speculative data for the next 2 weeks. We then use this data in our model, with t m (W b) = 129, to get the future trajectory. In Fig. 17 we show the corresponding results. The speculative data is shown with the blue line and the projection is shown by the green line. The original data till August 10 is shown with the red line, and the projection with the same t m is represented by the black line. This speculative scenario indicates that if the percentage of infection, as mentioned above, does not decrease further in the next 2 weeks and stays between 0.9% and 1.25%, then within this model there is a high possibility (95% confidence intervals) that the number of infections by the end of October will be about 7-7.5 million, with 10000-15000 infections per day.
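One way to generate such a speculative data set is sketched below. It assumes that the two conditions bound the daily growth of the cumulative count, and it draws each day's growth uniformly between 0.9% and 1.25%; the uniform draw, the function name `speculative_series`, the seed and the starting count are all hypothetical choices made for illustration and are not taken from the analysis above.

```python
import random


def speculative_series(last_total, days=14, low=0.009, high=0.0125, seed=0):
    """Extend a cumulative count by `days` days of bounded daily growth.

    Each day the cumulative count grows by a fraction drawn uniformly from
    [low, high] (an assumption of this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the series is reproducible
    series = []
    total = float(last_total)
    for _ in range(days):
        total *= 1.0 + rng.uniform(low, high)
        series.append(total)
    return series


start = 5_000_000  # hypothetical cumulative count on the last day of data
spec = speculative_series(start)
print(len(spec), round(spec[-1]))
```

The generated points would then be appended to the reported data before refitting the model.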

FIG. 17: A speculative scenario considering next 2 weeks data as shown by blue lines. The red filled circles are actual data from Ref. [69] . Results from our model are shown by the black line. The green line shows a possible scenario as projected by our model considering the speculated data (blue line) along with the actual data (red points). The two bands represent 68% and 95% confidence intervals.

A summary of all the projections described above is given in Table II .

In this work, using a data-driven statistical model, we present an analysis of the time evolution of the Covid-19 infected human population for a number of countries and cities.

By analyzing the time-dependent data of the infected population, we find that there is our finding is that it is now crossing the peak of disease spread (that is, the maximum number of infections per day). However, due to the large unaffected population and the unavailability of an effective vaccine, its mitigation period will be prolonged. With the existing measures against the disease spread we expect to see the onset of the saturation period by the middle of November to the middle of December, with 5.5 − 6.5 million cumulative infections and fewer than 2500 infections per day.

Our analysis on India shows that it is still in the power-law-rise period even after more than 5 months of disease spread. With the current trend of infection we expect that there will be more than 3 million infections by the end of August. We also expect that, even with the continuation of the existing measures against the disease spread, the mitigation period will be quite long considering its huge unaffected population. Hence it is even more important to maintain the existing measures against the disease spread, at least for the next few months, given that no effective vaccine will be available before that. We will continue to monitor the disease propagation and progressively improve our projection of the future trajectory as more data become available 3 .

We also analyze data on two of its biggest cities: Mumbai and Delhi. For Mumbai, our finding is that it has entered the mitigation period. If the current measures against the disease spread can be continued throughout, we expect to see the onset of the saturation period by the beginning of October, with a cumulative infected population of about 145-160 thousand and fewer than 250 infections per day. For Delhi, we project that the onset of the saturation period, with a cumulative number of infections of about 160-180 thousand and per-day infections below 250, can be achieved by the end of September to the beginning of October.

Finally, in the USA's data, we find a distinct feature in the time evolution trajectory. Our model clearly shows that this data can be explained very well by allowing two waves of infection: the first with the usual exponential-cum-power-law-rise followed by a mitigation period up to a certain time, and the second starting from that mitigated number with a power-law-rise followed by a very slow mitigation. The exponent of the second power-law-rise is bigger than that of the first, suggesting that a very large population can get infected if adequate preventive measures are not followed. Our result also shows that, had the mitigation period of the first wave continued, a saturation period with a much smaller number of infections could have been achieved by now. With the current slow rate of mitigation, but with the existing restrictions, we project that this disease can keep rising for another 3-4 months, infecting about 6.5 − 7 million people, before reaching the saturation period with fewer than 5000 infections per day. However, we project this as an ideal-case scenario. If the rate of per-day increment remains within 0.9-1.25% and no further reduction happens in the next 2 weeks, we show, within a speculative scenario, that the total number of infections can be around 7-7.5 million by early November, with more than 10000-12000 infections per day. All the projections are made with 95% confidence intervals in the proposed model, given the reported available data. These projections can be progressively improved as more data become available in the next few weeks 3 . We summarize these projections in Table II .

In the proposed model the parameters are assumed to be mean-field-type effective parameters with no time or other dependences. However, in reality, there are a number of external factors, which could be environmental, medical, economic, social as well as political. It is not clear how the model parameters are correlated with those external factors, and it would be worthwhile to study their inter-relations. Such a study is necessary to understand the significance of the model parameters. It may be possible to constrain the parameters more stringently using some other data, for example, the number of active cases and the number of fatalities. We will pursue such a study in future. The predictive ability of the model can be improved further if a robust method can be found to quantify the errors in the reported data.

Here we show some of the numerical details, particularly describing the method employed for obtaining the confidence intervals.

For a given data set {days, number of infection and error ≡ t, y(t) and σ(t), t = 1, ..., N } and the model function (f (t)), where f (t) is the time evolution trajectory as in Eqs. (1) and (13), we calculate the minimum of χ 2 , defined as

$$\chi^2 = \sum_{t=1}^{N} \frac{\left(y(t) - f(t)\right)^2}{\sigma^2(t)} .$$
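As a minimal illustration of this definition, the χ 2 for given data, model values and errors can be evaluated as below. The function name `chi_square` and the numbers in the example are hypothetical; the minimization over the model parameters is not shown.

```python
def chi_square(y, f, sigma):
    """Sum over all days of ((y(t) - f(t)) / sigma(t))**2."""
    if not (len(y) == len(f) == len(sigma)):
        raise ValueError("y, f and sigma must have the same length")
    return sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, f, sigma))


# Synthetic example: three days of data, model values and errors.
print(chi_square([10, 20, 30], [12, 18, 30], [2, 2, 1]))
```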

A non-linear fitting method is utilized for the minimization. The total minimum χ 2 is obtained by fitting all periods simultaneously (by varying t e , t m , t s i together). For Eq. (13) this minimization is performed for each trajectory with a given set of parameters and we choose only those which are within the acceptable minimum χ 2 . The non-linear fitting method employed here has been extensively used previously for lattice gauge theory calculations and data analysis [72] [73] [74] .

To maximize the probability of the observed data given the model, we follow Ref. [75] and calculate the likelihood of the parameters, defined as

Here we use the generic notation x for time (t) and {α} represents all parameters of the model, and the frequency distribution p (y i |x i , σ yi , {α}) is taken to be Gaussian 
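For independent data points with a Gaussian frequency distribution, the logarithm of the likelihood is a sum of Gaussian log-densities, one per data point. The sketch below evaluates it under that standard assumption, with f i denoting the model value at x i ; the function name `gaussian_log_likelihood` is introduced here for illustration, and the explicit normalization convention is assumed rather than taken from the text.

```python
import math


def gaussian_log_likelihood(y, f, sigma):
    """Log-likelihood for independent Gaussian errors.

    Each point contributes -0.5 * ((y_i - f_i) / sigma_i)**2
    minus log(sigma_i * sqrt(2 * pi)).
    """
    return sum(
        -0.5 * ((yi - fi) / si) ** 2 - math.log(si * math.sqrt(2.0 * math.pi))
        for yi, fi, si in zip(y, f, sigma)
    )


# Synthetic example: the model reproduces the data exactly, with unit errors.
print(gaussian_log_likelihood([1.0, 2.0], [1.0, 2.0], [1.0, 1.0]))
```

Up to a parameter-independent constant, this log-likelihood equals −χ 2 /2, so maximizing it is equivalent to minimizing χ 2 when the errors are fixed.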

Special report: the simulations driving the world's response to COVID-19

Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study

Contacts in context: large-scale setting-specific social mixing matrices from the BBC Pandemic project

Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions

An updated estimation of the risk of transmission of the novel coronavirus

Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia

Phase-adjusted estimation of the number of coronavirus disease 2019 cases in Wuhan, China

Early dynamics of transmission and control of COVID-19: a mathematical modelling study

The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application

Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China

Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data

The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak

The effect of human mobility and control measures on the COVID-19 epidemic in China

The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)

Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions

Risk assessment of novel coronavirus COVID-19 outbreaks outside China

How will country-based mitigation measures influence the course of the COVID-19 epidemic?

Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period

Age-structured impact of social distancing on the COVID-19 epidemic in India

Dynamics of the COVID-19 - Comparison between the Theoretical Predictions and Real Data

A Poisson Autoregressive Model to Understand COVID-19 Contagion Dynamics

Propagation analysis and prediction of the COVID-19

Statistical Modeling of COVID-19 Pandemic Stages Worldwide

health service utilization forecasting team and

Data-driven modeling reveals a universal dynamic underlying the COVID-19 pandemic under social distancing

Why integral equations should be used instead of differential equations to describe the dynamics of epidemics

Covid-19: analysis of a modified SEIR model, a comparison of different intervention strategies and projections for India

Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study

Wrong but Useful - What Covid-19 Epidemiologic Models Can and Cannot Tell Us

A guide to R - the pandemic's misunderstood metric

Infectious diseases of humans

Mathematical epidemiology of infectious diseases: Model building, analysis and interpretation

Virus Dynamics, Mathematical Principles of Immunology and Virology

Infectious disease modelling

Infectious Disease Modelling

A Contribution to the Mathematical Theory of Epidemics

The Mathematics of Infectious Diseases

Exact analytical solutions of the Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates

Integral equation models for endemic infectious diseases

Appropriate models for the management of infectious diseases

On Future Population

A Flexible Growth Function for Empirical Use

SARS epidemiology, logistic-type model, and cumulative case number

Real-time Forecast of Multiphase Outbreak

England data: (as in the website on

Ministry of Health and Family Welfare, Government of India

Deuteron-like heavy dibaryons from Lattice QCD

Precise predictions of charmed-bottom hadrons

We thank TIFR for support. GS also acknowledges WOS-A grant from the Department of Science and Technology (SR/WOS-A/PM-9/2017), India. NM would like to thank Giorgio Sonnino for discussions, Abhishek Dhar for comments, and Girish Kulkarni for discussions on error analysis.

to generalize the above frequency distribution as

$$ p\left(\{\alpha\} \mid \{y_i\}_{i=1}^{N}, I\right) \propto p\left(\{y_i\}_{i=1}^{N} \mid \{\alpha\}, I\right)\, p\left(\{\alpha\} \mid I\right), $$

where
• I : all the prior knowledge of the data and the problem.
• p ({α}|I): the prior probability distribution for the parameters ({α}) that represents all knowledge except the data.
• p ({α}|{y i }, I): the posterior probability distribution for the parameters ({α}) with the given data and the prior knowledge.

One gets the peak of the posterior probability at the best-fit values of the parameters ({α}), while its moments provide the uncertainties of these parameters.

Since the above posterior probability distribution with the given number of parameters of our model is quite complicated, we adopt a Markov-Chain-Monte-Carlo (MCMC) method to generate this distribution with the given data set and model. The priors for the parameters are chosen to be uniform within the relevant ranges. We use more than 50000 MCMC steps in most cases after suitably adjusting the autocorrelation times. Each MCMC chain is initialized in a Gaussian ball around the maximum likelihood result.

In Fig. 18 we show the projections of the posterior probability distributions of the model parameters for the mitigation period. Since this period is the most complicated as well as the most important, and involves a larger number of parameters, we choose it to show results from the above-mentioned analysis method. Fig. 18 is for the data on the USA, and the extracted parameters correspond to Fig. 16 . The bands represent the one, two and three sigma confidence intervals. In Fig. 19