key: cord-208252-e0vlaoii authors: Calvetti, Daniela; Hoover, Alexander; Rose, Johnie; Somersalo, Erkki title: Bayesian dynamical estimation of the parameters of an SE(A)IR COVID-19 spread model date: 2020-05-09 journal: nan DOI: nan sha: doc_id: 208252 cord_uid: e0vlaoii In this article, we consider a dynamic epidemiology model for the spread of the COVID-19 infection. Starting from the classical SEIR model, the model is modified so as to better describe characteristic features of the underlying pathogen and its infectious modes. In line with the large number of secondary infections not related to contact with documented infectious individuals, the model includes a cohort of asymptomatic or oligosymptomatic infectious individuals, not accounted for in the data of new daily counts of infections. A Bayesian particle filtering algorithm is used to update dynamically the relevant cohort and simultaneously estimate the transmission rate as the new data on the number of new infections and disease related death become available. The underlying assumption of the model is that the infectivity rate is dynamically changing during the epidemics, either because of a mutation of the pathogen or in response to mitigation and containment measures. The sequential Bayesian framework naturally provides a quantification of the uncertainty in the estimate of the model parameters, including the reproduction number, and of the size of the different cohorts. Moreover, we introduce a dimensionless quantity, which is the equilibrium ratio between asymptomatic and symptomatic cohort sizes, and propose a simple formula to estimate the quantity. This ratio leads naturally to another dimensionless quantity that plays the role of the basic reproduction number $R_0$ of the model. When we apply the model and particle filter algorithm to COVID-19 infection data from several counties in Northeastern Ohio and Southeastern Michigan we found the proposed reproduction number $R_0$ to have a consistent dynamic behavior within both states, thus proving to be a reliable summary of the success of the mitigation measures. Since its emergence in Wuhan, China, at the end of 2019, the novel coronavirus SARS-CoV-2 has spread worldwide. In a little over four months it has evolved into a pandemic affecting nearly every country in the world in spite of the measures taken to control and contain the contagion. The novelty of the virus, which is related, but different from the coronaviruses responsible for the SARS and MERS epidemics in 2003 and 2009, poses a challenge to using mathematical models to predict the dynamics of the pandemic and the effects of changing mitigation strategies. Following the initial outbreak, number of contributors have either proposed mathematical models specifically designed to examine the spread of COVID-19 [1, 2] , or addressed important points related to the estimation of the key model parameters [3, 4] from partial, biased daily updated data [5, 6] and incomplete information about the pathogen responsible for the pandemic. Early virological assessment of SARS-CoV-2 [7] from a small sample of patients in Germany found active virus replication in the upper respiratory tract, as opposed to SARS where the predominant expression was in the lower respiratory tract. Patients were tested when symptoms were mild or in the prodromal phase. Pharyngeal viral shedding was very high during the first week of symptoms, peaking on the fourth day, with shedding continued after the symptoms subsided, well into the second week. Moreover, most of the patients seemed to have passed their shedding peak in the upper respiratory tract at the time of the first testing, suggesting efficient transmission of SARS-CoV2 through pharyngeal shedding when the symptoms are mild. In [8] peak shedding is suggested to occur around day 0, and 44% of the transmission is estimated to occur in the asymptomatic phase. These specific clinical features need an interpretation in the epidemic models. The spread and speed of the COVID-19 pandemic has triggered a burst of modeling activity in the effort to help predict the location and intensity of the next hotspots. One of the most used indicators of the potential for spread of an infectious agent is the basic reproduction number R 0 [9, 10, 11] , although questions have been raised as to how it should be computed [12, 13, 14] and interpreted [15, 16, 17] , and what can be inferred from it [18] . In general, R 0 > 1 signals epidemic potential, and the larger the R 0 , the faster the spread of the infectious agent. In [19] , where a SEIR model is calibrated and tested on the Wuhan outbreak data, the initial estimate of R 0 , whose definition and significance will be discussed later, was updated after the implementation of strict measures for control and prevention of the contagion. The model was then used to predict the magnitude of the pandemic by predicting the number of infected individuals on February 29, 2020 under the two different scenarios of an increasing R 0 , changing from 1.9 to 2.6 and ending at 3.1, corresponding to letting the pandemic follow its course, and that of a decreasing R 0 , going from 3.1 to 2.6, then 1.9, and eventually 0.9 and 0.5, following the mitigation measures taken in Wuhan. Of models particularly adapted to descibe COVID-19, we mention the SIDARTHE model [20] , consisting of eight separate compartments to address the different role in the spread of the pandemic of unreported infectious individuals, who presumably constitute the great majority. The model parameters, inferred from official data on the number of diagnosed and recovered cases, and fitted by recursive least squares methods, are adaptively changed during the simulation to reflect the introduction of progressively restrictive measures. More specifically, for the Wuhan outbreak the value of R 0 , set to 2.38 on day 1, is changed to 1.6 on day 5, increasing to 1.8 on day 12 and returning to 1.6 on day 22. After day 22, R 0 is reduced to 0.99, and from day 38 it is set to 0.85. The SIDARTHE model is then used to predict the course of the pandemic if the lockdown measures are either weakened or tightened. The importance of considering the contribution from undocumented infections is repeatedly addressed in the COVID-19 literature [21] . The population seroprevalence of antibodies in a random sample of 3300 people in Santa Clara County, California [22] early in the U.S. outbreak, would indicate the number of infections to be much higher than the number of reported cases, possibly by a factor of 50 to 80. This is in line with 10% seropositivity reported in the town of Robbio, Italy, and 14% in the town of Gangelt, Germany. Another important issue, raised in [23] , is the bias in the estimate of key epidemic parameters if the delay in case reporting is not taken into consideration. Documented and undocumented infections are accounted for in separate compartments in the SEIR network dynamic metapopulation model in [24] , which assumes that asymptomatic, or oligosymptomatic cases can expose a far larger proportion of the susceptible cohort to the virus than can reported cases, with asymptomatic cases less infectious than symptomatic ones. The model takes into account the travels of infectious individuals between cities by setting up a network comprised of 375 cities in China, with the amount of traffic between any pair of cities estimated from mobility data, and assigns fours separate cohort to each city. The estimator of the distribution of the six model parameters via a variant of ensemble Kalman filter suggests that in the first month of the Wuhan outbreak, only approximately one out of six infections was reported. The adoption of strong mitigation measures starting from January 23, 2020 is included in the model by first reducing the mobility coefficients to 2%, of the normal value, and by setting them to zero when all traffic stopped. The median latent period of COVID-19 in Wuhan according to this model is 3.7 days, and the median infectious period 3.48 days, with a median delay in case reporting of 10 days at first, and 6 days after the lockdown. The median estimate of the reproduction number varies from 2.38 in the early phase, to 1.66 and 0.99 after containment measures, when the fraction of undocumented infections decreases sharply. In [25] , the epidemic is assumed to go through three phases: in the first there is a slow accumulation of new infections, most of which are unreported. The second phase is characterized by a rapid growth of infections, diseases and deaths, while in the third phase the epidemic slows down due to the depletion of susceptible individuals. The SIR model used to describe the course of the COVID-19 pandemic, calibrated on the daily number of deaths in the United Kingdom and Italy and with R 0 = 2.75 and R 0 = 2.225, respectively, inferred from the literature, suggest that the epidemic already started one month before the first reported death. The most widely used tools in the United States for predicting the dynamics of the pandemic include the CHIME [26, 27] and the IHME forecasting model [28] . The former is an SIR/SEIR based differential-algebraic simulation tool with an accessible interface. The latter, in order to counteract the typical SIR overestimation of the proportion of infected and to focus on the most severely ill patients, uses empirically observed COVID-19 population mortality curves. The underlying statistical model assumes that the cumulative death rate at each location follows the Gaussian error function, and produces long and short term predictions. The goal of this article is to provide a relatively simple and flexible model equipped with a Bayesian state and parameter estimation protocol to dynamically process the COVID-19 data inflow to assess the current standing of the population and to make short term forecasts of the progression. All parameters and outputs of the algorithm are easily interpretable and adjustable. The underlying model is an adaptation of the classical SEIR model [29] , adjusted to better conform with certain specific features of the current COVID-19 epidemics. In particular, we reinterpret the cohort of exposed individuals, defining it as individuals who carry and shed the virus asymptomatically, presymptomatically, or oligosymptomatically, thus not being isolated or hospitalized. Moreover, this cohort does not contribute to the number of confirmed cases. We refer to this model as SE(A)IR. We propose a Bayesian particle filtering (PF) algorithm for estimating dynamically the state vector consisting of the sizes of the four cohorts in the model based on a Poisson distributed observation of the infected cohort size, with the dynamic model generating the mean of the distribution. Concomitantly, we estimate the presumably dynamically changing rate of transmission with posterior envelopes of model uncertainty. Being a fully Bayesian algorithm, the output consist of model uncertainties. Moreover, we show the forecasting of the expected number of new infections based on the model with predictive uncertainty envelopes. One key factor contributing to the challenge of making predictions and planning is the unknown number of individuals spreading the virus asymptomatically. The particle filter algorithm provides an estimate for the ratio of the asymptomatic and symptomatic virus carriers. A novel feature of this contribution is the derivation of a Riccati type equation for the ratio of the sizes of the two cohorts. Moreover, the Riccati equation has a short time approximate stable equilibrium. The equilibrium value, which can be analytically calculated from the model parameters, corresponds well to the model-based estimated ratio and can be used to define a dynamically changing effective basic reproduction number R 0 for the epidemic, facilitating the comparison of model predictions with other models. The methodology is extensively tested using COVID-19 data of 18 counties in Northeastern Ohio (Cleveland area) and 19 counties in Southeastern Michigan (Detroit area) during the period from early March 2020 to early May 2020, including the period when both states introduced similar, yet slightly different mitigation protocols. Compartment models in mathematical epidemiology partition a homogenous and well-mixed population into cohorts of individuals at different stages of the infection [?] . The popular SIR model, with separate compartments for susceptible (S), infected (I) and recovered (R), introduced nearly a century ago by Kermack and McKendrick [29] , introduced a population dynamics component into the previous, purely phenomenological statistical models (see, e.g. [30] ) that still seem to have a life of their own in modeling the COVID-19 epidemic [28] . A significant challenge for the control and containment of the COVID-19 epidemic is the spread of the infection by a large portion of asymptomatic or lightly symptomatic infectious individuals who are unaware of being vectors of the virus. This is especially problematic when, as is the case at the time of the writing of this article, due to limited availability, testing priority is given to symptomatic individuals or to vulnerable populations, thus the size of the asymptomatic cohort must be estimated indirectly. In the next subsection, we propose a compartment model that can be used to obtain such an estimate, and discuss its advantages and limitations. We begin by considering a modification of the classical SEIR model where the infected cohort I is subdivided into two groups according to the manifestation of symptoms, denoting by A the asymptomatic, infected, and infectious subcohort and by I the symptomatic infected one. Hence, while E and A are both asymptomatic and technically infected, the E cohort is not infectious as in the classical SEIR model, as opposed to A that sheds the virus. The compartment model, schematically represented by the branching flow diagram in the left panel of Figure 1 , is governed by the system of differential equations where the functional form of the fluxes is will be specified below. The COVID-19 data, consisting of the daily count of newly reported infection, corresponds to observations of the flux ϕ 2 : In the absence of additional populationlevel inputs, this is the data that must be used to estimate the cohort sizes and model parameters. In particular, if no data concerning the asymptomatic cohort dynamics are available, the values of the fluxes ϕ 3 and ϕ 3 can be set rather arbitrarily, since they are minimally connected with the fluxes in the lower branch of the flow diagram, whose values are part of the observations. To estimate for the size of the asymptomatic cohort in the absence of additional information, it is necessary to modify the model. Below we propose a modified version of the model that is suitable for estimating the size of the asymptomatic cohort, while retaining many of the salient features of the extended model. The modification that we propose applies an approach similar to that used in metabolic network model reduction [?] , where lumping enzymatic reactions whose parameters cannot be estimated from the data is fairly common. In our reduced model, the fictitious compartment E(A) embeds the asymptomatic cohort into the exposed one, as After a non-infectious incubation period, the exposed individuals branch either to symptotmatic (I) or asymptomatic (A) infectious compartments, with unknown frequencies and unknown reasons. In the modified SE(A)IR model (right), the two compartments E and A on the blue background are lumped together to form the fictitious E(A) compartment representing the exposed, asymptomatic cohorts. The two fluxes ϕ 3 . In this manner, the flux ϕ 3 , that describes a variation internal to the lumped compartment, is no longer part of the model governing equations. Assuming, for simplicity, that both asymptomatic and symptomatic individuals move to the recovered compartment with the same relative rate, denoted by γ, it follows that The flux ϕ 2 represents the rate at which part of the exposed individuals develop symptoms, and for simplicity, we assume that the flux is proportional to the size of the compounded cohort, that is, One of the most relevant parameters in an epidemic is the rate at which susceptible individual are infected. In the classical SEIR model, the transmission flux assumes the form where β is the transmission rate and N is the population size. However, one of the problems of COVID-19 is the significant amount of virus transmission by the asymptomatic cohort within E(A). Our model assumes that in the current COVID-19 epidemics, most infected individuals who have developed symptoms, are either hospitalized or in self-isolation, thus playing a limited role in the exposure of susceptible individuals to the contagion. Acknowledging the differing roles that symptomatic and asymptomatic virus shedders play in transmission to susceptibles, we define where p and q are frequencies, 0 ≤ p, q ≤ 1. The frequency p may be interpreted as the fraction of the compound cohort E(A) who are infectious. Classical SEIR models, where usually it is assumed that the exposed cohort is not 5 Figure 1 : Left: compartment diagram of the model including both symptomatic and asymptomatic infected cohorts. After a non-infectious incubation period, the exposed individuals branch either to symptotmatic (I) or asymptomatic (A) infectious compartments, with unknown frequencies and unknown reasons. In the modified SE(A)IR model (right), the two compartments E and A on the blue background are lumped together to form the fictitious E(A) compartment representing the exposed, asymptomatic cohorts. The two fluxes ϕ 3 and ϕ 3 are merged to a single flux ϕ 3 . depicted in Figure1, (right), thus the size of the new cohort is 3 . In this manner, the flux ϕ 3 , that describes a variation internal to the lumped compartment, is no longer part of the model governing equations. Assuming, for simplicity, that both asymptomatic and symptomatic individuals move to the recovered compartment with the same relative rate, denoted by γ, it follows that The flux ϕ 2 represents the rate at which part of the exposed individuals develop symptoms, and for simplicity, we assume that the flux is proportional to the size of the compounded cohort, that is, One of the most relevant parameters in an epidemic is the rate at which susceptible individual are infected. In the classical SEIR model, the transmission flux assumes the form where β is the transmission rate and N is the population size. However, one of the problems of COVID-19 is the significant amount of virus transmission by the asymptomatic cohort within E(A). Our model assumes that in the current COVID-19 epidemics, most infected individuals who have developed symptoms, are either hospitalized or in self-isolation, thus playing a limited role in the exposure of susceptible individuals to the contagion. Acknowledging the differing roles that symptomatic and asymptomatic virus shedders play in transmission to susceptibles, we define where p and q are frequencies, 0 ≤ p, q ≤ 1. The frequency p may be interpreted as the fraction of the compound cohort E(A) who are infectious. Classical SEIR models, where usually it is assumed that the exposed cohort is not infective, hence only the infected I cohort is responsible for the spreading of the epidemic, can be accounted for by setting p = 0, q = 1. In our model COVID-19 dynamics, we assume that p > q. The transmission rate β, which is the quantity of primary interest when it comes to assessing how fast the epidemics spread, integrates elements related to the characteristics of the pathogen determining the probability of infection of a given susceptible contact, as well as factors related to social behavior, including the number and nature of daily contacts. In summary, the goverining equations of the proposed COVID-19 model are where β is the transmission rate, γ is the recovery rate, η is the symptomatic rate integrating in it the incubation process E → I, and µ is the death rate. The data that is used to inform our parametrized model, as well as to update the state estimation of the cohorts over time is the number of confirmed, symptomatic new daily cases, or, equivalently, daily observations of the flux or, in fact, of a stochastic integer-valued realization of it, as explained in more detail in the next section. Since, as already pointed out, the rate of transmissibility integrates not only properties of the pathogen, but also factors related to human behavior that change in the course of an epidemic, in general β is not a static parameter. In the filtering approach used to estimate the size of the cohorts in the course of the epidemic, to be explained below, β is modeled as a time dependent parameter. In the absence of a known deterministic dynamical evolution model for β, its proposed changes will be described in terms of a stochastic geometric random walk. Before describing the computational methodology on which our model predictions are based, some qualitative comments about the model are in order. In classical SEIR models, the inclusion of the exposed cohort E adds a time delay corresponding to the incubation period not accounted for in SIR models. While our expression for ϕ 1 implicitly assumes that the infected and asymptomatic individuals are immediately infectious, the flux ϕ 2 introduces a slight delay in the I cohort if, for instance, the transmission rate is changed. Moreover, if p = 0, we may write and by absorbing p into the transmission rate β, the model can be written in terms of the ratio q/p determining the size of the relative effect of infected symptomatic individuals on the spread of the infection. Hence, without a loss of generality, we may set p = 1, and assume q < 1. The value q = 0.1 used in our numerical tests is a rough estimate, and the effect of that parameter on the estimated quantities is briefly discussed in the results. Finally, we point out that the simplifying assumption that the symptomatic and asymptomatic individuals recover at the same relative rate is not essential for the methodology developed below, and can be easily removed. In the discussion to follow, we simplify the notation and write E instead of E(A) for the exposed/asymptomatic cohort size. Bayesian filtering algorithms update the information about the state vector and parameters in a sequential manner as new data arrives. In the following, we use the convention of denoting by upper case letters the random variables and by lower case letter their realizations. We refer to [31, 32] for particle filtering, and to [33, 34, 35] for the particular type of applications to ODE systems. Let {X t } denote a discrete time Markov process, with the transition probability distribution π X t+1 |Xt (x t+1 | x t ). Furthermore, let {B t } denote the stochastic process representing the observations, and let π Bt| (b t | x t ) denote the likelihood density, where we implicitly assume that the current observation B t depends only on the current state X t and not on the past. Finally, let B t denote the cumulative data up to time t, that is, We denote by B t the set of observed realizations, B t = {b 1 , b 2 , . . . , b t }. In Bayesian filtering algorithms the update of the posterior distributions is carried out in two consecutive steps, where the first step is referred to as propagation step, and the second as correction, or analysis step. The first step is accomplished through the Chapman-Kolmogorov formula, while the analysis step builds on Bayes' formula, that is, the predicted distribution of X t+1 acts as the prior as the next observation arrives. In the following, we specify the transition probability kernel and the likelihood in our model, and describe the computational steps for the numerical implementation. We then extend the discussion to include the estimation of static parameters. Consider our modified SE(A)IR model introduced in the previous section, and define the state vector at time t Since we assume that the infectivity parameter β may vary over time, we denote its value at time t by β t . Formally, let ψ be the numerical propagator advancing the state variable and the infectivity parameter from one day to the next, In our computations, the time integration performed by means of a standard ODE solver such as Runge-Kutta, and during the one day propagation step, the infectivity β t is kept constant. Formally, we write a propagation model of the form and we account for uncertainties both in the state vector z t and the parameter vector β t by introducing an innovation term. We guarantee that all components of the state vector and the parameter β are nonnegative, with a multiplicative innovation, assuming a geometric random walk model, where C ∈ R 4 is a diagonal positive definite matrix. In the definition of the likelihood, we assume that the data consist of realizations b t of the daily new infections, denoted by B t . The new infection count is assumed to be Poisson distributed, with the Poisson parameter equal to the flux ϕ 2 (t), Therefore, the likelihood of the observed number b t of new infections is To initialize the process, we need to define the initial state at the time t = 0 before the first observed infections. Since the initial state is unknown, its initial value becomes part of the estimation problem. We postulate that before the first infection is observed, there is an unknown number of asymptomatic individuals in the community. We assume that the initial number of asymptomatic cases follows a Poisson distribution with a uniformly distributed expected value Λ ∼ Uniform([0, λ max ]). Moreover, we assume that β 0 follows a uniform distribution over some interval [β min , β max ]. We are now ready to outline the particle filter (PF) algorithm for estimating the state vector X t based on the infection count. Assume that at time t a sample from the distribution π Xt|Bt {x 1 t , x 2 t , . . . , x N t }, is available, and each sample point is associated its corresponding weight w j t . We write a particle approximation of the Chapman-Kolmogorov formula and combine it with Bayes' formula to obtain the updating formula Let x j t+1 denote a propagated predictor particle associated with x j t , that is, . At the arrival of the next observation b t+1 , the likelihood π B t+1 |X t+1 (b t+1 | x j t+1 ) expresses how well the predictor is at explaining the data. Writing the above formula as , we can interpret the formula as follows: Given a predictor particle x j t+1 , the expression (1), combining the importance of its predecessor in the weight and its fitness in the likelihood, evaluates the relevance of the particle; (2) weighs the importance of the new particle relative to the predictor, and the transition kernel (3) generates a new particle from the predictor by the formula (7). This hierarchical organization is the backbone of the particle filter algorithm. Initialize: Draw N independent realizations of β 0 ∼ Uniform([β min , β max ]) and Λ ∼ Uniform([0, λ max ]), and generate N realizations of the initial state of Z 0 , then define the initial cloud of the particles While t < t max do (a) Propagate the particles according to (6) to generate the predictive particle cloud (b) Extract the second component e j t+1 of each of the propagated particles, and compute the weights, (c) Sample with replacement N indices j ∈ {1, 2, . . . , N }, j = 1, 2, . . . , N , with the probability weights g j t+1 . Define the new resampled predictive cloud Generate a new particle cloud through the innovation process, log x j t+1 = log x j t+1 + C 1/2 w j , w j ∼ N (0, I 5 ). (d) Extract the second component e j t+1 from each new particle, and compute the new likelihood weights, Update the weights, . Advance t → t + 1. In the particle filter algorithm, the model parameters γ, η and µ are fixed. It is possible to modify the algorithm to estimate these parameters also. In that case, to estimate the death rate µ, the information about the deceased must be included in the data, as the new infection count is essentially insensitive to that parameter. The parameters γ and η are less time-dependent than the infectivity β, and we set their values according to what is suggested in the literature. A challenges in forecasting the spread of the COVID-19 epidemic is the presumably large portion of population that is asymptomatic or shows only light symptoms, while shedding the virus, which in our model is part of the cohort E. The ratio between the numbers of symptomatic and asymptomatic individuals, can be estimates from the output of the particle filter. Differentiating ρ with respect to time, expressing the derivatives of E and I in terms of the governing equations (3)-(4), and simplifying, we find that ρ satisfies the Riccati equation If we assume that the frequency of susceptible cohort ν S = S/N is approximately constant, we find an equilibrium state ρ * of the ratio, which is a stable equilibrium. In particular, when the infection has not spread yet to a significant portion of the population, substituting ν S ≈ 1 yields a potentially useful estimate for the prevalence of the asymptomatic infection in the population. In the computed results, a comparison of the values of ratio ρ computed from the state vectors and the equilibrium estimate, can be used to asses whether the infection pattern is settling to or near the equilibrium state. The equilibrium condition provides a natural way of defining a basic reproduction number R 0 for the model. Assuming equilibrium, we have Thus, when R 0 = 1 the infection stops growing under the assumption of a stable ratio of asymptomatic to symptomatic infections. This interpretation of the reproduction number is in close agreement with the R 0 for SIR models. While not perfect, this number summarizes the phase of the epidemic, as our computed examples will demonstrate. To test our model, we consider the COVID-19 data of new infection counts in different counties of Northeastern Ohio and Southeastern Michigan. The counties represent different population densities and demographics, e.g., Cuyahoga (OH) and Wayne (MI) include dense urban areas (Cleveland and Detroit), Summit (OH) and Genesee (MI) represent mid-size urban centers (Akron and Flint), Lake (OH) and Oakland (MI) represent areas near urban centers, while Holmes (OH) and Sanilac (MI) host rural communities. The data consist of the confirmed daily cases made available by USAfacts [6] . The values of the model parameters and of the prior densities, as described in the algorithm in Section 3, are listed in Table 1 . The innovation covariance matrix C is a diagonal matrix of the form where σ 2 1 and σ 2 2 are the variances of the uncertainty in the logarithm of the state vector Z t and in the logarithm of the transmission rate β t ,respectively. Observe that for the susceptible population, the variance is weighted with the population squared to keep the innovation for this cohort from becoming excessive. We have and without the scaling by N , the innovation in a large population could be significantly large. Numerical tests indicate that with a too large innovation, the algorithm may go astray. The panels in the left column of Figures 2-3 display the raw data and the dynamically estimated expected values of the new infections computed with the particle filter. In the estimation process, we averaged the data over a moving window of three days to attenuate the effects of, e.g., reporting lags, and differences between weekends and weekdays. For each county we show the 50% and 75% credibility intervals and the median defined by the particles. More precisely, for each parameter/state vector sample X j t , 1 ≤ j ≤ N , we calculate the corresponding fluxes ϕ j 2 (t) = ηE j t , and evaluate the interval obtained by discarding the 1/2 × (1 − p/100) × N smallest and largest values. The red curve is the median value of the fluxes. The middle column in Figures 2-3 shows the 50% and 75% posterior belief envelope of the transmission rate parameter for the respective counties. Finally, the right column of Figure 2-3 shows the 50% and 75% posterior envelopes estimated ratio between the cohorts E and I. In the same figure, the dashed curve represents the equilibrium value ρ * of the ratio based on the Particle filter paremeters a priori lower bound transmission rate [1/days] β min 0.1 a priori upper bound for transmission rate β max 0.5 a priori upper bound of initial infectious individuals λ max 5 number of particles N 20 000 standard deviation of innovation of the state σ 1 0.01 standard deviation of innovation of the transmission rate σ 2 0.1 recovery rate γ 1/21 incubation rate η 1/7 death rate µ 0.004 (8) with β(t) equal to the posterior mean, Towards the end of the observation period the equilibrium value is in good agreement with the sample-based estimate. The numerical results indicate that the asymptotic value ρ * is a good proxy of the ratio E/I, in particular once the transmission rate β has stabilized, therefore providing a quick way to assess the size of the asymptomatic cohort from the size of infected cohort. To understand better the dependency of the ratio ρ * on the model parameters, we introduce two dimensionless quantities characterizing the system of differential equations, Neglecting the effect of the death rate µ, and assuming that the infection is not yet widespread, so that S/N ≈ 1, we find an approximate formula for ρ * of (8) in terms of the dimensionless quantities t and s, Figure 5 shows the equilibrium value as a function of the dimensionless parameters for three different choices of the parameter: q = 0.1 (left), q = 0.5 (center) and q = 1 (right). In the four Ohio counties, the equilibrium value is consistently near ρ * ≈ 0.5, while slightly below it in the Michigan counties. Observe that the value ρ * = 0.5 would correspond to effective basic reproduction number indicating that the disease is in a slow progression phase. While this definition of R 0 in terms of ρ * implicitly assumes equilibrium, it is possible to compute the time course of R 0 for the period when the system has not reached the equilibrium, by using formula (9) and the estimated transmission rate β. Figure 4 shows the posterior belief envelopes for R 0 for six counties, three in Ohio, and three in Michigan, calculated from the posterior sample for β at each time. We did not show the results for Holmes County (OH) and Sanilac County (MI), whose R 0 was consistently very low. The time courses of R 0 show similar patterns in all six counties, and while in Michigan the peak values were higher than in Ohio, the values towards the end of the observation period are lower, varying around the critical values R 0 = 1. The particle solutions to the estimation problem provide a natural tool for predicting new infections. For each particle, the last realization x j t provides an initial value for the differential equations and an estimated value for the transmission rate, thus we may use the system of ODEs to propagate the particle data forward in time. From the ensemble of the predictions we can then extract the forecasting belief envelopes, similarly as we did for the nowcasting problem. Figures 6-7 show the predictions for the following 20 days for the four sample counties under the assumption that the mitigation conditions remain unchanged. In addition, we have run two alternative scenarios: In the first one, the last estimate of the transmission rate for each particle is multiplied by a factor κ = 0.8, simulating a tightening of the containment and control measures, while in the second scenario, the transmission rate is increased by multiplicative factor κ = 1.2, simulating a relaxation of the control measures. We observe that the predictions illustrate well the what the lower R 0 in Michigan compared to Ohio means in terms of expected number of new infections. To further test the predictive skills of the model, we leave out from the current data set {b 1 , b 2 , . . . , b T } the last t pred data points, considering only the data up to T = T − t pred , and compute a sample of predicted new cases, defining the predicted particle sample that is, we propagate the particles computed at T forward, and generate artificial observations. A comparison of the resulting predictive envelopes and the actual data is indicative of the predictive skill of the model: large deviations indicate that the model may not have taken some important factors into account. Figure 8 shows the 10 day predictions for three counties in Ohio and three counties in Michigan, together with the actual data. The plots indicate that, in general, the model captures the trend of the data relatively well, with the exception of the Saginaw County (MI), where the predicted trend seems higher than the actual observation. Finally, we consider the sensitivity of the results to the choice of some of the model parameters that assumed to be known, and in particular, the time constants T inc = 1/η (incubation) and T rec = 1/γ (recovery). To test the sensitivity, we select one of the counties, Cuyahoga (OH) and run the program with different parameter combinations. Figure 9 -12 show a selection of results. The results show that both the estimated transmission rate and the ratio ρ = E/I are rather robust for variations of the parameter, the former decreasing slightly with increasing T rec . Interestingly, towards the end of the observation interval, the ratio ρ is in all cases close to the equilibrium value ρ * ≈ 0.5. Therefore, the value of R 0 is approximately However, this approximation gives an idea of the spreading of the transmission only if the ratio E/I is near the equilibrium. The plots show that in particular at the beginning of the observation period the ratio is far from the equilibrium, and R 0 would predict a too slow growth rate for the transmission dynamics. The proposed dynamical Bayesian filtering algorithm based on particle filters is shown to be able to produce a robust and consistent estimate of the state vectors consisting of the sizes of the four cohorts of a new SE(A)IR model for COVID-19 spread, as well as of the transmission rate, a key parameter that is not expected to be a constant. In epidemiology models, the transmission rate β is usually defined as β = (transmission probability per contact )×(number of contacts per day), where the first factor is related to the characteristics of the pathogen, while the second factor is related to the behavior of the individuals and can change in connection with hygiene, social distancing and other mitigation measures. In the Ohio and Michigan counties, we observe a rapid growth of β at the beginning of the outbreak, followed by a clear and consistent drop. The initial increase may be due, to some extent, to the scarcity of the data in the beginning of the epidemics, and reflect the learning effort of the algorithm. The drop that follows, however, correlates well with the stay-at-home order and other social distancing measures introduced early on in both states, from March 16 to March 23 in Michigan, and from March 15 to March 19 in Ohio. In both states and across all counties considered here, the transmission rate is fairly stable, except for Wayne County, Michigan, where β is slightly higher early in the epidemic. This may be the effect of a wider use of public transportation, socio-economic conditions and demographic factors. If the reason for the drop is the adoption of mitigation measures, it is natural to wonder why its onset is not the same in all counties. A possible explanation of the delay may lie in the fact that the infection arrived in different communities through the mobility of infectious individuals, and what happens earlier in large, densely populated areas affects the surrounding communities with a delay. The effects of the network structure on the timing and intensity of the COVID-19 spread will be addressed separately. This SE(A)IR model we present directly addresses the role of asymptomatic individuals shedding virus, many of whom recover before developing symptoms, in transmission dynamics. We demonstrate a method for dynamically estimating the ratio of asymptomatic/presymptomatic/oligosymptomatic individuals in the E compartment to symptomatic individuals in the I compartment. This ratio, whose equilibrium value consistently settles to a value less than one across all counties examined and with a wide range of model parameters, is considerably lower than the 50-85 ratio speculated in [22] based on a serosurvey conducted in California on April 3-4, 2020. This disparity may stem partly from the growing availability of testing throughout April as well as the cross-reactivity in the ELISA tests used as part of the serosurvey. Understanding better this discrepancy will be a topic of future studies. However, we emphasize that while our methodology predicts that the asymptomatic cohort is typically smaller than the symptomatic one, according to the model, it is still responsible for the majority of transmissions. Furthermore, as already pointed out, the current SE(A)IR model does not address some of the features of COVID-19 transmission, including the latent period of the asymptomatic cohort. While designing a model that properly addresses the delay of the infectious phase of the asymptomatic patients is straightforward, retaining a structure that allows us to inform about the size of asymptomatic cohort based on data on symptomatic patients alone remains a challenge. Elaborating on that point will be a future direction of this research. The model-based predictions indicate that the current value of β alone is not enough to project how the infection process will proceed. In fact, while the value β at the end of the considered time interval is nearly the same for all counties, the predictions on the new infections differ significantly. The main reason for this discrepancy is the different size of the cohorts. The numbers infected tend to increase strongly even if β is relatively small if the pipeline of exposed/asymptomatic individuals is crowded. Because an unobservable concentration of individuals in the exposed/asymptomatic category may represent an uptick in future cases, only several days (preferably two weeks) of consecutively falling case numbers likely represents systematic decline in population-level disease activity. The uncertainty associated with undulating case counts is reflected in the wide predictive envelopes seen in projections for several of the Ohio counties. It is worth noting 10 day projections shown in Figure 8 tend to overestimate observed incidence in the last few days. We suspect this is attributable in part to reporting delays. In this paper we introduce a new way of estimating the reproduction number R 0 for the proposed model, based on an equilibrium condition of a related non-linear Riccati type equation. The equilibrium condition, in turn, was shown to correspond well to the estimated ratio of the symptomatic and non-symptomatic cases after the transmission rate had stabilized, presumably due to the social distancing measures. Interestingly, the equilibrium value ρ * and the corresponding R 0 are very consistent for a number of counties in two states, Ohio and Michigan, that adopted similar distancing measures around the same time, regardless of the fact that Michigan had a significantly higher number of infection than Ohio at the time of the adoption. These observations give an indication that the quantities reflect well the effectiveness of the mitigation measures, providing a useful and quick indicator for assessing such measures. In the left column, the propagation for each particle is continued by using the last state vector as initial data, and the last estimated β value as transmission rate. In the middle column, the transmission rates are slightly lowered by a factor of 0.8, while in the right column, the rates are increased by a factor of 1.2. The predictive envelopes correspond to 50% and 75% levels of belief. The predictions are based on the data excluding the last ten observations. The two different shades correspond to 50% and 75% uncertainty of predicting new observations. The actual data are shown as stem plots. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The lancet infectious diseases First-wave COVID-19 transmissibility and severity in China outside hubei after control measures, and second-wave scenario planning: a modelling impact assessment. The Lancet Athanasios Tsakris, and Constantinos Siettos. Data-based analysis, modelling and forecasting of the COVID-19 outbreak Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak Coronavirus 2019-nCoV global cases Coronavirus locations: COVID-19 map by county and state Katrin Zwirglmaier, Christian Drosten, and Clemens. Virological assessment of hospitalized patients with COVID-2019 Temporal dynamics in viral shedding and transmissibility of COVID-19 The analysis of equilibrium in malaria The epidemiology and control of malaria. The Epidemiology and Control of Malaria A brief history of R 0 and a recipe for its calculation On the definition and computation of the basic reproduction ratio R 0 in models for infectious diseases in heterogeneous populations The construction of next-generation matrices for compartmental epidemic models The estimation of the basic reproduction number for infectious diseases. Statistical methods in medical research Complexity of the basic reproduction number (R 0 ) Pespectives on the basic reproductive ratio The failure of r 0 . Computational and mathematical methods in medicine Unraveling R 0 : Considerations for public health applications Phaseadjusted estimation of the number of coronavirus disease Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study Covid-19 antibody seroprevalence Effect of changing case definitions for COVID-19 on the epidemic curve and transmission parameters in mainland China: a modelling study Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2) Fundamental principles of epidemic spread highlight the immediate need for largescale serological surveys to assess the stage of the SARS-CoV-2 epidemic. medRxiv Simcovid: An open-source simulation program for the covid-19 outbreak. medRxiv Locally informed simulation to predict hospital capacity needs during the covid-19 pandemic IHME COVID-19 health service utilization forecasting team and Christopher JL Murray. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months A contribution to the mathematical theory of epidemic Causes of death in England and Wales. Second Annual Report of the Registrar General of Births, Deaths and Marriages in England Combined parameter and state estimation in simulation-based filtering Statistical and computational inverse problems Parameter estimation for stiff deterministic dynamical systems via ensemble Kalman filter Linear multistep methods, particle filtering and sequential Monte Carlo Astrocytic tracer dynamics estimated from [1-11C]-acetate PET measurements The work of DC was partly supported by NSF grants DMS-1522334 and DMS-1951446, and the work of ES was partly supported by NSF grant DMS-1714617.