key: cord-326314-9ycht8gi authors: Armstrong, Eve; Runge, Manuela; Gerardin, Jaline title: Identifying the measurements required to estimate rates of COVID-19 transmission, infection, and detection, using variational data assimilation date: 2020-11-02 journal: Infect Dis Model DOI: 10.1016/j.idm.2020.10.010 sha: doc_id: 326314 cord_uid: 9ycht8gi We demonstrate the ability of statistical data assimilation (SDA) to identify the measurements required for accurate state and parameter estimation in an epidemiological model for the novel coronavirus disease COVID-19. Our context is an effort to inform policy regarding social behavior, to mitigate strain on hospital capacity. The model unknowns are taken to be: the time-varying transmission rate, the fraction of exposed cases that require hospitalization, and the time-varying detection probabilities of new asymptomatic and symptomatic cases. In simulations, we obtain estimates of undetected (that is, unmeasured) infectious populations, by measuring the detected cases together with the recovered and dead - and without assumed knowledge of the detection rates. Given a noiseless measurement of the recovered population, excellent estimates of all quantities are obtained using a temporal baseline of 101 days, with the exception of the time-varying transmission rate at times prior to the implementation of social distancing. With low noise added to the recovered population, accurate state estimates require a lengthening of the temporal baseline of measurements. Estimates of all parameters are sensitive to the contamination, highlighting the need for accurate and uniform methods of reporting. The aim of this paper is to exemplify the power of SDA to determine what properties of measurements will yield estimates of unknown parameters to a desired precision, in a model with the complexity required to capture important features of the COVID-19 pandemic. The coronavirus disease 2019 (COVID- 19) is burdening health care systems worldwide, threatening physical and psychological health, and devastating the global economy. Individual countries and states are tasked demonstrate the power and versatility of the SDA technique to explore what is required of measurements to complete a model with a dimension sufficiently high to capture the policy-relevant complexities of COVID-19 transmission and containment -an examination that has not previously been done. To this end, we sought to estimate the parameters noted above, using simulated data representing a metropolitan-area population loosely based on New York City. We examined the sensitivity of estimations to: i) the subpopulations that were sampled, ii) the temporal baseline of sampling, and iii) uncertainty in the sampling. Results using simulated data are threefold. First, reasonable estimations of time-varying detection probabilities require the reporting of new detected cases (asymptomatic and symptomatic), dead, and recovered. Second, given noiseless measurements, a temporal baseline of 101 days is sufficient for the SDA procedure to capture the general trends in the evolution of the model populations, the detection probabilities, and the time-varying transmission rate following the implementation of social distancing. Importantly, the information contained in the measured detected populations propagates successfully to the estimation of the numbers of severe undetected cases. Third, the state evolution -and importantly the populations requiring inpatient care -tolerates low (∼ five percent) noise, given a doubling of the temporal baseline of measurements; the parameter estimates do not tolerate this contamination. Finally, we discuss necessary modifications prior to testing with real data, including lowering the sensitivity of parameter estimates to noise in data. The model is written in 22 state variables, each representing a subpopulation of people; the total population is conserved. Figure 1 shows a schematic of the model structure. Each member of a Population S that becomes Exposed (E) ultimately reaches either the Recovered (R) or Dead (D) state. Absent additive noise, the model is deterministic. Five variables correspond to measured quantities in the inference experiments. As noted, the model is written with the aim to inform policy on social behavior and contact tracing so as to avoid exceeding hospital capacity. To this end, the model resolves asymptomatic-versus-symptomatic cases, undetected-versus-detected cases, and the two tiers of hospital needs: the general (inpatient, nonintensive care unit (ICU)) H versus the critical care (ICU) C populations. The resolution of asymptomatic versus symptomatic cases was motivated by an interest in what interventions are necessary to control the epidemic. For example, is it sufficient to focus only on symptomatic individuals, or must we also target and address asymptomatic individuals who may not even realize they are infected? The detected and undetected populations exist for two reasons. First, we seek to account for underreporting of cases and deaths. Second, we desire a model structure that can simulate the impact of increasing detection rates on disease transmission, including the impact of contact tracing. Thus the model was 3 J o u r n a l P r e -p r o o f structured from the beginning so that we might examine the effects of interventions that were imposed later on. The ultimate aim here is to inform policy on the requirements for containing the epidemic. We included both H and C populations because hospital inpatient and ICU bed capacities are the key health system metrics that we aim to avoid straining. Any policy that we consider must include predictions on inpatient and ICU bed needs. Preparing for those needs is a key response if or when the epidemic grows uncontrolled. For details of the model, including the differential equations describing the mass action between susceptible and infectious individuals and the disease progression through different sub-populations, see Appendix A. SDA is an inference procedure, or a type of machine learning, in which a model dynamical system is assumed to underlie any measured quantities. This model F can be written as a set of D ordinary differential equations that evolve in some parameterization t as: where the components x a of the vector x are the model state variables, and unknown parameters to be estimated are contained in p(t). A subset L of the D state variables is associated with measured quantities. One seeks to estimate the p unknown parameters and the evolution of all state variables that is consistent with the L measurements. A prerequisite for estimation using real data is the design of simulated experiments, wherein the true values of parameters are known. In addition to providing a consistency check, simulated experiments offer the opportunity to ascertain which and how few experimental measurements, in principle, are necessary and sufficient to complete a model. method, and we employ a method of annealing to identify a lowest minumum -a procedure that has been referred to loosely in the literature as variational annealing (VA). The cost function A 0 used in this paper is written as: One seeks the path X 0 = x(0), ..., x(N), p(0), ...p(N) in state space on which A 0 attains a minimum value 1 . Note that Equation 1 is shorthand; for the full form, see Appendix A of Ref [18] . For a derivation -beginning with the physical Action of a particle in state space -see Ref [26] . The first squared term of Equation 1 governs the transfer of information from measurements y l to model states x l . The summation on j runs over all discretized timepoints J at which measurements are made, which may be a subset of all integrated model timepoints. The summation on l is taken over all L measured quantities. The second squared term of Equation 1 incorporates the model evolution of all D state variables x a . The term f a (x(n)) is defined, for discretization, as: 1 2 [F a (x(n)) + F a (x(n + 1))]. The outer sum on n is taken over all discretized timepoints of the model equations of motion. The sum on a is taken over all D state variables. R m and R f are inverse covariance matrices for the measurement and model errors, respectively. We take each matrix to be diagonal and treat them as relative weighting terms, whose utility will be described below in this Section. The procedure searches a (D (N + 1) + p (N + 1))-dimensional state space, where D is the number of state variables, N is the number of discretized steps, and p is the number of unknown parameters. To perform simulated experiments, the equations of motion are integrated forward to yield simulated data, and the VA procedure is challenged to infer the parameters and the evolution of all state variablesmeasured and unmeasured -that generated the simulated data. This specific formulation has been tested with chaotic models [27] [28] [29] [30] , and used to estimate parameters in models of biological neurons [13, 14, 16, 18, 31, 32] , as well as astrophysical scenarios [33] . Our model is nonlinear, and thus the cost function surface is non-convex. For this reason, we iterate -or anneal -in terms of the ratio of model and measurement error, with the aim to gradually freeze out a 1 It may interest the reader that one can derive this cost function by considering the classical physical Action on a path in a state space, where the path of lowest Action corresponds to the correct solution [26] 5 J o u r n a l P r e -p r o o f lowest minimum. This procedure was introduced in Ref [34] , and has since been used in combination with variational optimization on nonlinear models in Refs [11, 18, 31, 33] above. The annealing works as follows. We first define the coefficient of measurement error R m to be 1.0, and write the coefficient of model error R f as: R f = R f ,0 α β , where R f ,0 is a small number near zero, α is a small number greater than 1.0, and β is initialized at zero. Parameter β is our annealing parameter. For the case in which β = 0, relatively free from model constraints the cost function surface is smooth and there exists one minimum of the variational problem that is consistent with the measurements. We obtain an estimate of that minimum. Then we increase the weight of the model term slightly, via an integer increment in β, and recalculate the cost. We do this recursively, toward the deterministic limit of R f R m . The aim is to remain sufficiently near to the lowest minimum to not become trapped in a local minimum as the surface becomes resolved. We will show in Results that a plot of the cost as a function of β reveals whether a solution has been found that is consistent with both measurements and model. We based our simulated locality loosely on New York City, with a population of 9 million. For simplicity, we assume a closed population. Simulations ran from an initial time t 0 of four days prior to 2020 March 1, the date of the first reported COVID-19 case in New York City [35] . At time t 0 , there existed one detected symptomatic case within the population of 9 million. In addition to that one initial detected case, we took as our initial conditions on the populations to be: 50 undetected asymptomatics, 10 undetected mild symptomatics, and one undetected severe symptomatic. 2 We chose five quantities as unknown parameters to be estimated (Table 1) : 1) the time-varying transmission rate K i (t); 2) the detection probability of mild symptomatic cases d Sym (t), 3) the detection probability of severe symptomatic cases d Sys (t), 4) the fraction of cases that become symptomatic f sympt , and 5) the fraction of symptomatic cases that become severe enough to require hospitalization f severe . Here we summarize the key features that we sought to capture in modeling these parameters; for their mathematical formulatons, see Appendix B. The transmission rate K i (often referred to as the effective contact rate) in a given population for a given infectious disease is measured in effective contacts per unit time. This may be expressed as the total contact rate multiplied by the risk of infection, given contact between an infectious and a susceptible individual. The contact rate, in turn, can be impacted by amendments to social behavior 3 . As a first step in applying SDA to a high-dimensional epidemiological model, we chose to condense the significance of K i into a relatively simple mathematical form. We assumed that K i was constant prior to the implementation of a social-distancing mandate, which then effected a rapid transition of K i to a lower constant value. Specifically, we modeled K i as a smooth approximation to a Heaviside function that begins its decline on March 22, the date that the stay-at-home order took effect in New York City [39] : 25 days after time t 0 . For further simplicity, we took K i to reflect a single implementation of a social distancing protocol, and adherence to that protocol throughout the remaining temporal baseline of estimation. Detection rates impact the sizes of the subpopulations entering hospitals, and their values are highly uncertain [3, 4] . Thus we took these quantities to be unknown, and -as detection methods will evolve -time-varying. We also optimistically assumed that the methods will improve, and thus we described them as increasing functions of time. We used smoothly-varying forms, the first linear and the second quadratic, to preclude symmetries in the model equations. Meanwhile, we took the detection probability for asymptomatic cases (d As ) to be known and zero, a reasonable reflection of the state of testing in that population during our study period. Finally, we assigned as unknowns the fraction of cases that become symptomatic ( f sympt ) and fraction of symptomatic cases that become sufficiently severe to require hospitalization ( f severe ), as these fractions possess high uncertainties (Refs [40] and [41] , respectively). As they reflect an intrinsic property of the disease, we took them to be constants. All other model parameters were taken to be known and constant (Appendix A); however, the values of many other model parameters also possess significant uncertainties given the reported data, including, for example, the fraction of those hospitalized that require ICU care. Future VA experiments can treat these quantities as unknowns as well. The simulated experiments are summarized in the schematic of The base experiment (i in Figure 2 ) possesses the following features: a) five measured populations: detected asymptomatic As det , detected mild symptomatic Sym det , detected severe symptomatic Sys det , Recovered R, and Dead D; b) a temporal baseline of 101 days, beginning on 2020 February 26; c) no noise in measurements. 3 The reproduction number R 0 , in the simplest SIR form, can be written as the effective contact rate divided by the recovery rate. In practice, R 0 is a challenge to infer [22, [36] [37] [38] . 7 J o u r n a l P r e -p r o o f The three variations on this basic experiment (ii through iv in Figure 2 ), incorporate the following independent changes. In Experiment ii, the R population is not measured -an example designed to reflect the current situation in some localities (e.g. Refs [3, 4] ). Experiment iii includes a ∼ five percent noise level (for the form of additive noise, see Appendix C) in the simulated R data, and Experiment iv includes that noise level in addition to a doubled temporal baseline. For each experiment, twenty independent calculations were initiated in parallel searches, each with a randomly-generated set of initial conditions on state variable and parameter values. For technical details of all experimental designs and implementation, see Appendix C. The salient results for the simulated experiments i through iv are as follows: i Table 2 . The base experiment that employed five noiseless measured populations over 101 days yielded an excellent solution in terms of model evolution and parameter estimates. Prior to examining the solution, we shall first show the cost function versus the annealing parameter β, as this distribution can serve as a tool for assessing the significance of a solution. Figure 3 shows the evolution of the cost throughout annealing, for the ten distinct independent paths that were initiated; the x-axis shows the value of Annealing Parameter β, or: the increasing rigidity of the model constraint. At the start of iterations, the cost function is mainly fitting the measurements to data, and its value begins to climb as the model penalty is gradually imposed. If the procedure finds a solution 8 J o u r n a l P r e -p r o o f that is consistent not only with the measurements, but also with the model, then the cost will plateau. In prior to its steep decline. We noted no improvement in this estimate for K i (t), following a tenfold increase in the temporal resolution of measurements (not shown). The procedure does appear to recognize that a fast transition in the value of K i occurred at early times, and that that value was previously higher. It will be important to investigate the reason for this failure in the estimation of K i at early times, to rule out numerical issues involved with the quickly-changing derivative 4 . Figure 5 shows the cost as a function of annealing for the case with no measurement of Recovered Population R. Without examining the estimates, we know from the Cost(β) plot that no solution has been found that is consistent with both measurements and model: no plateau is reached. Rather, as the model constraint strengthens, the cost increases exponentially. Indeed, Figure 6 shows the estimation, taken at β = 2, prior to the runaway behavior. Note the excellent fit to the measured states and simultaneous poor fit to the unmeasured states. As no stable solution is found at high β, we conclude that there exists insufficient information in As det , Sym det , Sys det , and D alone to corral the procedure into a region of state-and-parameter space in which a model solution is possible. We repeated this experiment with a doubled baseline of 201 days, and noted no improvement (not shown). In Experiment iii, the low noise added to R yielded a poor state and parameter estimate (not shown). With a doubled temporal baseline of measurements (Experiment iv), however, the state estimate became robust to the contamination. Figure 7 shows this estimate. While the ∼ five percent noise added to Population R propagates to the unmeasured States S, E, and P, the general state evolution is still captured well. Importantly, the populations entering the hospital are well estimated. Note that some low state estimates (e.g. As) are not perfectly offset by high estimates (e.g. Sym). The addition of noise in these numbers -by definition -breaks the conservation of the population. Finally, the parameter estimates for Experiment iv do not survive the added contamination (not shown). We have endeavoured to illustrate the potential of SDA to systematically identify the specific measurements, temporal baseline of measurements, and degree of measurement accuracy, required to estimate unknown model parameters in a high-dimensional model designed to examine the complex problems that COVID-19 presents to hospitals. In light of our assumed knowledge of some model parameters, we restrict our conclusions to general comments. We emphasize that estimation of the full model state requires measurements of the detected cases but not the undetected, provided that the recovered and dead are also measured. The state evolution is tolerant to low noise in these measurements, while the parameter estimates are not. The ultimate aim of SDA is to test the validity of model estimation using real data, via prediction. In advance of that step, we are performing a detailed study of the model's sensitivity to contamination in the measurable populations As det , Sym det , Sys det , R, and D. Concurrently we are examining means to render the parameter estimation less sensitive to noise, via various additional equality constraints in the cost function, and loosening the assumption of Gaussian-distributed noise. In particular, we shall require that the time-varying parameters be smoothly-varying. It will be important to examine the stability of the SDA procedure over a range of choices for parameter values and initial numbers for the infected populations. This procedure can be expanded in many directions. Currently we are working to divide the model subpopulations by age, and to include age-specific parameters such as susceptibility and the likelihood of requiring hospitalization and intensive care. Specifically, SDA might inform the question of whether the contact matrices among age groups are non-stationary -a question of high interest for predicting age-dependent susceptibility during a second wave [42] of the virus. Other avenues for expansion are as follows: 1) define additional model parameters as unknowns to be estimated, including the fraction of patients hospitalized, the fraction who enter critical care, and the various timescales governing the reaction equations; 2) impose various constraints regarding the unknown time-varying quantities, particularly transmission rate K i (t), and identifying which forms permit a solution consistent with measurements; 3) examine model sensitivity to the initial numbers within each population; 4) examine model sensitivity to the temporal frequency of data sampling. Moreover, it is our hope that the procedure described in this paper can guide the application of SDA to a host of complicated questions surrounding COVID-19. Thank you to Patrick Clay from the University of Michigan for discussions on inferring exposure rates given social distancing protocols. and f severe are constant numbers, as they are assumed to reflect an intrinsic property of the disease. The detection probability of asymptomatic cases is taken to be known and zero. [40] and [41] . For Experiments i and iv, the reported numbers are taken from the annealing iteration with a value of parameter β of 32 and 40, respectively: once the deterministic limit has been reached (see text). For Experiment ii, an attempt was made to retrieve parameter estimates at β = 2; that is: before the solution grows unstable exponentially (see Figure 5 ). See specific subsections for details of each experiment. The blue notation specified by overbrackets denotes the correspondence of specific terms to the reactions between the populations depicted in Figure 1 . J o u r n a l P r e -p r o o f t D Time from entering critical care to death, for severe symptomatics 5.0 [50] a As described in [45] , viral load can be high and detectable for up to 20 days. We choose a shorter duration of infectiousness to capture the time during which transmissibility is highest. J o u r n a l P r e -p r o o f Appendix B: Unknown time-varying parameters to be estimated The unknown parameters assumed to be time-varying are the transmission rate K i , and the detection probabilities d Sym and d Sys for mild and severe symptomatic cases, respectively. The transmission rate in a given population for a given infectious disease is measured in effective contacts per unit time. This may be expressed as the total contact rate (the total number of contacts, effective or not, per unit time), multiplied by the risk of infection, given contact between an infectious and a susceptible individual. The total contact rate can be impacted by social behavior. In this first employment of SDA upon a pandemic model of such high dimensionality, we chose to represent K i as a relatively constant value that undergoes one rapid transition corresponding to a single social distancing mandate. As noted in Experiments, social distancing rules were imposed in New York City roughly 25 days following the first reported case. We thus chose K i to transition between two relatively constant levels, roughly 25 days following time t 0 . Specifically, we wrote K i (t) [51] as: The parameter T was set to 25, beginning four days prior to the first report of a detection in NYC [35] to the imposition of a stay-home order in NYC on March 22 [39] . The parameter s governs the steepness of the transformation, and was set to 10. Parameters f and ξ were then adjusted to 1.2 and 1.5, to achieve a transition from about 1.4 to 0.3. For detection probabilities d Sym and d Sys , a linear and quadratic form, respectively, were chosen to preclude symmetries, and both were optimistically taken to increase with time: dSym(t) = 0.2 · t dSys(t) = 0.1 · t 2 Finally, each time series was normalized to the range: [0:1], via division by their respective maximum values. The simulated data were generated by integrating the reaction equations (Appendix A) via a fourth-order adaptive Runge-Kutta method encoded in the Python package odeINT. A step size of one (day) was used to record the output. Except for the one instance noted in Results regarding Experiment i, we did not examine the sensitivity of estimations to the temporal sparsity of measurements. The initial conditions on the populations were: S 0 = N − 1 (where N is the total population), As 0 = 1, and zero for all others. For the noise experiments, the noise added to the simulated Sym det , Sys det , and R data were generated by Python's numpy.random.normal package, which defines a normal distribution of noise. For the "low-noise" 24 J o u r n a l P r e -p r o o f experiments, we set the standard deviation to be the respective mean of each distribution, divided by 100. For the experiments using higher noise, we multiplied that original level by a factor of ten. For each noisy data set, the absolute value of the minimum was then added to each data point, so that the population did not drop below zero. The optimization was performed via the open-source Interior-point Optimizer (Ipopt) [52] . Ipopt uses a Simpson's rule method of finite differences to discretize the state space, a Newton's method to search, and a barrier method to impose user-defined bounds that are placed upon the searches. We note that Ipopt's search algorithm treats state variables as independent quantities, which is not the case for a model involving a closed population. This feature did not affect the results of this paper. Those interested in expanding the use of this tool, however, might keep in mind this feature. One might negate undesired effects by, for example, imposing equality constraints into the cost function that enforce the conservation of N. Within the annealing procedure described in Methods, the parameter α was set to 2.0, and β ran from 0 to 38 in increments of 1. The inverse covariance matrix for measurement error (R m ) was set to 1.0, and the initial value of the inverse covariance matrix for model error (R f ,0 ) was set to 10 −7 . For each of the four simulated experiments, twenty paths were searched, beginning at randomlygenerated initial conditions for parameters and state variables. All simulations were run on a 720-core, 1440-GB, 64-bit CPU cluster. IHME COVID-19 health service utilization forecasting Team and Christopher JL Murray. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months. medRxiv, page 2020.03 The need for data innovation in the time of covid-19 Estimating the early death toll of covid-19 in the united states Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov-2) Inverse problem theory and methods for model parameter estimation Numerical weather prediction Atmospheric modeling, data assimilation and predictability Data assimilation: the ensemble Kalman filter Practical methods for optimal control and estimation using nonlinear programming The number of required observations in data assimilation for a shallow-water flow Estimating the state of a geophysical system with sparse observations: time delay methods to achieve accurate initial states for prediction Kalman meets neuron: the emerging intersection of control theory with neuroscience Dynamical estimation of neuron and network properties i: variational methods Dynamical estimation of neuron and network properties ii: path integral monte carlo methods Real-time tracking of neuronal network structure using data assimilation Estimating parameters and predicting membrane voltages with conductance-based neuron models Automatic construction of predictive neuron models through large scale assimilation of electrophysiological data Statistical data assimilation for estimating electrophysiology simultaneously with connectivity within a biological neuronal network Towards real time epidemiology: data assimilation, modeling and anomaly detection of health surveillance data streams Variational data assimilation with epidemic models Bayesian tracking of emerging epidemics using ensemble optimal statistical interpolation Real time bayesian estimation of the epidemic potential of emerging infectious diseases Adjoint-based data assimilation of an epidemiology model for the covid-19 pandemic in 2020 An epidemiological modelling approach for covid19 via data assimilation Efficient bayesian inference of fully stochastic epidemiological models with applications to covid-19 Predicting the future: completing models of observed complex systems Dynamical parameter and state estimation in neuron models. The dynamic brain: an exploration of neuronal variability and its functional significance Estimating the biophysical properties of neurons with intracellular calcium dynamics Accurate state and parameter estimation in nonlinear systems with sparse observations Improved variational methods in statistical data assimilation Nonlinear statistical data assimilation for HVC RA neurons in the avian song system Data assimilation of membrane dynamics and channel kinetics with a neuromorphic integrated circuit An optimization-based approach to calculating neutrino flavor evolution Systematic variational method for statistical nonlinear state and parameter estimation Improved inference of time-varying reproduction numbers during infectious disease outbreaks A new framework and software to estimate time-varying reproduction numbers during epidemics Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures PAUSE order in New York City takes effect Getting a handle on asymptomatic SARS-CoV-2 infection Estimating the burden of sars-cov-2 in france As Americans Brace for Second Wave of COVID-19 Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2) Estimation of incubation period distribution of COVID-19 using disease onset forward time: a novel cross-sectional and forward follow-up study. medRxiv, page 2020.03.06 Virological assessment of hospitalized patients with COVID-2019 Incidence, clinical outcomes, and transmission dynamics of hospitalized 2019 coronavirus disease among 9,596,321 individuals residing in california and washington Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia Clinical features of patients infected with 2019 novel coronavirus in wuhan, china. The lancet Epidemiology and transmission of covid-19 in shenzhen china: Analysis of 391 cases and 1,286 of their close contacts Clinical course and outcomes of critically ill patients with sars-cov-2 pneumonia in wuhan, china: a single-centered, retrospective, observational study. The Lancet Respiratory Medicine An updated estimation of the risk of transmission of the novel coronavirus (2019-ncov) Short tutorial: getting started with ipopt in 90 minutes