key: cord-241057-cq20z1jt authors: Han, Jungmin; Cresswell-Clay, Evan C; Periwal, Vipul title: Statistical Physics of Epidemic on Network Predictions for SARS-CoV-2 Parameters date: 2020-07-06 journal: nan DOI: nan sha: doc_id: 241057 cord_uid: cq20z1jt The SARS-CoV-2 pandemic has necessitated mitigation efforts around the world. We use only reported deaths in the two weeks after the first death to determine infection parameters, in order to make predictions of hidden variables such as the time dependence of the number of infections. Early deaths are sporadic and discrete so the use of network models of epidemic spread is imperative, with the network itself a crucial random variable. Location-specific population age distributions and population densities must be taken into account when attempting to fit these events with parametrized models. These characteristics render naive Bayesian model comparison impractical as the networks have to be large enough to avoid finite-size effects. We reformulated this problem as the statistical physics of independent location-specific `balls' attached to every model in a six-dimensional lattice of 56448 parametrized models by elastic springs, with model-specific `spring constants' determined by the stochasticity of network epidemic simulations for that model. The distribution of balls then determines all Bayes posterior expectations. Important characteristics of the contagion are determinable: the fraction of infected patients that die ($0.017pm 0.009$), the expected period an infected person is contagious ($22 pm 6$ days) and the expected time between the first infection and the first death ($25 pm 8$ days) in the US. The rate of exponential increase in the number of infected individuals is $0.18pm 0.03$ per day, corresponding to 65 million infected individuals in one hundred days from a single initial infection, which fell to 166000 with even imperfect social distancing effectuated two weeks after the first recorded death. The fraction of compliant socially-distancing individuals matters less than their fraction of social contact reduction for altering the cumulative number of infections. The pandemic caused by the SARS-CoV-2 virus has swept across the globe with remarkable rapidity. The parameters of the infection produced by the virus, such as the infection rate from person-to-person contact, the mortality rate upon infection and the duration of the infectivity period are still controversial . Parameters such as the duration of infectivity and predictions such as the number of undiagnosed infections could be useful for shaping public health responses as the predictive aspects of model simulations are possible guides to pandemic mitigation [7, 10, 20] . In particular, the possible importance of superspreaders should be understood [24] [25] [26] [27] . [5] had the insight that the early deaths in this pandemic could be used to find some characteristics of the contagion that are not directly observable such as the number of infected individuals. This number is, of course, crucial for public health measures. The problem is that standard epidemic models with differential equations are unable to determine such hidden variables as explained clearly in [6] . The early deaths are sporadic and discrete events. These characteristics imply that simulating the epidemic must be done in the context of network models with discrete dynamics for infection spread and death. The first problem that one must contend with is that even rough estimates of the high infection transmission rate and a death rate with strong age dependence imply that one must use large networks for simulations, on the order of 10 5 nodes, because one must avoid finite-size effects in order to accurately fit the early stochastic events. The second problem that arises is that the contact networks are obviously unknown so one must treat the network itself as a stochastic random variable, multiplying the computational time by the number of distinct networks that must be simulated for every parameter combination considered. The third problem is that there are several characteristics of SARS-CoV-2 infections that must be incorporated in any credible analysis, and the credibility of the analysis requires an unbiased sample of parameter sets. These characteristics are the strong age dependence of mortality of SARS-CoV-2 infections and a possible dependence on population density which should determine network connectivity in an unknown manner. Thus the network nodes have to have location-specific population age distributions incorporated as node characteristics and the network connectivity itself must be a free parameter. 3 An important point in interpreting epidemics on networks is that the simplistic notion that there is a single rate at which an infection is propagated by contact is indefensible. In particular, for the SARS-CoV-2 virus, there are reports of infection propagation through a variety of mucosal interfaces, including the eyes. Thus, while an infection rate must be included as a parameter in such simulations, there is a range of infection rates that we should consider. Indeed, one cannot make sense of network connectivity without taking into account the modes of contact, for instance if an individual is infected during the course of travel on a public transit system or if an individual is infected while working in the emergency room of a hospital. One expects that network connectivity should be inversely correlated with infectivity in models that fit mortality data equally well but this needs to be demonstrated with data to be credible, not imposed by fiat. The effective network infectivity, which we define as the product of network connectivity and infection rate, is the parameter that needs to be reduced by either social distancing measures such as stay-at-home orders or by lowering the infection rate with mask wearing and hand washing. A standard Bayesian analysis with these features is computationally intransigent. We therefore adopted a statistical physics approach to the Bayesian analysis. We imagined a six-dimensional lattice of models with balls attached to each model with springs. Each ball represents a location for which data is available and each parameter set determines a lattice point. The balls are, obviously, all independent but they face competing attractions to each lattice point. The spring constants for each model are determined by the variation we find in stochastic simulations of that specific model. One of the dimensions in the lattice of models corresponds to a median age parameter in the model. Each location ball is attracted to the point in the median age parameter dimension that best matches that location's median age, and we only have to check that the posterior expectation of the median age parameter for that location's ball is close to the location's actual median age. Thus we can decouple the models and the data simulations without having to simulate each model with the characteristics of each location, making the Bayesian model comparison amenable to computation. Finally, the distribution of location balls over the lattice determines the posterior expectation values of each parameter. We matched the outcomes of our simulations with data on the two week cumulative death counts after the first death using Bayes' theorem to obtain parameter estimates for the infection dynamics. We used the Bayesian model comparison to determine posterior expectation values for parameters for three distinct datasets. Finally, we simulated the effects of various partially effective social-distancing measures on random networks and parameter sets given by the posterior expectation values of our Bayes model comparison. We used data for the SARS-CoV-2 pandemic as compiled by [28] from the original data We generated random G(N, p = 2L/(N − 1)) networks of N = 90000 or 100000 nodes with an average of L links per node using the Python package NetworkX [36] . ScalePopDens ≡ L is one of the parameters that we varied. We compared the posterior expectation for this parameter for a location with the actual population density in an attempt to predict the appropriate way to incorporate measurable population densities in epidemic on network models [37, 38] . We used the Python Epidemics on Networks package [39, 40] to simulate networks with specific parameter sets. We defined nodes to have status Susceptible, Infected, Recovered or Dead. We started each simulation with exactly one infected node, chosen at random. The simulation has two sorts of events: 1. An Infected node connected to a Susceptible node can change the status of the Susceptible node to Infected with an infection rate, InfRate. This event is network connectivity dependent. Therefore we expect to see a negative or inverse correlation between InfRate and ScalePopDens. 2. An Infected node can transition to Recovered status with a recovery rate, RecRate, or transition to a Dead status with a death rate, DeathRate. Both these rates are entirely node-autonomous. The reciprocal of the RecRate parameter (RecDays in the following) is the number of days an individual is contagious. We assigned an age to each node according to a probability distribution parametrized by the median age of each data set (country or state). As is well-known, there is a wide disparity in median ages in different countries. The probability distribution approximately models the triangular shape of the population pyramids that is observed in demographic studies. We parametrized it as a function of age a as follows: Here MedianAge is the median age of a specific country, MaxAge = 100 y is a global maxi- It is computationally impossible to perform model simulations for the exact age distribution for each location. We circumvented this problem, as detailed in the next subsection (Bayes setup), by incorporating a ScaleMedAge parameter in the model, scaled so that ScaleMedAge = 1.0 corresponds to a median age of 40 years. The node age is used to make the DeathRate of any node age-specific in the form of an age-dependent weight: where a[n] is the age of node n and AgeIncrease = 5.5 is an age-dependence exponent. w(a) is normalized so that a w(a|AgeIncrease)P (a|MedianAge = 38.7y) = 1, using the median age of New York state's population as the value of AgeIncrease given above was approximately determined by fitting to the observed age-specific mortality statistics of New York state [35] . However, we included AgeIncrease as a model parameter since the strong age dependence of SARS-CoV-2 mortality is not well understood, with the normalization adjusted appropriately as a function of AgeIncrease. Note that a decrease in the median age with all rates and the age-dependence exponent held constant will lead to a lower number of deaths. We use simulations to find the number of Dead nodes as a function of time. The first time at which a death occurs following the initial infection in the network is labeled Time-FirstDeath. Figure close to its actual median age. We implemented Bayes' theorem as usual. The probability of a model, M, given a set of after the first death did not affect our results. As alluded to in the previous subsection, the posterior expectation of the ScaleMedAge parameter (×40 y) for each location should turn out to be close to the actual median age for each location in our setup, and this was achieved (right column, Figure 5 ). We simulated our grid of models on the NIH Biowulf cluster. Our grid comprised of 56448 ×2 parametrized models simulated with 40 random networks each and parameters in all possible combinations from the following lists: parameters. In particular, note that the network infectivity (InfContacts) has a factor of two smaller uncertainty than either of its factors as these two parameters (InfRate and ScalePopDens) cooperate in the propagation of the contagion and therefore turn out to have a negative posterior weighted correlation coefficient (Table I ). The concordance of posterior expectation values (Table I) This goes along with the approximately 80 day period between the first infection and the first death for a few outlier trajectories. However, it is also clear from the histograms in Figure 9 and the mean TimeFirstDeath given in Table I that the likely value of this duration is considerably shorter. Finally, we evaluated a possible correlation between the actual population density and the ScalePopDens parameter governing network connectivity. We found a significant correlation When we added additional countries to the European Union countries in this regression, we obtained (p < 0.0019, r = 0.33) ScalePopDens(US&EU+) = 0.11 ln(population per km 2 ) + 2.9. While epidemiology is not the standard stomping ground of statistical physics, Bayesian model comparison is naturally interpreted in a statistical physics context. We showed that taking this interpretation seriously leads to enormous reductions in computational effort. Given the complexity of translating the observed manifestations of the pandemic into an understanding of the virus's spread and the course of the infection, we opted for a simple data-driven approach, taking into account population age distributions and the age dependence of the death rate. While the conceptual basis of our approach is simple, there were computational difficulties we had to overcome to make the implementation amenable to computability with finite computational resources. Our results were checked to not depend on the size of the networks we simulated, on the number of stochastic runs we used for each model, nor on the number of days that we used for the linear regression. All the values we report in Table I are well within most estimated ranges in the literature but with the benefit of uncertainty estimates performed with a uniform model prior. While each location ball may range over a broad distribution of models, the consensus posterior distribution (Table I) shows remarkable concordance across datasets. We can predict the posterior distribution of time of initial infection, TimeFirstDeath, as shown in Table I . The dynamic model can predict the number of people infected after 21 the first infection (right panel, Figure 10 ) and relative to the time of first death (left panel, Figure 10 ) because we made no use of infection or recovery statistics in our analysis [9] . Note the enormous variation in the number of infections for the same parameter set, only partly due to stochasticity of the networks themselves, as can be seen by comparing the upper and lower rows of Figure 4 . With parameters intrinsic to the infection held fixed, we can predict the effect of various degrees of social distancing by varying network connectivity. We assumed that a certain fraction of nodes in the network would comply with social distancing and only these compliant nodes would reduce their connections at random by a certain fraction. Figure 12 shows the effects of four such combinations of compliant node fraction and fraction of con- (Table II) with the posterior expectations of parameters (Table I) shows that the Bayes entropy of the model posterior distribution is an important factor to consider, validating our initial intuition that optimization of model parameters would be inappropriate in this analysis. The regression we found (Eq.'s 6, 7, 8) with respect to population density must be considered in light of the fact that many outbreaks are occurring in urban areas so they are not necessarily reflective of the true population density dependence. Furthermore, we did not find a significant regression for the countries of the European Union by themselves, perhaps because they have a smaller range of population densities, though the addition of these countries into the US states data further reduced the regression p-value of the null hypothesis without materially altering regression parameters. Detailed epidemiological data could be used to clarify its significance. [ [24] [25] [26] [27] have suggested the importance of super-spreader events but we did not encounter any difficulty in modeling the available data with garden variety G(n, p) networks. Certainly if the network has clusters of older nodes, there will be abrupt jumps in the cumulative death count as the infection spreads through the network. Furthermore, it would be interesting to consider how to make the basic model useful for more heterogenous datasets such as all countries of the world with vastly different reporting of death statistics. Using the posterior distribution we derived as a starting point for more complicated models may be an approach worth investigating. Infectious disease modeling is a deep field with many sophisticated approaches in use [39, [41] [42] [43] and, clearly, our analysis is only scratching the surface of the problem at hand. Network structure, in particular, is a topic that has received much attention in social network research [37, 38, [44] [45] [46] . Bayesian approaches have been used in epidemics on networks modeling [47] and have also been used in the present pandemic context in [2, 27, 48] . To our knowledge, there is no work in the published literature that has taken the approach adopted in this paper. There are many caveats to any modeling attempt with data this heterogenous and complex. First of all, any model is only as good as the data incorporated and unreported SARS-CoV-2 deaths would impact the validity of our results. Secondly, if the initial deaths occur in specific locations such as old-age care centers, our modeling will over-estimate the death rate. A safeguard against this is that the diversity of locations we used may compensate to a limited extent. Detailed analysis of network structure from contact tracing can be used to correct for this if such data is available, and our posterior model probabilities could guide such refinement. Thirdly, while we ensured that our results did not depend on our model ranges as far as practicable, we cannot guarantee that a model with parameters outside our ranges could not be a more accurate model. The transparency of our analysis and the simplicity of our assumptions may be helpful in this regard. All code is available 23 An SEIR infectious disease model with testing and conditional quarantine The Lancet Infectious Diseases The Lancet Infectious Diseases The Lancet Infectious Diseases Proceedings of the 7th Python in Science Conference 2015 Winter Simulation Conference (WSC) Agent-based modeling and network dynamics Infectious Disease Modeling Charting the Next Pandemic: Modeling Infectious Disease Spreading in the Data Science Age We are grateful to Arthur Sherman for helpful comments and questions and to Carson Chow for prepublication access to his group's work [6] . This work was supported by the