key: cord- -s qnhswe authors: Shu, Panpan; Wang, Wei; Tang, Ming; Do, Younghae title: Numerical identification of epidemic thresholds for susceptible-infected-recovered model on finite-size networks journal: Chaos cord_uid: s qnhswe The epidemic threshold has long been a central topic in the study of epidemic dynamics on complex networks. Previous studies have provided different theoretical predictions of the epidemic threshold for the susceptible-infected-recovered (SIR) model, but numerical verification of these theoretical predictions is still lacking. Observing that the outbreak size fluctuates strongly near the epidemic threshold, we propose a novel numerical method for identifying the SIR epidemic threshold by analyzing the peak of the epidemic variability. Extensive experiments on synthetic and real-world networks demonstrate that the variability measure successfully gives the numerical threshold for the SIR model. The heterogeneous mean-field prediction agrees very well with the numerical threshold, except when the networks are disassortative, in which case the quenched mean-field prediction is closer to the numerical threshold. Moreover, the numerical method presented is also suitable for the susceptible-infected-susceptible (SIS) model. This work helps to verify theoretical analyses of the epidemic threshold and should promote further studies of the phase transition in epidemic dynamics. The epidemic threshold, one of the most important features of epidemic dynamics, has attracted much attention recently. Existing studies provide different theoretical predictions for the epidemic threshold of the SIR model on complex networks, while numerical verification of these predictions is still lacking. It is therefore necessary to develop an effective numerical measure for identifying the SIR epidemic threshold.
In this paper, the numerical identification of the SIR epidemic threshold is studied systematically. We present a numerical method that identifies the epidemic threshold from the peak of the epidemic variability. To understand the effectiveness of the variability measure, the distribution of outbreak sizes is investigated near the epidemic threshold on random regular networks. Based on an analysis of the cutoff hypothesis for the outbreak size distribution, we find that the variability measure provides an excellent identification of the epidemic threshold. We further use the variability measure to test the existing theoretical predictions on scale-free and real networks. The results show that the heterogeneous mean-field (HMF) prediction agrees very well with the numerical threshold, except when the networks are disassortative, in which case the quenched mean-field (QMF) prediction is closer to the numerical threshold. The numerical method presented can effectively identify SIR epidemic thresholds on various networks, and could be extended to other dynamical processes such as information diffusion and behavior spreading. This work deepens our understanding of the epidemic threshold and should promote further studies of the phase transition in epidemic dynamics. Models of disease propagation are at the core of our understanding of epidemic dynamics on complex networks. Two epidemic models of particular importance are the susceptible-infected-susceptible (SIS) and susceptible-infected-recovered (SIR) models. At each time step, an infected node can transmit the disease to each of its susceptible neighbors with probability β. At the same time, an infected node becomes susceptible again (SIS model) or recovers (SIR model) with probability μ. In the SIS model, a critical value of the effective transmission rate λ = β/μ separates the absorbing phase, with only healthy nodes, from the active phase, with a stationary density of infected nodes.
Differently, no steady state is allowed in the SIR model, but a threshold still exists above which the final fraction of recovered nodes is finite. The traditional theoretical treatment of the SIS epidemic threshold is based on heterogeneous mean-field (HMF) theory, in which all nodes of a given degree are considered statistically equivalent. According to HMF theory, the SIS epidemic threshold is λ_c^HMF = ⟨k⟩/⟨k²⟩, where ⟨k⟩ and ⟨k²⟩ are the first and second moments of the degree distribution P(k), respectively. As the quenched structure of the network and the dynamical correlations between the states of adjacent nodes are neglected in HMF theory, researchers have proposed an important improvement over it, the quenched mean-field (QMF) theory. QMF theory fully preserves the actual quenched structure of the network, described by its adjacency matrix, and predicts the epidemic threshold λ_c^QMF = 1/Λ_N, where Λ_N is the largest eigenvalue of the adjacency matrix of the given network. Considering that the existing theories all have some limitations (e.g., HMF theory neglects the quenched structure of the network; QMF theory ignores dynamical correlations), numerical methods such as finite-size scaling analysis, susceptibility, and the lifetime measure have been proposed to check the accuracy of the different theoretical predictions for the SIS model. Among these existing methods, the most common one for a network of finite size N is the susceptibility measure χ = N(⟨ρ²⟩ − ⟨ρ⟩²)/⟨ρ⟩, where ρ denotes the outbreak size. The susceptibility measure has been shown to be very effective for identifying SIS epidemic thresholds on various networks. For the other paradigmatic epidemic model, the SIR model, there have been many theoretical studies of its epidemic threshold.
The earliest theoretical study of the SIR epidemic threshold is based on the assumption of homogeneous mixing, showing that the SIR epidemic threshold is inversely proportional to the average connectivity ⟨k⟩. At the HMF level, on networks with power-law degree distribution P(k) ∼ k^(−γ), where γ is the degree exponent, the HMF approach predicts a vanishing threshold for scale-free networks with γ ≤ 3 and a finite threshold for γ > 3. Mapping the SIR model to a bond percolation process, the epidemic threshold coincides with the HMF result for an SIR model with unit infection time (i.e., μ = 1); when the infection times vary among infected nodes (i.e., μ < 1), the epidemic threshold follows from the condition that the disease transmissibility equals ⟨k⟩/(⟨k²⟩ − ⟨k⟩). According to QMF theory, the epidemic threshold of the SIR model has the same expression as for the SIS model, λ_c^QMF = 1/Λ_N. However, the QMF result is not even qualitatively correct, as it predicts a vanishing threshold for power-law distributed networks with γ > 3, in conflict with numerical results. Although the numerical threshold of the SIS model has attracted much attention, the systematic study of the numerical identification of the SIR epidemic threshold is still insufficient. It is well known that the outbreak size becomes finite above the threshold λ_c. However, as λ increases, the outbreak size changes continuously from an infinitesimal fraction to a finite fraction in the SIR model, and thus it is difficult to determine the value of λ at which the outbreak size becomes finite. To our knowledge, no effective numerical method for identifying the SIR epidemic threshold has been available in previous studies.
In this work, we perform extensive numerical simulations of the SIR model on networks of finite size, and present a numerical identification method that locates the epidemic threshold at the peak (i.e., the maximal value) of the epidemic variability. The effectiveness of the numerical measure is checked on random regular networks (RRN), where HMF theory is exact. To better understand the validity of the numerical method, we investigate the distribution of outbreak sizes near the epidemic threshold. The robustness of the variability measure is confirmed by an analysis of the cutoff hypothesis for the outbreak size distribution. We further employ the variability measure to test the theoretical predictions on scale-free and real-world networks, where the results indicate that the HMF prediction agrees very well with the numerical threshold, except when the networks are disassortative, in which case the QMF prediction is closer to the numerical threshold. In this section, we describe the simulation of the SIR model in detail, propose the numerical identification measure for the epidemic threshold, and analyze the effectiveness of the numerical measure. In the SIR model, each node is in one of three states: susceptible, infected, or recovered. At the beginning, one randomly selected node is infected (i.e., the seed), and all other nodes are susceptible. At each time step t, each susceptible node i with one or more infected neighbors becomes infected with probability 1 − (1 − β)^(n_i), where n_i is the number of its infected neighbors. At the same time, all infected nodes recover (or die) at rate μ, and recovered nodes acquire permanent immunity. Time increases by Δt = 1, and the dynamical process terminates when all infected nodes have recovered. In this paper, μ is set to 1 unless otherwise specified.
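The update rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the network is represented as a dict of neighbor lists, and all function names are ours.

```python
import random

def sir_step(neighbors, status, beta, mu):
    """One synchronous update of the discrete-time SIR dynamics.

    status[i] is 'S', 'I', or 'R'. A susceptible node with n_i infected
    neighbors becomes infected with probability 1 - (1 - beta)**n_i;
    each infected node recovers with probability mu.
    """
    new_status = dict(status)
    for i, state in status.items():
        if state == 'S':
            n_i = sum(1 for j in neighbors[i] if status[j] == 'I')
            if n_i > 0 and random.random() < 1 - (1 - beta) ** n_i:
                new_status[i] = 'I'
        elif state == 'I':
            if random.random() < mu:
                new_status[i] = 'R'
    return new_status

def sir_outbreak(neighbors, beta, mu=1.0, seed_node=None):
    """Run SIR from a single seed until no infected nodes remain.

    Returns the final fraction of recovered nodes (the outbreak size rho).
    """
    nodes = list(neighbors)
    status = {i: 'S' for i in nodes}
    status[seed_node if seed_node is not None else random.choice(nodes)] = 'I'
    while any(s == 'I' for s in status.values()):
        status = sir_step(neighbors, status, beta, mu)
    return sum(s == 'R' for s in status.values()) / len(nodes)
```

Repeating `sir_outbreak` many times at each value of λ = β/μ yields the sample of outbreak sizes from which the statistics below are computed.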
The susceptibility measure can not only identify an effective SIS epidemic threshold, but can also be used to determine the critical point of a percolation process. Since the connection between SIR and bond percolation is made by identifying the size of the percolating giant component with the final number of recovered individuals, we check the effectiveness of the susceptibility measure χ for the SIR model on RRN, where all nodes have exactly the same degree k. On these networks, the HMF prediction is λ_c^HMF = 1/(k − 1). We compare the HMF prediction with the numerical threshold λ_p^χ identified by the susceptibility measure in Fig. 1(a), where the result shows that the numerical threshold of the SIR model identified by χ is significantly larger than 1/(k − 1). Considering that the fluctuation of the outbreak size is large near the epidemic threshold, we instead try to identify the epidemic threshold by the variability measure Δ = sqrt(⟨ρ²⟩ − ⟨ρ⟩²)/⟨ρ⟩, a standard measure for determining the critical point of equilibrium phase transitions in magnetic systems. The inset of Fig. 1(a) shows that the variability Δ exhibits a peak over a wide range of λ, so we estimate the epidemic threshold from the position λ_p^Δ of this peak. On RRN with different values of k, we find that λ_p^Δ is always consistent with the HMF prediction. For a given degree k, we further consider the relationship between the epidemic threshold and the network size N in Fig. 1(b), where the numerical thresholds λ_p^χ and λ_p^Δ do not change with N; λ_p^Δ is very close to the HMF prediction, while there is an obvious gap between λ_p^χ and the HMF prediction. From the above, we know that the variability Δ identifies an effective SIR epidemic threshold, while the threshold identified by the susceptibility χ is overestimated on RRN. A new question thus arises: why does the variability Δ perform well while the susceptibility χ goes awry for the SIR model? B.
Analysis of the effectiveness of the numerical identification measures. Next, we analyze the effectiveness of the numerical identification measures above by investigating the distribution of outbreak sizes, which is strongly heterogeneous near the epidemic threshold. On RRN with degree k, where the epidemic threshold is λ_c = 1/(k − 1), Fig. 2(a) shows the distribution of outbreak sizes near λ_c. The outbreak sizes follow an approximately exponential distribution at values of λ smaller than λ_c. At λ = λ_c, the outbreak sizes follow a power-law distribution P(ρ) ∼ ρ^α with a cutoff at some value, where α ≈ −1.5. Since the disease may die out quickly or infect a substantial subset of nodes when λ > λ_c, the distribution of outbreak sizes is bimodal, with two peaks occurring at ρ = 1/N and at a finite value of ρ, respectively. Moreover, the theoretical distribution of outbreak sizes (see the Appendix for details) is compared with the results obtained by numerical simulation in Fig. 2(a), where the theoretical probability derived in the Appendix is consistent with the numerical results for relatively small outbreak sizes. At the epidemic threshold, the theoretical results also show that the outbreak sizes obey a power-law distribution with an exponent of about −1.5. When λ > λ_c, some large outbreak sizes form a lump in the numerical scattergram, but the probability of large outbreak sizes cannot be obtained from the Appendix calculation. We thus conjecture that this non-negligible lump may affect the numerical identification of the SIR epidemic threshold by the susceptibility measure. To test this conjecture, Fig. 2(b) investigates the effectiveness of the susceptibility measure under several cutoff hypotheses. We set a cutoff value ρ_c of the outbreak size, meaning that only outbreak sizes with ρ ≤ ρ_c are used to compute the susceptibility χ under the cutoff hypothesis.
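Both measures, and the cutoff hypothesis used here, are easy to state in code. The sketch below is ours (not the paper's), using the formulas given above, χ = N(⟨ρ²⟩ − ⟨ρ⟩²)/⟨ρ⟩ and Δ = sqrt(⟨ρ²⟩ − ⟨ρ⟩²)/⟨ρ⟩, applied to a sample of outbreak sizes:

```python
import math

def susceptibility(rhos, n, rho_cut=None):
    """Susceptibility chi = N(<rho^2> - <rho>^2)/<rho>.

    If rho_cut is given, only outbreak sizes rho <= rho_cut are used
    (the cutoff hypothesis); rhos must be nonempty after the cutoff.
    """
    if rho_cut is not None:
        rhos = [r for r in rhos if r <= rho_cut]
    m1 = sum(rhos) / len(rhos)
    m2 = sum(r * r for r in rhos) / len(rhos)
    return n * (m2 - m1 * m1) / m1

def variability(rhos):
    """Variability Delta = sqrt(<rho^2> - <rho>^2)/<rho>, the relative
    standard deviation of the outbreak size distribution."""
    m1 = sum(rhos) / len(rhos)
    m2 = sum(r * r for r in rhos) / len(rhos)
    return math.sqrt(max(m2 - m1 * m1, 0.0)) / m1
```

Sweeping λ, computing these statistics at each value, and locating the peaks gives the numerical thresholds λ_p^χ and λ_p^Δ discussed in the text.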
Three values of ρ_c are considered: the smallest corresponds to the maximum small outbreak size before the lump appears in the numerical scattergram, the intermediate value means that the numerical scattergram contains part of the lump, and the largest means that the complete lump is included. As shown in Fig. 2(b), the susceptibility measure can indeed give a quite effective estimate of the SIR epidemic threshold when the whole lump is excluded. As ρ_c increases, the position of the peak of the susceptibility χ gradually shifts to the right as large outbreak sizes are included. This shows that the susceptibility χ loses its effectiveness in identifying the SIR epidemic threshold because of the existence of the lump. In Fig. 2(c), the robustness of the variability Δ is further checked in theory. As the numerical distribution of large outbreak sizes is concentrated, we assume a lump located at ρ_c carrying the probability mass not accounted for by the small-outbreak distribution. Based on this theoretical distribution, we plot the variability measure as a function of λ for different values of ρ_c in Fig. 2(c). Since the variability Δ measures the heterogeneity of the outbreak size distribution, which is strongest at the epidemic threshold, the peak position of the variability measure does not change with the position of the lump. From the above analysis, we conclude that the variability Δ is effective in identifying the epidemic threshold of the SIR model, while the bimodal distribution of outbreak sizes for λ > λ_c leads to overestimation of the SIR epidemic threshold when the susceptibility χ is used. We further test the theoretical predictions on scale-free and real networks by comparing them with the numerical thresholds from the variability Δ. We build scale-free networks (SFN) with degree distribution P(k) ∼ k^(−γ) based on the configuration model. The so-called structural cutoff k_max ∼ N^(1/2) and natural cutoff k_max ∼ N^(1/(γ−1)) (Ref.
) are considered to constrain the maximum possible degree k_max on SFN. The degree-degree correlations vanish on scale-free networks with the structural cutoff, while disassortative degree-degree correlations exist for γ < 3 on scale-free networks with the natural cutoff, because high-degree vertices then connect preferentially to low-degree ones. We consider the SIR model on SFN with the structural cutoff in Figs. 3(a) and 3(c), where the SIR epidemic threshold increases monotonically with the degree exponent γ, and the variation of the epidemic threshold with network size N is approximately linear on a logarithmic scale. The HMF prediction λ_c^HMF is very close to the numerical threshold λ_p^Δ, while there is an obvious difference between the QMF prediction λ_c^QMF and λ_p^Δ. The SFN with the natural cutoff are considered in Figs. 3(b) and 3(d), where the variations of the epidemic threshold with γ and N are similar to the results on SFN with the structural cutoff. When γ > 3, the HMF prediction is close to the numerical threshold, while there is a gap between the QMF prediction and the numerical threshold. Since disassortative degree-degree correlations exist for γ < 3, there is a slight difference between λ_c^HMF and λ_p^Δ. In Fig. 3(d), the distinction between λ_c^HMF and λ_p^Δ becomes large with increasing N for the smallest degree exponent considered, while in such cases the QMF prediction is always close to the numerical threshold, since the principal eigenvector is delocalized when 2 < γ ≤ 5/2. It can be seen from the above analysis that the numerical method presented provides quantitative indexes for earlier observations that HMF theory is relatively accurate for the SIR model. To check the performance of the variability Δ on real-world networks, Fig. 4
depicts Δ as a function of λ for the Hamsterster full network, which contains friendship and family links between users of the website hamsterster.com, and the Facebook (NIPS) network, which contains Facebook user-user friendships. The numerical results show intuitively that the variability Δ always reaches its maximum near the value of λ above which the outbreak size ρ becomes finite. The theoretical predictions of HMF theory and of QMF theory are both quite close to the numerical threshold identified by Δ on the Hamsterster full network, but both become poor on the Facebook (NIPS) network. Given the difference in the behavior of the theoretical predictions on these two networks, detailed comparisons between the numerical and theoretical thresholds on other real networks are presented in Table I. The results indicate that although the HMF prediction and the numerical threshold λ_p^Δ(SIR) are nearly the same for assortative networks, there is an obvious difference between them for networks showing significant disassortative mixing. The QMF prediction is worse than the HMF prediction for assortative networks, but the former is close to λ_p^Δ(SIR) for some disassortative networks (e.g., Router views, CAIDA, and Email contacts). These findings numerically establish the accuracy of the existing theoretical predictions on real-world networks. In summary, we have studied the numerical identification of the SIR epidemic threshold on complex networks of finite size. We first checked the effectiveness of the susceptibility measure for SIR on RRN, where HMF theory is exact. The results showed an obvious gap between the numerical threshold identified by the susceptibility χ and the HMF prediction. We then proposed a numerical identification method based on the peak of the epidemic variability, and found that the numerical threshold identified by the variability measure Δ agrees very well with the HMF prediction on RRN.
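The QMF predictions entering these comparisons are inverse spectral radii, λ_c^QMF = 1/Λ_N. A minimal way to obtain Λ_N from an adjacency list is power iteration; the representation and helper names below are ours, not the paper's:

```python
def largest_eigenvalue(adj, iterations=1000):
    """Estimate the largest eigenvalue of a symmetric adjacency matrix,
    given as a dict node -> list of neighbors, by power iteration.

    For a connected non-bipartite graph the iterate converges to the
    spectral radius (Perron-Frobenius)."""
    nodes = list(adj)
    v = {i: 1.0 for i in nodes}
    lam = 0.0
    for _ in range(iterations):
        w = {i: sum(v[j] for j in adj[i]) for i in nodes}
        lam = max(abs(x) for x in w.values())
        if lam == 0.0:
            return 0.0  # empty graph
        v = {i: x / lam for i, x in w.items()}
    return lam

def qmf_threshold(adj):
    """QMF epidemic threshold: the inverse of the adjacency spectral radius."""
    return 1.0 / largest_eigenvalue(adj)
```

For a k-regular graph the all-ones vector is the principal eigenvector, so the iteration returns exactly k and the QMF threshold 1/k, which is how the RRN comparisons in the text can be reproduced.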
In order to better understand the effectiveness of the two numerical measures above, we have analyzed the distribution of outbreak sizes near the epidemic threshold λ_c. The outbreak sizes follow an approximately exponential distribution when λ < λ_c. At the epidemic threshold, the outbreak sizes follow a power-law distribution with an exponent of about −1.5. When λ > λ_c, the numerical distribution of outbreak sizes is bimodal, with two peaks occurring at ρ = 1/N and at O(1), respectively. The theoretical probability of small outbreak sizes is consistent with that obtained by numerical simulations, but the probability of large outbreak sizes, which form a lump in the numerical scattergram, cannot be obtained theoretically. Based on the analysis of the cutoff hypothesis for the outbreak size distribution, we found that the susceptibility measure gives a fairly effective SIR epidemic threshold when the lump is ignored. Since the variability measure reflects the heterogeneity of the outbreak size distribution, which is strongest at the epidemic threshold, it is always effective in identifying the threshold. We further employed the variability measure to test the theoretical predictions on scale-free and real networks. The HMF prediction is close to the numerical threshold on most of the networks, but on SFN with the natural cutoff and small degree exponent it becomes poor because of the existence of disassortative mixing. Similarly, the HMF prediction agrees well with the numerical method on real networks with assortative mixing, while it becomes very poor for disassortative networks, where the QMF prediction is closer to the numerical threshold. These findings provide quantitative indexes for the accuracy of existing theoretical predictions from the perspective of simulation. As part of the discussion, we have also considered the epidemic threshold for μ < 1 in Fig. 5.
The results on RRN and SFN all show that the numerical thresholds for the smaller recovery rate μ are a little larger than those for the larger one. As shown in the inset, μ → 1 leads to an epidemic threshold close to the result for unit infection time, while μ → 0 leads to an epidemic threshold close to the percolation result for variable infection times. It should be pointed out that for 0 < μ < 1, the numerical threshold of the SIR model tends toward the theoretical prediction λ_c = ⟨k⟩/(⟨k²⟩ + (μ − 2)⟨k⟩) from edge-based compartmental theory. These findings are complementary to some existing results. Moreover, we have applied the variability measure to the identification of the SIS epidemic threshold. As shown in Fig. 6 and Table I, the numerical threshold λ_p^Δ from the variability measure agrees very well with the threshold λ_p^χ identified by the susceptibility measure, whose validity for the SIS model has been confirmed previously. This shows that the variability measure can also provide an effective estimate of the SIS epidemic threshold. We have put forward a numerical method for identifying the epidemic threshold of the SIR model, which is also suitable for the SIS model. This method can effectively identify epidemic thresholds on various networks, and could be extended to other dynamical processes such as information diffusion and behavior spreading. Further work should check the effectiveness of this method on more complicated networks (e.g., temporal networks and multilayer networks). Besides, an accurate analytic approximation of the epidemic threshold for general networks remains an important problem.

Table I. Characteristics and epidemic thresholds of real-world networks. N is the network size, k_max is the maximum degree, r is the correlation coefficient of the degrees, λ_c^HMF and λ_c^QMF are the HMF and QMF predictions for SIR, respectively, λ_p^Δ(SIR) denotes the numerical threshold of SIR identified by Δ, and λ_p^Δ(SIS) and λ_p^χ(SIS) represent the numerical thresholds of SIS identified by Δ and χ, respectively.

Fig. 5. The threshold λ_c vs.
γ for two values of μ, panels (c) and (d), on SFN, where solid and empty symbols denote SFN with the structural cutoff k_max ∼ N^(1/2) and the natural cutoff k_max ∼ N^(1/(γ−1)), respectively. "Circles" and "squares" denote λ_c^HMF and λ_p^Δ, respectively. Inset: λ_c as a function of μ on RRN. The results are averaged over independent realizations on networks.

This work helps to verify the theoretical analysis of the epidemic threshold and should promote further studies of the phase transition in epidemic dynamics. Shu et al., Chaos.

References (book and proceedings titles as extracted): Dynamical Processes on Complex Networks; Infectious Diseases of Humans; Networks: An Introduction; Nonequilibrium Phase Transitions in Lattice Models; Monte Carlo Simulation in Statistical Physics; Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Advances in Neural Information Processing Systems.

Appendix. For the case of the SIR model and similar models with no steady state, the static properties of the epidemic outbreak (e.g., the final outbreak size and the epidemic threshold) can be mapped onto a suitable bond percolation problem. In this framework, the distribution of occupied cluster sizes is related to the distribution of outbreak sizes. To obtain the distribution of small outbreak sizes in the SIR model with a fixed value of λ when the recovery rate μ = 1, we present the derivation of the distribution of small occupied cluster sizes in bond percolation with bond occupation probability λ. After the percolation process on a general network A with arbitrary degree distribution p_k, the average degree of the occupied network A′, which consists of the vertices and the occupied edges, is ⟨k_T⟩ = λ⟨k⟩, where ⟨k⟩ is the average degree of the original network A. The size distribution of the small subgraphs of network A′ then follows from the generating function G₁(z) of the excess degree of network A′, where s denotes the small subgraph size.
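In standard generating-function notation (a reconstruction consistent with the surrounding derivation, not the paper's numbered equations), the two functions invoked here are

```latex
G_0(z) = \sum_k p_k z^k, \qquad
G_1(z) = \frac{G_0'(z)}{G_0'(1)},
```

where G₀ generates the degree distribution and G₁ the excess degree distribution; for a random regular network with unique degree k these reduce to G₀(z) = z^k and G₁(z) = z^(k−1).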
In addition, using the generating function of the degree distribution of A′, we obtain the corresponding excess-degree generating function. In a random regular network, which has a unique degree k with p_k = 1, both generating functions can be written down explicitly. Substituting them into the size-distribution relation, we obtain the distribution of small outbreak sizes, where Γ(x + 1) = x! and the coefficients a₁, a₂, and a₃ are combinatorial expressions in the subgraph size s and the degree k. key: cord- - za msu authors: O'Regan, Suzanne M.; Drake, John M. title: Theory of early warning signals of disease emergence and leading indicators of elimination journal: Theor Ecol cord_uid: za msu Anticipating infectious disease emergence and documenting progress in disease elimination are important applications for the theory of critical transitions. A key problem is the development of theory relating the dynamical processes of transmission to observable phenomena. In this paper, we consider compartmental susceptible-infectious-susceptible (SIS) and susceptible-infectious-recovered (SIR) models that are slowly forced through a critical transition. We derive expressions for the behavior of several candidate indicators, including the autocorrelation coefficient, variance, coefficient of variation, and power spectra of SIS and SIR epidemics during the approach to emergence or elimination. We validated these expressions using individual-based simulations. We further showed that moving-window estimates of these quantities may be used for anticipating critical transitions in infectious disease systems. Although leading indicators of elimination were highly predictive, we found the approach to emergence much more difficult to detect. It is hoped that these results, which show that anticipating critical transitions in infectious disease systems is theoretically possible, may be used to guide the construction of online algorithms for processing surveillance data. Infectious diseases are among the most visible and costly threats to individual and public health.
Antibiotics, vaccines, and the molecular revolution in biology have not erased their mark. Millions of people die every year from treatable ancient diseases, such as malaria (Gething et al.; WHO), tuberculosis (Dye et al.), and measles (Orenstein and Hinman; Simons et al.). Sometimes, elimination of these diseases through vaccination, prophylaxis, and/or vector control is possible, but sustaining elimination campaigns is difficult as pathogens approach the point of elimination (Cohen et al.). Conversely, emerging pathogens such as SARS and swine-origin influenza A (H1N1) cause excess mortality, disrupt business, terrorize vulnerable populations, and lead to new, sometimes ineradicable endemic diseases (e.g., HIV/AIDS). The benefits of accurately forecasting disease emergence would be tremendous: in the case of a low-incidence SARS-like pathogen, the savings could be tens of billions of US dollars (Rossi and Walker; Smith et al.); an illness resembling the 1918 influenza virus might take millions of lives and impose costs of the same order as a year's gross domestic product (Osterholm). Elimination and emergence of infectious disease both involve a transmission system that is pushed over a critical point. In most cases, criticality occurs at the point where the basic reproduction number R₀, the number of secondary infected cases arising from a single infected case in an entirely susceptible population, is equal to one (Heffernan et al.). Similar critical points occur in other complex systems (Strogatz; Solé). Particularly, in noisy (stochastic) systems, these critical points manifest as transitions between alternative modes of fluctuation (Scheffer; Scheffer et al.; Lenton). We call such a stochastic transition a critical transition if there exists a bifurcation in a suitably constructed limit case of the mean field model.
A central problem in the study of critical transitions is the identification of phenomena indicating proximity to a critical transition in the absence of a detailed understanding of the system's dynamical equations and/or the forcing variables causing the change (Scheffer et al.; Boettiger and Hastings). Recent studies have established that some noise-induced phenomena may signal the approach to a critical transition in a slowly forced dynamical system (Ives and Dakos; Dakos et al.; Carpenter and Brock; Donangelo et al.; Seekell et al.; Brock and Carpenter; Carpenter et al.; Jayaprakash). If this property applied to infectious diseases, it would suggest a model-independent route to forecasting infectious disease emergence in subcritical systems with R₀ < 1 (i.e., crossing the critical point "from below") and documenting the approach to elimination in endemic (supercritical) disease systems where R₀ > 1 (crossing the critical point "from above"). Many of these characteristic noise-induced phenomena involve critical slowing down, a decline in the resilience of the system to perturbations, which generally gives rise to an increase in the variance and autocorrelation of fluctuations as the system approaches the transition (van Nes and Scheffer; Dakos et al.). But these properties require that the transition be suitably regular (Hastings and Wysham). Specifically, observing critical slowing down requires that the potential function of the system, if it exists, be smooth. Also, for systems with multiple attracting sets (e.g., bistable systems), a sufficient level of noise gives rise to "flickering", which exhibits the characteristic increase in variance (but for a different reason) and not the increase in autocorrelation (Dakos et al.).
Other obstacles to anticipating critical transitions in infectious diseases include the facts that epidemiological systems are complicated by amplification of transients and oscillatory dynamics (Bauch and Earn), that infectious diseases are often seasonally forced (Altizer et al.; Fraser and Grassly) and propagate in demographically open systems subject to imported cases, and that diseases that are close to elimination or close to emerging are, by definition, characterized by low prevalence and therefore both subject to demographic stochasticity and difficult to observe (Lloyd-Smith et al.). Therefore, to determine whether diagnostic noise-induced phenomena accompany the critical transitions that occur in infectious disease dynamics, namely the transition to endemicity of emerging infectious diseases and the transition to extinction in disease elimination, it will be useful to have a quantitative theory, both to guide the selection of statistical quantities to be investigated and to predict the form that observable quantities should take as the transition is approached. This paper is a contribution to this theory. We consider the compartmental susceptible-infectious-susceptible (SIS) and susceptible-infectious-recovered (SIR) models, which are widely used to represent the dynamics of a range of non-immunizing and immunizing pathogens and may be viewed as approximations to an even broader class of infectious diseases. In their deterministic formulations, these models are characterized by a transcritical bifurcation at R₀ = 1, where the endemic and disease-free steady states meet and exchange stability. The epidemic models we investigate are generalizations of the closed-population SIS and SIR models that allow for immigration from external sources, of which the more familiar models without immigration, characterized by a transcritical bifurcation, are special cases.
to appropriately model the slowly forced dynamical system, we first develop mean field theory for nonstationary systems gradually approaching a bifurcation. we develop the mean field theory here because slowly forced dynamical systems approaching a bifurcation have rarely been investigated (but see kuehn). we then develop the full stochastic description in terms of a master equation. our strategy is to use the van kampen system size expansion to separate the mean field dynamics from the fluctuations induced by demographic stochasticity. these fluctuations are described by a fokker-planck equation, from which the power spectrum, variance, autocorrelation, and coefficient of variation may be analytically obtained. our main result is contained in the table of indicator statistics, which provides formulas expressing these theoretical predictions in terms of the dominant eigenvalue of the mean field solutions, which goes to zero as the system approaches the critical transition. hence, these formulas express how epidemiologically measurable quantities are expected to change as the system approaches the critical transition. a number of approximations are introduced in the derivation of these formulas, including separation of time scales, a second-order description of the fluctuations, and a continuum representation of the state space. we therefore performed simulations to study the possible detrimental effects of these approximations. specifically, we examined the agreement between the theoretical predictions and "measured" statistics obtained in a moving window, as one would analyze real-world data, in both discrete state-space and continuous state-space simulations. we show that the analytical expressions are robust, provided the conditions of fast-slow systems are met. these results demonstrate that the critical transitions associated with disease emergence and elimination may be anticipated even in the absence of a detailed understanding of their underlying causes. 
we first consider a general sis model that allows for immigration. denoting the proportions of susceptible and infectious populations by x(t) and y(t), respectively, the sis model in a population of size n is given by

ẋ = γy − (βy + η)x,
ẏ = (βy + η)x − γy,

where β is the transmission rate, γ is the rate of transfer from the infectious class to the susceptible class, and a dot denotes the time derivative. the model assumes that infection may also occur through contact with a trickle of infectious imports at a rate η, either by susceptibles briefly leaving the population and making contact with infectious individuals located elsewhere or through infectious visitors briefly entering the population and making contact with susceptibles, so that the total force of infection is βy + η. since the population size n is constant, i.e., x + y = 1, the sis model may be described by the single equation

ẏ = (βy + η)(1 − y) − γy.

we note that eq. ( ) can encompass a variety of infectious disease systems, including closed-population si models. thus, the system can be used to model diseases that confer no long-lasting immunity, e.g., sexually transmitted diseases or acute infections such as influenza, or diseases that are fatal. if η = 0, eq. ( ) has two equilibria: the disease-free equilibrium y = 0 and the endemic equilibrium y* = 1 − 1/r, where r = β/γ denotes the basic reproduction number. when r > 1, the infection can persist, but if r < 1, then the infection dies out. the equilibria y = 0 and y* meet and exchange stability when the basic reproduction number r = 1, i.e., a transcritical bifurcation occurs. we will refer to the sis model with no immigration as the limiting case. however, it is usually more realistic to assume that a low level of immigration occurs, i.e., η is positive. if η > 0, then eq. ( ) has a single positive equilibrium that is always locally stable, provided β, γ > 0, since the slope at the equilibrium, λ = β − 2βy* − γ − η, is negative. moreover, whether η = 0 or η > 0, the return time −1/λ increases in the limit β → γ. 
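as an illustration, the single-equation sis model above can be integrated numerically; the sketch below uses a simple forward-euler scheme with illustrative parameter values (not taken from the paper) and recovers the endemic equilibrium y* = 1 − 1/r when η = 0:

```python
def simulate_sis(beta, gamma, eta, y0=0.01, t_max=500.0, dt=0.01):
    """forward-euler integration of dy/dt = (beta*y + eta)*(1 - y) - gamma*y."""
    y = y0
    for _ in range(int(t_max / dt)):
        y += dt * ((beta * y + eta) * (1.0 - y) - gamma * y)
    return y

# with eta = 0 and r = beta/gamma = 2, prevalence settles at y* = 1 - 1/r = 0.5
y_endemic = simulate_sis(beta=2.0, gamma=1.0, eta=0.0)
```

with r < 1 and η = 0 the same routine relaxes to the disease-free equilibrium y = 0, illustrating the exchange of stability at r = 1.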
at the critical point in the limiting case sis model, λ = 0. therefore, the transcritical bifurcation occurs in a suitably constructed limiting case (η = 0), which we suggest gives an indication of the behavior of the fluctuations for the stochastic version of the model with immigration, provided η is small. slow changes in the transmission rate β, through demographic or evolutionary means, may induce a transcritical bifurcation of eq. ( ). for example, transmission may decrease as a result of slow increases in vaccination uptake among the population, leading to extinction of the pathogen, or it may increase due to slow decreases in vaccination uptake, potentially causing a transition to endemicity. to formally incorporate slow changes in transmission into the model, we rewrite eq. ( ) as a fast-slow system:

ẏ = (βy + η)(1 − y) − γy,
β̇ = εf(β),

where 0 < ε ≪ 1 and the function f describes the change in transmission rate. in this paper, we assume that transmission is a slowly changing linear function of time, i.e., β = β(t) = β_0(1 − p(t)). the proportion of the population that is vaccinated is modeled by the function p(t) = p_s + p_1 t, with p_1 an incremental change in β. hence, β̇ = −β_0 p_1. if p_1 > 0, then the rate of transmission is slowly declining over time, and if p_1 < 0, it is slowly increasing. the limit case ε → 0 of eq. ( ) is eq. ( ), and thus the endemic equilibrium of the limit case assumes β = β_0(1 − p_s), which arises from the level of vaccination uptake p(t) (the bifurcation parameter). figure shows the transcritical bifurcation diagram for the sis model. for small η, the plot of the stable endemic equilibrium y* as a function of vaccination uptake is similar to the bifurcation diagram for η = 0, except in the vicinity of the critical point p* (inset of the figure). for p > p*, the disease-free equilibrium is stable in the limiting case, but for the model with immigration, the infectious population is sustained at a low level due to importation of the disease from external sources. 
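a minimal sketch of this fast-slow behavior, with illustrative (non-paper) parameter values: transmission declines through p(t) = p_s + p_1 t, and prevalence tracks the moving quasi-equilibrium until the threshold p* = 1 − 1/r is crossed, after which it collapses toward the immigration-sustained low level:

```python
def fast_slow_sis(beta0, gamma, eta, p_s, p_dot, t_max, dt=0.01):
    """integrate the fast-slow sis model while uptake p(t) = p_s + p_dot*t rises;
    returns the prevalence trajectory sampled every dt."""
    y = 0.5
    traj = []
    for k in range(int(t_max / dt)):
        t = k * dt
        beta = beta0 * (1.0 - (p_s + p_dot * t))  # slowly declining transmission
        y += dt * ((beta * y + eta) * (1.0 - y) - gamma * y)
        traj.append(y)
    return traj

# r = 2 at t = 0, so p* = 0.5 is crossed at t = 100; prevalence then collapses
traj = fast_slow_sis(beta0=2.0, gamma=1.0, eta=0.001, p_s=0.0, p_dot=0.005, t_max=150.0)
```

before the crossing, the solution lags slightly behind the quasi-static equilibrium 1 − γ/β(t), a deterministic signature of the slow forcing.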
the inset plot shows a close-up of the diagram near the transcritical bifurcation point in the limiting case where η = 0. the dashed line indicates the location of the critical point, p* = 1 − 1/r. the bifurcation diagram was plotted using the parameters given in the table.

stochastic description of the sis model

to investigate the effect of noise on this transition, we assume that fluctuations in the infectious population are caused by demographic stochasticity (intrinsic noise). to understand these fluctuations, we require an individual-level description rather than a population-level description. we assume that all individuals have identical attributes, and individuals may move from the susceptible state s(t) to the infectious state i(t), or from the infectious state to the susceptible state. we assume that the population size is constant, i.e., s(t) + i(t) = n. since the population size is constant, we need only model the transitions into and out of the infectious state. furthermore, since the sis system moving towards a bifurcation is a fast-slow system, the transmission rate β is not constant in time but is a slowly changing function of time. because the probability of transmission between an infectious and a susceptible individual is slowly changing over time, we treat the transmission rate β as constant within each small time increment dt. the table shows the variables and parameters of the sis model. we assume that in a sufficiently small time increment dt, the number of infectious individuals i can either (a) increase by 1, (b) decrease by 1, or (c) not change in number. infection and removal are the events in an sis process that lead to these changes in state. we denote the state of the system by α = i at time t and the alternative states by α̃, where either α̃ = i + 1 or α̃ = i − 1 in the sis model. 
the system of individuals goes from a state with α = i individuals at time t to a state with α̃ individuals at time t + dt with transition probability per unit time t(α̃|α). the system can also go from a state with α̃ individuals at time t to i individuals at time t + dt with a probability flux t(α|α̃). the table outlines the events, changes in state, and the transition probability fluxes into and out of state i that can occur in a stochastic sis process with a slowly changing transmission rate. since the transitions in the table describe a markov process, we can write down a master equation describing how the probability of there being i individuals at time t evolves with time. the master equation represents the continuous-time version of the dynamics of the markov process. derivations of master equations can be found in van kampen ( ) and renshaw ( ). letting p(i, t) = prob(i(t) = i) be the probability that the infectious state variable i(t) is equal to some nonnegative integer i, the master equation for the evolution of the probability of the infectious population being in state α = i at time t is

dp(α, t)/dt = Σ_α̃ [t(α|α̃) p(α̃, t) − t(α̃|α) p(α, t)],

where α̃ runs over all other states (table), and the probability fluxes into and out of state i are as described in the table. this is a system of n + 1 ordinary differential equations, α = 0, 1, . . . , n, which can be solved with an initial condition of i_0 individuals at time t = 0, i.e., p(α, 0) = 1 for α = i_0 and p(α, 0) = 0 for all α ≠ i_0. thus, the probability distribution at time t = 0 is a delta function. the table summarizes the quantities in the master equation. we will apply the van kampen system size expansion to the master equation. the system size expansion takes advantage of the fact that, for large population size n, the demographic fluctuations in the population are small, and so n is the expansion parameter. thus, the key assumption for the expansion to be valid is that the system size n is sufficiently large (see appendix a for details). 
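the markov process defined by these transition rates can be simulated exactly; the sketch below implements gillespie's direct method (used later in the text for the stochastic simulations) for sis-type rates, with illustrative parameter values:

```python
import random

def gillespie_sis(N, beta, gamma, eta, i0, t_max, rng):
    """gillespie direct method for the sis process with imports:
    infection i -> i+1 at rate (beta*i/N + eta)*(N - i), recovery i -> i-1 at rate gamma*i."""
    t, i = 0.0, i0
    while t < t_max:
        a_inf = (beta * i / N + eta) * (N - i)   # probability flux into state i+1
        a_rec = gamma * i                        # probability flux into state i-1
        a_tot = a_inf + a_rec
        if a_tot == 0.0:
            break
        t += rng.expovariate(a_tot)              # waiting time to the next event
        if rng.random() * a_tot < a_inf:
            i += 1
        else:
            i -= 1
    return i
```

averaged over realizations, the quasi-stationary state sits near the deterministic endemic equilibrium, with demographic fluctuations of order √n about it.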
here, we pause to note that the sis model is equivalent to a simple logistic process, and hence there are analytical expressions for the mean and variance (allen). [table notes: β is treated as constant in a small time increment dt, but the transmission rate depends on a slowly changing linear function of time, β(t) = β_0(1 − p(t)) = β_0(1 − (p_s + p_1 t)). the table of master equation terms gives the transition probability of infection per unit time as an example of a transition into and out of state α for the sis and sir stochastic processes; the other transition probability fluxes can be found in the corresponding tables. for the sis process, the system state is α = i, with alternative states α̃ = i + 1, i − 1; for the sir process, the state is α = (s, i), with alternative states (s + 1, i − 1), (s − 1, i), (s + 1, i), (s, i − 1), (s, i + 1).] the i = 0 state is absorbing, and the expected time to extinction is finite (allen). nevertheless, if the population size is large, the probability distribution is approximately stationary for a very long time. it is the statistics of this quasi-stationary distribution that we are interested in. following expansion of the master equation, the leading-order and next-to-leading-order terms are collected, giving rise to a deterministic system that describes the evolution of the trend and its stochastic correction, respectively; see appendix a for details. the deterministic system turns out to be the fast-slow system ( ). the distribution of the fluctuations about the solution of eq. ( ) is given by a linear fokker-planck equation. thus the fluctuations are gaussian distributed about the mean, i.e., i(t) ∼ normal(nϕ(t), nσ²); see appendix a for details and definitions. the fokker-planck equation is equivalent to a stochastic differential equation (gardiner). this is the key advantage of applying the system size expansion. 
using the stochastic differential equation, we can obtain analytical expressions for statistical signatures of leading indicators and early warning signals, including the power spectrum and autocorrelation function (see appendix a for details). the table summarizes the indicator statistics calculated using the methods described in appendix a for the stable limiting case ε → 0 of the sis model, in terms of the eigenvalue λ. we will examine how the statistics change over a range of vaccination uptake values in section "results". since the system size expansion involves approximating the discrete random variable i with a normal random variable, its assumptions may break down when i is small. for small numbers of infectious individuals, e.g., if the system is subcritical and, thus, the incidence of disease is low, the assumption that the fluctuations are normally distributed about the mean is likely to be inappropriate. the van kampen expansion is valid when the system is far from the absorbing boundary at i = 0. however, the approximation upon which the approach is built (eq. ( ) in appendix a) cannot describe chance extinctions, which can occur if i is small. if η = 0 and p > 1 − 1/r, the disease-free equilibrium is stable, and so the approximation given by eq. ( ) is not valid. due to the absorbing boundary, the probability distribution about the disease-free state for a given time t will be one-sided. on the other hand, if η > 0, then, close to the system boundary, the normal distribution approximation about the deterministic mean may be poor because the distribution is bimodal due to extinction events. however, the models with immigration do not have a disease-free state, just a small infectious population when p is approximately greater than 1 − 1/r. therefore, predictions for the quasi-stationary statistics about this state can be obtained. 
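the table's statistics take the standard form for a one-dimensional linear-noise (ornstein-uhlenbeck) process dξ = λξ dt + √b dw with λ < 0; the sketch below evaluates that form (the symbols b and lam are generic illustrations, not the paper's exact notation) and shows the critical-slowing-down signature as λ → 0:

```python
import math

def ou_stats(lam, b):
    """quasi-stationary statistics of the linear-noise fluctuations
    d(xi) = lam*xi dt + sqrt(b) dW, with lam < 0 (stable)."""
    variance = b / (2.0 * abs(lam))
    def autocorr(tau):                    # lag-tau autocorrelation: exp(lam*tau)
        return math.exp(lam * tau)
    def spectrum(omega):                  # power spectrum: b / (lam^2 + omega^2)
        return b / (lam * lam + omega * omega)
    return variance, autocorr, spectrum

# as lam -> 0-, both the variance and the lag-1 autocorrelation rise
v_far, ac_far, sp_far = ou_stats(-1.0, 1.0)
v_near, ac_near, sp_near = ou_stats(-0.1, 1.0)
```

this is why a system drifting toward λ = 0 is expected to show rising variance, rising autocorrelation, and a power spectrum increasingly concentrated at low frequencies.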
denoting the proportions of susceptible, infectious, and recovered populations by x(t), y(t), and z(t), respectively, the sir model with immigration in a closed population of size n is

ẋ = μ(1 − p) − (βy + η)x − μx,
ẏ = (βy + η)x − γy − μy,
ż = μp + γy − μz,

where β is the transmission rate, γ is the recovery rate, η is the immigration rate, and μ is the per capita birth rate. to maintain a constant population size, the per capita death rate is set equal to the birth rate. a proportion p of the population is chosen at random for vaccination at birth and recruited into the recovered class. the remaining unvaccinated proportion of the population enters the susceptible class at birth. since the population size n is constant, x + y + z = 1, and system ( ) is equivalent to a system of two ordinary differential equations in x and y. the sir model is particularly appropriate for acute immunizing infections such as measles and pertussis. in the absence of vaccination and immigration, the basic reproduction number is given by r = β/(γ + μ).

[table caption: analytical expressions for quasi-stationary statistics about the endemic infectious quasi-steady state, expressed in terms of the eigenvalues. expressions for the endemic equilibrium ϕ* of the sis and sir models are found by solving the mean field equations in appendices a and b. the listed statistics include the power spectrum s_i(ω) and the autocorrelation. the eigenvalues of the jacobian of the sir model evaluated about ϕ* are complex conjugates when the equilibrium is a stable spiral, i.e., reλ_1 = reλ_2 = reλ and |imλ_1| = |imλ_2| = |imλ|. when ϕ* is a stable node, the jacobian matrix has two real negative eigenvalues, λ_1 and λ_2. when there is no immigration, η = 0 and the power spectrum collapses to zero. variables for each model are described in the corresponding tables. the expressions for the power spectrum are multiplied by 2 because they are evaluated over the frequency domain [0, ∞). no closed-form expression for the lag-τ autocorrelation is known, and so it must be evaluated numerically.]

if η = 0, the sir model ( ) has two equilibria: the disease-free state (x, y) = (1 − p, 0) and the endemic equilibrium. the sir model undergoes a transcritical bifurcation at r = 1. the endemic equilibrium is locally stable if r > 1 and is not biologically feasible if r < 1, when the disease-free equilibrium is locally stable. as the vaccination uptake p increases, the basic reproduction number is reduced by a factor (1 − p), i.e., the effective reproduction number is r(1 − p). the vaccination uptake p at which the effective reproduction number equals one is the critical vaccination threshold, p* = 1 − 1/r (anderson and may), and it is at this critical threshold that the transcritical bifurcation occurs. again, the sir model with no immigration is the relevant limiting case. temporary importation of pathogen often occurs in infectious disease systems (keeling and rohani). assuming that a low level of immigration occurs, i.e., η > 0, only a single positive equilibrium is biologically feasible. this equilibrium is a stable spiral when the square of the trace of the jacobian matrix evaluated about the equilibrium is less than four times its determinant, and it is a stable node if the square of the trace is greater than or equal to this quantity. complex eigenvalues of the jacobian matrix of eq. ( ) characterize a stable spiral, and the eigenvalues are real and negative if the equilibrium is a stable node. the stable node equilibrium can be thought of as a "disease-free" equilibrium if η is small, in the sense that the infectious population is sustained at a low level due to importation of the disease from external sources. therefore, while the transcritical bifurcation occurs in a suitably constructed limiting case (η = 0), the limit case may give an indication of the behavior of the fluctuations for the stochastic version of the model with immigration, provided that η is small. 
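the spiral-versus-node classification can be checked directly from the trace and determinant of the jacobian; the sketch below does this for the η = 0 endemic equilibrium, with hypothetical measles-like parameter values (not the paper's), and complex eigenvalues confirm a stable spiral:

```python
import cmath

def sir_equilibrium_eigs(beta, gamma, mu, p):
    """eigenvalues of the jacobian of the eta = 0 sir model (x, y subsystem)
    at the endemic equilibrium, assuming it is feasible (r*(1-p) > 1)."""
    x_star = (gamma + mu) / beta                      # = 1/r
    y_star = mu * (1.0 - p - x_star) / (gamma + mu)
    # jacobian entries at (x*, y*); note a22 = beta*x* - (gamma + mu) = 0 there
    a11 = -beta * y_star - mu
    a12 = -beta * x_star
    a21 = beta * y_star
    a22 = beta * x_star - (gamma + mu)
    tr, det = a11 + a22, a11 * a22 - a12 * a21
    disc = cmath.sqrt(tr * tr - 4.0 * det)            # complex when tr^2 < 4*det
    return (tr + disc) / 2.0, (tr - disc) / 2.0

# measles-like illustration (rates per year): complex pair => stable spiral
eig1, eig2 = sir_equilibrium_eigs(beta=1700.0, gamma=100.0, mu=0.02, p=0.0)
```

the real part of the pair is the decay rate of perturbations and the imaginary part sets the frequency of the damped epidemic oscillations excited by demographic noise.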
gradual changes in the vaccination uptake p may induce a transition from endemicity or a transition to extinction. recruitment of susceptibles may slowly vary over long time scales as a result of demographic or evolutionary changes. vaccination uptake rates are often not constant over time and may exhibit trends, e.g., percentage uptake of pertussis vaccine in the usa (rohani and drake). mathematically, we may express the sir model approaching a transcritical bifurcation as a fast-slow system:

ẋ = μ(1 − p) − (βy + η)x − μx,
ẏ = (βy + η)x − (γ + μ)y,
ṗ = εf(p),

where 0 < ε ≪ 1 and the function f describes the change in vaccination uptake p. again, we model vaccination uptake as a linear function of time, p(t) = p_s + p_1 t, with p_1 an incremental change in p. consequently, ṗ = p_1. if p_1 > 0, recruitment into the susceptible class is slowly declining over time, and if p_1 < 0, then recruitment is slowly increasing over time. in the limit ε → 0, the vaccination uptake is fixed at a constant rate p_s, and the system is stable. figure shows the transcritical bifurcation diagram for the sir model. the infectious equilibrium y* is plotted as a function of vaccination uptake p. the bifurcation diagram indicates that the endemic infectious equilibrium declines linearly with p in the models with and without immigration. however, the inset plot indicates that in the vicinity of the η = 0 critical point, the infectious equilibrium of the immigration model is elevated relative to the disease-free equilibrium, since the infectious population is sustained at a low level due to immigration. model ( ) is the deterministic description of the sir system approaching a transition, but there will be stochastic fluctuations in the state of the system as the transition is approached. as before, we assume that these fluctuations result from demographic stochasticity. to quantify these fluctuations, we assume individuals are identical. individuals may be recruited into the susceptible state and can transition out of this state through infection or death. 
infectious individuals may recover or die. assuming that the population size s(t) + i(t) + r(t) = n is constant, we need only consider the transitions into and out of the susceptible and infectious states. furthermore, since the sir system moving towards a bifurcation is a fast-slow system, the vaccination uptake p is not constant in time but is a slowly changing function of time. therefore, the recruitment rate into the susceptible class is slowly changing, but for each small time increment dt we can treat the vaccination uptake p as constant. the table presents the variables and parameters of the sir model. events that occur in an sir process in a small time increment dt moving slowly towards a transition include infection, recovery, recruitment to the susceptible class, and death due to natural causes. the table outlines the events, changes in state, and the transition probabilities per unit time for each change in state. we can construct a master equation in the same manner as for the sis model. letting p(s, i, t) = prob((s(t), i(t)) = (s, i)) be the probability that the state variables are equal to some pair of nonnegative integers (s, i), the master equation for the evolution of the probability of the population being in state α = (s, i) takes the same form as for the sis model, where α̃ describes all other states (table). the master eq. ( ) is nonlinear and, therefore, cannot be solved analytically. to make analytical progress with eq. ( ), we can again use the van kampen system size expansion (see appendix b for details). the approach gives rise to the deterministic sir fast-slow system ( ) and a linear fokker-planck equation that describes the evolution of the fluctuations, which may be written as a system of stochastic differential equations. these equations can be analyzed using fourier transformation. the solution of the fokker-planck equation is a bivariate normal distribution. the table summarizes the indicator statistics in the stable limiting case ε → 0 for the sir model, in terms of its eigenvalues. 
the preceding sections present an analytical theory of early warning signals for emergence and leading indicators of elimination. to investigate the results of this theory for a particular parameter set (table), we calculated leading indicators of elimination and emergence, assuming alternatively that (a) the mean proportion of infectious individuals is given by the deterministic endemic equilibrium (ε → 0 theory) or (b) that it is given by the current state of the fast-slow system approaching a transition. [table notes: the table of transition probability fluxes for the sir model lists the transitions into and out of state α = (s, i), the change in state, and the transition probability per unit time; p is treated as constant in a small time increment dt, but the vaccination uptake is a slowly changing linear function of time, p = p(t) = p_s + p_1 t.] we selected parameters consistent with sexually transmitted diseases with long infectious period and large r for the sis dynamics, and parameters typical of childhood infectious diseases for the sir dynamics. to examine how different changes in vaccination uptake p affect the statistics, we varied p_1. sufficiently large population sizes n were chosen to ensure validity of the van kampen system size expansion. the per capita immigration rate η on the approach to elimination was calculated assuming η = δr/n, where δ is the number of imports per year (keeling and rohani). on the approach to emergence, r is approximately one, and the number of infectious individuals is near zero. therefore, η = δ/n (sis model) and η = δ/(n(1 − p_s)) (sir model). all emergence simulations begin with the forcing variable p_s beyond the threshold for emergence, p* = 1 − 1/r. the immigration rate remains fixed during simulations. the initial rate of transmission is calculated from r and γ in the model with immigration. 
we also compared the elimination indicators with those calculated assuming that the mean proportion of infectious individuals was given by the deterministic endemic equilibrium from the limiting case models with no immigration. the early warning signals of emergence were not calculated for the limiting case because the disease-free equilibrium is stable. to test the robustness of this theory to the range of approximations that were introduced (fast-slow approximation, continuum description, van kampen expansion), we simulated the approach to elimination and emergence in a variety of cases. to simulate the approach to elimination, we followed the "bottom-up" approach of allen ( ) to derive stochastic differential equations that incorporate demographic stochasticity. this approach uses the transition rates in the tables to build a system of stochastic differential equations. stochastic differential equations formulated in this way are appropriate provided that the population size is sufficiently large, because then changes in the state variables are approximately normally distributed. simulations were compared with output from gillespie's direct method (gillespie). the simulations were qualitatively similar for sufficiently large population sizes. to simulate the approach to emergence, we used gillespie's direct method, as this is most appropriate for small population sizes. we simulated the sis and sir stochastic models with immigration approaching elimination and emergence many times. to compare the statistics to the theoretical predictions in the table, the infectious time series approaching elimination were sampled at yearly intervals. the transcritical bifurcation in these scenarios was approached over a long time frame. however, the transcritical bifurcation to emergence was approached over a relatively short time frame for the sir model. 
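the "bottom-up" construction can be sketched as an euler-maruyama scheme in which the drift is the net event rate and the diffusion amplitude is the square root of the total event rate; the sis rates and parameter values below are illustrative:

```python
import math, random

def sde_sis(N, beta, gamma, eta, i0, t_max, dt, rng):
    """euler-maruyama integration of the bottom-up sis sde:
    dI = (a_inf - a_rec) dt + sqrt(a_inf + a_rec) dW."""
    i, t = float(i0), 0.0
    while t < t_max:
        a_inf = (beta * i / N + eta) * (N - i)   # infection event rate
        a_rec = gamma * i                        # recovery event rate
        drift = (a_inf - a_rec) * dt
        noise = math.sqrt(max(a_inf + a_rec, 0.0) * dt) * rng.gauss(0.0, 1.0)
        i = min(max(i + drift + noise, 0.0), float(N))  # clip to valid range
        t += dt
    return i

# for large N the trajectory fluctuates about the endemic level N*(1 - 1/r)
i_final = sde_sis(10000, 2.0, 1.0, 0.0, 5000, 5.0, 0.01, random.Random(7))
```

this diffusion approximation is the continuous state-space counterpart of the discrete gillespie simulation and is accurate away from the absorbing boundary at i = 0.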
to obtain a better sampled time series, the data from the gillespie simulations were aggregated over monthly intervals for the sir model. the infectious time series were aggregated over yearly intervals for the sis model approaching emergence because events did not always occur at a monthly frequency. thus, time series obtained from the sis system approaching emergence allowed us to examine the issues that arise from poorly sampled time series. to investigate the robustness of the early warning predictions over a moving window, i.e., as they would be used in online analysis of surveillance data, the influence of the slowly varying trend must be removed. the van kampen approach (appendices a and b) leads to a natural expression for the fluctuations about the quasi-stationary state nϕ(t), where ϕ(t) is determined by the mean field equations ( ) (sis) and ( ) (sir) in appendices a and b, respectively. to obtain the fluctuations, we subtracted the current mean, which we assumed to be determined by the current state of the fast-slow system, nϕ(t), from the state of the system at the start of each year and divided this quantity by the square root of the population size. we refer to this as van kampen detrending. gaussian filtering is another, more common, method used to remove the influence of a slowly varying mean of a data series. to compare the performance of gaussian smoothing to van kampen detrending, for each time series we fit a gaussian kernel smoothing function across the entire infectious case record, up to the time that the transcritical bifurcation was predicted, using a fixed bandwidth. lenton et al. ( ) have shown that the results obtained from applying the gaussian filter across the entire time series do not differ significantly from detrending within windows. to obtain the residuals, we subtracted the fit from each time series and divided by the square root of the population size, to be consistent with van kampen detrending. 
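the gaussian-filtering step can be sketched as follows; the kernel smoother and the division by √n follow the description above, while the bandwidth value is an arbitrary illustration:

```python
import math

def gaussian_smooth(series, bandwidth):
    """gaussian kernel smoother: weighted average with weights exp(-0.5*((i-j)/h)^2)."""
    n = len(series)
    smoothed = []
    for i in range(n):
        weights = [math.exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * x for w, x in zip(weights, series)) / total)
    return smoothed

def residuals(series, N, bandwidth=5.0):
    """subtract the smooth trend and divide by sqrt(N), mirroring the text's scaling."""
    trend = gaussian_smooth(series, bandwidth)
    return [(x - t) / math.sqrt(N) for x, t in zip(series, trend)]
```

the residuals from a quasi-stationary record are then the raw material for the moving-window variance and autocorrelation statistics.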
the choice of bandwidth was informed by the resemblance of the gaussian residuals to the fluctuations obtained from the van kampen approach. to study the changes in the statistics up to the critical transition, we calculated the lag-1 autocorrelation and the variance of the fluctuations obtained using the two detrending methods over a moving window half the length of the time series. we calculated the lag-1 autocorrelation coefficient of each replicate using the acf function in r. the coefficient of variation (cv) was obtained by calculating, over a moving window, the mean and standard deviation of each infectious replicate. the median and prediction intervals for each of the statistics were calculated over the replicates of each model. the prediction intervals were calculated using the quantile function in r. to quantify trends in each statistic for each replicate, we used kendall's correlation coefficient τ. to determine the distribution of kendall's τ, we calculated the coefficient for the trend in the test statistic for each realization. to assess the performance of the leading indicators, we followed the approach of boettiger and hastings ( ) and calculated receiver operating characteristic (roc) curves from the distributions of kendall's τ calculated from realizations of the models with and without transitions. the model without a transition is quasi-stationary, and we refer to it as the baseline or null model. the null models for elimination assume constant vaccination uptake and are simulated beginning from the corresponding deterministic equilibrium. the null models for emergence likewise assume a constant vaccination uptake and are simulated beginning from the corresponding deterministic equilibrium. the baseline models were simulated for the same length of time as it takes for the transition to be approached in the test models. the model with a transition is the test model. 
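the trend statistic and the roc summary can be sketched in a few lines; kendall's τ is computed against time, and the auc is the probability that a test-model τ exceeds a null-model τ (ties counted as one half):

```python
def kendall_tau(xs):
    """kendall's tau of a series against time: concordant minus discordant
    pairs, normalized by the number of pairs."""
    n, s = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            s += (xs[j] > xs[i]) - (xs[j] < xs[i])
    return 2.0 * s / (n * (n - 1))

def roc_auc(null_taus, test_taus):
    """area under the roc curve: fraction of (test, null) pairs in which the
    test-model tau is larger, with ties counting one half."""
    wins = sum((t > n0) + 0.5 * (t == n0) for t in test_taus for n0 in null_taus)
    return wins / (len(test_taus) * len(null_taus))
```

a strictly increasing indicator gives τ = 1; if every test-model realization trends more strongly than every null realization, the auc equals one (near-perfect detection).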
an roc curve enables investigation of the sensitivity of leading indicators to detect differences between quasi-stationary systems and those approaching a critical transition. we simulated the baseline models many times and obtained fluctuations using the van kampen approach and from gaussian filtering. we then quantified the trend in each indicator using kendall's τ for each baseline simulation. in our results, indicator statistics typically exhibited increasing trends or no trend, but for those that decreased, we multiplied kendall's τ for each realization by −1 to calculate the roc curve. the area under the roc curve (auc) was also calculated. an auc close to one indicates near-perfect detection.

the mean field fast-slow dynamics drive the transitions to elimination and emergence in stochastic sis models with gradual changes in transmission. figure a shows a solution y(t) of the fast-slow system ( ) with η = 0. the solution is the current "pool" of infectious individuals (the proportion infectious y(t)), not the number of new cases per time interval that comprise the traditional epidemic curve or recorded case reports of infectious individuals. the limiting case model predicts the time of elimination of the pathogen. the limit case plotted in the figure is that arising from the ε → 0 mean field theory, not the η = 0 critical case. here, we note that the limit case ε → 0 and the solution of the fast-slow system diverge from one another in the proximity of the transcritical bifurcation of the η = 0 limit case. figure b shows a typical realization of the stochastic counterpart of the fast-slow sis system, assuming an incremental annual change in vaccination uptake p_1. the description of the stochastic process is outlined in the table. the infectious population slowly declines and closely follows the limiting case ε → 0 until late in the decline, before eventually diverging. 
the inset figure indicates that the infectious population remains elevated long after the transcritical bifurcation has occurred in the model with η = 0. therefore, the transition to elimination in this system is characterized by a slow change in the mean, followed by a rapid fall-off in incidence that precedes a very slow fadeout to extinction. on the other hand, if a disease is emerging due to small increases in transmission, or the level of vaccination uptake in the population is declining, solutions of the fast-slow system may exhibit a "delay" in emergence relative to the occurrence of the transcritical bifurcation. figure a shows the delay in emergence in a solution of eq. ( ). the limiting case model predicts the time of emergence. again, the limit case indicated in the figure is the limit ε → 0 of eq. ( ), not the η = 0 critical case. figure b shows a typical realization of the stochastic counterpart of the solution shown in fig. a. the delay in the take-off of the epidemic is apparent. the rapid increase in the number of cases is preceded by a low level of incidence long after the transition to endemicity has occurred. of course, the question is: does the low level of infection in the time series give any clue, through early warning signals, of the eventual rapid increase of infectious cohorts? the transitions to disease elimination and endemicity in the stochastic sir models with gradually changing vaccination uptake are also driven by the mean field fast-slow dynamics. to eradicate a disease, the effective reproduction number must be reduced by increasing the vaccination uptake p. figure a shows a solution of the fast-slow system ( ) with η = 0. the mean field ε → 0 theory and the solution of eq. ( ) agree closely. by the time the transcritical bifurcation occurs in the η = 0 model, the number of infectious individuals has reached a low level. 
figure b shows a stochastic realization of the sir model with immigration approaching elimination (the details of the stochastic process are described in table ). the infectious population exhibits amplified oscillations, unlike the deterministic trajectory in fig. a. demographic stochasticity is expected to excite the transient oscillations of the sir model (bauch and earn ), and we will see this in detail later when we examine the power spectrum of the fluctuations. on the other hand, if a disease is emerging through an increase in recruitment into the susceptible class, the effective reproduction number is increasing. figure a shows a solution of the fast-slow system ( ) with the emergence parameters of table . after the transcritical bifurcation, the solution of the system does not agree with the stable endemic equilibrium at time t until approximately t = years. the solution grows more slowly than the stable equilibrium. the delay in the dynamics is also exhibited by stochastic realizations, e.g., fig. b. outbreaks remain small and sporadic for at least years after the bifurcation has occurred. the delay before emergence arises because the mean predicted by system ( ) remains small following the bifurcation. however, the bifurcation delay is not as marked as that in the sis model (compare fig. with fig. ).

power spectrum predictions

figure compares the power spectra of the fluctuations in the stable models with and without immigration. the patterns in the power spectra with immigration (fig. c, d) and without (fig. a, b) are similar. for sis systems with low transmission rates, and sir models with low recruitment rate into the susceptible class, the power in the peak of the spectrum is low. the power spectrum for sis and sir systems (emergence parameters in table ) is expected to shift to lower frequencies as the pathogen approaches extinction. in the sis systems, the power at smallest frequencies increases as the critical point is approached, but this effect is subtle.
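for the one-dimensional sis case this behavior can be made concrete: in the linear noise approximation developed in the appendix, the fluctuation spectrum is the lorentzian s(ω) = q/(ω² + λ²), so the zero-frequency power q/λ² grows as the relaxation rate |λ| shrinks near the critical point. a minimal sketch (the function names and the periodogram convention are ours):

```python
import numpy as np

def analytic_psd(omega, lam, q):
    """Lorentzian spectrum S(omega) = q / (omega^2 + lam^2) of a linear
    (OU-type) fluctuation process with relaxation rate lam and noise intensity q."""
    return q / (omega ** 2 + lam ** 2)

def empirical_psd(fluctuations, dt=1.0):
    """Periodogram of a mean-removed fluctuation series; returns angular
    frequencies and the corresponding power estimates."""
    x = np.asarray(fluctuations, dtype=float)
    x = x - x.mean()
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=dt)
    power = (np.abs(np.fft.rfft(x)) ** 2) * dt / n
    return 2.0 * np.pi * freqs, power
```

as |λ| → 0, analytic_psd(0, lam, q) diverges like q/λ², the spectral signature of critical slowing down in the sis model.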
in contrast, in the sir case, there is a dramatic reduction in the frequency at which the highest power is observed. the approach to endemicity is expected to be indicated by a shift to higher frequencies in the power spectrum. in the sis model, the spectrum encompasses a broader range of frequencies as emergence is approached. on the other hand, in subcritical sir systems approaching endemicity, fluctuations should become less noisy as the threshold is approached.

(fig. caption: limiting case → prediction for the power spectrum for the models with and without immigration. the predictions for η > agree closely with those generated from the η = models. in the sis case, the power at smallest frequencies increases as the critical point is approached, but this effect is subtle, whereas in the sir case, there is a dramatic reduction in the frequency at which the highest power is observed. each power spectrum is evaluated about the mean field infectious equilibrium ϕ*, given in tables and . the parameters used to calculate the power spectrum are given in table . importation, vaccination uptake, and population sizes were chosen according to the elimination scenario. the power spectrum has been log-transformed for clarity.)

the power spectrum transforms from a flat "white-noise" spectrum in the subcritical scenario to a spectrum exhibiting resonant peaks for sufficiently decreased vaccination uptake. oscillations are expected to be amplified at these resonant frequencies. it has been shown that complex eigenvalues of the jacobian matrix of system ( ) guarantee the existence of the power spectrum peak (alonso et al. ). systems approaching a critical transition are expected to exhibit rising variance and rising autocorrelation (scheffer ). in general, the predictions for the leading indicators of sis disease elimination follow these expectations for one-dimensional stochastic systems. figure a-c shows the leading indicators of elimination for the sis model.
as the threshold to pathogen extinction is approached by gradually increasing the vaccination uptake, all of the statistics evaluated about the endemic equilibrium of the stable system increase, as expected from the analytical formulas for the limiting case (table ). the lag- autocorrelation rises, indicating an increase in system memory, and the fluctuation variance increases, as suggested by the plot of the power spectrum for increasing vaccination uptake values. the coefficient of variation also increases, indicating that infectious time series should become noisier as the transition is approached. the signal predictions for the sis systems with and without immigration in the limit case → (in green) are in close agreement, which makes sense from the analytical formulae, provided η is small. to examine how different changes in vaccination uptake p in the fast-slow model with immigration affect the statistics, we varied p by / and / year − . predictions are in close agreement far from the η = critical point but they differ close to it. the statistics evaluated about the endemic equilibrium and about the current state of the mean field fast-slow system are shown in blue (dotted) and black lines, respectively. the gray dashed line indicates the location of the critical point when η = . the theoretical predictions in the η > sis case are not as dramatic as one expects from the limit case but are qualitatively consistent with standard expectations, unlike the sir limit case predictions, which do not conform with standard expectations. the variance in the sir case is predicted to decline, which is not unexpected given the power spectrum results, but it does not conform to standard expectations. the coefficient of variation increases as the critical point is approached, but the increase is not as dramatic as one may expect from the limiting case. in contrast to expectations, the lag- autocorrelation is predicted to decline close to the critical point if η > .
the predictions for the statistics evaluated about the current state of the fast-slow system differ from those evaluated about the endemic equilibrium. even if p = / year − , there are discrepancies in the predictions near the transcritical bifurcation point. the autocorrelation of the fluctuations with a lag of year is shown, and the variance shown is the fluctuation variance. all calculations used the parameter values in table .

sir leading indicator predictions

leading indicators for sir systems approaching elimination (fig. d-f) are not always consistent with the standard expectations of increasing variance, increasing autocorrelation, and increasing coefficient of variation. in addition, the predictions for the leading indicators of sir disease elimination evaluated about the endemic equilibrium of the stable system behave differently from sis leading indicators, due to the presence of resonant frequencies. as the threshold to elimination is approached, the lag- autocorrelation increases with vaccination uptake, indicating an increase in system memory, but in contrast to the sis model, the fluctuation variance declines. we observe in fig. b, d that the area under the power spectrum declines as vaccination uptake increases, particularly in the model with η > , where the height of the peak of the power spectrum, which reflects amplification of fluctuations, declines close to the transition. the coefficient of variation also increases, indicating that infectious time series should become noisier as the transition is approached; the increase is more dramatic in the model without immigration, because there the power spectrum peak height increases as the transcritical bifurcation is approached. to examine how different changes in vaccination uptake p affect the statistics, we varied p by / and / year − in the fast-slow model with immigration. predictions generated from the p = / model are in close agreement with predictions arising from the η > stable model.
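the indicator statistics themselves (lag-1 autocorrelation, fluctuation variance, coefficient of variation) are simple moving-window computations; in the paper the first two are taken over detrended fluctuations and the cv over the raw series, while this illustrative version simply operates on whatever series it is given:

```python
import numpy as np

def window_stats(series, window):
    """Lag-1 autocorrelation, variance, and coefficient of variation
    computed over a sliding window of fixed width."""
    series = np.asarray(series, dtype=float)
    ac1, var, cv = [], [], []
    for start in range(len(series) - window + 1):
        w = series[start:start + window]
        ac1.append(np.corrcoef(w[:-1], w[1:])[0, 1])  # lag-1 autocorrelation
        var.append(w.var())                           # window variance
        cv.append(w.std() / w.mean() if w.mean() != 0 else np.nan)
    return np.array(ac1), np.array(var), np.array(cv)
```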
however, when vaccination uptake is increasing an order of magnitude faster (i.e., p = / ), there can be temporary fluctuations, including increases, in the variance (fig. e) . such a pattern is not predicted when we evaluate the statistics about the endemic equilibrium. generally, the analytical predictions for leading indicators for sis disease systems approaching elimination are robust over a moving window, and they perform well in comparison to statistics generated from the null stable models, as indicated by the roc curves in fig. . the roc curves lie above the black line, showing that the indicators behave better than chance in distinguishing between realizations that have been generated by the null and test models and the reported auc values are close to . furthermore, the trends in the statistics agree with the theoretically predicted trends (compare fig. a , c, e with fig. a-c) . however, the gaussian smoother may not be appropriate when the trend declines rapidly, as occurs for the sis system approaching elimination. various bandwidths were used, but while smaller bandwidths removed the slowly varying trend successfully, they did not capture the fluctuations relevant for critical slowing down. larger bandwidths captured these fluctuations but did not successfully remove the slowly varying trend from the last years of the time series, as can be seen in the plots. this problem with gaussian filtering has been noted before by dakos et al. ( ) . the theoretical predictions for leading indicators for sir disease systems approaching elimination are robust over a moving window. the trends in the statistics shown in figs. a, c, e, and a , c, e agree with the theoretical predictions ( fig. d-f) . notably, the median fluctuation variance for the system with p = / also exhibits more variability, as predicted theoretically. 
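the gaussian detrending discussed above amounts to subtracting a kernel smoother from the series; the hand-rolled sketch below keeps the bandwidth explicit (scipy.ndimage.gaussian_filter1d would be a faster drop-in for the smoothing step) and returns both the residual fluctuations and the estimated trend:

```python
import numpy as np

def gaussian_detrend(series, bandwidth):
    """Estimate the slowly varying trend with a Gaussian-kernel smoother and
    return (fluctuations, trend). Too small a bandwidth absorbs the
    fluctuations into the trend; too large a bandwidth leaves trend behind."""
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series), dtype=float)
    trend = np.empty_like(series)
    for k in range(len(series)):
        w = np.exp(-0.5 * ((t - k) / bandwidth) ** 2)  # kernel centered at k
        trend[k] = np.sum(w * series) / np.sum(w)
    return series - trend, trend
```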
the roc analysis for the sir models indicates that the statistics from the test models perform well in comparison to statistics generated from the null stable models, but the fluctuation variance for the system with p = / is an exception. the performance is poor because the trend in the variance exhibits wider variation (fig. c). finally, the gaussian filtering performs well in comparison to the van kampen detrending, because the mean of the sir model declines linearly and does not exhibit any rapid changes.

as noted, the standard expectations for early warning signals of systems approaching a critical transition include increasing autocorrelation, increasing variance, and increasing coefficient of variation. prior to emergence, the theoretical predictions for the sis model generally agree with these standard expectations. for emerging sis diseases with immigration, the lag- autocorrelation is predicted to increase, indicating an increase in memory in the system. the variance is also predicted to increase, as suggested by the increasing area under the power spectrum in fig. c. however, the coefficient of variation is predicted to decline as p(t) decreases, indicating that the mean of the stable sis system rises more rapidly than the variance. the rapid rise in the mean can be seen in fig. a. in contrast, the coefficient of variation is predicted to increase as the transition is approached if the statistic is evaluated about the solution of the fast-slow system. this is because the mean remains small long after the predicted critical transition (fig. a). more generally, the limiting case theoretical predictions for the trends in the statistics for systems on the threshold of emergence do not always agree with the predictions for the corresponding fast-slow systems beyond the critical threshold for emergence. this finding results from the predicted mean given by the fast-slow system being lower than that of the stable system in the → theory (cf. fig. a).
(fig. caption: in agreement with standard theory, the variance and autocorrelation increase in the sir and sis systems as the bifurcation is approached (the peaks in the variance and cv predictions have been cut off for clarity). to aid comparison with fig. c, the inset plot of (e) shows a close-up of increasing variance prior to the transition in the sir case. however, the → theory predicts the cv declines on the approach to emergence. the effect of the bifurcation delay is also seen in the statistics predicted by the fast-slow theory. an order of magnitude difference in the change in vaccination uptake p affects the predictions; e.g., the cv in the sis model is predicted to rise before the transcritical bifurcation by the fast-slow theory. increases in autocorrelation (a) and in variance (b) can occur in this model after the transcritical bifurcation. the autocorrelation of the fluctuations with a lag of year (sis) and month (sir) is shown, and the variance shown is the fluctuation variance. all calculations used the parameter values in table ; the sir model calculations used the parameters in table scaled to rates per month.)

due to the rapid rise in the mean following the bifurcation delay, each statistic exhibits another increase following the transcritical bifurcation. in subcritical sir systems approaching endemicity, the theoretical predictions for the sir model agree with standard expectations, except for the coefficient of variation, which is predicted to decline (figs. d-f). the limiting case theoretical predictions for the trends in the statistics for sir systems on the threshold of emergence agree more closely with the predictions for the fluctuations about the solution of the corresponding sir fast-slow system, in contrast to the sis model, because the bifurcation delay, although present, is not as marked as that in the sis model. figure d, e shows that the monthly lag- autocorrelation and the variance increase as the bifurcation is approached.
it is not surprising that the variance increases because the power spectrum increases in power as the critical threshold for emergence is approached (fig. d) . however, the coefficient of variation is predicted to decline, i.e., the signal should become less noisy with decreasing vaccination uptake. in the context of the power spectrum result, this again makes sense because when the system transitions from subcritical to endemic, the power spectrum changes from a relatively flat shape to one with a peak about a resonant frequency. to examine how differences in vaccination uptake p affect the statistics, we varied p by / and / year − in the fast-slow model. the statistics predicted by the fast-slow theory do not change as rapidly as predicted in the → theory. finally, we did not calculate the early warning signals for the disease systems without immigration because the disease-free state is stable. in contrast to disease systems approaching elimination, the early warning signals for diseases on the verge of emergence do not perform well. no marked trends are present in the median statistics calculated from either of the van kampen or gaussian detrending approaches, with the exception that the variance of the sis and sir systems approaching emergence (figs. c, d and c , d) increases slightly. the % prediction intervals for each statistic further show that all the statistics are highly variable. moreover, the roc curves indicate that it may, in general, be difficult to distinguish between subcritical systems with r < and endemic disease systems with r > (figs. b, d, f and b, d, f) . the roc curves show that the distributions of kendall's correlation coefficient generated from stable systems and systems approaching a critical transition overlap significantly. 
however, we are conservative in our approach because we take only the years up to the critical transition in the absence of immigration, and not the years after, when a bifurcation delay may occur and prevalence may remain low for a long time after the transition. therefore, it is not surprising that it is difficult to distinguish between the subcritical and supercritical systems.

the transmission of infectious diseases is a paradigm example of a low-dimensional, noisy, nonlinear dynamical process. critical transitions in infectious disease systems, which correspond to such socially important events as the emergence of novel pathogens and the elimination of disease, are of special interest. the development of early warning signals of emerging infectious diseases and leading indicators for disease elimination, particularly, would be of tremendous value to the advance of public health. the goal of our study was to develop the theory of such early warning signals and leading indicators for infectious disease transmission systems that meet the assumptions of the familiar sis and sir models and which are forced through a critical transition by changes in transmission. our main results, analytical expressions for the change in observable statistics as the transmission system approaches the critical transition, are reported in table .

fig. : performance of the statistics over a moving window for the sis system approaching elimination, assuming that immigration occurs (η > ). panels a, c, and e show the median statistics (thick lines) and % prediction intervals (shaded regions). the lag- autocorrelation and variance have been calculated from the fluctuations obtained from van kampen and gaussian detrending. the coefficient of variation (cv) is marked with a dashed green line to indicate that it was calculated from the raw time series, not the deviations from the mean.
the dashed vertical line marks the time of the transcritical bifurcation in the → limiting case with η = . panels b, d, and f show the corresponding roc curves. the auc value indicates the area under the corresponding roc curve. all curves are above the black line, showing that the indicators behave better than chance in distinguishing between realizations that have been generated by the null and test models. however, the gaussian filtering may not be an appropriate method to obtain the fluctuations, because it does not remove the slowly varying trend close to the transition, as exhibited by the rapid rise in the indicators. all calculations used the parameter values in table . a bandwidth of years was chosen for the gaussian filtering.

because this theory depends on a sequence of approximations, we investigated the robustness of our results in a sequence of simulations. for the sis model, we found that the approach to elimination (fig. ) was indicated by an increase in the autocorrelation, variance, and the coefficient of variation, as predicted by the theory. for the sir model, the approach to elimination (figs. and ) was indicated by an increase in the autocorrelation, a decrease in the variance, and an increase in the coefficient of variation, as predicted by the theory. for the sis and sir models, the approach to emergence was indicated by an increase in variance, but the effects on the autocorrelation and the coefficient of variation were imperceptible (figs. and ). further, since the theoretical patterns may be difficult to distinguish from random noise or fail to provide a sufficiently reliable signal under realistic conditions for data collection, we also examined the suitability of these statistics as an online algorithm for early warning. following boettiger and hastings ( ), we summarized the results of online analysis using the receiver operator characteristic. these results showed that the approach to elimination in both the sis model (fig.
) and the sir model (figs. and ) could be detected with a high level of reliability (large auc), although variance was a much poorer indicator in the sir model than autocorrelation and the coefficient of variation. however, the approach to emergence was very difficult to detect in both the sis model and the sir model, with estimated auc values hovering just above the null value of . (figs. and ). in summary, our simulation studies show the approximations required to obtain the theoretical predictions to be acceptable but indicate that reliable prediction is likely only in the case of leading indicators for elimination, not early warning signals of emergence. besides the practical goal of indicating patterns that function as early warning signals and leading indicators, our study also provides some basic insights into the dynamics of infectious diseases. particularly, our simulations illustrated the ubiquity of bifurcation delays, changes in system state that lag behind the bifurcation of the limiting case, during disease emergence and elimination. delays are of fundamental interest because they provide a mechanism for the frequent realization of far-from-equilibrium situations and because they are a dramatic example of tipping point phenomena in infectious diseases. indeed, delays can cause a system to appear to undergo a catastrophic shift (e.g., the dramatic upturn of infectious cases associated with emergence) when, in fact, only a noncatastrophic transcritical bifurcation has occurred.

(fig. caption: performance of the statistics over a moving window for the sir system approaching elimination, assuming that immigration occurs (η > ). panels a, c, and e show the median statistics (thick lines) and % prediction intervals (shaded regions). the lag- autocorrelation and variance have been calculated from the fluctuations obtained from van kampen and gaussian detrending. the coefficient of variation is marked with a green line to indicate that it was calculated from the raw time series, not deviations from the mean. the dashed vertical line marks the time of the transcritical bifurcation in the → limiting case with η = . panels b, d, and f show the roc curves. the auc value indicates the area under the corresponding roc curve. all curves are above the black line, showing that the indicators behave better than chance in distinguishing between realizations that have been generated by the null and test models. a bandwidth of years was chosen for the gaussian filtering. all calculations used the parameter values in table .)

(fig. caption: performance of the statistics over a moving window for the sir system rapidly approaching elimination, assuming that immigration occurs (η > ). panels a, c, and e show the median statistics (thick lines) and % prediction intervals (shaded regions). the lag- autocorrelation and variance have been calculated from the fluctuations obtained from the van kampen and gaussian detrending. the coefficient of variation (cv) is marked with a dashed green line to indicate that it was calculated from the raw time series, not deviations from the mean. the dashed vertical line marks the time of the transcritical bifurcation in the → limiting case with η = . panels b, d, and f show the roc curves. the auc value indicates the area under the corresponding roc curve. all curves are above the black line, showing that the indicators behave better than chance in distinguishing between realizations that have been generated by the null and test models. however, the variance performs poorly compared with the lag- autocorrelation and cv. a bandwidth of years was chosen for the gaussian filtering. all calculations used the parameter values in table except for p = / year − .)

our results show that two kinds of delays occur in disease emergence and elimination. the first delay is a deterministic phenomenon referred to as a canard (diener ).
canards occur in fast-slow systems where the solution follows an attracting slow manifold of the fast-slow system, passing close to a bifurcation point, and then follows a repelling slow manifold for a considerable period of time. thus, for instance, in fig. a , which depicts the emergence of an sis infection, the → limit case branch rapidly increases close to the η = critical point, but the solution of the fast-slow system ( > ) tracks the disease-free equilibrium for a noticeable amount of time even after it has become unstable. the influence that the canard has on the stochastic dynamics can be observed in fig. b , where the number of infectious individuals remains in the vicinity of the precritical level for a noticeable period of time. a related phenomenon, but not so dramatic, is observed in the elimination case (fig. a) . following the bifurcation, the infectious population remains elevated, but unlike the emergence case, no dramatic shift to extinction occurs. analogous phenomena are observed in the sir model (figs. and ) . the second class of delays is a stochastic phenomenon, caused by the nonzero time interval that occurs between the time at which the critical point is reached and the time of the event that initiates the transition (i.e., the index case of a major outbreak). this phenomenon is most obvious in the emergence of an immunizing pathogen (the emergence scenario for the sir model). in this case, the number of infectious individuals in the population prior to the critical transition is most commonly zero. assuming that there are no infectious individuals in the population at the time of the bifurcation event, then the number of infections in the population will remain zero until the time that an imported case arises. in view of the assumption of demographic stochasticity, importation is a poisson process with nonzero rate parameter, which similarly ensures that a nonzero period of time must elapse before the first case appears. 
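the deterministic part of this delay (the canard) is easy to reproduce in a toy fast-slow sis mean field in which transmission drifts slowly upward through the threshold β = γ; every number below is illustrative:

```python
import numpy as np

def fast_slow_sis(y0=1e-4, beta0=0.05, rate=1e-3, gamma=0.1,
                  t_max=300.0, dt=0.01):
    """Euler integration of dy/dt = beta(t) y (1 - y) - gamma y with a slowly
    increasing beta(t) = beta0 + rate * t. The transcritical bifurcation occurs
    when beta(t) = gamma, but the solution stays near y = 0 long afterwards."""
    ts = np.arange(0.0, t_max, dt)
    ys = np.empty_like(ts)
    y = y0
    for k, t in enumerate(ts):
        beta = beta0 + rate * t
        y += dt * (beta * y * (1.0 - y) - gamma * y)
        ys[k] = y
    return ts, ys
```

with these numbers the threshold is crossed at t = 50, yet the infectious fraction remains far below its eventual endemic level for many tens of time units afterwards, the delayed take-off described above.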
(fig. caption: performance of the statistics over a moving window for the sis system approaching endemicity, assuming that immigration occurs (η > ). panels a, c, and e show the median statistics (thick lines) and % prediction intervals (shaded regions). the lag- autocorrelation and variance have been calculated from the fluctuations obtained from the van kampen detrending and gaussian filtering. the coefficient of variation is marked with a green (dashed) line to indicate that it was calculated from the raw time series, not deviations from the mean. the dashed vertical line marks the time of the transcritical bifurcation. panels b, d, and f show the roc curves. the auc value indicates the area under the corresponding roc curve. there are no marked trends in the median statistics, although the variance exhibits a slight increasing trend. the roc curves show that the distributions of kendall's τ overlap greatly, suggesting that it is difficult to distinguish between a quasi-stationary system and one approaching an emergence threshold. due to the bifurcation delay, it is not surprising that it is difficult to distinguish between null and test replicates. a bandwidth of years was selected for the gaussian filtering. all calculations used the parameter values in table .)

indeed, even after the first case appears, there is no guarantee that the system will undergo transition, since even in a supercritical population, there is some chance of rapid extinction (allen ). when the system is only slightly supercritical, this chance may not be small. for measurement purposes, we might define this stochastic bifurcation delay as the time elapsed between the occurrence of the bifurcation and the final occurrence of zero infectious individuals in the population. clearly, this quantity is a random variable. the distributional properties of this stochastic bifurcation delay are an important problem for further research.
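the chance of rapid extinction in a supercritical population is a classical branching-process result (cf. allen): in a linear birth-death approximation with transmission rate β and recovery rate γ, a chain started by one case dies out with probability γ/β = 1/r0. a monte carlo sketch with illustrative rates:

```python
import numpy as np

def extinction_fraction(beta=0.3, gamma=0.1, n_trials=5000,
                        max_cases=200, seed=7):
    """Fraction of transmission chains, each started by a single case, that
    die out before reaching max_cases, in a linear birth-death approximation.
    The next event is a transmission with probability beta / (beta + gamma)."""
    rng = np.random.default_rng(seed)
    extinct = 0
    for _ in range(n_trials):
        i = 1
        while 0 < i < max_cases:
            if rng.random() < beta / (beta + gamma):
                i += 1    # transmission
            else:
                i -= 1    # recovery
        extinct += (i == 0)
    return extinct / n_trials
```

with β = 0.3 and γ = 0.1 (r0 = 3) the estimate is close to 1/3, so even a clearly supercritical introduction fails to take off about a third of the time.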
delays are also important for the immediate practical reason that their existence implies that the state of the system, i.e., the presence or absence of disease, cannot be taken as reliable evidence concerning whether the system is subcritical or supercritical. in the case of disease emergence, a system may have developed to the supercritical state, a significant concern for public health, although no cases have occurred. in such a scenario, any introduction of infectious individuals may spark a major outbreak. in the case of disease elimination, a system may already exist in the subcritical state, with only lingering infections and short transmission chains enabling the pathogen to transiently persist. in such cases, the battle has already been won, but it is crucial that vaccination campaigns and other elimination activities be continued until the remaining lines of transmission are snuffed out. otherwise, gains that may have come at great cost will fail to be maintained, and the pathogen may resurge in the population, sparked by its own embers still smoldering in the face of relaxed intervention. the development of statistical methods to identify such situations is an urgent problem for further research.

limiting case predictions can sometimes be misleading

predictions for the statistics derived from the limiting case (η = ) version of the sir model can sometimes be misleading. as the extinction threshold is approached, the predicted sir (η > ) lag- autocorrelation increases as expected, but this phenomenon gives way to a decrease close to the η = critical point (fig. d).
(fig. caption: in addition, the roc curves indicate that the distributions of kendall's τ overlap greatly, suggesting that it is difficult to reliably distinguish between a stable system and one undergoing a critical transition using the lag- autocorrelation coefficient. the coefficient of variation also performs poorly, as predicted theoretically. due to the bifurcation delay, it is not surprising that it is difficult to distinguish between null and test replicates. a bandwidth of months was selected for the gaussian filtering. all calculations used the parameter values in table .)

this finding is not so surprising when one recalls that the single biologically feasible equilibrium in the η > model changes from a stable spiral to a stable node near the critical point as vaccination is increased. the magnitude of the real part of the eigenvalue, which approximates the rate of decay of the amplitude of the oscillations to the equilibrium, decreases as elimination is approached, but, near the threshold, the magnitude of the real part begins to increase once more, indicating a shorter return time to equilibrium and increased resistance of the system to perturbations, as the spiral begins to resemble a node. this subsequent decline in system memory is reflected in the decrease in the autocorrelation function near the critical point. the more rapid decay of the oscillations is also reflected in the power spectrum, in that the spectrum peak declines as vaccination uptake increases, meaning that the resonant frequencies become less amplified. on the other hand, the recovery rate of the stable spiral in the limiting case decreases much more dramatically on the transition to elimination, and oscillations at the resonant frequencies become more amplified. thus, while the limiting case is an important guide in understanding more general disease models, it is also necessary to thoroughly investigate these models to understand the behavior of systems approaching a critical transition.
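the spiral-to-node change can be checked directly from the eigenvalues of a mean field jacobian; the sketch below uses a generic sir model with vaccination at birth (ds/dt = μ(1 − p) − βsi − μs, di/dt = βsi − (γ + μ)i), which is our illustrative stand-in for the paper's system, with made-up rates:

```python
import numpy as np

def endemic_eigenvalues(beta, gamma, mu, p):
    """Eigenvalues of the Jacobian of a vaccinated SIR mean field at its
    endemic equilibrium. Complex eigenvalues -> stable spiral (resonant
    oscillations); real eigenvalues -> stable node."""
    r0 = beta / (gamma + mu)
    i_star = mu * (r0 * (1.0 - p) - 1.0) / beta   # endemic infectious fraction
    if i_star <= 0:
        raise ValueError("no endemic equilibrium: (1 - p) * R0 <= 1")
    jac = np.array([[-beta * i_star - mu, -(gamma + mu)],
                    [beta * i_star, 0.0]])
    return np.linalg.eigvals(jac)
```

far from the threshold the eigenvalues form a complex pair (spiral); pushing p to within a hair of the critical uptake 1 − 1/r0 makes them real (node), mirroring the change in return dynamics described above.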
this study has followed kuehn ( ) and boettiger and hastings ( ) in the use of stochastic differential equations to understand noise-induced phenomena that occur during a critical transition. it is understood, of course, that for discrete systems (such as infectious disease transmission), this continuum representation is only an approximation, and, moreover, that the approximation breaks down as the system size, or even the size of just one important compartment, becomes small. the case of disease emergence, where the number of infectious individuals in a population prior to the critical transition is typically zero and only occasionally one or more, is precisely this situation. it would not be surprising, therefore, to discover that the approach taken in this paper performs poorly in this situation. but this is not the case. as figs. and show, in both the sis and sir systems, the "measured" statistics obtained in simulation are qualitatively similar to the analytic expressions reported in table , with the exception of the coefficient of variation in the sir model. we note, further, that although emergence was difficult to predict (auc values were not much greater than . ), the van kampen detrending, which we think of as the theory-dependent approach, typically performed better than generic gaussian detrending, which may be computed without any theoretical assumptions. it is, nevertheless, a concern that the approach taken here will be misleading under some scenarios: for instance, a very slow approach to the critical transition, when the time between immigration events may be large and the van kampen approximation is likely to break down.
moreover, the stochastic features of bifurcation delay, particularly important in the emergence scenarios, depend heavily on (a) the sequence of immigration events by infectious individuals, and (b) the probability of nonextinction for a transmission chain initiated by an infectious individual after the critical point has been passed. these are both intrinsically discrete aspects of the emergence process. for this reason, we suggest that the study of (nonstationary) discrete transmission models may be a fruitful direction for further study. there is, of course, a large body of work on (stationary) discrete contagion models on which such a theory could build (bailey ; renshaw ; daley and gani ; allen ). in conclusion, early warning systems for emerging infectious diseases and leading indicators of elimination, which would be of tremendous benefit to society, now appear to be potentially achievable. their realization depends on two key developments: ( ) better understanding of noise-induced phenomena exhibited by nonlinear contagion processes in the vicinity of their critical points, and ( ) surveillance methods that acquire data of sufficiently high accuracy, resolution, and timeliness. toward the first of these goals, our studies have indicated that anticipation of critical transitions is theoretically possible. it is hoped that these results, which include expressions for the key observable phenomena (table ), may help to guide achievement of the second, which is now the most important outstanding problem, i.e., empirical demonstration in experimental or surveillance data.

the deterministic system describes the evolution of the trend, or the location of the peak of the probability distribution of the infectious state i at a given time t. the variable ϕ(t) is defined as the proportion i(t)/n in the limit n → ∞. the fokker-planck equation describes the evolution of the fluctuations, i.e., the standard deviation of the probability distribution about nϕ(t).
since the fokker-planck equation is linear, the solution for the probability distribution of the fluctuations (ζ, t) in the deterministic state is a normal distribution. we note that this is a result of our second-order truncation, not a derivative property of the original model, per se. our aim is to quantify the fluctuations in the quasi-stationary state given by the solution of the sis fast-slow system at time t. we can do this by using the fact that eq. ( ) is equivalent to the following stochastic differential equation: ( ) where dw is a wiener process with mean zero and variance dt (gardiner ). analysis of eq. ( ) enables us to establish the quasi-stationary statistics that potentially could be used as leading indicators of a critical transition in sis infectious disease systems. as a special case, we consider the limit η → 0, whereby the quasi-stationary state is given by the deterministic endemic equilibrium of eq. ( ), ϕ* (defined in table ). since eq. ( ) dictates the evolution of the deterministic state and eq. ( ) determines fluctuations in it, we may let ϕ(t) = ϕ* in eq. ( ). analysis of the resulting equation leads to expressions for the fluctuation statistics, whereby the influence of the mean has already been removed.

table : variable substitutions for the sis model. to calculate the statistics of the fluctuations about the deterministic endemic equilibrium ϕ*, replace ϕ(t) with ϕ* in each variable. note that β is constant in a small time increment and is evaluated at the current time t.

variable | expression
β | β₀(1 − (p₀ + p₁t))
ϕ* | (β − γ − η + √((β − (γ + η))² + 4βη))/(2β)
λ | β − 2βϕ(t) − γ − η
q | βϕ(t)(1 − ϕ(t)) + η(1 − ϕ(t)) + γϕ(t)

to analyze eq. ( ), it is easier to rewrite it as follows: where λ and q are defined in the table above and Γ(t) denotes gaussian white noise. fourier transformation of eq. ( ) yields iωζ̃(ω) = λζ̃(ω) + √q Γ̃(ω), where ζ̃(ω) is the fourier transform of ζ.
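the linear sde for the fluctuations, dζ = λζ dt + √q dw with λ < 0, is an ornstein-uhlenbeck process whose quasi-stationary variance is −q/(2λ). a minimal euler-maruyama check, with illustrative λ and q (not values derived from the model):

```python
import numpy as np

# Euler-Maruyama integration of the linear SDE
#   d(zeta) = lam * zeta * dt + sqrt(q) * dW,   lam < 0 (stable).
# lam, q, dt, and the number of steps are illustrative assumptions.
# The quasi-stationary variance should approach -q / (2 * lam).
rng = np.random.default_rng(1)
lam, q, dt, nsteps = -0.5, 0.2, 0.01, 400_000
noise = np.sqrt(q * dt) * rng.standard_normal(nsteps)
z, acc, kept = 0.0, 0.0, 0
for step in range(nsteps):
    z += lam * z * dt + noise[step]
    if step >= nsteps // 10:      # discard burn-in before averaging
        acc += z * z
        kept += 1
var_sim = acc / kept
var_theory = -q / (2 * lam)
```

the small discretization bias of order dt is invisible at this tolerance; the same construction extends directly to time-dependent λ(t) and q(t).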
the definition of the fourier transform of a continuous function is provided in nisbet and gurney ( ), and those authors extensively discuss the meaning of the fourier transform of white noise. rearranging eq. ( ), we obtain an expression for ζ̃(ω). the quasi-stationary power spectrum of the fluctuations is given by the long-term average of the square of the magnitude of ζ̃(ω). the power spectrum is (nisbet and gurney ) s i (ω) = ⟨|ζ̃(ω)|²⟩ = q/(ω² + λ²), where ⟨·⟩ is the expectation. we can integrate this expression over the frequency domain to obtain the quasi-stationary variance, σ² = (1/π)∫₀^∞ s i (ω) dω = −q/(2λ). the quasi-stationary autocorrelation is given by the integral (1/(πσ²))∫₀^∞ s i (ω) cos(ωτ) dω = exp(λτ). finally, we can use eq. ( ) to obtain an expression for the coefficient of variation of i(t). since ζ ∼ normal(0, σ²), where the variance σ² is given by eq. ( ), then i(t) = nϕ(t) + √n ζ ∼ normal(nϕ(t), nσ²). therefore, the theoretical coefficient of variation is n^−1/2 σ/ϕ(t). to make analytical progress with eq. ( ) in the main text, we use the van kampen system size expansion. we approximate the numbers of susceptibles and infectives by their deterministic trends plus fluctuations of order √n, respectively. thus, we have transformed the discrete variables in terms of continuous stochastic variables σ and ζ, and, as in appendix a, we anticipate that the probability distribution will depend on the system size n. the van kampen system size expansion has been applied to sir models previously (alonso et al. ), and so, we do not include the full details of the expansion here. using the approximations ( ), to leading order, the system size expansion gives rise to the system ( ), which is simply the sir system approaching a transition. at next-to-leading order, the fokker-planck equation for the fluctuations in the solution of the system ( ) at time t follows, where x = (σ(t), ζ(t)) denotes the vector of fluctuations from the susceptible and infectious states, respectively. the mean vector is given by μ i (x) = a i1 σ(t) + a i2 ζ(t), i = 1, 2.
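the closed form for the quasi-stationary variance can be checked by numerically integrating the lorentzian spectrum s(ω) = q/(ω² + λ²) over frequency; λ and q below are illustrative values.

```python
import numpy as np

# Check that integrating the Lorentzian power spectrum
#   S(w) = q / (w**2 + lam**2)
# over all frequencies, divided by 2*pi, reproduces the quasi-stationary
# variance q / (2*|lam|). lam and q are illustrative assumptions.
lam, q = -0.5, 0.2
w = np.linspace(-1000.0, 1000.0, 2_000_001)
S = q / (w * w + lam * lam)
var_numeric = S.sum() * (w[1] - w[0]) / (2.0 * np.pi)
var_closed = q / (2.0 * abs(lam))
```

the truncation of the tails beyond |ω| = 1000 contributes an error of order q/1000, well below the tolerance used here.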
the matrices a and d are obtained from the system size expansion. the coefficients a ij of the mean vector are the entries of the jacobian matrix obtained from linear stability analysis. the d ij coefficients are the entries of the noise-covariance matrix and can only be obtained from the system size expansion (van kampen ). equation ( ) describes the evolution of the trend, and the fokker-planck equation models the evolution of the fluctuations. the solution of eq. ( ) is a bivariate normal distribution since the fokker-planck equation is linear (van kampen ). we note that when incidence of a disease is low, as occurs when the system is subcritical, the assumption that the fluctuations are normally distributed about the mean is likely to not be appropriate because the approximation ( ) cannot describe chance extinctions, which can occur if i is small. finally, it is possible to numerically obtain the variance-covariance matrix for the fluctuations in s and i around the values of the solution of eq. ( ), but, in this paper, we are interested only in fluctuations in i since s is unobservable. to quantify the fluctuations, we use the fact that the following system of stochastic differential equations is equivalent to eq. ( ): dσ/dt = a 11 σ(t) + a 12 ζ(t) + Γ 1 (t), dζ/dt = a 21 σ(t) + a 22 ζ(t) + Γ 2 (t), where Γ 1 (t) and Γ 2 (t) are white-noise processes with covariance matrix d. the entries of the matrices a and d are included in table in the main text. if the mean-field (n → ∞) theory applies, the coefficients a ij are approximately the entries in the jacobian matrix of eq. ( ) evaluated at the endemic equilibrium. since we could not obtain an analytical expression for the solution of eq. ( ), we had to obtain this integral numerically. finally, an expression for the coefficient of variation of i may also be obtained, using eq. ( ). since ζ ∼ normal(0, σ²), where the variance σ² is given by eq. ( ), then i(t) ∼ normal(nϕ(t), nσ²). therefore, the theoretical coefficient of variation is n^−1/2 σ/ϕ(t).
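the stationary covariance of such a two-variable linear fluctuation system can be obtained numerically from the lyapunov equation a c + c aᵀ + d = 0; a minimal sketch, with illustrative a (a stable jacobian) and d, not the entries from the table in the main text:

```python
import numpy as np

# Stationary covariance C of the linear fluctuation system
#   dx/dt = A x + noise,  noise covariance D,
# solves the Lyapunov equation A C + C A^T + D = 0.
# A and D below are illustrative stand-ins.
A = np.array([[-1.0, -0.8],
              [ 0.5, -0.3]])
D = np.array([[0.4, 0.1],
              [0.1, 0.2]])
I = np.eye(2)
# Vectorize: vec(A C + C A^T) = (kron(I, A) + kron(A, I)) vec(C)
M = np.kron(I, A) + np.kron(A, I)
C = np.linalg.solve(M, -D.reshape(-1)).reshape(2, 2)
```

the (2, 2) entry of c gives the variance of the fluctuations ζ in the infectious state, the only observable quantity here.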
references

an introduction to stochastic processes with applications to biology
stochastic amplification in epidemics
seasonality and the dynamics of infectious diseases
infectious diseases of humans: dynamics and control
the elements of stochastic processes with applications to the natural sciences
quantifying limits to detection of early warning for critical transitions
interacting regime shifts in ecosystems: implication for early warnings
rising variance: a leading indicator of ecological transition
early warnings of unknown nonlinear shifts: a nonparametric approach
leading indicators of trophic cascades
malaria resurgence: a systematic review and assessment of its causes
slowing down as an early warning signal for abrupt climate change
spatial correlation as leading indicator of catastrophic shifts
robustness of variance and autocorrelation as indicators of critical slowing down
methods for detecting early warnings of critical transitions in time series illustrated using simulated ecological data
epidemic modelling: an introduction
the canard unchained or how fast/slow systems bifurcate
early warnings of catastrophic shifts in ecosystems: comparison between spatial and temporal indicators
measuring tuberculosis burden, trends, and the impact of control programmes
seasonal infectious disease epidemiology
handbook of stochastic methods for physics, chemistry and the natural sciences
exact stochastic simulation of coupled chemical reactions
changing skewness: an early warning signal of regime shifts in ecological systems
spatial variance and spatial skewness: leading indicators of regime shifts in spatial ecological systems
regime shifts in ecological systems can occur with no warning
perspectives on the basic reproductive ratio
detecting dynamical changes in nonlinear time series using locally linear state-space models
estimating spatial coupling in epidemiological systems: a mechanistic approach
modeling infectious diseases in humans and animals
a mathematical framework for critical transitions: bifurcations, fast-slow systems and stochastic dynamics
early warning of climate tipping points
early warning of climate tipping points from critical slowing down: comparing methods to improve robustness
epidemic dynamics at the human-animal interface
stochastic models in population biology and their deterministic analogs
measles: the burden of preventable deaths
preparing for the next pandemic
modelling biological populations in space and time
the decline and resurgence of pertussis in the us
the interplay between determinism and stochasticity in childhood infectious diseases
assessing the economic impact and costs of flu pandemic originating in asia
critical transitions in nature and society
early warning signals for critical transitions
anticipating critical transitions
heteroscedasticity as a leading indicator of ecological regime shifts
assessment of the global measles mortality reduction goal: results from a model of surveillance data
the economy-wide impact of pandemic influenza on the uk: a computable general equilibrium modelling experiment
nonlinear dynamics and chaos with applications to physics, biology, chemistry and engineering
who (world health organization) ( ) who malaria report

acknowledgments: this research was funded by a grant from the james s. mcdonnell foundation. open access: this article is distributed under the terms of the creative commons attribution license, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

to make the description of the probability distribution given by eq. ( ) in the main text more amenable to analytical analysis, we approximate the discrete infectious state random variable by a continuous random variable ζ, anticipating that the probability distribution will depend on the population size n.
we do not assume what equation ϕ(t) satisfies, but we will see later that it is equal to the proportion of infectious individuals at time t, in the limit of infinitely large population size n; the system size expansion will give rise to the mean field theory discussed in the "mean field theory of sir model" section. equation ( ) says that fluctuations (denoted by ζ) about the deterministic state i/n are expected to be of the order of n^−1/2, which is expected from the central limit theorem, and so the fluctuations in i are of the order n^1/2 (van kampen ). if the population size n is large, we intuitively expect fluctuations to be small. consequently, the probability distribution for the system being in a certain state at time t is expected to have a peak located at the deterministic value nϕ(t) and a standard deviation of the order n^1/2. the fluctuation ζ in the approximation ( ) turns out to be a normal random variable. the system size expansion is described in detail by van kampen (van kampen ) and has been applied to one-dimensional population biology models (mckane and newman ); therefore, we do not include the full details of the expansion of eq. ( ) in this paper. in summary, there are five essential steps: (1) write down transition rates (e.g., table ); (2) derive the master equation for the markov process; (3) rewrite the master equation in terms of jump functions; (4) substitute the ansatz ( ) into the master equation and perform the expansion; and (5) collect deterministic (leading order) and stochastic (next-to-leading order) terms to obtain the deterministic system and its stochastic correction. rewriting the master equation as described in detail in (van kampen ) and performing the system size expansion using eq. ( ) gives rise, at leading order, to the differential equation that ϕ(t) satisfies; this is simply the fast-slow system ( ).
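the transition rates in the first step above define a markov jump process that can be simulated exactly with the gillespie algorithm (cited above as "exact stochastic simulation of coupled chemical reactions"). the sketch below does this for an sis model with immigration; all rates and sizes are illustrative assumptions, and the long-run mean of i is checked against the deterministic endemic equilibrium nϕ*.

```python
import numpy as np

def gillespie_sis(N, beta, gamma, eta, t_max, seed=0):
    """Exact (Gillespie) simulation of SIS with immigration:
    S -> I at rate beta*S*I/N + eta*S, I -> S at rate gamma*I.
    Returns the time-averaged number of infectives."""
    rng = np.random.default_rng(seed)
    I, t, acc = N // 2, 0.0, 0.0
    while t < t_max:
        a_inf = (beta * I / N + eta) * (N - I)  # S -> I
        a_rec = gamma * I                       # I -> S
        a_tot = a_inf + a_rec
        dt = rng.exponential(1.0 / a_tot)
        if t + dt > t_max:
            acc += I * (t_max - t)
            break
        acc += I * dt
        t += dt
        I += 1 if rng.random() * a_tot < a_inf else -1
    return acc / t_max

# Illustrative rates; the endemic equilibrium solves
# beta*phi*(1-phi) + eta*(1-phi) - gamma*phi = 0.
beta, gamma, eta, N = 2.0, 1.0, 0.01, 2000
d = beta - gamma - eta
phi_star = (d + np.sqrt(d * d + 4 * beta * eta)) / (2 * beta)
mean_I = gillespie_sis(N, beta, gamma, eta, t_max=100.0)
```

the simulated fluctuations about nϕ* scale as √n, which is the content of the van kampen ansatz.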
at the next-to-leading order, the method leads to a linear fokker-planck equation for the distribution of the fluctuations about the solution ϕ(t). note that p is assumed to be constant in a small time increment and is evaluated at the current time t; other symbols in the table are evaluated at the endemic equilibrium. to analyze the system ( ), we take its fourier transform, leading to equations in which σ̃(ω), ζ̃(ω), Γ̃ 1 (ω), and Γ̃ 2 (ω) are the fourier transforms of σ, ζ, Γ 1 , and Γ 2 , respectively. since we are interested in the fluctuations about the infectious state, we solve eq. ( ) for the fourier transform ζ̃(ω), obtaining an expression in which t and d are the trace and determinant of the jacobian matrix, respectively, given in the tables. using eq. ( ), we can establish the power spectrum of the fluctuations, ⟨|ζ̃(ω)|²⟩, where α i is given in table . the variance and autocorrelation of the fluctuations in i(t) around the solution of system ( ) may be obtained using eq. ( ) through integration. the variance of the fluctuations follows by integration over the frequency domain, and the lag-τ autocorrelation is given by the integral (1/(πσ²))∫₀^∞ s i (ω) cos(ωτ) dω.

key: cord- - fob ax
authors: hasegawa, takehisa; nemoto, koji
title: outbreaks in susceptible-infected-removed epidemics with multiple seeds
date: - -
journal: phys rev e
doi: . /physreve. .
sha: doc_id: cord_uid: fob ax

we study a susceptible-infected-removed (sir) model with multiple seeds on a regular random graph. many researchers have studied the epidemic threshold of epidemic models above which a global outbreak can occur, starting from an infinitesimal fraction of seeds. however, there have been few studies of epidemic models with finite fractions of seeds. the aim of this paper is to clarify what happens in phase transitions in such cases. the sir model in networks exhibits two percolation transitions.
we derive the percolation transition points for the sir model with multiple seeds to show that, as the infection rate increases, epidemic clusters generated from each seed percolate before a single seed can induce a global outbreak. the threat of infectious disease is becoming increasingly conspicuous for modern society, wherein there is a large amount of international travel all over the world. understanding how infectious diseases spread in our society is crucial to the development of strategies for disease control. a mathematical model of infectious disease, called the susceptible-infected-removed (sir) model, was first applied with the assumption of a well-mixed population to compute the final numbers of infected and eventually removed (or recovered) individuals [ ]. so far, many mathematical models of infectious diseases have been proposed for understanding the spread of epidemics and proposing strategies for disease control [ ]. in recent years, many studies have been devoted to epidemic models with a network structure of people [ ]. diseases spread over the networks of physical contacts between individuals, and the structure of real networks [ ] [ ] [ ] [ ] has crucial effects on this spread. for example, moreno et al. [ ] studied the sir model in a scale-free network having a degree distribution of p k ∝ k^−γ using a degree-based mean-field approach. their approximation clarified that epidemics can spread over the network for any infection rate if γ . in addition, many analytical approaches for epidemic models with network structures, such as the mapping onto a bond percolation problem [ , ], the edge-based compartment model [ ], the effective degree approach [ ], and the pair approximation [ , ], have been proposed and have succeeded in describing epidemic dynamics. numerical simulations have revealed how epidemics spread in more realistic situations.
also, several strategies for disease control have been proposed on the basis of the knowledge of epidemics on networks, e.g., targeted immunization [ , ], acquaintance immunization [ ] [ ] [ ] [ ], and graph-partitioning immunization [ ]. most previous studies using sir-type epidemic models have assumed that the fraction of infection seeds is infinitesimally small. in contrast, there have been few studies on epidemic models with finite fractions of seeds. miller [ ] considered the sir model in networks with large initial conditions to resolve an apparent paradox in works assuming an infinitesimal fraction of seeds. hu et al. [ ] numerically studied how the positions of multiple seeds in a network affect spreading behavior. ji et al. [ ] identified multiple influential spreaders in real networks by ranking nodes in disintegrated networks after random bond removals. what we discuss here is a more fundamental, but almost overlooked, problem: how do epidemic models with finite fractions of seeds undergo phase transitions? for sir-type epidemics, each infection seed creates an epidemic cluster of infected individuals. epidemic clusters generated by multiple seeds will have global connectivity in some parameter regions even though each seed may not have the potential to induce a global outbreak there. in this paper, we consider the sir model in networks with multiple seeds. in this case, the sir model exhibits a kind of percolation transition. an epidemic cluster grows from each of multiple seeds. we regard the clusters so generated as supernodes and study the percolation problem of these supernodes. indeed, we can analytically and numerically obtain the percolation transition point of supernodes to show a gap between this transition point and the epidemic threshold.
the existence of this gap indicates that the percolation transition of epidemic clusters occurs before a single seed can induce a global outbreak. our result also shows the sensitivity of the percolation transition points to the seed fraction, i.e., that a small seed fraction drastically reduces the critical infection rate for the emergence of the infinite epidemic cluster. let us give a brief review of the sir model in a given static network. each node in the network takes one of three states: susceptible, infected, and removed. the system evolves as a continuous-time markov process. as an initial-state configuration, a fraction, ρ, of the nodes is randomly chosen to be seeds and is initially infected, while the other nodes are susceptible. the infection rate is denoted by λ. when an infected node is adjacent to a susceptible node, this susceptible node gets infected with probability λΔt within a short time, Δt. note that this probability is independently given by each of the infected nodes, so that the total infection rate at a node is just proportional to the number of infected neighbors. an infected node becomes removed at a rate μ, i.e., with probability μΔt within a short time Δt, irrespective of the neighbors' states. without loss of generality, we set μ = unless otherwise specified. the dynamics stops when no infected nodes exist in the network. let us consider the limit ρ → 0. the sir model exhibits a phase transition at the epidemic threshold λ = λ sir c when λ increases from 0. above λ sir c , a single seed can induce global outbreaks. in a global outbreak, a nonzero fraction of nodes become infected and eventually removed. below λ sir c , the number of removed nodes is always negligible compared with the total number of nodes.
as already mentioned, we have several approaches for obtaining λ sir c (see the recent review [ ]); newman approximated the sir model in uncorrelated networks by mapping onto a bond percolation problem (which is called the sir model with transmissibility) [ ] and derived λ sir c = μ⟨k⟩/(⟨k²⟩ − 2⟨k⟩), where ⟨k⟩ and ⟨k²⟩ are the first and the second moments of the degree distribution, p k , respectively [ ]. this result indicates that, for a fat-tailed scale-free network whose degree distribution obeys p k ∝ k^−γ with γ , a global outbreak starting from an infinitesimal fraction of seeds occurs even for an infinitesimal infection rate. as indicated in [ ], mapping onto a bond percolation problem does not give the exact outbreak size or probability, but it does predict exactly the epidemic threshold. lindquist et al. [ ] proposed an effective degree approach for describing the time evolution of the sir dynamics using numerous ordinary differential equations and derived the same epidemic threshold, ( ). miller [ ] introduced another approach, by means of the edge-based compartment model, to describe the sir dynamics accurately with a few rate equations. we can also describe the phase transition of the present model in terms of percolation. in any final state, each node takes either a susceptible or a removed state. we call the connected components of removed nodes and susceptible nodes the r components and the s components, respectively. for the sir model in networks with ρ > 0, we have two percolation transition points, λ c1 and λ c2 . when the number of nodes, n, is much greater than 1, the mean fraction of the largest r component, r max (n) = R max (n)/n, where R max (n) is the mean size of the largest r component, changes from 0 to a nonzero value at the former point λ c1 . note that λ c1 corresponds to λ sir c in the limit ρ → 0 by definition.
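newman's threshold condition can be evaluated directly from the degree-distribution moments. the sketch below assumes the continuous-time transmissibility t = λ/(λ + μ), so that the critical transmissibility t c = ⟨k⟩/(⟨k²⟩ − ⟨k⟩) gives λ c = μ t c /(1 − t c ); for the z-regular graph this reduces to μ/(z − 2). the degree distributions are illustrative.

```python
import numpy as np

# SIR epidemic threshold on an uncorrelated network via the bond-
# percolation mapping, assuming T = lam / (lam + mu):
#   T_c = <k> / (<k^2> - <k>)  =>  lam_c = mu * T_c / (1 - T_c).
def sir_threshold(pk, mu=1.0):
    k = np.arange(len(pk))
    k1 = (k * pk).sum()           # <k>
    k2 = (k * k * pk).sum()       # <k^2>
    Tc = k1 / (k2 - k1)
    return mu * Tc / (1.0 - Tc)

# z-regular random graph with z = 4: threshold reduces to mu/(z - 2).
pk_rrg = np.zeros(7); pk_rrg[4] = 1.0
lam_c_rrg = sir_threshold(pk_rrg)        # 0.5

# Same mean degree but heterogeneous (half degree 2, half degree 6):
pk_het = np.zeros(7); pk_het[2] = 0.5; pk_het[6] = 0.5
lam_c_het = sir_threshold(pk_het)        # 1/3, lower than the RRG value
```

degree heterogeneity raises ⟨k²⟩ at fixed ⟨k⟩ and therefore lowers the threshold, which is the mechanism behind the vanishing threshold of fat-tailed networks.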
percolation analysis of an epidemic cluster starting from a single seed has been used for numerical computations of the epidemic threshold and critical properties [ , ]. the latter point, λ c2 , concerns the percolation of the s components (also called the residual graph [ , ]) and is usually larger than λ c1 . above λ c2 , the network remaining after removal of the r components is disintegrated, such that the sizes of all remaining components are finite. in other words, the mean fraction of the largest s component, s max (n) = S max (n)/n, where S max (n) is the mean largest s-component size, is 0 (nonzero) for λ > λ c2 (λ < λ c2 ) when n ≫ 1. whether the susceptible nodes are globally connected is important because a second epidemic spread may occur in the remaining network [ , ]. in [ ], newman analyzed this second transition point of the sir model with transmissibility in uncorrelated networks with ρ → 0 to show that the transition point is positive even when γ . valdez et al. [ ] proposed a new strategy for suppressing epidemics by regarding this second transition point as a measure of the efficiency of a mitigation or control strategy. if we regard the present model as showing the propagation of an attack against a network, such as a computer virus, λ c2 is a measure of the robustness of networks against such attacks [ , ]. konno and the authors numerically studied λ c2 for correlated networks to show that any positive or negative degree correlation makes networks more robust [ ]. to summarize, the system with a given value of ρ has the following three regions: (i) the s-dominant phase, where r max = 0 and s max > 0 for λ < λ c1 ; (ii) the coexisting phase, where r max > 0 and s max > 0 for λ c1 < λ < λ c2 ; and (iii) the r-dominant phase, where r max > 0 and s max = 0 for λ > λ c2 . to investigate in detail the phase transitions of the sir model with a finite fraction of seeds, we focus on the z-regular random graph (rrg). our formulations discussed below are for the rrg.
the extension to degree-uncorrelated networks having a degree distribution p(k) may be straightforward, although its execution will be cumbersome. at any rate, our findings obtained from the rrg will probably carry over to other networks. in sec. iv d, we numerically study the outbreaks induced by multiple seeds in finite-dimensional euclidean lattices. to evaluate the time evolution of the sir dynamics and the total densities of the susceptible and removed nodes in the final states, we consider the approximate master equations (ames) [ , ]. let s l,m (t), i l,m (t), and r l,m (t) be the fractions of nodes that are susceptible, infected, and removed, respectively, at time t and have l susceptible and m infected neighbors. the ames for the evolution of these variables are as follows (see [ ] for details): the transition rates of neighboring nodes are approximated as mean-field averages in which the summations run over all l + m ≤ k. to describe the sir dynamics with ρ > 0, we set the initial condition accordingly. by numerical evaluation of the above equations, we obtain the total densities, which satisfy the conservation law s(t) + i(t) + r(t) = 1 at any time t. note that all variables other than s l,0 and r l,0 vanish in the limit t → ∞, and therefore i(∞) = 0. to check the accuracy of the ame, we perform monte carlo simulations for the sir model on the rrg with z = . in our simulations, we set μ = and ρ = . . the numbers of nodes are n = , , , and . the number of graph realizations is , and the number of trials on each graph is . figure (a) shows the ame result (line) and the monte carlo result (symbols) for the total densities of susceptible and removed nodes, s and r, in the final states. we find that the data from the ames coincide entirely with those from the monte carlo simulations. equations ( )-( ) do not predict any transition point for ρ > 0, because r(∞) ≥ ρ > 0, although it is possible to derive the epidemic threshold λ sir c = μ/(z − 2) for the rrg with degree z [ ] by considering the limit ρ → 0 (see appendix a).
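monte carlo simulations of this kind can be sketched using the standard equivalence between continuous-time sir and a semi-directed percolation construction: each node draws a recovery time from exp(μ) and transmits along each incident edge iff an independent exp(λ) waiting time beats that recovery time; the final removed set is everything reachable from the seeds. the graph size, z, rates, and ρ below are illustrative (not the values used in the paper), and the configuration-model graph may contain a few self-loops and multi-edges, which are simply kept.

```python
import numpy as np
from collections import deque

def z_regular_graph(N, z, rng):
    """Configuration-model z-regular graph (self-loops/multi-edges kept)."""
    stubs = np.repeat(np.arange(N), z)
    rng.shuffle(stubs)
    adj = [[] for _ in range(N)]
    for a, b in stubs.reshape(-1, 2):
        adj[a].append(b)
        adj[b].append(a)
    return adj

def sir_removed_fraction(N, z, lam, mu, rho, seed=0):
    """Final removed fraction of continuous-time SIR with seed fraction rho,
    via the reachability (semi-directed percolation) construction."""
    rng = np.random.default_rng(seed)
    adj = z_regular_graph(N, z, rng)
    rec = rng.exponential(1.0 / mu, size=N)     # recovery times
    seeds = rng.choice(N, size=max(1, int(rho * N)), replace=False)
    removed = np.zeros(N, dtype=bool)
    removed[seeds] = True
    queue = deque(seeds.tolist())
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            # i transmits to j iff the Exp(lam) transmission clock on the
            # directed edge i -> j rings before i recovers.
            if not removed[j] and rng.exponential(1.0 / lam) < rec[i]:
                removed[j] = True
                queue.append(j)
    return removed.mean()

# Well above threshold a giant outbreak appears; well below it the
# removed fraction stays close to the seed fraction rho.
r_high = sir_removed_fraction(N=5000, z=4, lam=2.0, mu=1.0, rho=0.01)
r_low = sir_removed_fraction(N=5000, z=4, lam=0.1, mu=1.0, rho=0.01)
```

drawing the transmission clocks lazily during the breadth-first search is valid because each directed edge is examined at most once.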
in contrast, the monte carlo simulations suggest that the model actually exhibits phase transitions. in fig. (b), we plot the r and s susceptibilities, χ r and χ s (so called by analogy with the magnetic susceptibility in spin systems). here, χ r (χ s ) is defined as the mean size of all r components (s components) except the largest one [ ]. we find that χ r and χ s have peaks at λ c1 and λ c2 , respectively, implying two phase transitions. moreover, these points are clearly different from λ sir c . in particular, the gap between λ c1 and λ sir c indicates that, as the infection rate increases, the epidemic clusters generated from each seed percolate before a single seed can induce a global outbreak. in the next section, we derive these percolation transition points for 0 < ρ < 1. to derive λ c2 , we consider the percolation of the s components. in [ ], newman analyzed the percolation of the s components using generating functions. his method gives λ c2 for the sir model in uncorrelated networks but assumes a single seed. by combining the ames and newman's method, we obtain s max and λ c2 for the case with 0 < ρ < 1. let us consider the s components in a typical final state for the sir model on an infinitely large rrg with ρ > 0. in the previous section, we already have the probability s l,0 (∞) that a randomly chosen node is susceptible and has l susceptible neighbors [s l,m (∞) = 0 for m ≥ 1]. using s l,0 (∞), we obtain the degree distribution of the s components as p s l = s l,0 (∞)/s, where the denominator s = Σ l s l,0 (∞) is the prior probability of being susceptible. the corresponding generating function, f s (x) = Σ l p s l x^l , follows; here we assume that this subnetwork is degree uncorrelated. we consider the excess degree, which is the degree of the node reached by following a randomly chosen link minus one [ ].
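the generating-function self-consistency used here is newman's standard construction. as a sanity check of such code, the sketch below applies it to a poisson (erdős-rényi-like) degree distribution, whose giant-component fraction is known, rather than to the s-component subnetwork itself; the mean degree is an illustrative choice.

```python
import numpy as np
from math import factorial

def giant_component(pk):
    """Giant-component fraction for an uncorrelated network with degree
    distribution pk, via Newman's generating-function self-consistency."""
    k = np.arange(len(pk))
    kmean = (k * pk).sum()
    def G0(x):   # degree generating function
        return (pk * x**k).sum()
    def G1(x):   # excess-degree generating function
        return (k[1:] * pk[1:] * x**(k[1:] - 1)).sum() / kmean
    u = 0.5      # prob. a randomly followed link leads to a finite branch
    for _ in range(200):
        u = G1(u)
    return 1.0 - G0(u)

c = 2.0  # mean degree of a Poisson graph (illustrative)
pk = np.array([np.exp(-c) * c**j / factorial(j) for j in range(61)])
S = giant_component(pk)   # known value for c = 2 is about 0.7968
```

for the s components, the same routine would be fed p s l = s l,0 (∞)/s, and the giant fraction rescaled by the susceptible density s.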
the excess degree distribution, q s l , which is the probability that a randomly chosen link from the s components points to a (susceptible) node with excess degree l, is q s l = (l + 1)p s l+1 / Σ l (l + 1)p s l+1 . then the generating function for the excess degree distribution of the s components, which we denote g s (x) = Σ l q s l x^l , follows, and the mean excess degree is given by g s ′(1). by arguing the emergence of an infinitely connected component of this subnetwork (similar to [ ]), we easily find that there is an infinite s component if g s ′(1) > 1, and thus the percolation transition point of the s components, λ c2 , satisfies g s ′(1) = 1. following [ ], we also have the mean fraction of the largest s component, s max = s[1 − f s (v)], where v is the solution of v = g s (v). we check these estimates using monte carlo simulations. figure (a) shows the order parameter, i.e., the fraction of the largest s component, s max (n), for rrgs with several n's. we find that the numerical results coincide with the analytical line below λ c2 and tend to 0 with increasing n above λ c2 . to numerically obtain the transition point, λ c2 , we introduce the fractal exponent [ ]. the fractal exponent of the largest s component is defined and approximated as ψ s (n) ≡ d ln S max (n)/d ln n. in the limit n → ∞, ψ s = 1 for λ < λ c2 and ψ s = 0 for λ > λ c2 , because the largest s-component size should be proportional to n for λ < λ c2 and finite for λ > λ c2 . as shown in fig. (b), the numerical results for ψ s approach ψ s = 1 (ψ s = 0) for λ < λ c2 (λ > λ c2 ) as n increases and have a crossing point at λ c2 . from the numerical data, we have ψ c s ≡ ψ s (λ c2 ) ≈ 2/3 at the crossing point, which coincides well with our analytical estimate of λ c2 (vertical line in fig. ). this observation is also confirmed by a finite-size scaling. as in [ ], we assume a scaling form for s max (n), where Δλ = |λ c2 − λ| and β is the critical exponent related to the order parameter, s max ∝ Δλ^β . in fig. (c), our scaling shows a nice collapse with ψ c s = 2/3 and β = 1.
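the fractal exponent can be estimated numerically as the local log-log slope of the largest-component size against n. the sketch below recovers the exponent from synthetic data obeying the mean-field critical scaling s max ∝ n^(2/3); the sizes and prefactor are illustrative stand-ins for measured data.

```python
import numpy as np

# Estimate the fractal exponent psi = d ln S_max / d ln N as a log-log
# slope. Synthetic data with S_max ~ N^(2/3) (the mean-field critical
# scaling) stand in for measured largest-component sizes.
Ns = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
S_max = 5.0 * Ns ** (2.0 / 3.0)
psi = np.polyfit(np.log(Ns), np.log(S_max), 1)[0]
```

applied off criticality, the same estimator drifts toward 1 (below the transition) or 0 (above it) as n grows, which is what produces the crossing point used to locate the threshold.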
because ψ c s is related to another critical exponent τ as ψ c s = (τ − 1)^−1 [ ], where τ is associated with the distribution function of s components at the critical point, n s (s) ∝ s^−τ , these exponents mean that the percolation transition of the s components actually occurs at λ c2 and belongs to the mean-field universality class, such that β = 1 and τ = 5/2 [ ]. to derive λ c1 for the sir model with multiple seeds, we need to calculate the connectivity of the r components generated by each seed. we should note that a percolation analysis, as in the previous subsection, using the degree distribution of the r components, r l,0 (∞)/r, is not applicable to this case, because such an analysis ignores the condition that each r component is connected. to consider the connectivity of numerous r components, we use the following procedure: (i) we first calculate the probability, p n , that the size of the r component generated by a single seed is n. (ii) for the case of ρ > 0, the system has numerous r components, whose number is proportional to ρ. we regard each r component as a supernode. the number of nodes confined in a supernode obeys the distribution p n , and its degree k n is given accordingly. (iii) then we consider a site percolation problem of supernodes. the first percolation point, λ c1 , is given as the critical point where the infinite connected component of the supernodes appears. in appendix b, we evaluate the mean size of the r component starting from a single seed, ⟨n⟩ (= Σ n p n n), and the corresponding mean square size, ⟨n²⟩ (= Σ n p n n²), by using generating functions. then we consider the case of ρ > 0. below λ c1 , we naturally assume that the mean size ⟨n⟩ is so small that each r component is a tree [ ] and that any overlaps between the r components are negligible, so that the total fraction of the r components can be evaluated as ρ⟨n⟩ [ ], which should necessarily be less than 1.
We can then determine the creation of the infinite R component by regarding each R component of size n as a supernode whose degree depends on n (see Fig. ) and considering the percolation problem of these supernodes. The density of susceptible nodes, S, is just one minus the density of the removed nodes: S = 1 − ρ⟨n⟩. Then the probability p̃ that the node reached by following a randomly chosen link is a component of a supernode is p̃ = ρ⟨k_n⟩ / (ρ⟨k_n⟩ + zS), where k_n is the number of external links of the R component (the degree of the supernode) having size n and is given by k_n = (z − 2)n + 2. This holds because each R component is a tree, whose number of edges equals its number of nodes minus one. The mean branching ratio of supernodes, b, is evaluated by multiplying p̃ by the mean excess degree of supernodes, ⟨k_n(k_n − 1)⟩/⟨k_n⟩. The percolation of supernodes takes place when b ≥ 1, and thus the transition point is given by b = 1; that is, λ_c^R satisfies z/ρ = ⟨k_n(k_n − 1)⟩ − ⟨k_n⟩ + z⟨n⟩, where ⟨n⟩ and ⟨n²⟩ are functions of λ. We can show that these moments diverge as ⟨n⟩ ∼ (λ_c^SIR − λ)^{−1} and ⟨n²⟩ ∼ (λ_c^SIR − λ)^{−3} when λ approaches λ_c^SIR from below (see Appendix B), and therefore λ_c^R → λ_c^SIR as ρ → 0, with the gap closing as a power law in ρ. Thus, a small increase in ρ drastically reduces λ_c^R from λ_c^SIR. We approximately obtain the fraction of the largest R component, R_max, by applying a procedure similar to the derivation of S_max to the connected components of supernodes (see Appendix C). This approximation can predict the rise of the order parameter R_max around λ_c^R but inherently overestimates R_max for λ > λ_c^R, because the overlaps between R components generated from different seeds are then non-negligible [see Fig. (a)]. We also check our estimate by comparison with Monte Carlo simulations. In Figs. (a) and (b), we plot the Monte Carlo results for the order parameter, R_max(N), and the corresponding fractal exponent, ψ_R(N) ≡ d ln R_max(N)/d ln N.
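The branching-ratio construction described here can be sketched directly from its ingredients: the supernode degree k_n = (z − 2)n + 2, the link-end probability p̃, and the mean excess degree of supernodes. A hedged Python sketch (the function name and toy size distribution are ours):

```python
def branching_ratio(p_n, z, rho):
    """Mean branching ratio b of supernodes for seed fraction rho on a z-RRG.

    p_n: dict mapping R-component size n -> probability p_n (tree components assumed).
    Percolation of supernodes sets in where b = 1.
    """
    k = {n: (z - 2) * n + 2 for n in p_n}               # supernode degree of a tree R component
    mean_k = sum(p_n[n] * k[n] for n in p_n)
    mean_k_excess = sum(p_n[n] * k[n] * (k[n] - 1) for n in p_n)  # <k_n (k_n - 1)>
    S = 1.0 - rho * sum(p_n[n] * n for n in p_n)        # density of susceptible nodes
    p_tilde = rho * mean_k / (rho * mean_k + z * S)     # random link end lands on a supernode
    return p_tilde * mean_k_excess / mean_k             # b = p_tilde x mean excess degree
```

As a check, taking p_1 = 1 (each seed infects no one, i.e., λ = 0) on a z = 4 RRG gives b = 1 exactly at ρ = 1/3 = 1/(z − 1), recovering the site-percolation threshold of the z-RRG.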
In a manner similar to that of the previous subsection, we find that the crossing point of ψ_R lies at our estimate of λ_c^R. We also find a good scaling result for R_max(N) using ψ_R(λ_c^R) = 2/3, β = 1, and the estimated λ_c^R [Fig. (c)], supporting the validity of our estimate and indicating that the percolation transition at λ_c^R belongs to the mean-field universality class. We analytically and numerically evaluate λ_c^R and λ_c^S for several values of ρ. In Fig. , we show the phase diagram in the (ρ, λ) plane; our estimates of λ_c^R and λ_c^S match the Monte Carlo results very well. The first percolation point λ_c^R is smaller than λ_c^SIR as long as ρ > 0; that is, the percolation of the R components occurs without global outbreaks. The gap between λ_c^R and λ_c^SIR shrinks with decreasing ρ, and λ_c^R = λ_c^SIR in the limit ρ → 0. Note that λ_c^R = 0 when ρ exceeds the site percolation threshold of the z-RRG, 1/(z − 1), which is derived from the local tree approximation [ , ], because the seeds themselves percolate; similarly, λ_c^S = 0 when ρ is large enough that the seeds themselves disintegrate the susceptible network into finite components. Our finite-size scaling for several values of ρ shows that both percolation transitions, at λ_c^R and λ_c^S, belong to the mean-field universality class, irrespective of the value of ρ. This seems unsurprising, because the two processes comprising the present model, the SIR model and site percolation, both belong to the mean-field universality class of percolation when the graph is an RRG. When ρ > 0, the system does not show any singular behavior at λ_c^SIR. However, this does not mean that λ_c^SIR is unimportant. In practice, λ_c^SIR is still an important measure in the strategy for disease control, because a single seed has the potential to induce a global outbreak above λ_c^SIR (in other words, the basic reproduction number R_0 > 1 when λ > λ_c^SIR).
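The statement that the seeds alone percolate once ρ exceeds 1/(z − 1) can be checked with a quick Monte Carlo sketch: build a configuration-model z-regular multigraph, occupy nodes with probability ρ, and measure the largest occupied cluster via union-find. All names here are ours, and self-loops/multi-edges are tolerated for simplicity:

```python
import random
from collections import Counter

def rrg_edges(n, z, seed=0):
    """Configuration-model random z-regular multigraph (self-loops possible; fine for a sketch)."""
    rng = random.Random(seed)
    stubs = [v for v in range(n) for _ in range(z)]
    rng.shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))

def largest_occupied_cluster(n, edges, rho, seed=1):
    """Site percolation: occupy nodes with probability rho; return largest-cluster fraction of n."""
    rng = random.Random(seed)
    occ = [rng.random() < rho for _ in range(n)]
    parent = list(range(n))
    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:                # merge occupied endpoints of each edge
        if occ[u] and occ[v]:
            parent[find(u)] = find(v)
    sizes = Counter(find(v) for v in range(n) if occ[v])
    return max(sizes.values(), default=0) / n
```

Well below ρ = 1/(z − 1) the largest occupied cluster is microscopic; well above it, a giant cluster spanning a finite fraction of the graph appears.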
The singular behaviors at λ_c^R, e.g., the divergence of the R susceptibility, may be interpreted as a precursor to global outbreaks, like the proverbial canary in a coal mine. We have numerically and analytically shown that the present model with multiple seeds on the RRG percolates at a lower infection rate than the epidemic threshold. Is this phenomenon common to other networks, e.g., networks with many short loops and without the logarithmic dependence of the mean shortest path? In the rest of this section, we briefly consider the SIR model with multiple seeds on finite-dimensional Euclidean lattices by Monte Carlo simulations. First, we consider the cubic lattice with periodic boundary conditions, fixing μ and ρ, using several system sizes N, and averaging over many independent trials. In Fig. (a), we show the size dependence of the order parameters, R_max and S_max. We find that R_max becomes nonzero at one point, λ_c^R, and S_max vanishes at a different point, λ_c^S. This discrepancy is also reflected in the peak positions of the R susceptibility and the S susceptibility, as shown in Fig. (b). Both λ_c^R and λ_c^S also differ from the epidemic threshold λ_c^SIR, which is obtained from single-seed simulations (not shown): starting from a single seed, R_max (= R) becomes nonzero at λ_c^SIR, which is larger than λ_c^R. We thus have a clear separation among λ_c^R, λ_c^SIR, and λ_c^S on the cubic lattice, and we expect the phenomena observed on the RRG to hold for other clustered networks. A qualitative difference between the cubic lattice and the RRG lies in their universality class. From the crossings of the fractal exponents ψ_R and ψ_S for the cubic lattice (not shown), the transitions of both the R components and the S components belong to the universality class of the three-dimensional percolation transition [ ], not to the mean-field universality class.
Let us mention a special case: the SIR model on the square lattice with periodic boundary conditions. In Fig. , we show the Monte Carlo results. In this case, S_max and R_max seem to undergo a macroscopic change at the same point, i.e., λ_c^R = λ_c^S [see Fig. (a)], and the corresponding susceptibilities also seem to peak at the same value of λ [Fig. (b)]. Compared to single-seed simulations, λ_c^R and λ_c^S decrease with an increase in ρ and differ from the epidemic threshold λ_c^SIR. The coincidence λ_c^R = λ_c^S does not seem surprising if we consider the spatial constraint of the square lattice: when the largest component percolates the lattice vertically and horizontally, the residual components remaining after the removal of the largest one cannot maintain a connection across the lattice. (This is reflected in the fact that the percolation threshold is equal to or larger than 1/2 for both site and bond percolation on the square lattice.) Turning to other real spatial networks, which are often regarded as two-dimensional objects, it remains an open question whether or not the coexisting phase exists. In this paper, we have studied the SIR model on an RRG with a nontrivial fraction of infection seeds, ρ. Through analytical estimates and numerical simulations, we have obtained the phase diagram in (ρ, λ) space. The SIR model with numerous seeds shows percolation transitions of the removed and susceptible nodes at λ_c^R and λ_c^S, respectively. In particular, λ_c^R is smaller than the epidemic threshold λ_c^SIR as long as ρ > 0. This means that epidemic clusters generated by multiple seeds percolate without global outbreaks. So far, we have focused on the SIR model on the RRG and on lattices. We expect the above statement to hold for the SIR model on other networks, although the details of the phase transition may depend on network structure, e.g., in a fat-tailed scale-free network with γ ≤ 3, where λ_c^SIR = 0. Finally, we briefly discuss other epidemic models with multiple seeds. Krapivsky et al.
[ ] proposed an extended SIR model, called the transient fad model, under the assumption of a well-mixed population. They analytically showed that this model exhibits a discontinuous transition if ρ > 0. The authors and a collaborator [ ] performed Monte Carlo simulations of this fad model on networks to confirm that a discontinuous jump of the order parameter appears near the epidemic threshold, behind which lies the percolation of epidemic clusters. The authors also investigated the discrete-time version of the transient fad model, confirming this numerically and analytically [ ]. Very recently, several generalized epidemic models on networks beyond the classical SIR model have been investigated [ ][ ][ ][ ]. It will be interesting to clarify what numerous seeds induce in such generalized epidemic models. In Appendix A, the initial condition (A ) is substituted into Eq. (A ) to find the condition for s_{z−1} to remain finite as t → ∞; the lower bound of μ/λ then gives the epidemic threshold, which corresponds to the known result, Eq. ( ). In Appendix B, we first evaluate the size distribution of the R component created by a single seed. To do this, we need to know the probability p_ℓ(k) that an infected node will infect ℓ of its k neighboring susceptible nodes before being removed. Such an infected node is removed before infecting any neighbors with probability p_0(k) = 1/(1 + kλ), so the probability of its infecting at least one neighboring node before removal is 1 − p_0(k) = kλ/(1 + kλ). As shown in Fig. , we can express p_ℓ(k) in a recursive form; from this recursion, we find that the generating function of p_ℓ(k) satisfies a recursion relation with g_0(x) = 1. Now let p_n be the probability that a single seed creates an R component of size n, and let q_n be the probability that a node infected by another node further creates a partial R component of size n. Then, by considering the infection process starting (Fig. : schematic of the recursive relation for p_ℓ(k)).
Consider the probability p_ℓ(k) that an infected node infects ℓ of its k susceptible neighbors before being removed [(a)→(d)]. The probability that this infected node is removed before infecting any neighbors [(a)→(b)] is p_0(k) = 1/(1 + kλ). On the other hand, the event that this node infects one susceptible neighbor [(a)→(c)] occurs with probability 1 − p_0(k) = kλ/(1 + kλ). At (c), the focal infected node infects a further ℓ − 1 nodes from the k − 1 remaining neighbors before being removed [(c)→(d)] with probability p_{ℓ−1}(k − 1), because the infection process is Markovian. Thus p_ℓ(k), the probability of going from (a) to (d), is given by p_ℓ(k) = p_{ℓ−1}(k − 1) × kλ/(1 + kλ). Starting from a single seed, p_n can be evaluated recursively in terms of q_n, where z_ν = z − ν (Fig. ). Introducing the corresponding generating functions, we can express the relations (B ) and (B ) in the compact forms g(x) = x g_z(g̃(x)) and g̃(x) = x g_{z−1}(g̃(x)), respectively. What we want to know are the mean size of the R component, ⟨n⟩ = g′(1), and the mean square size of the R component, ⟨n²⟩ = g″(1) + g′(1). To evaluate these values, we need the derivatives of g_k(x): the first derivative yields the mean value ⟨ℓ⟩, and the second derivative follows similarly; from these, the derivatives g′, g̃′, g″, and g̃″ are obtained by differentiating the compact relations above. (Here we replaced the transmissibility T in [ ] with λ, via T = λ/(μ + λ) with μ = 1. The susceptibilities χ_R and χ_S are given as χ_R = Σ_{r≠r_max} r² n_r(r) and χ_S = Σ_{s≠s_max} s² n_s(s). Finite components have no cyclic path in infinite, locally treelike networks. Now consider a finite network with N nodes: the number of seeds is Nρ, and the mean R-component size for each seed is ⟨n⟩ (Fig. ).)
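The recursion for p_ℓ(k) just described translates directly into code; a small sketch (names are ours) that also lets one verify normalization over ℓ:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_infect(l, k, lam):
    """Probability that an infectious node with k susceptible neighbours transmits to exactly
    l of them before removal (per-link infection rate lam, removal rate 1)."""
    if l == 0:
        return 1.0 / (1.0 + k * lam)                  # removed before any transmission
    if l > k or l < 0:
        return 0.0
    # first transmission occurs with prob k*lam/(1 + k*lam); then l-1 more among k-1 neighbours
    return p_infect(l - 1, k - 1, lam) * (k * lam) / (1.0 + k * lam)
```

Summing over ℓ = 0, ..., k returns 1 for any k and λ, a useful sanity check on the recursion.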
Schematic of p_n (top) and q_n (bottom) for the case of the RRG. Setting x = 1 gives g(1) = 1, and these quantities provide an explicit expression for the first moment m_1 and the second cumulant c_2 (and thus the second moment m_2 = c_2 + m_1²) of p_n. When λ approaches λ_c^SIR from below, g̃′(1) dominates the behavior of m_1 and c_2; indeed, (B ) tells us that g̃′(1) diverges as δλ^{−1}, where δλ = λ_c^SIR − λ, so that m_1 ∼ δλ^{−1} and c_2 ∼ δλ^{−3}. Substituting Eq. (B ) into Eq. ( ), we obtain the power-law dependence of the gap δλ on the seed fraction ρ stated in Eq. ( ). The generating function of the probability that a supernode has degree k_n is given accordingly, as is that of its excess degree. Let u be the probability that a finite cluster of supernodes is found by following a randomly chosen link; u satisfies a self-consistency relation involving p̃, the probability that the node reached by following a randomly chosen link is a component of a supernode, given by Eq. ( ). Then the density of the largest component of supernodes, i.e., R_max, is evaluated from u, where we have used k_n = (z − 2)n + 2 from Eq. ( ). The transmissibility is T_ij = 1 − ∫∫ dr dτ P(r)P(τ) e^{−rτ}, where, for a pair consisting of an infected node i and a susceptible node j, e^{−rτ} is the probability that the disease was not transmitted from i to j before i was removed, r is the infection rate between the focal pair, τ is the time for which the infected node remains infectious, and P(r) and P(τ) are the corresponding distributions. In the present case, we have P(r) = δ(r − λ) (because the infection rate is a constant λ) and P(τ) = μe^{−μτ} (the distribution of the inter-event time of a Poisson process with parameter μ), so T = 1 − ∫∫ dr dτ δ(r − λ) μe^{−μτ} e^{−rτ} = 1 − μ/(μ + λ) = λ/(μ + λ). key: cord_uid b frnfr authors: thomas, loring j.; huang, peng; yin, fan; luo, xiaoshuang iris; almquist, zack w.; hipp, john r.; butts, carter t.
title: spatial heterogeneity can lead to substantial local variations in COVID-19 timing and severity journal: nan doi: nan cord_uid: b frnfr Standard epidemiological models for COVID-19 employ variants of compartment (SIR) models at local scales, implicitly assuming spatially uniform local mixing. Here, we examine the effect on disease diffusion of employing more geographically detailed diffusion models based on known spatial features of interpersonal networks, most particularly the presence of a long-tailed but monotone decline in the probability of interaction with distance. Based on simulations of unrestricted COVID-19 diffusion in U.S. cities, we conclude that heterogeneity in population distribution can have large impacts on local pandemic timing and severity, even when aggregate behavior at larger scales mirrors a classic SIR-like pattern. Impacts observed include severe local outbreaks with long lag times relative to the aggregate infection curve, and the presence of numerous areas whose disease trajectories correlate poorly with those of neighboring areas. A simple catchment model for hospital demand illustrates potential implications for health care utilization, with substantial disparities in the timing and extremity of impacts even without distancing interventions. Likewise, analysis of social exposure to others who are morbid or deceased shows considerable variation in how the epidemic can appear to individuals on the ground, potentially affecting risk assessment and compliance with mitigation measures. These results demonstrate the potential for spatial network structure to generate highly non-uniform diffusion behavior even at the scale of cities, and suggest the importance of incorporating such structure when designing models to inform healthcare planning, predict community outcomes, or identify potential disparities.
Since its emergence at the end of 2019, the SARS-CoV-2 virus has spread rapidly to all portions of the globe, infecting nearly five million people as of late May 2020 ( ). The disease caused by this virus, denoted COVID-19, generally manifests as a respiratory illness that is spread primarily via airborne droplets. While most cases of COVID-19 are non-fatal, a significant fraction of those infected require extensive supportive care, and the mortality rate is substantially higher than that of more common infectious diseases such as seasonal influenza ( ). Even for survivors, infection can lead to long-term damage to the lungs and other organs, leading to long convalescence times and enhanced risk of secondary complications ( , ). By early March of 2020, COVID-19 outbreaks had appeared on almost every continent, including significant clusters within many cities ( ). In the absence of an effective vaccine, public health measures to counteract the pandemic in developed nations have focused on social distancing measures that seek to slow diffusion sufficiently to avoid catastrophic failure of the healthcare delivery system. Both the planning and public acceptance of such measures have been highly dependent upon the use of epidemiological models to probe the potential impact of distancing interventions, and to anticipate when such measures may be loosened with an acceptable level of public risk. As such, the assumptions and behavior of COVID-19 diffusion models are of significant concern. Currently dominant approaches to COVID-19 modeling ( )( )( ) are based on compartment models (often called SIR models, after the conventional division of the population into susceptible, infected, and recovered groups in the most basic implementations) that implicitly treat individuals within a population as geographically well-mixed.
While some such models include differential contact by demographic groups (e.g., age), and may treat states, counties, or occasionally cities as distinct units, the models presently in wide use do not incorporate spatial heterogeneity at local scales (e.g., within cities). Past work, however, has shown evidence of substantial heterogeneity in social relationships at regional, urban, and sub-urban scales ( , ), with these variations in social network structure impacting outcomes as diverse as regional identification ( ), disease spread ( ), and crime rates ( ), in both human and non-human networks ( ). If individuals are not socially "well-mixed" at local scales, then it is plausible that diffusion of SARS-CoV-2 via interpersonal contacts will likewise depart from the uniform mixing characteristic of SIR models. Indeed, at least one computational study ( ) using a fairly "generic" (non-COVID) diffusion process on realistic urban networks showed considerable non-uniformity in diffusion times, suggesting that such effects could hypothetically be present. However, it could also be hypothesized that such effects would be small perturbations to the broader infection curve captured by conventional compartment models, with little practical importance. The question of whether these effects are likely to be present for COVID-19, and if so their strength and size, has to date remained open. In this paper, we examine the potential impact of local spatial heterogeneity on COVID-19, modeling the diffusion of SARS-CoV-2 in populations whose contacts are based on spatially plausible network structures. We focus here on the urban context, examining nineteen different cities in the United States. We simulate the population of each city in detail (i.e., at the individual level), simulating hypothetical outbreaks on the contact network in each city in the absence of measures such as social distancing.
Despite allowing the population to be well-mixed in all other respects (i.e., not imposing mixing constraints based on demographic or other characteristics), we find that spatial heterogeneity alone is sufficient to induce substantial departures from spatially homogeneous SIR behavior. Among the phenomena observed are "long lag" outbreaks that appear in previously unharmed communities after the aggregate infection wave has largely subsided; frequently low correlations between infection timing in spatially adjacent communities; and distinct sub-patterns of outbreaks in some urban areas that are uncorrelated with the broader infection pattern. Gaps between infection peaks at the intra-urban level can be large, e.g., on the order of weeks or months in extreme cases, even for communities within kilometers of each other. Such heterogeneity is potentially consequential for the management of healthcare delivery services: as we show using a simple "catchment" model of hospital demand, local variations in infection timing can easily overload some hospitals while leaving others relatively empty (absent active reallocation of patients). Likewise, we show that individuals' social exposure to others who are morbid or deceased varies greatly over the course of the pandemic, potentially leading to differences in risk assessment and bereavement burden for persons residing in different locations. Differences in outbreak timing and severity may exacerbate health disparities (since, e.g., surge capacity varies by community) and may even affect perception of and support for prophylactic behaviors among the population at large, with those in so-far untouched communities falsely assuming that the pandemic threat is either past or was exaggerated to begin with, or attributing natural variation in disease timing to the impact of health interventions.
We note at the outset that the models used here are intended to probe the hypothetical impact of spatial heterogeneity on COVID-19 diffusion within particular scenarios, rather than to produce high-accuracy predictions or forecasts. For the latter applications, it is desirable to incorporate many additional features that are here simplified to facilitate insight into the phenomenon of central interest. In particular, we incorporate neither demographic effects nor social distancing, allowing us to consider a setting that is as well-mixed as possible (and hence as close as possible to an idealized SIR model) with the exception of spatial heterogeneity. As we show, even this basic scenario is sufficient to produce large deviations from the SIR model. Despite the simplicity of our models, the approach employed here could be integrated with other factors and calibrated to produce models intended for forecasting or similar applications. COVID-19 is typically transmitted via direct contact with infected individuals, with the greatest risk occurring when an uninfected person is within approximately six feet of an infected person for an extended period of time. Such interactions can be modeled as events within a social network, where individuals are tied to those with whom they have a high hazard of intensive interaction. In prior work, this approach has been successfully employed for modeling infectious diseases ranging from HIV ( ) and influenza ( ) to Zika ( ) transmission. To model networks of potential contacts at scale, we employ spatial network models ( ), which are both computationally tractable and able to capture the effects of geography and population heterogeneity on network structure ( ). Such models have been successfully used to capture social phenomena ranging from neighborhood-level variation in crime rates ( ) and regional identification ( ) to the flow of information among homeless persons ( ).
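As an illustration of the spatial-network approach, the sketch below draws a Bernoulli graph whose tie probability declines with distance according to a power-law kernel. The kernel p(d) = b/(1 + d)^γ and all parameter values are illustrative stand-ins, not the calibrated SIF used in the paper:

```python
import math
import random

def spatial_network(coords, b=1.0, gamma=3.0, seed=0):
    """Bernoulli spatial graph: each pair (i, j) is tied with probability b/(1 + d_ij)**gamma.

    coords: list of (x, y) positions. Returns a list of undirected edges (i, j).
    Illustrative kernel only; a calibrated SIF would replace it.
    """
    rng = random.Random(seed)
    n = len(coords)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if rng.random() < b / (1.0 + d) ** gamma:
                edges.append((i, j))
    return edges
```

With a steeply decaying kernel, connected pairs are on average much closer together than randomly chosen pairs, reproducing the distance-dependent interaction pattern described above.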
The spatial network models used here allow for complex social dependence through a kernel function, referred to as the social interaction function or SIF. The SIF formally defines the probability of a relationship between two individuals based on spatial proximity. For example, it has been shown that many social interaction patterns obey Zipf's law ( ), with individuals more likely to interact with others close by than far away (a pattern that holds even for online interactions ( )). Here, we use this approach to model a network representing a combination of frequent interactions due to ongoing social ties and contacts resulting from frequent incidental encounters (e.g., interactions with neighbors and community members). We follow the protocol of ( , ) to simulate social network data that combines the actual distribution of residents in a city with a pre-specified SIF. We employ the model and data from ( ) to produce large-scale social networks for nineteen cities and counties in the United States, providing a representation of major urban areas in the United States (see Supplement). Given these simulated networks, we then implement an SIR-like framework to examine COVID-19 diffusion. At each moment in time, each individual can be in a susceptible, infected but not infectious, infectious, deceased, or recovered state. The disease diffuses through the contact network, with currently infectious individuals infecting susceptible neighbors as a continuous-time Poisson process, at a rate estimated from mortality data (see Supplement); recovered or deceased individuals are not considered infectious for modeling purposes. Upon infection, an individual's transitions between subsequent states (and into mortality or recovery) are governed by waiting-time distributions based on epidemiological data, as described in the supplementary materials. To begin each simulated trajectory, we randomly infect a small set of individuals, with all others considered susceptible.
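The continuous-time transmission process described above can be sketched with an event-queue (Gillespie-style) SIR simulation. This simplified version (ours) collapses the latent and mortality states into a single infectious state with removal rate mu; per-edge transmission is a Poisson process with rate lam:

```python
import heapq
import random

def sir_gillespie(adj, seeds, lam, mu, seed=0):
    """Continuous-time SIR on a network: infectious nodes transmit to each susceptible
    neighbour at rate lam and are removed at rate mu. Returns final states (0=S, 1=I, 2=R)
    and infection times. Sketch only; the paper's model adds a latent stage and mortality."""
    rng = random.Random(seed)
    S, I, R = 0, 1, 2
    state = [S] * len(adj)
    t_infected = {}
    pq = []  # events: (time, kind, source, target)

    def schedule(u, t0):
        heapq.heappush(pq, (t0 + rng.expovariate(mu), 'rec', u, u))
        for v in adj[u]:  # one exponential transmission time per edge suffices for Markovian SIR
            heapq.heappush(pq, (t0 + rng.expovariate(lam), 'inf', u, v))

    for s in seeds:
        state[s] = I
        t_infected[s] = 0.0
        schedule(s, 0.0)
    while pq:
        t, kind, u, v = heapq.heappop(pq)
        if state[u] != I:            # stale event: source already removed
            continue
        if kind == 'rec':
            state[u] = R
        elif state[v] == S:
            state[v] = I
            t_infected[v] = t
            schedule(v, t)
    return state, t_infected
```

Because each susceptible neighbour can be infected at most once, scheduling a single exponential transmission time per edge at the moment of infection is equivalent to simulating the full Poisson process.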
Simulation proceeds until no infectious individuals remain. From the simulated trajectory data, we produce several metrics to assess spatial heterogeneity in disease outcomes. First, we present infection curves for illustrative cities, showing the detailed progress of the infection and its difference from what an SIR model would posit. We also present choropleth maps showing spatial variation in peak infection times, as well as the correlations between the infection trajectory within local areal units and the aggregate infection trajectory for the city as a whole. While an SIR model would predict an absence of systematic variation in the infection curves or the peak infection day across areal units in the same city, geographically realistic models show considerable disparities in infection progress from one neighborhood to another. To quantify the degree of heterogeneity more broadly, we examine spatial variation in outcomes for each of our city networks. We show that large variations in peak infection days across tracts are typical (often spanning weeks or even months), and that overall correlations of within-tract infection trajectories with the aggregate urban trajectory are generally modest (a substantial departure from what would be expected from an SIR model). In addition to these relatively abstract metrics, we also examine a simple measure of the potential load on the healthcare system in each city. Given the locations of the hospitals in each city, we attribute infections to each hospital using a Voronoi tessellation (i.e., under the simple model that individuals are most likely to be taken to the nearest hospital if they become seriously ill). Examination of the potential hospital demand over time shows substantial differences in load, with some hospitals severely impacted while others have few cases.
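The Voronoi-tessellation attribution of cases to hospitals is equivalent to nearest-neighbor assignment in the plane. A minimal sketch (names are ours; real use would project latitude/longitude coordinates before computing distances):

```python
import math

def catchment_counts(cases, hospitals):
    """Assign each case location to its nearest hospital (i.e., its Voronoi cell) and
    count the resulting demand per hospital."""
    counts = [0] * len(hospitals)
    for c in cases:
        nearest = min(range(len(hospitals)), key=lambda h: math.dist(c, hospitals[h]))
        counts[nearest] += 1
    return counts
```

Applying this to the time series of new severe cases yields the per-hospital demand curves discussed below.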
Finally, we consider the social exposure of individuals to COVID-19 by computing the fraction of individuals with a personal contact who is, respectively, morbid or deceased. Our model shows considerable differences in these metrics over time, revealing that the pandemic can appear very different to those "on the ground" (who evaluate its progress by its impact on their own personal contacts) than would be suggested by aggregate statistics. Networks are generated using population distributions from ( ), based on block-level census data. Hospital information is obtained from the Homeland Infrastructure Foundation-Level Data (HIFLD) database ( ), an initiative that collects geospatial information on critical infrastructure across multiple levels of government. We employ the national-level hospital facility database, which contains hospital locations for the US states, Washington, D.C., and the US territories of Puerto Rico, Guam, American Samoa, the Northern Mariana Islands, Palau, and the Virgin Islands; the underlying data are collated from various state departments or federal sources (e.g., Oak Ridge National Laboratory). We employ all hospitals within our target cities, excluding facilities that have closed. Latitude/longitude coordinates and capacity information were employed to create a spatial database that includes information on the number of beds in each hospital. The dates of the first confirmed case and of all deaths for King County, where Seattle is located, were obtained from The New York Times, based on reports from state and local health agencies ( ). The death rate was calculated based on the population size of each county from the American Community Survey, and was employed to calibrate the infection rate (the only free parameter in the models used here); details are provided in the supplemental materials.
We ran multiple replicates of the COVID-19 diffusion process in each of our cities, seeding each replicate with a set of randomly selected infections and following the course of the diffusion until no infectious individuals remained. Simulations were performed using a combination of custom scripts for the R statistical computing system ( ) and the statnet library ( )( )( ); analyses were performed using R. When taken over even moderately sized regions, aggregate infection curves can appear relatively smooth. Although this suggests homogeneous mixing (as assumed, e.g., by standard SIR models), appearances can be deceiving. Fig. shows typical realizations of infection curves for two cities (Seattle, WA and Washington, DC), showing both the aggregate trajectory (red) and trajectories within individual census tracts (black). While the infection curves in both cases are relatively smooth, suggesting a fairly simple process with a sharp early onset followed by an initially sharp but mildly slowing decline in infections, the within-tract trajectories tell a different story. Instead of one common curve, we see that tracts vary wildly in onset time and curve width, with some tracts showing peaks weeks or months after the initial aggregate spike has passed. The cases of Fig. are emblematic of a more systematic phenomenon: the progress of the infection within any given areal unit often has relatively little relationship to its progress in the city as a whole. We quantify the extent to which tracts share a common pattern by taking the variance on the first principal component of the standardized infection curves. As before, if different parts of the city experienced similar patterns of growth and decline in infections, we would expect the dimension of greatest shared variance to account for the overwhelming majority of variation in infection rates. Contrary to these expectations, however, Fig. shows that there is little coherence in tract-level infection patterns.
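The two coherence summaries used here (mean pairwise correlation of tract curves, and the share of variance on the first principal component of the standardized curves) can be sketched as follows; names are ours:

```python
import numpy as np

def curve_coherence(curves):
    """curves: (n_tracts, T) array of per-tract infection counts over time.

    Returns (mean pairwise correlation across tracts,
             share of variance on the first principal component of the standardized curves)."""
    X = np.asarray(curves, dtype=float)
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)  # standardize rows
    C = np.corrcoef(X)
    n = X.shape[0]
    mean_r = (C.sum() - n) / (n * (n - 1))          # average off-diagonal correlation
    s = np.linalg.svd(X, compute_uv=False)          # singular values of standardized curves
    share = s[0] ** 2 / (s ** 2).sum()              # first-component variance share
    return mean_r, share
```

For identical curve shapes both summaries equal 1; spatially incoherent epidemics drive both toward much smaller values.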
Mean correlations of local infection curves across tracts are typically modest, indicating very little correspondence between infection timing in one tract and that of another. The principal component analysis tells a similar story: overall, the first component accounts for relatively little of the total variance in trajectories, with only a minority of the variation in infection curves lying on the first principal component on average, and with no observed case in which the first component accounted for most of the variance. This confirms that local infection curves are consistently distinct, with behavior that is only weakly related to infections in the city as a whole. This is a substantially different scenario from what is commonly assumed in traditional SIR models. These differences in local infection curves are a consequence of the unevenness of the "social fabric" that spans the city: while the disease can spread rapidly within regions of high local connectivity, it can easily stall upon reaching the boundaries of those regions. Further transmission requires that a successful infection event occur via a bridging tie, an event with a potentially long waiting time. Such delays create potential opportunities for public health interventions (trace/isolate/treat strategies), but they can also create a false sense of security for those on the opposite side of a bridge (who may incorrectly assume that their area was passed over by the infection). Indeed, examining the time to peak infection across the cities of Seattle and Washington, D.C. (Fig. ) shows that while peak times are visibly autocorrelated, tracts with very different peak times frequently border each other. Residents on opposite sides of the divide may be exposed to very different local infection curves, making risk assessment difficult. The cases of Seattle and Washington, DC are not anomalous.
looking across multiple trajectories over our entire sample, fig. shows consistently high variation in per-tract peak infection times for nearly all study communities. (this variation is also seen within individual trajectories, as shown in supplemental figure s .) although peak times in some cities are concentrated within an interval of several days to a week, it is more common for peak times to vary by several months. such gaps are far from what would be expected under uniform local mixing. variation in the timing of covid- impacts across the urban landscape has potential ramifications for healthcare delivery, creating unequally distributed loads that overburden some providers while leaving others with excess resources. to obtain a sense of how spatial heterogeneity in the infection curve could potentially impact hospitals, we employ a simple "catchment" model in which seriously ill patients are taken to the nearest hospital, subsequently recovering and/or dying as assumed throughout our modeling framework. based on prior estimates, we assume that % of all infections are severe enough to require hospitalization ( ) . while hospitals draw from (and hence average across) areas that are larger than tracts, the heterogeneity shown in fig. suggests the potential for substantial differences in hospital load over time. indeed, our models suggest that such differences will occur. fig. shows the number of patients arriving at each hospital in seattle and washington, dc (respectively) during a typical simulation trajectory. while some hospitals do have demand curves that mirror the city's overall infection curve, others show very different patterns of demand. in particular, some hospitals experience relatively little demand in the early months of the pandemic, only to be hit hard when infections in the city as a whole are winding down. just as hospital load varies, hospital capacities vary as well. 
as a simple measure of strain on hospital resources, we consider the difference between the number of covid- hospitalizations and the total capacity of the hospital (in beds), truncating at zero when demand outstrips supply. (for ease of interpretation as a measure of strain, we take the difference such that higher values indicate fewer available beds.) using data on hospital locations and capacities, we show in fig. the strain on all hospitals in seattle and washington, d.c. (respectively) during a typical infection trajectory. while some hospitals are hardest hit early on (as would be expected from the aggregate infection curve), others do not peak for several months. likewise, hospitals proximate to areas of the city with very different infection trajectories experience natural "curve flattening," with a more distributed load, while those that happen to draw from positively correlated areas experience very sharp increases and declines in demand. these conditions in some cases combine to keep hospitals well under capacity for the duration of the pandemic, while leaving others overloaded for long stretches of time. these marked differences in strain for hospitals within the same city highlight the potentially complex consequences of heterogeneous diffusion for healthcare providers. looking across cities, we see the same high-variability patterns as observed in seattle and washington. in particular, we note that local variation in disease timing leads to a heavy-tailed distribution for the duration at which hospitals will be at capacity. fig. shows the marginal distribution of hospital overload periods (defined as the total number of days at capacity during the pandemic), over the entire sample. while the most common outcome is for hospitals to be stressed for a brief period (not always to the breaking point), a significant fraction of hospitals end up being overloaded for months -or even, in a small fraction of cases, nearly the whole duration of the pandemic.
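a minimal sketch of this strain measure and the overload-days count is given below; the daily demand series and bed capacity are made-up illustrative numbers, not data from the paper:

```python
# "strain" is oriented so that higher values mean fewer available beds, and an
# overload day is a day on which demand meets or exceeds capacity.

def strain_series(demand, capacity):
    # available beds, truncated at zero when demand outstrips supply
    available = [max(0, capacity - d) for d in demand]
    # flip so that higher values indicate fewer available beds
    return [capacity - a for a in available]

def overload_days(demand, capacity):
    # total number of days at (or over) capacity during the pandemic
    return sum(1 for d in demand if d >= capacity)

demand = [5, 40, 120, 90, 30, 10]   # hypothetical daily covid bed demand
capacity = 100                      # hypothetical total beds

print(strain_series(demand, capacity))  # strain is capped at full capacity
print(overload_days(demand, capacity))
```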
while most hospitals will have only brief periods of overload, some will be at or over capacity for the entire pandemic, potentially several years. it should be reiterated that the hospital load model used here is extremely simplified, and that we are employing a no-mitigation scenario. however, these results quite graphically demonstrate that the importance of curve-flattening interventions does not abate once geographical factors are taken into account. on the other hand, these results suggest that differences in hospital load may be substantially more profound than would be anticipated from uniform mixing models, creating logistical challenges and possibly exacerbating existing differences in resource levels across hospitals. at the same time, such heterogeneity implies that resource sharing and patient transfer arrangements could prove more effective as load-management strategies than would be suggested by spatially homogeneous models, as hospitals are predicted to vary considerably in the timing of patient demand. in addition to healthcare strain, the subjective experience of the pandemic will potentially differ for individuals residing in different locations. in particular, social exposures to outcomes such as morbidity or mortality may shape individuals' understandings of the risks posed by covid- , and their willingness to undertake protective actions to combat infection. such exposures may furthermore act as stressors, with potential implications for physical and/or mental health. as a simple measure of social exposure, we consider the question of whether a focal individual (ego) either has experienced a negative outcome themselves, or has at least one personal contact (alter) who has experienced the outcome in question. (given the highly salient nature of covid- morbidity and mortality, we focus on the transition to first exposure rather than e.g. 
the total number of such exposures, as the first exposure is likely to have the greatest impact on ego's assessment of the potential severity of the disease.) to examine how social exposure varies by location, we compute the fraction of individuals in each tract who are socially exposed to (respectively) morbidity or mortality. fig. shows these proportions for baltimore, md, over the course of the pandemic. as with other outcomes examined here, we see considerable variation in timing, with many tracts seeing a rapid increase in exposure to infections, while others go for weeks or months with relatively few persons having a personal contact with the disease. another notable axis of variation is sharpness. in many tracts, the fraction of individuals with at least one morbid contact transitions from near zero to near one within a matter of days, creating an extremely sharp social transition from the "pre-exposure world" (in which almost no one present knows someone with the illness) to a "post-exposure world" (in which almost everyone knows someone with the illness). by contrast, other tracts show a much more gradual increase (sometimes punctuated by jumps), as more and more individuals come to know someone with the disease. in a few tracts that are never hit hard by the pandemic, few people ever have an infected alter; residents of these areas obviously have a very different experience than those of high-prevalence tracts. these distinctions are even more stark for mortality, which takes longer to manifest and which does so much more unevenly. tracts vary greatly in the fraction of individuals who ultimately lose a personal contact to the disease, and in the rapidity with which that fraction is reached. in many cases, it may take a year or more for this quantity to be realized; until that point, many residents may be skeptical of the notion that the pandemic poses a great risk to them personally.
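the social-exposure measure described above (ego counts as exposed if ego, or at least one alter, has experienced the outcome) can be sketched on a toy network; the contact lists, tract assignments, and outcome set below are all hypothetical:

```python
contacts = {            # node -> alters in the contact network (toy data)
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["c"],
    "e": [],            # an isolate: can only be exposed by own outcome
}
tract_of = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}
had_outcome = {"b"}     # nodes that have experienced the outcome so far

def socially_exposed(ego):
    # ego is exposed if they experienced the outcome themselves,
    # or have at least one alter who did
    return ego in had_outcome or any(alt in had_outcome for alt in contacts[ego])

def exposed_fraction_by_tract():
    totals, exposed = {}, {}
    for node, tract in tract_of.items():
        totals[tract] = totals.get(tract, 0) + 1
        exposed[tract] = exposed.get(tract, 0) + socially_exposed(node)
    return {t: exposed[t] / totals[t] for t in totals}

print(exposed_fraction_by_tract())
```

recomputing this fraction at each simulated day yields the per-tract exposure curves discussed in the text.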
by way of assessing the milieu within each tract, it is useful to consider the "cross-over" point at which at least half of the residents of a given tract have been socially exposed to either covid- morbidity or mortality. fig. maps these values for baltimore, md. it is immediately apparent that social exposures are more strongly spatially autocorrelated than other outcomes considered here, due to the presence of long-range ties within individuals' personal networks. even so, however, we see strong spatial differentiation, with residents in the urban core being exposed to both morbidity and mortality much more quickly than those on the periphery. this suggests that the social experience of the pandemic will be quite different for those in city centers than for those in more outlying areas, with the latter taking far longer to be exposed to serious consequences of covid- . this may manifest in differences in willingness to adopt protective actions, with those in the urban core being more highly motivated to take action (and perhaps resistant to rhetoric downplaying the severity of the disease) than those on the outskirts of the city.

figure : (left) we see a large degree of spatial heterogeneity, as some tracts are more insulated from others in terms of social exposure; by the end of the pandemic, however, most people across all tracts have been exposed to someone who has had the disease. (right) the fraction of persons in each tract who have an alter who died from covid- in their personal network; on average, only around % of people in any given tract know someone who died by the end of the pandemic, though this varies widely across tracts.

figure : (left) choropleth showing the time for half of those in each tract to be socially exposed to covid- morbidity in baltimore, md. the central and southern parts of the city are exposed far sooner than the northwestern part of the city.
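given a per-tract exposure time series, the cross-over point is simply the first day on which the exposed fraction reaches one half; the two example series below (a fast "urban core" rise and a slow "peripheral" rise) are made up for illustration:

```python
def crossover_day(exposed_fraction_by_day, threshold=0.5):
    # first day on which at least `threshold` of the tract is socially exposed
    for day, frac in enumerate(exposed_fraction_by_day):
        if frac >= threshold:
            return day
    return None  # tract never reaches the threshold during the simulation

core_tract = [0.0, 0.1, 0.4, 0.7, 0.9, 1.0]        # fast urban-core style rise
peripheral_tract = [0.0, 0.0, 0.1, 0.2, 0.3, 0.4]  # slow peripheral rise

print(crossover_day(core_tract))
print(crossover_day(peripheral_tract))
```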
(right) choropleth showing the time for half of those in each tract to be socially exposed to covid- mortality. central baltimore is exposed to deaths in personal networks far sooner than the more outlying areas of the city.

our simulation results all underscore the potential effects of local spatial heterogeneity on disease spread. the spatial heterogeneity driving these results occurs on a very small scale (i.e., census blocks), operating well below the level of the city as a whole. as the infection spreads, relatively small differences in local network connectivity and the prevalence of bridging ties driven by uneven population distribution can lead to substantial differences in infection timing and severity, leading different areas in each city to have a vastly different experience of the pandemic. resources will be utilized differently in different areas, some areas will have the bulk of their infections far later than others, and the subjective experience of a given individual regarding the pandemic threat may differ substantially from that of someone in a different area. these behaviors stand in striking contrast to models based on the assumption of spatially homogeneous mixing, which posit uniform progress of the infection within local areas. as noted at the outset, our model is based on a no-mitigation scenario, and is not intended to capture the impact of social distancing. while distancing measures by definition limit transmission rates -and will hence slow diffusion -contacts occurring through spatially correlated networks like those modeled here are still likely to show patterns of heterogeneity like those described. one notable observation from our simulations is the long outbreak delay that some census tracts experience, even in the absence of social distancing.
this would suggest that relaxation of mitigation measures leading to a resumption of "normal" diffusion may initially appear to have few negative effects, only to lead to deadly outbreaks weeks or months later. public health messaging may need to stress that apparent lulls in disease progress are not necessarily indicators that the threat has subsided, and that areas "passed over" by past outbreaks could be impacted at any time. finally, we stress that conventional diffusion models using locally homogeneous mixing have been of considerable value in both pandemic planning and scenario evaluation. our findings should not be taken as an argument against the use of such models. however, the observation that incorporating geographical heterogeneity in contact rates leads to radically different local behavior would seem to suggest that there is value in including such effects in models intended to capture outcomes at the city or county level. since these are the scales on which decisions regarding infrastructure management, healthcare logistics, and other policies are often made, improved geographical realism could potentially have a substantial impact on our ability to reduce lives lost to the covid- pandemic. in this supplement, we go into more depth on spatial interaction functions, spatial bernoulli models, the setup and parameterizations of our simulations, and the parameter estimation procedures that we used for this paper. we focus specifically on the more technical aspects of each of these components, showing how they have been formally specified and parameterized. a spatial interaction function (sif) describes the marginal probability of a tie between any two nodes, given the distance between those nodes, represented as f(d_i,j, θ). in this representation, d_i,j is the distance between the nodes, and θ are the parameters for the function. prior literature shows that spatial interaction functions tend to be of the power law or attenuated power law form ( ) .
thus, we can represent the sif as f(d_i,j, θ) = p_b / (1 + α·d_i,j)^γ. here, p_b represents the base tie probability, which can be thought of as the probability of a tie at distance . α is a scaling parameter that determines the speed at which the probability drops towards zero. γ is the parameter that determines the weight of the tail. we draw on two sifs in this paper, using models for social interactions and face-to-face interactions employed in prior studies ( , ) . the social interaction sif declines with a γ of . , while the face-to-face sif declines with a γ of . . the parameters for the social interaction sif are p_b = . , α = . , γ = . , and the parameters for the face-to-face sif are p_b = . , α = . , γ = . ( ) . bernoulli models are a class of random graph models that leverage the concept of a bernoulli graph, or a graph in which each edge is an independent bernoulli trial. in a spatial bernoulli graph, tie probabilities are determined by a spatial interaction function, applied to the pairwise distances between individuals within some space (here, geographically determined using census data). spatial bernoulli models are highly scalable due to the conditional independence of edges, but allow for extremely complex structure due to the heterogeneity in edge probabilities induced by the sif; likewise, they naturally produce properties such as local cohesion and degree heterogeneity observed in many types of social networks ( ) . formally, we can specify a spatial bernoulli model by pr(y_ij = 1) = f(d_i,j, θ), where y_ij is the tie indicator for dyad (i, j), and f(d_i,j, θ) is a spatial interaction function with input distance d_i,j and parameters θ. to simulate diffusion of covid- , we require a contact network. here, we employ the above-described spatial bernoulli graphs, with node locations for each of our study locations drawn based on block-level census data (including clustering within households, an important factor in disease diffusion).
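the attenuated power-law sif and the resulting spatial bernoulli graph can be sketched as follows; the node coordinates and parameter values below are hypothetical placeholders, not the fitted values from the cited studies:

```python
import math
import random

random.seed(1)

def sif(d, p_b, alpha, gamma):
    # attenuated power-law spatial interaction function:
    # marginal tie probability at distance d
    return p_b / (1.0 + alpha * d) ** gamma

def spatial_bernoulli_graph(points, p_b, alpha, gamma):
    # each dyad is an independent bernoulli trial with probability sif(d_ij)
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if random.random() < sif(d, p_b, alpha, gamma):
                edges.append((i, j))
    return edges

# hypothetical node coordinates in a 10 x 10 region
pts = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(60)]
edges = spatial_bernoulli_graph(pts, p_b=0.9, alpha=1.0, gamma=2.8)
print(len(edges))
```

the conditional independence of the dyads is what makes this construction scale: each pair is evaluated once, with no dependence on the rest of the graph.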
we follow the protocols described in ( , ) to generate node positions, specifically using the quasirandom (halton) placement algorithm. node placement begins with the households in each census block, using census data from ( ) . the quasirandom placement algorithm uses a halton sequence to place households in space within the areal unit in which they reside. if any two households are placed within a critical radius of each other, then the algorithm "stacks" the households on top of each other by introducing artificial elevation (simulating e.g. a multistory apartment building). once all households are placed, individuals within households are placed at jittered locations about the household centroid. (individuals not otherwise attached to households are treated as households of size .) given an assignment of individuals to spatial locations, we simulate spatial bernoulli graphs using the models specified above. we generate two networks for each city, one with the social interaction sif, and the other with the face-to-face interaction sif. to form a network of potential high-risk contacts, we then merge these networks (which share the same node set) by taking their union, leading to a network in which two individuals are tied if they either have an ongoing social relationship or would be likely to have extensive face-to-face interactions for other reasons (e.g., interacting with neighbors). this process is performed for each city in our sample. table s lists the cities that we use for our simulations. these data are drawn from ( ). we conduct a series of simulations to examine the spread of covid- across city-sized networks. these simulations use a simple continuous-time network diffusion process, a general description of which is given in the main text.
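the halton-based placement step can be sketched as below: a standard radical-inverse halton sequence places households in a unit-square areal unit, with a simple "stacking" rule when two households fall within a critical radius. the radius and household count are illustrative, and the stacking rule here (an integer elevation counter) is a simplified stand-in for the procedure in the cited protocols:

```python
import math

def halton(index, base):
    # radical inverse of `index` in the given base, yielding a value in [0, 1)
    f, r = 1.0, 0.0
    while index > 0:
        f /= base
        r += f * (index % base)
        index //= base
    return r

def place_households(n, critical_radius=0.02):
    placed = []  # (x, y, elevation)
    for k in range(1, n + 1):
        x, y = halton(k, 2), halton(k, 3)
        # "stack" on top of any household within the critical radius,
        # simulating e.g. a multistory apartment building
        elev = sum(1 for (px, py, _) in placed
                   if math.hypot(px - x, py - y) < critical_radius)
        placed.append((x, y, elev))
    return placed

homes = place_households(100)
print(homes[:3])
```

using coprime bases (here 2 and 3) for the two coordinates is what keeps the halton points well spread rather than falling on a lattice.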
the input for the diffusion simulation is a network and a vector of initial disease states (susceptible, latent (infected but not yet infectious), infectious, recovered, and deceased), and the output is a detailed history of the diffusion process up to the point at which a steady state is obtained (i.e., no infectious individuals remain). infection occurs via the network, with currently infectious individuals infecting susceptible alters as poisson events with a fixed rate. the transitions from latent to infectious, and from infectious to either recovery or mortality, are governed by gamma distributions estimated from epidemiological data. table s shows the estimated shape and scale parameters for the gamma distributions employed here. the parameters for the waiting time to infectiousness are directly available in the appendix of ( ), while those for recovery and death are estimated by matching the mean and standard deviation of durations reported in the literature ( ) . selection into death versus recovery was made via a bernoulli trial drawn at time of infection (thereby determining which waiting time distribution was used), with the estimated mortality probability being . . to determine the infection rate (the only free parameter for the network models used in our simulations), we simulate the diffusion of the virus in seattle and fit it to the over-time death rate of king county, wa before the first shelter-in-place order went into effect on march , . (we limit our data to this time period because our simulation employs a no-mitigation scenario.) a grid search strategy was employed to determine the expected days to transmission (which is the inverse of the infection rate), and the number of days between the existence of the first infected cases and the first confirmed cases (aka the time lag, a nuisance parameter that is relevant only for estimation of the infection rate).
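the per-individual disease-history draws described above can be sketched as follows; the gamma shape/scale values and the mortality probability are placeholders, not the estimates from table s , and the sketch covers only one individual's waiting times rather than the full network process:

```python
import random

random.seed(7)

# placeholder gamma parameters (shape alpha, scale beta), in days
LATENT = dict(alpha=4.0, beta=1.2)    # waiting time to infectiousness
RECOVER = dict(alpha=5.0, beta=2.0)   # infectious -> recovered
DEATH = dict(alpha=6.0, beta=2.5)     # infectious -> deceased
P_DEATH = 0.01                        # placeholder mortality probability

def draw_history(t_infected):
    # latent period: gamma-distributed waiting time to infectiousness
    t_infectious = t_infected + random.gammavariate(**LATENT)
    # outcome decided by a bernoulli trial at time of infection, which
    # selects the waiting-time distribution used for the infectious period
    dies = random.random() < P_DEATH
    duration = random.gammavariate(**(DEATH if dies else RECOVER))
    return {"t_infectious": t_infectious,
            "t_end": t_infectious + duration,
            "outcome": "deceased" if dies else "recovered"}

h = draw_history(t_infected=0.0)
print(h)
```

in the full model, infection events from each infectious individual to each susceptible alter would additionally be drawn as poisson events with the fitted rate.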
the time lag is treated as an integer and the expected days to transmission as a continuous variable. for each lag/rate pair, we randomly take draws from the expected infection waiting time distribution, add them to the lag time (i.e. the introduction of the true patient zero for the initial outbreak), and simulate realizations of the diffusion process (redrawing the network each time). the diffusion rate parameter was selected based on minimizing the mean squared error between the simulated death rate and the observed number of deaths over the selected period. the first round of grid-search divided the expected days of search into intervals, from ( , ) to ( , ), with days of lag ranging from to days. the second round of grid-search, based on the performance of the first round, divided the expected days of search into intervals, from ( . , . ) to ( . , . ), with days of lag ranging from to days. the grid-search suggests that the expected days to transmission is . ( . , . ) days ( fig s ) ; that is, in a hypothetical scenario in which a single infective ego remained indefinitely in the infective state, and a single alter remained otherwise susceptible, the average waiting time for ego to infect alter would be approximately days. while this may at first blush appear to be a long delay, it should be borne in mind that this embodies the reality that no individual is likely to infect any given alter within a short period (since, indeed, ego and alter may not happen to interact within a narrow window). with many alters, however, the chance of passing on the disease is quite high. likewise, we note that the thought experiment above should not be taken to imply that actors remain infectious for such an extended period of time; per the above-cited epidemiological data, individuals typically remain infectious for roughly - days (though variation outside this range does occur, as captured by the above gamma distributions). 
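the two-parameter grid search can be illustrated with a toy stand-in for the simulator; the observed death series, the candidate grid, and the quadratic "death curve" below are all made up, and the real procedure simulates the full network diffusion many times per lag/rate pair rather than using a closed-form curve:

```python
def mse(a, b):
    # mean squared error between two equal-length series
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def simulate_deaths(lag, days_to_transmission, horizon):
    # placeholder growth curve: no deaths during the lag, then growth whose
    # speed rises as the expected days-to-transmission falls
    rate = 1.0 / days_to_transmission
    return [0.0 if t < lag else rate * (t - lag) ** 2 for t in range(horizon)]

observed = [0, 0, 0, 0.5, 2.0, 4.5, 8.0, 12.5]  # made-up observed death counts

# pick the (lag, expected days to transmission) pair minimizing the mse
best = min(
    ((lag, d) for lag in range(0, 5) for d in (1.0, 2.0, 4.0, 8.0)),
    key=lambda p: mse(simulate_deaths(p[0], p[1], len(observed)), observed),
)
print(best)
```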
when both delay times are considered, the net probability of infecting any given alter prior to recovering is approximately %. using the above, we can calculate the corresponding basic reproductive number (r ) as r = d (1 − e^(−ατ)), where d is the mean degree of an individual in the network; α is the infection rate (the inverse of the expected days to transmission); and τ is the time spent in the infectious state (here in days). for each simulated seattle contact network, we calculate the degree for every individual. the time in the infectious state was obtained by simulating gamma distributions for days of incubation, recovery, and death, and randomly permuting the distribution of times for each simulated sif network. taking the mean of r for each individual, the corresponding basic reproductive number in the diffusion simulation model is . . to supplement the results on the variation in the peak infection time given in the main text, we ran a series of simulation replicates. figure in the main text shows the data from the figure below aggregated across all replicates. in the supplemental figure s , we break out the peak infection days in each city, by replicate. these data show that the significant variation in figure is not due to the number of replicates that were run, but instead due to the intrinsic variation that is present (due to spatial heterogeneity).

references:
network epidemiology: a handbook for survey design and data collection
the sage handbook of gis and society research
human behavior and the principle of least effort: an introduction to human ecology
r: a language and environment for statistical computing, r foundation for statistical computing
the lancet infectious diseases

this research was supported by nsf awards iis- and ses- to c.t.b., and by the uci seed grants program.

figure s : boxplot showing the peak infection days across replicates for each city in the sample.
there is a large degree of heterogeneity within each city, showing that the day on which the infection peaks for any given tract is far from uniform. within any given city, there is a consistently high amount of variance in the peak infection day. in other words, the variance that we show here is a property of the spread of the disease, rather than of the number of simulation replicates.

key: cord- -eelqmzdx authors: guo, chungu; yang, liangwei; chen, xiao; chen, duanbing; gao, hui; ma, jing title: influential nodes identification in complex networks via information entropy date: - - journal: entropy (basel) doi: . /e sha: doc_id: cord_uid: eelqmzdx identifying a set of influential nodes is an important topic in complex networks which plays a crucial role in many applications, such as market advertising, rumor controlling, and predicting valuable scientific publications. in regard to this, researchers have developed algorithms ranging from simple degree methods to all kinds of sophisticated approaches. however, a more robust and practical algorithm is required for the task. in this paper, we propose the enrenew algorithm, aimed at identifying a set of influential nodes via information entropy. firstly, the information entropy of each node is calculated as its initial spreading ability. then, the node with the largest information entropy is selected and the spreading ability of its l-length reachable nodes is renewed by an attenuation factor; this process is repeated until a specified number of influential nodes has been selected. compared with the best state-of-the-art benchmark methods, the performance of the proposed algorithm improved by . %, . %, . %, . %, . %, and . % in final affected scale on the cenew, email, hamster, router, condmat, and amazon networks, respectively, under the susceptible-infected-recovered (sir) simulation model. the proposed algorithm measures the importance of nodes based on information entropy and selects a group of important nodes through a dynamic update strategy.
the impressive results on the sir simulation model shed light on a new method of node mining in complex networks for information spreading and epidemic prevention. complex networks are common in real life and can be used to represent complex systems in many fields. for example, collaboration networks [ ] are used to cover the scientific collaborations between authors, email networks [ ] denote the email communications between users, protein-dna networks [ ] help people gain a deep insight into biochemical reactions, railway networks [ ] reveal the structure of railways via complex network methods, social networks show interactions between people [ , ] , and the international trade network [ ] reflects the product trade between countries. a deep understanding and control of different complex networks is of great significance in information spreading and network connectivity. on one hand, by using the influential nodes, we can make successful advertisements for products [ ] , discover drug target candidates, assist information weighted networks [ ] and social networks [ ] . however, the node set built by simply assembling and sorting nodes, as employed by the aforementioned methods, may not be comparable to an elaborately selected set of nodes due to the rich club phenomenon [ ] , namely, important nodes tend to overlap with each other. thus, many methods that aim to directly select a set of nodes have been proposed. kempe et al. defined the problem of identifying a set of influential spreaders in complex networks as the influence maximization problem [ ] , and they used a hill-climbing based greedy algorithm that is within % of optimal in several models. the greedy method [ ] is usually taken as the approximate solution of the influence maximization problem, but it is inefficient due to its high computational cost. chen et al. [ ] proposed the newgreedy and mixedgreedy methods.
borgatti [ ] specified mining influential spreaders in social networks by two classes, kpp-pos and kpp-neg, based on which he calculated the importance of nodes. narayanam et al. [ ] proposed the spin algorithm based on the shapley value to deal with the information diffusion problem in social networks. although the above greedy based methods can achieve relatively better results, they cost a great deal of time on monte carlo simulation, so more heuristic algorithms were proposed. chen et al. put forward the simple and efficient degreediscount algorithm [ ] , in which, once a node is selected, its neighbors' degrees are discounted. zhang et al. proposed voterank [ ] , which selects the influential node set via a voting strategy. zhao et al. [ ] introduced coloring technology into complex networks to separate independent node sets, and selected nodes from different node sets, ensuring that selected nodes are not closely connected. hu et al. [ ] and guo et al. [ ] further considered the distance between independent sets and achieved a better performance. bao et al. [ ] sought to find dispersively distributed spreaders by a heuristic clustering algorithm. zhou [ ] proposed an algorithm to find a set of influential nodes via message passing theory. ji et al. [ ] considered percolation in the network to obtain a set of distributed and coordinated spreaders. researchers have also sought to maximize influence by studying communities [ ] [ ] [ ] [ ] [ ] [ ] . zhang [ ] separated graph nodes into communities using the k-medoid method before selecting nodes. gong et al. [ ] divided the graph into communities of different sizes, and selected nodes by using degree centrality and other indicators. chen et al. [ ] detected communities by using the shrink and kcut algorithms. later they selected nodes from different communities as candidate nodes, and used the cdh method to find the final k influential nodes.
recently, some novel methods based on node dynamics have been proposed which rank nodes to select influential spreaders [ , ] . Şirag erkol et al. made a systematic comparison between methods focused on the influence maximization problem [ ] . they classify multiple algorithms into three classes, and made a detailed explanation and comparison between methods. more algorithms in this domain are described and classified clearly by lü et al. in their review paper [ ] . most of the non-greedy strategy methods suffer from the possibility that some spreaders are so close that their influence may overlap. degreediscount and voterank use an iterative selection strategy: after a node is selected, they weaken its neighbors' influence to cope with the rich club phenomenon. however, these two algorithms make only rough use of nodes' local information. besides, they do not further make use of the differences between nodes when weakening nodes' influence. in this paper, we propose a new heuristic algorithm named enrenew, based on node entropy, to select a set of influential nodes. enrenew also uses an iterative selection strategy. it initially calculates the influence of each node by its information entropy (further explained in section . ), and then repeatedly selects the node with the largest information entropy and renews its l-length reachable nodes' information entropy by an attenuation factor until a specified number of nodes has been selected. experiments show that the proposed method yields the largest final affected scale on real networks in the susceptible-infected-recovered (sir) simulation model compared with state-of-the-art benchmark methods. the results reveal that enrenew could be a promising tool for related work.
besides, to make the algorithm practically more useful, we provide enrenew's source code and all the experiment details at https://github.com/yangliangwei/influential-nodes-identification-in-complex-networks-via-information-entropy, and researchers can download it freely for their convenience. the rest of the paper is organized as follows: the identification method is presented in section . experiment results are analyzed and discussed in section . conclusions and future research topics of interest are given in section . the best way to measure the influence of a set of nodes in complex networks is through a dynamic propagation process on real-life network data. the susceptible-infected-removed model (sir model) was initially used to simulate the dynamics of disease spreading [ ] . it was later widely used to analyze similar spreading processes, such as rumor [ ] and population [ ] . in this paper, the sir model is adopted to objectively evaluate the spreading ability of nodes selected by algorithms. each node in the sir model can be classified into one of three states, namely, susceptible nodes (s), infected nodes (i), and recovered nodes (r). at first, the initially selected nodes are set to the infected state and all others in the network to the susceptible state. in each propagation iteration, each infected node randomly chooses one of its direct neighbors and infects it with probability µ. in the meantime, each infected node recovers with probability β and cannot be infected again. in this study, λ = µ/β is defined as the infection rate, which is crucial to the spreading speed in the sir model. apparently, the network can reach a steady state with no infection after enough propagation iterations. to enable information to spread widely in networks, we set µ = . µ_c, where µ_c = ⟨k⟩/(⟨k⟩² − ⟨k⟩) [ ] is the spreading threshold of the sir model, and ⟨k⟩ is the average degree of the network. when µ is smaller than µ_c, spreading in the sir model can only affect a small range, or the disease may not spread at all.
When µ is much larger than µ_c, nearly all methods can affect the whole network, which makes comparison meaningless; thus, we select µ around µ_c in the experiments. During the SIR propagation described above, enough information can be obtained to evaluate the impact of the initially selected nodes in the network, and the metrics derived from the procedure are explained below. The influential-node selection algorithm proposed in this paper is named EnRenew after the concept behind it: EnRenew introduces entropy and renews the nodes' entropy through an iterative selection process. EnRenew is inspired by the VoteRank algorithm proposed by Zhang et al. [ ], where influential nodes are selected in an iterative voting procedure. VoteRank assigns each node a voting ability and a score; initially, each node's voting ability toward its neighbors is 1. After a node is selected, its direct neighbors' voting ability is decreased by 1/⟨k⟩, where ⟨k⟩ = 2m/n is the average degree of the network. VoteRank thus roughly assigns all nodes in the graph the same voting ability and attenuation factor, which ignores each node's local information. To overcome this shortcoming, we propose the heuristic algorithm EnRenew, described as follows. In information theory, information quantity measures the information brought by a specific event, and information entropy is the expectation of the information quantity. These two concepts were introduced into complex networks in references [ ] [ ] [ ] to calculate node importance. The information entropy of any node v can be calculated by E_v = Σ_{u∈Γ_v} H_uv with H_uv = −p_uv log p_uv, where p_uv = d_u / Σ_{l∈Γ_v} d_l (so that Σ_{l∈Γ_v} p_lv = 1), Γ_v denotes node v's direct neighbors, and d_u is the degree of node u. H_uv is the spreading ability provided from u to v, and E_v is node v's information entropy, indicating its initial importance, which is renewed as described in the algorithm. A detailed calculation of node entropy is shown in the figure.
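The entropy definition above can be written down directly. In this sketch `adj` maps each node to its neighbor list, and neighbor-list lengths stand in for the degrees d_u; the function name is ours.

```python
import math

def node_entropy(adj, v):
    """E_v = sum over u in Γ_v of H_uv, with H_uv = -p_uv * log(p_uv)
    and p_uv = d_u / (sum of d_l over l in Γ_v)."""
    total = sum(len(adj[l]) for l in adj[v])
    return sum(-(len(adj[u]) / total) * math.log(len(adj[u]) / total)
               for u in adj[v])
```

On a star graph the center's entropy is log of its degree (all neighbors contribute equally), while a leaf's entropy is zero, matching the intuition that hubs with diverse neighborhoods are initially the most important.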
The figure shows in detail how the red node's entropy is calculated: the node's neighbors' degrees determine the probabilities p_uv, and the entropy follows from them. Simply selecting the nodes with the highest degree as initial spreaders might not achieve good results, because most real networks show an obvious clumping phenomenon: high-impact nodes are often closely connected within the same community, so information cannot be copiously disseminated to the whole network. To manage this situation, after each high-impact node is selected, we renew the information entropy of all nodes in its local scope and then select the node with the highest information entropy; the process is shown in the algorithm. Here E_⟨k⟩ = −⟨k⟩ · (1/⟨k⟩) · log(1/⟨k⟩) = log⟨k⟩, where ⟨k⟩ is the average degree of the network, and 1/2^{l−1} is the attenuation factor: the farther a node is from the selected node v, the smaller the impact on it. E_⟨k⟩ can be seen as the information entropy of any node in a ⟨k⟩-regular graph if ⟨k⟩ is an integer. From the algorithm we can see that after a new node is selected, the renewal of its l-length reachable nodes' information entropy depends on H and E_⟨k⟩, which reflect local structure information and global network information, respectively. Compared with VoteRank, EnRenew replaces voting ability by the H value between connected nodes, which carries more local information than the uniform voting ability in VoteRank. At the same time, EnRenew uses H/E_⟨k⟩ as the attenuation factor instead of 1/⟨k⟩ in VoteRank, retaining global information. Computational complexity (usually time complexity) describes the relationship between inputs of different scales and the running time of an algorithm. Generally, brute force can solve most problems accurately, but it cannot be applied in most scenarios because of its intolerable time complexity; time complexity is therefore an extremely important indicator of an algorithm's effectiveness.
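The iterative selection-and-renewal loop can be sketched as follows. The discount (H/E_⟨k⟩) · 1/2^{d−1} for nodes at distance d from the selected node, and E_⟨k⟩ = log⟨k⟩, follow our reading of the text; they are assumptions about the exact constants, not the authors' implementation, and the function name is ours.

```python
import math

def enrenew(adj, r, L=2):
    """Sketch of EnRenew's iterative selection on an adjacency-dict graph:
    pick the node with the largest entropy, then renew the entropies of
    its L-length reachable nodes with a distance-attenuated discount."""
    deg = {v: len(adj[v]) for v in adj}
    k_avg = sum(deg.values()) / len(adj)
    e_k = math.log(k_avg)  # assumed E_<k> = log<k> (entropy in a <k>-regular graph)
    def entropy(v):
        total = sum(deg[l] for l in adj[v])
        return -sum((deg[u] / total) * math.log(deg[u] / total) for u in adj[v])
    ent = {v: entropy(v) for v in adj}
    selected = []
    for _ in range(min(r, len(adj))):
        v = max(ent, key=ent.get)            # node with the largest entropy
        selected.append(v)
        del ent[v]
        dist, frontier = {v: 0}, [v]
        for d in range(1, L + 1):            # renew L-length reachable nodes
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = d
                        nxt.append(w)
                        if w in ent:
                            total = sum(deg[l] for l in adj[w])
                            p = deg[u] / total
                            h = -p * math.log(p)
                            ent[w] -= (h / e_k) / 2 ** (d - 1)
            frontier = nxt
    return selected
```

The sketch assumes ⟨k⟩ > 1 so that E_⟨k⟩ is positive; on a star graph it first picks the hub, then the (equal-entropy) leaves.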
Through this analysis, the algorithm is shown to identify influential nodes in large-scale networks in limited time. The computational complexity of EnRenew can be analyzed in three parts: initialization, selection, and renewing. Let n, m, and r denote the numbers of nodes, edges, and initial infected nodes, respectively. At the start, EnRenew takes O(n⟨k⟩) = O(m) to calculate the information entropies. Node selection picks the node with the largest information entropy and requires O(n), which can be decreased to O(log n) if the entropies are stored in an efficient data structure such as a red-black tree. Renewing the l-length reachable nodes' information entropy needs O(⟨k⟩^l) = O(m^l/n^l); as the parameter experiments suggest, a small l already yields impressive results at this cost. Since the selection and renewing parts are performed r times to obtain enough spreaders, the final computational complexity is O(m + n) + O(r log n) + O(r⟨k⟩^l) = O(m + n + r log n + r·m^l/n^l). In particular, when the network is sparse and r ≪ n, the complexity decreases to O(n). The algorithm's performance is measured by properties of the selected nodes, namely their spreading ability and their locations. Spreading ability is measured by the infected scale at time t, f(t), and the final infected scale f(t_c), which are obtained from the SIR simulation and widely used to measure the spreading ability of nodes [ , [ ] [ ] [ ] [ ] [ ]]. L_s is obtained from the selected nodes' locations by measuring their dispersion [ ]. The infected scale f(t) gives the influence scale at time t and is defined by f(t) = (n_I(t) + n_R(t))/n, where n_I(t) and n_R(t) are the numbers of infected and recovered nodes at time t, respectively. At the same time step t, a larger f(t) indicates that more nodes have been infected by the initial influential nodes, while a shorter time t indicates that the initial influential nodes spread faster in the network. f(t_c) is the final affected scale when the spreading reaches the steady state; it reflects the final spreading ability of the initial spreaders.
The larger this value, the stronger the spreading capacity of the initial nodes. f(t_c) is defined as the infected scale f evaluated at t_c, where t_c is the time at which the SIR propagation reaches its steady state. L_s is the average shortest path length of the initial infection set S; usually, with larger L_s, the initial spreaders are more dispersed and can influence a larger range. It is defined by L_s = (1/(|S|(|S|−1))) Σ_{u,v∈S, u≠v} l_{u,v}, where l_{u,v} denotes the length of the shortest path from node u to v. If u and v are disconnected, the shortest path length is replaced by d_gc + 1, where d_gc is the largest diameter of the connected components. An example network is used to show the rationality of the nodes chosen by the proposed algorithm. The first three nodes selected by EnRenew are distributed across three communities, while those selected by the other algorithms are not. We further run the SIR simulation on the example network with EnRenew and five other benchmark methods; the detailed results, obtained by averaging over repeated experiments, are shown in the table for an in-depth discussion. The example network consists of three communities at different scales; the first nine nodes selected by EnRenew are marked red. The network typically shows the rich-club phenomenon, that is, nodes with large degree tend to be connected together. The table shows the experimental results when choosing nodes as the initial spreading set. The greedy method is usually used as an upper bound, but it is not efficient in large networks due to its high time complexity. EnRenew and PageRank distribute the selected nodes across the communities, with the distribution matching the community sizes; the nodes selected by the other algorithms, except the greedy method, tend to cluster in a single community. The latter induces spreading within a high-density area, which is not efficient for spreading over the entire network. EnRenew and PageRank can adaptively allocate a reasonable number of nodes based on community size, just as the greedy method does.
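The metrics defined above can be sketched directly; the function names are ours, and the disconnected-pair penalty defaults to the number of nodes as a simple stand-in for d_gc + 1 (an assumption).

```python
from itertools import combinations
from collections import deque

def infected_scale(n_inf, n_rec, n):
    """f(t) = (n_I(t) + n_R(t)) / n."""
    return (n_inf + n_rec) / n

def avg_shortest_path(adj, seeds, disconnected_penalty=None):
    """L_s: mean shortest-path length over all pairs of seed nodes,
    computed with breadth-first search on an adjacency-dict graph."""
    if disconnected_penalty is None:
        disconnected_penalty = len(adj)   # stand-in for d_gc + 1
    def bfs_dist(s, t):
        dist, q = {s: 0}, deque([s])
        while q:
            u = q.popleft()
            if u == t:
                return dist[u]
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        return disconnected_penalty
    pairs = list(combinations(seeds, 2))
    return sum(bfs_dist(u, v) for u, v in pairs) / len(pairs)
```

On a path graph 0–1–2–3, for example, the seed set {0, 3} has L_s = 3, while {0, 2, 3} averages the pair distances 2, 3, and 1.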
The nodes selected by EnRenew have the second-largest average distance after the greedy method, which indicates that EnRenew tends to distribute nodes sparsely in the graph; this aptly alleviates the adverse effect on spreading caused by the rich-club phenomenon. Although EnRenew's average distance is smaller than PageRank's, it achieves a higher final infected scale f(t_c). The result for PageRank also indicates that merely selecting nodes spread widely across the network does not necessarily lead to a larger influence range. EnRenew performs closest to the greedy method at a low computational cost, which shows the proposed algorithm's effectiveness at maximizing influence with a limited number of nodes. Note: n and m are the total numbers of nodes and edges, respectively; ⟨k⟩ = 2m/n is the average node degree; k_max = max_{v∈V} d_v is the maximum degree in the network; and the average clustering coefficient C = (1/n) Σ_{i=1}^{n} 2I_i/(|Γ_i|(|Γ_i|−1)), where I_i denotes the number of edges between the direct neighbors of node i, measures the degree of aggregation in the network. The table describes six different networks, varying from small to large scale, which are used to evaluate the performance of the methods. CEnew [ ] is the list of edges of the metabolic network of C. elegans. Email [ ] is an email user communication network. Hamster [ ] reflects friendship and family links between users of the website http://www.hamsterster.com, where nodes represent web users and edges the relationships between them. The Router network [ ] reflects the Internet topology at the router level. CondMat (condensed matter physics) [ ] is a collaboration network of authors of scientific papers from the arXiv, showing author collaboration on papers submitted to condensed matter physics: a node represents an author, and an edge between two nodes indicates that the two authors have co-authored papers.
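The summary statistics in the table note can be computed as follows; the function name is ours, and the graph is again an adjacency dict.

```python
def network_stats(adj):
    """Table statistics: <k> = 2m/n, k_max, and the average clustering
    coefficient C = (1/n) * sum_i 2*I_i / (|Γ_i| * (|Γ_i| - 1)),
    where I_i counts edges between the direct neighbors of node i."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    k_avg = 2 * m / n
    k_max = max(len(nbrs) for nbrs in adj.values())
    c_sum = 0.0
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue          # clustering undefined for degree < 2
        i_i = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        c_sum += 2 * i_i / (k * (k - 1))
    return k_avg, k_max, c_sum / n
```

A triangle is a quick check: every node's neighborhood is fully connected, so C = 1 and ⟨k⟩ = 2.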
In the Amazon network [ ], each node represents a product, and an edge between two nodes indicates that the two products were frequently purchased together. We first conduct experiments on the parameter l, which is the influence range used when renewing the information entropy: if l = 1, only the direct neighbors of the selected node are renewed; if l = 2, the importance of 2-length reachable nodes is renewed, and so forth. Varying the parameter l shows that a small l gives the best performance in four of the six networks. In the Email network, although slightly larger values of l give marginally better results, the running time increases sharply. Besides, the three degrees of influence (TDI) theory [ ] also states that an individual's social influence extends only within a relatively small range. Based on our experiments, we fix the influence range parameter l at this best-performing value in the subsequent experiments. For a given ratio of initial infected nodes p, a larger final affected scale f(t_c) indicates a more reasonable choice of the parameter l; the best l differs between networks, so in real-life applications l can be used as a tuning parameter. Many factors affect the final propagation scale in networks, and a good influential-node mining algorithm should prove its robustness across networks varying in structure, node count, initial infection set size, infection probability, and recovery probability. To evaluate the performance of EnRenew, the VoteRank, adaptive degree, k-shell, PageRank, and H-index algorithms are selected as benchmark methods for comparison. Furthermore, the greedy method is usually taken as the upper bound for the influence maximization problem, but it is impractical on large networks due to its high time complexity; thus, we add the greedy method as an upper bound only on the two small networks (CEnew and Email).
The final affected scale f(t_c) of each method for different initial infected set sizes shows that EnRenew achieves impressive results on the six networks. On the small networks CEnew and Email, EnRenew clearly outperforms the other benchmark methods, and it nearly reaches the upper bound on the Email network. On the Hamster network, it achieves a large f(t_c) from only a small ratio of initial infected nodes, a substantial improvement over all the other methods. On the CondMat network, the number of affected nodes is many times larger than the number of initial ones, and on the large Amazon network many nodes are affected on average for each selected initial infected node. However, all methods perform unsatisfactorily on the Router network: none yields good results because of the high sparsity of the network, in which information can hardly spread out from a small number of initial spreaders. Comparing the methods, EnRenew surpasses all the others on five networks for nearly all values of p, from small to large. This reveals that EnRenew retains its superiority as the size of the initial infected set varies. It is worth noticing that EnRenew performs about the same as the other methods when p is small but improves more strongly as the initial infected ratio p rises. This phenomenon shows the rationality of the importance-renewing process: the renewing process of EnRenew influences more nodes when p is larger, and the larger improvement over the other methods shows that the renewing process reasonably redistributes the nodes' importance. A time-step experiment is made to assess the propagation speed for a fixed number of initial infected nodes; the exact results of f(t) varying with time step t are shown in the figure.
From this experiment, it can be seen that with the same number of initial infected nodes, EnRenew always reaches a higher peak than the benchmark methods, which indicates a larger final infection rate. In the steady stage, EnRenew surpasses the best benchmark method in final affected scale on each of the CEnew, Email, Hamster, Router, CondMat, and Amazon networks. In terms of propagation speed, EnRenew reaches its peak within fewer time steps on each network, always taking less time to influence the same number of nodes compared with the other benchmark methods. It can also be seen that k-shell performs worst from the early stage in all the networks: nodes with high core values tend to cluster together, which makes information hard to disseminate. Especially on the Amazon network, after enough time steps all the other methods reach an f(t) more than twice as large as k-shell's. In contrast to k-shell, EnRenew spreads the fastest from the early stage to the steady stage. This shows that the proposed method not only achieves a larger final infection scale but also a faster rate of propagation. In real-life situations, the infection rate λ varies greatly and has a huge influence on the propagation procedure; different λ represent viruses or information with different spreading abilities. From the experiments with different λ and methods, it can be observed that in most cases EnRenew surpasses all other algorithms as λ varies over the tested range on all networks. Besides, the results on CEnew and Email show that EnRenew nearly reaches the upper bound, indicating a stronger generalization ability compared with the other methods. In particular, EnRenew shows impressive superiority in the strong-spreading experiments where λ is large.
Generally speaking, if the selected nodes are widely spread across the network, they tend to have an extensive influence on information spreading over the entire network. L_s is used to measure the dispersion of the initial infected nodes chosen by each algorithm. The results of L_s for nodes selected by the different algorithms on the different networks show that, except for the Amazon network, EnRenew always has the largest L_s, indicating that its selected nodes are widespread. Especially on CEnew, EnRenew performs far beyond all the other methods, with an L_s nearly as large as the upper bound. As for the large-scale Amazon network, it contains many small cliques, and k-shell selects dispersed cliques, which gives k-shell the largest L_s there; the other experimental results of k-shell, however, show poor performance. This further confirms that EnRenew does not naively distribute the selected nodes widely across the network, but rather selects them based on the potential propagation ability of each node. In the figure comparing the spreading speed of the different methods, each subfigure shows the results on one network; the ratio of initial infected nodes is fixed per network (one value for CEnew, Email, Hamster, and Router, and smaller values for CondMat and Amazon), and the results are obtained by averaging over independent runs with a fixed spreading rate λ in SIR. For the same spreading time t, a larger f(t) indicates a larger influence scale in the network, revealing a faster spreading speed. It can be seen from the figures that EnRenew spreads apparently faster than the other benchmark methods on all networks; on the small networks CEnew and Email, EnRenew's spreading speed is close to the upper bound. In the figure testing the algorithms' effectiveness under different spreading conditions, each subfigure again shows the results on one network, with the same per-network ratios of initial infected nodes.
The results are obtained by averaging over independent runs. Different infection rates λ of SIR imitate different spreading conditions, and EnRenew attains a larger final affected scale f(t_c) across different λ than all the other benchmark methods, which indicates that the proposed algorithm generalizes better to different spreading conditions. A further figure analyzes the average shortest path length L_s of the nodes selected by the different algorithms; each subfigure shows the results on one network, and p is the ratio of initial infected nodes. Generally speaking, a larger L_s indicates that the selected nodes are more sparsely distributed in the network. The nodes selected by EnRenew have the clearly largest L_s on five networks, showing that EnRenew tends to select sparsely distributed nodes. The influential-node identification problem has been widely studied by scientists from computer science through to all disciplines [ ] [ ] [ ] [ ] [ ], and various algorithms have been proposed to solve particular problems in this field. In this study, we proposed a new method named EnRenew by introducing entropy into complex networks, and the SIR model was adopted to evaluate the algorithms. Experimental results on real networks varying from small to large in size show that EnRenew is superior to state-of-the-art benchmark methods in most cases. Besides, with its low computational complexity, the presented algorithm can be applied to large-scale networks. EnRenew can also be applied to rumor control, advertisement targeting, and many other related areas. Still, many challenges in influential-node identification remain from different perspectives; from the perspective of network size, how to mine influential spreaders in large-scale networks efficiently is a challenging problem.
In the area of time-varying networks, most networks change constantly, which poses the challenge of identifying influential spreaders whose identity may shift with the changing topology. As for multilayer networks, they contain information from different dimensions with interactions between layers and have attracted a lot of research interest [ ] [ ] [ ]. To identify influential nodes in multilayer networks, we need to further consider how to better combine information from the different layers and the relations between them.

References (titles as recovered from the extracted text):
- the scientific collaboration networks in university management in brazil
- arenas, a. self-similar community structure in a network of human interactions
- insights into protein-dna interactions through structure network analysis
- statistical analysis of the indian railway network: a complex network approach
- social network analysis
- network analysis in the social sciences
- prediction in complex systems: the case of the international trade network
- the dynamics of viral marketing
- extracting influential nodes on a social network for information diffusion
- structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review
- efficient immunization strategies for computer networks and populations
- a study of epidemic spreading and rumor spreading over complex networks
- epidemic processes in complex networks
- unification of theoretical approaches for epidemic spreading on complex networks
- epidemic spreading in time-varying community networks
- suppression of epidemic spreading in complex networks by local information based behavioral responses
- efficient allocation of heterogeneous response times in information spreading process
- absence of influential spreaders in rumor dynamics
- a model of spreading of sudden events on social networks
- daniel bernoulli's epidemiological model revisited
- herd immunity: history, theory, practice
- epidemic disease in england: the evidence of variability and of persistency of type
- infectious diseases of humans: dynamics and control
- thermodynamic efficiency of contagions: a statistical mechanical analysis of the sis epidemic model
- a rumor spreading model based on information entropy
- an algorithmic information calculus for causal discovery and reprogramming systems
- the hidden geometry of complex, network-driven contagion phenomena
- extending centrality
- the h-index of a network node and its relation to degree and coreness
- identifying influential nodes in complex networks
- identifying influential nodes in large-scale directed networks: the role of clustering
- collective dynamics of 'small-world' networks
- identification of influential spreaders in complex networks
- ranking spreaders by decomposing complex networks
- eccentricity and centrality in networks
- the centrality index of a graph
- a set of measures of centrality based on betweenness
- a new status index derived from sociometric analysis
- mutual enhancement: toward an understanding of the collective preference for shared information
- factoring and weighting approaches to status scores and clique identification
- dynamical systems to define centrality in social networks
- the anatomy of a large-scale hypertextual web search engine
- leaders in social networks, the delicious case
- using mapping entropy to identify node centrality in complex networks
- path diversity improves the identification of influential spreaders
- how to identify the most powerful node in complex networks? a novel entropy centrality approach
- a novel entropy-based centrality approach for identifying vital nodes in weighted networks
- node importance ranking of complex networks with entropy variation
- key node ranking in complex networks: a novel entropy and mutual information-based approach
- a new method to identify influential nodes based on relative entropy
- influential nodes ranking in complex networks: an entropy-based approach
- discovering important nodes through graph entropy: the case of enron email database
- identifying node importance based on information entropy in complex networks
- ranking influential nodes in complex networks with structural holes
- ranking influential nodes in social networks based on node position and neighborhood
- detecting rich-club ordering in complex networks
- maximizing the spread of influence through a social network
- efficient influence maximization in social networks
- identifying sets of key players in a social network
- a shapley value-based approach to discover influential nodes in social networks
- identifying a set of influential spreaders in complex networks
- identifying effective multiple spreaders by coloring complex networks
- effects of the distance among multiple spreaders on the spreading
- identifying multiple influential spreaders in term of the distance-based coloring
- identifying multiple influential spreaders by a heuristic clustering algorithm
- spin glass approach to the feedback vertex set problem
- effective spreading from multiple leaders identified by percolation in the susceptible-infected-recovered (sir) model
- finding influential communities in massive networks
- community-based influence maximization in social networks under a competitive linear threshold model
- a community-based algorithm for influence blocking maximization in social networks
- detecting community structure in complex networks via node similarity
- community structure detection based on the neighbor node degree information
- community-based greedy algorithm for mining top-k influential nodes in mobile social networks
- identifying influential nodes in complex networks with community structure
- an efficient memetic algorithm for influence maximization in social networks
- efficient algorithms for influence maximization in social networks
- local structure can identify and quantify influential global spreaders in large scale social networks
- identifying influential spreaders in complex networks by propagation probability dynamics
- systematic comparison between methods for the detection of influential spreaders in complex networks
- vital nodes identification in complex networks
- sir rumor spreading model in the new media age
- stochastic sir epidemics in a population with households and schools
- thresholds for epidemic spreading in networks
- a novel top-k strategy for influence maximization in complex networks with community structure
- identifying influential spreaders in complex networks based on kshell hybrid method
- identifying key nodes based on improved structural holes in complex networks
- ranking nodes in complex networks based on local structure and improving closeness centrality
- an efficient algorithm for mining a set of influential spreaders in complex networks
- the large-scale organization of metabolic networks
- the koblenz network collection
- the network data repository with interactive graph analytics and visualization
- measuring isp topologies with rocketfuel
- graph evolution: densification and shrinking diameters
- defining and evaluating network communities based on ground-truth
- the spread of obesity in a large social network over years
- identifying the influential nodes via eigen-centrality from the differences and similarities of structure
- tracking influential individuals in dynamic networks
- evaluating influential nodes in social networks by local centrality with a coefficient
- a survey on topological properties, network models and analytical measures in detecting influential nodes in online social networks
- identifying influential spreaders in noisy networks
- spreading processes in multilayer networks
- identifying the influential spreaders in multilayer interactions of online social networks
- identifying influential spreaders in complex multilayer networks: a centrality perspective

We would also like to thank Dennis Nii Ayeh Mensah for helping us revise the English of this paper. The authors declare no conflict of interest.

key: cord- -w k r z authors: arazi, r.; feigel, a. title: discontinuous transitions of social distancing date: - - journal: nan doi: nan sha: doc_id: cord_uid: w k r z

The first wave of COVID-19 changed social distancing around the globe: severe lockdowns to stop the pandemic at the cost of state economies preceded a series of lockdown lifts. To understand the dynamics of social distancing, it is important to combine basic epidemiological models of viral spread (such as SIR) with game-theory tools, for example a utility function that quantifies individual or government forecasts of epidemic damage and economic cost as functions of social distancing. Here we present a model that predicts a series of discontinuous transitions in social distancing after the pandemic's climax. Each transition resembles a phase transition and may therefore be a general phenomenon. Data analysis of the first wave in Austria, Israel, and Germany corroborates the soundness of the model. Besides, this work presents analytical tools to analyze pandemic waves.

Pandemics are complex medical and socioeconomic phenomena [ , ]: close social interactions benefit both the spread of disease and a significant part of the modern economy [ , ]. During COVID-19, most governments and individuals accepted significant limitations on interpersonal contacts (so-called social distancing) to reduce the pandemic at the cost of the economy.
Can the dynamics of social distancing be explained by a self-contained socio-epidemiological model [ ] [ ] [ ], or are they completely subject to extrinsic effects [ ]? This is a crucial question for sociophysics, a field that tries to describe social human behavior as a physical system. Social distancing changes as a pandemic unfolds, reflecting changes in personal and government beliefs about the future. The spread of disease increases the individual probability of becoming sick and may collapse the national health system, while social distancing reduces interpersonal interactions and complicates some individual production. The level of social distancing corresponds to an equilibrium between future beliefs about the epidemic size and the economic cost [ , ]. To address social distancing, classical epidemiological modeling is extended with economic or game-theoretic tools [ - , , ]. The SIR model [ ] describes disease spread as a gas-like or network-like [ ] interaction of susceptible, infected, and recovered individuals: an infected person transmits the disease to susceptibles from the moment of infection until he or she recovers. Social distancing reduces inter-person interaction and thus the effectiveness of transmission, but at the same time claims a significant economic price. To find an equilibrium level of social distancing, a utility function quantifies the negative weight of the epidemic, of social distancing itself, and of the economic cost of social distancing. The utility-function extension of the SIR model is the focus of game-theoretic treatments of vaccine policies [ ] and of recent estimates of the economic damage of COVID-19 due to social distancing [ , , , [ ] [ ] [ ]]. * sasha@phys.huji.ac.il Here we show that social distancing during a pandemic may undergo discontinuous changes that to some extent resemble phase transitions. In this work, an epidemic starts to unfold according to the SIR model; at each moment, social distancing depends on a utility function that quantifies the final epidemic size and the economic cost of social distancing.
Optimal values of social distancing correspond to maxima of the utility function, and discontinuous transitions correspond to abrupt changes in the locations of these maxima. This work presents an analytical expression of the general utility function in polynomial form; the utility function thus resembles the free energy of a system undergoing a Ginzburg-Landau phase transition [ ] [ ] [ ] [ ] and may have some level of universality. The first component of the utility function is the economic gain or loss from a decrease or increase of social distancing; thus the social distancing parameter should capture changes in the number of individuals who produce or take part in productive mingling [ , ]. This work associates social distancing with s_th = 1/R, where R is the basic reproductive number. R is a major parameter of the SIR model: the average number of susceptibles that an infected person infects. An epidemic breaks out when R > 1. The network interpretation maps the SIR model onto a network with edge occupancy T = 1 − exp(−s_th) [ ]. The utility function, like all other parameters of the model, depends on the network topology and on s_th; this work considers only changes in s_th. The association of social distancing transitions solely with s_th is supported by the fit to COVID-19 data. The second component of the utility function comprises the epidemic cost, which favors social distancing. In this work, this cost corresponds to the individual's future probability of becoming sick, which is proportional to the forward number of infected until the end of the epidemic wave, i.e., the final epidemic size (FES). An important contribution of this work is an analytical presentation of the derivatives of the FES with respect to the basic reproductive number with the help of the Lambert W function [ ] [ ] [ ].
the work proceeds with the presentation of the sir model with induced transitions (sirit), an almost analytical treatment of this model, and the calibration of the epidemic and economic parameters of the model using time series of active cases and casualties during the st wave of covid- in austria, israel, and germany, followed by a discussion of the obtained results and their implications. the classical sir model separates the population into three compartments: s susceptible, i infected and r recovered. the flux between compartments goes in the order s → i → r: a susceptible becomes infected at frequency β during an encounter with an infected one, and an infected recovers after an average time γ⁻¹. the population is well mixed and sustains a gas-like interaction of its members. during the covid- pandemic, daily worldwide reports [ ] include active cases and coronavirus deaths. active cases are detected infected, which are a fraction of the total infected. thus in this work we redefine i as active cases, and instead of recovered r we will use deceased d = m r, where m is the infected fatality rate (ifr). ifr is the average probability of an infected individual dying (for instance, the reported covid- ifr in germany is m ≈ . % [ ] ). here are the sir equations modified for this work: where i is the reported absolute amount of active cases [ ] , a′ is the ratio between actual and reported active cases, n is the population size, 0 < s < 1 is the normalized (s = S/n) amount of susceptible and d are the reported deaths due to the epidemic. when the population size n and a′ remain constant it is convenient to unite them into a single parameter. the parameter s_th represents social distancing in this work. it also represents a threshold value for the ratio of susceptible in the population: the number of infected grows when s > s_th and shrinks when s < s_th. besides, s_th = 1/r, where r = β/γ is the basic reproduction number. in this work s_th changes with time according to the utility function u(s_th).
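as a rough numerical illustration of this modified system (the explicit numbered equations and the a′, n bookkeeping are elided in the extraction), the following forward-euler sketch integrates normalized s, i, r with deaths d = m·r; the function name and all parameter values are hypothetical:

```python
def simulate_sir(s0=0.999, i0=0.001, beta=0.3, gamma=0.1, m=0.005,
                 dt=0.1, days=200):
    """forward-euler integration of
       s' = -beta*s*i,  i' = beta*s*i - gamma*i,  r' = gamma*i,
    with deaths d = m*r (m: infected fatality rate).
    all compartments are normalized by the population size."""
    s, i, r = s0, i0, 0.0
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt   # newly infected this step
        rec = gamma * i * dt          # newly recovered this step
        s -= new_inf
        i += new_inf - rec
        r += rec
    return s, i, r, m * r
```

with r = β/γ = 3 > 1 the sketch produces an epidemic wave that depletes the susceptible pool while conserving s + i + r.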
at each moment t the value of s_th^current changes if there exists s_th^new such that: the utility function consists of two parts that represent the pandemic cost c and the economic gain g: where c is proportional to the final epidemic size fes [ , ] : the epidemic cost changes with time as the population advances in (s, i) space. the economic gain is some general function g(s_th). let us expand the economic gain in a taylor series around s_th^current: the epidemic cost c is expanded in a taylor series of the third order: the coefficients a, b, c (like the epidemic cost itself ( )) depend on s_th, s, i. an expansion of the third order, unlike the second order in ( ), is required due to the non-linear behavior of fes ( ) and because the pandemic cost prevents a significant lift of social distancing constraints (the third term in ( )). this work considers only a decrease in social distancing, ∆s_th < 0, which corresponds to the reopening of the economy. it is a consequence of the assumption that a_close ≫ a_open: even a small constraint on social interactions brings significant economic damage. a transition occurs if the utility function ( ), taking into account ( ) and ( ): possesses any positive values for ∆s_th < 0. to consider only relative changes in s_th, we set u(s_th^current) = 0. the utility function ( ) together with condition ( ) possesses a ginzburg-landau-like instability, see figure . first, no transition occurs if u < 0 for all s_th. second, a discontinuous change in s_th takes place if there is a single value with u(s_th) > 0. third, s_th changes continuously when the derivatives of ( ) vanish near ∆s_th = 0. a discontinuous transition occurs when there exists a single root ∆s_th < 0 for u = 0 ( ). this condition requires the discriminant of the quadratic function u/∆s_th ( ) to vanish: (dashed red line). two cases (dotted lines) that would make possible a change of s_th to many values do not exist because either a continuous or a discontinuous transition occurs before. this work considers only the opening of the population, which corresponds to a reduction of s_th.
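the three cases above can be sketched as a small classifier on the cubic utility u(∆s_th) = a·∆s_th + b·∆s_th² + c·∆s_th³; the function name, return labels and tolerance handling are illustrative, not the paper's:

```python
def transition_type(a, b, c, tol=1e-9):
    """classify the change of s_th from the cubic utility
       u(ds) = a*ds + b*ds**2 + c*ds**3,  considered for ds < 0.
    a discontinuous jump needs u = 0 at a single ds < 0, i.e. a
    double root of the quadratic u/ds = a + b*ds + c*ds**2, whose
    discriminant b**2 - 4*a*c must then vanish."""
    disc = b * b - 4.0 * a * c
    if disc < -tol:
        return "none"                  # u < 0 for all ds: no transition
    if abs(disc) <= tol and c != 0 and -b / (2.0 * c) < 0:
        return "discontinuous"         # single touching point at ds < 0
    return "continuous-or-none"        # needs the full condition on u
```

for example, coefficients (a, b, c) = (-1, -2, -1) give a vanishing discriminant with the double root at ∆s_th = -1, the discontinuous case.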
the utility function is approximately a cubic function of ∆s_th (solid green). the transition predicted by the exact calculation (solid blue) shows changes in the time and strength of the transition that are insignificant for this work. the corresponding ∆s_th: the new s_th^new is: the coefficients a, b, c possess an analytic approximation, while condition ( ) reduces to a polynomial of the th order. the first two equations of ( ) have a solution in the form of the lambert w function [ ] [ ] [ ] : where (s , i ) are the initial values for the ratio of susceptible s and the amount of infected i correspondingly, see appendix a. eq. ( ) is valid for any (s, i) on the trajectory in time that initiates at (s , i ). the parameter s reaches its smallest value s_min when there are no more infected (i = 0) at t = ∞: following ( ), fes at any time t is: and the parameters a, b, c in ( ) are the taylor coefficients: the parameters ( ) are polynomials of log s of the order , and correspondingly, see appendix b. thus condition ( ) is a polynomial of the th order (a quartic function) of log s, with coefficients that are functions of (s , i , s, i, s_th, β, a). consider a population at state (s , i , s, i, s_th, β, a), see figure . transitions of s_th take place until ( ) predicts ∆s_th = 0. following ( ) and ( ) this happens when the first two derivatives of ( ) vanish. the time for this infinite number of transitions to take place remains finite because the time to pass between any two values of s is finite, see ( ) . after the limit of transitions, the utility function preserves the continuous transition state. otherwise, if s_th remained constant, the utility function would make possible many values of s_th with u(s_th) > 0, see figure . in the region of continuous transitions, at each moment the equation: defines s_th, and ( ) is solved numerically. transitions of s_th result in discontinuities of the time derivatives of s and i. these derivative discontinuities can be detected, see figure .
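the i = 0 limit of the sir invariant i + s − s_th·log s = const gives s_min, which the text expresses through the lambert w function; a minimal sketch solving the same scalar equation by bisection (avoiding a lambert w dependency; the function name is hypothetical):

```python
import math

def final_size_smin(s0, i0, s_th):
    """solve  s - s_th*log(s) = s0 + i0 - s_th*log(s0)  for the
    root s_min < s_th (the i = 0 endpoint of the sir invariant
    i + s - s_th*log(s) = const). the closed form is
    s_min = -s_th * W(-(s0/s_th) * exp(-(s0 + i0)/s_th));
    bisection is used here instead of a lambert w routine."""
    c = s0 + i0 - s_th * math.log(s0)
    g = lambda s: s - s_th * math.log(s) - c
    lo, hi = 1e-15, s_th      # g decreases on (0, s_th): g(lo) > 0 > g(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

with r = 3 (s_th = 1/3) and s0 ≈ 1, this yields s_min ≈ 0.06, i.e. roughly 94% of the population is eventually infected, the classical final-size result.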
the sirit model provided a successful fit of covid- data in austria, germany and israel. austria and israel are countries with similar population sizes and with similar policies during the initial stages of the st wave. germany is a country with about × the population size that still demonstrates sir-like behavior during the first wave. the main purpose of the fit is to show that the self-contained sirit model is capable of describing the dynamics of the st covid- wave. full optimization of the covid- data fit and its validation is out of the scope of this work. the fit proceeds in the following steps: first, a small region around the greatest number of infected is fitted, see ( ) and ( ) . the complete dynamics of the first-wave active cases and susceptible, see figure , follows eqs. ( ) and the fitted s , i , s_th, β, a, n together with the transition locations t i , s i tr and strengths s i th . see figure for the changes in s_th. to fit casualties, an effective population size n is chosen to fit the reported coronavirus deaths at the th day of the first wave. it predicted quite a small n, less than / of the austrian population. this result is addressed in the discussion. besides, there exists some time shift between the calculated and reported coronavirus deaths. two alternative fits of the austria covid- data are given in table i. table i summarizes the results for all three countries. all countries demonstrated a low size of the effective population. one of the assumptions is that s ≈ ; in the case of israel it required to be constrained. none of the deviations from the fit refutes the main results of this work. this work introduces sirit, a standard susceptible-infected-recovered (sir) model extended with a utility function that predicts induced transitions (it) of social distancing. the model provides an almost analytical treatment and a reasonable but ambiguous fit of the covid- st wave active cases and casualties.
let us summarize and discuss the main assumptions, results, and implications of this work together with some alternative approaches. the validity of the predicted discontinuous dynamics of social distancing and of the results depends on the specific choice of the social distancing parameter and the choice of the utility function. the choice of s_th = 1/r as a social distancing parameter is not unique. in the framework of sir, for instance, another valid candidate is β, the probability of disease transmission per unit time [ , ] . this parameter depends both on the clinics of the infection and on the mechanism of interpersonal interactions. the fit of the first social distancing transitions (deviations from the sir model) after the covid- climax in austria, germany, and israel, however, demonstrates that β remains constant during the transitions. the purpose of the fit is to show that there is a possibility to fit the real data using many transitions. a) active cases: reported (dashed green) and calculated with many transitions (solid blue). before the transitions (red bars) the classical sir fits the reported active cases well. there is a significant deviation of sir from the reported cases after the first transition (dashed blue). the sirit model with many transitions fits well over the entire range of the first wave, though the fit was obtained using a small range around the peak of active cases and the characteristics of the first transition (horizontal error bars). b) susceptibles and the social distancing parameter s_th. at each transition s_th changes its value. a series of discontinuous transitions is followed by a region of continuous change. c) coronavirus deaths. there exists a time delay between reported and calculated deaths. this delay can be explained by the long course of covid- . an interesting conclusion of this work is that a change in social distancing corresponds to γ, the rate of becoming immune. a transition corresponds to changes in s_th = γ/β while β remains constant.
this may seem a fallacy, because γ appears to depend only on the clinics of the epidemic. even so, social distancing affects γ: a society on alert removes contagious individuals by distancing from confirmed sick or even from asymptomatic cases that had contact with a sick person. alternatively, 1/s_th = r can serve as a social distancing parameter. this choice does not change the major predictions or analytical developments of this work. the parameter 0 < s_th < 1, which can be compared with the fraction of susceptible in a population, serves the purpose of this work better. the price of a pandemic can go beyond the final epidemic size (fes). for instance, one may include the time derivatives of infected as a psychological factor that affects individual decision making. there is a lot of room to make the utility function more complicated. the relative weights of different pandemic characteristics in individual or government decision making are an important question for future investigations and out of the scope of this work. the first wave of covid- in austria, germany, and israel was fitted using sirit in two different ways: first, a series of discontinuous transitions with constant economic parameters, where the parameters a_open, b_open are fitted by the first transition; second, economic weights fitted for every candidate transition (deviation from the sir model). figure . fit of the israel covid- st wave. the results are similar to the case of austria. the fit is valid until the beginning of the nd wave, at about the th day of the first one. a) active cases. a significant deviation exists between reported and calculated active cases even before the start of the second wave. b) susceptible and s_th. the social distancing parameter s_th remains a bit higher in israel than in austria or germany. c) coronavirus deaths. the time delay between reported and calculated cases is smaller than in the case of austria. it can be explained either by late or early reports of coronavirus tests or reported deaths in israel or austria. d) alternative fit with two transitions. both these scenarios include discontinuous changes in social influence and have the same first transition. all fit attempts of this work need the parameters of the sir model to be constant from the pandemic climax (greatest number of infected) till the first transition. deviations from the fit may reflect changes of regulation and test policies during the first wave. the analysis of the first wave predicts a small, less than / , effective population size in all three tested countries, see table i . the estimate of the effective population size depends on the choice of m, the infected fatality rate (ifr). an increase/reduction in m causes a proportional reduction/increase in the effective population size n and in the predicted ratio a′ between reported and real numbers of infected. the values in table i correspond to m = . %. the reported value of m for germany is . % [ ] . thus n and a′ may be about × lower than in table i. nevertheless, m as low as . % was reported [ ] . all other predictions and results of this work, including the graphs, are independent of m. a mortality rate m > . % causes a non-physical a′ < 1 in the case of israel. an explanation of the low n may be that the initial lockdown separated the population into disconnected domains [ , ] and the wave of the epidemic occurred in a limited number of domains. the other possible explanation is that a significant part of the population is immune to covid- [ ] . finally, the sir approach with quasi-constant parameters may be an oversimplified presentation of reality. the small effective population size during the first wave may indicate a danger of an abrupt transition to a bigger population size when s_th reduces below some critical value. it may result in a significant second wave of the epidemic.
a critical value of the basic reproduction number was reported for some interaction networks [ , ] , while the majority of networks lack it [ , ] . to conclude, this work predicts observable transitions of social distancing and provides tools for the quantitative analysis of pandemic waves. observable phenomena are essential to test the validity of human behavior modeling. the tools, like the sir model itself, contribute to social epidemiology [ ] and to the spread of non-contagious but "going-viral" phenomena [ ] . appendix a: analytic solution of ( ). the first two equations of ( ) may be rewritten as: using the transformation ∂log i/∂t = z, ∂z/∂t = (∂z/∂log i)(∂log i/∂t) = (∂z/∂log i)·z, with x = log i. integration of (a ) results in: note that s_th, f(s_th) and w(f(s_th)) are constant along the trajectory (s_t, i_t) until the value of s_th changes by a transition. the values f(s_th) and w(f(s_th)): the derivatives of log w(f) with respect to f are invariant along the (s_t, i_t) trajectory, see appendix b. the derivatives of f with respect to s_th are polynomials of log s_t.
consider the first derivative of (a ), taking into account (a ): where: expression (a ) can be rewritten in the form: ai − s min + s th log s s + s − s min s th ( ai − s min + s ) + s min s th (ai − s min + s ) (ai + s min + s ) + s min s th log s s × (c ) s th ( ai + s min + s ) + s min (ai − s min + s ) + s th log s s ( s min + s th ) − s th + s min (ai − s min + s ) − s min s th + s th

appendix b: derivatives of the lambert w function. all derivatives of w with respect to f are constant along any trajectory in (s, i) space: the final expressions are invariant until s_th changes because they depend on f and w only, see (a ).

appendix c: final expressions for a, b, c. the first: the second:

key: cord- - af authors: lee, duan-shin; zhu, miao title: epidemic spreading in a social network with facial masks wearing individuals date: - - journal: nan doi: nan sha: doc_id: cord_uid: af

in this paper, we present a susceptible-infected-recovered (sir) model with individuals who wear facial masks and individuals who do not. the disease transmission rates, the recovering rates and the fraction of individuals who wear masks are all time dependent in the model. we develop a progressive estimation of the disease transmission rates and the recovering rates based on the covid- data published by johns hopkins university. we determine the fraction of individuals who wear masks by a maximum likelihood estimation, which maximizes the transition probability of a stochastic susceptible-infected-recovered model. the transition probability is numerically difficult to compute if the number of infected individuals is large. we develop an approximation for the transition probability based on the central limit theorem and a mean field approximation. we show through a numerical study that our approximation works well.
we develop a bond percolation analysis to predict the eventual fraction of the population who are infected, assuming that the parameters of the sir model do not change anymore. we predict the outcome of the covid- pandemic using our theory. in december of , a few patients of a new infectious respiratory disease were detected in wuhan, china. this disease has been called coronavirus disease (covid- ) and the virus that causes covid- has been named sars-cov- by the world health organization (who). who declared the outbreak a public health emergency of international concern at the end of january , and a pandemic on march , . since the outbreak, most countries have adopted various measures in an attempt to contain the pandemic. these measures include restriction of traveling, shutting down schools, restaurants and businesses, canceling large gatherings such as concerts, sports and religious activities, and even city lockdowns where residents are not allowed to leave home except in emergencies. clearly these measures seriously affect daily lives and are devastating to the economy. the purpose of this report is to show that wearing facial masks is a simple and inexpensive measure to contain the spread of covid- . in fact, we shall show that if a relatively small fraction of the population wears facial masks, the disease can be contained. facial masks have been shown in labs to be effective in limiting the spread of droplets or aerosols when a wearer coughs [ ] - [ ] . this ability is measured by a quantity called the outward mask filter efficiency. facial masks also protect their wearers from inhaling droplets or aerosols from a nearby cougher, if the cougher does not wear a mask. this ability is measured by a quantity called the inward mask filter efficiency. thus, facial masks can be particularly useful in confining the spreading of diseases that transmit through droplets or aerosols.
however, the effect of wearing facial masks on epidemic spreading has never been studied at the network level. in this report we present an epidemic network study to justify this argument. specifically, we propose a time dependent susceptible-infected-recovered (sir) model with two types of individuals. type 1 individuals wear a facial mask and type 2 individuals do not. a randomly selected individual from a population is a type 1 individual with probability p, and is of type 2 with probability 1 − p. there are four types of contacts between two individuals depending on whether the two individuals wear a facial mask or not. these four types of contacts have four different disease transmission rates. from the data published by johns hopkins university [ ] we progressively estimate the time dependent disease transmission rates and the recovery rates of the sir model. for parameter p, we propose a stochastic version of the sir model. we derive the transition probability of the number of infected individuals from one time slot to the next. we propose a maximum likelihood estimation of p that maximizes the transition probability. the transition probability is expressed in terms of binomial distributions. the parameters of the binomial distributions corresponding to the real data published in [ ] are typically very large. that makes the transition probability numerically difficult to compute. we propose an approximation of the transition probability based on the central limit theorem and a mean field approximation. through numerical studies, we show that the approximation works well. we derive a percolation analysis of the maximum number of individuals that can eventually be infected. we incorporate the maximum likelihood estimation and the percolation analysis into the progressive estimation. that is, based on the data published by johns hopkins university, we progressively estimate the disease transmission rates and the recovery rates. we then find p from the maximum likelihood estimation.
finally, using the percolation analysis, we predict the maximum number of individuals that can eventually be infected. the outline of this report is as follows. in section ii, we present a time dependent sir model together with a progressive estimation of its disease transmission rates and recovery rates. in section iii, we present a maximum likelihood method to estimate p. in section iv we present a percolation analysis. in section v we present the results of a numerical study and simulation. we present the conclusions in section vi. in this section we present a discrete-time susceptible-infected-recovered (sir) model. time is divided into periods of equal length. there are two types of individuals. type 1 individuals wear a facial mask and type 2 individuals do not. a randomly selected individual from a population is a type 1 individual with probability p(t) in period t, and is of type 2 with probability 1 − p(t). let s(t) and r(t) be the number of susceptible and recovered individuals, respectively, at time t. similarly, let x_i(t) be the number of infected type i individuals in period t for i = 1, 2. in this model, the disease transmission rates are time dependent. let β_ij(t) be the expected number of type j susceptible individuals who receive the disease from one type i infected individual per unit time in period t. let γ(t) be the recovering rate of the disease in period t. we assume that both types of infected individuals have the same recovering rate. in this report we assume that the length of a time unit in the discrete-time sir model is τ days. the dynamics of this discrete-time sir model is as follows. the infected individuals existing at time t− transmit the disease to newly infected individuals. those infected individuals existing at time t− recover at time (t + )−. it is easy to derive the following set of difference equations for the sir model, where n is the size of the population.
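a one-period sketch of the early-stage (s(t) ≈ n) dynamics, assuming the mask reductions take the form β_ij = β₀ reduced by (1 − η_o) when the infector is masked and by (1 − η_i) when the newly infected is masked, and that a new infection is of type 1 with probability p; since the equation numbers are elided in the text, this decomposition is a stated assumption and the function name is hypothetical:

```python
def masked_sir_step(x1, x2, p, beta0, eta_o, eta_i, gamma):
    """one period of the early-stage (s(t) ~ n) difference equations.
    x1/x2: infected who do/do not wear masks. an infector's output is
    reduced by (1 - eta_o) if masked; a new infection is of type 1
    with probability p and its inward risk is reduced by (1 - eta_i).
    beta0 plays the role of the no-mask transmission rate."""
    pressure = beta0 * (1.0 - eta_o) * x1 + beta0 * x2  # infection pressure
    new1 = p * (1.0 - eta_i) * pressure   # newly infected, masked
    new2 = (1.0 - p) * pressure           # newly infected, unmasked
    return (1.0 - gamma) * x1 + new1, (1.0 - gamma) * x2 + new2
```

iterating this map with a larger p lowers both the outward and inward transmission terms, which is the mechanism behind the containment claim of the paper.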
we assume that the epidemic is in its early stage, that is, s(t) ≈ n. under this assumption, the difference equations for x_1(t) and x_2(t) reduce accordingly. we now determine the parameters β_ij(t) and p(t). we first reduce the number of parameters. a previous study [ ] suggested that where η and η are the outward and inward efficiencies of masks, respectively. study [ ] suggested that η > η . in addition, we assume that thus, we only need to determine β (t). the other three parameters are determined according to eqs. ( ), ( ) and ( ) . substituting ( ), ( ) and ( ) into ( ) and ( ), we obtain the reduced dynamics. now we determine the values of the parameters γ(t), β (t) and p(t) based on the data published by johns hopkins university [ ] . note that johns hopkins university publishes the total number of daily newly infected individuals and the number of recovered individuals. from the published data, one can easily compute r(t) for each t. the website does not distinguish between infected individuals who wear masks and those who do not. thus, we must determine β (t) and p(t) based on the total number of infected individuals at time t. from the data published by johns hopkins university, we estimate γ(t), β (t) and p(t). we develop in section iv a percolation analysis to determine the ultimate size of the infected population, if these parameters do not change. the recovering rate is simple to determine: from ( ) we have it directly. next, we determine β (t). to determine the value of β (t) in period t, we let x(t) be the total number of infected individuals in period t. adding eqs. ( ) and ( ) and solving for β (t), we obtain the estimator. to determine the value of β (t) at time t, we use x(t + 1) published in [ ] . we assume that x_1(t), x_2(t) and p(t) are available at time t. we use ( ) to determine β (t). the value of p(t) is determined by a maximum likelihood estimation method, which we present in section iii. after β (t) at time t is determined, we use ( ) and ( ) to determine x_1(t + 1) and x_2(t + 1).
note that the sum of x_1(t + 1) and x_2(t + 1) determined in this way agrees with x(t + 1). we summarize the algorithm that determines the parameters of the time-dependent sir model in algorithm :
: compute γ(t) using eq. ( );
: find β (t) using eq. ( ) and p(t), x_1(t), x_2(t) and y(t + 1);
: compute x_1(t + 1) and x_2(t + 1) by eqs. ( ) and ( );
: find the predicted giant component size s(t) using eq. (v);
: end for
in this section we present a maximum likelihood estimation method to determine the value p(t) based on x(t), x(t + 1) and β (t − 1). this estimation will be used progressively by algorithm in periods t = , , . . .. thus, in the rest of this report we simplify the notation by dropping the time dependency from p(t) and γ(t) and simply write p and γ, respectively. recall that x(t) and x(t + 1) are the total numbers of infected individuals in periods t and t + 1. we would like to determine the value of p such that the likelihood of this sample path is maximized. to determine the likelihood function, we propose a probabilistic version of the sir model based on independent cascade models. independent cascade models are popular not only in the study of epidemic spreading but also in the influence maximization problems of viral marketing [ ] , [ ] . in an independent cascade model, each infected node has exactly one opportunity to transmit the disease to its neighbors. whether the transmissions are successful or not depends on independent events. in our model, we distinguish between nodes that have used their opportunity to transmit the disease and those that have not. our model is a discrete time model. let x_t be the total number of currently infected individuals in period t. let y_t be the number of individuals who contract the disease in period t. in our model, we assume that those who contract the disease in period t have the ability to transmit the disease to their neighbors, and lose that ability in periods t + 1, t + 2, and so on.
thus, in period t there are x_t − y_t infected individuals who cannot transmit the disease to others. those who are infected but cannot transmit the disease in period t can remain infected in subsequent periods or become recovered. a graphical illustration of the model is shown in figure . it is clear that this model is a discrete time markov chain. we shall derive the transition probability and find the estimate p̂(t) that maximizes it. we now specify more details of the probabilistic sir model. each individual is of type 1 or of type 2. each infectious individual has k contacts, to whom he or she can transmit the disease. a type i infectious individual can transmit the disease to a type j susceptible individual with probability φ_ij, where i, j = 1, 2. the parameter φ_ij is related to β (t) and γ through ( ). let c be the number of type 1 individuals among the y_t infectious individuals in period t: where {u_j, j = 1, 2, . . .} and {v_j, j = 1, 2, . . .} are two independent and identically distributed (i.i.d.) sequences of bernoulli random variables with success probabilities p_1 and p_2, respectively. the two sequences are independent of anything else. the event {u_j = 1} indicates that a type 1 infectious individual successfully transmits the disease to a neighboring node. this occurs with probability p_1, where similarly, the event {v_j = 1} indicates that a type 2 infectious individual successfully transmits the disease to a neighboring node with probability p_2, where an infected individual becomes recovered with probability γ, and remains infected otherwise. hence, where {w_j, j = 1, 2, . . .} is an i.i.d. sequence of bernoulli random variables independent of anything else. the success probability of w_j is 1 − γ. we now derive an expression for the transition probability in ( ) as a function of p. let us rewrite ( ) and ( ) in terms of i_1, i_2 and i_3. conditioning on the event {c = i}, we have that the random variables i_1, i_2 and i_3 all have binomial distributions.
define the binomial probability mass function b(k, n, q) = C(n, k) q^k (1 − q)^{n−k}. the conditional distributions of i_1, i_2 and i_3 follow. substituting ( ), ( ) and ( ) into ( ) and noticing that i_1, i_2 and i_3 are independent, we obtain the transition probability. in addition, taking the average of ( ) with respect to the event {c = i} using ( ), we obtain the unconditional form. eq. ( ) is very complicated to evaluate, and it is difficult to use directly in the optimization problem ( ) when y(t), y(t + 1) or both are large. we now propose an approximation method to simplify ( ). first, we take the logarithm of ( ). from the de moivre-laplace central limit theorem we approximate binomial distributions by normal distributions. let n(z, µ, σ²) denote the probability density function (pdf) of a normal random variable with mean µ and variance σ², i.e., n(z, µ, σ²) = exp(−(z − µ)²/(2σ²))/√(2πσ²). first, we approximate the probability mass function b(i, y(t), p) in ( ) by a normal distribution. next, recall that conditioning on the event {c ∈ (z, z + dz)}, i_1 and i_2 are independent binomial random variables. we approximate them by normal pdfs with the corresponding means and variances. since i_1 and i_2 are conditionally independent, we have p(i_1 + i_2 ∈ (y(t + 1) − 1/2, y(t + 1) + 1/2) | c ∈ (z, z + dz)) ≈ n(y(t + 1), zkp_1 + (y(t) − z)kp_2, · ). we propose to further simplify eq. ( ). we apply a mean-field approximation [ ] , [ ] to replace z, the sample value of c, in ( ) and ( ) with the expectation e[c] = y(t)p. specifically, let r be the basic reproduction number of the branching process. we further approximate σ̃ defined in ( ) by σ̃² = y(t)r , assuming that p_1 and p_2 are small. with these approximations, the logarithmic transition probability reduces to a simple form. since the second term on the right side of the preceding equation is independent of p, maximizing the logarithmic transition probability is equivalent to maximizing the first term. we approximate the optimization problem in ( ) by the following optimization problem, and to study the extrema of f we differentiate f.
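the de moivre-laplace step above can be checked directly by evaluating the binomial pmf in log space (the raw pmf underflows double precision for counts of this size) against the normal pdf at the mean; the parameter values below are hypothetical:

```python
import math

def log_binom_pmf(k, n, q):
    """log of b(k, n, q), computed via lgamma because the raw pmf
    underflows double precision for counts of this magnitude."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1.0 - q))

def normal_pdf(x, mu, var):
    """pdf of a normal random variable with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# de moivre-laplace: b(k, n, q) ~ n(k; n*q, n*q*(1-q)) for large n
n, q, k = 10_000, 0.3, 3_000
exact = math.exp(log_binom_pmf(k, n, q))
approx = normal_pdf(k, n * q, n * q * (1.0 - q))
```

near the mean the relative error shrinks like 1/n, which is why the approximation is accurate for the large case counts in the data.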
note that f′(r ) has a unique positive root, which is: the last expression gives the most probable basic reproduction number that produces the sample path (y(t), y(t + 1)). at this root the second derivative is negative, i.e., f achieves its maximum when eq. ( ) holds. we solve p from ( ) and ( ). recall that we assume ( ), ( ) and ( ) . under these assumptions, ( ) can have a unique root in [0, 1], in which case the unique root is the solution of the optimization problem ( ). eq. ( ) can also have no roots in [0, 1], in which case the solution of ( ) is p̂(t) = 0 or p̂(t) = 1. we summarize the solution of the optimization problem ( ) in the following proposition, whose proof is presented in the appendix at the end of this report. proposition : eq. ( ) has two real roots. either eq. ( ) has exactly one root in the interval [0, 1], or it has no root in this interval. if eq. ( ) has exactly one root in [0, 1], it is the smaller root (denoted by p_1) of the two. in this case, the optimal solution of ( ) is p_1. if eq. ( ) has no roots in [0, 1], either p_1 < 0 or p_1 > 1. in the former case, the optimal solution is p̂(t) = 0. in the latter case, the optimal solution is p̂(t) = 1. in this section we consider a random contact network in which a disease transmitted by droplets or aerosols spreads according to an independent cascade model. that is, the disease transmits from a node at one end of an edge to the node at the other end with some probability, and the transmissions along all edges are independent. we shall present a percolation analysis of this model and obtain percolation thresholds and sizes of giant components. we now describe our model. consider a random graph (g, v, e). randomly select a node from the graph. let z be the degree of this node. let g (z) denote the probability generating function of z.
Let G1(z) be the probability generating function of Y. Every node in this graph is one of two types: a type-1 node denotes an individual who wears a facial mask, and a type-2 node denotes an individual who does not. A randomly selected node is of type 1 with probability p and of type 2 with probability 1 − p, and we assume that this event is independent of everything else. As mentioned before, an infectious disease spreads in this network according to an independent cascade model. Consider a randomly selected edge connecting two nodes, say v1 and v2. Let φ_ij be the conditional probability that the disease transmits from node v1 to node v2, given that the types of v1 and v2 are i and j, respectively, where i, j = 1, 2. We now present a percolation analysis of the random network model described above. Percolation analysis has been a useful tool for studying the resilience of communication networks [ ], [ ] and epidemic networks [ ]-[ ]. There are two types of percolation models, bond percolation and site percolation, depending on whether edges or nodes, respectively, are removed randomly [ ]. Our model is a form of bond percolation, in which edges are randomly removed. A removed edge implies that the disease cannot be transmitted from one end of that edge to the other. The size of the largest component in the percolated network is the maximum fraction of the population that can possibly be infected. Randomly select an edge, and let v1 and v2 be the two nodes at its ends. Suppose that the type of node v1 is i. Let E_i be the event that, along the selected edge from v1, one cannot reach a giant component, and let u_i = P(E_i). Now condition on the event that the type of node v2 is j. Event E_i occurs if the randomly selected edge is removed, which happens with probability 1 − φ_ij; with probability φ_ij the randomly selected edge is present. Let Y be the number of neighbors of node v2, not including v1.
Event E_i occurs if one cannot reach a giant component along any one of these Y edges. Combining these arguments, we obtain ( ), where C is the number of type-1 nodes connected to node v2, not including v1. The distribution of C conditioned on Y = k is binomial. Thus, Eq. ( ) becomes ( ) for i = 1, 2. Eq. ( ) is a system of nonlinear equations, from which we can solve for u1 and u2. Once we have u1 and u2, we can compute the giant-component size of the percolated network. Randomly select a node from the network. Let s_i, for i = 1, 2, be the conditional probability that the randomly selected node is connected to a giant component, given that it is of type i. Let X be the degree of the randomly selected node. The node is connected to a giant component if, along at least one of its edges, one can reach a giant component. Conditioning on X = k, let I be the number of type-1 nodes among the k neighbors. Combining all these arguments, we obtain ( ) for i = 1, 2. Averaging the conditional probabilities, a randomly selected node is connected to a giant component with the probability given in ( ); this is also the expected size of the giant component. Let u be a 2 × 1 vector over the set of real numbers R; specifically, let u = (u1, u2)^T, where the symbol T denotes transposition of vectors. Let F be a vector-valued function mapping R² to R², where x = (x1, x2). Eq. ( ) implies that u is a root of ( ). The roots of equations of the form ( ) are also called fixed points of the function F. It is clear that F always has the fixed point 1 = (1, 1)^T. Lee et al. [ ] established that F has an additional fixed point if the dominant eigenvalue of the Jacobian matrix evaluated at 1 is greater than one; in addition, this fixed point is attractive. The Jacobian matrix of a function F evaluated at x = a is defined in ( ). The Jacobian matrix of the function F defined in ( ), evaluated at (1, 1)^T, is given in ( ), where E[Y] = G1′(1) is the expected excess degree of a node reached by a randomly selected edge.
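The self-consistency system for (u1, u2) and the eigenvalue condition can be combined in a short numerical sketch. This is not the paper's code: the Poisson degree distribution, the transmission probabilities φ_ij, and the Jacobian-entry formula J_ij = P(type j)·φ_ij·E[Y] are our illustrative reading of the equations described in the text:

```python
import math

LAM = 5.0        # Poisson degree distribution with mean 5 (so G0 = G1)
P1 = 0.5         # fraction of type-1 (mask-wearing) nodes
PHI = {(1, 1): 0.1, (1, 2): 0.3, (2, 1): 0.3, (2, 2): 0.6}  # hypothetical

def G1(z):
    """Excess-degree pgf of a Poisson(LAM) graph."""
    return math.exp(LAM * (z - 1.0))

def step(u1, u2):
    """One update of u_i = sum_j P(type j)[(1 - phi_ij) + phi_ij * G1(u_j)]."""
    pj = {1: P1, 2: 1.0 - P1}
    uj = {1: u1, 2: u2}
    new = [sum(pj[j] * ((1.0 - PHI[i, j]) + PHI[i, j] * G1(uj[j]))
               for j in (1, 2))
           for i in (1, 2)]
    return new[0], new[1]

u1, u2 = 0.5, 0.5
for _ in range(2000):          # fixed-point iteration
    u1, u2 = step(u1, u2)

# Expected giant-component size: S = sum_i P(type i) * (1 - G0(u_i))
S = P1 * (1.0 - G1(u1)) + (1.0 - P1) * (1.0 - G1(u2))

# Percolation condition: dominant eigenvalue of the Jacobian at (1, 1)^T,
# with entries J_ij = P(type j) * phi_ij * E[Y], and E[Y] = LAM here.
J = [[P1 * PHI[1, 1] * LAM, (1 - P1) * PHI[1, 2] * LAM],
     [P1 * PHI[2, 1] * LAM, (1 - P1) * PHI[2, 2] * LAM]]
tr = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
lam_max = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
```

With these illustrative numbers the dominant eigenvalue exceeds one, so the iteration converges to a nontrivial fixed point with u1 > u2 (edges incident to masked nodes are more likely to be removed) and a positive outbreak size S.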
It is easy to derive the eigenvalues of J. Denote the two eigenvalues of J by λ1 and λ2. Both eigenvalues are real, and λ1 > λ2. Thus, the spectral radius, or dominant eigenvalue, of J is λ1, and it is this eigenvalue that controls the percolation threshold of the epidemic network. In this section we present numerical and simulation results. We first verify the accuracy of ( ) as an approximation of the transition probability in ( ). We select an arbitrary set of parameters. According to [ ], the efficiency of a typical mask is in the range from % to %. Since masks are typically more efficient at stopping viral transmission when worn by the source [ ], we set η1 and η2 to fixed values in this range. We select a value for p between 0 and 1 and randomly generate ten thousand values for y(t). We perform a simulation to generate ten thousand values of y(t + 1) based on y(t) and p. We then solve ( ) based on ( ) and ( ) for the ten thousand pairs of y(t) and y(t + 1), and calculate the average distance between the two solutions. The result, shown in Figure , indicates that the approximation method works quite well. Next, we use the data published by Johns Hopkins University [ ] to predict the spread of the COVID-19 pandemic using our model. Due to the large infection rates of some countries, we assume a fixed value of k = E[Y] in order to have φ ≤ 1. We choose τ = days as the width of a time slot in the time-dependent SIR model. We first execute the algorithm on the numbers of infected and recovered individuals in mainland China. The time functions β(t) and p(t) are shown in Figure , together with the maximum fraction of the population predicted at time t to become infected. The epidemic started in the People's Republic of China in January 2020. The authorities managed to contain the epidemic very well by the end of February. However, there were several significant events between January and August of 2020. In mid April, the Chinese authorities revised the way the death toll is calculated [ ].
In mid June, Beijing faced a second wave of infections [ ]; in mid July, there was a surge in the number of infected individuals in Xinjiang [ ]; and at the end of July, there was a surge in Dalian [ ]. From Figure , we see that β(t) is large initially in January and is brought down to a small value by the end of February. The value of β(t) again rises and drops before and after the events described above. From Figure , note that p(t) also rises and drops around these events; however, the function p(t) lags behind β(t). This correlation has a natural interpretation. A rise in β(t) usually manifests itself in a rise in the number of infected individuals. As the population sees the number of infections rise, more people wear masks to protect themselves. Conversely, a decrease in β(t) results in fewer newly infected individuals; people typically see this as a sign of a safe community, and fewer people wear masks in public places. As β(t) rises, the predicted giant-component size s(t) also increases; however, as p(t) catches up and rises, s(t) decreases. Notice that the time sequence β(t) reflects the joint effect of many measures taken to contain the epidemic. Wearing facial masks is a measure at the personal level; shutting down schools and businesses and keeping people at home are measures at the government level. In this paper we consider only the measure of wearing masks and ignore the others. Thus, p(t) in Figure may be higher than the actual fraction of the population wearing masks, as it reflects the joint effect of many containment measures. Next, we study the epidemics of the United States of America, India and France. We execute the algorithm to compute β(t), p(t) and s(t). The time functions of β(t), p(t) and s(t) for the three countries are shown in Figure , Figure and Figure , respectively. We present p(t) and s(t) of the three countries together in Figure and Figure .
Since these functions fluctuate considerably, we use a built-in MATLAB function to compute a polynomial regression of them in order to show the general trend. Note that India has a smaller infection rate β(t) than the U.S.A., and also a much smaller predicted giant-component size; this is because India has a very high recovery rate [ ], [ ]. We also note that around July, France had a β(t) comparable with that of the U.S.; however, France has a high predicted s(t) after August. It is worth noting that p(t) of France in mid August is not close to 1; by raising p(t), France could achieve better control of the epidemic. In this report, we presented a time-dependent SIR model in which some individuals wear facial masks and some do not. Based on the numbers of infected and recovered individuals published by Johns Hopkins University, we estimated the disease infection rates and recovery rates. We proposed a probabilistic version of the SIR model and derived the transition probability of this random SIR model. By maximizing the transition probability, we estimated the most probable value of the fraction of the population wearing masks. This transition probability is numerically difficult to compute when the states of the model are large, so, based on the central limit theorem, we proposed an approximation. Through numerical and simulation studies, we showed that the approximation works well. Finally, we carried out a percolation analysis to predict the eventual fraction of the population that will be infected with the disease. We proposed a progressive analysis of the epidemic and, using its results, analyzed the epidemics of four countries.
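The trend-extraction step described above (the paper uses a built-in MATLAB polynomial regression) can be sketched with a closed-form least-squares fit. This stdlib-only version fits a degree-1 polynomial to a synthetic series; the data and degree are illustrative, not the paper's:

```python
def linear_trend(ts, ys):
    """Ordinary least-squares fit y = a + b*t in closed form."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    sxy = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    sxx = sum((t - mt) ** 2 for t in ts)
    b = sxy / sxx          # slope
    a = my - b * mt        # intercept
    return a, b

# Illustrative: a noise-free linear "p(t)" series is recovered exactly.
ts = list(range(10))
ys = [0.2 + 0.03 * t for t in ts]
a, b = linear_trend(ts, ys)
```

A higher-degree polynomial fit works the same way via the normal equations; the degree-1 case suffices to show the general rising or falling trend of a noisy series.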
References:
Effectiveness of facemasks to reduce exposure hazards for airborne infections among general populations
The effect of mask use on the spread of influenza during a pandemic
Testing the efficacy of homemade masks: would they protect in an influenza pandemic?
Respiratory source control using a surgical mask: an in vitro study
Maximizing the spread of influence through a social network
Networks, crowds and markets: reasoning about a highly connected world
Networks: an introduction
Statistical approach to quantum field theory: an introduction
Resilience of the internet to random breakdowns
Percolation in interdependent and interconnected networks: abrupt change from second- to first-order transitions
Percolation on heterogeneous networks as a model for epidemics
Epidemics and percolation in small-world networks
Epidemic models and percolation
Percolation and epidemic thresholds in clustered networks
A generalized configuration model with degree correlations and its percolation analysis
The death toll in Wuhan was revised
COVID-19 pandemic in Beijing
Officials rule out domestic transmission as origin of Dalian COVID-19 cluster
Coronavirus recovery: what do India's high COVID recovery numbers really mean? We explain
India's COVID-19 recovery rate surges past %, one of the highest globally: govt

Appendix: in this appendix, we prove the proposition. Proof. We rewrite R0 defined in ( ) to show its dependency on p explicitly, i.e., R0(p) = kφ(η1η2 p² − (η1 + η2)p + 1). In ( ) we have assumed ( ), ( ) and ( ). Note that η1 < 1 and η2 < 1. Since the right-hand side of ( ) is a quadratic polynomial in p, it is easy to establish the following results. First, the minimum of R0(p) occurs at p = (η1 + η2)/(2η1η2); this point is greater than one, since η1 < 1 and η2 < 1. Second, the derivative of R0(p) on [0, 1] is therefore negative. From these results and the fact that the right-hand side of ( ) is positive, it follows that ( ) must have two real roots. It also follows that R0(p) is monotonically decreasing on [0, 1], so ( ) cannot have two roots in [0, 1].
Since R0(p) is decreasing for p ∈ [0, 1], it follows that if ( ) has exactly one root in [0, 1], that root is the smaller root. It is also clear that if ( ) has a unique root in [0, 1], this root is the optimal solution of ( ). Now we analyse the cases in which ( ) has no roots in [0, 1]. Since R0(p) is decreasing for p ∈ [0, 1], either both roots of ( ) are greater than one, or the smaller root is negative and the other root is greater than one. We analyze these two cases separately. We first note that we have treated R0 as an independent variable in ( ); thus, the derivative F′(R0) is with respect to R0. In the two cases in which Eq. ( ) has no roots in [0, 1], we must treat the objective function in ( ) as a function of p. Specifically, the derivative of the objective function with respect to p is given by ( ), where F′(R0) is given in ( ). We claim that in the first case the objective function of ( ) is increasing for all p ∈ [0, 1], so the optimal solution of ( ) is p̂(t) = 1; and that in the second case the objective function of ( ) is decreasing for all p ∈ [0, 1], so the optimal solution of ( ) is p̂(t) = 0. We now prove the two claims. From ( ) and ( ), we obtain ( ), where h is a quadratic and concave function of R0 with h(0) > 0. Function h has a zero at R0 = c, where c is the right-hand side of ( ). If c < R0(1), both roots of ( ) are greater than one; this corresponds to the first case. If c > R0(0), then the smaller root is negative and the other root is greater than one; this corresponds to the second case. In the first case, h(R0(p)) < 0 for all p ∈ [0, 1]. From ( ), it follows that dF(R0(p))/dp > 0, since R0′(p) < 0 for all p ∈ [0, 1]; this proves the first claim. In the second case, R0(p) < c for all p ∈ [0, 1]; thus, h(R0(p)) > h(c) = 0 for all p ∈ [0, 1], and from ( ) it follows that dF(R0(p))/dp < 0. This proves the second claim. key: cord- -jomeywqr authors: Massonis, Gemma; Banga, Julio R.; Villaverde, Alejandro F.
title: Structural identifiability and observability of compartmental models of the COVID-19 pandemic date: - - journal: nan doi: nan sha: doc_id: cord_uid: jomeywqr The recent coronavirus disease (COVID-19) outbreak has dramatically increased public awareness and appreciation of the utility of dynamic models. At the same time, the dissemination of contradictory model predictions has highlighted their limitations. If some parameters and/or state variables of a model cannot be determined from output measurements, its ability to yield correct insights -- as well as the possibility of controlling the system -- may be compromised. Epidemic dynamics are commonly analysed using compartmental models, and many variations of such models have been used for analysing and predicting the evolution of the COVID-19 pandemic. In this paper we survey the different models proposed in the literature, assembling a list of model structures and assessing their ability to provide reliable information. We address the problem using the control-theoretic concepts of structural identifiability and observability. Since some parameters can vary during the course of an epidemic, we consider both the constant and time-varying parameter assumptions. We analyse the structural identifiability and observability of all of the models, considering all plausible choices of outputs and time-varying parameters, which leads us to analyse different model versions. We classify the models according to their structural identifiability and observability under the different assumptions and discuss the implications of the results. We also illustrate with an example several alternative ways of remedying the lack of observability of a model. Our analyses provide guidelines for choosing the most informative model for each purpose, taking into account the available knowledge and measurements. The current coronavirus disease pandemic, caused by the SARS-CoV-2 virus, continues to wreak unparalleled havoc across the world.
Public health authorities can use mathematical models to answer critical questions related to the dynamics of an epidemic (severity and time course of infected people), its impact on the healthcare system, and the design and effectiveness of different interventions [ ]-[ ]. Mathematical modeling of infectious diseases has a long history [ , ]. Modeling efforts are particularly important in the context of COVID-19, because its dynamics can be particularly complex and counter-intuitive due to uncertainty in the transmission mechanisms, possible seasonal variation in both susceptibility and transmission, and their variation within subpopulations [ ]. The media has given extensive coverage to analyses and forecasts made with COVID-19 models, with increased attention to cases of conflicting conclusions, giving the impression that epidemiological models are unreliable or flawed. However, a closer look reveals that these modeling studies were following different approaches, handling uncertainty differently, and ultimately addressing different questions on different time-scales [ ]. Broadly speaking, data-driven models (using statistical regression or machine learning) can be used for short-term forecasts (one or a few weeks). Mechanistic models based on assumptions about transmission and immunity try to mimic how the virus spreads; they can be used to formalize current knowledge and to explore long-term outcomes of the pandemic and the effectiveness of different interventions. However, the accuracy of mechanistic models is constrained by the uncertainties in our knowledge, which create uncertainties in model parameters and even in model structure [ ]. Further, the uncertainty in COVID-19 data and the exponential spread of the virus amplify the uncertainty in the predictions. Predictability studies [ ] seek to characterize the fundamental limits to outbreak prediction and their impact on decision-making.
Despite the vast literature on mathematical epidemiology in general, and on modeling of COVID-19 in particular, comparatively few authors have considered the predictability of infectious disease outbreaks [ , ]. Uncertainty quantification [ ] is an interconnected concept that is also key for the reliability of a model, and it has received similarly scant attention [ , ]. In addition to predictability and uncertainty quantification, identifiability is a related property whose absence can severely limit the usefulness of a mechanistic model [ ]. A model is identifiable if we can determine the values of its parameters from knowledge of its inputs and outputs. Likewise, the related control-theoretic property of observability describes whether we can infer the model states from knowledge of its inputs and outputs. If a model is non-identifiable (or non-observable), different sets of parameters (or states) can produce the same predictions or fits to data. The implications can be enormous: in the context of the COVID-19 outbreak in Wuhan, non-identifiability in model calibrations was identified as the main reason for wide variations in model predictions [ ]. Reliable models can be used in combination with optimization and optimal control methods to find the best intervention strategies, such as lock-downs with minimum economic impact [ , ]. Further, they can be used to explore the feasibility of model-based real-time control of the pandemic [ , ]. However, using calibrated models with non-identifiability or non-observability issues can result in bad or even dangerous intervention and control strategies. It is common to distinguish between structural and practical identifiability: structural non-identifiability is due to the model and measurement (input-output) structure, whereas practical non-identifiability is due to a lack of information in the considered data-sets.
Non-identifiability results in incorrect parameter estimates and bad uncertainty quantification [ , ], i.e. a misleading calibrated model that should not be used to analyze epidemiological data, test hypotheses, or design interventions. The structural identifiability of several epidemic mechanistic models has been studied e.g. in [ ]-[ ]. Other recent studies have mostly focused on practical identifiability, such as [ , ], [ ]-[ ]. In this paper we assess the structural identifiability and observability of a large set of COVID-19 mechanistic models described by deterministic ordinary differential equations, derived by different authors using the compartmental modeling framework [ ]. Compartmental models are widely used in epidemiology because they are tractable and powerful despite their simplicity. We collect different compartmental models, of which we consider several variations, making up a total of different model versions. Our aim is to characterize their ability to provide insights about their unknown parameters -- i.e. their structural identifiability -- and their unmeasured states -- i.e. their observability. To this end we adopt a differential-geometry approach that considers structural identifiability as a particular case of nonlinear observability, allowing both properties to be analysed jointly. We define the relevant concepts and describe the methods used in Section . Then we provide an overview of the different types of compartmental models found in the literature in Section . We analyse their structural identifiability and observability and discuss the results in Section , where we also show different ways of remedying a lack of observability using an illustrative model. Finally, we conclude our study with some key remarks in Section .
We consider models defined by systems of ordinary differential equations with the notation given in ( - ), where f and h are analytical (generally nonlinear) functions of the states x(t) ∈ R^(n_x), known inputs u(t) ∈ R^(n_u), unknown constant parameters θ ∈ R^(n_θ), and unknown inputs or time-varying parameters w(t) ∈ R^(n_w). The output y(t) ∈ R^(n_y) represents the measurable functions of the model variables. The expressions ( - ) are sufficiently general to represent a wide range of model structures, of which compartmental models are a particular case. Definition (structurally locally identifiable [ ]). A parameter θ_i of a model M is structurally locally identifiable (s.l.i.) if for almost any parameter vector θ* ∈ R^(n_θ) there is a neighbourhood N(θ*) in which the following relationship holds: θ̂ ∈ N(θ*) and y(t, θ̂) = y(t, θ*) imply θ̂_i = θ*_i. Otherwise, θ_i is structurally unidentifiable (s.u.). If all model parameters are s.l.i., the model is s.l.i.; if there is at least one s.u. parameter, the model is s.u. Likewise, a state x_i(τ) is observable if it can be distinguished from any other states in a neighbourhood from observations of the model output y(t) and input u(t) in the interval t0 ≤ τ ≤ t ≤ t_f, for a finite t_f; otherwise, x_i(τ) is unobservable. A model is called observable if all its states are observable. We also say that M is invertible if it is possible to infer its unknown inputs w(t), and in this case we say that w(t) is reconstructible. Structural identifiability can be seen as a particular case of observability [ ]-[ ], by augmenting the state vector with the unknown parameters θ, which are then considered as state variables with zero dynamics: x̃ = (x^T, θ^T)^T. The reconstructibility of the unknown inputs w(t), also known as input observability, can be cast in a similar way, although in this case their derivatives may be nonzero.
To this end, let us augment the state vector further with w as additional states, as well as with its derivatives up to some non-negative integer l. The l-augmented dynamics is given in ( ), leading to the l-augmented system ( ). Remark (unknown inputs, disturbances, or time-varying parameters). In Section , when reporting the results of the structural identifiability and observability analyses, we will explicitly consider some parameters as time-varying. In the model structure defined in equations ( - ), the unknown parameter vector θ is assumed to be constant. To consider an unknown parameter as time-varying, we include it in the "unknown input" vector w(t). Thus, changing the consideration of a parameter from constant to time-varying entails removing it from θ and including it in w(t). The elements of w(t) can be interpreted as unmeasured disturbances or inputs of unknown magnitude or, equivalently, as time-varying parameters. Regardless of the interpretation, they are assumed to change smoothly, i.e. they are infinitely differentiable functions of time. For the analysis of some models it is necessary, or at least convenient, to introduce the mild assumption that the derivatives of w(t) vanish above a certain non-negative integer s (possibly s = +∞), i.e. w^(i)(t) = 0 for all i > s. This assumption is equivalent to assuming that the disturbances are polynomial functions of time with maximum degree equal to s [ ]. Definition (full input-state-parameter observability, FISPO [ ]). Let us consider a model M given by ( )-( ). We augment its state vector as z(t) = (x(t)^T, θ^T, w(t)^T)^T ( ), which leads to its augmented form ( ). We say that M has the FISPO property if, for every t0 ∈ I, every model unknown z_i(t0) can be inferred from y(t) and u(t) in a finite time interval [t0, t_f] ⊂ I.
Thus, M is FISPO if, for every z(t0) and for almost any vector z*(t0), there is a neighbourhood N(z*(t0)) such that, for all ẑ(t0) ∈ N(z*(t0)), the property given in ( ) is fulfilled. In this paper we analyse input, state, and parameter observability -- that is, the FISPO property defined above -- using a differential-geometry framework. Such analyses are structural and local. By structural we refer to properties that are entirely determined by the model equations; thus we do not consider possible deficiencies due to insufficient or noise-corrupted data. By local we refer to the ability to distinguish between neighbouring states (similarly, parameters or unmeasured inputs), even though they may not be distinguishable from other, distant states. This is usually sufficient, since in most (although not all, see e.g. [ ]) applications local observability entails global observability. This specific type of observability has sometimes been called local weak observability [ ]. The approach assesses structural identifiability and observability by calculating the rank of a matrix constructed with Lie derivatives. The corresponding definitions are as follows (in the remainder of this section we omit the dependency on time to simplify the notation). Definition (extended Lie derivative [ ]). Consider the system M ( - ) with augmented state vector ( ) and augmented dynamics ( ). Assuming that the inputs u are analytical functions, the extended Lie derivative of the output along f̃ = f̃(·, u) is given in ( ). The zero-order derivative is L⁰_f̃ h = h, and the i-th order extended Lie derivatives can be calculated recursively as in ( ). Definition (observability-identifiability matrix [ ]).
The observability-identifiability matrix of the system M ( - ), with augmented state vector ( ), augmented dynamics ( ), and analytical inputs u, is the matrix of dimensions (n_y · ñx) × ñx given in ( ). The FISPO property of M can be analysed by calculating the rank of the observability-identifiability matrix. Theorem (observability-identifiability condition, OIC [ ]). If the observability-identifiability matrix of a model M satisfies rank(O_I(x̃0, u)) = ñx = n_x + n_θ + n_w, with x̃0 being a (possibly generic) point in the augmented state space, then the system is structurally locally observable and structurally locally identifiable. In this paper we generally check the OIC criterion of ( ) using STRIKE-GOLDD, an open-source MATLAB toolbox [ ]. Alternatively, for some models we use the Maple code ObservabilityTest, which implements a procedure that avoids the symbolic calculation of the Lie derivatives and is hence computationally efficient [ ]. A number of other software tools are available, including GenSSI [ ] in MATLAB, IdentifiabilityAnalysis in Mathematica [ ], DAISY in REDUCE [ ], SIAN in Maple [ ], and the web app COMBOS [ ]. It should be taken into account that in the present work we are interested in assessing structural identifiability and observability both with constant and with continuously time-varying model parameters (or, equivalently, with unknown inputs), as explained in the Remark. Ideally, the method of choice should provide a convenient way of analysing models with this type of parameters (inputs). It is always possible to perform this type of analysis by assuming that the time dependency of the parameters has a particular form, e.g. a polynomial function of a certain maximum degree. In this article we review compartmental models, which are one of the most widely used families of models in epidemiology.
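The OIC rank test can be illustrated on a minimal pencil-and-paper example (this is our own sketch, not a substitute for STRIKE-GOLDD): a basic SIR model with output y = I, known γ, and β appended to the state vector as a variable with zero dynamics, z = (S, I, β). The first three Lie derivatives of the output and their gradients are derived by hand below; a nonzero 3 × 3 determinant at a generic point means rank ñx = 3, so β is structurally locally identifiable and S is observable:

```python
# Augmented state z = (S, I, beta); dynamics S' = -beta*S*I,
# I' = beta*S*I - gamma*I, beta' = 0; output h = I; gamma is known.
GAMMA = 0.2  # illustrative value

def lie_gradients(S, I, b):
    """Gradients (w.r.t. S, I, beta) of the hand-derived Lie derivatives
    L0 = I,
    L1 = b*S*I - g*I,
    L2 = b^2*S^2*I - b^2*S*I^2 - 2*b*g*S*I + g^2*I."""
    g = GAMMA
    row0 = (0.0, 1.0, 0.0)
    row1 = (b * I, b * S - g, S * I)
    row2 = (2*b*b*S*I - b*b*I*I - 2*b*g*I,
            b*b*S*S - 2*b*b*S*I - 2*b*g*S + g*g,
            2*b*S*S*I - 2*b*S*I*I - 2*g*S*I)
    return [row0, row1, row2]

def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion)."""
    return (m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]))

# Generic point: nonzero determinant => full rank => OIC satisfied.
O = lie_gradients(0.9, 0.1, 0.5)
d = det3(O)   # analytically, det = beta^2 * S * I^3
```

The determinant simplifies to β²SI³, which is nonzero whenever the disease is present (I > 0), matching the known result that β is identifiable from prevalence measurements.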
They divide the population into homogeneous compartments, each of which corresponds to a state variable that quantifies the number of individuals at a certain disease stage. The dynamics of these compartments are governed by ordinary differential equations, usually with unknown parameters that describe the rates at which individuals move among different stages of the disease. The basic compartmental model used for describing disease transmission is the SIR model, in which the population is divided into three classes: • Susceptible: individuals who have no immunity and may become infected if exposed. • Infected and infectious: an exposed individual becomes infected after contracting the disease. Since an infected individual can transmit the disease, he or she is also infectious. • Recovered: individuals who are immune to the disease and do not affect its transmission. Another class of models, called SEIR, includes an additional compartment to account for the existence of a latent period after transmission: • Exposed: individuals vulnerable to contracting the disease when they come into contact with it. These idealized models differ from reality. Contact tracing, screening, and changes in habits are some aspects that are not considered in basic SIR or SEIR models, but they are important for evaluating the effects of an intervention. Furthermore, it is not only important to enrich the information about the behaviour of the population; the characteristics of the disease must also be taken into account. These additional details can be incorporated into a model as new parameters, functions, or extra compartments. Compartments such as asymptomatic, quarantined, isolated, and hospitalized have been widely used in COVID-19 models. From articles, most of which are very recent [ ], we have collected models. Depending on whether or not they have an exposed compartment, they can be broadly classified as belonging to the SIR or SEIR families.
However, most of these models include additional compartments. Susceptible individuals become infected with the incidence given in ( ), where β = pc is the transmission rate, c is the contact rate, and p is the probability that a contact with a susceptible individual results in a transmission [ ]. Individuals who recover leave the infectious class at rate γ, where 1/γ is the average infectious period. The set of differential equations describing the basic SIR model is given in ( ). As mentioned above, compartmental models can be extended to consider further details. We have found models that incorporate the following features: asymptomatic individuals, births and deaths, time delays, lock-down, quarantine, isolation, social distancing, and screening. Figure shows a classification of the SIR models reviewed in this article, and Table lists them along with their equations. Multiple output choices have been considered in the study of the structural identifiability and observability of some models; in such cases, the observations are listed in the output column. Figure : classification of SIR models. Each block represents a model structure. The basic, three-compartment SIR model structure is at the top of the tree. Every additional block is labeled with the additional feature that it contains with respect to its parent block. The darkness of the shade indicates the number of additional features with respect to the basic SIR model. (Table columns: parameters, output, ICs, input, equations.) Individuals in the SEIR model are divided into four compartments: susceptible (S), exposed (E), infected (I) and recovered (R). Compared to SIR models, the additional compartment E allows for a more accurate description of diseases in which the incubation period and the latent period do not coincide, i.e. of the period after which an infected individual becomes infectious. This is why SEIR models are in principle best suited to epidemics with a long incubation period, such as COVID-19 [ ].
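The basic SIR equations described above can be integrated numerically. A minimal forward-Euler sketch (the parameter values β = 0.5, γ = 0.2 and the initial conditions are illustrative, not taken from the surveyed models):

```python
def simulate_sir(beta, gamma, s0, i0, dt=0.001, t_end=100.0):
    """Forward-Euler integration of the basic SIR model:
    S' = -beta*S*I,  I' = beta*S*I - gamma*I,  R' = gamma*I."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    peak_i = i
    for _ in range(int(t_end / dt)):
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        dr = gamma * i
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        peak_i = max(peak_i, i)
    return s, i, r, peak_i

s, i, r, peak = simulate_sir(beta=0.5, gamma=0.2, s0=0.99, i0=0.01)
```

Because the three right-hand sides sum to zero, S + I + R is conserved along the trajectory, a useful sanity check for any integrator of compartmental models.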
susceptible individuals move to the exposed class at a rate βi(t), where β is the transmission rate parameter. exposed individuals become infected at rate κ, where 1/κ is the average latent period. infected individuals recover at rate γ, where 1/γ is the average infectious period. thus, the set of differential equations describing the basic seir model is:

ds/dt = −β s i,
de/dt = β s i − κ e,
di/dt = κ e − γ i,
dr/dt = γ i.

existing extensions of seir models may incorporate some of the following features: asymptomatic individuals, births and deaths, hospitalization, quarantine, isolation, social distancing, screening and lock-down. figure shows a classification of the models found in the literature; table lists them along with their equations.

[table row residue: states s, l, e, i, q, r; parameters γ, β, η, δ, ξ, θ, θ, α, α]

we analysed the structural identifiability and observability of the sir model structures (a total of model versions, considering the different output configurations and time-varying parameter assumptions) and of the seir models (with a total of model versions) listed in tables and . the detailed results for each model are given in appendix a, which reports the structural identifiability of each parameter and the observability of each state, for every model version. in the remainder of this section we provide an overview of the main results.

the general patterns regarding state observability are as follows. the recovered state (r) is almost never observable unless it is directly measured (d.m.) as output; the only exceptions are two seir models, and , for which r is observable under the assumption of time-varying parameters. the susceptible state (s), in contrast, is observable in roughly two thirds of the models (sir: / , seir: / ); this is also true for the exposed state (e) in the seir models. the infected state (i) is included in most studies among the outputs, either directly (d.m.) or indirectly measured (as part of a parameterized measurement function).
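the seir dynamics just described (susceptible → exposed at rate βsi, exposed → infected at rate κ, infected → recovered at rate γ) can be integrated in the same way. all parameter values below, including κ = 0.2 (an average latent period of five time units), are illustrative assumptions.

```python
# minimal sketch of the basic seir model:
#   dS/dt = -beta * S * I / N
#   dE/dt =  beta * S * I / N - kappa * E
#   dI/dt =  kappa * E - gamma * I
#   dR/dt =  gamma * I
# all parameter values and initial conditions are illustrative assumptions.

def simulate_seir(beta, kappa, gamma, s0, e0, i0, r0, t_max, dt=0.01):
    """Forward-Euler integration of the basic SEIR equations."""
    n = s0 + e0 + i0 + r0  # total (constant) population size
    s, e, i, r = s0, e0, i0, r0
    t = 0.0
    while t < t_max:
        inf = beta * s * i / n * dt   # new exposures
        onset = kappa * e * dt        # exposed become infectious (1/kappa = latent period)
        rec = gamma * i * dt          # recoveries (1/gamma = infectious period)
        s -= inf
        e += inf - onset
        i += onset - rec
        r += rec
        t += dt
    return s, e, i, r

if __name__ == "__main__":
    s, e, i, r = simulate_seir(beta=0.3, kappa=0.2, gamma=0.1,
                               s0=990.0, e0=0.0, i0=10.0, r0=0.0, t_max=400.0)
    print(f"S={s:.1f}, E={e:.1f}, I={i:.1f}, R={r:.1f}")
```

the latent compartment delays the outbreak relative to the sir case with the same β and γ, but does not change the final epidemic size.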
when it is not considered in this way, its observability is generally similar to that of s (in / model versions i is not an output and it is observable in / ). the transmission and recovery rates (β, γ) are the two parameters common to all sir models. the transmission rate is identifiable in / model versions, and γ in / and its derivatives in / . seir models have a third parameter in common, the latent period (κ). it is identifiable in most of the models ( / ), as well as the recovery rate ( / ). the transmission rate is identifiable in / model versions, but it is not identifiable in any seir model version that accounts for social distancing (numbers and ); we found no clear pattern in the other models. the transmission rate β, the recovery rate γ, and in seir models the latent period κ, can vary during an epidemic as a result of changes in the population's behaviour [ , ] , the introduction of new drugs or new medical equipment [ ] , or the reduction of the period duration as a result of high temperatures [ ] . to account for such variations, the present study has considered both the constant and the time-varying cases, by including the corresponding variables either in the constant parameter vector θ or in the unknown input vector w(t), respectively, as described in remark . changing a parameter from constant to time-varying naturally influences structural identifiability and observability. this effect is graphically summarized in figures - , which represent classes of models in tree form and classify them according to their observability. each model is shaded with a color, according to the observability of the parameter studied. some models include different rates for different population groups: for example, they may consider two different transmission rates for symptomatic and asymptomatic individuals. 
for those models, each rate may have different observability properties when considered as a time-varying parameter; in such cases the model is depicted between two color blocks (see for example the sir model in figure ). changing β from a constant to a time-varying parameter (or, equivalently, an unknown input) changes neither its observability nor that of the other variables in sir models. in contrast, this is not the case with the recovery rate γ, for which a somewhat counter-intuitive result may be obtained: by changing γ from a constant to a continuous function of time with at least one non-zero derivative, the model can become more observable and identifiable, despite the fact that γ is an unknown function. an example of this is the sir model : if γ is constant, the model has only one identifiable parameter, τ, and no observable states; if γ is time-varying with at least one non-zero derivative, two parameters become identifiable (β, µ), two states become observable (i, s), and γ itself is observable. in the other models, when γ is neither identifiable as a constant nor observable as an unknown input, its successive derivatives are observable. for the seir models, considering the β parameter as an unknown input function follows a similar trend to that of the sir models, with the exception of model , which gains both observability and identifiability and becomes fispo (fully input-state-parameter observable). considering the recovery rate γ (fig. ) or the latent period κ (fig. ) individually as time-varying parameters generally leads to greater observability, except for model ( ). as an example, in model ( ) one of the unknown inputs becomes observable, three states become observable (s, e, i), and three parameters become identifiable (γ, µ i , β); similarly, in model ( ), both the input and three states (s, e, i) become observable and two parameters (µ, β) become identifiable.
besides the transmission rate, latent period, and recovery rate, other rates (screening, disease-related deaths, and isolation) have also been considered as time-varying parameters in some studies. the observability of most models is not modified if these parameters are allowed to change in time, the exception being models which gain observability. an example is the seir model ( ), which has seven parameters, seven states, and one output. assuming constant parameters, five of them are structurally identifiable (κ, α, β, γ , γ ) and two are unidentifiable (q, ρ), while there are three observable states (i, j, c) and four unobservable states (s, e, a, r) [ ] . however, when the parameter ρ (which describes the proportion of exposed/latent individuals who become clinically infectious) is considered time-varying, all parameters become identifiable (including ρ) and six states become observable (all except r, which is never observable unless it can be directly measured, as we have already mentioned). the fact that allowing an unknown quantity to change in time can improve its observability, and also the observability of other variables in a model, may seem paradoxical. an intuitive explanation can be obtained from the study of symmetries in the model structure. the existence of lie symmetries amounts to the possibility of transforming parameters and state variables while leaving the output unchanged, i.e. their existence amounts to a lack of structural identifiability and/or observability [ ] . the strike-goldd toolbox used in this paper includes procedures for finding lie symmetries [ ] . let us use the sir model as an example. this model has five parameters (τ, β, ρ, µ, d), of which only τ is identifiable if assumed constant. the model contains the following symmetry: where is the parameter of the lie group of transformations.
thus, there is a symmetry between ρ and µ that makes them unidentifiable: changes in one parameter can be compensated by changes in the other one. however, if ρ is time-varying and µ is constant, the latter cannot compensate the changes of the former, and the symmetry is broken. indeed, if ρ is considered time-varying the model gains identifiability (not only µ, but also τ and β become identifiable) and observability (s, i and ρ become observable). let us now illustrate how the results of this study may be applied in a realistic scenario. we use as an example the model sir , which has states (s, i, r, a, q, j) and parameters (d , d , d , d , d , d , k , k , λ, γ , γ , a , q , j , µ , µ ); its equations are shown in table . this model includes the following additional features with respect to the basic sir model: birth/death, asymptomatic individuals (a), quarantine (q), and isolation (j). in its original publication two states were measured (q, j). with these two states as outputs the model has five identifiable parameters (d , d , q , k , µ ) and two observable states (a, i); thus, there are two unobservable states (s, r) and ten unidentifiable parameters. if we are interested in estimating e.g. the number of susceptible individuals (s), this model would not be appropriate. how should we proceed in that scenario? one way of improving observability could be by including more outputs (option ). for example, since there is a separate class for asymptomatic individuals (a), the infected compartment (i) considers only individuals with symptoms, and we could assume that they can be detected. by including 'i' in the output set, the structural identifiability and observability of the model improves: six more parameters are identifiable (λ, a , j , d , k , µ ) and the state in which we are interested (s) becomes observable. however, including more outputs is not always realistic. 
another possibility would then be to reduce the complexity of the model by decreasing the number of additional features (option ). for example, leaving out the asymptomatic compartment leads to the following model: the output of the model is the same, q, j. in this case, the model has eight identifiable parameters (λ, q , j , d , d , µ , µ , k ) and two observable states (s, i). a third possibility is to simplify the parametrization of the model (option ). this model considers a different death rate for every compartment (d i , i = , . . . , ). with some loss of generality, we could consider a specific death rate for infected individuals, d i = d , and a general death rate d for all non-infected and asymptomatic individuals. this reduction of the number of parameters leads to better observability of the model: the only unidentifiable parameters are d , γ , and k , and the only non-observable state is r. thus, this option also allows one to identify s.

our analyses have shown that a fraction of the models found in the literature have unidentifiable parameters. key parameters such as the transmission rate (β), the recovery rate (γ), and the latent period (κ) are structurally identifiable in most, but not all, models. the transmission and recovery rates are identifiable in roughly two thirds of the models, and the latent period in almost all (> %) of them. likewise, the states corresponding to the numbers of susceptible (s) and exposed (e) individuals are non-observable in roughly one third of the model versions analysed in this paper. the number of infected individuals (i) can usually be directly measured, but it is non-observable in one third of the model versions in which it is not measured. the situation is worse for the number of recovered individuals (r), which is almost never observable unless it is directly measured. many models include other states in addition to s, e, i, and r, which are not always observable either.
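the compensation mechanism behind this kind of non-identifiability, such as the ρ–µ symmetry discussed earlier, can be illustrated numerically on a toy model in which infected individuals leave the infectious class at rate ρ + µ (recovery plus disease-related death) and only i is measured. the model and all numerical values below are illustrative assumptions, not one of the reviewed models.

```python
# toy illustration of structural non-identifiability caused by a symmetry:
#   dS/dt = -beta*S*I/N,   dI/dt = beta*S*I/N - (rho + mu)*I,   output y = I.
# any pair (rho, mu) with the same sum rho + mu produces the same output,
# so rho and mu are not individually identifiable from y.
# parameter values are illustrative assumptions.

def output_trajectory(beta, rho, mu, s0, i0, t_max, dt=0.01):
    """Return the measured output y = I along a forward-Euler trajectory."""
    n = s0 + i0
    s, i = s0, i0
    ys = []
    t = 0.0
    while t < t_max:
        inf = beta * s * i / n * dt
        out = (rho + mu) * i * dt  # removals: recovery (rho) + death (mu)
        s -= inf
        i += inf - out
        t += dt
        ys.append(i)
    return ys

if __name__ == "__main__":
    y1 = output_trajectory(beta=0.4, rho=0.08, mu=0.02, s0=990.0, i0=10.0, t_max=100.0)
    y2 = output_trajectory(beta=0.4, rho=0.05, mu=0.05, s0=990.0, i0=10.0, t_max=100.0)
    gap = max(abs(a - b) for a, b in zip(y1, y2))
    print(f"max output gap between the two parameterizations: {gap:.2e}")
```

the two parameterizations share the same sum ρ + µ and produce outputs that coincide up to floating-point rounding, so no amount of noise-free output data can separate ρ from µ; making one of them time-varying breaks the symmetry, as discussed above.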
the transmission rate and other parameters may vary during the course of an epidemic, as a result of a number of factors such as changes in public policy, population behaviour, or environmental conditions. to account for these variations, in the present study we have considered both the constant and the time-varying parameter case. somewhat unexpectedly, we found that allowing for variability in an unknown parameter often improves the observability and/or identifiability of the model. this phenomenon might be explained by the contribution of this variability to the removal of symmetries in the model structure. structural identifiability and observability depend on which states or functions are measured. the lack of these properties may in principle be surmounted by choosing the right set of outputs [ ] , but the required measurements are not always possible to perform in practice. epidemiological models are a clear example of this; limitations such as lack of testing or the existence of asymptomatic individuals usually make it impossible to have measurements of all states. an alternative to measuring more states is to use a model with fewer compartments and/or a simpler parameterization, thus decreasing the number of states and/or parameters. reducing the model dimension in this way may achieve observability and identifiability. even when it is not possible (or practical) to avoid non-observability or non-identifiability by any means, the model may still be useful, as long as it is only used to infer its observable states or identifiable parameters. for example, we may be interested in determining the transmission rate β but not the number of recovered individuals r; in such case it is fine to use a model in which β is identifiable even if r is not observable. 
of course, this means that, to ensure that a model is properly used, it is necessary to characterize its identifiability and observability in detail, so as to know whether the quantity of interest is observable/identifiable. the contribution of this work has been to provide such a detailed analysis of the structural identifiability and observability of a large set of compartmental models of covid- presented in the recent literature. the results of our analyses can be used to avoid the pitfalls caused by non-identifiability and non-observability. by classifying the existing models according to these properties, and arranging them in a structured way as a function of the compartments that they include, our study has answered the following question: given the sets of existing models and available measurements, which model is appropriate for inferring the value of some particular parameters, and/or for predicting the time course of the states of interest?

the tables included in the following pages report the results of the observability and structural identifiability analyses of all the model variants considered in this paper. each block of rows represents one of the following assumptions:

• all parameters considered constant (i.e. as is usually the case in the original publications).
• transmission rate β considered time-varying.
• latent period κ considered time-varying (only in seir models; sir models do not have this parameter).
• recovery rate γ considered time-varying.
• all parameters considered time-varying.

within each block, each row provides detailed information about identifiable and non-identifiable parameters, observable and non-observable states, directly measured (d.m.) states, observable and unobservable unknown inputs (and time-varying parameters), known inputs, and the number of derivatives of the unknown inputs (and time-varying parameters) assumed to be non-zero (nnderw). the suffix d followed by a number represents the nth derivative of an unknown function (e.g.
β d is the first derivative of the time-varying parameter β). the blank blocks in the tables of the seir models numbers and indicate that the corresponding time-varying case is already considered in the original formulation of the model. the sir models and have only been studied in their original form, i.e. without considering time-varying parameters, because these models do not contain the common parameters of the sir models; instead they use the r constant.

[table residue: output sets h = i, ki, {i, r, q}, q, x, {d, r, t}, with the corresponding lists of identifiable parameters (β, γ; γ; k; β, δ, η, ξ, ρ) and non-identifiable parameters (k, β; β, γ, δ; k, β, α; α, γ, ε, c, λ, σ, κ, θ, μ, τ)]
key: cord- -b i vo
authors: hubert, emma; mastrolia, thibaut; possamai, dylan; warin, xavier
title: incentives, lockdown, and testing: from thucydides's analysis to the covid- pandemic
date: - -
journal: nan
doi: nan
sha:
doc_id: cord_uid: b i vo

we consider the control of the covid- pandemic via incentives, through either stochastic sis or sir compartmental models. when the epidemic is ongoing, the population can reduce interactions between individuals in order to decrease the rate of transmission of the disease, and thus limit the epidemic. however, this effort comes at a cost for the population. therefore, the government can put into place incentive policies to encourage the lockdown of the population.
in addition, the government may also implement a testing policy in order to know more precisely the spread of the epidemic within the country, and to isolate infected individuals. we provide numerical examples, as well as an extension to a stochastic seir compartmental model to account for the relatively long latency period of the covid- disease. the numerical results confirm the relevance of a tax and testing policy to improve the control of an epidemic. more precisely, if a tax policy is put into place, even in the absence of a specific testing policy, the population is encouraged to significantly reduce its interactions, thus limiting the spread of the disease. if the government also adjusts its testing policy, less effort is required on the population side, so individuals can interact almost as usual, and the epidemic is largely contained by the targeted isolation of positively-tested individuals.

starting around bc, and known as the first historically well-documented epidemic, the plague of athens killed between a quarter and a third of athenians, as reported by thucydides. he described the reaction of common athenians and physicians of the time alike in these terms:

'for a while physicians, in ignorance of the nature of the disease, sought to apply remedies; but it was in vain, and they themselves were among the first victims, because they oftenest came into contact with it. no human art was of any avail, and as to supplications in temples, inquiries of oracles, and the like, they were utterly useless, and at last men were overpowered by the calamity and gave them all up.' (jowett [ , volume i, book ii, pp. ])

thucydides analysed the consequences of this epidemic, and concluded that it had led to a moral upheaval for the athenians, faced with the complete lack of any useful cure.
indeed, they realised that their traditionally used policies (mostly of a religious nature) to face tragedies had no effect on the epidemic, and that in the end, the disease was only stopped thanks to the development of a natural immunity within the population, during the first four years of the epidemic phase. concerning now more specifically the spread of the disease itself, thucydides wrote the following:

'appalling too was the rapidity with which men caught the infection; dying like sheep if they attended on one another; and this was the principal cause of mortality. when they were afraid to visit one another, the sufferers died in their solitude, so that many houses were empty because there had been no one left to take care of the sick; or if they ventured they perished, especially those who aspired to heroism. for they went to see their friends without thought of themselves and were ashamed to leave them, at a time when the very relations of the dying were at last growing weary and ceased even to make lamentations, overwhelmed by the vastness of the calamity.' (jowett [ , volume i, book ii, pp. ])

in thucydides's analysis of the plague of athens, we can isolate three fundamental questions that need to be addressed whenever an unknown epidemic occurs.

(1) how can one model a disease when one has, at best, parsimonious information on how it is spreading among the population?
(2) how can one solve the gordian knot associated to interactions within the population: enjoying on the one hand the presence of others and avoiding isolation and solitude, and on the other hand potentially dramatically spreading the disease?
(3) how can governments and decision-makers incentivise people in order to better control the spread of the epidemic?

the first question is naturally linked to several strands of fundamental research, both for mathematicians and physicians, dealing with the problem of choosing a relevant epidemic model.
if the paternity of the first mathematical model designed to describe the evolution of an epidemic is often attributed to bernoulli, who proposed one for smallpox as early as in [ ] , the real mathematical development of the theory had to wait for the th century, with fundamental contributions to the development of deterministic models by hamer [ ] , ross [ ; ; ] , soper [ ] , and later kermack and mckendrick [ ] , mckendrick [ ] , and bartlett [ ] , who proposed one of the first general investigations of the evolution of deterministic interacting systems, which was then applied to epidemiology in kendall [ ] . the previous list is by no means comprehensive, and we refer the interested reader to the monograph by bailey [ ] for more historical details. it was rapidly noticed that deterministic models were insufficient to account for the uncertainty associated with the disease spreading, and the technical difficulties usually encountered in its detection. this acknowledgement helped nurture the development of stochastic models, whose first instances seem to be traceable back to mckendrick [ ] and greenwood [ ] . for a precise comparison between deterministic and stochastic models in discrete-time settings, we refer our readers to bailey [ ] , bartlett [ ] , and allen and burgin [ ] , and to allen [ ] for more up-to-date references and an overview of recent epidemiological models. we will now describe some specific types of epidemiological models, belonging to the general class of compartmental models, which will be at the heart of our work.
such models have been coined sis (for susceptible-infected-susceptible), and consider a population divided into two groups. susceptible individuals interact with infected ones, and therefore move from one class to the other repeatedly. this model was first discussed in weiss and dishon [ ] , generalising a simpler version by bailey [ ] , where it was linked to birth and death interacting processes. it was then further studied by kryscio and lefèvre [ ] , who computed the mean time of extinction of the infection. these discrete-time models were then extended by nåsell [ ; ] , who found the quasi-stationary distribution of a continuous-time stochastic sis model with no births nor deaths. more recently, gray, greenhalgh, hu, mao, and pan [ ] proposed to model a stochastic sis process in continuous-time, as a solution to a bi-dimensional sde driven by a brownian motion. this is the model we will follow in our sis framework. alternatively to this quite pessimistic scenario, one can also assume that an immunity will appear after infection. in that case, we can distinguish three classes: susceptible individuals who can contract the disease, infected people who are currently infected by the disease, and recovered people who have been cured and developed antibodies. introduced originally by kermack and mckendrick [ ] , this so-called sir model was studied in depth by anderson and may [ ] in a deterministic setting, while stochastic perturbations were introduced by beretta, kolmanovskii, and shaikhet [ ] . modelling a stochastic sir process as a solution to an sde driven by a brownian motion was then proposed in tornatore, buccellato, and vetro [ ] , and jiang, yu, ji, and shi [ ] . this will be our model choice in this case. 
for a more realistic modelling of the covid- disease, and especially to account for its relatively long latent phase, one could also assume that once a susceptible individual contracts the disease, he does not immediately become contagious. this led us to provide some extensions of our reasoning, in particular to a seir model, used by bacaër [ ] , dolbeault and turinici [ ] and Élie, hubert, and turinici [ ] to model the covid- disease. in this type of model, an intermediary class between susceptible and infected is introduced, usually referred to as the class of exposed individuals. this class allows one to model individuals who are infected but not yet infectious. similarly to the sir/sis models, another variation on this model considers that there is only a partial immunity, and individuals having recovered may revert to the class of susceptible: in this case, the model is usually coined seirs. in our framework we need to consider continuous-time stochastic versions of these models, and will therefore use the ones introduced in mummert and otunuga [ ] . the question of the model being now settled, we can focus on the second question raised above, which is linked to the spread of the disease through interactions within the population. in classical sis/sir models, the infection spreads through the population with an incidence rate β, proportionally to the product of the numbers of susceptible and infected individuals. in the absence of a cure or a vaccine, this transmission rate therefore appears as the only control variable of individuals or public institutions wishing to reduce the spread of an epidemic. our take on the second question will therefore be from a control-theoretic perspective.
at the heart of this approach is the simple idea that, when faced with an epidemic, a perfectly rational population will try to find an equilibrium interaction rate, balancing the need to still connect with others and the natural fear of spreading the infection itself. this is by no means a new point of view, and papers discussing the use of formal control theory in epidemiology can be dated back to the s, see among others taylor [ ], jaquette [ ], sanders [ ], gupta and rink [ ; ], abakuks [ ], morton and wickwire [ ], wickwire [ ], or sethi and staats [ ]. more recently and closer to our purpose, we can also refer to behncke [ ], riley et al. [ ], who studied the impact of the control of the transmission rate on the - sars outbreak in hong kong and on the ways to interfere with the disease spreading, piunovskiy and clancy [ ], hansen and day [ ], fenichel et al. [ ], kandhway and kuri [ ], sélley, besenyei, kiss, and simon [ ], and more broadly to the monograph by lenhart and workman [ ]. an important, and slightly unrealistic, aspect of the framework we just described is that the population is perfectly rational. though it seems reasonable to assume that at least some individuals, being afraid of getting sick, will naturally decrease their interaction rates, it would clearly be a stretch to assume that all individuals have access to enough information, compared for instance to public institutions, to assess whether they are really acting in a way which is truly beneficial to the population as a whole. this is one of the reasons why quarantine and lockdown measures can in addition be introduced by governments, in order to help slow down a pandemic when neither a cure nor a vaccine has been developed, and there is a risk of medical facilities being overwhelmed by a large influx of patients.
as should be expected, a significant part of the recent literature on the covid- pandemic has also adopted this point of view, and such measures as well as their medical, societal, and economical impacts are discussed by, among others, alvarez, argente, and lippi [ ], anderson, heesterbeek, klinkenberg, and hollingsworth [ ], colbourn [ ], del rio and malani [ ], djidjou-demasse, michalakis, choisy, sofonea, and alizon [ ], Élie, hubert, and turinici [ ], ferguson et al. [ ], fowler, hill, levin, and obradovich [ ], grigorieva, khailov, and korobeinikov [ ], hatchimonji, swendiman, and seamon [ ], kantner [ ], ketcheson [ ], piguillem and shi [ ], thunstrom, newbold, finnoff, ashworth, and shogren [ ], toda [ ], or wilder-smith, chiew, and lee [ ]. a telling example in the above list is the report of the imperial college london by ferguson et al. [ ], which assesses the impact of non-pharmaceutical interventions to reduce the contact rate within a population for the covid- pandemic. they distinguish between mitigation strategies (i.e., reduction of the peak hospitalisation levels by protecting the most susceptible individuals from getting infected, with shelter-in-place policies or social distancing) and suppression strategies (i.e., aiming at reversing the disease growth with home isolation and social distancing for the entire population). it has been shown that mitigation policies 'might reduce deaths seen in the epidemic by up to half, and peak healthcare demand by two-thirds' (ferguson et al. [ , pp. ]), but will lead to numerous deaths and saturation of health systems. the suppression strategy thus appears in this report as the preferred policy.
in light of the issues we have raised, a natural conclusion was, at least for us, that even if a control-theoretic approach to mitigating the impact of an epidemic is clearly desirable, there is a priori no evidence that, in the face of clear public policies, a population will directly adopt a social-distancing behaviour leading to an optimal transmission rate for the welfare of the society. moreover, in the absence of a system that actually keeps track of the level of interaction within the population, governments are faced with a clear situation of moral hazard. consequently, an incentive policy should also be calibrated by governments in order to gain better control over the spread of the disease. this, as expected, leads us to our third question, which is where our approach departs significantly from the extant literature. the covid- pandemic has emphasised that a control policy has to be established with penalties, if lockdown measures are not respected by the population. however, such a policy is subject to two main issues. first, regardless of the amount of police checks being put into place, it is impossible for large countries to ensure the application of such isolation measures, and therefore it is unfeasible to have absolute control over the behaviour of all individuals and their interactions. second, a balance has to be struck between the severity of penalties, or other types of incentives, to help reduce the propagation of the disease, and the natural yearning of citizens for interactions. to the best of our knowledge, no real calibration of appropriate incentive policies, founded on quantitative criteria, has been investigated in epidemiological models. the present paper proposes to fill this gap by studying how a lockdown policy, seen as a suppression strategy to echo [ ], can limit the number of infected people during an epidemic, with uncertainties on the actual number of affected individuals and on their level of adherence to such a policy.
more specifically, we aim at solving this moral hazard problem by finding (i) the best reaction effort of the population to reduce interactions given a specific government policy; (ii) the optimal policy, composed of an aggregate tax paid by the population at some fixed maturity, and a testing policy to reduce the uncertainty on the estimated number of infected people. as we already mentioned, this problem fits perfectly into the framework of a classical principal-agent problem with moral hazard, and boils down to finding a stackelberg equilibrium between the principal (the leader, here the government), who proposes a policy, and an agent (the follower, here the population), who reacts by choosing how to interact optimally in order to reduce the spread of the disease. principal-agent problems have a long history in the economics literature, dating back to at least the s. it is not our goal here to review the whole literature on the subject, and we refer the interested reader to the seminal books by laffont and martimort [ ], bolton and dewatripont [ ], or salanié [ ]. for our purpose here, we will content ourselves with mentioning that this literature regained strong momentum in the past two decades, when continuous-time models were developed and shown to be more flexible and tractable than the earlier static or discrete-time models. main contributors in this regard are holmström and milgrom [ ], schättler and sung [ ], sannikov [ ], williams [ ], see also the monograph by cvitanić and zhang [ ]. more recently, cvitanić, possamaï, and touzi [ ; ] developed a general theory allowing one to tackle a great number of contract-theory problems, which has since been extended and applied in many different situations.
the basic idea is to identify a sub-class of contracts offered by the principal which are revealing, in the sense that the best-reaction function of the agent, and his optimal control, can be computed straightforwardly, and then to prove that restricting one's attention to this class is without loss of generality. with this approach, the problem faced by the principal becomes a standard optimal control problem. there are however two fundamental assumptions for this theory to work, one of them being a specific structure condition, which enforces that the drift of the process controlled by the agent, meaning here for us the pair (s, i) giving the numbers of susceptible and infected people in the population, must be in the range of the volatility matrix of this process. this fundamental assumption is not satisfied in our model because, roughly speaking, there is only one brownian motion driving the two processes, and we therefore cannot directly rely on existing results to tackle our problem. in these so-called degenerate problems, the literature has so far relied on the pontryagin stochastic maximum principle, see for instance [ ], but this requires extremely stringent assumptions, such as linear dynamics, which are automatically precluded for sis/sir models. we however prove that in our specific problem, it is possible to identify a whole family of contract representations (unlike the unique one in non-degenerate models), which is different from the one obtained in [ ], but which still allows us to re-interpret the problem of the principal as a standard stochastic control problem. as far as we know, ours is the first paper in the literature which uses a dynamic programming approach to solve a degenerate principal-agent problem, and this constitutes our main mathematical contribution. unfortunately, but of course expectedly for a relatively general framework, there is no way to extract explicit results from our model, especially on the shape of the optimal controls. it is therefore necessary to perform numerical simulations, by implementing semi-lagrangian schemes, proposed for the first time by camilli and falcone [ ], using truncated high-order interpolators, as proposed by warin [ ]. 
a word is in order on contact-tracing apps, designed to help track down subsequent exposures after an infected individual is identified, see for instance cho, ippolito, and yu [ ], or reichert, brack, and scheuermann [ ]. using these would in principle erase any possibility of moral hazard, provided that the whole population uses the app and that testing is organised on a massive scale. even admitting that this would be the case, these tools have raised complex issues of privacy, see ienca and vayena [ ] or park, choi, and ko [ ], and thus remain extremely polemical. in any case, the incentive-based approach we propose can always be considered as a useful complement to any other adopted strategy. 
there are a certain number of papers studying disease spreading through the lens of either moral hazard or adverse selection. however, these papers are mostly interested in livestock-related diseases, where producers naturally have private information on preventive measures they may have adopted prior to contamination (ex ante moral hazard), and may or may not declare whether their herd is infected after contamination (ex post adverse selection). such issues and the design of appropriate policies are considered for instance in valeeva and backus [ ] and gramig, horan, and wolf [ ; ], but the problematic is completely different from the one we are interested in. a notable exception can be found in the work of carmona and wang [ , section ], where the authors consider an application of their moral hazard theory, for agents interacting through a finite-state mean-field game, to the containment of an epidemic.
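the semi-lagrangian schemes just mentioned can be illustrated on a toy problem. the sketch below is purely illustrative and is not the authors' solver: it treats a one-dimensional deterministic control problem, and replaces the truncated high-order interpolators of warin [ ] with simple linear interpolation; the dynamics, running cost, and grids are all assumptions made for the example.

```python
import numpy as np

# Toy semi-Lagrangian scheme: minimise the integral of (x_t^2 + a_t^2) dt
# over controls a in [-1, 1], with dynamics dx_t = a_t dt, horizon T = 1.
# Backward dynamic programming: V(t, x) = min_a [cost*dt + V(t+dt, x + a*dt)],
# where the "foot of the characteristic" x + a*dt is evaluated by interpolation.
xs = np.linspace(-2.0, 2.0, 201)          # spatial grid
controls = np.linspace(-1.0, 1.0, 21)     # discretised control set
dt, n_steps = 0.01, 100                   # time step, T = n_steps * dt = 1

V = np.zeros_like(xs)                     # terminal condition V(T, x) = 0
for _ in range(n_steps):                  # march backward in time
    candidates = [
        (xs**2 + a**2) * dt + np.interp(xs + a * dt, xs, V)
        for a in controls
    ]
    V = np.min(candidates, axis=0)        # pointwise minimisation over controls

print(round(V[100], 6), round(V[0], 3))   # V(0, x=0) and V(0, x=-2)
```

the same backward recursion carries over to the stochastic setting by averaging the interpolated value over noise increments at each step; only the interpolation order and the grids change in a production solver.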
the numerical results for both the sis and sir models are conclusive, and confirm the relevance of a tax and testing policy to improve the control of an epidemic. first, in the benchmark case, considered as the case where the government does not put into place a specific policy, the efforts of the population are not sufficient to contain the epidemic. in our opinion, this supports the need for incentives. indeed, if a tax policy is put into place, even in the absence of a specific testing policy, the population is then encouraged to significantly reduce its interactions, thus containing the epidemic until the end of the period under consideration. however, for a fixed containment period, the population relaxes its effort at the very end, leading to a resumption of the epidemic at that point. finally, if the government also adjusts its testing policy, less effort is required on the population side, so individuals can interact almost in a business-as-usual fashion, and the epidemic is largely contained by the targeted isolation of positively-tested individuals. 
we let n denote the set of positive integers, r+ := [0, ∞) and r+* := (0, ∞). we fix a time horizon t > 0 corresponding to the lockdown length chosen, a priori, by the government. for every n ∈ n, s_n represents the set of n × n symmetric positive matrices with real entries. we also denote by c_n the space of continuous functions from [0, t] into r^n, and simplify notation when n = 1 by setting c := c_1. the set c_n will always be endowed with the topology associated with uniform convergence on the compact [0, t]. for every finite-dimensional euclidean space e, and any n ∈ n, we let c_b(e, r) be the space of bounded, continuous functions from e to r, as well as c_b^n(e, r) the subset of c_b(e, r) of all n-times continuously differentiable functions on e with bounded derivatives. for every ϕ ∈ c_b^2(e, r), we denote by ∇ϕ its gradient vector, and by d^2ϕ its hessian matrix.
in this section, in order to highlight the results we obtained throughout this paper, we present our model in an informal way. we thus detail the compartmental epidemic models we consider to represent the spreading of the virus, i.e., either a sis or a sir model. indeed, at the beginning of an epidemic, it is unlikely that decision-makers, let alone the population, will have sufficient data to conclude that infected individuals become immune to the virus in question once they have recovered. this is particularly the case when the virus is new, as for covid- . with this in mind, we concentrate our attention on two classical models in epidemiology: the sis model, for the case where infected individuals do not develop an immunity to the disease and can therefore re-contract it, and the sir model in the opposite case. our study is therefore able to deal with both models, and one of the important points will be to compare the results obtained for each of them. we insist on the fact that this entire section is informal, and the reader is referred to section for the rigorous mathematical study. some parameters are common to the models considered. in particular, they both involve four non-negative parameters, λ, µ, β and γ. the parameters λ and µ represent respectively the birth and (natural) death rates in the population, and therefore reflect the demographic dynamics unrelated to the epidemic, while γ represents the death rate associated with the disease. all these parameters are assumed to be constant and exogenous. in most epidemic models, the parameter β, representing the transmission rate of the disease, is also assumed to be constant and exogenous. nevertheless, in our framework, we will consider that β is endogenous and time-dependent, in order to model the influence that the population can have on this transmission rate.
more precisely, the transmission rate β depends essentially on two factors: the disease characteristics and the contact rate within the population. although the population cannot modify the disease characteristics, each individual can choose (or be incentivised) to reduce his/her contact rate with other individuals in the population. we will thus assume that the population can control the transmission rate β of the disease by reducing social interactions. with this in mind, we will denote by β > 0 the constant initial transmission rate of the disease, i.e., without any control measures or effort from the population. unfortunately, reducing social interactions is costly for the population. this cost takes into account both the obvious social cost, due to accrued isolation during the lockdown period, and an economic cost (loss of employment due to the lockdown, ...). from now on, β will thus denote the time-dependent transmission rate of the disease, controlled by the population. more precisely, we fix some constant β_max ≥ β representing the maximum rate of interaction that can be considered, and we define b := [0, β_max]. the process β will be assumed to be b-valued, and we will denote by b the corresponding set of processes. one of the two epidemic models we will study is inspired by the well-known sis (susceptible-infected-susceptible) compartmental model, which mainly considers two classes s and i within the population: the class s represents the 'susceptible', while the class i represents the 'infected'. in this model, during the epidemic, each individual can be either susceptible or infected, and (s_t, i_t) denotes the proportion of each category at time t ≥ 0. more precisely, as in classical sis models, we assume that an infected individual returns, after recovery, to the class of susceptible individuals, and can therefore re-contract the disease. we denote by ν the associated rate, which is assumed to be a non-negative constant.
we also take into account the demographic dynamics of the population, i.e., births and deaths (related to the considered disease or not), through the previously mentioned parameters λ, µ and γ. to sum up, the model is represented in figure below, and the (continuous-time) evolution of the disease is described by the following system

ds_t = (λ − β_t s_t i_t − µ s_t + ν i_t) dt,
di_t = (β_t s_t i_t − (µ + γ + ν) i_t) dt,

for an initial compartmental distribution of individuals at time 0, denoted by (s_0, i_0) ∈ r+² and supposed to be known. the second epidemic model we will focus on is the classical sir (susceptible-infected-recovered) compartmental model. as in the sis model, the class s represents the 'susceptible' and the class i represents the 'infected'. the sir model is used to describe epidemics in which infected individuals develop immunity to the virus. this therefore involves a third class, namely r, representing the 'recovered', i.e., individuals who have contracted the disease, are now cured, and are therefore immune to the virus under consideration. we denote by ρ the recovery rate, which is assumed to be a fixed non-negative constant. therefore, during the epidemic, each individual can be either susceptible, infected or recovered, and (s_t, i_t, r_t) denotes the proportion of each category at time t ≥ 0. as in the previously described sis model, we also take into account the demographic dynamics of the population, through the parameters λ, µ and γ. to sum up, the epidemic scheme is represented in figure , and the (continuous-time) evolution of the disease is described by the following system

ds_t = (λ − β_t s_t i_t − µ s_t) dt,
di_t = (β_t s_t i_t − (µ + γ + ρ) i_t) dt,
dr_t = (ρ i_t − µ r_t) dt,

for a given initial distribution of individuals at time 0, denoted by (s_0, i_0, r_0) ∈ r+³ and assumed to be known. the use of a deterministic model is widespread and generally justified for most epidemics. however, in our case study, and given what is currently happening in many countries, it appears that the number of infected individuals is not so simple to quantify and estimate.
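as a quick sanity check, the deterministic sis and sir dynamics with demography can be integrated with a naive forward euler scheme. the sketch below is illustrative only: the parameter values are arbitrary choices, not calibrated ones, and the unified drift (with both ν and ρ) is the one suggested by the ρ = 0 / ν = 0 reduction discussed later in the text.

```python
import numpy as np

# Forward-Euler integration of deterministic SIS/SIR dynamics with demography.
# Setting rho = 0 gives the SIS model, nu = 0 the SIR model.
lam, mu, gamma = 0.02, 0.01, 0.005   # birth, natural death, disease death rates
beta = 0.8                           # transmission rate (held constant here)
dt, n_steps = 0.01, 10_000           # time step, horizon T = 100

def run(nu, rho, s0=0.95, i0=0.05, r0=0.0):
    s, i, r = s0, i0, r0
    for _ in range(n_steps):
        ds = lam - beta * s * i - mu * s + nu * i
        di = beta * s * i - (mu + gamma + nu + rho) * i
        dr = rho * i - mu * r
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
    return s, i, r

s_sis, i_sis, _ = run(nu=0.1, rho=0.0)      # SIS: recovered return to susceptible
s_sir, i_sir, r_sir = run(nu=0.0, rho=0.1)  # SIR: recovered stay immune
print(f"SIS: s={s_sis:.3f} i={i_sis:.3f}")
print(f"SIR: s={s_sir:.3f} i={i_sir:.3f} r={r_sir:.3f}")
```

note that with a constant birth inflow λ the "proportions" need not sum to one over time; the sketch only checks that the dynamics stay positive and bounded.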
indeed, without a large testing campaign, it seems complicated to know precisely the proportion of infected in the population. this is particularly true in the case of the covid- epidemic: the absence of symptoms for a significant proportion of infected individuals leads to uncertainty about the actual numbers of susceptible and infected. as a consequence, it seems more realistic in our study to turn both the sis and sir deterministic controlled models previously described into stochastic controlled models. concerning the deterministic part, the dynamics written in the previous systems remain identical. the volatility is represented partly by a fixed and deterministic parameter σ > 0, and partly by a time-dependent process α, representing the actions of the government in terms of testing policy. more precisely, in our model, an increase in the number of tests in the population, represented by a decrease of the parameter α, leads to a decrease in the volatility of the processes s and i. hence, both the population and the government have a clearer view of the numbers of susceptible and infected, and thus of the epidemic. however, this strategy comes at an economic cost for the government. we then assume that, without any specific effort of the government, α is equal to 1. we also fix a small parameter ε ∈ (0, 1) and consider the subset a := [ε, 1]. the control α of the government is assumed to be a-valued, and we denote by a the corresponding set of processes. in addition, the testing policy allows the government to isolate individuals with positive test results. therefore, the control α also has an impact on the effective transmission rate of the disease. more precisely, without any testing policy, i.e., α = 1, the government cannot isolate contaminated individuals efficiently. in this case, all infected people spread the disease, and the transmission rate of the virus is given by β. conversely, if a testing policy is put into place by the government, i.e.,
when α < 1, we consider that individuals with positive test results can be isolated, and as a consequence fewer infected people spread the disease. in this case, the effective transmission rate is lower. we do not, however, assume that the impact of the testing policy on the volatility of s and i and on the transmission rate has the same magnitude. indeed, we expect a lower reduction of the effective transmission rate, compared to the volatility reduction, for a given policy α. this should be understood as a manifestation of the fact that it is easier to reduce the uncertainty on the number of infected people than to actually isolate individuals who have been identified as infected. we thus assume a linear dependency with respect to α for the volatility of both s and i, while the effective transmission rate is chosen equal to β√α, so that the number of infected people effectively spreading the disease at time t is given by √(α_t) i_t. we can now consider the sis model previously defined by ( . ), but in its stochastic version: the numbers of infected, and therefore of susceptible, are impacted at each time t by a brownian motion w_t. more precisely, the dynamic of the epidemic is now given by the following system

ds_t = (λ − β_t √(α_t) s_t i_t − µ s_t + ν i_t) dt − σ α_t s_t i_t dw_t,
di_t = (β_t √(α_t) s_t i_t − (µ + γ + ν) i_t) dt + σ α_t s_t i_t dw_t.

similarly to the sis model, we consider that the deterministic sir model described by ( . ) is also subject to a noise in the estimation of the proportions of susceptible and infected individuals. inspired by the stochastic sir model in tornatore, buccellato, and vetro [ ], the dynamic of the epidemic is now given by the following system

ds_t = (λ − β_t √(α_t) s_t i_t − µ s_t) dt − σ α_t s_t i_t dw_t,
di_t = (β_t √(α_t) s_t i_t − (µ + γ + ρ) i_t) dt + σ α_t s_t i_t dw_t,
dr_t = (ρ i_t − µ r_t) dt. ( . )

note that the proportion r of recovered individuals is also uncertain, but only through its dependency with respect to i. more precisely, we assume that there is no uncertainty on the recovery rate ρ, implying that if the proportion of infected individuals is perfectly known, the proportion of recovered is also known without uncertainty.
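the controlled stochastic dynamics just described can be simulated with a basic euler–maruyama scheme. the sketch below is an illustration only: the controls β and α are held constant, all parameter values are arbitrary assumptions, and a crude positivity floor stands in for a proper positivity-preserving scheme.

```python
import numpy as np

# Euler-Maruyama simulation of the controlled stochastic dynamics:
# effective transmission rate beta*sqrt(alpha), volatility sigma*alpha*S*I,
# with opposite signs for S and I (one Brownian motion drives both).
rng = np.random.default_rng(0)
lam, mu, gamma, nu, rho = 0.02, 0.01, 0.005, 0.0, 0.1  # SIR case: nu = 0
beta, alpha, sigma = 0.8, 0.5, 0.3                     # constant controls (illustrative)
dt, n_steps = 1e-3, 50_000                             # horizon T = 50

s, i = 0.95, 0.05
for _ in range(n_steps):
    dw = rng.normal(0.0, np.sqrt(dt))
    common = sigma * alpha * s * i * dw       # shared noise term, opposite signs
    ds = (lam - beta * np.sqrt(alpha) * s * i - mu * s + nu * i) * dt - common
    di = (beta * np.sqrt(alpha) * s * i - (mu + gamma + nu + rho) * i) * dt + common
    s, i = max(s + ds, 0.0), max(i + di, 0.0)  # crude positivity floor

print(f"final s={s:.3f}, i={i:.3f}")
```

averaging many such paths (different seeds) would give monte carlo estimates of the criteria appearing later in the text.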
this modelling choice is consistent with most stochastic sir models, and emphasises that the major uncertainty in the current epidemic is related to the non-negligible proportion of (nearly) asymptomatic individuals. indeed, an asymptomatic individual may be mis-classified as susceptible. this is also the case for an individual in recovery who has been asymptomatic, but the uncertainty is then solely related to the fact that he was not classified as infected when he actually was. in order to provide a unified framework for both the sis and sir models, and to simplify the presentation, we will consider the following dynamic for the epidemic

ds_t = (λ − β_t √(α_t) s_t i_t − µ s_t + ν i_t) dt − σ α_t s_t i_t dw_t,
di_t = (β_t √(α_t) s_t i_t − (µ + γ + ν + ρ) i_t) dt + σ α_t s_t i_t dw_t,
dr_t = (ρ i_t − µ r_t) dt. ( . )

notice that to recover the sis model, one has to set ρ = 0, and conversely, ν = 0 to obtain the sir model. in addition to the choice of a testing policy, the government can also incentivise the population to limit their social interactions, in order to decrease the transmission rate of the disease, by introducing financial penalties. more precisely, at time 0, the government informs the population about its testing policy α ∈ a, as well as its fine policy χ ∈ c, for the lockdown period [0, t]. knowing this, the population will choose an interacting behaviour according to the following rules: (i) an increase in the tax lowers its utility; (ii) an increase in the level of interaction (up to a specific threshold, namely β) improves its well-being; (iii) the population is scared of having a large number of infected people. see section . . for a rigorous definition of the set c of admissible fine policies. we stylise the previous facts by considering that the population solves an optimal control problem, for a given pair (α, χ), whose criterion involves two functions u : [0, t] × b × r+ −→ r and u_1 : r −→ r, continuous in all their arguments, with u_1 a bijection from r to r. given a pair (α, χ), the set of optimal contact rates β will be denoted b(α, χ).
the functions u and u_1 should be interpreted as translating, respectively, the actual value of interaction from the point of view of the population, and the disutility associated with the fine. more precisely, the function u_1 is assumed to be increasing, according to (i) above. concerning the function u, it should be non-decreasing in its second variable up to β, and then non-increasing, modelling (ii) above. on the other hand, the function u is assumed to be non-increasing with respect to the proportion of infected individuals in the population. in particular, this allows us to take into account both the fear of the infection (as mentioned in (iii) above) and the cost incurred if an individual is infected. from the population's point of view, this cost is not actually expressed in terms of money, but mainly corresponds to medical side effects or general morbidity. we refer to anand and hanson [ ], zeckhauser and shepard [ ] and sassi [ ] for an introduction to qaly/daly (quality- and disability-adjusted life-years), the generic measures of disease burden used in economic evaluation to assess the value of medical interventions. we choose to normalise the utility of the population to zero when there is no epidemic. in other words, if i_0 = 0, then i_t = 0 for all t ∈ [0, t], and thus the utility of the population should be equal to 0. with this in mind, we assume that u_1(0) = 0, which means that without a fine, the population does not suffer any disutility. moreover, when there is no epidemic, the population should not reduce its social interaction, meaning that for all t ∈ [0, t], β_t = β. this leads us to assume that the function u represents the social cost of the lockdown policy, and should thus capture the two rules (ii) and (iii), as well as satisfy u(t, β, 0) = 0 for all t ∈ [0, t]. in particular, we could consider a separable utility function u of the form u(t, β, i) = u_β(t, β) − u_i(i), where the function u_i : r+ −→ r represents the fear of the infection for the population.
in order to choose this function, we would like to model the fact that when the proportion of infected is close to 0, the population underestimates the epidemic, while when this proportion becomes large, the population becomes irrationally afraid. we can therefore consider a function u_i independent of t. next, the function u_β represents the sensitivity of the population with respect to the initial transmission rate β of the disease, i.e., without any lockdown measure. during the lockdown period, the social cost of distancing measures becomes more and more important for the population, and we thus expect the cost u_β to also reflect this sensitivity with respect to time. more precisely, we can consider two particular functions to model these stylised facts, for some η_p > 0, to insist on the fact that it is costly for the population to deviate from its usual contact rate, i.e., its level of interactions in an epidemic-free environment, inducing the natural transmission rate of the disease β. finally, concerning the disutility of the agent with respect to the tax χ, we choose a mixed cara-risk-neutral utility function, where θ_p > 0 is the risk aversion of the population and φ_p > 0, so that u_1(0) = 0, and u_1 is an increasing and strictly concave bijection from r to r. for later use, we record that the inverse of u_1, denoted by u_1^(−1), has an explicit form involving the lambert w function (see corless, gonnet, hare, jeffrey, and knuth [ ] for more details). before turning to the principal-agent problem itself, we aim at solving ( . ) for a fixed α = 1 and χ = 0, i.e., without tax or testing policy. similar problems have been studied in, for instance, kandhway and kuri [ ]. mathematically speaking, the optimisation problem faced by the population without contract is simply obtained by setting χ = 0 in the criterion above, since we assumed u_1(0) = 0.
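to make the lambert w inversion concrete, the sketch below assumes a particular mixed cara-risk-neutral form, u_1(x) = φ_p x + (1 − e^(−θ_p x))/θ_p. this exact formula is an illustrative guess (the paper states its own), but it is increasing, strictly concave, satisfies u_1(0) = 0, and its inverse is indeed explicit through the lambert w function, as the derivation in the comments shows.

```python
import numpy as np
from scipy.special import lambertw

# Hypothetical mixed CARA-risk-neutral disutility (illustrative stand-in for u_1):
#   u1(x) = phi * x + (1 - exp(-theta * x)) / theta,
# increasing, strictly concave, u1(0) = 0, and a bijection from R to R.
theta, phi = 1.0, 0.5   # risk aversion theta_p and linear weight phi_p (assumed)

def u1(x):
    return phi * x + (1.0 - np.exp(-theta * x)) / theta

def u1_inv(y):
    # Solve y = u1(x). With c = 1 - theta*y and u = -theta*x, the equation
    # becomes exp(u) + phi*u = c, whose solution is u = c/phi - W(exp(c/phi)/phi),
    # where W is the (principal branch of the) Lambert W function.
    c = 1.0 - theta * y
    u = c / phi - lambertw(np.exp(c / phi) / phi).real
    return -u / theta

for y in (-0.5, 0.0, 0.3, 1.0):
    assert abs(u1(u1_inv(y)) - y) < 1e-9   # round-trip check
```

the argument of w is always positive here, so the principal branch gives a real inverse on all of r.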
notice that, by assumption on the function u, in the no-epidemic case, i.e., if i_0 = 0, the population should not make any effort, and therefore the optimal contact rate β over the period [0, t] is equal to β. we thus consider in the following a fixed initial condition (s_0, i_0) ∈ (r+*)², which implies that for all t ∈ [0, t], both s_t and i_t are (strictly) positive. without tax, the population's problem boils down to a standard control problem with two state variables s and i. we will give the associated pde in section . . below. one of the main theoretical results of our study is given by theorem . . informally, this theorem states that given an admissible contract, namely a testing policy α ∈ a and a tax χ ∈ c, there exist a unique constant y_0 and a unique process z such that the representation ( . ) holds, where β is the unique optimal contact rate for the population. more precisely, under some assumptions ensuring the existence and smoothness of the inverse of the function u_1, this representation holds for (lebesgue-)almost every time, and gives an explicit form for the tax χ. based on ( . ), the tax χ is indexed on the variations of the proportion of infected i, through the stochastic integral ∫_0^· z_s di_s, and not on the variations of the susceptible s (though it is indexed on s through the dt integral). nevertheless, using the link between the dynamics of i and s, we can write a representation equivalent to ( . ), through which the tax can be indexed on s instead of i. therefore, given the strong link between the number of susceptible and the number of infected, it is sufficient to index the tax on only one of these two quantities, and one can choose indifferently to index the tax χ on the variations of i or of s. the reader familiar with contract theory in continuous time will have noticed that the previous representation for the tax χ is not exactly the expected one.
indeed, referring for instance to cvitanić, possamaï, and touzi [ ], the contract is usually the sum of three components: (i) a constant similar to y_0, chosen by the principal in order to satisfy the participation constraint of the agent; (ii) an integral with respect to time t ∈ [0, t] of the agent's hamiltonian; (iii) a stochastic integral with respect to the controlled process, i.e., in our framework, (s, i). neither the representation ( . ) nor ( . ) is, a priori, of this form. this difference is due to the fact that the dynamics of (s, i) is degenerate. more precisely, there is a fundamental structure condition in [ ] requiring that the drift of the output process belongs to the range of its volatility. in words, the drift of the pair (s, i) would have to lie in the direction spanned by its volatility, which is obviously impossible here, since the volatilities of s and i are exactly opposite while their drifts do not cancel each other. therefore, we cannot directly use any existing result in the literature, and we should not expect, a priori, to be able to obtain a contract representation similar to the one in [ ], nor that the so-called dynamic programming approach will prove effective in our case. indeed, as far as we know, such degenerate models have only been tackled using the stochastic maximum principle, see hu, ren, and touzi [ ]. however, and somewhat surprisingly, the form we exhibit for the tax is actually strongly related to the usual representation. the reason for this is twofold. first, up to the sign, the volatilities in the dynamics of both s and i are exactly the same. second, both processes s and i are driven by the same brownian motion w. therefore, intuitively, in order to provide incentives to the population, the government can afford to index the tax on only one of the two processes. mathematically, it is also straightforward to show that given an arbitrary decomposition of the process z in equation ( . ) of the form z =: z^i − z^s, the tax can be rewritten in the usual form involving the hamiltonian h of the population, and this is exactly the general form provided in [ ].
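the failure of the structure condition can be made concrete with a two-line computation: the volatility of (s, i) spans only the direction (−1, 1), so a drift lies in its range precisely when its components sum to zero. the sketch below checks this numerically, using a drift of the sis/sir form discussed above (an assumption for the example) and arbitrary parameter values.

```python
import numpy as np

# Structure condition check: the volatility of (S, I) spans only the
# direction (-1, 1), so a drift b lies in its range iff b_S + b_I = 0.
lam, mu, gamma, nu, rho = 0.02, 0.01, 0.005, 0.0, 0.1  # illustrative values
beta, alpha = 0.8, 0.5

def drift(s, i):
    b_s = lam - beta * np.sqrt(alpha) * s * i - mu * s + nu * i
    b_i = beta * np.sqrt(alpha) * s * i - (mu + gamma + nu + rho) * i
    return np.array([b_s, b_i])

vol_dir = np.array([-1.0, 1.0])        # direction of the volatility vector
b = drift(0.9, 0.1)
# component of b orthogonal to the volatility direction: nonzero => degenerate
residual = b - (b @ vol_dir) / (vol_dir @ vol_dir) * vol_dir
print("orthogonal residual:", residual)
```

the residual equals ((b_s + b_i)/2, (b_s + b_i)/2), i.e., it vanishes only on the lower-dimensional set where λ − µs − (γ + µ + ρ)i = 0, confirming that the condition fails generically.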
the main difference is that in [ ], z s and z i are both uniquely given, while in our representation, only their difference actually matters. hence, there is an infinite number of possible representations for the tax χ in our degenerate model. as already explained, the government can choose the tax χ ∈ c paid by the population together with the testing policy α ∈ a. it aims at minimising the number of infected people until the end of the quarantine period, and we write its minimisation problem informally in terms of two cost functions c and k. the function c denotes the instantaneous cost implied by the proportion of infected people during the quarantine period, and is thus assumed to be non-decreasing, while the function k represents the cost of the testing policy. in addition, the set Ξ takes into account the so-called participation constraint for the population. this means that the government is benevolent, which translates into the fact that it has committed to ensure that the living conditions of the population do not fall below a minimal level. mathematically, the government can only implement policies satisfying this constraint. concerning the cost function k associated with the testing policy, we recall that α = means no testing policy, so no cost for the government. as soon as α is different from , the cost has to be higher. we may consider, for some η g > and κ g > , a cost function k for the testing policy which highlights the fact that it is very costly, if not impossible, to eliminate the uncertainty associated with the epidemic. indeed, in a relatively populous country, it seems impossible to develop a testing policy sufficient to know exactly the proportion of susceptible and infected people. another interesting case to compare our results with corresponds to the so-called first-best case. this is the best-possible scenario where the government can enforce whichever interaction rate β ∈ b it desires, and simply has to satisfy the participation constraint of the population.
from the practical point of view, this could correspond to a situation where the government would be able to track every individual and force them to stop interacting. the problem faced by the government then takes a simpler form. in this section, we present the main theoretical results obtained when the dynamic of the epidemic is given by ( . ). recall that, in order to consider the sis or the sir model, one has to set respectively ρ = or ν = . as mentioned in section . . , the benchmark problem is a standard markovian stochastic control problem, and we then have the natural identification of the population's value with a function v solving the associated hamilton-jacobi-bellman (hjb for short) equation, for a particular function f defined by ( . ) in section . . note that if we consider separable utilities of the form ( . ), the optimal interaction rate admits an explicit expression. to find the optimal interaction rate β ∈ b, as well as the optimal contract (α, χ) ∈ a × c, in the first-best case, one has to solve the government's problem defined by ( . ). mathematical details are postponed to section . . , but we present here an overview of the main results. to take into account the inequality constraint in the definition of v p,fb , one has to introduce the associated lagrangian. given a lagrange multiplier > , we first remark that the optimal tax is constant. then, for any multiplier one can define an auxiliary value function v ( ), which is the value function of a standard stochastic control problem, and from which we expect to recover v p,fb by optimising over the multiplier. in particular, if we consider separable utilities with the forms ( . ), for a given testing policy α ∈ a and a lagrange multiplier > , the optimal interaction rate is given explicitly for all t, recalling that b • is defined by ( . ). thanks to the reasoning developed in section , we are able to determine the optimal design of the fine policy, the optimal testing policy, as well as the optimal effort of the population. first, as informally explained in section . .
, to implement a tax policy χ ∈ c, the government only needs to choose a constant y and a process z. given these two parameters, the optimal contact rate for the population is determined. it thus remains to solve the government's problem in order to determine the optimal choice of y and z. the reader is referred to section . for the rigorous government's problem, but, to summarise the results, the optimal process z as well as the optimal testing policy α are determined so as to maximise the government's hamiltonian. finally, it remains to solve numerically the associated hjb equation, for all t ∈ [ , t ] and x, over its natural domain. the results presented in section . are quite theoretical: except for the optimal transmission rate, it is complicated to obtain explicit formulae for the other variables sought, in particular for the optimal testing policy α, even if we consider separable utility functions as in ( . ). it is therefore necessary to perform numerical simulations to evaluate the optimal efforts of the population and the government, as well as the optimal tax policy. given the similarities in the results between the sis and sir models, only those related to the sir model are presented in this section. the reader will find in appendix a the results corresponding to the sis model. the following numerical experiments are implemented using the utility and cost functions respectively mentioned in example . for the population and in example . for the government. to summarise, the parameters chosen for the population are given in table . in addition, the set of parameters used for the simulations of the epidemic dynamics given by ( . ) are provided in table , and are inspired by those chosen by Élie, hubert, and turinici [ ]. recall that the parameter β denotes the usual contact rate within the population, before the beginning of the lockdown.
in other words, β represents the initial and effective transmission rate of the disease, without any specific effort of the population. the associated reproduction number r , commonly defined by r := β/(ν + ρ) in the literature on epidemic models, is equal to . , and is thus in the confidence interval of available data, see for example li et al. [ ]. recall that the parameters λ and µ represent respectively the birth and (natural) death rates among the population, and therefore reflect the demographic dynamics unrelated to the epidemic, while γ represents the death rate associated to the disease. to simplify, and since the duration of the covid- epidemic should be relatively short in comparison to the life expectancy at birth, we choose to disregard the demographic dynamics by setting λ = µ = . in contrast, we set γ = %, since the mortality associated with the disease appears to be significant. finally, recall that the parameters ν and ρ correspond respectively to the recovery rates in the sis and sir models, i.e., the inverse of the virus contagious period. since we want to consider here a sir dynamic, we let ν = and ρ = . , to account for the average -day duration of covid- disease. when not explicitly specified, the simulations presented in this section are performed with the sets of parameters described in tables and . however, the parameters used to describe in particular the utility and cost functions of the population and government are set in a relatively arbitrary way. to actually estimate these parameters would require an extensive sociological and economic study, which we do not presume to be able to perform at this stage, and which would link, for example, the population's costs to the daly/qaly concepts already mentioned, and the government's costs to those of the health care system and its possible congestion.
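the controlled epidemic dynamics described here can be illustrated with a small euler-maruyama simulation. this is only a sketch: the drift and volatility terms below follow the standard stochastic sir form (the same volatility σ s i entering ds and di with opposite signs, as stated earlier in the text), and every numerical value is an illustrative placeholder, not one of the paper's calibrated parameters, which were lost in extraction.

```python
import numpy as np

def simulate_sir(beta, rho, gamma, sigma, s0, i0, T, n_steps, seed=0):
    """euler-maruyama scheme for a stochastic sir dynamic of the assumed form
    dS = -beta*S*I dt - sigma*S*I dW,
    dI = (beta*S*I - (rho + gamma)*I) dt + sigma*S*I dW,
    with demographic rates lambda = mu = 0, as chosen in the text."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.empty(n_steps + 1)
    i = np.empty(n_steps + 1)
    s[0], i[0] = s0, i0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        vol = sigma * s[k] * i[k]  # same volatility in both equations, opposite signs
        s[k + 1] = s[k] - beta * s[k] * i[k] * dt - vol * dw
        i[k + 1] = i[k] + (beta * s[k] * i[k] - (rho + gamma) * i[k]) * dt + vol * dw
    return s, i

# illustrative placeholder values (NOT the paper's calibrated parameters)
beta, nu, rho, gamma, sigma = 0.25, 0.0, 0.1, 0.01, 0.05
r0 = beta / (nu + rho)  # the reproduction number as defined in the text, nu = 0 for sir
s, i = simulate_sir(beta, rho, gamma, sigma, s0=0.99, i0=0.01, T=200.0, n_steps=2000)
```

a forward scheme of this kind is also what produces the optimally controlled trajectories discussed below, once the pde has been solved and β is replaced by the optimal feedback control.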
moreover, there is considerable uncertainty in the medical literature on the choice of all parameters used to describe the dynamics of the epidemic, in particular because covid- is caused by a new type of virus, and we therefore do not have sufficient hindsight to reliably estimate its characteristics. it will therefore be necessary to study the sensitivity of the results obtained with respect to the selected parameters. finally, it should be remembered that, in contrast to usual principal-agent problems, the government implements a mandatory tax, which the population cannot refuse. nevertheless, we consider that the government is benevolent, in the sense that it still wishes to ensure that the utility of the population remains above a certain level, denoted by v. to fix this level, we assume that the government wants to ensure at the very least to the population the same living conditions it would have had in the event of an uncontrolled epidemic, i.e., without any effort on the part of either the population or the government, meaning β = β, α = and χ = . mathematically, since u is separable of the form ( . ), with u β (t, β) = for all t ∈ [ , t ] and u i satisfying ( . ), this amounts to saying that the reservation utility v is given by the worst-case scenario, without any sanitary precaution from either the population or the government. this level may be judged too severe, and one could consider a model where the government is more benevolent. in particular, one could set v closer to the value that the population achieves in the benchmark case, i.e., when it makes optimal efforts in the absence of government policy. nevertheless, the value of v should not be of major importance, since it should only impact the initial value y . in order to solve equation ( . ) corresponding to the population's problem in the benchmark case, as well as equation ( .
) for the government's problem, we need a method able to deal with degenerate hjb equations. we choose to implement semi-lagrangian schemes, first proposed in camilli and falcone [ ]. these are explicit schemes using a given time-step ∆t, and requiring interpolation on the grid of points where the equation is solved. this interpolation can be either linear, as proposed in [ ], or rely on some truncated higher-order interpolators, as proposed by warin [ ], leading to convergence of the numerical solution to the viscosity solution of the problem. a key point here, which makes the approach delicate, is that the domain over which the pdes are solved is unbounded. it is therefore necessary to define a so-called resolution domain, over which the numerical solution will actually be computed, which on the one hand must be large enough, and which on the other hand creates additional difficulties in the treatment of newly introduced boundary conditions. in order to treat these issues, we use two special tricks: (i) picking the control at random in ( . ) for the benchmark case, and in ( . ) for the general case, and simulating the forward sde with an euler scheme, a monte-carlo method allows us to obtain an envelope of the reachable domain with high probability at each time-step. then, given a discretisation step fixed once and for all, the grid of points used by the semi-lagrangian scheme is defined at each time-step with bounds set by the reachable domain estimated by monte-carlo. therefore, at time step , the grid is only represented by a single mesh, while the number of meshes can reach millions near t ; (ii) since the scheme is explicit, starting at a given point at date t, it only requires some discretisation points at date t + ∆t, and a modification of the general scheme is implemented to use only points inside the grid at date t + ∆t, as shown in [ ].
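trick (i) can be sketched as follows: the control is drawn at random at each step, an euler scheme propagates the forward sde over many paths, and per-time-step min/max bounds give the envelope on which the grid is then built. the dynamics and every numerical value below are illustrative assumptions, not the paper's exact model or parameters.

```python
import numpy as np

# monte-carlo envelope of the reachable domain of (s, i): random controls,
# vectorised euler scheme, and per-time-step bounds (illustrative sketch).
rng = np.random.default_rng(0)
n_paths, n_steps, dt = 5000, 200, 0.1
rho, sigma = 0.1, 0.05
b_min, b_max = 0.05, 0.25          # hypothetical admissible control range
s = np.full(n_paths, 0.99)
i = np.full(n_paths, 0.01)
bounds = []                        # per-time-step (s_min, s_max, i_min, i_max)
for _ in range(n_steps):
    beta = rng.uniform(b_min, b_max, n_paths)      # random control per path
    dw = rng.normal(0.0, np.sqrt(dt), n_paths)
    vol = sigma * s * i
    s, i = (s - beta * s * i * dt - vol * dw,
            i + (beta * s * i - rho * i) * dt + vol * dw)
    bounds.append((s.min(), s.max(), i.min(), i.max()))
```

the grid at each date would then be built between these bounds, so that it starts from a single mesh at the initial time and widens as the reachable set opens up towards maturity, consistently with the behaviour described in the text.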
lastly, in dimension or above, parallelisation techniques defined in [ ] have to be used in order to accelerate the resolution of the problems. the numerical results below are obtained using the stopt library, see gevret, langrené, lelong, warin, and maheshwari [ ]. we first focus on the benchmark case, when the government does not implement any particular policy to tackle the epidemic, i.e., α = and χ = . recall that in this case, the population's problem is given by ( . ), and is then equivalent to solving the hjb equation ( . ). for our simulations, we choose a number of time-steps equal to , and a discretisation step equal to . . the interpolator is chosen linear, and the optimal command b • used to maximise the hamiltonian is discretised with points, given a discretisation step of . . once the pde is solved, a forward euler scheme is used to obtain trajectories of the optimally controlled s and i, meaning with the optimal transmission rate b • . in order to check the accuracy of the method described in section . , we implement two versions of the resolution: (i) the first version is a direct resolution of ( . ) with the hamiltonian ( . ); (ii) the second one relies on a change of variable. more precisely, we consider (s, x := (s + i)) as state variables, instead of (s, i), and then solve the problem ( . ), but with a slightly modified hamiltonian to take into account this change of variable. the advantage of the second representation is that the dispersion of i t + s t is zero, and thus smaller than the one of i t , leading to the use of grids with a smaller number of points. first, to give an overview of the overall trend, we plot, on figure , trajectories of the optimal interaction rate β , and the associated proportions s t and i t of susceptible and infected, using the resolution method (i) mentioned above, i.e., with state variables (s, i).
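the zero dispersion of s + i underlying the change of variable can be verified numerically: the brownian increments cancel in the sum at every step. the dynamics and parameters below are illustrative assumptions, not the paper's.

```python
import numpy as np

# check, on one euler-maruyama path, that the brownian increments cancel in
# s + i, so that the pair (s, s + i) has zero dispersion in its second
# component. assumed dynamics (illustrative):
# dS = -beta*S*I dt - sigma*S*I dW, dI = (beta*S*I - rho*I) dt + sigma*S*I dW.
rng = np.random.default_rng(1)
beta, rho, sigma, dt = 0.25, 0.1, 0.05, 0.1
s, i = 0.99, 0.01
for _ in range(1000):
    dw = rng.normal(0.0, np.sqrt(dt))
    vol = sigma * s * i
    ds = -beta * s * i * dt - vol * dw
    di = (beta * s * i - rho * i) * dt + vol * dw
    # the dw terms cancel exactly: d(s + i) = -rho * i * dt, whatever dw is
    assert abs((ds + di) - (-rho * i * dt)) < 1e-12
    s, i = s + ds, i + di
```

since the increment of s + i contains no stochastic term, its conditional dispersion vanishes, which is why the grid in the second variable can be kept much smaller in the resolution method (ii).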
for a more precise comparison, we show on figure two different trajectories of the optimal interaction rate β , together with the corresponding dynamic of the proportion i of infected. for these two simulations, we compare the results given by the two aforementioned methods. more precisely, while the blue curve is obtained through the direct resolution, the orange one results from the second method, i.e., with state variables (s, s + i). finally, on figures and , we test the influence of the parameter τ p by setting τ p = . , instead of . voluntary lockdown of the population. as expected, the optimal contact rate β starts close to β, and then decreases as the disease spreads in the population. more specifically, two waves of effort can be observed: the first one delays the acceleration of the epidemic, and the second, generally more significant, takes place during the peak of the epidemic. approaching the fixed maturity, individuals come back to their usual behaviour β. however, even if the population chooses to decrease the interaction rate among individuals, the range of β stays quite small, with minimum . and maximum β = . . sensitivity with respect to the method. as we can notice in figure (top), the optimal effort obtained for these two simulations exhibits the same features as those previously described. moreover, the blue curve and the orange curve, representing respectively the results of the two aforementioned methods, are very close, except at the beginning of the time interval, probably because of the very small initial value i . nevertheless, we can see on the bottom graphs that the two methods lead to the same dynamic for the proportion of infected, since the two curves, blue and orange, are almost superposed. therefore, a small error on the computation of the optimal effort at the beginning does not impact the optimally controlled trajectories of i.
the resolution with respect to (s, s + i) seems to be more regular, and may give a control closer to the analytical one. the fear of the infection is not enough. without a proper government policy to encourage the lockdown, the natural reduction of the interaction rate among individuals is not sufficient to contain the disease, so that it spreads with a high infection peak, up to . . as a result, even if at the end of the time interval under consideration the epidemic appears to be over, between and % of the population has been contaminated by the virus, since the proportion s at time t = lies between . and . . in conclusion, without some governmental measures, the fear of the epidemic is not sufficient to encourage the population to make sufficient effort to significantly reduce the rate of transmission of the disease. the introduction by the government of an effective lockdown policy together with an active testing policy should improve the results of the benchmark case, in particular by reducing the peak of infection and the total number of infected people over the considered period. figure : the optimal transmission rate β and the resulting proportion i with τ p = . , comparing the two methods on two simulations. the lockdown fatigue. by setting τ p = . instead of , the cost of the lockdown from the population's point of view is now increasing with time. this allows us to take into account the possible fatigue the population may suffer if the lockdown continues for too long. as expected, by comparing figures and , the impatience of the population gives higher values of the optimal interaction rate β. moreover, comparing figures and , we can see that in both simulations, the second wave of effort is of course more impacted (i.e., the contact rate is less reduced) by the impatience of the population than the first one.
we focus in this section on the tax policy, by assuming that a = { }. in words, we assume that the government does not implement a specific testing policy, which means α = as in the benchmark case, but only encourages the population to lockdown through the tax policy χ. in such a situation, i.e., without a proper testing policy, the detection and hence the isolation of ill people becomes very intricate. the only possibility to regain control of the epidemic is then to reduce the interaction rate of the population. this case is interesting, as it corresponds to the lockdown policy that most western countries implemented in , when faced with the covid- disease while only a very small number of tests was available. indeed, most countries put in place systems of fines, or even prison sentences, to incentivise people to lockdown. although the penalties for non-compliance are not as sophisticated as in our model, most governments did adapt the level of penalties according to the stage of the epidemic: higher fines during periods of strict lockdown (hence at the peak of the epidemic), or in case of recidivism, for example. this reflects the adjustment of sanctions in many countries according to the health situation, and therefore a notion of dynamic adaptation to circumstances, which is exactly what is suggested by our tax system. though it is clear that our model is different from reality, since we consider a fine/compensation, paid at some terminal time t and equal for each individual, whereas in most countries the fine is paid by a particular individual who has not complied with the injunctions, we still believe it allows us to highlight sensible guidelines. the numerical approach is highly similar to the method used to solve the benchmark case. one difference is that we have to estimate the reservation utility of the population, namely v, given by ( . ).
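the estimation of the reservation utility v can be sketched by a plain monte-carlo average over uncontrolled paths. everything below is illustrative: the running cost u_i(i) = -i stands in for the paper's (unreproduced) utility function, and all parameters are placeholders.

```python
import numpy as np

# monte-carlo sketch of a reservation utility v ≈ E[ ∫ u_i(I_t) dt ] under the
# uncontrolled dynamics (contact rate frozen at bar_beta, no testing, no tax),
# with the illustrative running cost u_i(i) = -i.
rng = np.random.default_rng(0)
n_paths, n_steps, dt = 10000, 200, 0.1
bar_beta, rho, sigma = 0.25, 0.1, 0.05
s = np.full(n_paths, 0.99)
i = np.full(n_paths, 0.01)
running = np.zeros(n_paths)
for _ in range(n_steps):
    running += -i * dt                 # accumulate the running cost u_i(i) = -i
    dw = rng.normal(0.0, np.sqrt(dt), n_paths)
    vol = sigma * s * i
    s, i = (s - bar_beta * s * i * dt - vol * dw,
            i + (bar_beta * s * i - rho * i) * dt + vol * dw)
v_hat = running.mean()                 # monte-carlo estimate of v
```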
using a monte-carlo method and an euler scheme with a time-discretisation of time-steps and trajectories, we obtain an approximated value v = − . . then, we can solve ( . ) through the aforementioned semi-lagrangian scheme, with time steps, as well as a step discretisation for the grid in (s, i, y) corresponding to ( . , . , . ), leading to a number of meshes at maturity equal to × × (for z max = ). a last technical point concerns the domain of the control z. although this control of the government, used to index the tax on the proportion of infected, can take high values, we have to bound its domain in order to perform the numerical simulations. we choose to restrict its domain to an interval [−z max , z max ], and consider a discretisation step equal to . . one would naturally expect that a larger choice would lead to somewhat better solutions. however, this neglects a fundamental numerical issue: large values of z increase the numerical cost, as they enlarge the volatility of the process y (given by σzis). as such, since the volatility cone becomes larger, it is necessary to sample a much larger grid in order to be able to cover the region where y will most likely take its values. too large values of z max therefore become numerically intractable, unless one is willing to sacrifice accuracy. a balance needs to be struck, which is why we capped z max at . a sensitivity analysis with respect to variations of z max is provided in figure . though the trajectories of the optimal z are somewhat impacted, figure confirms that this has minimal impact on the trajectories of i itself. indeed, for different values of z max , the shape of the parameter z remains the same. more importantly, we will see that the paths of the optimal transmission rate, namely β , associated to different z max , are almost superposed. as a consequence, the dynamic of i also follows almost the same paths independently of z max .
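the numerical cost of a large z max can be quantified with a back-of-the-envelope bound: since the volatility of y is σzis (as stated above), and s i ≤ 1/4 on the simplex, the half-width of the volatility cone after a horizon T grows linearly in z max. all numerical values below are illustrative placeholders.

```python
import math

def y_grid_halfwidth(z_max, sigma=0.1, T=20.0, n_std=4.0):
    """crude half-width of the y-grid needed to cover the volatility cone of a
    diffusion with volatility sigma*z*i*s, bounding |z| <= z_max and s*i <= 1/4
    (illustrative sigma, horizon T, and n_std safety factor)."""
    vol_bound = sigma * z_max * 0.25      # sup of sigma * z * i * s
    return n_std * vol_bound * math.sqrt(T)

widths = {z_max: y_grid_halfwidth(z_max) for z_max in (10.0, 50.0, 100.0)}
```

at a fixed discretisation step, the number of y-grid points thus grows linearly with z max, which is the tractability issue motivating the cap described above.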
first, we present in figure different trajectories of the proportion i of infected when the government implements the optimal tax policy, and compare them to the trajectories obtained in the benchmark case. as mentioned before, we also want to study the sensitivity with respect to the arbitrary bound z max , and we thus represent the paths of i in three cases, in addition to the benchmark case: for z max = (orange curves), z max = (green), and z max = (red). then, the corresponding simulations of the optimal control z of the government, used to index the tax on the proportion of infected, are given in figure . we compare the optimal controls β and z for the tax policy with different lockdown time periods in figure . finally, figure collects the simulations of the optimal transmission rate β obtained with the tax policy, and compares it to β • obtained in the benchmark case. the epidemic is at best contained, and at worst delayed. compared to the benchmark case, we observe in figure that the optimal lockdown policy prevents the epidemic peak in most cases by maintaining the epidemic at low levels of infection during the lockdown period. therefore, the government has more time to prepare for a possible infection peak after the lockdown, specifically to increase hospital capacity and provide safety equipment (surgical masks, hydro-alcoholic gel, respirators...). the government can also use this time to fund the development of tests to detect the virus, as well as the research on a vaccine or a remedy for the related disease. nevertheless, we can see that at the end of the lockdown period, in many cases the virus is not eradicated and the epidemic may even restart. this is particularly well illustrated by figure , representing trajectories of i obtained with the optimal control. such a phenomenon can be understood as follows: the lockdown slows down the epidemic, so that a very small proportion of the population has been infected and is therefore immune.
we thus cannot rely on herd immunity, which is reached if at least % of the population has been contaminated, to prevent a resurgence of the epidemic. consequently, this lockdown policy is a powerful leverage to control an epidemic, but this tool needs to be supplemented by alternative policies, such as those mentioned above, in order to be fully effective. if the time saved through lockdown is not exploited, it will have no impact on the final consequences of the epidemic, measured by the economic and social cost associated with the total number of people infected and deceased during the total duration of the epidemic. policy implications. by comparing the graphs in figure , we first remark that the shape of the optimal indexation parameter z remains the same, regardless of the simulation and the value of z max . the control takes the most negative value possible (−z max ) for about days, then increases almost instantaneously to reach the maximum value z max , before slowly decreasing to . therefore, the optimal tax scheme set by the government is as follows. first, at the beginning of the epidemic, it seems optimal to give the population a compensation (corresponding to a negative tax) as large as possible, by setting z = −z max . though this may be a numerical artefact, given that the initial values of i and its variations are extremely low, the fact that the same phenomenon appeared in virtually all our simulations tends to show that it is actually significant. we interpret this as the government anticipating the negative consequences of the lockdown policy by immediately providing monetary relief to the population. this is exactly what happened in several countries, for instance in the usa with stimulus checks sent to every citizen, and our model endogenously reproduces this aspect. policy-wise, it also shows that maximum efficiency for such stimulus packages is attained when they are provided to the population as early as possible.
after this initial phase, when the epidemic spreads among the population, the government suddenly increases z, so that the tax becomes positive and is in fact maximal, in order to deter people from interacting. approaching maturity, the government eases the lockdown little by little. however, this end of lockdown may be premature, since we have observed in the previous figures that the epidemic may restart at the end of the considered period. indeed, considering a final time horizon is equivalent to assuming that 'the world' stops at that time: all the potential costs generated by the epidemic after t are not taken into account in the model. the government thus has no interest in implementing costly measures, whose subsequent impact on the epidemic will not be measured. nevertheless, this boundary effect has no impact on the previous results and interpretations. indeed, we remark in the numerical results that if we consider a more distant time t , the lockdown certainly lasts longer, but follows the exact same paths during most of the lockdown period, and its release occurs around the same time before maturity (see figure below). moreover, the lockdown period should still end at some time, which is why a finite terminal time is assumed. this time may correspond to an estimate of the time needed to implement other more sustainable policies than lockdown, such as the implementation of an active testing policy, or to hope for the discovery of a vaccine or cure, as mentioned above. figure : the optimal control z and the optimal tax. sensitivity with respect to the lockdown duration. on figure , we give two trajectories of the optimal contact rate β (on the left) and the optimal indexation parameter z (on the right) for two different maturities. it is clear that both trajectories follow the same paths until some point. regardless of the maturity, the contact rate β and the parameter z have the same characteristics as those shown respectively in figures and .
as one approaches the shortest maturity, i.e., t = , the parameter z decreases towards for the contract of this maturity, while the other remains at the maximum, and decreases later, as its own maturity approaches. therefore, the fact that z decreases at maturity, as mentioned in the paragraph 'policy implications' above, appears to be a boundary effect, since it is not sensitive with respect to the maturity. optimal interaction rate and comparison with the benchmark case. we now explain the general trend of the optimal interaction rate. in the beginning, recall that z is negative, meaning that the tax is negatively indexed on the variation of i. in other words, since i is globally (but very slightly) increasing at the beginning of the epidemic, the compensation increases with i, which means that the population is not incentivised at all to decrease its contact rate, and thus the transmission rate of the virus, which remains equal to the initial level β. then, as the epidemic spreads, z becomes very high, which now incentivises the population to reduce the transmission rate below β. finally, near the end of the lockdown period, z plunges to zero, which naturally implies that the optimal contact rate β goes back to its usual level β. by comparing with the benchmark case, we see that the tax policy succeeds in reducing significantly the interaction rate. as a consequence, and as we have seen in figure , the tax policy contains the spread of the disease during the considered time period, unlike in the case without intervention of the government. figure : simulations of the proportion i of infected in the sir model, comparing the case with tax policy (but without testing) on the left and the benchmark case on the right. in this section, we now study the case where the government can implement an active testing policy, in addition to the incentive policy for lockdown, to contain the spread of the epidemic.
this policy is similar to the one adopted by most european governments in june , after relatively strict containment periods and at a time when the covid- epidemic seemed to be under control. indeed, the lockdown periods in europe have generally made it possible to delay the epidemic, and thus to give public authorities time to prepare a meaningful testing policy by developing and increasing the number of available tests. this testing policy has two major interests. first, it allows the identification of clusters, and therefore provides more precise knowledge of the dynamics of the epidemic in real time on the different territories. second, by identifying infected people, we can force them to remain isolated, in order to avoid the contamination of their relatives. this policy therefore constitutes another lever, in addition to containment, to reduce the contact rate within the population. thus, by developing a robust testing policy, public authorities can in fact relax the lockdown while keeping the rate of disease transmission at a sufficiently low level. therefore, comparing with the no-testing-policy case, we expect that (i) the government will be able to control the epidemic at least as well as with just the lockdown policy; (ii) it will allow the population to regain a contact rate closer to the desired and initial level β. to study the optimal testing policy α , taking values in a := [ε, ], we consider the cost of effort k given by ( . b). this cost function emphasises the fact that testing the entire population every day is inconceivable, and therefore results in an explosion of cost when α takes values close to . recall that the parameters for the function k, namely κ g and η g , are given in table b . finally, a is discretised with a step equal to . and we consider z max = .
as we can see from the six selected simulations below, the control z is very regular (see figure ), while the control α is less regular and concentrated at the heart of the epidemic (see figure ). figure gives a global overview of the simulations, which confirms the intuition given by the six selected ones. comparison between the three cases: the benchmark, with, and without testing. relaxed lockdown but lower effective transmission rate. first, comparing figures and , the optimal control z presents the same shape in both cases, except at the beginning, since now z is not negative initially. in fact, in this case, we observe that the government is asking for less effort from the population, and therefore the initial stimulus mentioned in the paragraph 'policy implications' still happens, but later and for a much shorter duration. figure also shows that the optimal contact rate is closer to the initial level β, which should induce a more violent spread of the disease. nevertheless, the control α, representing the testing policy and given by figure , balances this effect. indeed, the testing allows an isolation of targeted infected individuals, and therefore contributes to the decrease of the effective transmission rate of the disease, represented in figure . therefore, comparing figure with figure , we notice that the control of the epidemic is more efficient than in the case a = { }, since the proportion of infected is globally decreased. figure : the optimal contact rate β and the effective transmission rate β √ α. turning to the first-best case, remark first that, with the particular choice of utility functions, the lagrange multiplier must remain suitably bounded; otherwise, if ≥ , the optimal tax policy is equal to −∞, which cannot be optimal from the government's point of view, since it leads to an infimum equal to +∞ (see ( . )). for each value of the lagrange parameter, a two-dimensional pde with a two-dimensional control (α, β) is considered. a step discretisation for the grid in (s, i) is taken equal to ( . , . ).
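the effect of testing on transmission can be illustrated with a tiny sketch. following the figure label in the source, the effective transmission rate is assumed here to be β√α, with α = 1 corresponding to no testing (so the full contact rate applies) and smaller α to a more intensive testing policy; the value β = 0.2 is an illustrative placeholder.

```python
import math

def effective_rate(beta, alpha):
    """assumed effective transmission rate under testing intensity alpha,
    of the form beta * sqrt(alpha) as suggested by the source's figure label."""
    return beta * math.sqrt(alpha)

no_testing = effective_rate(0.2, 1.0)      # alpha = 1: testing off, full rate
heavy_testing = effective_rate(0.2, 0.25)  # intensive testing halves the rate
```

this is consistent with the discussion above: even with a contact rate β close to its initial level, a sufficiently active testing policy keeps the effective transmission of the disease low.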
a = [ε, ] is discretised with values, and the values of β are discretised with equally spaced values (to reduce the cost of optimisation). we then search for the optimal parameter with a step of . within the interval ( , ). we obtain in this case an optimal value equal to . and we give in figure the results, which show in particular that the epidemic is controlled in a similar way as in the second-best case, with incentives and testing policy. (figure legends: testing policy α; proportion of infected i.) figure : trajectories obtained in the first-best case. the shape of the optimal controls β and α, as well as the trajectories for the proportion i of infected, are highly similar to those obtained in the previous case. the only clear difference is the principal's value. indeed, we can compare the optimal value v_p for the government in the moral hazard case to the first-best value v_p,fb. using trajectories and the previously computed optimal control, we estimate v_p,fb = − . while v_p = − . . the difference between the two values, a relative difference of only %, pleads in favour of our incentive model: even without being able to track all the population, governments can achieve containment strategies with very similar levels of efficiency, and costs which are not significantly higher. this is of course partly explained by the fact that the testing is profitable both for the government and for the population, as it allows for values of β very close to its usual value β, as shown in figure . we fix a small parameter ε ∈ ( , ) to consider the subset a := [ε, ]. we then define by a the set of all finite and positive borel measures on [ , t ] × a, whose projection on [ , t ] is the lebesgue measure.
in other words, every q ∈ a can be disintegrated as q(ds, dv) = q_s(dv)ds, for an appropriate borel measurable kernel (q_s)_{s∈[ ,t ]}, meaning that for any s ∈ [ , t ], q_s is a finite positive borel measure on a, and the map [ , t ] ∋ s −→ q_s is borel measurable, when the space of measures on a is endowed with the topology of weak convergence. we then define the following canonical space Ω := c × a, whose canonical process is denoted by (s, i, Λ), in the sense that s_t(s, ι, q) := s(t), i_t(s, ι, q) := ι(t), Λ(s, ι, q) := q, for all (t, (s, ι, q)) ∈ [ , t ] × Ω. we let f be the borel σ-algebra on Ω, and f := (f_t)_{t∈[ ,t ]} be the natural filtration of the canonical process; recall that in this framework f = f_t. let m be the set of probability measures on (Ω, f_t ). for any p ∈ m, we let n_p be the collection of all p-null sets, that is to say, subsets of Ω contained in a p-negligible element of f_t, where we recall that 2^Ω represents the set of all subsets of Ω. we let p ⊂ m be the set of measures such that (ii) p[(s , i ) = (s , i )] = ; (iii) with p-probability , the canonical process Λ is of the form δ_{φ_·}(dv) for some borel function φ : [ , t ] −→ a, where as usual, for any a ∈ a, δ_a is the dirac mass at a. we can follow bichteler [ ], or neufeld and nutz [ , proposition . ], to define a pathwise version of the density of the quadratic variation of s, denoted by σ : [ , t ] × Ω −→ r. notice that the initial value r of the process r, which appears in the sir version of the model, is irrelevant at this stage. lévy's characterisation of brownian motion ensures that the process is an (f_p, p)-brownian motion for any p ∈ p. for any p ∈ p, we denote by a_o(p) the set of f-predictable and a-valued processes α := (α_s)_{s∈[ ,t ]} such that, p-a.s., the corresponding integrability condition holds. we recall that the term λ ≥ denotes the birth rate, the parameter µ ≥ is the natural death rate in the population (susceptible and infected), and γ ≥ is the death rate inside the infected population.
the parameters ν and ρ correspond to recovery rates, depending on whether we are considering a sis or a sir model; see the remark below for more details. (i) if ρ = , the constant ν ≥ is the rate of recovery for infected people, who go back to the class of susceptible. this case corresponds to the classical sis model, whose dynamics are described by the system ( . ); (ii) if ν = , the constant ρ ≥ is the recovery rate for infected individuals, who move into a class of recovered people, whose proportion is denoted by r. this case corresponds to the classical sir model described by ( . ). it can be noted that our model, which results from a mixture of the sis and sir models, can be interpreted as an sir model with partial immunisation, in the sense that only a part of the population develops antibodies for the disease after being infected. thus, a proportion ρ of the infected moves to the class r, and can no longer be infected. conversely, the proportion of the infected who do not develop antibodies reverts to the class s, and can therefore contract the disease again. this resulting model is similar to the one developed by zhang, wu, zhao, su, and choi [ ], called sisrs. this type of model seems in fact well suited to model epidemics related to new viruses, such as covid-19, when the immunity of infected persons has not yet been established. before pursuing, we need a bit more notation, and will consider the following sets, as well as, for any α ∈ a_o, p(α) := {p ∈ p : α ∈ a_o(p)}. we will require that the controls chosen by the government lead to only one weak solution to equation ( . ), and are such that the processes s and i remain non-negative. we will therefore concentrate our attention on the set a of admissible controls. notice that for any α ∈ a, we have σ_t = σ s_t i_t α_t, dp^α ⊗ dt-a.e. more precisely, one should first use the result of stroock and varadhan [ , theorem . .
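the sis/sir mixture ('sisrs'-type) flows described above can be sketched with a simple euler scheme. the system below is an illustrative reconstruction from the flows named in the text (λ births, µ natural deaths, γ deaths among the infected, ν recovery without immunity, ρ recovery with immunity), not the paper's exact system ( . ):

```python
def sisrs_step(s, i, r, dt, beta, lam, mu, gamma, nu, rho):
    """one euler step of a deterministic sis/sir mixture ("sisrs"-type).

    assumed flows: new infections beta*s*i; births lam; natural deaths
    mu; extra death rate gamma for the infected; recovery to s at rate
    nu (no antibodies) and to r at rate rho (antibodies).
    """
    new_inf = beta * s * i
    ds = lam - new_inf - mu * s + nu * i
    di = new_inf - (mu + gamma + nu + rho) * i
    dr = rho * i - mu * r
    return s + dt * ds, i + dt * di, r + dt * dr

# rho = 0 recovers a pure sis model, nu = 0 a pure sir model
s, i, r = 0.99, 0.01, 0.0
for _ in range(2000):
    s, i, r = sisrs_step(s, i, r, 0.05, beta=0.5, lam=0.0, mu=0.0,
                         gamma=0.0, nu=0.05, rho=0.05)
```

with no demography (lam = mu = gamma = 0), the total s + i + r is conserved, which is a convenient sanity check on the flow structure.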
] to obtain that, on an enlargement of (Ω, f_t ), there is for any p ∈ p a brownian motion w^p, and an f-predictable, a-valued process α^p such that the result for w is then immediate. notice in addition that, since w is defined as a stochastic integral, it should also depend explicitly on p. we can however rely on nutz [ ]. notice finally that for any α ∈ a, using the positivity of s and i, this result proves in particular that s and i are actually p^α-almost surely bounded, for any α ∈ a. moreover, if (s , i ) ∈ (r_+)², then for all t ∈ [ , t ], both s_t and i_t are (strictly) positive. note that in the sir model described by the system ( . ), we have, for all t ∈ [ , t ], that r_t depends only on the observation of i_s for s ≤ t. in addition, the basic model from ( . ) takes into account the testing policy put into place by the government, but ignores so far the interacting behaviour of the population. we model this through an additional control process chosen by the population. more precisely, we fix some constant β_max > representing the maximum rate of interaction that can be considered, and we define b := [ , β_max]. let b be the set of all f-predictable and b-valued processes. given a testing policy α ∈ a implemented by the government, notice that the following stochastic exponential is an (f, p^α)-martingale, given that the process β/(σ√α) takes values in [ , β_max/(σ√ε)], p^α-a.s. therefore, for any (α, β) ∈ a × b, we can define a probability measure p^{α,β} on (Ω, f), equivalent to p^α. using girsanov's theorem, we know that the process is an (f, p^{α,β})-brownian motion, and we have ( . ). at time , the government informs the population about its testing policy α ∈ a, as well as its fine policy χ, which for now will be an f_t-measurable and r-valued random variable (a set we denote by c).
the population solves the following optimal control problem; the interpretation of the functions u and u is detailed in section . , where the population's problem was informally defined. for any (α, χ) ∈ a × c, we recall that we denote by b(α, χ) the set of optimal controls for v_a(α, χ). we require minimal integrability assumptions at this stage, and insist that there exists some p > such that ( . ) holds. remark . . notice that since for any α ∈ a the radon–nikodým density dp^{α,β}/dp^α has moments of any order under p^α (since any β ∈ b is bounded, and any α ∈ a is bounded and bounded away from ), a simple application of hölder's inequality ensures that ( . ) implies the analogous bound for any p ∈ ( , p) and any β ∈ b. recall that the government can only implement policies (α, χ) ∈ a × c such that v_a(α, χ) ≥ v, where the minimal utility v ∈ r is given. we denote by Ξ the subset of a × c satisfying this constraint and equation ( . ). in line with the informal reasoning developed in section . . , the government aims at minimising the number of infected people until the end of the lockdown period, and we write rigorously its minimisation problem. since the fine policy χ is an f_t-measurable random variable, where f is the filtration generated by the process (s, i), we should expect that in general v_a(α, χ) = v( , s , i ), where the map v : [ , t ] × c −→ r satisfies an informal hamilton–jacobi–bellman (hjb for short) equation, and as such has the corresponding dynamics. in particular, defining z := z_s − z_i, and given the supremum appearing above, the following assumption will be useful for us, where, in this case, h denotes the population's hamiltonian. since the dynamics of r is deterministic and not controlled, a simplification occurs between the additional part of the hamiltonian, (ρi − µr)z, and the integral with respect to dr, which leads to the same form for the utility function as previously mentioned, i.e., equation ( . ).
let us start this section by defining two useful spaces. for any α ∈ a and any m ∈ n, we define s^m(p^α) and h^m(p^α) as, respectively, the set of r-valued, f^{p^α}_+-adapted continuous processes y such that ||y||_{s^m(p^α)} < ∞, and the corresponding set of predictable processes with finite h^m(p^α)-norm. theorem . . let (α, χ) ∈ Ξ. there exists a unique f^{p^α}_+-measurable random variable y and a unique z ∈ h^p(p^α) such that ( . ) holds. proof. fix (α, χ) ∈ Ξ as in the statement of the theorem. let us consider the solution (y, z) of the bsde ( . ). since χ ∈ c, u is continuous, i and s are bounded, and b is a compact set, it is immediate that this bsde is well-posed and admits a unique solution (y, z) ∈ s^p(p^α) × h^p(p^α) (in a more general context, one may refer for instance to bouchard, possamaï, tan, and zhou [ , theorem . ]). therefore, using the dynamics of i under p^α, given by equation ( . ), as well as the definition of β, and letting t = , we obtain that ( . ) is satisfied. next, using this representation for u(χ), notice that for any β ∈ b, we can use the fact that z ∈ h^p(p^α), and that the relevant process is continuous and is both an (f^{p^α}, p^α)- and an (f^{p^α}_+, p^α)-martingale (see for instance neufeld and nutz [ , proposition . ]), so that for any β ∈ b the previous inequality implies the desired bound. moreover, thanks to assumption . , equality is achieved if and only if we choose the control β. in the previous result, the fact that equation ( . ) holds with an f^{p^α}_+-measurable random variable and not a constant is somewhat annoying. the next lemma shows that we can actually obtain the representation with a constant, without loss of generality. lemma . . let α ∈ a, and fix an f^{p^α}_+-measurable random variable y and some z ∈ h^p(p^α). define the following contracts; then the stated equalities hold. proof. the equalities for (α, χ) are immediate from theorem . .
for (α, χ ), we have the same conclusion, using the fact that z ∈ h^p(p^α), and thus z ∈ h^q(p^{α,β}) for any β ∈ b and any q ∈ ( , p). since the equality is attained if and only if we choose β = β, this ends the proof. we introduce the class Ξ of contracts defined by all pairs (α, u^{(− )}(−y^{y,z}_t)) with α ∈ a, and y^{y,z} a process given, p^α-a.s., for all t ∈ [ , t ], with z ∈ h^p(p^α) and y ∈ [v, ∞). we also denote for simplicity p^{ ,α,z} := p^{α,b(s_·,i_·,z_·)}. lemma . . the problem of the government given by ( . ) can be rewritten accordingly. proof. from theorem . and lemma . , we know that Ξ ⊂ Ξ. to prove the reverse inclusion, let us now consider a pair (α, −u^{(− )}(y^{y,z}_t)) ∈ Ξ. we simply need to ensure that −u^{(− )}(y^{y,z}_t) ∈ c. we have, using the fact that u is continuous, b is compact, α is bounded below by ε, and s and i are bounded, that there exists some constant c > , which may change value from line to line, such that the required estimate holds, where we used the burkholder–davis–gundy inequality and the cauchy–schwarz inequality. this proves the reverse inclusion and thus that Ξ = Ξ. next, we use lemma . ; to conclude, it is enough to notice that the following map is non-increasing. lemma . states that the problem of the government can be reduced to a more standard stochastic control problem. however, in the current formulation, one of the three state variables, namely y, is considered in the strong formulation, while the other state variables s and i are considered in the weak formulation. indeed, the variable y is indexed by the control z, while the control (α, z) only impacts the distribution of s and i through p^{ ,α,z}. as highlighted by cvitanić and zhang [ , remark . . ], it makes little sense to consider a control problem of this form directly.
therefore, contrary to what is usually done in principal–agent problems (see, e.g., [ ]), we decided to adopt the weak formulation to rigorously write the problem of the principal, since this is the formulation which makes sense for the agent's problem. we will thus formulate it below, for the sake of thoroughness. let v := r × a and consider the sets associated with v, defined as we defined a in section . . . the intuition is that the principal's problem depends only on time and on the state variable x = (s, i, y). following the same methodology used for the agent's problem, to properly define the weak formulation of the principal's problem, we are led to consider the following canonical space. we let g be the borel σ-algebra on Ω_p, and g := (g_t)_{t∈[ ,t ]} the natural filtration of (s, i, y, Λ_p), defined in the same way as f in the previous canonical space Ω (see section . ). let then m_p be the set of probability measures on (Ω_p, g_t ). for any p ∈ m_p, we can define g^p, the p-augmentation of g, its right limit g^p_+, as well as f^Π := (f^Π_t)_{t∈[ ,t ]}, the Π-universal completion of f, for any subset Π ⊂ m_p. the drift and volatility functions for the process x are now defined for any (t, s, i, z, a), where u(t, s, i, z, a) := u(t, b(t, s, i, z, a), i) for all (t, s, i, z), together with b_p(r, s_r, i_r, v) · ∇ϕ_p(x_r) + tr[d²ϕ_p(x_r) Σ_p(Σ_p)ᵀ(r, s_r, i_r, v)] Λ_p(dr, dv). in the spirit of definition . for p ⊂ m, we define the subset q ⊂ m_p as the one consisting of all p ∈ m_p such that (iii) with p-probability , the canonical process Λ_p is of the form δ_{φ_·}(dv) for some borel function φ : [ , t ] −→ v. still following the lines of section . , we know that for any p ∈ q, we can define a (g_q, p)-brownian motion w_p. we then denote by v_o(p) the set of g-predictable and v-valued processes (z, α) such that, p-a.s. and for all t ∈ [ , t ], ( . ) holds. thanks to the analysis conducted in the previous subsection, the problem of the government given by ( .
) can now be written rigorously in weak formulation, where s represents the set of × symmetric positive matrices with real entries. more explicitly, the hamiltonian can be written as follows. we are then led to consider the following hjb equation, for all t ∈ [ , t ) and x = (s, i, y), with terminal condition v(t, x) := −u^{(− )}(y), and where the natural domain over which the above pde must be solved is o := {(t, s, i, y) ∈ [ , t ) × r²_+ × r : < s + i < f(t, s , i )}, recalling that f is defined by ( . ). here v_p should be understood as the unique viscosity solution, in an appropriate class of functions, of the pde ( . ). obtaining further regularity results is by far more challenging. indeed, it is a second-order, fully non-linear, parabolic pde, which is clearly not uniformly elliptic, the corresponding diffusion matrix being degenerate. this makes the question of proving the existence of an optimal contract a very complicated one, which is clearly outside the scope of our study. as a sanity check though, we recall that ε-optimal contracts always exist, and can indeed be approximated numerically; see for instance kharroubi, lim, and mastrolia [ ] for an explicit construction of such ε-optimal contracts in a particular case dealing with the stochastic logistic equation. as already mentioned, the first-best case corresponds to the case where the government can enforce whichever interaction rate β ∈ b it desires (in addition to a contract (α, χ) ∈ a × c), and simply has to satisfy the participation constraint of the population. in order to find the optimal interaction rate in this scenario, as well as the optimal contract, one has to solve the government's problem defined by ( . ). the simplest way to take into account the inequality constraint in the definition of v_p,fb is to introduce the associated lagrangian.
by strong duality, we then have the dual representation. first, by concavity of u, it is immediate that for any given lagrange multiplier > , the optimal tax is constant and given by ( . ). then, using the definition of v( ) for any > in ( . ), we note that v( ) is the value function of a standard stochastic control problem. therefore, we expect the corresponding hjb characterisation, where the hamiltonian is defined, for t ∈ [ , t ], (s, i) ∈ r²_+, p := (p , p ) ∈ r² and m ∈ s. to simplify, let us consider separable utilities of the form ( . ). we focus on the maximisation of the hamiltonian h with respect to b ∈ b, to obtain the optimal interaction rate β. the maximiser b is defined by recalling that b• is defined by ( . ). in particular, for a given testing policy α ∈ a and a lagrange multiplier > , the optimal interaction rate in this case is given for all t ∈ [ , t ] by β_t = b(s_t, i_t, ∂v(t, s_t, i_t), α_t). we thus obtain the reduced hamiltonian, where in addition, for a ∈ a, u(t, s, i, p, a) := u(t, b(s, i, p, a), i). then, the optimal testing policy is given for all t ∈ [ , t ] by α_t := a(t, s_t, i_t, ∂v(t, s_t, i_t), d²v(t, s_t, i_t)), where a : [ , t ] × r²_+ × r × s −→ a is the maximiser of the previous hamiltonian over a ∈ a, if it exists. the boundary of the domain cannot be reached by the processes s and i, which is why it is not necessary to specify a boundary condition there. notice though that the upper bound can formally only be attained when i is constantly , in which case s becomes deterministic, the government's best choice for α is clearly , and its choice of z becomes irrelevant. in such a situation, we would immediately have v_p = v. we now focus on the seir/s (susceptible-exposed-infected-recovered or susceptible) compartmental model. again, the class s represents the 'susceptible' and the class i represents the 'infected' and infectious. the seir and seis models are used to describe epidemics in which individuals are not directly contagious after contracting the disease.
this therefore involves a fourth class, namely e, representing the 'exposed', i.e., individuals who have contracted the disease but are not yet infectious. with this in mind, we denote by ι the rate at which an exposed person becomes infectious, which is assumed to be a fixed non-negative constant. therefore, during the epidemic, each individual can be either 'susceptible', 'exposed', 'infected', or in 'recovery', and (s_t, e_t, i_t, r_t) denotes the proportion of each category at time t ≥ . the difference between the seis and seir models lies in the immunity toward the disease: for seir models, it is assumed that the immunity is permanent, i.e., after being infected, an individual goes and stays in the class r, whereas for seis models there is no immunity, i.e., infected individuals come back to the susceptible class at rate ν ≥ , similarly to sis models. as in the previously described sir model, we also take into account the demographic dynamics of the population, through the parameters λ, µ and γ. to sum up, the epidemic dynamics is represented in figure . similarly to the previous models, we consider that the dynamics of the epidemic is subject to a noise in the estimation of the proportions of susceptible and infected individuals. inspired by the stochastic model in mummert and otunuga [ , equation ( )], we therefore consider that the dynamics of the epidemic is given by the following system. note that the proportion i of infected and infectious is also uncertain, but only through its dependence on e, and the proportion r of recovered is uncertain only through its dependence on i.
more precisely, we assume that there is no uncertainty on the recovery rate ρ, the rate ι at which exposed people become infectious, or the (potential) rate ν at which an individual loses immunity, implying that if the proportion of exposed individuals is perfectly known, the proportion of infected is also known without uncertainty, and consequently the proportion of recovered is also known with certainty. again, this modelling choice is consistent with most stochastic seirs models, and emphasises that the major uncertainty in the current epidemic is related to the non-negligible proportion of (nearly) asymptomatic individuals. indeed, an asymptomatic individual may be misclassified as susceptible or exposed. we will now give (informally) the optimisation problems faced by both the population and the government; the rigorous treatment can be done following the lines of section . the most important change compared to the sis/sir models is that the criteria should now depend on the sum e + i, representing the proportion of the population having contracted the disease, rather than just the proportion i of infectious people. unless otherwise stated, the notations are those of section . the problem of the population is now as follows, while that of the government becomes the analogous one. notice that in the cost function k, we did not replace i by i + e. this is due to the fact that this cost should scale with the volatility of i + e (see the discussion in example . ), which is still σ α_·(s_· i_·) in the model ( . ), for b ∈ b. given the supremum appearing above, and similarly to assumption . , we make the following assumption. therefore, a straightforward adaptation of our earlier arguments will show that every admissible contract will take the form χ := −u^{(− )}(y_t), where β_t := b(t, s_t, e_t, i_t, z_t, α_t) for all t ∈ [ , t ] is the optimal control of the population. it thus remains to solve the government's problem.
unlike in the previous sis/sir models, there are now four state variables for the government's problem, namely (s, e, i, y), whose dynamics under the optimal effort of the population are as follows, recalling that f is defined by ( . ). solving ( . ) numerically is significantly more challenging, since the dimension of the problem increases. a numerical investigation seems complicated as far as we know, and we leave these numerical issues for future research. there is of course a plethora of generalisations of the models we have considered so far. for instance, in seirs (or also sirs) models, the immunity is temporary, i.e., people in the class r may come back into the class s at rate ν. using a similar stochastic extension of this model, it is straightforward that all our results extend, mutatis mutandis, to this case as well, albeit with one important difference: the control problem faced by the government now has five state variables, namely (s, e, i, r, y). even more generally, our approach can readily be adapted to compartmental models considering additional classes: for instance the sidarthe ('susceptible' (s), 'infected' (i), 'diagnosed' (d), 'ailing' (a), 'recognised' (r), 'threatened' (t), 'healed' (h) and 'extinct' (e)) model investigated in giordano, blanchini, bruno, colaneri, di filippo, di matteo, and colaneri [ ] for covid-19. of course, the price to pay is that the number of state variables in the government's problem will increase with the number of compartments, and numerical procedures to solve the hjb equation will become more delicate to implement; they could be based on neural networks. similarly to section , we present in this appendix the numerical results obtained when considering a sis compartmental model, whose dynamics are given by ( . ), or equivalently by ( . ) with ρ = . we take the same parameters as in the sir case to model the preferences of the government and the population, i.e. the parameters given in table , except for β_max = . .
to model the sis dynamics, we consider a different set of parameters (see table ), in order to obtain the same shape for the proportion of infected at the beginning of the epidemic under both the sir and sis dynamics. this choice is made to model the fact that, at the beginning of a relatively unknown epidemic such as that of covid-19, the proportion of infected people is observed (with noise), but the authorities do not necessarily know whether this disease allows immunity to be acquired. table : set of parameters for the simulation of sis dynamics. to solve the benchmark case, we follow the method described in section . , although we choose here a number of time steps equal to , a time-step discretisation equal to . , a linear interpolator, and the optimal command β used to maximise the hamiltonian is discretised with points, given a step discretisation of . . once the pde is solved, a simulator is used forward in time, using the optimal command and giving the dynamics of the proportions (s, i). as for the numerical resolution of the benchmark case for the sir model, we implement two versions of the resolution, with variables (s, i) or (s, s + i). figure : two simulations of the sis in the benchmark case; comparison between the two methods. as the numerical results obtained in the benchmark case when the epidemic dynamics are given by a sis model have the same characteristics as with the sir dynamics, we describe the graphs only briefly below. figure . as in the sir case, the trajectories of β obtained through the two aforementioned resolutions are rather close, and the corresponding trajectories for i coincide. we plot trajectories of the optimal interaction rate β, the proportion of susceptible s, as well as the proportion of infected i.
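the forward use of the optimal command described above can be sketched as follows. here `policy` is a stand-in for the interpolated feedback β obtained from the pde resolution (hypothetical), and the sis-type drift with diffusion σ s i is a simplified assumption (demography omitted):

```python
import math
import random

def simulate_forward(policy, s0, i0, n_steps, dt, sigma, nu, rho, seed=0):
    """forward euler-maruyama simulation under a feedback contact rate.

    `policy(t, s, i)` plays the role of the optimal command from the pde
    resolution (any callable works); the drift and the diffusion
    sigma*s*i are illustrative assumptions, not the paper's exact model.
    """
    rng = random.Random(seed)
    s, i, t = s0, i0, 0.0
    path = [(t, s, i)]
    for _ in range(n_steps):
        beta = policy(t, s, i)
        dw = rng.gauss(0.0, math.sqrt(dt))
        new_inf = beta * s * i
        ds = (-new_inf + nu * i) * dt - sigma * s * i * dw
        di = (new_inf - (nu + rho) * i) * dt + sigma * s * i * dw
        s, i = max(s + ds, 0.0), max(i + di, 0.0)  # keep proportions non-negative
        t += dt
        path.append((t, s, i))
    return path

# e.g. a crude lockdown-style feedback: reduce contacts when i is high
path = simulate_forward(lambda t, s, i: 0.5 if i < 0.05 else 0.2,
                        s0=0.99, i0=0.01, n_steps=1000, dt=0.05,
                        sigma=0.05, nu=0.05, rho=0.05)
```

replacing the lambda by an interpolator over the pde grid reproduces the "simulator used forward in time" step described in the text.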
the population's behaviour is similar to the one obtained in the sir model: at first the population behaves as usual, then begins to reduce β, which finally goes back to its usual values as the epidemic disappears. once again, the population's fear of infection is not sufficient to prevent the epidemic. as for the benchmark case, the numerical method used to obtain the optimal lockdown policy is similar to the one used in the case of the sir dynamics. we only recall here the key points of the method. we first solve ( . ) with the semi-lagrangian scheme, taking v given by ( . ) and estimated with a monte carlo method, using an euler scheme with a time discretisation of time steps and trajectories. the estimated value for v is − . . we then take a step discretisation for the grid in (s, i, y) corresponding to ( . , . , . ), leading to a number of meshes at maturity equal to × × . we consider the bounded set of values [− , ] for the control z, and a step discretisation equal to . . the graphs obtained are briefly described below. we present some trajectories of the optimal controls β and z, as well as the resulting proportion i of infected individuals. figure . we compare on some simulations the optimal transmission rate obtained with the contract to the one obtained in the benchmark case. we see that the tax succeeds in reducing significantly the interaction rate compared to the no-tax policy case. comparison with the benchmark case. due to the larger terminal time horizon, the computation time is particularly significant. to reduce it, the discretisation used to find the optimal control z is reduced. the resulting graphs are briefly described below. we present trajectories of the optimal controls β, α and z, and the resulting proportion i of infected. we compare on simulations the optimal proportion of infected with the two previous cases (benchmark case and tax-only policy): with testing, the epidemic is now totally under control.
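the monte carlo / euler estimation of v described above can be sketched as follows; the sis-type drift, the diffusion σ s i, and the running cost `cost(s, i)` are illustrative assumptions rather than the paper's exact specification:

```python
import math
import random

def mc_value(cost, n_paths, n_steps, dt, s0, i0, beta, sigma, nu, seed=0):
    """monte carlo estimate of an expected integrated cost along euler paths.

    a sketch of the scheme described in the text: simulate euler paths,
    accumulate the running cost along each, and average over paths.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s, i, acc = s0, i0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            new_inf = beta * s * i
            ds = (-new_inf + nu * i) * dt - sigma * s * i * dw
            di = (new_inf - nu * i) * dt + sigma * s * i * dw
            s, i = max(s + ds, 0.0), max(i + di, 0.0)
            acc += cost(s, i) * dt
        total += acc
    return total / n_paths

# expected time-integrated proportion of infected over the horizon
v_hat = mc_value(lambda s, i: i, n_paths=200, n_steps=200, dt=0.05,
                 s0=0.99, i0=0.01, beta=0.5, sigma=0.05, nu=0.1)
```

the monte carlo error decreases like one over the square root of the number of paths, which is why the text pairs a fine euler time step with a large number of trajectories.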
we present simulations of the optimal effective transmission rate in this case, and compare it to the optimal β obtained in the benchmark case and without testing policy. figure . we present simulations of the optimal α: its quick variations explain the swift changes in the effective β. (figure legends: optimal contact rate β; optimal testing policy α; optimal control z; proportion i of infected.)

references:
- an optimal isolation policy for an epidemic
- optimal electricity demand response contracting with responsiveness incentives
- a principal-agent approach to study capacity remuneration mechanisms
- an introduction to stochastic epidemic models
- comparison of deterministic and stochastic sis and sir models in discrete time
- a simple planning problem for covid-19 lockdown
- disability-adjusted life years: a critical review
- population biology of infectious diseases: part i
- how will country-based mitigation measures influence the course of the covid-19 epidemic? the lancet
- un modèle mathématique des débuts de l'épidémie de coronavirus en france
- a simple stochastic epidemic
- the mathematical theory of infectious diseases and its applications
- some evolutionary stochastic processes
- deterministic and stochastic models for recurrent epidemics
- optimal control of deterministic epidemics
- stability of epidemic model with time delays influenced by stochastic perturbations
- essai d'une nouvelle analyse de la mortalité causée par la petite vérole, et des avantages de l'inoculation pour la prévenir
- stochastic integration and l^p-theory of semimartingales
- contract theory
- a unified approach to a priori estimates for supersolutions of bsdes in general filtrations, annales de l'institut henri poincaré
- an approximation scheme for the optimal control of diffusion processes
- finite-state contract theory with a principal and a field of agents
- contact tracing mobile apps for covid-19: privacy considerations and related trade-offs
- covid-19: extending or relaxing distancing control measures,
  the lancet public health
- on the lambertw function
- asset pricing under optimal contracts
- contract theory in continuous-time models
- moral hazard in dynamic risk management
- dynamic programming approach to principal-agent problems
- covid-19: new insights on a rapidly changing epidemic
- optimal covid-19 epidemic control until vaccine deployment
- heterogeneous social interactions and the covid-19 lockdown outcome in a multi-group seir model
- optimal make-take fees for market making regulation
- capacities, measurable selection and dynamic programming part ii: application in stochastic control problems
- contracting theory with competitive interacting agents
- mean-field moral hazard for optimal energy demand response management
- a tale of a principal and many many agents
- contact rate epidemic control of covid-19: an equilibrium view
- adaptive human behavior in epidemiological models
- impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand
- the effect of stay-at-home orders on covid-19 infections in the united states
- dynamics of a stochastic sis epidemic model with nonlinear incidence rates
- stochastic optimization library in c++,
  hal preprint hal-
- modelling the covid-19 epidemic and implementation of population-wide interventions in italy
- a model of incentive compatibility under moral hazard in livestock disease outbreak response
- livestock disease indemnity design when moral hazard is followed by adverse selection
- a stochastic differential equation sis epidemic model
- on the statistical measure of infectiousness
- optimal quarantine strategies for covid-19 control models
- a model for communicable disease control
- optimum control of epidemics
- the milroy lectures on epidemic disease in england: the evidence of variability and of persistency of type
- optimal control of epidemics with limited resources
- trauma does not quarantine: violence during the covid-19 pandemic
- aggregation and linearity in the provision of intertemporal incentives
- continuous-time principal-agent problem in degenerate systems
- on the responsible use of digital data to tackle the covid-19 pandemic
- a stochastic model for the optimal control of epidemics and pest populations
- asymptotic behavior of global positive solution to a stochastic sir model
- thucydides translated into english, to which is prefixed an essay on inscriptions and a note on the geography of thucydides, volume i
- how to run a campaign: optimal control of sis and sir information epidemics
- beyond just "flattening the curve": optimal control of epidemics with purely non-pharmaceutical interventions
- deterministic and stochastic epidemics in closed populations
- a contribution to the mathematical theory of epidemics
- optimal control of an sir epidemic through finite-time non-pharmaceutical intervention
- regulation of renewable resource exploitation
- on the extinction of the s-i-s stochastic logistic epidemic
- the theory of incentives: the principal-agent model
- optimal control applied to biological models,
mathematical and computational biology series early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia second order backward sde with random terminal time random horizon principal-agent problem principal-agent problem with common agency without communication applications of mathematics to medical problems the dynamics of crowd infection on the optimal control of a deterministic epidemic parameter identification for a stochastic seirs epidemic model: case study influenza the quasi-stationary distribution of the closed endemic sis model on the quasi-stationary distribution of the stochastic logistic epidemic measurability of semimartingale characteristics with respect to the probability law pathwise construction of stochastic integrals information technology-based tracing strategy in response to covid- in south korea -privacy controversies the optimal covid- quarantine and testing policies an explicit optimal intervention policy for a deterministic epidemic model stochastic control for a class of nonlinear kernels and applications. 
key: cord- - vdz d
authors: nikolaou, m.
title: a fundamental inconsistency in the sir model structure and proposed remedies
date: - -
journal: nan
doi: . / . . .
sha: doc_id: cord_uid: vdz d

The susceptible-infectious-removed (SIR) compartmental model structure and its variants are a fundamental modeling tool in epidemiology. As typically used, however, this tool may introduce an inconsistency by assuming that the rate of depletion of a compartment is proportional to the content of that compartment. As mentioned in the seminal SIR work of Kermack and McKendrick, this is an assumption of mathematical convenience rather than realism. As such, it leads to underprediction of the infectious compartment peaks by a factor of about two, a problem of particular importance when dealing with availability of resources during an epidemic. To remedy this problem, we develop the dSIR model structure, comprising a single delay differential equation and associated delay algebraic equations. We show that SIR and dSIR fully agree in assessing stability and long-term values of a population through an epidemic, but differ considerably in the exponential rates of ascent and descent as well as in peak values during the epidemic. The novel Padé SIR structure is also introduced as an approximation of dSIR by ordinary differential equations. We rigorously analyze the properties of these models and present a number of illustrative simulations, particularly in view of the recent coronavirus epidemic. Suggestions for further study are made.

In their landmark publication, A Contribution to the Mathematical Theory of Epidemics, Kermack and McKendrick developed a general, if elaborate, model structure to capture the dynamics of a fixed-size population comprising compartments of individuals susceptible (S) to a spreading infection, infectious (I), and removed (R) from the preceding two compartments by recovery or death. Propagation of an individual from S to I to R underlies the basic context of the exercise.
In a modeling tour de force, the authors eventually present, in equations ( ) through ( ) of their paper (ibid.), the general structure of the elaborate mathematical model they derive. They proceed to examine the implications of their model for special cases, and finally present a very special case resulting in a set of three relatively simple ordinary differential equations (ODEs, equations ( ) ibid.), which were destined to form the basis for a genre of mathematical models in epidemiology: the celebrated SIR model and its many variants. The three ODEs are

S′ = −βSI,  I′ = βSI − γI,  R′ = γI,

where prime denotes the time derivative. Consistent with the importance of the SIR ODEs, the basic reproductive ratio, R0 ≝ β/γ, is widely considered "one of the most critical epidemiological parameters" and has even become a household name in the recent coronavirus epidemic. In the sentence right before they present their SIR model (ibid., cf. the above eqns.), Kermack and McKendrick acknowledge the constant removal rate as an assumption of convenience. The assumption of a constant β is plausible, as β refers to the rate of spread of the epidemic (cf. eqn. ( )), although that parameter might change over time as a result of countermeasures. The rate of removal, however, depends more on the duration over which individuals have remained infected and less on the size of that group. In the conceptually simple case where all individuals recover or die a single number of days, τ, after their infection, the removal rate equals the infection rate a time τ earlier, or zero, depending on whether t is greater than τ or not (Figure ). Of course, in reality the infectious duration will likely follow a distribution (Figure ) rather than being a single number. In that case as well, however, the removal rate will depend on a comparison between t and the distribution of τ, rather than on the size of the group remaining infectious for time τ.
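The three SIR ODEs quoted above are easy to integrate numerically. A minimal sketch (the parameter values β = 0.3, γ = 0.1, the time span, and the initial seed are illustrative assumptions, not values from the paper; they give R0 = 3):

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, beta, gamma):
    """Standard SIR right-hand side: S' = -beta*S*I, I' = beta*S*I - gamma*I, R' = gamma*I."""
    S, I, R = y
    return [-beta * S * I,
            beta * S * I - gamma * I,
            gamma * I]

beta, gamma = 0.3, 0.1          # illustrative values (R0 = beta/gamma = 3)
y0 = [0.999, 0.001, 0.0]        # population fractions sum to 1
sol = solve_ivp(sir_rhs, (0.0, 200.0), y0, args=(beta, gamma), rtol=1e-8)

S, I, R = sol.y
print(f"peak I = {I.max():.4f}, final S = {S[-1]:.4f}")
```

The computed peak and final susceptible fraction can then be checked against the closed-form results discussed later in the text (the peak formula and the Lambert-W final-size relation).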
Starting with the assumption that individuals leave the infectious group at time τ after infection, we develop in this paper a corresponding mathematical model structure, named delay SIR (dSIR), in the form of a single delay differential equation (DDE) for S and two associated delay algebraic equations for I and R in terms of S. In the rest of the paper we first introduce the dSIR model structure and provide an intuitive exposition of its basic properties, where we also introduce the Padé SIR model structure as an ODE approximation of dSIR. Rigorous analysis follows, in which we explain that certain SIR and dSIR properties are exactly similar (e.g., herd immunity, total number of infected), while others are quite different (e.g., exponential rates, peak of the infectious group). Extension to models with additional compartments beyond S, I, and R is then discussed, with presentation of the dSPIR model. Finally, the significance of this work and future extensions are discussed.

To explain the derivation of the dSIR model structure, we rely on the detailed version of Figure shown in Figure . The schematic shows the evolution of S, I, R over discrete time steps of equal length. The thick-green-bordered rectangle in Figure suggests by visual inspection that S is depleted at the rate f(S, I) at which the infection spreads, for example in proportion to the product SI, as shown above. Assuming that each new part of the infectious fraction, I, gained at a given time step moves to the removed fraction, R, after the number of time steps corresponding to τ, the thick-red-bordered rectangle in Figure suggests that eqns. ( ) and ( ) yield

S′(t) = −βS(t)(S(t − τ) − S(t)),

where S′ ≝ dS/dt and S−τ ≝ S(t − τ). Eqn. ( ) is a single nonlinear DDE that involves only S and is decoupled from the equations for I and R. As such, it captures the entire dynamics of the dSIR system.
medrχiv.org michael nikolaou
The remaining two fractions of the population, I and R, can simply be inferred by algebraic equations, as eqn.
( ) implies I(t) = S(t − τ) − S(t), and S + I + R = 1 implies R(t) = 1 − S(t − τ). To put the schematic in Figure in context, observe that summation of eqns. ( )-( ) and discretization yields a relation suggesting that, according to the SIR model, each thick-black-bordered green rectangle in Figure would have to be a constant fraction of the immediately preceding orange column. However, with each new part of the infectious fraction moving to the removed fraction after the delay (thick-black-bordered orange rectangles moving to thick-black-bordered blue rectangles), this cannot be true as an emerging fact, by simple observation of the compartment profiles, which follow a convex-to-concave pattern after an inflection point. Therefore, the assumption of a constant γ, equivalent to a constant 1/τ in eqn. ( ), is not compatible with the assumption that the time to transition from infectious to removed is independent of the infectious fraction size. This is further illustrated with simulations in the next section. Incidentally, eqns. ( )-( ) of the SIR model can also be decoupled into a single, if non-intuitive, nonlinear ODE in terms of S, solved in a form implicit in time.

Before any theoretical properties of the dSIR model structure are analyzed, a simple visual comparison between SIR and dSIR is presented. Unless specifically stated otherwise, all numerical simulations have been conducted with fixed parameter values. Figure is the standard {S, I, R} plot for SIR and dSIR. Note the faster dynamics of dSIR compared to SIR, the approximately twice as high peak of I for dSIR compared to SIR, as well as the asymmetric profile of I for SIR, compared to the symmetric dSIR profile of I. (Details in the analysis section.) To further illustrate the dSIR/SIR relationship, Figure and Figure show the continuous-time counterparts of Figure for the SIR and dSIR models, respectively. In this stacked representation of {S, I, R} over time, it is clear that the SIR model corresponds to a time-varying infectious period for each individual. In fact, the value of γ calculated according to eqn.
( ) as a function of the S′ and I produced by the dSIR model is time-varying, as shown in Figure . This discrepancy suggests that the standard interpretation of 1/γ as the average infectious period, "estimated relatively precisely from epidemiological data," is increasingly inaccurate as R0 increases above 1. What is estimated from epidemiological data is the delay τ, rather than 1/γ, and if 1/γ is used as an estimate of τ, as is typically done, the SIR model response will be too slow, with a peak value for I lower than its dSIR counterpart by about half, depending on R0.

[Preprint notice: It is made available under a CC-BY-NC-ND International license. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted May , .]

[Figure captions: Note that γ remains constant; Figure is superimposed for comparison, using the correspondence γ = 1/τ. All lines start at t = 0, as S′(t < 0) = 0; small spurious deviations from constant shortly after t = 0 are due to numerical approximation of S(t ≤ 0) by a continuous function.]

As pointed out in the preceding sections, interpreting the value of 1/γ in the SIR model as the average infectious period is not accurate and produces misleading results. It turns out (Appendix A) that the following simple remedy can be used to retain the ODE structure of the standard SIR model while better approximating the DDE dynamics of the more realistic dSIR model structure: the SIR equations for {I′, R′}, eqns. ( ) and ( ), can be replaced by equally simple ODEs, or by a more accurate set of ODEs. Figure confirms that the trajectories for the dSIR and the modified SIR model structures (eqns. ( )-( ) and ( )-( ) with eqn. ( ) replaced by ( ) or ( ), respectively) are close to one another, both in terms of the time to peak and the value of the peak.
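One consistent reading of the dSIR structure described above is the single delay equation S′(t) = −βS(t)I(t), with I(t) = S(t − τ) − S(t) and R(t) = 1 − S(t − τ). A minimal forward-Euler sketch under that reading (β, τ, seed size, horizon, and step size are illustrative assumptions):

```python
import numpy as np

def simulate_dsir(beta=0.3, tau=10.0, i0=1e-3, t_end=200.0, dt=0.01):
    """Forward-Euler integration of one reading of the dSIR delay equation:
    S'(t) = -beta*S(t)*I(t),  I(t) = S(t - tau) - S(t),  R(t) = 1 - S(t - tau),
    with history S(t) = 1 for t < 0 and a small infected seed i0 at t = 0."""
    m = int(round(tau / dt))            # delay expressed in steps
    n = int(round(t_end / dt))
    S = np.ones(m + n + 1)              # S[0:m] holds the pre-history S(t < 0) = 1
    S[m] = 1.0 - i0                     # seed: fraction i0 infected at t = 0
    for k in range(m, m + n):
        I_k = S[k - m] - S[k]           # I(t) = S(t - tau) - S(t)
        S[k + 1] = S[k] - dt * beta * S[k] * I_k
    t = np.arange(n + 1) * dt
    S_t = S[m:]                         # S(t) for t >= 0
    I_t = S[:n + 1] - S_t               # S(t - tau) - S(t)
    R_t = 1.0 - S[:n + 1]               # 1 - S(t - tau)
    return t, S_t, I_t, R_t

t, S_t, I_t, R_t = simulate_dsir()      # R0 = beta*tau = 3
print(f"dSIR peak I = {I_t.max():.3f}, final S = {S_t[-1]:.3f}")
```

With these assumed values the simulated peak comes out well above the standard SIR peak for the same R0, in line with the roughly twofold difference noted in the text, while the final susceptible fraction matches the common final-size relation.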
As already mentioned, these properties are important when such models are used to anticipate hospitalization needs for the infected during an epidemic.

Figure : Comparison of the I profiles for the first-order Padé SIR, second-order Padé SIR, dSIR, and SIR models. Note the improved approximation of dSIR by the second-order Padé SIR, compared to the first-order Padé SIR, as anticipated by eqns. ( ) or ( ), respectively.

Note (Appendix A) that the essence of eqns. ( ) and ( ) is in approximating the pulse profile H(t) − H(t − τ) in Figure by a transfer-function approximation based on first- or second-order Padé approximants (in the Laplace domain), rather than by the decaying exponential of Figure , as shown in Figure . Padé approximation has long been a popular approach for approximating transcendental transfer functions by rational polynomial fractions in process control.

Figure : Reduction of the infectious fraction from its initial value to zero (a) in a single front, in terms of the Heaviside step function with γ = 1/τ; (b) following a first-order Padé approximation; and (c) following a second-order Padé approximation (cf. Figure ).

Because of the critical role of Padé approximation in deriving eqn. ( ) for the ODEs of the modified SIR model to approximate the dSIR model, we will use the term Padé SIR to denote the modified SIR model structure. Finally, it should be noted that eqn. ( ) may seem counter-intuitive, as it appears to suggest that the generation and depletion rates of I are βSI and −I/τ, respectively. However, eqn.
( ) rather suggests that while the susceptible depletion rate remains −βSI, the infectious depletion rate appears as an S-dependent expression rather than −I/τ (which is not S-dependent), due simply to replacement of the delay term e^(−sτ) in the Laplace domain by its Padé approximant. Similarly, the rate of increase for R in eqn. ( ) is S-dependent, rather than not.

Standard theory of DDEs (e.g., Kuang, or Gopalsamy) can be applied to establish rigorous properties for the dSIR model, such as global stability, convergence to a final steady state, and others. A complete analysis is beyond the scope of this paper. However, some important theoretical properties of the dSIR model structure of practical interest are discussed next, particularly in comparison to their SIR counterparts. The analysis of the Padé SIR model follows standard ODE analysis and is presented more briefly, except when it has important implications for either theoretical or practical issues.

Eqn. ( ) can be used to show (Appendix B) that an equilibrium point S̄ is stable, and an epidemic outbreak does not occur, iff S̄ < 1/(βτ). Note that this stability upper bound 1/(βτ) for the dSIR model coincides with the well-known bound γ/β dictated by the SIR model under the widely used correspondence γ = 1/τ. The same stability bound can be derived for the Padé SIR model structure using standard ODE analysis based on linearization around an equilibrium point. Accepting for now without proof the global stability of eqn. ( ), the total fraction infected by the end of an epidemic can be expressed in terms of the Lambert W function of order 0, with R0 ≝ βτ. The standard plot of this final epidemic size is shown in Figure for completeness. Note that the plot is valid for both SIR and dSIR models, with R0 = β/γ and R0 = βτ, respectively. Interestingly, while use of the Lambert W function to solve problems such as the above was pointed out long ago, it may have escaped the attention of most literature in this field.
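The final-size relation mentioned above can be evaluated directly with the Lambert W function (SciPy's scipy.special.lambertw, principal branch W0). A sketch assuming S0 ≈ 1, so that the same curve serves SIR with R0 = β/γ and dSIR with R0 = βτ:

```python
import numpy as np
from scipy.special import lambertw

def final_susceptible(R0):
    """Fraction still susceptible at the end of the epidemic,
    s_inf = -W0(-R0*exp(-R0))/R0, assuming S0 ~ 1. Equivalently, s_inf
    satisfies the implicit final-size relation s_inf = exp(-R0*(1 - s_inf))."""
    return float(np.real(-lambertw(-R0 * np.exp(-R0), 0) / R0))

for R0 in (1.5, 2.0, 3.0):
    s_inf = final_susceptible(R0)
    # columns 2 and 3 should agree: explicit W-form vs implicit relation
    print(R0, round(s_inf, 4), round(np.exp(-R0 * (1.0 - s_inf)), 4))
```

The total fraction infected by the end of the epidemic is then 1 − s_inf, which is the quantity plotted in the figure referenced above.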
The Lambert W function in its various forms will turn up in a number of results below. It should also be noted that eqns. ( ) and ( ) are the same for the SIR and Padé SIR models (Appendix C) under the correspondence between γ and τ in eqn. ( ). For the initial part of a spreading epidemic, starting from a perturbation of the steady state (S, I, R) = (S̄, 0, 1 − S̄), with S(t < 0) = S̄ and βτS̄ > 1, it can be shown (Appendix D) that the infectious fraction initially grows approximately at an exponential rate, with constants expressed in terms of βτS̄ (Appendix D), as shown in Figure . Similarly, the initial exponential rate of the Padé SIR model is obtained with γ = 1/τ, and so is the exponential rate of the standard SIR model, shown in Figure as well. This has immediate implications for the early rate of rise of the infectious fraction to its peak, I*, as illustrated in Figure . Note that while the initial SIR rate is half of the Padé SIR rate, and about half of the dSIR rate, all three models eventually reach the exact same steady-state values, as captured by eqns. ( ) and ( ). While it is not obvious to the author whether the peak I* can be easily obtained for the dSIR model, a good approximation can be obtained (Appendix E) through the Padé SIR model, as

I* = 2(1 − (1 + ln R0)/R0).

Note that the above I*, exact for the first-order Padé SIR model and approximate for the second-order Padé SIR and dSIR models, is double the I* of the standard SIR model, as confirmed in Figure . Once more, there are obvious practical implications of this discrepancy. Note also that in case an upper bound is placed on I*, to avoid overwhelming hospitalization facilities during an epidemic, eqn.
( ) has an explicit solution for the corresponding maximum allowable R0. For comparison, the standard SIR model yields its own corresponding value. The values indicated by eqns. ( ) and ( ), with corresponding definitions, are shown in Figure . It is evident that the Padé SIR model places twice as tight a restriction on R0 as the standard SIR model, if I is not to exceed the I* value.

The dSIR model structure developed in Figure can be easily extended to include additional compartments. In fact, practically all population models of infections developed to date using the concept of exchange between compartments can be immediately translated (a) from ODEs to DDEs, through replacement of compartment drain rates proportional to the drained quantity by drainage of amounts that have resided for a certain time, τ, in that compartment, or (b) from ODEs to Padé approximations that maintain the ODE structure but are more realistic. We illustrate these ideas on the SPIR model structure, in view of its importance for the recent coronavirus epidemic. A distinct feature of the coronavirus causing COVID-19 is that it enables infection transmission at the pre-symptomatic stage. Therefore, the SPIR model structure (Figure ) comprises the usual population fractions {S, I, R} along with the pre-symptomatic infectious fraction, P, the symptomatic infectious fraction being I. SPIR differs from the standard SEIR structure by the way the four compartments interact. While it is formidable to practically monitor infections in pre-symptomatic infectious individuals, monitoring symptomatic infectious individuals is more reasonable, as symptoms are clear and can be confirmed by testing.
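The Padé route mentioned in item (b) rests on the standard first- and second-order Padé approximants of the pure delay e^(−sτ) in the Laplace domain. A small sketch comparing both against the exact delay on the imaginary axis (the value of τ and the frequency grid are illustrative choices):

```python
import numpy as np

def pade1(s, tau):
    """First-order Pade approximant of exp(-s*tau): (1 - x/2)/(1 + x/2), x = s*tau."""
    x = s * tau
    return (1 - x / 2) / (1 + x / 2)

def pade2(s, tau):
    """Second-order Pade approximant: (1 - x/2 + x^2/12)/(1 + x/2 + x^2/12), x = s*tau."""
    x = s * tau
    return (1 - x / 2 + x**2 / 12) / (1 + x / 2 + x**2 / 12)

tau = 10.0
w = np.linspace(0.0, 0.3, 61)          # frequencies on the imaginary axis, s = i*w
s = 1j * w
exact = np.exp(-s * tau)
err1 = np.abs(pade1(s, tau) - exact).max()
err2 = np.abs(pade2(s, tau) - exact).max()
print(err1, err2)  # the second-order approximant tracks the delay over a wider band
```

Both approximants are all-pass (unit magnitude on the imaginary axis), like the exact delay; the error is purely a phase error, which is why the second-order version reproduces the dSIR time-to-peak noticeably better, as the text's figures indicate.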
Therefore, tighter restrictions can be placed on the I group, in addition to the restrictions on the P group, which typically follows the general restrictions placed on the overall population to curb the spread of the epidemic, as captured by the respective spread factors in Figure . The dynamics of the dSPIR structure is shown in Figure , which follows the pattern of Figure , with the addition of the pre-symptomatic infectious fraction P. Following the same technique as before, we can immediately write the equations for the dSPIR model structure by visual inspection of Figure . Combining these equations and passing to the continuous-time limit yields a nonlinear DDE and associated delay algebraic equations, where τ_P and τ − τ_P are the durations of an individual's stay in the P and I compartments, respectively. Note that the form of the SPIR model following the standard SIR pattern, eqns. ( )-( ), is a set of coupled ODEs; in the form of these ODEs, the following more realistic Padé SPIR model approximates the dSPIR DDEs, eqns. ( )-( ), significantly better than the SIR-pattern ODEs do (Appendix F).

The illustration and analysis presented earlier can be easily repeated for the dSPIR and Padé SPIR structures. To maintain the scope of this publication, only a few properties are explored below; the rest will be explored in more detail in forthcoming publications. We present here only a few simulations comparing the {S, P, I, R} profiles resulting from numerical solution of the SPIR, dSPIR, and Padé SPIR models. The same parameter values are used in all simulations. Figure illustrates the differences in the infectious peaks, P* and I*, similar to those in Figure . Following the same approach as before, eqn.
( ) can be used to show (Appendix G) that an epidemic outbreak does not occur around an equilibrium point S̄ iff a corresponding stability bound on S̄ is satisfied. The proof in Appendix G is by approximation; it is conjectured that the bound is exact. This conjecture will be examined in subsequent studies. It can be shown (Appendix H) that the dSPIR counterpart of eqn. ( ) holds, with the counterpart of the SIR and dSIR plot (Figure ) applying as well.

A subtle inconsistency in the standard SIR model structure was pointed out. This inconsistency arises from misinterpretation of an assumption explicitly articulated in the original publication of Kermack and McKendrick: that the depletion of the infectious compartment is proportional to the content of that compartment. The depletion rate constant, γ, is usually interpreted as equal to the inverse of the residence time in the infectious compartment, namely the duration of the infection, τ, for each individual (eqn. ( )). To the extent that this duration is about constant, the analysis presented here suggests that the preceding interpretation is incorrect, leading to certain erroneous conclusions. A corresponding model structure, termed dSIR, was developed to account for each individual leaving the infectious compartment after a certain duration. The dSIR model structure comprises a single DDE for the susceptible fraction, S, and associated algebraic equations capturing the dependence of the remaining population fractions on S. While both SIR and dSIR produce the same results for assessment of stability and final values, the SIR model produces a maximum of the infectious fraction, I*, about half of its dSIR counterpart. This has profound consequences if the SIR model is used to predict I* during an epidemic.
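For the standard SIR model with S0 ≈ 1, the peak infectious fraction has the closed form I* = 1 − (1 + ln R0)/R0, obtained from the conserved quantity I + S − ln(S)/R0 evaluated at the peak, where S = 1/R0. Doubling it gives the roughly twofold-larger dSIR-level peak discussed above (the factor of two is the paper's approximation, applied here only for illustration; the tabulated R0 values are arbitrary):

```python
import numpy as np

def sir_peak(R0):
    """Standard SIR peak infectious fraction for S0 ~ 1:
    I* = 1 - (1 + ln R0)/R0, from I + S - ln(S)/R0 = const at S = 1/R0."""
    return 1.0 - (1.0 + np.log(R0)) / R0

for R0 in (1.5, 2.0, 3.0, 5.0):
    # second column: approximate Pade/dSIR-level peak, i.e. twice the SIR value
    print(R0, round(sir_peak(R0), 4), round(2 * sir_peak(R0), 4))
```

The practical point of the comparison is capacity planning: a hospitalization forecast based on the SIR column alone would understate the load by about a factor of two if the dSIR reading is the more realistic one.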
It is also noted that even if the SIR model parameters β and γ are estimated based on an experimental data fit, albeit under the wrong interpretation, model predictions are still going to be inaccurate. This is because the standard SIR model, comprising the three ODEs in eqns. ( )-( ), is structurally different from the DDE form of the dSIR model, eqns. ( )-( ), and even from the ODE form of the Padé-based approximation of the dSIR model DDEs, eqns. ( ) and ( ) or ( ) and ( ). The dSIR structure can be easily extended to other compartment-based population models. Such an extension, to the dSPIR model structure, was presented and briefly illustrated and analyzed. This model is important for infections transmitted by both pre-symptomatic and symptomatic infected individuals. Similarities and differences between SPIR and dSPIR models are of the same nature as between SIR and dSIR models. Numerous additional issues related to this work can be considered, including the following: rigorous analysis of DDE models; DDE modeling and analysis for a distribution of delays rather than a uniform delay (cf. Figure ); resolution of the conjectures presented in the text; and implications for different forms of infection kinetics. Such issues will be addressed in forthcoming publications.

All computations were done in Mathematica, available at the University of Houston. Sharing of teaching material about the SIR model on GitHub by Prof. Jeff Kantor of Notre Dame is also gratefully acknowledged. Research reported in this publication was partially supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under award number R AI , financed with federal money. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Institutes of Health.

The roots of the transcendental characteristic equation govern stability.
For stability, all roots must lie in the left half of the complex plane. We show first that, for any positive value of S̄, no complex root σ = a + bi can cross the imaginary axis to move from stability to instability: if σ = bi were a root of eqn. ( ) for S̄ > 0, a contradiction would follow. Therefore, only real roots need be considered in eqn. ( ) for stability analysis. Furthermore, for real roots the relevant condition reduces to an inequality satisfied iff βτS̄ < 1 (see the figure below), leading immediately to eqn. ( ).

Appendix C. Proof of eqn. ( ). Dividing eqn. ( ) by eqn. ( ) and continuing from the resulting equation yields eqn. ( ). The Padé SIR model also reaches the same result: dividing eqn. ( ) by eqn. ( ), rearranging, and taking the limit as t → ∞ with γ = 1/τ, initial susceptible fraction approximately 1, and initial infectious fraction approximately 0, yields eqn. ( ). Proof that the standard SIR model also reaches the same result follows the same pattern and is omitted for brevity.

The terms rapidly decay beyond the first few orders, and the summation in eqn. ( ) quickly becomes approximately equal to its leading term.

References
A contribution to the mathematical theory of epidemics
Modeling infectious diseases in humans and animals
Population biology of infectious diseases: Part I
Mathematical biology: I. An introduction
Infectious diseases of humans: dynamics and control
Evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of coronavirus disease
Exact analytical solutions of the susceptible-infected-recovered (SIR) epidemic model and of the SIR model with equal death and birth rates
The incubation period of coronavirus disease (COVID-19) from publicly reported confirmed cases: estimation and application
Risk assessment of novel coronavirus COVID-19 outbreaks outside China
An interactive web-based dashboard to track COVID-19 in real time
Chemical process control: an introduction to theory and practice
Delay differential equations with applications in population dynamics
Stability and oscillations in delay differential equations of population dynamics
On the Lambert W function
The Lambert function belongs in the function hall of fame (Chemical Engineering Education)
In preparation
Report: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand
Institute for Health Metrics and Evaluation (IHME)

for some range of time. A time series representative of how the reproductive ratio R0(t) behaves for different combinations of ε and δ is shown in Fig. . The parameters chosen in Fig. display strange nonchaotic dynamics for the SIR model, to be discussed in Section III. In Fig. (a), the reproductive ratio looks like a sinusoidal time series, while in Fig. (b) this resemblance is totally lost.

In this subsection, we define three measures to distinguish the different types of attractors. We use the phase sensitivity ∂x(t)/∂θ to distinguish the fractal geometry of SNAs from that of smooth tori and chaotic motions. Here, x(t) represents the phase- or state-space trajectory of the attractor defined by the coordinates (S, I, E, R), depending on the model considered.
In order to capture the intermittent behaviour of the phase sensitivity, one defines the quantity γ(L), the maximum over a time span L of the magnitude of the partial derivative (the partial derivative indicates differentiation of any one coordinate or state variable, e.g., S(t)). Here, L represents a length of time. A growing γ(L) as a function of L is indicative of non-smoothness of the underlying attractor. From γ(L), one can define a further quantity as the collection of minima over different initial points. For an SNA this quantity grows as a power law, L^q with q > 0, while for a torus q = 0; for chaotic orbits it grows exponentially, as e^(qL). Hereafter, we call this quantity the phase sensitivity parameter. The phase sensitivity parameter thus shows power-law growth for SNAs, exponential growth for chaotic orbits, and non-growing behaviour for a smooth torus, as captured by the exponent q along with the functional form.

Next, we introduce the mean square displacement ⟨R²⟩. If w is any of the phase-space coordinates (i.e., S, I, E, or R) and w_ref is taken as a reference point, then ⟨R²⟩ is defined as the average of (w(t_i) − w_ref)² over the sampled trajectory. Here, t_i represents continuous time with i = 1, ..., L, and we took w_ref = 0. The ⟨R²⟩ clusters around a single value for each coexisting attractor. Furthermore, this quantity converges rapidly with L (the length of the time series) and thus provides a good measure for counting the number of coexisting attractors, in addition to providing an estimation of basin sizes.

The Lyapunov exponents {λ_i} (λ1 > λ2 > ...) are used to detect whether attractors are chaotic or nonchaotic. In particular, the largest and second-largest Lyapunov exponents λ1, λ2 have the following characteristics: for chaotic orbits, λ1 > 0; for limit cycles, λ1 = 0 and λ2, ..., λn < 0.
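The mean square displacement described above can be reconstructed as a per-sample average of squared deviations from w_ref = 0. This reading, and the toy trajectories below, are illustrative assumptions rather than the paper's actual SIR runs; they only show why the measure separates coexisting attractors:

```python
import numpy as np

def mean_square_displacement(w, w_ref=0.0):
    """<R^2> = (1/L) * sum_i (w(t_i) - w_ref)^2 over a sampled trajectory w,
    with w_ref = 0 as in the text. Trajectories settled on the same attractor
    cluster around the same <R^2> value."""
    w = np.asarray(w, dtype=float)
    return np.mean((w - w_ref) ** 2)

# Toy stand-ins for two coexisting attractors: a fixed point and an oscillation.
traj_fixed = np.full(1000, 0.25)                     # settled on a fixed point
traj_osc = 0.25 + 0.1 * (-1.0) ** np.arange(1000)    # alternating between 0.35 and 0.15
print(mean_square_displacement(traj_fixed))   # 0.0625
print(mean_square_displacement(traj_osc))     # 0.0725
```

Because the two values differ, sorting many initial conditions by their ⟨R²⟩ counts the coexisting attractors and estimates basin sizes, as the text describes.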
in the results shown in this work, we are able to ignore the zero lyapunov exponent (for flow one, le must always be zero) due to time coming explicitly in the dynamical equations eq. ( ). the evolution of the basin of attraction of snas can be studied using the distribution of lyapunov exponents in the space of initial conditions. for different initial conditions, if the largest non-zero lyapunov exponent clusters around a single value then one can say, that those initial conditions belong to the basin of attraction of the same attractor. for all the results provided below, unless otherwise stated, we set b ¼ (yr) À , g ¼ (yr) À , and r ¼ = : (yr) À , along with l ¼ : (yr) À . the disease parameter values are typical of measles infection dynamics and are taken from ref. . the dynamics of the sirs model is explored for a set of values (in yr À ) of the parameter j and are provided where applicable. we integrate the system dynamics numerically using the fixed-step fourth-order runge-kutta method with a time step size of . , including the calculations of the lyapunov exponents and other measures described in section ii c. initial conditions are provided at appropriate places in the text. first, we consider the sir model. here, the condition for infectives to grow gives a threshold for susceptibles s > =r ðtÞ, where r ðtÞ is the reproductive ratio. , numerical integration of the model eq. ( ) gives information about how s(t) and i(t) change and infection outbreak occurs with time. the dynamics of epidemic models under periodic modulation of the transmission rate (i.e., ¼ ; d ¼ ) has been a subject of extensive study. [ ] [ ] [ ] by introducing periodic modulation in the transmission rate, infection dynamics displays a repertoire of periodic to chaotic behaviours. the route to chaos is period doubling. in the periodic modulation case, there is infinite period doubling, whereas in quasiperiodic modulation finite doubling of torus before the onset of chaos. 
The organization of the dynamics in the d−ε parameter space is shown in Fig. . To obtain the parameter space, we choose for each ε the initial conditions s_ic, i_ic, and r_ic = 1 − s_ic − i_ic at d = 0, and then use the last coordinate values of the orbit obtained as the initial conditions for the next d, incremented in small steps. We calculate the largest nonzero Lyapunov exponent (LLE) of the system and colour a point red wherever the LLE is positive and white wherever it is zero. The SNA dynamics, which occur at the boundary between the regular and chaotic dynamics, are marked in green. Note that the line ε = 0 corresponds to the periodic modulation, while the nonzero-ε lines represent the quasiperiodic modulation.

Fig.  shows the coexistence of multiple attractors for the periodic modulation of the transmission rate in terms of forward (panel (a)) and backward (panel (b)) bifurcation diagrams and the corresponding Lyapunov spectra for the forward (panel (c)) and backward (panel (d)) bifurcations. When the dynamics is in the periodic regime (e.g., for small d), the frequency of the oscillations is the same as, or an integer multiple of, the forcing frequency (taken to be unity here). However, when d is increased, chaotic dynamics is reached via the period-doubling route. In the chaotic regime, unpredictable episodes of epidemic outbreaks can occur over time. The forward bifurcation diagram and the corresponding Lyapunov spectrum were obtained by choosing a set of initial conditions at d = 0 and then using the coordinates of the orbit obtained as the initial conditions for the next d, incremented in small steps. A similar procedure was adopted for the backward bifurcation, beginning at the largest value of d and moving towards d = 0 in the same small steps. This procedure of deriving the attractor types is called continuation of the attractor. It ignores multiple coexisting attractors while traversing in either direction.
However, the coexistence is revealed by comparing the attractors constructed by the forward and backward procedures. As reported in previous studies, an important feature of the periodically forced SIR system is the existence of multistability, where two different attractors may coexist. For one such parameter value, the coexisting attractors are shown in Figs. (a) and (b), along with their basins of attraction in Fig. (c). To obtain the basins, we used the largest nonzero Lyapunov exponent and the mean square displacement (Eq. ( )) as indicators of the orbits. These orbits were obtained by varying the initial conditions in the s_ic−i_ic plane, with r_ic determined by the constraint s_ic + i_ic + r_ic = 1. Clearly, the relevant initial conditions in the s_ic−i_ic plane are limited to a triangular region with vertices at (0, 0), (1, 0), and (0, 1). Multistability implies that different initial states may result in different asymptotic states even if all other parameters are kept fixed.

When the transmission rate is quasiperiodically modulated (i.e., ε ≠ 0), the simplest nonchaotic dynamics happens on a smooth torus surface. This implies that the susceptible (or infectious, etc.) fraction varies with an irrational frequency (because the forcing frequency is irrational). Consequently, it cannot be claimed that the peak in disease incidence will repeat itself periodically in time, as happens in the periodic-modulation case. On increasing the value of d, the dynamics changes from torus to chaos via a finite torus-doubling route. Strange nonchaotic attractors may be observed at the transition boundary between torus and chaos. A typical feature of an SNA is that the dynamics is free from sensitive dependence on initial conditions: two trajectories started with slightly different initial conditions synchronize on the SNA, yet the attractor has non-smooth, fractal geometry.
This implies that an asymptotically predictable pattern of epidemic outbreaks is possible under quasiperiodic forcing. The reason is that an SNA trajectory has both contracting and expanding subsets of the attractor, and for a sufficiently long trajectory the contracting set dominates, rendering the dynamics free from sensitive dependence on initial conditions. The SNAs are observed at the transition boundary between chaotic and regular motion as one varies d; see the green regions in the two-dimensional parameter space shown in Fig. .

Examples of typical bifurcation diagrams as a function of d for different values of ε are shown in Figs. (a) and (b), both containing the forward (black dots) and backward (red dots) bifurcation plots. Although bifurcation diagrams are shown for the quasiperiodic-modulation case, it is important to note that the dynamics occurs on a quasiperiodic torus and standard bifurcation theory does not apply: n-branch orbits are converted to n-strand tori T_n, etc. The corresponding Lyapunov exponent spectra are shown in Figs. (c) and (d). Orbits are distinguished from one another based on the Lyapunov exponents (see Section II C) and the phase sensitivity parameter described in Eq. ( ). Typical orbits in the torus, SNA, and chaotic regimes are shown in Figs. (a)-(c), their corresponding phase sensitivity parameters in Figs. (d)-(f), and the distributions of finite-time largest Lyapunov exponents, p(λ₁, t), in Figs. (g)-(i).

Since multistability is observed in the periodically modulated case, it is natural to check the fate of the coexisting attractors and their basins under quasiperiodic modulation. To this end, we look for multiple attractors using the mean square displacement and the largest nonzero Lyapunov exponent (see Section II C), calculated for a set of initial conditions on a fixed line in the s_ic−i_ic plane, fixing d and varying ε. We find that multistability disappears after a critical value ε_c is reached.
The critical value ε_c differs for different d. A typical example at a fixed d is shown in Fig. : the largest nonzero Lyapunov exponent in panel (a) and the mean square displacement in panel (b). Distinct clusters of the largest nonzero Lyapunov exponent or of ⟨R²⟩ indicate the coexistence of distinct attractors. Knowing the critical value of ε beyond which only one attractor survives for a given d, we choose a value ε < ε_c for which multistability is expected as d is increased from zero. We stop at a value of d at which an SNA is likely to be observed and scan for the other coexisting attractors and their relative basin sizes. We find that typically the SNA occupies the whole basin: Fig. (a) shows the SNA, Fig. (b) its phase sensitivity parameter, and Fig. (c) its basin.

The SIRS and SEIR models behave like the SIR model when κ → 0 or σ → ∞, respectively. In these limits, the analysis of the SIRS and SEIR models does not differ from that of the SIR model. Below, therefore, we present results for these two models when κ or σ is in the regime that makes them distinct from the SIR model in the d−ε space. In the SIRS model, if the immunity loss rate κ is sufficiently high, the chaotic behaviour is reduced to very narrow ranges of d in the periodically modulated case (ε = 0). When quasiperiodic modulation is switched on (ε ≠ 0), the chaotic regions are further reduced, eventually disappearing altogether above a certain ε, as shown in Figs. (a)-(d). In the periodically modulated (ε = 0) SEIR model, we find that as σ → 0 the chaotic region is reduced to narrow ranges of d before finally disappearing below a certain value of σ (see Figs. (a)-(c)). Thus, when the latent period is large, even periodic modulation of the transmission parameter in the SEIR model is not capable of generating dynamics more complicated than periodic orbits.
However, if for some σ chaotic dynamics is observed over a finite range of d, then, unlike in the SIRS model, the introduction of quasiperiodic modulation (ε ≠ 0) does not eliminate the chaotic regions in the d−ε space. The reduction of the chaotic regions in the SIRS and SEIR models may be understood from the following argument. Recall that the limits κ → 0 (recovered individuals do not become susceptible again) and σ → ∞ (the latent period tends to zero) reduce the SIRS and SEIR models, respectively, to the SIR model. Taking the opposite limit, κ → ∞, implies that individuals hardly stay in the recovered state, thereby reducing the dimension of the dynamical equations to two, with only S and I as effective variables. Similarly, σ → 0 would mean that exposed individuals essentially never progress to the infective state, once again reducing the dimension of the dynamical equations to two, with only S and E. It is known that only three- or higher-dimensional autonomous flows can display chaotic behaviour. Thus, the finding of our numerical experiments that the chaotic regions start depleting as soon as we increase κ or decrease σ is due to lower-dimensional dynamics taking over, even for κ and σ still far from those absolute limits.

Previous studies investigating the effects of periodic modulation of the transmission rate showed the coexistence of multiple attractors in the dynamics of the SIR family of epidemic models. They helped in understanding important implications of seasonality for the transmission ecology, dynamics, and control (e.g., by vaccination) of infectious diseases. However, these studies ignored the impact of variability in the external driving signals, and that of internal factors (e.g., immune responses) as a source of nonlinearity in the disease dynamics itself, on the infection dynamics and the coexistence of multiple attractors. This exclusion is increasingly surfacing as a significant element to be considered in disease modelling (see the arguments put forth in Refs. [ ]),
as we progressively gather empirical evidence that the external driving signals (i.e., rainfall, temperature, etc.) may indeed vary from year to year, so that their impact may not be captured by a sinusoidal wave function. In this paper, we studied the effects of quasiperiodic modulation of the transmission rate in the SIR model and its allied models, the SIRS and SEIR. In the physical and engineering sciences there have been extensive studies of forced oscillator systems using different forcing functions, including quasiperiodic ones, and this study is, as far as we are aware, a first attempt to apply and investigate the effect of quasiperiodic forcing of the transmission terms in epidemic models. The addition of a quasiperiodic element to the temporal modulation of the transmission term gradually (as ε increases) annihilates multistability, leaving behind only one attractor for each parameter set. Additionally, new dynamical states, the SNAs, are created, which make the dynamics of epidemics asymptotically predictable although they have non-smooth geometry (chaotic states too have non-smooth geometry); in the SNA states there can, however, be unpredictability in the outbreaks of disease over finite times. The coexistence of multiple attractors in the periodically modulated case, such as chaos and periodic orbits, is believed to provide an explanation for why the observed trajectories of the incidence of childhood diseases (e.g., measles) in the post-vaccination era differ from those dominant in the pre-vaccination era. The coexistence of multiple attractors (such as chaotic and smooth-torus states) is still possible, as in the periodically forced SIR models, albeit for smaller values of ε. However, when the disease dynamics is dominated by SNA-type trajectories, we find that the SNA occupies the whole basin and no other states coexist. The existence of SNAs provides an additional available state in which the dynamics is asymptotically predictable.
Since the SNAs created in epidemic models under quasiperiodic forcing are not sensitively dependent on initial conditions (and occupy the whole basin), control efforts such as vaccination may not be able to alter the predictability of disease incidence. Periodically forced SIR dynamics were hypothesized to be able to switch between and settle on different attractors in the presence of noise; we hypothesize that such noise-induced switching of trajectories may not be apparent in quasiperiodically forced epidemic models, owing to the presence of SNAs. In addition, the SIR models considered here are simple in comparison with models of vector-borne or environment-mediated infectious diseases (such as malaria and cholera, respectively). Year-to-year variation in external forcing factors is likely to introduce temporal heterogeneity in the persistence and density of disease vectors or causative agents (e.g., Vibrio cholerae) in the environment, and this effect will be more pronounced in regions of marginal environmental conditions. Therefore, the dynamics of more complex disease models under quasiperiodic forcing of the transmission rate will be highly relevant, and we plan to investigate these questions in the future. Regarding the problem of persistence in metapopulations, it is known that seasonal forcing of transmission has the tendency to synchronize epidemics among sub-populations. In uncoupled metapopulations, however, chaotic dynamics does not synchronize, although the strange nonchaotic attractors that are created when the transmission rate is quasiperiodically modulated can synchronize. An investigation of metapopulations with quasiperiodically modulated transmission will form the basis of our future work.
References: Infectious Diseases of Humans: Dynamics and Control; Modelling Infectious Diseases in Humans and Animals; Handbook of Chaos Control; Strange Nonchaotic Attractors; Chaos in Dynamical Systems; Chaos: An Introduction to Dynamical Systems; Numerical Recipes in C.

Note on the mean square displacement: consider a pair of two-dimensional systems whose component-wise mean square displacements are interchanged (the ⟨R²⟩ of the x-coordinate of one system equals that of the y-coordinate of the other, and vice versa). The vector z = [x, y]^T then has the same total mean square displacement for both systems, rendering them indistinguishable by this measure alone.

authors: brockmann, dirk title: human mobility, networks and disease dynamics on a global scale journal: diffusive spreading in nature, technology and society cord_uid: z a eoyo

Abstract: Disease dynamics is a complex phenomenon, and in order to address the questions it poses, expertise from many disciplines needs to be integrated. One method that has become particularly important during the past few years is the development of computational models and computer simulations. The focus of this chapter is on emergent infectious diseases that bear the potential of spreading across the globe, exemplifying how connectivity in a globalized world has changed the way human-mediated processes evolve in the 21st century. The examples of successful predictions of disease dynamics given in the chapter illustrate that merely feeding better and faster computers with more and more data may not necessarily help in understanding the relevant phenomena; it can be much more useful to change the conventional way of looking at the patterns and to adopt a correspondingly modified viewpoint.

In early 2009, news accumulated in major media outlets about a novel strain of influenza circulating in major cities in Mexico [ ].
This novel H1N1 strain was quickly termed "swine flu", in reference to its alleged origin in pig populations before jumping the species barrier to humans. Very quickly, public health institutions were alerted and saw the risk of this local influenza epidemic becoming a major public health problem globally. The concerns were serious because this influenza strain was of the H1N1 subtype, the same virus family that caused one of the biggest pandemics in history, the Spanish flu, which killed tens of millions of people at the beginning of the 20th century [ ]. The swine flu epidemic did indeed develop into a pandemic, spreading across the globe in a matter of months. Luckily, the strain turned out to be comparatively mild in terms of symptoms and as a health hazard. Nevertheless, the concept of emergent infectious diseases, novel diseases that may have dramatic public-health, societal, and economic consequences, reached a new level of public awareness; even Hollywood picked up the topic in a number of blockbuster movies in the following years [ ]. Only a few years later, MERS, the Middle East respiratory syndrome, hit the news: a new type of virus that infected people in the Middle East [ ]. MERS was caused by a new species of coronavirus, of the same family of viruses that the SARS virus belonged to. And finally there was the Ebola crisis in the West African countries Liberia, Sierra Leone, and Guinea, which, although it did not develop into a global crisis, killed more than ten thousand people in West Africa [ ]. Emergent infectious diseases have always been part of human societies, and of animal populations for that matter [ ]. Humanity, however, underwent major changes along many dimensions during the last century. The world population has increased from roughly 1.6 billion at the beginning of the 20th century to more than 7 billion today [ ].
The majority of people now live in cities, including so-called mega-cities, large-scale urban conglomerations of more than 10 million inhabitants living at high population densities [ ], often in close contact with animals, pigs and fowl in particular, especially in Asia. These conditions amplify not only the transmission of novel pathogens from animal populations to humans; high-frequency human-to-human contacts also yield the potential for rapid outbreaks of new pathogens. Population density, however, is only one side of the coin. In addition to increasing face-to-face contacts within populations, we also witness a change in global connectivity [ ]. Most large cities are connected by an intricate, multi-scale web of transportation links; see Fig. . On a global scale, worldwide air transportation dominates this connectivity: a few thousand airports and tens of thousands of direct connections span the globe, and more than three billion passengers travel on this network each year. Every day, the passengers travelling on this network accumulate a total distance of billions of kilometres, about three times the radius of our solar system [ , ]. Clearly this amount of global traffic shapes the way emergent infectious diseases can spread across the globe. One of the key challenges in epidemiology is preparing for eventual outbreaks and designing effective control measures. Evidence-based control measures, however, require a good understanding of the fundamental features and characteristics of spreading behaviour that all emergent infectious diseases share. In this context this means addressing questions such as the following.

(Fig. caption: The global air-transportation network. Each node represents an airport, each link a direct connection between airports. More than three billion passengers travel on this network each year.
All in all, every day billions of kilometres, three times the radius of our solar system, are traversed on this network.)

If there is an outbreak at a location X, when should one expect the first case at a distant location Y? How many cases should one expect there? Given a local outbreak, what is the risk that a case will be imported into some distant country, and how does this risk change over time? Also, emergent infectious diseases often spread in a covert fashion during the onset of an epidemic. Only after a certain number of cases are reported are public-health scientists, epidemiologists, and other professionals confronted with cases scattered across a map, and it is then difficult to determine the actual outbreak origin. Therefore, a key question is also: where is the geographic epicenter of an ongoing epidemic? Disease dynamics is a complex phenomenon, and in order to address these questions, expertise from many disciplines needs to be integrated: in this context epidemiology, spatial statistics, mobility research, and medical research. One method that has become particularly important during the past few years is the development of computational models and computer simulations that help address these questions. These are often derived and developed using techniques from theoretical physics and, more recently, complex network science.

Modeling the dynamics of diseases using methods from mathematics and dynamical systems theory has a long history. In 1927, Kermack and McKendrick [ ] introduced and analyzed the "susceptible-infected-recovered" (SIR) model, a parsimonious model for the description of a large class of infectious diseases that is still in use today [ ]. The SIR model considers a host population in which individuals can be susceptible (S), infectious (I), or recovered (R). Susceptible individuals can acquire the disease, become infectious themselves, and transmit the disease to other susceptible individuals.
After an infectious period, individuals recover, acquire immunity, and no longer infect others. The SIR model is an abstract model that reduces a real-world situation to the basic dynamical ingredients that are believed to shape the time course of a typical epidemic. Structurally, the SIR model treats individuals in a population in much the same way as chemicals that react in a well-mixed container. Chemical reactions between reactants occur at rates that depend on which chemicals are involved; it is assumed that all individuals are represented only by their infectious state and are otherwise identical, and that each pair of individuals has the same likelihood of interacting. Schematically, the SIR model is described by the following reactions:

S + I → 2I (rate α),  I → R (rate β),

where α and β are the transmission and recovery rates per individual, respectively. The expected duration of being infected, the infectious period, is given by T = β⁻¹, which can range from a few days to a few weeks for generic diseases. The ratio of the rates, R₀ = α/β, is known as the basic reproduction ratio, i.e., the expected number of secondary infections caused by a single infected individual in a fully susceptible population. R₀ is the most important epidemiological parameter because its value determines whether an infectious disease has the potential for causing an epidemic or not. When R₀ > 1, a small fraction of infected individuals in a susceptible population will cause exponential growth of the number of infections. This epidemic rise continues until the supply of susceptibles decreases to a level at which the epidemic can no longer be sustained: the increase in recovered, and thus immune, individuals dilutes the population, and the epidemic dies out. Mathematically, one can translate the reaction scheme into a set of ordinary differential equations. Say the population has N ≫ 1 individuals.
For a small time interval Δt and a chosen susceptible individual, the probability of that individual interacting with an infected one is proportional to the fraction I/N of infected individuals. Because we have S susceptibles, the expected change in the number of susceptibles due to infection is

ΔS = −Δt × α × S × I/N,

where the rate α is the same as in the reaction scheme and the negative sign accounts for the fact that the number of susceptibles decreases. Likewise, the number of infected individuals is increased by the same amount, ΔI = +Δt × α × S × I/N. The number of infecteds can also decrease due to the second reaction: because each infected individual can spontaneously recover, the expected change due to recovery is

ΔI = −Δt × β × I.

Based on these assumptions, the expected changes become a set of differential equations that describe the dynamics of the SIR model in the limit Δt → 0:

dS/dt = −α S I/N,  dI/dt = α S I/N − β I,  dR/dt = β I.

Depending on the magnitude of N, one may instead consider a model in which the reactions occur randomly at the given rates; such a stochastic system generally exhibits solutions that fluctuate around the solutions of the deterministic system above. Both the deterministic SIR model and the more general stochastic particle-kinetic model are designed to model disease dynamics in a single population; spatial dynamics or movement patterns of the host population are not accounted for. These systems are thus known as well-mixed systems, the analogy being that of chemical reactants well-stirred in a reaction container, as mentioned above. When a spatial component is expected to be important in a natural scenario, several methodological approaches exist to account for space. Essentially, the inclusion of a spatial component is required when the host is mobile and can transport the state of infection from one location to another. The combination of local proliferation of an infection and dispersal of infected host individuals then yields spread along the spatial dimension [ , ].
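The threshold role of R₀ described above can be checked numerically. The following is a minimal sketch, not code from the chapter: it integrates the well-mixed SIR equations (in population fractions, so N drops out) with a simple forward-Euler scheme, using illustrative rate values, and compares a supercritical run (R₀ = 3) with a subcritical one (R₀ = 0.5).

```python
def simulate_sir(alpha, beta, s0=0.999, i0=0.001, dt=0.01, steps=10000):
    """Forward-Euler integration of ds/dt = -alpha*s*i, di/dt = alpha*s*i - beta*i.

    Returns the final susceptible fraction, final infectious fraction,
    and the peak infectious fraction reached during the run.
    """
    s, i = s0, i0
    peak = i
    for _ in range(steps):
        new_inf = alpha * s * i * dt   # newly infected in this step
        rec = beta * i * dt            # newly recovered in this step
        s, i = s - new_inf, i + new_inf - rec
        peak = max(peak, i)
    return s, i, peak

# R0 = alpha/beta = 3 > 1: the seed grows into an epidemic that burns out
# once the supply of susceptibles is depleted.
s_end, i_end, peak = simulate_sir(alpha=0.6, beta=0.2)

# R0 = 0.5 < 1: the initial seed simply decays away.
s_sub, i_sub, peak_sub = simulate_sir(alpha=0.1, beta=0.2)
```

In the supercritical run the outbreak peaks well above the seed and leaves only a small residual susceptible fraction, while in the subcritical run the infectious fraction never exceeds its initial value, which is exactly the threshold behaviour at R₀ = 1.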
One of the most basic ways of incorporating a spatial dimension and host dispersal is to assume that all quantities in the SIR model are also functions of a location x, so that the state of the system is defined by s(x, t), j(x, t), and r(x, t); most frequently, two spatial dimensions are considered. The simplest way of incorporating dispersal is by an ansatz following Eq. ( . ) in Chap. , which assumes that individuals move diffusively in space. This yields the reaction-diffusion dynamical system

∂s/∂t = −α s j + D ∇²s,
∂j/∂t = α s j − β j + D ∇²j,
∂r/∂t = β j + D ∇²r,

where, e.g., in a two-dimensional system with x = (x, y), the Laplacian is ∇² = ∂²/∂x² + ∂²/∂y², and the parameter D is the diffusion coefficient. The reasoning behind this approach is that the net flux of individuals of one type from one location to a neighbouring location is proportional to the gradient, i.e., the difference in the concentration of that type of individual between neighbouring locations. The key feature of diffusive dispersal is that it is local: in a discretized version, the Laplacian permits movements only within a limited distance. In reaction-diffusion systems of this type, the combination of initial exponential growth (if R₀ = α/β > 1) and diffusion (D > 0) yields the emergence of an epidemic wavefront that progresses at a constant speed if the system is initially seeded with a small patch of infected individuals [ ]. The advantage of parsimonious models like the one above is that properties of the emergent epidemic wavefront can be computed analytically; e.g., the speed of the wave in the above system is

v = 2 √(D β (R₀ − 1)),

in which we recognize the relation of Eq. ( . ). Another class of models considers the reactions of the SIR scheme on two-dimensional (mostly square) lattices. In these models, each lattice site is in one of the states S, I, or R, and reactions occur only between nearest neighbours on the lattice. These models account for both stochasticity and spatial extent.
Given a state of the system, defined by the state of each lattice site, and a small time interval Δt, infected sites can transmit the disease to neighbouring susceptible sites with probability rate α; infected sites also recover to the R state and become immune with probability βΔt. Figure (a) illustrates the time course of the lattice SIR model: seeded with a localized patch of infected sites, the system exhibits an asymptotically concentric wavefront that progresses at an overall constant speed if the ratio of transmission and recovery rates is sufficiently large. Apart from the stochastic effects that roughen the interface at the infection front, this system exhibits properties similar to those of the reaction-diffusion system. In both systems, transmission of the disease in space is spatially restricted per unit time.

(Fig. caption: The system in panel (b) is identical to the system depicted in panel (a); however, in addition to the generic next-neighbour transmission, a transmission to a distant site can occur with a small but significant probability, which decreases with distance as an inverse power law. Because of the rare but significant occurrence of long-range transmissions, a more complex pattern emerges: the concentric nature observed in system (a) is gone, and a fractal, multiscale pattern emerges instead.)

The stochastic lattice model is particularly useful for investigating the impact of permitting long-distance transmissions. Figure (b) depicts temporal snapshots of a simulation that is identical to the system of Fig. (a) apart from one small but significant difference: in addition to transmitting the disease to neighbouring susceptible lattice sites, every now and then, with a small probability, infected sites can also transmit the disease over long distances.
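A stochastic lattice SIR step of the kind described above can be sketched as follows. This is an illustrative simplification, not the chapter's simulation code: in particular, the long-range target site is drawn uniformly at random rather than from the inverse-power-law kernel used in the figure, and all probabilities are made-up example values.

```python
import random

def lattice_sir_step(grid, p_local, p_long, p_rec, rng):
    """One synchronous update of a stochastic lattice SIR model.

    States: 0 = S, 1 = I, 2 = R, on an n x n grid with periodic boundaries.
    Each infected site infects each susceptible 4-neighbour with prob p_local,
    additionally infects one uniformly random site with prob p_long (a crude
    stand-in for a power-law long-range kernel), then recovers with prob p_rec.
    """
    n = len(grid)
    new_infections = []
    recoveries = []
    for x in range(n):
        for y in range(n):
            if grid[x][y] != 1:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                u, v = (x + dx) % n, (y + dy) % n
                if grid[u][v] == 0 and rng.random() < p_local:
                    new_infections.append((u, v))
            if rng.random() < p_long:
                u, v = rng.randrange(n), rng.randrange(n)
                if grid[u][v] == 0:
                    new_infections.append((u, v))
            if rng.random() < p_rec:
                recoveries.append((x, y))
    for u, v in new_infections:
        grid[u][v] = 1
    for x, y in recoveries:
        grid[x][y] = 2
    return grid

# Seed a single infected site at the centre and run a few dozen steps.
n = 30
grid = [[0] * n for _ in range(n)]
grid[n // 2][n // 2] = 1
rng = random.Random(7)
for _ in range(40):
    lattice_sir_step(grid, p_local=0.3, p_long=0.02, p_rec=0.2, rng=rng)
```

With p_long = 0 the infected region grows as a roughened concentric disc; with a small nonzero p_long, secondary seeds appear far from the front and the concentric geometry is destroyed, which is the qualitative effect discussed in the text.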
(Fig. caption: Geographic distance to the initial outbreak location is no longer a good predictor of arrival time, unlike in systems with local or spatially limited host mobility.)

They infect randomly chosen lattice sites anywhere in the system, with a propensity for infecting a site at distance r that decreases as an inverse power law, as explained in the caption to Fig. . The possibility of transmitting to distant locations yields new epidemic seeds far away that subsequently turn into new outbreak waves, which in turn seed second-, third-, etc. generation outbreaks, even if the overall rate at which long-distance transmissions occur is very small. The consequence of this is that the spatially coherent, concentric pattern observed in the reaction-diffusion system is lost, and a complex, spatially incoherent, fractal pattern emerges [ ]. Practically, this implies that the distance from the initial outbreak location can no longer be used as a measure for estimating or computing the time it takes for an epidemic to arrive at a given location. Also, given a snapshot of a spreading pattern, it is much more difficult to reconstruct the outbreak location from the geometry of the pattern alone, unlike in the concentric case, where the outbreak location is typically near the center of mass of the pattern.

A visual inspection of the air-transportation system depicted in Fig.  is sufficiently convincing that the significant fraction of long-range connections in global mobility will not only increase the speed at which infectious diseases spread but, more importantly, also cause the patterns of spread to exhibit high spatial incoherence and complexity, caused by the intricate connectivity of the air-transportation network. As a consequence, we can no longer use geographic distance to an emergent epidemic's epicenter as an indicator or measure of "how far away" that epicenter is and how long the disease will take to travel to a given location on the globe.
This type of decorrelation is shown in Fig.  for two examples: the SARS epidemic and the influenza H1N1 pandemic. At a spatial resolution of countries, the figure depicts scatter plots of the epidemic arrival time as a function of the geodesic distance (the shortest distance on the surface of the Earth) from the initial outbreak location. As expected, the correlation between distance and arrival time is weak. Given that models based on local or spatially limited mobility are inadequate, improved models must be developed that account for both the strong heterogeneity in population density (human populations accumulate in cities that vary substantially in size) and the connectivity structure between them provided by data on air traffic. In a sense, one needs to establish a model in which the entire population is a so-called metapopulation: a system of subpopulations n = 1, …, M, each of size N_n, with traffic between them specified by a matrix F_nm that quantifies the number of host individuals travelling from population m to population n in a given unit of time [ , ]. For example, N_n could correspond to the size of city n and F_nm to the number of passengers travelling by air from m to n. One of the earliest and most employed models for disease dynamics using the metapopulation approach is a generalization of Eq. ( . ), in which each population's dynamics is governed by the ordinary SIR model,

dS_n/dt = −α S_n I_n/N_n,
dI_n/dt = α S_n I_n/N_n − β I_n,
dR_n/dt = β I_n,

where the size N_n = S_n + I_n + R_n of population n is a parameter. In addition to this, the exchange of individuals between populations is modeled in such a way that hosts of each class move from location m to location n with a probability rate ω_nm, which yields

dS_n/dt = −α S_n I_n/N_n + Σ_m (ω_nm S_m − ω_mn S_n),
dI_n/dt = α S_n I_n/N_n − β I_n + Σ_m (ω_nm I_m − ω_mn I_n),
dR_n/dt = β I_n + Σ_m (ω_nm R_m − ω_mn R_n),

which is a generic metapopulation SIR model. In principle, one is required to fix the infection-related parameters α and β, the population sizes N_n, and the mobility rates ω_nm, i.e.,
the number of transitions from m to n per unit time. however, based on very plausible assumptions [ ] , the system can be simplified in such a way that all parameters can be gauged against data that is readily available, e.g. the actual passenger flux f_nm (the number of passengers that travel from m to n per day) that defines the air-transportation network, without having to specify the absolute population sizes n_n. first, the general rates ω_nm have to fulfill the condition ω_nm n_m = ω_mn n_n if we assume that the n_n remain constant. if we assume, additionally, a plausible scaling of the total air traffic flowing out of a population n, the system can be rewritten in terms of dynamic variables that are, again, fractions of the population in each class: s_n = s_n/n_n, j_n = i_n/n_n, and r_n = r_n/n_n. in this system the new matrix p_mn and the new rate parameter ω can be directly computed from the traffic matrix f_nm and the total population involved n = ∑_m n_m, where f = ∑_n,m f_mn is the total traffic in the network. the matrix p_nm is therefore the fraction of passengers that are leaving node m with destination n. because passengers must arrive somewhere we have ∑_n p_nm = 1. an important first question concerns the different time scales, i.e. the parameters β, γ and ω that appear in system ( . ) . the inverse γ⁻¹ = t is the infectious period, that is the time individuals remain infectious. if we assume t ≈ - days and r_0 = β/γ ≈ , both rates are of the same order of magnitude. how about ω? the total number of passengers f is approximately × per day. if we assume that n ≈ × people, we find that it is instructive to consider the inverse t_travel = ω⁻¹ ≈ days. on average a typical person boards a plane every - years or so. keep in mind though that this is an average that accounts for both a small fraction of the population with a high frequency of flying and a large fraction that almost never boards a plane.
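the reduced metapopulation dynamics described above can be sketched numerically. the snippet below (illustrative, not code from the original work) integrates per-population sir fractions coupled by the flux fractions p_nm computed from a toy traffic matrix; the diffusive coupling ω ∑_m p_mn (x_m − x_n) follows the reduction described above, while the value of ω, the rates, and the traffic matrix are assumptions for the sketch.

```python
import numpy as np

def metapop_sir(F, beta, gamma, s0, j0, dt=0.05, steps=4000):
    """Euler integration of a metapopulation SIR model (a sketch of eq. ( . )).

    F[n, m] = passengers per day from m to n; s, j are per-population fractions
    of susceptible and infectious hosts."""
    F = np.asarray(F, dtype=float)
    # P[n, m]: fraction of passengers leaving m that arrive at n (columns sum to 1)
    P = F / F.sum(axis=0, keepdims=True)
    omega = 0.01  # assumed global mobility rate, a few orders below beta and gamma
    s, j = np.array(s0, dtype=float), np.array(j0, dtype=float)
    traj = [j.copy()]
    for _ in range(steps):
        # diffusive coupling: for node n, omega * sum_m P[m, n] * (x_m - x_n);
        # columns of P sum to one, so (P.T @ x)_n is a convex mix of the x_m
        mob_s = omega * (P.T @ s - s)
        mob_j = omega * (P.T @ j - j)
        ds = -beta * s * j + mob_s
        dj = beta * s * j - gamma * j + mob_j
        s, j = s + dt * ds, j + dt * dj
        traj.append(j.copy())
    return np.array(traj)
```

seeding one population with a small infectious fraction and letting the traffic coupling export the outbreak reproduces the qualitative behaviour discussed in the text: every connected population eventually runs through its own epidemic wave.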
the overall mobility rate ω is thus a few orders of magnitude smaller than the rates related to transmissions and recoveries. this has important consequences for being able to replace the full dynamic model by a simpler model discussed below. figure . depicts a numerical solution of the model defined by eq. ( . ) for a set of initial outbreak locations. at each location a small seed of infected individuals initializes the epidemic. global aspects of an epidemic can be assessed by the total fraction of infected individuals j_g(t) = ∑_n c_n j_n(t), where c_n is the relative size of population n with respect to the entire population size n. as expected, the time course of a global epidemic in terms of the epicurve and duration depends substantially on the initial outbreak location. a more important aspect is the spatiotemporal pattern generated by the model. figure . depicts temporal snapshots of simulations initialized in london and chicago, respectively. analogous to the qualitative patterns observed in fig. . b, we see that the presence of long-range connections in the worldwide air-transportation network yields incoherent spatial patterns, much unlike the regular, concentric wavefronts observed in systems without long-range mobility. figure . shows that the model epidemic, too, exhibits only a weak correlation between geographic distance to the outbreak location and arrival time. for a fixed geographic distance, arrival times at different airports can vary substantially, and thus the traditional geographic distance is useless as a predictor. the system defined by eq. ( . ) is one of the most parsimonious models that accounts for strongly heterogeneous population distributions coupled by traffic flux between them, and it can be gauged against actual population size distributions and traffic data. surprisingly, despite its structural simplicity, this type of model has been quite successful in accounting for actual spatial spreads of past epi- and pandemics [ ] .
based on early models of this type and aided by the exponential increase of computational power, very sophisticated models have been developed that account for factors ignored by the deterministic metapopulation sir model. in the most sophisticated approaches, e.g. gleam [ ] , the global epidemic and mobility computational tool, not only traffic by air but other means of transportation are considered, more complex infectious dynamics is modeled, and, in hybrid dynamical systems, stochastic effects caused by random reactions and mobility events are taken into account. household structure, available hospital beds, and seasonality have been incorporated, as well as disease-specific features, all in order to make predictions more and more precise. the philosophy of this line of research heavily relies on the increasing advancement of both computational power and more accurate and pervasive data, often collected in natural experiments and via web-based techniques [ ] [ ] [ ] [ ] [ ] . despite the success of these quantitative approaches, this strategy bears a number of problems, some of which are fundamental. first, with increasing computational power it has become possible to implement extremely complex dynamical systems with decreasing effort, and also without substantial knowledge of the dynamical properties that nonlinear dynamical systems can often possess. when a lot of dynamical detail is implemented, it is difficult to identify which factors are essential for an observed phenomenon and which factors are marginal. because of the complexity that is often incorporated at the very beginning of the design of a sophisticated model, in combination with the lack of data, modelers often have to make assumptions about the numerical values of parameters that are required for running a computer simulation [ ] . generically, many dozens of unknown parameters exist for which plausible but often not evidence-based values have to be assumed.
because complex computational models, especially those that account for stochasticity, have to be run multiple times in order to make statistical assessments, systematic parameter scans are impossible even with the most sophisticated supercomputers. finally, all dynamical models, irrespective of their complexity, require two ingredients to be numerically integrated: (1) fixed values for parameters and (2) initial conditions. although some computational models have been quite successful in describing and reproducing the spreading behavior of past epidemics in situations where disease-specific parameters and outbreak locations have been assessed, they are difficult to apply in situations when novel pathogens emerge. in these situations, when computational models are needed most from a practical point of view, little is known about these parameters, and running even the most sophisticated models "in the dark" is problematic. the same is true for fixing the right initial conditions. in many cases, an emergent infectious disease initially spreads unnoticed, and the public becomes aware of a new event only after numerous cases occur in clusters at different locations. reconstructing the correct initial condition often takes time, more time than is usually available for making accurate and valuable predictions that can be used by public health workers and policy makers to devise containment strategies. given the issues discussed above one can ask if alternative approaches exist that can inform about the spread without having to rely on the most sophisticated, highly detailed computer models. in this context one may ask whether the complexity of the observed patterns that are solutions to models like the sir metapopulation model of eq. ( .
) are genuinely complex because of the underlying complexity of the mobility network that intricately spans the globe, or whether a simple pattern really underlies the dynamics, masked by this complexity and by our traditional ways of using conventional maps for displaying dynamical features and of thinking in terms of geographic distances. in a recent approach brockmann and helbing [ ] developed the idea of replacing the traditional geographic distance by the notion of an effective distance derived from the topological structure of the global air-transportation network. in essence the idea is very simple: if two locations in the air-transportation network exchange a large number of passengers, they should be effectively close, because a larger number of passengers implies that the probability of an infectious disease being transmitted from a to b is comparatively larger than if these two locations were coupled only by a small number of traveling passengers. effective distance should therefore decrease with traffic flux. what is the appropriate mathematical relation and a plausible ansatz to relate traffic flux to effective distance? to answer this question one can go back to the metapopulation sir model, i.e. eq. ( . ) . dispersal in this equation is governed by the flux fraction p_nm. recall that this quantity is the fraction of all passengers that leave node m and arrive at node n. therefore p_nm can be operationally defined as the probability of a randomly chosen passenger departing node m arriving at node n. if, in a thought experiment, we assume that the randomly selected person is infectious, p_nm is proportional to the probability of transmitting a disease from airport m to airport n. we can now make the following ansatz for the effective distance:

d_nm = d_0 − log p_nm, ( . )

where d_0 ≥ 0 is a non-negative constant to be specified later.
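the one-leg effective distance is straightforward to compute from a traffic matrix. a minimal sketch (the helper name and toy matrix are illustrative; the flux fractions are the column-normalized traffic as defined above):

```python
import numpy as np

def effective_distance(F, d0=1.0):
    """One-leg effective distances d_nm = d0 - log(p_nm), where p_nm is the
    fraction of passengers leaving m that arrive at n (columns of P sum to 1)."""
    F = np.asarray(F, dtype=float)
    P = F / F.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        D = d0 - np.log(P)  # absent links (p_nm = 0) get infinite distance
    return D
```

if all traffic leaving m flows to a single neighbour n, then p_nm = 1 and d_nm = d_0, the smallest possible value; the hub/small-airport asymmetry discussed below falls out of the same formula.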
this definition of effective distance implies that if all traffic from m arrives at n, and thus p_nm = 1, the effective distance is d_nm = d_0, which is the smallest possible value. if, on the other hand, p_nm becomes very small, d_nm becomes larger, as required. the definition ( . ) applies to nodes m and n that are connected by a link in the network. what about pairs of nodes that are not directly connected but only by paths that require intermediate steps? given two arbitrary nodes, an origin m and a destination n, an infinite number of paths (sequences of steps) exist that connect the two nodes. we can define the shortest effective route as the one for which the accumulation of effective distances along the legs is minimal. so for any path we sum the effective distances along the legs according to eq. ( . ), adding up to an effective distance d_nm. this approach also explains the use of the logarithm in the definition of effective distance: adding effective distances along a route corresponds to multiplying the probabilities p_nm along the involved steps. therefore the shortest effective distance d_nm is equivalent to the most probable path that connects origin and destination. the parameter d_0 is a free parameter in the definition and quantifies the influence of the number of steps involved in a path. typically it is chosen to be either 0 or 1, depending on the application. one important property of effective distance is its asymmetry. generally we have d_nm ≠ d_mn. this may seem surprising at first sight, yet it is plausible. consider for example two airports a and b. let's assume a is a large hub that is strongly connected to many other airports in the network, including b. airport b, however, is only a small airport with a single connection, leading to a. the effective distance b → a is much smaller (equal to d_0) than the effective distance from the hub a to the small airport b.
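path effective distances can then be obtained with a standard shortest-path algorithm over the one-leg distances; minimising the summed distance is the same as maximising the product of the flux fractions along the route, i.e. finding the most probable path. a sketch using dijkstra's algorithm (the matrix convention D[n][m] for the leg m → n is an assumption of this sketch):

```python
import heapq
import math

def shortest_effective_paths(D, source):
    """Dijkstra over the one-leg effective-distance matrix D, where D[n][m]
    is the distance of the step m -> n. Returns (dist, parent); the parent
    pointers encode the shortest path tree of `source`."""
    M = len(D)
    dist = [math.inf] * M
    parent = [None] * M
    dist[source] = 0.0
    pq = [(0.0, source)]
    while pq:
        d, m = heapq.heappop(pq)
        if d > dist[m]:
            continue  # stale queue entry
        for n in range(M):
            leg = D[n][m]  # effective distance of the step m -> n
            if d + leg < dist[n]:
                dist[n] = d + leg
                parent[n] = m
                heapq.heappush(pq, (d + leg, n))
    return dist, parent
```

running this from a chosen reference airport yields both its effective distances to all other nodes and its shortest path tree, i.e. the most probable spreading routes out of that node.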
this accounts for the fact that, again in a thought experiment, a randomly chosen passenger at airport b is most definitely going to a, whereas a randomly chosen passenger at the hub a arrives at b only with a small probability. given the definition of effective distance one can compute the shortest effective paths to every other node from a chosen and fixed reference location. each airport m thus has a set of shortest paths p_m that connect m to all other airports. this set forms the shortest path tree t_m of airport m. together with the effective distance matrix d_nm the tree defines the perspective of node m. this is illustrated qualitatively in fig. . , which depicts a planar random triangular weighted network. one can now employ these principles and compute the shortest path trees and effective distances from the perspective of actual airports in the worldwide air-transportation network based on actual traffic data, i.e. the flux matrix f_nm. figure . depicts the shortest path tree of one of the berlin airports (tegel, txl). the radial distance of all the other airports in the network is proportional to their effective distance from txl. one can see that large european hubs are effectively close to txl, as expected. however, large asian and american airports are also effectively close to txl. for example the airports of chicago (ord), beijing (pek), miami (mia) and new york (jfk) are comparatively close to txl. we can also see that, from the perspective of txl, germany's largest airport fra serves as a gateway to a considerable fraction of the rest of the world. because the shortest path tree also represents the most probable spreading routes, one can use this method to identify airports that are particularly important in terms of distributing an infectious disease throughout the network.
the shortest path trees are also those paths that correspond to the most probable paths of a random walker that starts at the reference location and terminates at the respective target node. the use of effective distance, representing the air-transportation network from the perspective of chosen reference nodes with a more plausible notion of distance that better reflects how strongly different locations are coupled in a networked system, is helpful for "looking at" the world. yet, this representation is more than a merely intuitive and plausible spatial representation. what are the dynamic consequences of effective distance? the true advantage of effective distance is illustrated in fig. . . this figure depicts the identical computer-simulated hypothetical pandemics as fig. . . unlike the latter, which is based on the traditional geographic representation, fig. . employs the effective distance and shortest path tree representation from the perspective of the outbreak location, as discussed above. using this method, the spatially incoherent patterns in the traditional representation are transformed into concentric spreading patterns, similar to those expected for simple reaction diffusion systems. this shows that the complexity of observed spreading patterns is actually equivalent to simple spreading patterns that are merely convoluted and masked by the underlying network's complexity. this has important consequences. because only the topological features of the network are used for computing the effective distance, and no dynamic features are required, the concentricity of the emergent patterns is a generic feature and independent of the dynamical properties of the underlying model. it also means that in effective distance, contagion processes spread at a constant speed, and just like in the simple reaction diffusion model one can much better predict the arrival time of an epidemic wavefront, knowing the speed and the effective distance.
for example, if shortly after an epidemic outbreak the spreading commences and the initial spreading speed is assessed, one can forecast arrival times without having to run computationally expensive simulations. even if the spreading speed is unknown, effective distance, which is independent of dynamics, can inform about the sequence of arrival times, or relative arrival times. the benefit of the effective distance approach can also be seen in fig. . , in which arrival times of the sars epidemic and the h n pandemic in affected countries are shown as a function of effective distance to the outbreak origin. comparing this figure to fig. . we see that effective distance is a much better predictor of arrival time; a clear linear relationship exists between effective distance and arrival time.

(figure caption: shortest path trees and effective distance from the perspective of airport tegel (txl) in berlin. txl is the central node. radial distance in the tree quantifies the effective distance to the reference node txl. as expected, large european hubs like frankfurt (fra), munich (muc) and london heathrow (lhr) are effectively close to txl. however, hubs that are geographically distant, such as chicago (ord) and beijing (pek), are also effectively closer than smaller european airports. note also that the tree structure indicates that fra is a gateway to a large fraction of other airports, as reflected by the size of the tree branch at fra. the illustration is a screenshot of an interactive effective distance tool available online [ ] .)

(figure caption: simulations and effective distance. the panels depict the same temporal snapshots of computer-simulated hypothetical pandemic scenarios as in fig. . . the top row corresponds to a pandemic initially seeded at lhr (london), the bottom row at ord (chicago). the networks depict the shortest path tree effective distance representation of the corresponding seed airports as in fig. . .)
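the linear relationship between effective distance and arrival time suggests a simple forecasting recipe: estimate the effective speed from a few early arrivals and extrapolate to not-yet-affected locations. a sketch (function names and the linear model t ≈ t_0 + d_eff / v_eff are illustrative):

```python
import numpy as np

def fit_arrival_predictor(d_eff_known, t_known):
    """Least-squares fit of t ~ t0 + d_eff / v_eff to early arrivals.
    Returns (t0, v_eff), the offset and the effective spreading speed."""
    A = np.column_stack([np.ones(len(d_eff_known)), d_eff_known])
    (t0, slope), *_ = np.linalg.lstsq(A, np.asarray(t_known, dtype=float), rcond=None)
    return t0, 1.0 / slope

def predict_arrival(d_eff, t0, v_eff):
    """Forecast the arrival time at a node from its effective distance."""
    return t0 + d_eff / v_eff
```

because the wavefront progresses at roughly constant speed in effective distance, even a handful of early observations pins down the two parameters.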
the simulated pandemics that exhibit spatially incoherent complex patterns in the traditional representation (fig. . ) are equivalent to concentric wave fronts that progress at constant speeds in effective distance space. this method thus substantially simplifies the complexity seen in conventional approaches and improves quantitative predictions of epidemic arrival. thus, effective distance is a promising tool and concept for application in realistic scenarios, being able to provide a first quantitative assessment of an epidemic outbreak and its potential consequences on a global scale. in a number of situations epidemiologists are confronted with the task of reconstructing the outbreak origin of an epidemic. when a novel pathogen emerges, in some cases the infection spreads covertly until a substantial case count attracts attention and public health officials and experts become aware of the situation. quite often cases occur, much like the patterns depicted in fig. . b, in a spatially incoherent way because of the complexity of underlying human mobility networks. when cases emerge at apparently randomly distributed locations it is a difficult task to assess where the event initially started. the computational method based on effective distance can also be employed in these situations, provided that one knows the underlying mobility network. this is because the concentric pattern depicted in fig. . is only observed if and only if the actual outbreak location is chosen as the center perspective node.

(figure caption: compared to the conventional use of geographic distance, effective distance is a much better predictor of epidemic arrival time, as reflected by the linear relationship between arrival time and effective distance, e.g. compare to fig. . . right: the same analysis for the sars epidemic. also in this case effective distance is much more strongly correlated with arrival time than geographic distance.)
in other words, if the temporal snapshots are depicted using a different reference node, the concentric pattern is scrambled and irregular. therefore, one can use the effective distance method to identify the outbreak location of a spreading process based on a single temporal snapshot. this method is illustrated in a proof-of-concept example depicted in fig. . . assume that we are given a temporal snapshot of a spreading process as depicted in fig. . a, and the goal is to reconstruct the outbreak origin from the data. conventional geometric considerations are not successful because network-driven processes generically do not yield simple geometric patterns. using effective distance, we can instead investigate the pattern from the perspective of every single potential outbreak location. we could for example pick a set of candidate outbreak locations (panel (b) in the figure). if this is done we will find that only for one candidate outbreak location does the temporal snapshot have the shape of a concentric circle. this must be the original outbreak location. this process, qualitatively depicted in the figure, can be applied in a quantitative way and has been applied to actual epidemic data such as the ehec outbreak in germany [ ] .

(figure caption: outbreak reconstruction using effective distance. (a) the panel depicts a temporal snapshot of a computer-simulated hypothetical pandemic; red dots denote airports with a high prevalence of cases. from the snapshot alone it is difficult to assess the outbreak origin, which in this case is ord (chicago). (b) a choice of potential outbreak locations as candidates. (c) for these candidate locations the pattern is depicted in the effective distance perspective. only for the correct outbreak location is the pattern concentric.)
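the concentricity criterion can be made quantitative in a simple way: for each candidate origin, compare its effective distances to the affected nodes with the observed arrival times and keep the candidate for which the relationship is most linear. the scoring below (pearson correlation) is one plausible choice for this sketch, not necessarily the criterion used in the cited work:

```python
import numpy as np

def reconstruct_origin(D_eff, arrival_times, candidates):
    """Pick the candidate origin whose shortest effective distances to the
    affected nodes correlate most linearly with the observed arrival times --
    a simple quantitative version of the concentricity criterion."""
    t = np.asarray(arrival_times, dtype=float)
    best, best_r = None, -np.inf
    for c in candidates:
        d = np.asarray(D_eff[c], dtype=float)  # effective distances from candidate c
        r = np.corrcoef(d, t)[0, 1]
        if r > best_r:
            best, best_r = c, r
    return best, best_r
```

for the true origin the scatter of arrival time versus effective distance collapses onto a line (a concentric wave), while wrong candidates produce scrambled, weakly correlated clouds.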
this method can be used quantitatively to identify outbreaks of epidemics that initially spread in a covert way. emergent infectious diseases that bear the potential of spreading across the globe are an illustrative example of how connectivity in a globalized world has changed the way human-mediated processes evolve in the st century. we are connected by complex networks of interaction, mobility being only one of them. with the onset of social media, the internet and mobile devices, we share information that proliferates and spreads on information networks in much the same way (see also chap. ) . in all of these systems the scientific challenge is understanding what topological and statistical features of the underlying network shape particular dynamic features observed in natural systems. the examples addressed above focus on a particular scale, defined by a single mobility network, the air-transportation network, that is relevant for this scale. as more and more data accumulates, computational models developed in the future will be able to integrate mobility patterns at an individual resolution, potentially making use of pervasive data collected on mobile devices and paving the way towards predictive models that can account very accurately for observed contagion patterns. the examples above also illustrate that just feeding better and faster computers with more and more data may not necessarily help in understanding the fundamental properties of the processes that underlie a specific dynamic phenomenon. sometimes we only need to change the conventional and traditional ways of looking at patterns and adapt our viewpoint appropriately.

note : examples are contagion, a surprisingly accurate depiction of the consequences of a severe pandemic, and rise of the planet of the apes, which concludes with a fictitious explanation for the extinction of mankind due to a man-made virus in the future.

references:
- the fates of human societies
- proc. r. soc. lond
- infectious diseases of humans: dynamics and control
- modeling infectious diseases in humans and animals
- human mobility and spatial disease dynamics. proc. natl. acad. sci. usa

key: cord- -l s utqb authors: maheshwari, h.; shetty, s.; bannur, n.; merugu, s. title: cosir: managing an epidemic via optimal adaptive control of transmission policy date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: l s utqb

shaping an epidemic with an adaptive contact restriction policy that balances the disease and socioeconomic impact has been the holy grail during the covid- pandemic. most of the existing work on epidemiological models focuses on scenario-based forecasting via simulation, but techniques for explicit control of epidemics via an analytical framework are largely missing. in this paper, we consider the problem of determining the optimal policy for transmission control assuming sir dynamics, which is the most widely used epidemiological paradigm. we first demonstrate that the sir model with infectious patients and susceptible contacts (i.e., the product of transmission rate and susceptible population) interpreted as predators and prey, respectively, reduces to a lotka-volterra (lv) predator-prey model. the modified sir system (lvsir) has a stable equilibrium point, an energy conservation property, and exhibits bounded cyclic behaviour similar to an lv system. this mapping permits a theoretical analysis of the control problem supporting some of the recent simulation-based studies that point to the benefits of periodic interventions. we use a control-lyapunov approach to design adaptive control policies (cosir) to nudge the sir model to the desired equilibrium, with ready extensions to richer compartmental models. we also describe a practical implementation of this transmission control method by approximating the ideal control with a finite but time-varying set of restriction levels, and provide simulation results to demonstrate its efficacy.
the covid- situation and its immense toll on human lives has highlighted the enormous public health challenges associated with managing a pandemic. in the absence of a vaccine, there are primarily three control levers available for public health officials, namely, (a) contact restrictions, (b) testing, tracing and isolation, and (c) provisioning for additional medical capacity. of these, contact restrictions via lockdowns and social distancing have emerged as the most powerful policy instrument, especially in low and middle income countries that cannot afford to scale up testing or medical capacity. choosing the optimal level of restrictions, however, is highly non-trivial, not only because it involves a complex trade-off between the yet to be fully understood covid- disease impact and other socio-economic disruptions, but also because of the rapidly evolving situation on the ground. public health interventions related to the covid- pandemic have largely been driven by scenario-based epidemiological forecasting studies [ , ] . current epidemiological models [ , , ] incorporate spatio-temporal variations and predictive signals such as mobility to provide high fidelity forecasts. however, decision making on contact restrictions is still fairly sub-optimal since it is based on the comparison of a few enumerated scenarios for a limited time horizon. furthermore, forecasts based on a constant transmission rate (i.e., the avg. #new people directly infected by an infectious person per time unit) convey the impression that the epidemic progression corresponds to a bell curve [ ] regardless of empirical evidence to the contrary (see fig ) . flattening the curve till herd immunity is seen as the only choice. epidemiological analysis is often centred around determining the height and timing of the caseload peak as well as the time to attain herd immunity.
though highly valuable, this scenario-based decision-making approach leans towards a limited, reactive role for public health agencies. in contrast, much less attention has been devoted to developing a mathematical control framework to support proactive decision-making based on the target disease & economic outcomes and the state of the epidemic. multiple studies [ , , , ] point to the benefits of periodic lockdowns and staggered mobility among population groups, but these dynamic interventions are based on forecast simulations of a limited number of scenarios and are not adaptive in nature. in this paper, we explore the problem of optimal adaptive control of the transmission rate for a desired bound on the infectious population. we focus on epidemiological models based on compartmental (sir and seir) dynamics [ ] because of their wide applicability, parsimonious & interpretable encoding of the disease dynamics, and amenability to data-driven calibration to yield accurate forecasts.
• we demonstrate that the sir dynamics map to the well-known lotka-volterra (lv) system [ ] on interpreting infectious patients as predators and susceptible contacts (i.e., the product of transmission rate and susceptible population) as the prey, under specific conditions on the transmission rate. the resulting system (lvsir) has a well-defined stable equilibrium point and an "energy" conservation property. it exhibits a bounded cyclic trend for active infections and a steady decline of the susceptible population.
• we derive an optimal control policy for the transmission rate (cosir) using control-lyapunov functions [ ] based on the energy of the system, which is guaranteed to converge to the desired equilibrium, i.e., target infectious levels, from any valid initial state. we also discuss extensions to compartmental model variants that involve an incubation period (e.g., delayed sir, seir) as well as control of the infectious period, which is influenced by the testing and quarantine policy.
• we propose a practical approximate implementation of the transmission rate control via discrete but time-varying restriction levels. simulation results demonstrate the efficacy of the approximate control in stabilizing infections and its adaptability to perturbations from super-spreader events.
the rest of the paper is organized as follows. section presents a formulation of the restriction control problem. section provides background on compartmental models, lv systems and relevant aspects of control theory. sections , , , present the sir to lv system mapping, the transmission rate control mechanism, the practical restriction control policy, and extensions, respectively. section presents the concluding remarks. notation: x_t and x(t) interchangeably denote the value of a variable x at time t. the time derivative of x is denoted by ẋ. x(t : t ) denotes the series of x as t varies from t to t . during an epidemic, a key concern for public health officials is to determine the right level and schedule of contact restrictions that balances the disease and socioeconomic burdens. strict lockdown conditions for a short time period suppress the infection levels, but infections tend to flare up again on easing restrictions unless the epidemic is completely wiped out. on the other hand, prolonged restrictions with no intermittent easing hinder economic activity and impose heavy costs on vulnerable population groups. furthermore, the progressive reduction in the susceptible population offers a chance for relaxation of restrictions that needs to be exploited. modelling the multi-faceted impact of contact restrictions requires accounting for region-specific cultural and economic constructs as well as the available medical capacity, a highly complex task. for tractability, we assume that the public health goal is to limit active infections to a certain target level determined via an independent impact analysis [ ] .
the controls available to the public health authorities can be viewed as multiple knobs that can be set to different levels (e.g., public transport at % occupancy). however, the need to communicate the policy to the general public and ensure compliance entails a simpler strategy centred around a few discrete restriction levels [ , ] (e.g., table ) and a preset schedule for a future time horizon, which is often longer than the intervals at which the epidemic observations are collected. for example, the infection levels might be monitored at a daily frequency, but the restriction guidelines (e.g., level on weekdays and level on weekends) might be chosen for a monthly period. restriction control. for a given region, let n, s_curr, i_curr be the total, current susceptible, and infectious populations respectively. let i_target_avg be the target average infectious level. let a be the set of restriction levels for which the transmission rate is known or can be estimated. example: on oct , , a hypothetical city has a population of m, of which . m are currently infectious and . m are post-infectious, with the rest still susceptible. assuming five restriction levels with transmission rates of [ . , . , . , . , . ], the objective is to figure out a restriction policy for the upcoming month so that the infectious count averages around i_target_avg = , . since our primary focus is on an analytical control framework, we make simplifying assumptions on the observability (i.e., accurate estimates of the infectious population are possible via a mix of serosurveys and diagnostic tests) and the infection dynamics (region isolation, homogeneous interactions, negligible incubation period, and constant infectious period). section describes extensions when some of these assumptions are relaxed. our work builds on three research areas: (a) compartmental epidemiological models, (b) lotka-volterra systems, and (c) optimal control of non-linear dynamical systems.
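before turning to that background, the restriction control example above can be made concrete with a naive one-step-lookahead baseline (illustrative only, and not the cosir policy developed in this paper): each day, apply the restriction level whose predicted next-day infectious count under discrete-time sir dynamics is closest to the target. all parameter values are hypothetical.

```python
def greedy_restriction_policy(N, S0, I0, betas, gamma, I_target, days):
    """Naive daily feedback baseline for the restriction control problem:
    pick the level whose one-step SIR prediction is closest to I_target.
    betas[k] is the transmission rate under restriction level k."""
    S, I = float(S0), float(I0)
    schedule = []
    for _ in range(days):
        # predicted next-day infectious count for each restriction level
        preds = [I + b * S * I / N - gamma * I for b in betas]
        level = min(range(len(betas)), key=lambda k: abs(preds[k] - I_target))
        schedule.append(level)
        new_inf = betas[level] * S * I / N
        S, I = S - new_inf, I + new_inf - gamma * I
    return schedule, I
```

even this crude controller holds infections in a band around the target by alternating between adjacent levels as the susceptible pool shrinks, illustrating why a principled adaptive policy (rather than a fixed schedule) is the natural formulation of the problem.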
Infectious diseases are commonly modeled using compartmental models, where the population (N) is divided into compartments corresponding to different disease stages and the inter-compartment transitions are governed by the model dynamics. The SIR model [ , ] is the simplest and most widely used one. The model comprises three compartments: susceptible (S), infectious (I) and removed (R, which includes immune and post-infectious persons), with the dynamics in Fig. (b):

dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI.

Here β is the rate of disease transmission from infectious to susceptible individuals, which largely depends on the contact restriction policy; γ is the inverse of the average infectious period, which depends on the testing and quarantine policy but is largely invariant when testing volumes are low [ ]. In this model, the effective reproduction number (the average number of direct infections from each infection) is R_eff = βS/(γN). Existing restriction control approaches [ ] are often guided by the principle of ensuring R_eff ≤ 1. Certain infectious diseases have a significant incubation period during which individuals are infected but not yet spreading the disease (non-infectious); the SEIR model [ ] includes an additional E (exposed) compartment to model the incubation phase.

Lotka-Volterra (LV) systems [ , , ] model the population dynamics of predator-prey interactions in a biological ecosystem. These models form a special case of Kolmogorov systems [ ] that capture the evolution of a stochastic process over time. In a simple two-species LV system, the population of prey (p) interacts with that of predators (q). The growth rate of the prey depends on its reproduction rate (r) and the rate of consumption by predators (e); the change in the predator population depends on the nourishment-based birth rate b and the death rate d:

dp/dt = rp - epq, dq/dt = bpq - dq.

The system has two fixed points: (a) a saddle point that maps to extinction, (p, q) = (0, 0), and (b) a stable equilibrium at (p, q) = (d/b, r/e).
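To make the SIR dynamics and the R_eff formula above concrete, here is a minimal numerical sketch; all parameter values (β = 0.3/day, γ = 0.1/day, N = 10⁶) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def simulate_sir(N, I0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the SIR dynamics:
    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I.
    Returns an array of (S, I, R) sampled once per day."""
    S, I, R = float(N - I0), float(I0), 0.0
    out = [(S, I, R)]
    per_day = int(round(1 / dt))
    for k in range(1, int(days / dt) + 1):
        new_inf = beta * S * I / N * dt
        new_rec = gamma * I * dt
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
        if k % per_day == 0:
            out.append((S, I, R))
    return np.array(out)

def r_eff(beta, gamma, S, N):
    """Effective reproduction number R_eff = beta*S / (gamma*N)."""
    return beta * S / (gamma * N)

# Illustrative parameters (not from the paper).
traj = simulate_sir(N=1_000_000, I0=100, beta=0.3, gamma=0.1, days=160)
```

Since S + I + R is conserved by the update, the total population stays fixed, and with R_eff > 1 at the start the infectious curve rises to a peak and then declines as S is depleted.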
Typically, the system exhibits oscillations resulting in a closed phase plot that corresponds to the conservation of an "energy" function. Fig. (a) presents the dynamics of an LV system and the oscillations of the prey and predator populations. Due to the criticality of ecological population control, there has been considerable research on multiple variants of LV systems [ , ] and their Hamiltonian dynamics [ ].

Optimal control of dynamical systems has rich connections to multiple fields [ ] that deal with optimizing sequential decisions to maximize a desired objective, such as reinforcement learning [ ], multi-armed bandits [ ], and stochastic control. Given a set of control variables, the optimal control policy describes the time derivatives of these variables that minimize the cost function, and can be derived using Pontryagin's maximum principle [ ] or the Hamilton-Jacobi-Bellman equations [ ].

[This preprint (which was not certified by peer review) is made available under a CC-BY-NC-ND International license; the author/funder has granted medRxiv a license to display the preprint in perpetuity. This version was posted in November.]

Though there exist comprehensive techniques for the control of linear dynamical systems, control strategies for non-linear dynamical systems rely heavily on the existence of control-Lyapunov functions, which are typically identified using conservation laws of the associated physical systems. Once a suitable Lyapunov function is identified, there exist multiple control design strategies, such as feedback linearization, backstepping and sliding mode control, that are guaranteed to converge by Artstein's theorem [ ]. In the case of the SIR model, a suitable Lyapunov function is not readily evident. On the other hand, Lyapunov stability and practical control strategies of LV systems have been extensively studied [ , , , ].

Our primary goal is to solve the contact restriction control problem formulated above.
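The conservation property can be checked numerically: the sketch below integrates a two-species LV system with RK4 and tracks the standard conserved quantity V(p, q) = b·p - d·log p + e·q - r·log q. All rate values are illustrative assumptions.

```python
import math

def lv_rhs(p, q, r, e, b, d):
    """Two-species LV: prey p' = r*p - e*p*q, predator q' = b*p*q - d*q."""
    return r * p - e * p * q, b * p * q - d * q

def lv_energy(p, q, r, e, b, d):
    """Conserved quantity V = b*p - d*log(p) + e*q - r*log(q)."""
    return b * p - d * math.log(p) + e * q - r * math.log(q)

def rk4_step(p, q, h, r, e, b, d):
    """One classical Runge-Kutta (RK4) step for the LV system."""
    k1 = lv_rhs(p, q, r, e, b, d)
    k2 = lv_rhs(p + 0.5 * h * k1[0], q + 0.5 * h * k1[1], r, e, b, d)
    k3 = lv_rhs(p + 0.5 * h * k2[0], q + 0.5 * h * k2[1], r, e, b, d)
    k4 = lv_rhs(p + h * k3[0], q + h * k3[1], r, e, b, d)
    return (p + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            q + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

# Illustrative rates; the stable equilibrium is (d/b, r/e) = (3, 2).
r, e, b, d = 1.0, 0.5, 0.2, 0.6
p, q = 4.0, 1.0                      # start away from equilibrium
E0 = lv_energy(p, q, r, e, b, d)
for _ in range(20_000):              # 200 time units at h = 0.01
    p, q = rk4_step(p, q, 0.01, r, e, b, d)
drift = abs(lv_energy(p, q, r, e, b, d) - E0)
```

Along the closed orbit both populations stay positive and V drifts only by the integrator's truncation error, which is the numerical signature of the conserved "energy".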
We choose to focus mainly on SIR dynamics because it captures the core disease spread mechanism that is an essential element of most epidemiological models. Since existing work on the stability analysis of SIR models [ ] does not address controllability, we first establish a connection between the SIR model and the LV system, which is more amenable to principled control. We then leverage the properties of the LV system to propose strategies for restriction control.

The problem of stabilizing infection levels assuming SIR dynamics has a direct analogy with population control in LV predator-prey systems, where it is desirable to maintain the predator and prey populations at certain target levels suitable for the ecosystem. Comparing the SIR and LV dynamics in the figure, we observe that the behaviour of the infectious people (I) is similar to that of the "predators" (q): there is an inflow (birth) βSI/N that depends on β as well as the current infectious and susceptible populations, and an outflow (death) γI from the I to the R compartment. However, the counterpart of the "prey" is not readily apparent. An intuitive choice for the "prey" is the "susceptible contacts" (i.e., the product of the susceptible population and β, the number of contacts of an infectious person per day), since this acts as "nourishment" for the predators and contributes to the inflow into the I compartment. Denoting the susceptible contacts by J = βS, we note that equivalence with the LV system requires

dJ/dt = (r - eI)J,

and hence the transmission rate β has to follow

dβ/dt = β²I/N + (r - eI)β.

This modified version of the SIR model (LVSIR) maps to a Lotka-Volterra system. Comparing the model parameters, we note that the inverse of the infectious period (γ) corresponds to the predator death rate (d) and the inverse of the population (1/N) to the predator birth rate (b). We now analyze the behaviour of the LVSIR system.

Theorem. For the LVSIR model in Fig. (c), the following holds true (please see Appendix A for the proofs):
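The mapping can be checked numerically. The sketch below integrates the SIR equations together with the β update above (forward Euler); all parameter values are illustrative assumptions, not the paper's. If the mapping holds, J = βS and I behave like an LV pair: I oscillates between fixed bounds around I* = r/e while S drains slowly.

```python
def lvsir_step(S, I, beta, N, gamma, r, e, h):
    """One forward-Euler step of the LVSIR model: standard SIR flows plus
    the transmission-rate law beta' = beta^2*I/N + (r - e*I)*beta."""
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dbeta = beta ** 2 * I / N + (r - e * I) * beta
    return S + h * dS, I + h * dI, beta + h * dbeta

# Illustrative parameters: equilibrium is J* = gamma*N, I* = r/e = 1000.
N, gamma, r, e = 1_000_000, 0.1, 0.2, 2e-4
S, I = 0.95 * N, 1500.0          # start with I off its equilibrium value
beta = gamma * N / S             # puts J = beta*S exactly at J* = gamma*N
Imin = Imax = I
for _ in range(200_000):         # 200 days at h = 0.001
    S, I, beta = lvsir_step(S, I, beta, N, gamma, r, e, h=0.001)
    Imin, Imax = min(Imin, I), max(Imax, I)
```

The observed behaviour matches the theorem's claims: I repeatedly dips below and returns to its starting level (bounded oscillation straddling I*), while the susceptible population steadily declines and β compensates by creeping up.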
where (x_min, x_max) are defined as above, g(z) = γ(e^z - z - 1) - W0, and f1(s), f2(s) are the restrictions of f(s) = s + r·log(1 - s/r) to the positive and negative ranges. In general, T_period is a function of (r, γ, W0/r), with approximation via linearization yielding T_period ≈ 2π/√(rγ). Similar to an LV system, the "energy", which corresponds to a weighted Itakura-Saito distance [ ] between (I, J) and the equilibrium (I*, J*), is conserved. The infectious population I (and the susceptible contacts J) oscillates between [y_min·I*, y_max·I*] (and [x_min·J*, x_max·J*]) during the entire period, with an average value of I* (and J*), while the susceptible population reduces steadily in a staircase-like fashion. The transmission rate β also exhibits periodic oscillations, but its average steadily goes up to compensate for the reduction in the susceptible population. The figure shows the phase plot and the variation of the key quantities during a single period, with the four extreme points (south, east, north, west) marked explicitly. Fig. (d) shows how T_period [ ] depends on r for different choices of W0/r. This mapping can be used to identify suitable values of r for practical policy making. For the special case where the initial state is at equilibrium, the system behaviour is steady as in Fig. (a).

We now consider the problem of controlling the transmission rate β for the LVSIR model (Fig. (c)) to nudge the infectious levels to a desired equilibrium. As discussed earlier, control of non-linear dynamical systems is typically achieved via control-Lyapunov functions (CLFs).
Our approach is to exploit the mapping from the SIR model to the LV system and use CLFs derived from the "Lotka-Volterra energy" function W(J, I). Given a dynamical system ż = f(z, u) with state vector z ∈ D ⊂ R^n, control u ∈ R^m, and equilibrium state z* = 0, a control-Lyapunov function (CLF) is a positive definite function V: D → R whose time derivative can be made negative at every non-equilibrium state by a suitable choice of control. The CLF V(·) can be viewed as a generalized energy function, with V̇(·) being a dissipation function. Artstein [ ] proved that as long as there is a CLF, there exists a control u that ensures the reduction of energy at every non-equilibrium state and eventual convergence to the zero-energy equilibrium. For affine-in-control systems, ż = f(z, u) = f0(z) + Σ_i f_i(z)u_i with state z ∈ D ⊂ R^n and control u ∈ R^m, once a CLF is identified, it is relatively straightforward to design an appropriate control function u as described in [ , ].

For our scenario, we rely on the conservation law of the LV system as well as the existing literature on its Lyapunov functions [ ]. Let z = (J/J* - 1, I/I* - 1), so that the equilibrium is z* = (0, 0). Let L(a1, a2) be a continuously differentiable divergence such that |dL/da1| > 0, L(a1, a2) > 0 for all a1 ≠ a2, and L(a1, a2) = 0 ⟺ a1 = a2. Then the function V(z) = L(W, W_target), where W = W(J, I), can be used as a CLF. We focus on the case where W_target = W* = 0 and propose a controlled SIR model (CoSIR).

Theorem (CoSIR). For the SIR model, a proportional additive control on β of the form dβ/dt = β²I/N + (r - eI)β + u_β, with u_β = -η(t)·β·(dL/dW)·(J/J* - 1), ensures convergence to the equilibrium, with the convergence rate dependent on η.

The design of the control also makes it robust to perturbations in the infectious population, as the system recalibrates β as appropriate. The β-control policy can be interpreted as follows: the first term β²I/N corresponds to the relaxation possible due to the decreasing susceptible population; the second term (r - eI)β leads to oscillatory behavior; and the last term u_β = -η(t)·β·(dL/dW)·(J/J* - 1) ensures dissipation of energy and convergence to the equilibrium.
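As a numerical sanity check of the CoSIR behaviour, the sketch below integrates the controlled system with the simplest admissible divergence, L(W, W*) = (W - W*)²/2, so that dL/dW = W when W* = 0. The divergence choice and all parameter values (N, γ, r, e, η, initial state) are assumptions for illustration, not taken from the paper.

```python
import math

def lv_energy(J, I, Jstar, Istar, gamma, r):
    """LV 'energy' W(J, I) relative to the equilibrium (J*, I*)."""
    x, y = J / Jstar, I / Istar
    return gamma * (x - math.log(x) - 1) + r * (y - math.log(y) - 1)

def cosir_step(S, I, beta, N, gamma, r, e, eta, h):
    """Euler step of the controlled model with u_beta = -eta*beta*W*(J/J* - 1),
    i.e., the CoSIR control under the assumed divergence L = (W - W*)^2 / 2."""
    Jstar, Istar = gamma * N, r / e
    W = lv_energy(beta * S, I, Jstar, Istar, gamma, r)
    u = -eta * beta * W * (beta * S / Jstar - 1)
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dbeta = beta ** 2 * I / N + (r - e * I) * beta + u
    return S + h * dS, I + h * dI, beta + h * dbeta

# Illustrative parameters; start off-equilibrium in both J and I.
N, gamma, r, e, eta = 1_000_000, 0.1, 0.2, 2e-4, 5.0
S, I = 0.95 * N, 1500.0
beta = 1.2 * gamma * N / S
W0 = lv_energy(beta * S, I, gamma * N, r / e, gamma, r)
for _ in range(200_000):                  # 200 days at h = 0.001
    S, I, beta = cosir_step(S, I, beta, N, gamma, r, e, eta, h=0.001)
Wend = lv_energy(beta * S, I, gamma * N, r / e, gamma, r)
```

With the dissipative term switched on, the "energy" W decays and I settles near I* = r/e = 1000; dropping the u term recovers the conservative LVSIR oscillations.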
We now describe a practical solution to the public health restriction control problem formulated earlier. The algorithm outlines a holistic approach to obtaining a restriction schedule using the optimal β-control of the CoSIR theorem. There are four key steps.

Input collection. Infection level targets (I_target_avg, I_target_max), the periodicity of the restriction schedule (T_period), and the decision horizon (T) need to be determined based on a careful assessment of public health and socioeconomic considerations. Historical case counts and restrictions ([S_t, I_t, R_t, a_t] up to t_curr) also need to be collected to enable accurate optimization.

Data-driven calibration. The next step is to use SIR calibration methods [ , , ] along with historical data to estimate a static γ, a time-varying β, and the state of the epidemic (S_curr, I_curr, R_curr). The restriction-level-to-transmission map π can be initially chosen from public health guidelines [ ] and refined using the observed β for past restrictions in the region of interest.

Choosing CoSIR parameters. The free parameters of the CoSIR model need to be chosen based on the control requirements. The algorithm lists the updates derived from the CoSIR theorem, with flexibility in the choice of β_curr and η. Choosing the immediate transmission rate to be β_curr = J*/S_curr = γN/S_curr (equivalent to forcing the effective reproduction number R_eff = 1) ensures a maximal reduction in the system "energy" and faster convergence to the desired equilibrium, but dampens fluctuations. However, fluctuations might be necessary for economic activity. When nearly steady infection levels are desired, i.e., I_target_avg ≈ I_target_max, then r = (2π)²/(γ·T_period²) and a high η are appropriate.

Computing optimal restrictions. Determining the optimal policy can be split into two phases. The first involves estimating the ideal β-control from the control equation,
while the second involves identifying the "closest" restriction level to the ideal β at each time step, with "closeness" based on a suitable divergence such as the squared loss.

Employing the algorithm for the scenario in the earlier example with a weekly periodicity, we obtain suitable parameters for the CoSIR model. The figure shows the behaviour of three variants of restriction policies (arbitrary, CoSIR, and a CoSIR approximation based on the specified levels). In the case of the arbitrary policy, the infections peak in narrow time intervals and become unmanageable, while the ideal CoSIR and even the approximate control variants are able to achieve a steady rate of infections. This is true even when the infection levels are subject to sudden upward or downward perturbations, as in the case of super-spreader events or sudden quarantine restrictions respectively. The β-control mechanism adapts and continues to push towards the equilibrium.

We now briefly describe some key extensions.

Delayed SIR and SEIR models. The SEIR model allows explicit modelling of the incubation period and is known to closely mimic the behaviour of the delayed SIR model [ ]. When β follows the control equation, the delayed SIR model readily maps to a delayed LV system with a non-preying growth period for the predators, which is a special case of the well-studied Wangersky-Cunningham systems [ , ]. It can be shown that the modified delayed SIR system with a delay τ has the same equilibrium (J*, I*) = (γN, r/e), exhibits (unbounded) oscillations, and permits control of the same form, where special handling is needed when J approaches J*, with the behavior depending on τ.

Testing and isolation policy. Testing, tracing and isolation also play a critical role in regulating the epidemic.
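Returning to the level-matching step above, snapping the ideal β to the nearest discrete restriction level under squared loss can be sketched as follows; the five level rates are hypothetical values, not the paper's.

```python
def nearest_level(beta_ideal, level_betas):
    """Pick the restriction level whose transmission rate is closest to the
    ideal control value under squared loss."""
    return min(range(len(level_betas)),
               key=lambda a: (level_betas[a] - beta_ideal) ** 2)

# Hypothetical five-level policy; level 0 is the strictest lockdown.
levels = [0.05, 0.10, 0.15, 0.25, 0.40]
schedule = [nearest_level(b, levels) for b in (0.07, 0.12, 0.33, 0.18)]
```

Any other divergence (e.g., a relative error) can be swapped in by replacing the squared-loss key, which is why the text leaves "closeness" as a design choice.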
In terms of SIR and SEIR dynamics, the net effect of aggressive testing is to minimize the infectious period, i.e., to increase γ [ , ]. This is analogous to the culling of predators (the infectious population) by increasing the death rate, for which multiple control mechanisms already exist [ ]. In particular, choosing V(z) = L(W(J, I), W*) as the control-Lyapunov function of interest, we obtain the control γ(t) = γ + ζ(t)·(dL/dW)·(I/I* - 1), with ζ(t) > 0.

Online learning. The restriction control problem can also be posed as a non-linear contextual bandit [ ] formulation, with the cumulative LV energy W(J, I) of the CoSIR model over a future horizon interpreted as the (negative) "reward". Here, the discrete restriction levels can be viewed as the multiple arms of a bandit, the context includes the state of the epidemic, and the "reward" distribution is computed using the context and the observed transmission rate for the arms.

Our current work proposes an analytical framework for epidemic control with the intent of supporting an active, goal-oriented public health response. The proposed framework relies on a mapping between SIR dynamics and Lotka-Volterra systems under specific conditions on the transmission rate. Given the vast literature on the control of LV systems, this mapping can be leveraged to design new epidemic control techniques as well as to extend current results to richer heterogeneous compartmental models and additional control variables (e.g., testing levels). Effective practical implementation of control requires further exploration of online and reinforcement learning variants [ ], addressing the limitations of SIR dynamics, and the incorporation of additional signals such as mobility [ ].
This effort also points to the feasibility of exploring control strategies for other macro social and physical systems, e.g., good and bad actor populations on social media.

Proof of stability (boundedness). Denoting W_b = (r + γ)d_max, from Theorem (a) we note that W(J(t), I(t)) = W(J0, I0) = W0 for all t. Given the nature of f(s), f(s) < c ⇒ s_min < s < s_max, where (s_min, s_max) are the finite-valued roots of f(s) = c. Hence x(t) and y(t) are both bounded on either side. Consequently, (x(t) - 1, y(t) - 1) is confined to a bounded rectangle, and thus the bound ε can be directly expressed in terms of δ and vice versa. Hence, from the definition of stability, z* = (0, 0) (or equivalently (J*, I*)) is a stable equilibrium.

When the initial state (J0, I0) is at the equilibrium (J*, I*) = (γN, r/e), we have (J(t), I(t)) = (γN, r/e) for all t. Hence dS/dt = -J*I*/N = -γI*, and similarly dR/dt = γI* ⇒ R(t) = R0 + γI*·t.

Let z = log(x). Then ẋ = e^z·ż and ẍ = e^z·(z̈ + ż²). From the dynamics of x, we have e^z·(z̈ + ż²) - e^z·ż² - γ(e^z - 1)(e^z·ż - r·e^z) = 0 ⇒ z̈ - γ(e^z - 1)(ż - r) = 0. Choosing s = ż, we have s = ẋ/x = -r(y - 1). Let W0 = W(J0, I0). Then the trajectory satisfies γ(x - log x - 1) + r(y - log y - 1) = W0 ⇒ γ(e^z - z - 1) + r(y - log y - 1) = W0, i.e., f(s) = g(z), where g(z) = γ(e^z - z - 1) - W0 and f(s) = s + r·log(1 - s/r). Let f1(s), f2(s) be the restrictions of f(s) for the lower and upper parts of the phase plot. Then the time period for the lower section is given by the integral of dz over the corresponding branch.
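The small-oscillation period of the LV phase plot is 2π/√(rγ) under linearization. The sketch below measures the period numerically on the normalized system ẋ = rx(1 - y), ẏ = γy(x - 1), using a small initial displacement; the rates are illustrative assumptions.

```python
import math

def measure_period(r, gamma, eps=1e-3, h=1e-3, tmax=150.0):
    """Integrate the normalized LV system x' = r*x*(1-y), y' = gamma*y*(x-1)
    with the midpoint rule and return the time between successive upward
    crossings of y = 1, i.e., the oscillation period."""
    x, y, t = 1.0 + eps, 1.0, 0.0
    crossings = []
    prev = 0.0
    while t < tmax and len(crossings) < 2:
        k1x, k1y = r * x * (1 - y), gamma * y * (x - 1)
        xm, ym = x + 0.5 * h * k1x, y + 0.5 * h * k1y
        x += h * r * xm * (1 - ym)
        y += h * gamma * ym * (xm - 1)
        t += h
        cur = y - 1.0
        if prev < 0.0 <= cur:      # upward zero-crossing of y - 1
            crossings.append(t)
        prev = cur
    return crossings[1] - crossings[0]

r, gamma = 0.2, 0.1                              # illustrative rates
T_measured = measure_period(r, gamma)
T_linear = 2 * math.pi / math.sqrt(r * gamma)    # linearized prediction
```

For a small initial energy the measured period agrees with 2π/√(rγ) to within the integration tolerance, consistent with the elliptical-orbit approximation.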
When W0 ≪ 1, linearization is possible. Simplifying the trajectory f(s) = g(z) using the approximations e^a ≈ 1 + a + a²/2 and log(1 - a) ≈ -(a + a²/2), we have γ(e^z - z - 1) - W0 = s + r·log(1 - s/r) ⇒ γz²/2 + s²/(2r) ≈ W0. Essentially, we have an elliptical curve, with x and y following sinusoidal behavior with period 2π/√(rγ).

Proof of Theorem (d). Assuming a continuous form for J, we observe that the drop in S over one period equals ∫βSI/N dt = ∫(İ + γI) dt = γ∫I dt (since I is periodic) = γI*·T_period.

References:
- Analysis and mapping of policies (COVID-AMP)
- COVID-19: a global perspective
- Google mobility data
- New Zealand COVID-19 alert levels
- Global stability for SIR and SIRS models with differential mortality
- Stabilization with relaxed controls
- The limits to learning an SIR process: granular forecasting for COVID-19
- Lotka-Volterra dynamics: an introduction
- A quantitative compendium of COVID-19 epidemiology
- An SEIR infectious disease model with testing and conditional quarantine
- Real-time Bayesian estimation of the epidemic potential of emerging infectious diseases
- Contextual bandit algorithms with supervised learning guarantees
- On fast multi-shot COVID-19 interventions for post lock-down mitigation
- Elementary differential equations and boundary value problems
- A Lyapunov-based approach to safe reinforcement learning
- Dynamic interventions to control COVID-19 pandemic: a multivariate prediction modelling study comparing worldwide countries
- Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries (medRxiv)
- Alexandr Ten, et al.,
A structured open dataset of government interventions in response to COVID-19
- Optimal control and synchronization of the Lotka-Volterra model
- Impact of non-pharmaceutical interventions (NPIs) to reduce COVID mortality and healthcare demand
- Exact analytical solutions of the susceptible-infected-recovered (SIR) epidemic model and of the SIR model with equal death and birth rates
- Coexistence and extinction for stochastic Kolmogorov systems
- The mathematics of infectious diseases
- A remark on the period of the periodic solution in the Lotka-Volterra system
- The behaviour and attractiveness of the Lotka-Volterra equations
- Analysis synthesis telephony based upon the maximum likelihood method
- A comparison of delayed SIR and SEIR epidemic models
- A contribution to the mathematical theory of epidemics
- Evaluating COVID-19 lockdown and business-sector-specific reopening policies for three US states
- Test sensitivity is secondary to frequency and turnaround time for COVID-19 surveillance
- COVID policy trackers
- Globally stabilizing feedback control of process systems in generalized Lotka-Volterra form
- Predator-prey models with delay and prey harvesting
- Controller design techniques for the Lotka-Volterra nonlinear system
- Hamiltonian structure of the Lotka-Volterra equations
- Control of oscillatory behavior of multispecies populations
- Stochastic Hamilton-Jacobi-Bellman equations
- From reinforcement learning to optimal control: a unified framework for sequential decisions
- Lyapunov functions for Lotka-Volterra systems: an overview and problems
- IndSci-Sim: a state-level epidemiological model for India
- The period of a Lotka-Volterra system
- A 'universal' construction of Artstein's theorem on nonlinear stabilization
- Stabilization of affine in control nonlinear systems
- Sufficient Lyapunov-like conditions for stabilization,
Mathematics of Control, Signals and Systems
- Time lag in prey-predator population models
- Stability and bifurcation analysis for a delayed Lotka-Volterra predator-prey system

Appendix A: Proofs

Definition [ ]. Let z* ∈ R^n be a critical point of a system of ODEs. The critical point z* is stable if, for any ε > 0, there exists δ > 0 such that if z = φ(t) satisfies ||φ(0) - z*|| < δ, then ||φ(t) - z*|| < ε for all t > 0.

At the equilibrium (J*, I*), we have dJ/dt = 0 and dI/dt = 0. From the dynamics, it follows that dJ/dt at (J, I) = (J*, I*) equals rJ* - eI*J* = 0 ⇒ I* = r/e, and dI/dt at the equilibrium equals I*(J*/N - γ) = 0 ⇒ J* = γN. To prove the stability of the critical point at (J*, I*), consider the normalized variables x = J/J* and y = I/I*. Then W(J, I) remains invariant throughout and is equal to W(J0, I0) = W0.

Let W0 = W(J0, I0) be the energy associated with the modified SIR system. To identify the extreme x values, we observe that both extreme values of x are realized at y = 1 and correspond to the roots of γ(x - log x - 1) = W0. Similarly, the extreme values of y are attained at x = 1 and are given by (y_min, y_max), which correspond to the roots of (y - log y - 1) = W0/r.

Proof of Theorem (c). The period of a Lotka-Volterra system has been derived in multiple works [ ]. We include the proof based on Hsu's method [ ] for completeness.

Proof of the CoSIR theorem. Assuming a proportional additive control on β of the form dβ/dt = β²I/N + (r - eI)β + u_β, the variation of the susceptible contacts J is given by dJ/dt = (r - eI)J + u_β·S. Let z = (J/J* - 1, I/I* - 1) = (z1, z2), so that z = (0, 0) corresponds to the equilibrium state. Then W(J, I) = γ(z1 - log(1 + z1)) + r(z2 - log(1 + z2)). For V(z) = L(W(J, I), W*) to be a control-Lyapunov function, we require V̇(z, u) < 0 away from equilibrium, which holds when the control is chosen as u_β = -η(t)·(dL/dW)·(J/J* - 1) with η(t) > 0 for all t.

key: cord-px lf
authors: Anand, Nikhil; Sabarinath, A.; Geetha, S.; Somanath, S.
title: Predicting the spread of COVID-19 using an SIR model augmented to incorporate quarantine and testing
journal: Trans Indian Natl
India imposed a nationwide lockdown in March 2020 to combat the spread of the COVID-19 pandemic. To model the spread of a disease and to predict its future course, epidemiologists make use of compartmental models such as the SIR model. In order to address some of the assumptions of the standard SIR model, a new modified version of the SIR model is proposed in this paper that takes into account the percentage of infected individuals who are tested and quarantined. This approach helps overcome the assumption of homogeneous mixing of the population, which is inherent to the conventional SIR model. Using the available data on the number of COVID-19 positive cases reported in the state of Kerala and in India up to April and May 2020 respectively, the parameter estimation problem is converted into an optimization problem with the help of a least-squares cost function. The optimization problem is then solved using a differential evolution optimizer. The impact of the lockdown is quantified by comparing the rising trend in infections before and during the lockdown. Using the estimated set of parameters, the model predicts that in the state of Kerala, with certain interventions, the pandemic can be successfully controlled by the first week of July, whereas the R0 value for India is still greater than 1, and hence lifting the lockdown from all regions of the country is not advisable.

Coronavirus disease 2019 (COVID-19) has presented an out-of-the-ordinary challenge. As of May 2020, the disease had infected more than four million people globally and claimed the lives of almost three hundred thousand individuals (WHO Situation Report). COVID-19 is caused by the novel coronavirus SARS-CoV-2 (Report of WHO). Currently, there is no clinically proven medicine to treat this ailment (Sanders et al.).
Optimistic researchers suggest that a clinically proven and tested vaccine is still at least a year or two away (Ferguson et al.). As per WHO guidelines, the only way to combat and contain this pandemic is to maintain personal hygiene and observe strict social distancing measures (Ferguson et al.). This strategy has been widely adopted across the world, and many countries have imposed nationwide lockdowns (Situation Report). The first case in India was reported in January 2020 (Situation Report). As of May 2020, the cases had risen to more than seventy thousand in number (Ministry of Health & Family Welfare). The Government of India took an active step towards containing the spread of this disease by initially imposing a nationwide lockdown in March 2020 and later extending it for several more weeks. This measure of abundant precaution has helped suppress the frequency and magnitude of social contacts within the country (Situation Report). The quarantine period has also been accompanied by continuous contact tracing of the increasing number of infected patients by local government authorities and by a healthcare system functioning on a war footing.

Epidemiology is the branch of medicine that deals with the incidence, distribution and possible control of diseases in a population. Epidemiological models such as the SIR model are widely used to model the spread of diseases in a population. The standard SIR model works with the assumption that there is continuous contact between the infected and susceptible populations (Adamu et al.). This assumption is violated in a scenario wherein restrictions such as quarantine and social distancing are enforced, in which case the portion of the infected population that is quarantined will not contribute to the spread of the disease. Several authors have made modifications to the existing SIR model to make more accurate predictions. Peng et al. developed an extended SEIR model, which was used to model the outbreak in China. Lopez et al.
used a model similar to Peng's to predict the outbreak in Italy and Spain. Stochastic models can also be used to make predictions, as in Aravind et al., where a stochastic model was used to predict the outbreak in India and the R0 value for India was also estimated. In this paper, the growth characteristics of this disease are discussed in detail using an augmented SIR model. The standard SIR model is tweaked to study the effect of testing by introducing a categorization into quarantined and unquarantined individuals. The growth trends for the pre-lockdown and post-lockdown scenarios are assessed, and the necessary parameters of the SIR model are estimated separately for the state of Kerala and for India, as the authors and their affiliated institution are located in the Indian state of Kerala. The following section describes the standard SIR model along with its assumptions; the proposed modified SIR model is presented next, followed by the methodology used in this study and the results and predictions for the state of Kerala and India.

The standard SIR model divides the entire population into three compartments, namely S (susceptible), I (infected), and R (removed or recovered) (Adamu et al.). A set of ordinary differential equations is then used to solve the disease dynamics and propagate the model. The disease dynamics equations are

dS/dt = -βSI/N, dI/dt = βSI/N - νI, dR/dt = νI,

where S is the number of susceptible people in the population, i.e. the number of people who are healthy but are likely to fall ill; I is the number of people who are infected and infectious, and who, upon interacting with the susceptible population, are likely to infect them; and R is the number of people who have recovered or been removed from consideration.
This group consists of the people who were infected at some point but have recovered and will neither get infected again nor infect others. N is the total number of people in the population. The rate of change of the infected population, dI/dt, depends on two primary factors: (1) the number of people falling ill, and (2) the number of people recovering. The number of people falling ill is dictated by the level of interaction between the infected and susceptible populations, and is governed by the constant β, which stands for the rate of spread of infection by an infected person per day when he/she interacts with the susceptible population. The number of people recovering is dictated by the rate of recovery ν.

Despite its popular usage, the SIR model makes several assumptions. Some important ones are: (1) the total population does not change much with time, i.e. dS/dt + dI/dt + dR/dt = 0; (2) each person in the population has the same health characteristics, i.e. the same immunity, the same immune response, etc.; (3) every person in the population interacts with every other person; (4) all infected people are infectious and are spreading the disease among the susceptible population; (5) the natural history of the disease, i.e. latency period, incubation period, etc., is not considered in this model.

Assumptions (3) and (4) can result in significant overestimation of the total number of cases. In order to address assumption (4), this paper presents a modified SIR model that takes into account the effect of quarantining infected people. To accommodate the quarantine effect in the SIR model, the infected population (I) is broken into two subgroups (shown in Fig. 1), namely:

Quarantined: consists of people who are infected and have been tested positive for the disease. Once they are quarantined, they no longer pose a risk to the susceptible population.
Unquarantined: consists of people who are infected but have not yet been tested for the disease. This group is still interacting with the susceptible population and poses a risk to it.

The two groups can be related using the factor of testing (ft), which stands for the fraction of infected people who are tested and quarantined. If ft increases, more infected people get quarantined, and hence the spread can be controlled, as the bulk of the infected people can no longer infect others. The modified SIR equations are

dS/dt = -βS·UQ/N,
dQ/dt = ft·βS·UQ/N - νQ,
dUQ/dt = (1 - ft)·βS·UQ/N - νUQ,
dR/dt = ν(Q + UQ).

As such, the rate of change of the infected population dI/dt is broken into two parts, dQ/dt and dUQ/dt, referring to the rates of change of the quarantined and unquarantined populations respectively. Separate recovery rates could be considered for the quarantined and unquarantined populations, owing to the fact that the quarantined population may receive better medical assistance at hospitals in the presence of professionals; however, due to lack of data, both recovery rates are assumed to be the same in this study. The basic reproduction number R0 and the effective reproduction number R_eff are indicative of the number of secondary infections caused by a single infected person. In the standard SIR model, the basic reproduction number is defined as the ratio of the rate of spread to the rate of recovery, i.e. R0 = β/ν. The equation for the basic reproduction ratio can now be modified to R_eff = β(1 - ft)S/(νN). During the beginning of an outbreak, S/N ≈ 1, hence the equation can be written as R0 = β(1 - ft)/ν. This equation offers some very intuitive results: in order to control an outbreak of a disease, the basic reproduction number should be less than one (R0 < 1), which can be achieved in three possible ways.

Increase in ft (factor of testing). If testing is ramped up and a greater number of infected people are tested, the number of unquarantined people will reduce, thereby controlling the spread of the disease among the susceptible population.
(2) reduction in β (rate of spread): by employing social distancing measures and personal protective measures, such as wearing masks and regularly washing hands, the rate of spread can be controlled.

(3) increase in ν (rate of recovery): by ensuring the presence of a sufficient amount of medicines and adequate healthcare equipment, the rate of recovery can be increased.

the presence of the factor of testing (ft) in the sir model greatly affects the dynamics of the model. shown in fig. a is the variation in the spread of a disease in a sample population of , people for different values of ft, while keeping the rate of spread per infected person per day (β) and the recovery ratio (ν) constant. the higher the value of ft, i.e. the higher the number of infected people who are quarantined, the better controlled is the epidemic. if only % of all infected people are tested, the active cases can rise to , which may be challenging for the health system to deal with. if testing is increased and % of all infected people are tested, the maximum number of active cases can be reduced to . and if the testing is further increased to % of all infected people, the maximum number of active cases can be brought down to . this showcases the importance of testing in controlling a pandemic. (fig. : categorization of infected individuals based on the factor of testing.)

figure b shows the variation in the spread of a disease in the same sample population of , people for different values of β, while keeping the factor of testing (ft) and the recovery ratio (ν) constant. as explained earlier, β is an indicator of the levels of interaction among a population, i.e. the higher the value of β, the higher the levels of social interaction. if the interactions within the population are reduced, the spread of the disease can be controlled. for a β value of . , where each infected person spreads the disease to . people every day, the number of active cases can rise beyond .
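as a concrete illustration of the quarantine-modified equations described above, the following sketch integrates a plausible form of the model with scipy. the specific coupling of ft (a fraction ft of new infections moves into the quarantined group, which no longer infects) and all parameter values are our assumptions for illustration, not the paper's fitted values.

```python
from scipy.integrate import solve_ivp

def quarantine_sir(t, y, beta, nu, ft, n):
    # assumed coupling: only the unquarantined (uq) group spreads disease;
    # a fraction ft of new infections is tested and quarantined (q)
    s, q, uq, r = y
    new_inf = beta * s * uq / n
    ds = -new_inf
    dq = ft * new_inf - nu * q        # tested-positive, isolated
    duq = (1 - ft) * new_inf - nu * uq
    dr = nu * (q + uq)                # same recovery rate for both groups
    return [ds, dq, duq, dr]

n = 10_000                            # illustrative sample population
y0 = [n - 1, 0.0, 1.0, 0.0]           # one unquarantined infection to start
sol = solve_ivp(quarantine_sir, (0, 300), y0, args=(0.3, 0.1, 0.5, n))
peak_active = (sol.y[1] + sol.y[2]).max()
```

with these assumed rates, the effective reproduction number is roughly β(1 − ft)/ν, so raising ft from 0.5 towards 1 suppresses the outbreak, mirroring the effect of testing described in the text.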
if, with the help of social distancing measures, the value of β is reduced to . , the maximum number of active cases can be reduced to . if the government imposes stricter social distancing measures and the value of β is further brought down to . , then the maximum number of active cases can be kept below .

the parameters β and ft are estimated from the available data on the number of positive cases (crowd source and our world in data). the problem is converted into a parameter estimation problem by formulating a cost function and using a differential evolution optimizer. assume y_d to be the time series of the number of reported cases obtained from the data, and y_m to be the same quantity predicted by the model. the cost function can then be written as the sum of the squared differences over all data points, as in eq.: cost = Σᵢ (y_d(i) − y_m(i))², where n is the total number of data points. differential evolution (de) is an evolutionary heuristics-based optimization technique developed by storn and price. the method iteratively tries to improve the solution estimates by repeatedly creating new candidate solutions by combining existing ones according to a simple mathematical formula (storn and price). the simulations were carried out in python, using the in-built differential evolution routine provided by the scipy package.

by th april, kerala had witnessed a total of confirmed covid- positive cases, with the highest rise seen on th march, after which a steady decline in the cases was observed. the technique presented in the above section was used to estimate the β and ft values before lockdown and during lockdown using data available till th april, and the following estimates were obtained. as shown in table , before the enforcement of lockdown on th march, the basic reproduction number (r0) in kerala is estimated to be . , with only % of all infected people tested. after the enforcement of lockdown, r0 appears to have reduced to .
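the parameter-estimation step described above can be sketched as follows: fit β and ft by minimising the summed squared error between reported and modelled case counts with scipy's differential_evolution routine. the compartmental right-hand side and the synthetic "reported" series below are illustrative assumptions to keep the sketch self-contained; real use would load the reported time series.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import differential_evolution

def model_cases(params, t, n, nu):
    # assumed quarantine-modified sir: only unquarantined (uq) spread disease
    beta, ft = params
    def rhs(y, t):
        s, q, uq, r = y
        new_inf = beta * s * uq / n
        return [-new_inf, ft * new_inf - nu * q,
                (1 - ft) * new_inf - nu * uq, nu * (q + uq)]
    y = odeint(rhs, [n - 1, 0, 1, 0], t)
    return n - y[:, 0]                # cumulative case count proxy

def cost(params, t, y_data, n, nu):
    # sum of squared differences over all data points, as in the text
    return float(np.sum((model_cases(params, t, n, nu) - y_data) ** 2))

n, nu = 10_000, 0.1
t = np.linspace(0, 120, 121)
y_data = model_cases([0.25, 0.4], t, n, nu)   # synthetic "reported" data

result = differential_evolution(cost, bounds=[(0.05, 1.0), (0.0, 1.0)],
                                args=(t, y_data, n, nu), seed=1)
beta_hat, ft_hat = result.x
```

because the synthetic data were generated with known parameters, the optimizer should recover values close to β = 0.25 and ft = 0.4, which is a convenient sanity check before fitting real case counts.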
with testing improved to ~ %, which indicates a receding epidemic in kerala. as shown in fig. , a significant match can be seen between the model predictions and the actual data, with a mean absolute percentage error of . %.

given the current situation in kerala, several different scenarios were simulated. in the following figures, the coloured region indicates a lockdown. if the lockdown is opened on rd may and the existing testing levels are maintained, then, assuming a level of interaction similar to the pre-lockdown period, the number of cases in kerala can be expected to increase (shown in fig. ). if the government opens the lockdown on rd may and at the same time increases the testing levels to %, then, assuming the same level of interaction as the pre-lockdown period, the model predicts that kerala will stop seeing new cases by th may, as shown in fig. . increasing testing levels rapidly can pose several logistical constraints, and therefore cannot be expected. instead, if the government decides to opt for a -month-long lockdown while gradually increasing the testing to %, then, assuming the same level of interaction as the pre-lockdown period, the model predicts that the pandemic can be completely controlled in kerala by the first week of july (as shown in fig. ).

having a long lockdown can severely affect the socio-economic activities of the state, therefore a possible option for the government could be to enforce intermittent or staggered lockdowns. considered here is a scenario where the lockdown is opened on rd may for a week, after which a -day lockdown is re-imposed, followed by another week-long relief period and then another -day lockdown. in such a scenario, assuming the existing levels of testing, the model predicts that the cases in kerala can be controlled by th june (as shown in fig. ). another possible option for the government could be to open the lockdown on rd may, but with social distancing measures.
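the intermittent-lockdown scenario above can be explored with a time-varying rate of spread: β switches between an "open" value and a "lockdown" value on a fixed schedule. the schedule lengths, rates and quarantine-model coupling below are assumed for illustration only.

```python
import numpy as np
from scipy.integrate import odeint

def beta_schedule(t, beta_open, beta_lock, windows):
    # windows: list of (start, end) days during which a lockdown is in force
    for start, end in windows:
        if start <= t < end:
            return beta_lock
    return beta_open

def rhs(y, t, nu, ft, n, beta_open, beta_lock, windows):
    s, q, uq, r = y
    beta = beta_schedule(t, beta_open, beta_lock, windows)
    new_inf = beta * s * uq / n
    return [-new_inf, ft * new_inf - nu * q,
            (1 - ft) * new_inf - nu * uq, nu * (q + uq)]

n = 10_000
t = np.linspace(0, 150, 1501)
# assumed staggered schedule: open for a week, 15-day lockdown,
# another week of relief, then another 15-day lockdown
windows = [(7, 22), (29, 44)]
y = odeint(rhs, [n - 1, 0, 1, 0], t,
           args=(0.1, 0.3, n, 0.4, 0.12, windows))
active = y[:, 1] + y[:, 2]
```

comparing this run against one with no lockdown windows shows how the staggered schedule suppresses cumulative infections during the intervention period.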
here, we have assumed the level of interaction after the lockdown to be an average of the levels of interaction before and during the lockdown, i.e. β_al = (β_bl + β_l)/2, where β_al is the rate of spread after the lockdown is lifted, β_bl is the rate of spread before the lockdown was imposed, and β_l is the rate of spread during the lockdown. in such a scenario, assuming the present levels of testing, the model predicts that the number of cases in kerala can be controlled by the first week of june (as shown in fig. ).

unlike kerala, the covid- -positive cases in india are still on the rise. as of th may, there are ~ , positive cases; hence removal of the lockdown is out of the question. figure shows the comparison between the model's prediction and the actual data. data till th may were used to estimate the parameters r0 and ft before and during lockdown. the estimated parameters, with a mean absolute percentage error of . %, are given below. as shown in table , the r0 value for india appears to have come down from . before lockdown to . during lockdown, but is still not low enough. the current estimates also indicate that india is testing % of its infected people. if this scenario continues, the model predicts that india can …

for the state of kerala, the estimated values of β and ft indicate that the pandemic is receding, and with a few careful decisions, it can be completely controlled. if the lockdown is lifted on rd may without any protective measures, the numbers will start rising again. in such a scenario, the following possible options can be explored in kerala, namely:
(1) lift the lockdown by rapidly increasing testing to %. in such a case, the spread can be controlled by may th.
(2) extend the lockdown to months while gradually increasing testing to %, in which case the spread can be controlled by the first week of july.
(3) intermittent or staggered lockdown while maintaining the current levels of testing, in which case the spread can be controlled by th june.
(4) lift the lockdown on rd may while ensuring adequate social distancing measures, in which case the spread can be controlled by the first week of june.

in the case of india, estimates show that the r0 value during lockdown has come down to . from . before lockdown. although improved, the existing r0 value is not low enough to call off the lockdown; hence removal of the country-wide lockdown must not be considered at present.

references:
• mathematical modelling using improved sir model with more realistic assumptions
• epidemic landscape and forecasting of sars-cov- in india
• report- : impact of non-pharmaceutical interventions (npis) to reduce covid- mortality and healthcare demand. imperial college covid- response team
• modified seir model to predict covid- outbreak in spain and italy: simulating control scenarios and multi-scale epidemics
• district wise list of reported cases (crowd source)
• our world in data: coronavirus source data
• epidemic analysis of covid- in china by dynamic modelling
• report of who-china joint mission on coronavirus disease. world health organization
• situation report- , coronavirus disease. world health organization
• situation report- , coronavirus disease. world health organization
• differential evolution: a simple and efficient adaptive scheme for global optimization over continuous spaces

the authors would like to express their sincere thanks to mustafa shahid, vssc, isro; ramesh mettu, vssc, isro; rani radhakrishnan, vssc, isro; and harish c.s, vssc, isro, for their valuable comments that helped shape this study and their contributions during the preparation of this manuscript.

key: cord- -dl z x h
authors: dandekar, r.; rackauckas, c.; barbastathis, g.
title: a machine learning aided global diagnostic and comparative tool to assess effect of quarantine control in covid- spread
we have developed a globally applicable diagnostic covid- model by augmenting the classical sir epidemiological model with a neural network module. our model does not rely upon previous epidemics like sars/mers, and all parameters are optimized via machine learning algorithms employed on publicly available covid- data. the model decomposes the contributions to the infection timeseries to analyze and compare the role of quarantine control policies employed in highly affected regions of europe, north america, south america and asia in controlling the spread of the virus. for all continents considered, our results show a generally strong correlation between strengthening of the quarantine controls as learnt by the model and actions taken by the regions' respective governments. finally, we have hosted our quarantine diagnosis results for the top affected countries worldwide on a public platform, which can be used for informed decision making by public health officials and researchers alike.
the coronavirus respiratory disease originating from the virus "sars-cov- " has led to a global pandemic, with , , confirmed global cases in more than countries as of july , . as the disease began to spread beyond its apparent origin in wuhan, the responses of local and national governments varied considerably. the evolution of infections has been similarly diverse, in some cases appearing to be contained and in others reaching catastrophic proportions. given the observed spatially and temporally diverse government responses and outcomes, the role played by the varying quarantine measures in different countries in shaping the infection growth curve is still not clear. with publicly available covid- data by country and worldwide now widely available, there is an urgent need for data-driven approaches to bridge this gap, and to quantitatively estimate and compare the role of the quarantine policy measures implemented in several countries in curtailing spread of the disease. as of this writing, a large number of papers have been made available, mostly in preprint form. existing models have one or more of the following limitations:

• lack of independent estimation: using parameters based on prior knowledge of sars/mers coronavirus epidemiology and not derived independently from the covid- data, or parameters like the rate of detection and the nature of the government response fixed prior to running the model.
• lack of global applicability: not implemented on a global scale.
• lack of interpretability: using several free/fitting parameters, making the model cumbersome and complicated for policy makers to reproduce and use.
in this paper, we propose a globally scalable, interpretable model with completely independent parameter estimation through a novel approach: augmenting a first-principles-derived epidemiological model with a data-driven module, implemented as a neural network. we leverage this model to quantify the quarantine strengths and to analyze and compare the role of quarantine control policies employed to control the virus effective reproduction number in the european, north american, south american and asian continents.

in a classical and commonly used model, known as seir, the population is divided into the susceptible s, exposed e, infected i and recovered r groups, and their relative growths and competition are represented as a set of coupled ordinary differential equations. the simpler sir model does not account for the exposed population e. these models cannot capture the large-scale effects of more granular interactions, such as the population's response to social distancing and quarantine policies. moreover, a major assumption of these models is that the rate of transitions between population states is fixed. in our approach, we relax this assumption by estimating the time-dependent quarantine effect on virus exposure with a neural network that informs the infected variable i in the sir model. this trained model thus decomposes the effects, and the neural network encodes information about the quarantine strength function in the locale where the model is trained.

in general, neural networks with arbitrary activation functions are universal approximators. unbounded activation functions in particular, such as the rectified linear unit (relu), have been known to be effective in approximating nonlinear functions with a finite set of parameters. thus, a neural network solution is attractive for approximating quarantine effects in combination with analytical epidemiological models.
the downside is that the internal workings of a neural network are difficult to interpret. the recently emerging field of scientific machine learning exploits conservation principles within a universal differential equation, sir in our case, to mitigate overfitting and other related machine learning risks. in the present work, the neural network is trained from publicly available infection and population data for covid- for a specific region under study; details are in the experimental procedures section. thus, our proposed model is globally applicable and interpretable, with parameters learned from current covid- data, and does not rely upon data from previous epidemics like sars/mers.

the classic sir epidemiological model is a standard tool for basic analysis concerning the outbreak of epidemics. in this model, the entire population is divided into three sub-populations: susceptible s; infected i; and recovered r. the sub-populations' evolution is governed by the following system of three coupled nonlinear ordinary differential equations:

ds/dt = −β s i / n
di/dt = β s i / n − γ i
dr/dt = γ i

here, β is the infection rate and γ is the recovery rate, and both are assumed to be constant in time. the total population n = s(t) + i(t) + r(t) is seen to remain constant as well; that is, births and deaths are neglected. the recovered population is to be interpreted as those who can no longer infect others, so it also includes individuals deceased due to the infection. the possibility of recovered individuals becoming reinfected is accounted for by seis models, but we do not use this model here, as the reinfection rate for covid- survivors is considered to be negligible as of now. the reproduction number r_t in the seir and sir models is defined as r_t = (β/γ) (s/n). an important assumption of the sir models is homogeneous mixing among the sub-populations; therefore, this model cannot account for social distancing or social network effects.
additionally, the model assumes uniform susceptibility and disease progress for every individual, and that no spreading occurs through animals or other non-human means. alternatively, the sir model may be interpreted as quantifying the statistical expectations of the respective mean populations, while deviations from the model's assumptions contribute to statistical fluctuations around the mean.

to study the effect of quarantine control globally, we start with the sir epidemiological model. figure a shows the schematic of the modified sir model, the qsir model, which we consider. we augment the sir model by introducing a time-varying quarantine strength rate term q(t) and a quarantined population t(t), which is prevented from having any further contact with the susceptible population. thus, the term i(t) denotes the infected population still having contact with the susceptibles, as in the standard sir model, while the term t(t) denotes the infected population who are effectively quarantined and isolated. thus, we can write an expression for the quarantined infected population t(t) as dt/dt = q(t) i(t). further, we introduce an additional recovery rate δ which quantifies the rate of recovery of the quarantined population, so that dt/dt = q(t) i(t) − δ t(t). based on the modified model, we define a covid spread parameter, in a similar way to the reproduction number defined in the sir model, as c_p = β / (γ + q(t)). c_p > 1 indicates that infections are being introduced into the population at a higher rate than they are being removed, leading to rapid spread of the disease. on the other hand, c_p < 1 indicates that the covid spread has been brought under control in the region of consideration. since q(t) does not follow from first principles and is highly dependent on local quarantine policies, we devised a neural network-based approach to approximate it.
recently, it has been shown that neural networks can be used as function approximators to recover unknown constitutive relationships in a system of coupled ordinary differential equations. following this principle, we represent q(t) as an n-layer-deep neural network with weights w_1, w_2 . . . w_n, activation function r and the input vector u = (s(t), i(t), r(t)), i.e. q(t) = w_n r(w_{n−1} r(. . . r(w_1 u))). for the implementation, we choose an n = -layer densely connected neural network with units in the hidden layer and the relu activation function. this choice was made because we found sigmoidal activation functions to stagnate. the final model was described by tunable parameters. the neural network architecture schematic is shown in figure b. the governing coupled ordinary differential equations for the qsir model follow from the sir equations, with the infected population additionally depleted by the quarantine term q(t)i(t), which feeds the quarantined population t(t). more details about the model initialization and parameter estimation methods are given in the experimental procedures section.

in all cases considered below, we trained the model using data starting from the dates when the th infection was recorded in each region and up to june . in each subsequent case study, q(t) denotes the rate at which infected persons are effectively quarantined and isolated from the remaining population, and thus gives composite information about (a) the effective testing rate of the infected population as the disease progressed and (b) the intensity of the enforced quarantine as a function of time. to understand the nature of the evolution of q(t), we look at the time point when q(t) approximately shows an inflection point, or a ramp-up point. an inflection point in q(t) indicates the time when the rate of increase of q(t), i.e. dq(t)/dt, was at its peak, while a ramp-up point corresponds to a sudden intensification of the quarantine policies employed in the region under consideration. we define the quarantine efficiency, q_eff, as the increase in q(t) within a month following the detection of the th infected case in the region under consideration.
thus the magnitude of q_eff shows how rapidly the infected individuals were prevented from coming into contact with the susceptibles in the first month following the detection of the th infected case, and thus contains composite information about the quarantine and lockdown strength, and about the testing and tracing protocols used to identify and isolate infected individuals.

figure shows the comparison of the model-estimated infected and recovered case counts with actual covid- data for the highest affected european countries as of june , namely: russia, uk, spain and italy, in that order. we find that, despite the small set of optimized parameters (note that the contact rate β and the recovery rate γ are fixed, and not functions of time), a reasonably good match is seen in all four cases. recovery rates are assumed to be constant in our model over the duration spanning the detection of the th infected case to june st. the average contact rate in spain and italy is seen to be higher than in russia and uk over the considered duration of − months, possibly because russia and uk were affected relatively late by the virus, which gave them sufficient time to enforce strict social distancing protocols prior to widespread outbreak. for spain and italy, the quarantine efficiency and also the recovery rate are generally higher than for russia and uk, possibly indicating more efficient testing, isolation and quarantine, and better hospital practices, in spain and italy. this agrees well with the ineffectiveness of testing, contact tracing and quarantine practices seen in uk. although the social distancing strength also varied with time, we do not focus on that aspect in the present study; it will be the subject of future studies. a higher quarantine efficiency combined with a higher recovery rate led spain and italy to bring down the covid spread parameter (defined in ( )), c_p, from > 1 to < 1 in and days respectively, as compared to days for uk and days for russia (figure ).
figure shows q_eff for the highest affected european countries. we can see that q_eff in the western european regions is generally higher than in eastern europe. this can be attributed to the strong lockdown measures implemented in western countries like spain, italy, germany and france after the rise of infections seen first in italy and spain. although countries like switzerland and turkey did not enforce as strict a lockdown as their west european counterparts, they were generally successful in halting the infection count before it reached catastrophic proportions, due to strong testing and tracing protocols. subsequently, these countries also managed to identify potentially infected individuals and prevented them from coming into contact with susceptibles, giving them a high q_eff score as seen in figure . in contrast, our study also manages to identify countries like sweden, which had very limited lockdown measures, with a low q_eff score as seen in figure . this strengthens the validity of our model in diagnosing the effectiveness of quarantine and isolation protocols in different countries, which agrees well with the actual protocols seen in these countries.

figure shows a reasonably good match between the model-estimated infected and recovered case counts and actual covid- data for the highest affected north american states (including states from mexico, the united states, and canada) as of june , namely: new york, new jersey, illinois and california. q(t) for new york and new jersey shows a ramp-up point immediately in the week following the detection of the th case in these regions, i.e. on march for new york and on march for new jersey (figure ). this matches well with the actual dates: march in new york and march in new jersey, when stay-at-home orders and isolation measures were enforced in these states. a relatively slower rise of q(t) is seen for illinois, while california shows a ramp-up a week after the detection of the th case.
although no significant difference is seen in the mean contact and recovery rates between the different us states, the quarantine efficiency in new york and new jersey is seen to be significantly higher than that of illinois and california (figure b), indicating the effectiveness of the rapidly deployed quarantine interventions in new york and new jersey. owing to their high quarantine efficiency, new york and new jersey were able to bring down the covid spread parameter c_p to less than 1 in days (figure ). on the other hand, although illinois and california came close to c_p = 1 after the day and day marks respectively, c_p still remained greater than 1 (figure ), indicating that these states were still in the danger zone as of june .

an important caveat to this result is the reporting of the recovered data. compared with europe, the recovery rates seen in north america are significantly lower (figures a,b). it should be noted that accurate reporting of recovery rates is likely to play a major role in this apparent difference. in our study, the recovered data include individuals who cannot further transmit infection, and thus include treated patients who are currently in a healthy state and also individuals who died due to the virus. since quantification of deaths can be done robustly, the death data are generally reported more accurately. however, there is no clear definition for quantifying the number of people who transitioned from infected to healthy. as a result, accurate and timely reporting of recovered data is seen to vary significantly between countries, with under-reporting of the recovered data being a common practice. since the effective reproduction number calculation depends on the recovered case count, accurate data regarding the recovered count are vital to assess whether the infection has been curtailed in a particular region or not.
thus, our results strongly indicate the need for each country to follow a particular metric for estimating the recovered count robustly, which is vital for data-driven assessment of the pandemic spread. figure a shows the quarantine efficiency for major us states spanning the whole country. figure b shows the comparison between a report published in the wall street journal on may , highlighting usa states based on their lockdown conditions, and the quarantine efficiency magnitudes in our study. the sizes of the circles represent the magnitude of the quarantine efficiency. the blue color indicates the states for which the quarantine efficiency was greater than the mean quarantine efficiency across all us states, while red indicates the opposite. our results indicate that the north-eastern and western states were much more responsive in implementing rapid quarantine measures in the month following early detection, as compared to the southern and central states. this matches the on-ground situation: a generally strong correlation is seen between the red circles in our study (states with lower quarantine efficiency) and the yellow regions in the wall street journal report (states with reduced imposition of restrictions), and between the blue circles in our study (states with higher quarantine efficiency) and the blue regions in the wall street journal report (states with generally higher levels of restrictions). this strengthens the validity of our approach, in which the quarantine efficiency is recovered through a trained neural network rooted in fundamental epidemiological equations. figure shows a reasonably good match between the model-estimated infected and recovered case counts and actual covid- data for the highest affected asian countries as of june , namely: india, china and south korea.
q(t) shows a rapid ramp-up in china and south korea (figure ), which agrees well with cusps in government interventions that took place in the weeks leading to and after the end of january and february for china and south korea respectively. on the other hand, a slow build-up of q(t) is seen for india, with no significant ramp-up. this is reflected in the quarantine efficiency comparison (figure c): q_eff is much higher for china and south korea than for india. south korea shows a significantly lower contact rate than its asian counterparts, indicating strongly enforced and followed social distancing protocols. no significant difference in the recovery rate is observed between the asian countries. owing to the high quarantine efficiency in china, and a high quarantine efficiency coupled with strongly enforced social distancing in south korea, these countries were able to bring down the covid spread parameter c_p from > 1 to < 1 in and days respectively, while it took days in india (figure ).

figure shows a reasonably good match between the model-estimated infected and recovered case counts and actual covid- data for the highest affected south american countries as of june , namely: brazil, chile and peru. for brazil, q(t) is seen to be approximately constant initially, with a ramp-up around the day mark, after which q(t) is seen to stagnate (figure a). the key difference in the covid progression in brazil compared to other nations is that the infected and the recovered (recovered healthy + dead in our study) counts are very close to one another as the disease progressed (figure ). owing to this, as the disease progressed, the new infected people introduced into the population were balanced by the infected people removed from the population, either by recovering or by dying.
this higher recovery rate, combined with a generally low quarantine efficiency and contact rate (figure d), manifests itself in the covid spread parameter for brazil being < 1 for almost the entire duration of the disease progression (figure a). for chile, q(t) is almost constant for the entire duration considered (figure b). thus, although government regulations were imposed swiftly following the initial detection of the virus, leading to a high initial magnitude of q(t), the government imposition subsequently became relaxed. this may be attributed to several political and social factors outside the scope of the present study. even for chile, the infected and recovered counts remain close to each other compared to other nations. a generally high quarantine magnitude coupled with a moderate recovery rate (figure d) leads to c_p being < 1 for the entire duration of the disease progression (figure b). in peru, q(t) shows a very slow build-up (figure c) with a very low magnitude. also, the recovered count is lower than the infected count compared to its south american counterparts (figure c). a low quarantine efficiency coupled with a low recovery rate (figure d) leads peru to be in the danger zone (c_p > 1) for days post detection of the th case (figure c).

our model captures the infected and recovered counts for highly affected countries in europe, north america, asia and south america reasonably well, and is thus globally applicable. along with capturing the evolution of infected and recovered data, the novel machine learning aided epidemiological approach allows us to extract valuable information regarding the quarantine policies, the evolution of the covid spread parameter c_p, the mean contact rate (social distancing effectiveness), and the recovery rate. thus, it becomes possible to compare across different countries, with the model serving as an important diagnostic tool.
our results show a generally strong correlation between the strengthening of quarantine controls (i.e., increasing q(t) as learnt by the neural network model), the actions taken by the regions' respective governments, and the decrease of the covid spread parameter c p , for all continents considered in the present study. based on the covid- data collected (details in the materials and methods section), we note that the accuracy and timeliness of recovered-data reporting vary significantly between countries, with under-reporting of the recovered data being a common practice. in the north american countries, for example, the recovered counts are significantly lower than in their european and asian counterparts. thus, our results strongly indicate the need for each country to follow a particular metric for estimating the recovered count robustly, which is vital for a data-driven assessment of the pandemic spread. the key highlights of our model are: (a) it is highly interpretable, with few free parameters rooted in an epidemiological model, (b) it relies only on covid- data and not on previous epidemics, and (c) it is highly flexible and adaptable to different compartmental modelling assumptions. in particular, our method can be readily extended to more complex compartmental models including hospitalization rates, testing rates and the distinction between symptomatic and asymptomatic individuals. thus, the methodology presented in the study can be readily adapted to any province, state or country globally, making it a potentially useful tool for policy makers in the event of future outbreaks or a relapse in the current one. finally, we have hosted our quarantine diagnosis results for the top affected countries worldwide on a public platform (https://covid ml.org/ or https://rajdandekar.github.io/covid-quarantinestrength/), which can be used for informed decision making by public health officials and researchers alike.
we believe that such a publicly available global tool will be of significant value for researchers who want to study the correlation between the quarantine strength evolution in a particular region and a wide range of metrics, spanning from mortality rate to the socio-economic impact of covid- in that region. currently, our model lacks forecasting abilities. in order to do robust forecasting based on the prior data available, the model needs to be further augmented through coupling with real-time metrics parameterizing social distancing, e.g. the publicly available apple mobility data. this could be the subject of future studies. the starting point t = for each simulation was the day at which infected cases were crossed, i.e. i ≈ . the number of susceptible individuals was assumed to be equal to the population of the considered region. also, in all simulations, the number of recovered individuals was initialized from data at t = as defined above. the quarantined population t (t) is initialized to a small number t (t = ) ≈ . the time-resolved data for the infected, i data , and recovered, r data , for each locale considered are obtained from the center for systems science and engineering (csse) at johns hopkins university. the neural network-augmented sir ode system was trained by minimizing the mean square error loss function l nn (w, β, γ, δ) = Σ t [log(i(t) + t (t)) − log(i data (t))]² + [log(r(t)) − log(r data (t))]², which includes the neural network's weights w. for most of the regions under consideration, w, β, γ, δ were optimized by minimizing the loss function given in ( ) . minimization was performed using local adjoint sensitivity analysis , following a procedure similar to that outlined in a recent study, with the adam optimizer and a learning rate of . . the number of iterations required for convergence varied based on the region considered and generally ranged from , − , .
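as a minimal sketch of the neural network-augmented sir system described above (the exact compartment wiring, the quarantine term, and the parameter names below are assumptions, since the study's display equations are not reproduced here), the right-hand side with a learnable quarantine strength q(t) might look like:

```python
import numpy as np

def qsir_rhs(t, y, beta, gamma, delta, Q):
    """Right-hand side of a quarantine-augmented SIR ('QSIR') system.
    Q is any callable returning the quarantine strength q(t); in the
    study it is parameterized by a neural network trained on case data."""
    S, I, T, R = y          # susceptible, infected, quarantined, recovered
    N = S + I + T + R       # total population (conserved)
    q = Q(t)
    dS = -beta * S * I / N                      # new infections
    dI = beta * S * I / N - (gamma + q) * I     # recovery and quarantining
    dT = q * I - delta * T                      # quarantined pool, leaving at rate delta
    dR = gamma * I + delta * T                  # recovered (healthy + dead) count
    return np.array([dS, dI, dT, dR])
```

plugged into any ode integrator (the study trains through the solver with adjoint sensitivities), the total population is conserved because the four derivatives sum to zero by construction.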
for regions with a low recovered count (all us states and the uk), we employed a two-stage optimization procedure to find the optimal w, β, γ, δ. in the first stage, ( ) was minimized. in the second stage, we fix the optimal γ, δ found in the first stage and optimize the remaining parameters w, β based on a loss function defined on the infected count alone, l(w, β) = Σ t [log(i(t) + t (t)) − log(i data (t))]². in the second stage, we do not include the recovered count r(t) in the loss function, since r(t) depends on γ, δ, which have already been optimized in the first stage. by placing more emphasis on minimizing the infected count, such a two-stage procedure leads to much more accurate model estimates when the recovered data count is low. the number of iterations required for convergence in both stages varied based on the region considered and generally ranged from , − , . preliminary versions of this work can be found at medrxiv . . . and arxiv: . . data for the infected and recovered case counts in all regions were obtained from the center for systems science and engineering (csse) at johns hopkins university. all code files are available at https://github.com/rajdandekar/mit-global-covid-modelling-project- . all results are publicly hosted at https://covid ml.org/ or https://rajdandekar.github.io/covid-quarantinestrength/.

references:
- a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster
- coronavirus disease (covid- ) situation summary
- coronavirus disease (covid- ) situation report -
- what china's coronavirus response can teach the rest of the world
- whose coronavirus strategy worked best? scientists hunt most effective policies
- first case of novel coronavirus in the united states
- hidden outbreaks spread through u.s. cities far earlier than americans knew, estimates say
- coronavirus in latin america: what governments are doing to stop the spread
- an aggregated dataset of clinical outcomes for covid- patients
- the effect of travel restrictions on the spread of the novel coronavirus
- forecasting covid- and analyzing the effect of government interventions
- the effect of human mobility and control measures on the covid- epidemic in china
- impact of nonpharmaceutical interventions (npis) to reduce covid- mortality and healthcare demand
- novel coronavirus -ncov: early estimation of epidemiological parameters and epidemic predictions
- estimation of the transmission risk of the -ncov and its implication for public health interventions
- early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
- nowcasting and forecasting the potential domestic and international spread of the -ncov outbreak originating in wuhan, china: a modelling study
- early dynamics of transmission and control of covid- : a mathematical modelling study. the lancet infectious diseases
- modelling the sars epidemic by a lattice-based monte-carlo simulation
- extension and verification of the seir model on the influenza a (h n ) pandemic in japan
- forecasting epidemics through nonparametric estimation of time-dependent transmission rates using the seir model
- approximations by superpositions of sigmoidal functions
- approximation capabilities of multilayer feedforward networks
- neural network with unbounded activation functions is universal approximator
- deep sparse rectifier neural networks
- maxout networks. th int. conf. mach. learn.
- improving deep neural networks for lvcsr using rectified linear units and dropout
- workshop report on basic research needs for scientific machine learning: core technologies for artificial intelligence
- universal differential equations for scientific machine learning
- analysis of a spatially extended nonlinear seis epidemic model with distinct incidence for exposed and infectives
- diffeqflux.jl - a julia library for neural differential equations
- spain orders nationwide lockdown to battle coronavirus. the guardian
- italy extends coronavirus lockdown to entire country, imposing restrictions on million people
- how did britain get its coronavirus response so wrong? guardian
- coronavirus: what are the lockdown measures across europe
- what switzerland did right in the battle against coronavirus. marketwatch
- coronavirus: how turkey took control of covid- emergency
- sweden has become the world's cautionary tale
- these states have some of the most drastic restrictions to combat the spread of coronavirus
- a guide to state coronavirus reopenings and lockdowns
- coronavirus cases have dropped sharply in south korea. what's the secret to its success
- what's behind south korea's covid- exceptionalism
- politics and poverty hinder covid- response in latin america
- mobility trend report
- adjoint sensitivity analysis for differential-algebraic equations: the adjoint dae system and its numerical solution
- adam: a method for stochastic optimization

this effort was partially funded by the intelligence advanced research projects activity (iarpa). we are grateful to emma wang for help with some of the simulations, and to haluk akay, hyungseok kim and wujie wang for helpful discussions and suggestions. the authors declare no conflicts of interest.

key: cord- - pjd authors: saeedian, m.; khalighi, m.; azimi-tafreshi, n.; jafari, g. r.; ausloos, m. title: memory effects on epidemic evolution: the susceptible-infected-recovered epidemic model date: - - journal: phys rev e doi: . /physreve. .
sha: doc_id: cord_uid: pjd memory has a great impact on the evolution of every process related to human societies. among them, the evolution of an epidemic is directly related to the individuals' experiences. indeed, any real epidemic process is clearly sustained by a non-markovian dynamics: memory effects play an essential role in the spreading of diseases. including memory effects in the susceptible-infected-recovered (sir) epidemic model seems very appropriate for such an investigation. thus, the memory-prone sir model dynamics is investigated using fractional derivatives. the decay of long-range memory, taken as a power-law function, is directly controlled by the order of the fractional derivatives in the corresponding nonlinear fractional differential evolution equations. here we assume the "fully mixed" approximation and show that the epidemic threshold is shifted to higher values than those for the memoryless system, depending on this memory "length" decay exponent. we also consider the sir model on structured networks and study the effect of topology on threshold points in a non-markovian dynamics. furthermore, the lack of access to precise information about the initial conditions or the past events plays a very relevant role in the correct estimation or prediction of the epidemic evolution. such a "constraint" is analyzed and discussed. the study of epidemiology, concerning the dynamical evolution of diseases within a population, has attracted much interest during recent years [ ]. mathematical models of infectious diseases have been developed in order to integrate realistic aspects of disease spreading [ ] [ ] [ ] [ ]. a simple and commonly studied model, introduced by kermack and mckendrick, is the susceptible-infected-recovered (sir) model [ ]. in this model, individuals can be in one of three states: susceptible, infected, and recovered (removed), denoted by s, i, and r, respectively.
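the per-time-step rules just described (transmission probability β per susceptible contact, recovery probability μ per step) can be sketched as a discrete-time simulation on an arbitrary network; the adjacency-list representation and the function name are illustrative choices, not from the paper:

```python
import random

def sir_network(adj, beta, mu, seed, rng=None):
    """Discrete-time SIR on a network: each step, every infected node
    transmits to each susceptible neighbour with probability beta,
    then recovers with probability mu. Returns the outbreak size
    (number of nodes ever infected)."""
    rng = rng or random.Random(0)
    state = {v: "S" for v in adj}
    state[seed] = "I"
    infected = {seed}
    while infected:
        newly_infected, newly_recovered = set(), set()
        for u in infected:
            for v in adj[u]:
                if state[v] == "S" and rng.random() < beta:
                    newly_infected.add(v)
            if rng.random() < mu:
                newly_recovered.add(u)
        for v in newly_infected:
            state[v] = "I"          # new infections act from the next step
        for u in newly_recovered:
            state[u] = "R"          # removed nodes never transmit again
        infected = (infected | newly_infected) - newly_recovered
    return sum(1 for s in state.values() if s != "S")
```

for example, on a five-node path graph with beta = mu = 1 the infection deterministically sweeps the whole line, while beta = 0 leaves only the seed ever infected.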
originally, it is assumed that susceptible individuals become infected at a rate proportional to the fraction of infected individuals in the overall population (fully mixed approximation) and infected individuals recover at a constant rate. the epidemic process presents a (percolation) transition between a phase in which the disease outbreak reaches a finite fraction of the population and a phase with only a limited number of infected individuals [ , ]. the model has also been investigated for populations on lattices (e.g., [ ] [ ] [ ]) or on networks (e.g., [ ] [ ] [ ]). for simplicity, we will keep the "medical epidemic" vocabulary hereafter. however, the model has also been used to describe nonmedical epidemics, such as financial bubbles [ , ], migration [ ], opinion formation [ , ], or internet "worm propagation" [ ] [ ] [ ]. sir models with distributed delay and with discrete delay have also been studied [ , ]. in the usual sir model, it is assumed that all contacts transmit the disease with the same probability. moreover, the transmission and recovery coefficients are constant. hence the state of the system at each time does not depend on the previous history of the system: it is a memoryless, so-called markovian, process. however, real surveys show evidence of a non-markovian spreading process [ , ], in agreement with common expectation. the evolution and control of epidemic processes in human societies cannot be considered without memory effects. when a disease spreads within a human population, the experience or knowledge of individuals about that disease should affect their response [ ]. if people know about the history of a certain disease in the area where they live, they use different precautions, such as vaccination, when possible. thus, some endogenously controlled suppression of the spreading is expected, although other factors can help [ ] [ ] [ ].
however, knowledge about the history of a disease does not have the same influence at all times. experience about the prevalence of a disease and precautions related to the "old times" are not always applicable or recommended, hence people tend to follow new strategies against the diseases. in other words, memory of earlier times could have less effect on the present situation than memory of more recent times. it can be expected that long-range memory effects decay in time more slowly than exponentially, typically behaving like a power-law damping function. while much effort has been made so far to determine exact epidemic thresholds in markovian epidemic models [ ] [ ] [ ] [ ] [ ], few works have been devoted to studying the non-markovian aspects of epidemic processes [ , ]. furthermore, in this work we focus on long-range memory effects, which means that an arbitrarily long history can be included. that is in contrast to short-term memory effects, which have been extensively studied. for instance, dodds and watts [ ] introduced a general model of contagion considering memory of past exposures to a contagious influence. the authors have argued that their model can fall into one of three universal classes, due to the behavior of fixed-point curves. also, in [ ] [ ] [ ], the authors consider "implicit memory" by applying asynchronous adaptation in disease propagation. they show that this type of memory can lead to a first-order phase transition in outbreaks; thus hysteresis can arise in such models [ ]. it is here briefly recalled that fractional calculus is a valuable tool to observe the influence of memory effects on the dynamics of systems [ ] [ ] [ ] [ ] [ ] [ ], and has recently been used in epidemiological models [ ] [ ] [ ] [ ]. typically, the evolution of epidemiological models is described with differential equations, the derivatives being of integer order.
by replacing the ordinary time derivative by a fractional derivative, a time correlation function or memory kernel appears, thereby making the state of the system dependent on all past states. thus, such a method based on derivatives of noninteger order, as introduced by caputo for geophysics problems [ ], is a well-suited formalism for such non-markovian problems. moreover, caputo's formalism provides the advantage that it is not necessary to define fractional-order initial conditions when solving such differential equations [ ] [ ] [ ] [ ]. furthermore, the time correlation function in the definition of the caputo fractional derivative is a power-law function, which is flexible enough to reflect the fact that the contribution of earlier states is noticeably less relevant to the present state of the dynamical system than the contribution of more recent ones. most of the previous works have studied epidemiological models with fractional-order differential equations from a mathematical point of view. they mainly focused on presenting effective mathematical methods in order to solve the corresponding differential equations [ , [ ] [ ] [ ]]. for instance, in [ ] a mathematical tool (the multistep generalized differential transform method) is introduced to approximate the numerical solution of the sir model with fractional differential equations. also in [ ] the authors use fractional-order differential equations for epidemic models and concentrate on the equilibrium points of the models and their asymptotic stability. other variations of the sir model with fractional derivatives have also been studied. for instance, seo et al. introduced the sir epidemic model with a square root interaction of the susceptible and infected individuals and discussed the local stability analysis of the model [ ].
also in [ ], a numerical solution of the sir epidemic model of fractional order with two levels of infection for the transmission of viruses in a computer network has been presented. in all these previous works, the authors rarely discuss the effect of fractional-order differential equations and memory on the epidemic thresholds and the macroscopic behavior of epidemic outbreaks. hence, one question remains, which we address in this paper: how does the robustness of the system change if memory is included in the sir model? we also use the fractional differential equations describing the sir model on structured networks to see the effect of topology on the evolution of the sir model including memory effects. furthermore, the lack of access to accurate information on initial conditions sometimes leads to doubt about epidemic evolution predictions [ ]. the same type of difficulty occurs in related problems, such as opinion formation [ , ]. moreover, it may also happen in certain cases that individuals do not believe in old strategies to avoid the disease. this means that the initial time for taking into account the disease-control memory is shifted toward more recent times: thereafter, the dynamics evolves with a new fraction of susceptible and infected individuals, different from that predicted by the solution of the differential equations. in contrast, the fractional calculus method allows us to choose any arbitrary initial time at which the effect of initial conditions can be introduced into the spreading dynamics with a memory content. the interest of fractional calculus will appear through such aspects in the core of the paper. thus, the paper is organized as follows. in sec. ii, following caputo's approach, we convert the differential equations of the standard sir model to fractional derivatives, thereby allowing us to consider memory effects. using numerical analysis results (sec. iii), we discuss the influence of memory on the epidemic thresholds in sec.
iii a. we also discuss the dynamics of a non-markovian epidemic process when choosing different initial conditions or modifying the proportions of agents at a given time in sec. iii b. to complete our discussion, we study the dynamics of the model on structured networks in sec. iv. we also point out that we have observed qualitatively similar results for the sis (susceptible-infected-susceptible) epidemic model. the conclusions are found in sec. v. the evolution of the standard sir model is described by a set of coupled ordinary differential equations for the susceptible (s), infected (i), and recovered (r) individuals, respectively, in which β and γ are the infection and recovery coefficients. an infected individual makes β contacts per unit time producing new infections, within a mean infectious time of order 1/γ. the evolution of the model is controlled by the quantity β/γ, such that above the epidemic threshold (β/γ) c the disease spreads among a finite fraction of individuals. these (ordinary) differential equations describe a markov epidemic process, in which the state of individuals at each time step does not depend on previous steps. the set of eqs. ( ) can be solved iteratively until time t. in particular, the fraction of susceptible individuals at time t, denoted as s t , can be determined. in fact, 1 − s t is the size of outbreaks, i.e., the population that has or has had the disease until time t. in order to observe the influence of memory effects, we first rewrite the differential equations ( ) in terms of time-dependent integrals, in which κ(t − t′) plays the role of a time-dependent kernel; it is equal to a delta function δ(t − t′) in a classical markov process. in fact, any arbitrary function can be replaced by a sum of delta functions, thereby leading to a given type of time correlations.
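the display equations referred to above were lost in extraction; from the surrounding definitions (infection coefficient β, recovery coefficient γ, fully mixed approximation, kernel κ) they presumably take the standard form:

```latex
% standard SIR set (fully mixed approximation)
\frac{ds}{dt} = -\beta\, s\, i , \qquad
\frac{di}{dt} = \beta\, s\, i - \gamma\, i , \qquad
\frac{dr}{dt} = \gamma\, i ;
% time-integral rewriting with kernel \kappa; a Markov process
% corresponds to \kappa(t-t') = \delta(t-t')
\frac{ds}{dt} = -\beta \int_{t_0}^{t} \kappa(t-t')\, s(t')\, i(t')\, \mathrm{d}t' , \qquad
\frac{dr}{dt} = \gamma \int_{t_0}^{t} \kappa(t-t')\, i(t')\, \mathrm{d}t' .
```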
a proper choice, in order to include long-term memory effects, can be a power-law function, which exhibits a slow decay such that the state of the system at quite early times also contributes to the evolution of the system. this type of kernel guarantees the existence of scaling features, as is often intrinsic in most natural phenomena. thus, let us consider the following power-law correlation function for κ(t − t′), in which 0 < α ≤ 1 and Γ(x) denotes the gamma function. the choice of the coefficient 1/Γ(α − 1) and exponent (α − 2) allows us to rewrite eqs. ( ) in the form of fractional differential equations with the caputo-type derivative. if this kernel is substituted into eqs. ( ), the right-hand sides of the equations are, by definition, fractional integrals of order α − 1. applying a fractional caputo derivative of order α − 1 on both sides of each eq. ( ), and using the fact that the caputo fractional derivative and fractional integral are inverse operators, the following fractional differential equations can be obtained for the sir model, where D α t denotes the caputo derivative of order α, defined for an arbitrary function y(t) as in [ ]. hence, the fractional derivatives, introducing a convolution integral with a power-law memory kernel, are useful to describe memory effects in dynamical systems. the decay rate of the memory kernel (a time-correlation function) depends on α. a lower value of α corresponds to a more slowly decaying time-correlation function (longer memory). hence, in some sense, the strength (through the "length") of the memory is controlled by α. as α → 1, the influence of memory decreases: the system tends toward a memoryless system. note that for simplicity we assume the same memory contributions (same value of α) for the different states s, i, and r. obviously, functions more complicated than eq. ( ), taking into account different α i (i = , , ), could be investigated in further work to account for different time scales.
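the kernel, the resulting fractional system, and the caputo definition mentioned here are elided display equations; from the stated coefficient 1/Γ(α − 1) and exponent (α − 2) they presumably read as follows (standard caputo convention; the exact prefactors are an assumption):

```latex
% power-law memory kernel and the resulting fractional SIR system
\kappa(t-t') = \frac{(t-t')^{\alpha-2}}{\Gamma(\alpha-1)} , \qquad
{}^{C}\!D^{\alpha}_{t}\, s = -\beta\, s\, i , \quad
{}^{C}\!D^{\alpha}_{t}\, i = \beta\, s\, i - \gamma\, i , \quad
{}^{C}\!D^{\alpha}_{t}\, r = \gamma\, i ,
% Caputo derivative of order 0 < \alpha < 1; \alpha \to 1 recovers
% the ordinary first derivative (memoryless limit)
{}^{C}\!D^{\alpha}_{t}\, y(t) = \frac{1}{\Gamma(1-\alpha)}
\int_{0}^{t} \frac{y'(t')}{(t-t')^{\alpha}}\, \mathrm{d}t' .
```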
although analytical solutions of eqs. ( ) are hard to obtain in the general case, they can be obtained at the early stage of the epidemic under a linearization approximation. in this case, it turns out that the number of infected individuals behaves as a mittag-leffler function [ ], in which ζ is a constant that depends on the initial conditions [ ]. in particular, for α = 1, the mittag-leffler function reduces to the exponential function. thus, in the early stage of the epidemic dynamics, the growth rate of the infected population in eq. ( ) is positive if β − γ > 0. therefore, the number of infected individuals grows exponentially in such a case, for β > γ, as is of course expected for the standard memoryless sir model. the same reasoning applies in order to determine the epidemic threshold for α < 1. let it be reemphasized that eqs. ( ) consist of a system of coupled nonlinear differential equations of fractional order, of the following general form: where i = , , and y ( ) , y ( ) , y ( ) denote the s, i, r cases, respectively. also, the y (i ) are constants which indicate the initial conditions. to solve the equations, we use the predictor-corrector algorithm, which is well known for obtaining numerical solutions of first-order problems [ ] [ ] [ ]. it is assumed that there exists a unique solution for each of the y (i) on the interval [ ,t ] for a given set of initial conditions. considering a uniform grid {t n = nh : n = 0, 1, 2, . . . , N}, in which N is an integer and h ≡ T/N, each eq. ( ) can be rewritten in a discrete form in which the coefficients b n−k−1 describe the contribution of each of the n − 1 past states to the present state n. after solving eq. ( ) numerically, the influence of memory on the evolution of the sir epidemic model can be analyzed. as mentioned in the introduction, let us consider two pertinent aspects successively: the finite-time behavior and the role of changing initial conditions.
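the predictor-corrector scheme itself is not reproduced in the text; as a minimal sketch under the same discretization ideas, the simpler explicit product-rectangle (fractional euler) rule for the caputo system, with memory weights b_j = (j + 1)^α − j^α, already shows the qualitative effect. the parameter values and function name are illustrative assumptions:

```python
import numpy as np
from math import gamma

def frac_sir(alpha, beta, gam, S0, I0, T=100.0, N=2000):
    """Fractional-order SIR via the explicit product-rectangle
    (fractional Euler) rule for the Caputo derivative of order alpha.
    Returns the S, I, R trajectories on a uniform grid of step h = T/N."""
    h = T / N
    j = np.arange(N)
    b = (j + 1.0)**alpha - j**alpha      # memory weights b_j
    c = h**alpha / gamma(alpha + 1.0)    # common prefactor h^alpha / Gamma(alpha+1)
    S = np.empty(N + 1); I = np.empty(N + 1); R = np.empty(N + 1)
    S[0], I[0], R[0] = S0, I0, 1.0 - S0 - I0
    fS = np.empty(N); fI = np.empty(N); fR = np.empty(N)
    for n in range(N):
        fS[n] = -beta * S[n] * I[n]
        fI[n] = beta * S[n] * I[n] - gam * I[n]
        fR[n] = gam * I[n]
        w = b[:n + 1][::-1]              # weight b_{n-k} multiplies state k
        S[n + 1] = S[0] + c * np.dot(w, fS[:n + 1])
        I[n + 1] = I[0] + c * np.dot(w, fI[:n + 1])
        R[n + 1] = R[0] + c * np.dot(w, fR[:n + 1])
    return S, I, R
```

for α = 1 all weights equal 1 and c = h, so the rule reduces to the ordinary explicit euler scheme, recovering the memoryless sir model; for α < 1 every past state contributes to each update, which is exactly the long-range memory discussed in the text.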
let us compare the evolution of a system including memory effects with the memoryless case. we solve eq. ( ) with initial conditions y ( ) = s = − , y ( ) = i = . figure shows that the epidemic size is zero (with accuracy − ) for small values of β/γ. the specific value of β/γ at which the epidemic size starts to take a nonzero value is identified as the epidemic threshold. the stationary time for a memoryless system (α = 1) is t = . with decreasing α (i.e., when memory is included), the system needs more time to reach the stationary state. hence, at t = , the threshold point is shifted to a higher value of β/γ. figure shows that the threshold point increases with decreasing α at finite time t. furthermore, as can be seen in fig. , the size of outbreaks decreases with decreasing α. let the interval [t ,t] be the time interval in which memory effects are taken into account. in fig. , we compare the evolution of the model with memory for different values of the finite time t. the memory effects are considered for a weight α = . . it is seen that as time evolves the influence of memory decreases, since memory effects decay in time like a power law ( fig. ). furthermore, for a given β/γ value, there is then more time available for disease spreading, whence more individuals become infected. recall that the dynamics of a non-markovian process is directly influenced by all events since the beginning of the process. however, some loss of information about some period of time in the past may lead one to consider that the influence of memory need not be considered as continuous. it may happen, in many social networks, that individuals do not have enough information about the history of a disease, as recent cases and studies indicate; e.g., see [ ] [ ] [ ] [ ]. only after several individuals have already been infected do people start to increase their knowledge about the disease and take different precautions.
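in the memoryless limit (α = 1), the threshold read off from the epidemic-size curve can be cross-checked against the classical final-size relation of the fully mixed sir model, s_∞ = s_0 exp[−(β/γ)(1 − s_∞)]; a small fixed-point sketch (the initial susceptible fraction s_0 is an illustrative choice):

```python
import numpy as np

def outbreak_size(beta_over_gamma, S0=1.0 - 1e-3, tol=1e-10):
    """Final outbreak size 1 - s_inf of the memoryless (alpha = 1) SIR model,
    from the implicit relation s_inf = S0 * exp(-(beta/gamma) * (1 - s_inf)),
    solved by fixed-point iteration."""
    lam = beta_over_gamma
    s = S0
    for _ in range(10000):
        s_new = S0 * np.exp(-lam * (1.0 - s))
        if abs(s_new - s) < tol:
            break
        s = s_new
    return 1.0 - s
```

below β/γ = 1 the outbreak stays at the level of the initial seed, while above the threshold a finite fraction of the population is reached, mirroring the numerical identification of the threshold as the point where the epidemic size becomes nonzero.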
the question arises of how the "initial time" at which a non-markovian process is started affects the subsequent dynamics of the process. if two markovian processes start at two different times, the evolution of both processes is identical. however, the scenario is quite different for a non-markovian process, i.e., one in which the memory plays a role. this is illustrated in fig. , where the fractions of susceptible, infected, and removed individuals are compared for a markov and a non-markov epidemic process. continuous and dashed (black) lines correspond to a system with and without memory effects, respectively, evolving from the same initial time t = . as can be seen, the fraction of susceptible individuals is greater in the system including memory effects than in the one ignoring memory [ fig. (a) ]. in other words, the experience and knowledge which individuals have about the disease obviously help them protect themselves against it. equivalently, in a system including memory effects, the infection grows more slowly, as seen in fig. (b). thereafter, consider that a non-markovian process, including memory effects, has evolved until a specific time t . let the process continue its evolution, but let the memory of the system be removed at that time. this corresponds to a new initial time and new initial conditions for the epidemic spreading. the process can be continued without or with memory. the markovian case is trivial thereafter and thus not discussed. instead, consider that memory effects are only taken into account from this "new initial time" onward. in other words, let the population ignore the disease-control history (memory) until t ; let the system continue its dynamics but take memory effects into account from t onward. the initial conditions for the evolution of the system are now the fractions of susceptible and infected individuals at time t .
the curves with square symbols in fig. correspond to what happens for different "new initial times" t = , , in the dynamics of such a non-markovian epidemic process. as can be seen, at the beginning of the dynamics the fraction of susceptible individuals is reduced, since people do not know about the disease. however, as soon as it is influenced by memory, the system becomes more resilient to the spreading. hence, the fraction of susceptible individuals remains greater as compared with a memoryless system started at t = . in a similar manner, the fractions of infected and removed individuals deviate from the original ones and tend toward the more populated states of a memoryless system as the memory insertion time is moved further forward. in this case, the curves become closer to the dashed curve corresponding to a memoryless system. that means that the system loses the information related to past times and tends to present a behavior similar to that of a memoryless system. finally, one can consider removing the memory of an epidemic process at various successive times. at each such time, the system is supposed to lose (or practically negate) the information about the disease before some "re-awareness time" (see also [ ]) and to continue its dynamics regardless of the past. for illustration, consider the case of such sudden, repeated memory removals. notice that in this particular illustrative case, the behavior of the system is close to the dynamics of a memoryless system, since contributions of the memory of the system are repeatedly removed. such an illustration points to the interest of the model for comparison with the case of epidemic spreading waves [ ]; for completeness, let it be pointed out that the connection of periodic epidemics to sir models has already been mentioned [ ]: flu is yearly recurrent. notice also that the value of α could be modified at each new awareness time, but this investigation is beyond the scope of the present paper.
so far we have considered the fully mixed approximation, such that an infected individual is equally likely to spread the disease to any other individual. however, in the real world an individual is connected to only a small fraction of the population. hence, as is well known, more realistic modeling can be achieved through networks, whose topology has a significant effect on the epidemic process [ ] [ ] [ ]. for homogeneous networks, each individual has the same number of connections k ≈ ⟨k⟩ and the disease propagates with spreading rate β⟨k⟩. in this case, it is obvious that the epidemic threshold (β/γ) c is simply replaced by (β/γ) c /⟨k⟩. this also holds for the fractional differential equations ( ). in other words, the threshold point in fig. for each value of α is shifted to (β/γ) c /⟨k⟩. now, let us consider heterogeneous scale-free networks with degree distribution p(k) ∼ k −λ . in the heterogeneous mean-field approximation, it is assumed that all nodes are statistically equivalent, and thus one can consider groups of nodes with the same degree k. with this assumption, the ordinary differential equations describing the sir model are written per degree class, where i k , s k , and r k denote the densities of infected, susceptible, and removed nodes in each group, respectively. it turns out that in scale-free networks characterized by a degree exponent 2 < λ ≤ 3, there is no epidemic threshold [ ]. following the same procedure presented in sec. ii, we can rewrite eqs. ( ) with fractional derivatives. for a network with degree exponent λ = , we solve eqs. ( ) numerically. figure shows the evolution of the fraction of total infected individuals i(t) = Σ k p(k) i k (t), considering memory effects with different values of α. while for a memoryless sir model (α = 1) the system reaches a stationary state after a short time (t ), the stationary time increases with decreasing α. furthermore, we obtain the size of outbreaks at a finite time.
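the elided degree-class equations presumably take the standard heterogeneous mean-field form; the (k′ − 1) factor in Θ (discounting the link through which a node received the infection) is a common sir convention and an assumption here:

```latex
% degree-class ('heterogeneous mean-field') SIR equations
\frac{ds_k}{dt} = -\beta\, k\, s_k\, \Theta(t) , \quad
\frac{di_k}{dt} = \beta\, k\, s_k\, \Theta(t) - \gamma\, i_k , \quad
\frac{dr_k}{dt} = \gamma\, i_k ,
% probability that a randomly chosen neighbour is infected
\Theta(t) = \frac{\sum_{k'} (k'-1)\, P(k')\, i_{k'}(t)}{\langle k \rangle} .
```

the fractional version then follows by replacing the ordinary time derivative with the caputo derivative of order α, as in sec. ii.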
Figure shows 1 − s, measured with finite accuracy until t = , for different values of α. As we can see, the epidemic threshold is always zero, as it is for Markovian epidemic spreading on scale-free networks with λ = . However, the size of the epidemic decreases with decreasing α. The same results are obtained for networks with < λ < . However, for λ > , the epidemic threshold is shifted when the memory is included, similar to what is observed for homogeneous networks. Memory plays a significant role in the evolution of many real dynamical processes, including the case of epidemic spreading. Here we have reported a study of the evolution of the SIR epidemic model considering memory effects. Using the fractional calculus technique, we show that the dynamics of such a system depends on the strength of the memory effects, controlled by the order of the fractional derivative α. At finite times, with memory effects included, the epidemic threshold (β/γ)_c is shifted to higher values than those for memoryless systems, by amounts depending on the memory decay rate α. In the case that the model evolves on heterogeneous scale-free networks with < λ , the threshold point is always zero; however, the fraction of individuals who are infected or recovered is reduced if the memory "length" increases. Hence, memory renders the system more robust against the disease spreading. If the epidemic process evolves further in time, for a fixed memory strength, (i) the disease can infect more individuals and (ii) the epidemic threshold is shifted to smaller values and tends to the memoryless value. Furthermore, we have shown the following result: the evolution of an epidemic process including memory effects depends strongly on the fraction of infected individuals at the moment when the memory effects are inserted into the evolution.
During a non-Markovian epidemic process, if the system abruptly loses its memory at a definite time, and if from that time on one lets the non-Markovian process continue again, starting with the number of infected individuals at that time, the dynamics of the system deviates from the basic case, in which the system continuously includes memory effects from the beginning of the process. Our observations are obtained from a simple epidemiological model: the SIR model. Obviously, many parameters are assumed here to be constant. We are aware that some factors, e.g., policy or feedback, might influence the parameter values; they may depend on space, groups, and time, and external field conditions may also surely influence real situations. However, we expect that many qualitative behaviors such as those presented here are likely to be quite generally found in reality. More advanced epidemic models, based on various types of complex networks, are surely interesting subjects for further investigation, in line with investigations such as, e.g., those in [ ] [ ] [ ] [ ] [ ]. We also wish to point out that we have observed qualitatively similar results for the SIS epidemic model. Finally, we may claim that our results are not limited to epidemiological ("medical") models, but can also be extended to the analogous epidemic spreading of rumors, gossip, opinions, religions, and other topics pertinent to epidemics on many social networks.
• knowledge epidemics and population dynamics models for describing idea diffusion, in: models of science dynamics: encounters between complexity theory and information sciences
• introduction to percolation theory
• the logistic map and the route to chaos: from the beginning to modern applications
• a mixed abstraction level simulation model of large-scale internet worm infestations
• measurement and analysis of worm propagation on internet network topology, in: proceedings of the international conference on computer communications and networks (ICCCN) (unpublished)
• climate change and crop production, CABI climate change series
• fractional calculus: an introduction for physicists
• applications of fractional calculus in physics
• fractional differential equations
• theory and applications of fractional differential equations, North-Holland mathematics studies
• the FracPECE subroutine for the numerical solution of differential equations of fractional order, Forschung und wissenschaftliches Rechnen
• changing waves: the epidemics of and , Proc. Natl. Acad. Sci. USA

It could be instructive to study fractional-order operators within a geometric interpretation (see the interesting references in [ ]). Here we compare the time scales in fractional- and integer-order dynamics. To picture a geometric interpretation, let us consider the right-sided fractional integral of order α,

x_α(t) = (1/Γ(α)) ∫_0^t (t − τ)^(α−1) v(τ) dτ,   (A1)

and write it in the form

x_α(t) = ∫_0^t v(τ) dT_t(τ),   (A2)

where

T_t(τ) = (1/Γ(α+1)) [ t^α − (t − τ)^α ].   (A3)

If we compare eq. (A2) with its counterpart x_1(t) = ∫_0^t v(τ) dτ in the homogeneous time scheme, the main difference is related to the different time variables, T_t(τ) and τ. Notice that the time variable T_t(τ) has a scaling property: if we take t' = kt and τ' = kτ, then T_{t'}(τ') = k^α T_t(τ). Hence, in the fractional-order dynamics, time is "accelerating" in the early times and after that it is "slowing down," as sketched in fig. . In this case, the "passing time" along the two axes of time is not the same.
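The scaling property and the connection between T_t(τ) and the fractional kernel can be checked numerically; the short Python sketch below (ours, purely illustrative) verifies that dT_t(τ)/dτ reproduces the Riemann-Liouville kernel (t − τ)^(α−1)/Γ(α) and that T_{kt}(kτ) = k^α T_t(τ).

```python
from math import gamma

def T(t, tau, a):
    """Transformed time T_t(tau) = [t^a - (t - tau)^a] / Gamma(a + 1)."""
    return (t**a - (t - tau)**a) / gamma(a + 1)

def rl_kernel(t, tau, a):
    """Riemann-Liouville kernel (t - tau)^(a-1) / Gamma(a)."""
    return (t - tau) ** (a - 1) / gamma(a)
```

For v(τ) ≡ 1 the integral ∫_0^t dT_t(τ) = T_t(t) = t^α/Γ(α+1), which is the familiar fractional integral of a constant.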
For this reason, in epidemic "fractional" dynamics the epidemic threshold is shifted to higher values. A lower α indicates a "stronger" (longer-lasting) memory and a more pronounced shift of the threshold point. However, if one waits long enough, the same behavior will be observed in both fractional and usual homogeneous time: in fractional dynamics, after a "very long" time, the threshold point coincides with the one appearing in integer-order dynamics.

key: cord- -rvg ayp authors: ponce, marcelo; sandhel, amit title: covid19.analytics: an r package to obtain, analyze and visualize data from the corona virus disease pandemic date: - - journal: nan doi: nan sha: doc_id: cord_uid: rvg ayp

With the emergence of a new pandemic worldwide, a novel strategy to approach it has also emerged. Several initiatives under the umbrella of "open science" are contributing to tackle this unprecedented situation. In particular, the "R language and environment for statistical computing" offers an excellent tool and ecosystem for approaches focusing on open science and reproducible results. Hence it is not surprising that, with the onset of the pandemic, a large number of R packages and resources were made available for researchers working on the pandemic. In this paper we present an R package that allows users to access and analyze worldwide data from publicly available resources. We introduce the covid19.analytics package, focusing on its capabilities and presenting a particular case study in which we describe how to deploy the "covid19.analytics dashboard explorer". In 2019 a novel type of corona virus was first reported, originally in the province of Hubei, China. In a time frame of months this new virus was capable of producing a global pandemic of the corona virus disease (COVID-19), which can end up in a severe acute respiratory syndrome (whence the name of the virus, SARS-CoV-2).
The origin of the virus is still unclear [ , , ], although some studies based on genetic evidence suggest that it is quite unlikely that this virus was human-made in a laboratory, and instead point towards cross-species transmission [ , ]. Although this is not the first time in human history that humanity faces a pandemic, this pandemic has unique characteristics. To begin with, the virus is "peculiar", as not all infected individuals experience the same symptoms. Some individuals display symptoms that are similar to those of a common cold or flu, while other individuals experience serious symptoms that can cause hospitalization with different levels of severity, including stays in intensive-care units (ICU) for several weeks or even months, or death. A recent medical survey shows that the disease can transcend pulmonary manifestations, affecting several other organs [ ]. Studies also suggest that the level of severity of the disease can be linked to previous conditions [ ], gender [ ], or even blood type [ ], but the fundamental, underlying reasons still remain unclear. Some infected individuals are completely asymptomatic, which makes them ideal vectors for disseminating the virus. This also makes it very difficult to precisely determine the transmission rate of the disease, and it is argued, in part due to the peculiar characteristics of the virus, that some initial estimates were underestimating the actual value [ ]. The elderly are the most vulnerable to the disease, and reported mortality rates vary depending on the geographical location. In addition to this, the high connectivity of our modern societies makes it possible for a virus like this to spread widely around the world in a relatively short period of time. What is also unprecedented is the pace at which the scientific community has engaged in fighting this pandemic on different fronts [ ].
Technology and scientific knowledge are and will continue playing a fundamental role in how humanity faces this pandemic, helping to reduce the risk for individuals of being exposed to or suffering serious illness. Techniques such as DNA/RNA sequencing, computer simulations, and model generation and prediction are nowadays widely accessible and can help in a great manner to evaluate and design the best course of action in a situation like this [ ]. Public health organizations are relying on mathematical and data-driven models (e.g. [ ]) to draw policies and protocols in order to try to mitigate the impact on societies by not suffocating their health institutions and resources [ ]. Specifically, mathematical models of the evolution of the virus spread have been used to establish strategies, like social distancing, quarantines, self-isolation and staying at home, to reduce the chances of transmission among individuals. Usually, vaccination is another approach that emerges as a possible contention strategy; however, this is still not a viable possibility in the case of COVID-19, as no vaccine has been developed yet [ , ]. Simulations of the spread of the virus have also shown that among the most efficient ways to reduce its spread are [ ]: increasing social distancing, which refers to staying apart from other individuals so that the virus cannot so easily disperse among them; improving hygiene routines, such as proper hand washing, use of hand sanitizer, etc., which would eventually reduce the chances of the virus remaining effective; and quarantine or self-isolation, again to reduce unnecessary exposure to other potentially infected individuals. Of course, these recommendations based on simulations and models can only be as accurate and useful as the simulations themselves, which ultimately depend on the values of the parameters used to set up the initial conditions of the models.
Moreover, these parameters strongly depend on the actual data, which can also be sensitive to many other factors, such as data collection or reporting protocols, among others [ ]. Hence, counting with accurate, reliable and up-to-date data is critical when trying to understand the conditions for the spread of the virus, but also for predicting possible outcomes of the epidemic, as well as for designing proper containment measures. Similarly, being able to access and process the huge amount of genetic information associated with the virus has proven to shed light on the disease's path [ , ]. Encompassing these unprecedented times, another interesting phenomenon has also occurred, in part related to a contemporaneous trend in how science can be done by emphasizing transparency, reproducibility and robustness: an open approach to the methods and the data, usually referred to as open science. In particular, this approach has for quite some time been part of the software developer community in the so-called open source projects or codes. This way of developing software offers many advantages in comparison to the more traditional, closed, proprietary approaches. To begin with, it allows any interested party to look at the actual implementation of the code, and to criticize, complement or even contribute to the project. It improves transparency and, at the same time, guarantees higher standards due to public scrutiny, which in the end benefits everyone: the developers, by increasing their reputation and reach and by consolidating a widely validated product, and the users, by allowing direct access to the sources and details of the implementation. It also helps with the reproducibility of results and with bug reports and fixes. Several approaches and initiatives are taking these openness concepts and implementing them in their platforms. Specific examples of this have flooded the internet, e.g. the surge of open-source-powered dashboards [ ], open data repositories, etc.
Other examples of this are, for instance, the number of scientific papers related to COVID-19 published since the beginning of the pandemic [ ], and the amount of data and tools developed to track the evolution of the pandemic [ ]. As a matter of fact, scientists are now drowning in publications related to COVID-19 [ , ], and some collaborative and community initiatives are trying to use machine learning techniques to help identify and digest the most relevant sources for a given topic [ , , ]. The "R language and environment for statistical computing" [ , ] is no exception here. Moreover, promoting and building upon the open source and open community principles, R has empowered scientists and researchers since its inception. Not surprisingly, then, the R community has already contributed to the official CRAN [ ] repository more than a dozen packages related to the COVID-19 pandemic since the beginning of the crisis. In particular, in this paper we introduce and discuss the covid19.analytics R package [ ], which is mainly designed around an open and modular approach to provide researchers quick access to the latest reported worldwide data of COVID-19 cases, as well as analytical and visualization tools to process this data. This paper is organized as follows: we first describe the covid19.analytics package; we then present some examples of data analysis and visualization; after that, we describe in detail how to deploy a web dashboard employing the capabilities of the covid19.analytics package, providing full details of the implementation so that this procedure can be repeated and followed by users interested in developing their own dashboards; finally, we summarize our conclusions. The covid19.analytics R package [ ] allows users to obtain live worldwide data on the novel COVID-19 disease.
It does this by accessing and retrieving the data publicly available and published by two main sources: the "COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University" [ ] for the worldwide and US data, and the City of Toronto for the Toronto data [ ]. The package also provides basic analysis and visualization tools and functions to investigate these datasets and others structured in a similar fashion. The covid19.analytics package is an open source tool, whose main implementation and API is the R package [ ]. In addition to this, the package has a few more add-ons:
• a central GitHub repository, https://github.com/mponce0/covid19.analytics, where the latest development version and source code of the package are available; users can also submit tickets for bugs, suggestions or comments using the "issues" tab;
• a rendered version with live examples and documentation, also hosted at GitHub pages, https://mponce0.github.io/covid19.analytics/;
• a dashboard for interactive usage of the package, with extended capabilities for users without any coding expertise, https://covid19analytics.scinet.utoronto.ca (we discuss the details of the implementation in a later section);
• a "backup" data repository hosted at GitHub, https://github.com/mponce0/covid19analytics.datasets, where replicas of the live datasets are stored for redundancy's and robust accessibility's sake (see fig. ).
One of the main objectives of the covid19.analytics package is to make the latest data from the reported cases of the current COVID-19 pandemic promptly available to researchers and the scientific community. In what follows we describe the main functionalities of the package regarding data accessibility. The covid19.data function allows users to obtain real-time data about the COVID-19 reported cases from the JHU's CSSE repository, in the following modalities:
• aggregated data for the latest day, with a great "granularity" of geographical regions (i.e.
cities, provinces, states, countries);
• time series data for larger accumulated geographical regions (provinces/countries);
• deprecated: we also include the original data style in which these datasets were reported initially.
The datasets also include information about the different categories (status) of the cases reported daily per country/region/city: "confirmed"/"deaths"/"recovered". This data-acquisition function will first attempt to retrieve the data directly from the JHU repository with the latest updates. If for whatever reason this fails (e.g. problems with the connection), the package will load a preserved "image" of the data, which is not the latest one but will still allow the user to explore this older dataset. In this way, the package offers a more robust and resilient approach to the quite dynamical situation with respect to data availability and integrity. In addition to the data on the reported cases of COVID-19, the covid19.analytics package also provides access to genomics data of the virus. This data is obtained from the National Center for Biotechnology Information (NCBI) databases [ , ]. Table shows the functions available in the covid19.analytics package for accessing the reported cases of the COVID-19 pandemic. The functions can be divided into different categories, depending on what data they provide access to. For instance, they are distinguished between aggregated and time series data sets. They are also grouped by specific geographical locations, i.e. worldwide, United States of America (US) and the city of Toronto (Ontario, Canada) data. The time series data is structured in a specific manner, with a given set of fields or columns, which resembles the following format: "Province.State" | "Country.Region" | "Lat" | "Long" | ... sequence of dates ...
One of the modular features this package offers is that if a user has data structured in a data.frame organized as described above, then most of the functions provided by the covid19.analytics package for analyzing time series data will just work with the user's defined data. In this way, it is possible to add new data sets to the ones that can be loaded using the repositories predefined in this package, and to extend the analysis capabilities to these new datasets. Sec. presents an example of how external or synthetic data has to be structured so that it can use the functions from the covid19.analytics package. It is also recommended to check the compatibility of these datasets using the data integrity and consistency check functions described in the following section. Due to the ongoing and rapidly changing situation with the COVID-19 pandemic, the reported data has sometimes been detected to change its internal format or even to show some anomalies or inconsistencies. For instance, in some cumulative quantities reported in time series datasets, it has been observed that these quantities, instead of continuously increasing, sometimes decrease their values, which is something that should not happen; we refer to this as an inconsistency of "type II". Some negative values have been reported as well in the data, which also is not possible or valid; we call this an inconsistency of "type I". When this occurs, it happens at the level of the origin of the dataset, in our case the one obtained from the JHU/CSSEGIS repository [ ].
In order to make the user aware of this, we implemented two consistency- and integrity-checking functions:
• consistency.check: this function attempts to determine whether there are consistency issues within the data, such as negative reported values (inconsistency of "type I") or anomalies in the cumulative quantities of the data (inconsistency of "type II");
• integrity.check: this determines whether there are integrity issues within the datasets, or changes to the structure of the data.
Alternatively, we provide a data.checks function that will execute the previously described functions on a specified dataset.
Data integrity. It is highly unlikely that the user would face a situation where the internal structure of the data or its actual integrity is compromised. However, if there is any suspicion about this, it is possible to use the integrity.check function in order to verify it. If anything like this is detected, we urge users to contact us about it, e.g. via https://github.com/mponce0/covid19.analytics/issues.
Data consistency. Data consistency issues and/or anomalies in the data have been reported several times. These are claimed, in most of the cases, to be misreported data, and usually affect just an insignificant fraction of the total cases. Having said that, we believe that the user should be aware of these situations, and we recommend using the consistency.check function to verify the dataset you will be working with.
Nullifying spurious data. In order to deal with the different scenarios arising from incomplete, inconsistent or misreported data, we provide the nullify.data function, which will remove any potential entry in the data that can be suspected of these incongruities. In addition to that, the function accepts an optional argument stringent=TRUE, which will also prune any incomplete cases (e.g. with NAs present).
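To make the two checks concrete, here is a small Python sketch of the underlying logic (the package itself implements this in R; the function names and row values below are ours, purely illustrative) operating on a row laid out as "Province.State" | "Country.Region" | "Lat" | "Long" | dates...

```python
def consistency_check(row, meta_cols=4):
    """Return positions (within the date series) of 'type I' inconsistencies
    (negative values) and 'type II' inconsistencies (decreasing cumulative
    counts), mirroring the checks described in the text."""
    series = row[meta_cols:]
    type_i = [n for n, v in enumerate(series) if v < 0]
    type_ii = [n for n in range(1, len(series)) if series[n] < series[n - 1]]
    return type_i, type_ii

def nullify(row, meta_cols=4):
    """Replace suspicious entries with None, in the spirit of nullify.data."""
    t1, t2 = consistency_check(row, meta_cols)
    out = list(row)
    for n in set(t1) | set(t2):
        out[meta_cols + n] = None
    return out
```

A decreasing cumulative count flags the day it drops; a negative value is flagged outright, and both kinds of entries are blanked out by nullify.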
Similarly to the rapid developments and updates in the reported cases of the disease, the sequencing of the virus is moving at almost equal pace. That is why the covid19.analytics package provides access to a good number of the genomics data currently available. The covid19.genomic.data function allows users to obtain COVID-19 genomics data from NCBI's databases [ ]. The types of genomics data accessible from the package are described in table . Although the package attempts to provide the latest available genomic data, there are a few important details and differences with respect to the reported-cases data. To begin with, the amount of genomic information available is far larger than the data reporting the number of cases, which adds some additional constraints when retrieving this data. In addition to that, the hosting servers for the genomic databases impose certain limits on the rate and amount of downloads. In order to mitigate these factors, the covid19.analytics package employs a couple of different strategies, as summarized below:
• most of the data will be attempted to be retrieved live from NCBI databases - same as using src='livedata';
• if that is not possible, the package keeps a local version of some of the largest datasets (i.e. genomes, nucleotides and proteins), which might not be up to date - same as using src='repo';
• the package will attempt to obtain the data from a mirror server with the datasets updated on a regular basis, but not necessarily with the latest updates - same as using src='local'.
This sequence of steps is implemented in the package using tryCatch() exceptions in combination with recursion, i.e. the data-retrieval function calling itself with different variations indicating which data source to use. As the covid19.analytics package will try to present the user with the latest data sets possible, different strategies (as described above) may be in place to achieve this.
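The cascading-source strategy can be sketched as a recursive retrieval with exception handling. The Python sketch below illustrates the idea only (the package does this in R with tryCatch(); the particular fallback ordering here is an assumption on our part).

```python
def retrieve(loaders, src="livedata"):
    """Attempt to load a dataset from src, falling back to the next source
    on failure by calling itself recursively."""
    fallback = {"livedata": "local", "local": "repo"}  # assumed ordering
    try:
        return loaders[src]()
    except Exception:
        if src in fallback:
            return retrieve(loaders, fallback[src])    # recurse on failure
        raise                                          # no sources left
```

Each loader is any zero-argument callable (an HTTP download, a file read, etc.); the user only ever calls retrieve() once and gets whichever replica answered first.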
One way to improve the reliability of access to, and availability of, the data is to use a series of replicas of the datasets hosted in different locations. Fig. summarizes the different data sources and points of access that the package employs in order to retrieve the data and keep the latest datasets available. Genomic data, as mentioned before, is accessed from NCBI databases. This is implemented in the covid19.genomic.data function employing the ape [ ] and rentrez [ ] packages. In particular, the proteins dataset, with a very large number of entries, is quite challenging to obtain "live". As a matter of fact, the covid19.genomic.data function accepts an argument to specify whether this should be the case or not. If the src argument is set to 'livedata', then the function will attempt to download the proteins list directly from NCBI databases. If this fails, we recommend using the argument src='local', which will provide a static copy of this dataset taken at the moment in which the package was submitted to the CRAN repository, meaning that it is quite likely this dataset won't be complete and will most likely be outdated. Additionally, we offer a second replica of the datasets, located at https://github.com/mponce0/covid19analytics.datasets, where all datasets are updated periodically; this can be accessed using the argument src='repo'. In addition to the access and retrieval of the data, the covid19.analytics package includes several functions to perform basic analysis and visualizations. Table shows the list of the main functions in the package.
Function | Description | Main type of output
Data acquisition:
covid19.data | obtain live* worldwide data for the COVID-19 virus, from the JHU's CSSE repository [ ] | returns dataframes/lists with the collected data
covid19.toronto.data | obtain live* data for COVID-19 cases in the city of Toronto, ON, Canada, from the City of Toronto reports [ ] | returns a dataframe/list with the collected data
covid19.us.data | obtain live* US-specific data for the COVID-19 virus, from the JHU's CSSE repository [ ] | returns a dataframe with the collected data
Genomics:
covid19.genomic.data, c19.refgenome.data, c19.fasta.data, c19.ptree.data, c19.nps.data, c19.np_fasta.data | obtain genomic data from NCBI databases - see table |

In the reported data, geographical locations are mostly given by the province/city and/or country/region. In order to facilitate the processing of locations that are located geo-politically close, the covid19.analytics package provides a way to identify regions by indicating the corresponding continent's name where they are located; i.e. "South America", "North America", "Central America", "America", "Europe", "Asia" and "Oceania" can be used to process all the countries within each of these regions. The geographicalRegions function is the one in charge of determining which countries are part of which continent, and will display them when executing geographicalRegions(). In this way, it is possible to specify a particular continent and have all the countries in this continent processed, without the need of explicitly specifying all of them.
Reports. As the amount of data available for the recorded cases of COVID-19 can be overwhelming, and in order to get a quick insight into the main statistical indicators, the covid19.analytics package includes the report.summary function, which will generate an overall report summarizing the main statistical estimators for the different datasets. It can summarize the "time series" data (when indicating cases.to.process="TS"), the "aggregated" data (cases.to.process="AGG") or both (cases.to.process="ALL").
The default will display the top entries in each category, or the number indicated in the nentries argument; for displaying all the records, just set nentries= . The function can also target specific geographical location(s) using the geo.loc argument. When a geographical location is indicated, the report will include an additional "rel.perc" column for the confirmed cases, indicating the relative percentage among the locations indicated. Similarly, the totals displayed at the end of the report will be for the selected locations. In each case ("TS" or/and "AGG") it will present tables ordered by the different cases included, i.e. confirmed infected, deaths, recovered and active cases. The date when the report is generated and the date of the recorded data will be included at the beginning of each table. It will also compute the totals, averages or mean values, standard deviations and percentages of various quantities, i.e.
• it will determine the number of unique locations processed within the dataset;
• it will compute the total number of cases per case type;
• percentages - which are computed as follows:
- for the "confirmed" cases, as the ratio between the corresponding number of cases and the total number of cases, i.e. a sort of "global percentage" indicating the percentage of infected cases with respect to the rest of the world;

Figure : schematic of the data acquisition flows between the covid19.analytics package and the different sources of data. Dark and solid/dashed lines represent API functions provided by the package accessible to the users. Dotted lines are "internal" mechanisms employed by the package to synchronize and update replicas of the data. Data acquisition from NCBI servers is mostly done utilizing the ape [ ] and rentrez [ ] packages.
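The "global" and "relative" percentages described above amount to simple ratios; here is a small Python illustration (the package computes these in R; the function names and figures below are ours).

```python
def global_perc(confirmed):
    """Each location's confirmed cases as a percentage of all reported cases."""
    total = sum(confirmed.values())
    return {loc: 100.0 * n / total for loc, n in confirmed.items()}

def rel_perc(confirmed, selected):
    """Confirmed cases as a percentage of the total over the selected
    locations only, as in the report's "rel.perc" column."""
    total = sum(confirmed[loc] for loc in selected)
    return {loc: 100.0 * confirmed[loc] / total for loc in selected}
```

Note that the two columns answer different questions: global_perc compares a location against the whole dataset, while rel_perc compares it only against the other locations the user asked for.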
- for "confirmed" cases, when geographical locations are specified, a "relative percentage" is given as the ratio of the confirmed cases over the total of the selected locations;
- for the other categories, "deaths"/"recovered"/"active", the percentage of a given category is computed as the ratio between the number of cases in the corresponding category and the "confirmed" number of cases, i.e. a relative percentage with respect to the number of confirmed infected cases in the given region;
• for "time series" data:
- it will show the delta (change or variation) in the last day, as well as the daily changes the day before that (t − 1), three days ago (t − 3), a week ago (t − 7), two weeks ago (t − 14) and a month ago (t − 30);
- when possible, it will also display the percentages of "recovered" and "deaths" with respect to the "confirmed" number of cases;
- the column "GlobalPerc" is computed as the ratio between the number of cases for a given country over the total of cases reported;
- the "Global perc. average (SD: standard deviation)" is computed as the average (standard deviation) of the number of cases among all the records in the data;
- the "Global perc. average (SD: standard deviation) in top X" is computed as the average (standard deviation) of the number of cases among the top X records.
A typical output of report.summary for the "time series" data is shown in the example in sec. . In addition to this, the function also generates some graphical outputs, including pie and bar charts representing the top regions in each category; see fig. .
Totals per location & growth rate. It is possible to dive deeper into a particular location by using the tots.per.location and growth.rate functions. These functions are capable of processing different types of data, as long as these are "time series" data. They can either focus on one category (e.g. "ts-confirmed", "ts-recovered", "ts-deaths") or all ("ts-all").
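The core quantities behind these two functions can be sketched in a few lines of Python (illustrative only; the package implements them in R, where the Poisson and Gamma fits additionally use generalized linear models): daily changes of a cumulative series, the growth rate as the ratio of daily changes on consecutive dates, and an exponential model fitted by linear regression on the logarithm of the counts.

```python
from math import exp, log

def daily_changes(cumulative):
    """Day-to-day variations of a cumulative time series."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

def growth_rate(cumulative):
    """Ratio of daily changes between consecutive dates (None where the
    previous day's change is zero and the ratio is undefined)."""
    ch = daily_changes(cumulative)
    return [None if prev == 0 else cur / prev
            for prev, cur in zip(ch, ch[1:])]

def fit_exponential(counts):
    """Least-squares fit of N(t) = A*exp(r*t) via linear regression on
    log(N), with t = 0, 1, 2, ... in days; returns (A, r)."""
    pts = [(t, log(n)) for t, n in enumerate(counts) if n > 0]
    m = len(pts)
    st = sum(t for t, _ in pts)
    sy = sum(y for _, y in pts)
    stt = sum(t * t for t, _ in pts)
    sty = sum(t * y for t, y in pts)
    r = (m * sty - st * sy) / (m * stt - st * st)  # slope = growth rate
    a = exp((sy - r * st) / m)                     # intercept -> amplitude
    return a, r
```

A growth rate persistently above 1 signals accelerating daily changes, which is exactly the regime the exponential fit is meant to capture.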
When these functions detect different types of categories, each category will be processed separately. Similarly, the functions can take multiple locations, i.e. just one, several ones, or even "all" the locations within the data. The locations can either be countries, regions, provinces or cities. If a specified location includes multiple entries, e.g. a country that has several cities reported, the functions will group them and process all these regions as the location requested.
Totals per location. The tots.per.location function will plot the number of cases as a function of time for the given locations and type of categories, in two plots: a log-scale scatter one and a linear-scale bar one. When the function is run with multiple locations or all the locations, the figures will be adjusted to display multiple plots in one figure in a mosaic-type layout. Additionally, the function will attempt to generate different fits to match the data:
• an exponential model using a linear regression method;
• a Poisson model using a generalized linear regression method;
• a Gamma model using a generalized linear regression method.
The function will plot and add the values of the coefficients for the models to the plots, and display a summary of the results in the console. It is also possible to instruct the function to draw a "confidence band" based on a moving average, so that the trend is also displayed including a region of higher confidence, based on the mean value and standard deviation computed considering a time interval set by equally dividing the total range of time into equally spaced intervals. The function will return a list combining the results for the totals for the different locations as a function of time.
Growth rate. The growth.rate function allows one to compute daily changes and the growth rate, defined as the ratio of the daily changes between two consecutive dates. The growth.rate function shares all the features of the tots.per.location function as described above, i.e.
it can process the different types of cases and multiple locations. The graphical output displays two plots per location:
• a scatter plot with the number of changes between consecutive dates as a function of time, in linear scale (left vertical axis) and log scale (right vertical axis) combined
• a bar plot displaying the growth rate of the particular region as a function of time
When the function is run with multiple locations or all the locations, the figures are adjusted to display multiple plots in one figure in a mosaic-type layout. In addition, when there is more than one location, the function also generates two different styles of heatmaps comparing the changes per day and the growth rate among the different locations (vertical axis) over time (horizontal axis). Furthermore, if the interactiveFig=TRUE argument is used, interactive heatmaps and 3D-surface representations are generated as well. Some of the arguments of this function, as well as of many of the other functions that generate both static and interactive visualizations, can be used to indicate the type of output to be generated; several of these arguments are summarized in a dedicated table. In particular, the arguments controlling the interactive figures, interactiveFig and interactive.display, can be used in combination to compose an interactive figure to be captured and used in another application. For instance, when interactive.display is turned off but interactiveFig=TRUE, the function returns the interactive figure, so that it can be captured and used for later purposes. This is the technique employed to capture the resulting plots in the covid19.analytics Dashboard Explorer, as presented in the corresponding section. Finally, when not returning an interactive figure, the growth.rate function returns a list combining the results for the "changes per day" and the "growth rate" as a function of time, i.e.
when interactiveFig is not specified or set to FALSE (its default value), or when interactive.display=TRUE. When interactive.display is turned off but interactiveFig=TRUE, the function returns the interactive figure, so that it can be captured and used for later purposes.

Trends in daily changes. The covid19.analytics package provides three different functions to visualize the trends in the daily changes of reported cases from time series data:
• single.trend allows one to inspect a single location; it can be used with the worldwide data sliced by the corresponding location, the Toronto data, or the user's own data formatted as "time series" data
• mtrends is very similar to the single.trend function, but accepts multiple or single locations, generating one plot per requested location; it can also process multiple types of cases for a given location
• itrends generates an interactive plot of the trend in daily changes, representing changes in the number of cases vs. the total number of cases in log scale, using spline techniques to smooth the abrupt variations in the data
The first two functions generate "static" plots composed of different insets:
• the main plot represents the daily changes as a function of time
• the inset figures at the top, from left to right: the total number of cases (in linear and semi-log scales), the changes in the number of cases vs. the total number of cases, and the changes in the number of cases vs. the total number of cases in log scale
• the second row of insets represents the "growth rate" (as defined above) and the normalized growth rate, defined as the growth rate divided by the maximum growth rate reported for the location

Plotting totals. The totals.plt function generates plots of the total number of cases as a function of time. It can be used for the total data or for one or more specific locations. The function can generate static and/or interactive plots, as well as linear and/or semi-log plots.

Plotting cases in the world.
The live.map function displays the cases at each corresponding location on an interactive map of the world. It can be used with time series data or aggregated data; the aggregated data offers much more detailed information about the geographical distribution.

The covid19.analytics package allows users to model the dispersion of the disease by implementing a simple Susceptible-Infected-Recovered (SIR) model [ , ]. The model is implemented as a system of ordinary differential equations (ODE),

dS/dt = -β S I / N,   dI/dt = β S I / N - γ I,   dR/dt = γ I,

where S represents the number of individuals susceptible to be infected, I the number of infected individuals, and R the number of recovered ones at a given moment in time. The coefficients β and γ are the parameters controlling the transition rates from S to I and from I to R, respectively; N is the total number of individuals, i.e. N = S(t) + I(t) + R(t), which should remain constant, so the system can also be written in terms of the normalized quantities S/N, I/N and R/N. Although the ODE SIR model is non-linear, analytical solutions have been found [ ]. However, the approach followed in the package implementation is to solve the ODE system numerically. The generate.SIR.model function implements the SIR model using the actual data from the reported cases. The function will try to identify the data points at which the onset of the epidemic began and use the following data points to generate proper guesses for the two parameters describing the SIR ODE system, i.e. β and γ. It does this by minimizing the residual sum of squares (RSS) assuming one single explanatory variable, i.e. the sum of the squared differences between the number of infected cases I(t) and the quantity predicted by the model, Ĩ(t). The ODE system is solved numerically using the ode function from the deSolve package, and the minimization is tackled using the optim function from base R.
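To make the ODE system concrete, the following is a minimal sketch in base R that integrates the SIR equations with a simple forward Euler scheme. Note that the package itself solves the system with deSolve's ode function and fits β and γ to reported data; the values used below (β = 0.5, γ = 0.2, N = 10^6, I(0) = 10) are arbitrary illustrative choices, not fitted parameters.

```r
# Forward-Euler integration of the SIR equations:
#   dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
# (illustrative sketch only; covid19.analytics uses deSolve's ode() instead)
sir.solve <- function(beta, gamma, N, I0, times, dt = 0.1) {
  S <- N - I0; I <- I0; R <- 0
  out <- matrix(NA_real_, nrow = length(times), ncol = 3,
                dimnames = list(NULL, c("S", "I", "R")))
  t <- 0
  for (k in seq_along(times)) {
    # step the system forward until the next requested output time
    while (t < times[k]) {
      dS <- -beta * S * I / N
      dI <-  beta * S * I / N - gamma * I
      dR <-  gamma * I
      S <- S + dt * dS; I <- I + dt * dI; R <- R + dt * dR
      t <- t + dt
    }
    out[k, ] <- c(S, I, R)
  }
  out
}

# illustrative parameters; here R0 = beta/gamma = 2.5
sol <- sir.solve(beta = 0.5, gamma = 0.2, N = 1e6, I0 = 10, times = 0:60)
head(sol)
```

Since dS/dt + dI/dt + dR/dt = 0, the population S + I + R stays equal to N at every step, and with β/γ > 1 the infected curve first grows and then declines as the susceptible pool is depleted.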
After the solution is found, the function provides details about it, as well as static and interactive plots of the quantities S(t), I(t) and R(t). The generate.SIR.model function also estimates the value of the basic reproduction number (or basic reproduction ratio), R0 = β/γ, which can be considered a measure of the average expected number of new infections caused by a single infection in a population where all subjects are susceptible. The function also computes and plots, on demand, the force of infection, defined as f_infection = β I(t), which measures the transition rate from the compartment of susceptible individuals to the compartment of infectious ones.

For exploring the parameter space of the SIR model, it is possible to produce a series of models by varying the conditions, i.e. the range of dates considered for optimizing the parameters of the SIR equations, which will effectively "sweep" a range of the parameters β, γ and R0. This is implemented in the sweep.SIR.models function, which takes a range of dates to be used as starting points for the number of cases fed into generate.SIR.model, producing as many models as date ranges indicated. One could even use this in combination with other resampling or Monte Carlo techniques to estimate the statistical variability of the model parameters.

In this section we present some basic examples of how to use the main functions of the covid19.analytics package. We begin by installing the covid19.analytics package. This can be achieved in two alternative ways:
1. Installing the latest stable version of the package directly from the CRAN repository. This can be done within an R session using the install.packages function, i.e.
> install.packages("covid19.analytics")
2. Installing the development version from the package's GitHub repository, https://github.com/mponce0/covid19.analytics, using the devtools package [ ] and its install_github function, i.e.
# begin by installing devtools if it is not installed in your system
> install.packages("devtools")
# install the covid19.analytics package from the GitHub repo
> devtools::install_github("mponce0/covid19.analytics")

After having installed the covid19.analytics package, its functions are made available by loading the package using R's library function. The covid19.analytics package uses a few additional packages, which are installed automatically if they are not present in the system. In particular, readxl is used to access the data from the City of Toronto [ ], ape is used for pulling the genomics data from NCBI, plotly and htmlwidgets are used to render the interactive plots and save them in HTML documents, deSolve is used to solve the differential equations modelling the spread of the virus, and gplots and pheatmap are used to generate heatmaps. Lst. shows how to use the covid19.data function to obtain data in different cases.
# obtain all the records combined for "confirmed", "deaths" and "recovered" cases
# for the global (worldwide) *aggregated* data
covid19.data.allcases <- covid19.data()
# obtain time series data for global "confirmed" cases
covid19.confirmed.cases <- covid19.data("ts-confirmed")
# read all possible datasets, returning a list
covid19.all.datasets <- covid19.data("ALL")
# read the latest aggregated data of the global cases
covid19.all.agg.cases <- covid19.data("aggregated")
# read time series data for global casualties
covid19.ts.deaths <- covid19.data("ts-deaths")
# read "time series" data for the city of Toronto
toronto.ts.data <- covid19.data("ts-Toronto")
# this can also be done using the covid19.Toronto.data() fn
tor.ts.data <- covid19.Toronto.
data()
# or get the original data as reported by the City of Toronto
tor.df.data <- covid19.Toronto.data(data.fmr = "orig")
# retrieve US time series data of confirmed cases
us.confirmed.cases <- covid19.data("ts-confirmed-US")
# retrieve US time series data of death cases
us.deaths.cases <- covid19.data("ts-deaths-US")
# or both cases combined
us.cases <- covid19.US.data()
Listing: reading data from reported cases of COVID-19 using the covid19.analytics package.

In general, the reading functions return data frames. The exceptions are when the functions need to return a more complex output, e.g. when combining "ALL" types of data or when requesting the original data from the City of Toronto (see details in the corresponding table). In these cases, the returned object is a list whose elements are data frames corresponding to the particular types of data. In either case, the structure and overall content can be quickly assessed using R's str or summary functions.

One useful piece of information to look at after loading a dataset is which locations/regions have reported cases. There are at least two main fields that can be used for that: the columns containing the keywords 'country' or 'region', and 'province' or 'state'. Lst. shows examples of how to achieve this using partial matching of column names, e.g. "Country" and "Province".
# read a data set
data <- covid19.data("ts-confirmed")
# look at the structure and column names
str(data)
names(data)
# find the 'Country' column
country.col <- pmatch("Country", names(data))
# slice the countries
countries <- data[, country.col]
# list of countries
print(unique(countries))
# sorted table of countries, may include multiple entries
print(sort(table(countries)))
# find the 'Province' column
prov.col <- pmatch("Province", names(data))
# slice the provinces
provinces <- data[, prov.
col]
# list of provinces
print(unique(provinces))
# sorted table of provinces, may include multiple entries
print(sort(table(provinces)))
Listing: identifying geographical locations in the data sets.

An overall view of the current situation at a global or local level can be obtained using the report.summary function. Lst. shows a few examples of how this function can be used.
# a quick function to overview top cases per region for time series and aggregated records
report.summary()
# save the tables into a text file named 'covid19-SummaryReport_CurrentDate.txt'
# where *CurrentDate* is the actual date
report.summary(saveReport = TRUE)
# summary report for a specific location with the default number of entries
report.summary(geo.loc = "canada")
# summary report for a specific location with the top entries
report.summary(Nentries = , geo.loc = "canada")
# several locations can be combined
report.summary(Nentries = , geo.loc = c("canada","us","italy","uruguay","argentina"))

A typical output of the report generation tool is presented in Lst. This particular example was generated using report.summary(Nentries= , graphical.output=TRUE, saveReport=TRUE), which indicates to consider just the top entries, generate a graphical output as shown in Fig.
and to save a text file including the report, which is the one shown here. [The report output, abridged here, lists for each data set the top entries ordered by number of cases: the "ts-confirmed", "ts-deaths" and "ts-recovered" cases, followed by the aggregated data ordered by confirmed, deaths, recovered and active cases, each stamped with the date of the data; it closes with an overall summary of statistical estimators computed considering the independent reported entries per case type.]

A daily generated report is also available from the covid19.analytics documentation site, https://mponce0.github.io/covid19.analytics/.

The covid19.analytics package allows users to investigate total cumulative quantities per geographical location with the totals.per.location function. Examples of this are shown in Lst.
# totals for confirmed cases for "Ontario"
tots.per.location(covid19.confirmed.cases, geo.loc = "Ontario")
# totals for confirmed cases for "Canada"
tots.per.location(covid19.confirmed.cases, geo.
loc = " canada " ) # total nbr of confirmed cases in hubei including a confidence band based on moving average tots . per . location ( covid . confirmed . cases , geo . loc = " hubei " , confbnd = true ) # total nbr of deaths for " mainland china " tots . per . location ( covid . ts . deaths , geo . loc = " china " ) # ## # read the time series data for all the cases all . data <-covid . data ( 'ts -all ') # run on all the cases tots . per . location ( all . data , " japan " ) # ## # total for death cases for " all " the regions tots . per . location ( covid . ts . deaths ) # or just tots . per . location ( covid . data ( " ts -confirmed " ) ) listing : calculation of totals per country/region/province. in addition to the graphical output as shown in fig. , the function will provide details of the models fitted to the data. similarly, utilizing the growth.rate function is possible to compute the actual growth rate and daily changes for specific locations, as defined in sec. . . lst. includes examples of these. # read time series data for confirmed cases ts . data <-covid . data ( " ts -confirmed " ) # compute changes and growth rates per location for all the countries growth . rate ( ts . data ) # compute changes and growth rates per location for ' italy ' growth . rate ( ts . data , geo . loc = " italy " ) # compute changes and growth rates per location for ' italy ' and ' germany ' growth . rate ( ts . data , geo . loc = c ( " italy " ," germany " ) ) # #### # combining multiple geographical locations : # obtain time series data tsconfirmed <-covid . data ( " ts -confirmed " ) # explore different combinations of regions / cities / countries # when combining different locations , heatmaps will also be generated comparing the trends among these locations growth . rate ( tsconfirmed , geo . loc = c ( " italy " ," canada " ," ontario " ," quebec " ," uruguay " ) ) growth . rate ( tsconfirmed , geo . 
loc = c ( " hubei " ," italy " ," spain " ," united ␣ states " ," canada " ," ontario " ," quebec " ," uruguay " ) ) growth . rate ( tsconfirmed , geo . loc = c ( " hubei " ," italy " ," spain " ," us " ," canada " ," ontario " , " quebec " ," uruguay " ) ) # turn off static plots and activate interactive figures growth . rate ( tsconfirmed , geo . loc = c ( " brazil " ," canada " ," ontario " ," us " ) , staticplt = # static and interactive figures growth . rate ( tsconfirmed , geo . loc = c ( " brazil " ," italy " ," india " ," us " ) , staticplt = true , interactivefig = true ) listing : calculation of growth rates and daily changes per country/region/province. in addition to the cumulative indicators described above, it is possible to estimate the global trends per location employing the functions single.trend, mtrends and itrends. the first two functions generate static plots of different quantities that can be used as indicators, while the third function generates an interactive representation of a normalized a-dimensional trend. the lst. shows examples of the use of these functions. fig. displays the graphical output produced by these functions. # single location trend , in this case using data from the city of toronto tor . data <-covid . toronto . data () single . trend ( tor . data [ tor . data $ status == " active ␣ cases " ,]) # or data from the province of ontario ts . data <-covid . data ( " ts -confirmed " ) ont . data <-ts . data [ ts . data $ province . state == " ontario " ,] single . trend ( ont . data ) # or from italy single . trend ( ts . data [ ts . data $ country . region == " italy " ,]) # multiple locations ts . data <-covid . data ( " ts -confirmed " ) mtrends ( ts . data , geo . loc = c ( " canada " ," ontario " ," uruguay " ," italy " ) ) # multiple cases single . trend ( tor . data ) # interactive plot of trends # for all locations and all type of cases itrends ( covid . data ( " ts -all " ) , geo . 
loc = " all " ) # or just for confirmed cases and some specific locations , saving the result in an html file named " itrends _ ex . html " itrends ( covid . data ( " ts -confirmed " ) , geo . loc = c ( " uruguay " ," argentina " ," ontario " ," us " ," italy " ," hubei " ) , filename = " itrends _ ex " ) listing : calculation of trends for different cases, utilizing the single.trend, mtrends and itrends functions. the typical representations can be seen in fig. . most of the analysis functions in the covid .analytics package have already plotting and visualization capabilities. in addition to the previously described ones, the package has also specialized visualization functions as shown in lst. . many of them will generate static and interactive figures, see table for details of the type of output. in particular the live.map function is an utility function which allows to plot the location of the recorded cases around the world. this function in particular allows for several customizable features, such as, the type of projection used in the map or to select different types of projection operators in a pull down menu, displaying or not the legend of the regions, specify rescaling factors for the sizes representing the number of cases, among others. the function will generate a live representation of the cases, utilizing the plotly package and ultimately open the map in a browser, where the user can explore the map, drag the representation, zoom in/out, turn on/off legends, etc. # retrieve time series data ts . data <-covid . data ( " ts -all " ) # static and interactive plot totals . plt ( ts . data ) # totals for ontario and canada , without displaying totals and one plot per page totals . plt ( ts . data , c ( " canada " ," ontario " ) , with . totals = false , one . plt . per . page = true ) # totals for ontario , canada , italy and uruguay ; including global totals with the linear and semi -log plots arranged one next to the other totals . plt ( ts . 
data , c ( " canada " ," ontario " ," italy " ," uruguay " ) , with . totals = true , one . plt . per . page = false ) # totals for all the locations reported on the dataset , interactive plot will be saved as " totals -all . html " totals . plt ( ts . data , " all " , filename = " totals -all " ) # retrieve aggregated data data <-covid . data ( " aggregated " ) # interactive map of aggregated cases --with more spatial resolution live . map ( data ) # or live . map () # interactive map of the time series data of the confirmed cases with less spatial resolution , ie . aggregated by country live . map ( covid . data ( " ts -confirmed " ) ) listing : examples of some of the interactive and visualization capabilities of plotting functions. the typical representations can be seen in fig. . last but not least, one the novel features added by the covid .analytics package, is the ability of model the spread of the virus by incorporating real data. as described in sec. . , the generate.sir.model function, implements a simple sir model employing the data reported from an specified dataset and a particular location. examples of this are shown in lst. . the generate.sir.model function is complemented with the plt.sir.model function which can be used to generate static or interactive figures as shown in fig. . the generate.sir.model function as described in sec. will attempt to obtain proper values for the parameters β and γ, by inferring the onset of the epidemic using the actual data. this is also listed in the output of the function (see lst. ), and it can be controlled by setting the parameters t and t or deltat, which are used to specify the range of dates to be considered for using when determining the values of β and γ. the fatality rate (constant) can also be indicated via the fatality.rate argument, as well, as the total population of the region with tot.population. # read time series data for confirmed cases data <-covid . 
data ( " ts -confirmed " ) # run a sir model for a given geographical location generate . sir . model ( data , " hubei " , t = , t = ) generate . sir . model ( data , " germany " , tot . population = ) generate . sir . model ( data , " uruguay " , tot . population = ) generate . sir . model ( data , " ontario " , tot . population = , add . extras = true ) # the function will aggregate data for a geographical location , like a country with multiple entries generate . sir . model ( data , " canada " , tot . population = , add . extras = true ) fig.( ) , also raises an interesting point regarding the accuracy of the sir model. we should recall that this is the simplest approach one could take in order to model the spread of diseases and usually more refined and complex models are used to incorporate several factors, such as, vaccination, quarantines, effects of social clusters, etc. however, in some cases, specially when the spread of the disease appears to have enter the so-called exponential growth rate, this simple sir model can capture the main trend of the dispersion (e.g. left plot from fig. ). while in other cases, when the rate of spread is slower than the freely exponential dispersion, the model clearly fails in tracking the actual evolution of cases (e.g. right plot from fig. ) . finally, lst. shows an example of the generation of a sequence of values for r , and actually any of the parameteres (β, γ) describing the sir model. in this case, the function takes a range of values for the initial date t and generates different date intervals, this allows the function to generate multiple sir models and return the corresponding parameters for each model. the results are then bundle in a "matrix"/"array" object which can be accessed by column for each model or by row for each paramter sets. # read timeseries data ts . data <-covid . data ( " ts -confirmed " ) # select a location of interest , eg . france # france has many entries , just pick " france " fr . 
data <- ts.data[(ts.data$Country.Region == "France") & (ts.data$Province.State == ""),]
# sweep values of R0 based on the range of dates considered for the model
ranges <- :
deltaT <-
params_sweep <- sweep.SIR.models(data = fr.data, geo.loc = "France", t0_range = ranges, deltaT = deltaT)
# the parameters -- beta, gamma, R0 -- are returned in a "matrix"/"array" object
print(params_sweep)

As mentioned before, the functions of the covid19.analytics package also allow users to work with their own data, when the data is formatted in the time series structure discussed in Sec. This opens a large range of possibilities for users to import their own data into R and use the functions already defined in the covid19.analytics package. A concrete example of how the data has to be formatted is shown in Lst. The example shows how to structure, in a ts format, "synthetic" data generated by randomly sampling different distributions. However, this could be actual data from places or locations not accessible from the datasets provided by the package, or researchers' own private data sets. The example also shows two cases: whether or not the data includes the "status" column, and whether there is more than one location. As a matter of fact, we left the "Lat" and "Long" fields empty, but if one includes actual coordinates, the mapping function live.map can also be used with such structured data.
# ts data structure:
# "Province.State" "Country.Region" "Lat" "Long" dates...
# first let's create a 'fake' location
fake.locn <- c(NA, NA, NA, NA)
# names for these columns
names(fake.locn) <- c("Province.State","Country.Region","Lat","Long")
# let's set the dates
dates.vec <- seq(as.Date(" / / "), as.Date(" / / "), "days")
# data.vecX would be the actual values/cases
data.vec1 <- rpois(length(dates.
vec), lambda = )
# can also add more cases
data.vec2 <- abs(rnorm(length(dates.vec), mean = , sd = ))
data.vec3 <- abs(rnorm(length(dates.vec), mean = , sd = ))
# this will name the columns with the dates
names(data.vec1) <- dates.vec
names(data.vec2) <- dates.vec
names(data.vec3) <- dates.vec
# merge them into a data frame with multiple entries
synthetic.data <- as.data.frame(rbind(
  rbind(c(fake.locn, data.vec1)),
  rbind(c(fake.locn, data.vec2)),
  rbind(c(fake.locn, data.vec3))))
# finally set your locn to something unique, so you can use it in the generate.SIR.model fn
synthetic.data$Country.Region <- "mylocn"
# one could even add "status"
synthetic.data$status <- c("confirmed","death","recovered")
# or just one case per locn
synthetic.data <- synthetic.data[, -ncol(synthetic.data)]
synthetic.data$Country.Region <- c("mylocn","mylocn","mylocn")
# now we can use this 'synthetic' dataset with any of the ts functions
# data checks
integrity.check(synthetic.data)
consistency.check(synthetic.data)
data.checks(synthetic.data)
# quantitative indicators
tots.per.location(synthetic.data)
growth.rate(synthetic.data)
single.trend(synthetic.data[ ,])
mtrends(synthetic.data)
# SIR models
synthSIR <- generate.SIR.model(synthetic.data, geo.loc = "mylocn")
plt.SIR.model(synthSIR, interactiveFig = TRUE)
sweep.SIR.models(synthetic.data, geo.loc = "mylocn")
Listing: example of structuring data in a ts format, so that it can be used with any of the ts functions of the covid19.analytics package.

The covid19.analytics package provides access to genomics data available at the NCBI databases [ , ].
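As a compact restatement of the "time series" layout described above, the following base-R sketch builds one well-formed ts record: the four metadata columns followed by one column per date. All names and values here are synthetic illustrations, not taken from any real dataset.

```r
# One well-formed "time series" record: four metadata columns
# ("Province.State", "Country.Region", "Lat", "Long") followed by one
# column per date holding cumulative counts. All values are synthetic.
set.seed(0)
dates <- seq(as.Date("2020-01-22"), as.Date("2020-01-31"), by = "days")
cases <- cumsum(rpois(length(dates), lambda = 5))  # non-decreasing counts

ts.row <- data.frame(Province.State = "", Country.Region = "MyLocn",
                     Lat = NA, Long = NA, t(cases), check.names = FALSE)
# name the case columns with their dates, as the ts functions expect
names(ts.row)[-(1:4)] <- as.character(dates)
str(ts.row)
```

A data frame shaped this way can then be passed to the ts functions exactly like the package-provided datasets.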
The covid19.genomic.data function is the master function for accessing the different variations of the genomics information available, as indicated by its type argument:
gtypes <- c("genome","fasta","tree","nucleotide","protein","nucleotide-fasta","protein-fasta","genomic")
Each of these options returns a different object; Lst. shows an example of the structures of some of these objects. The most involved object is obtained from covid19.genomic.data when combining different types of datasets. [The str() output, abridged here, shows a list containing: the reference genome (livedata/repo/local copies of the nucleotide sequence, with a "species" attribute identifying severe acute respiratory syndrome coronavirus 2); the protein accessions; data frames of nucleotide and protein records with fields such as accession, SRA_accession, release_date, species, genbank_title, host ("Homo sapiens"), isolation_source, collection_date and BioSample; SRA run information (run, sample, study and BioProject accessions) produced with NCBI's kmer-based STAT analysis tool, whose database is generated from RefSeq genomes and the viral genome set from nt using a MinHash-based approach; and the references to the local data location.]
Listing: composition of the objects in the example presented in Lst.

One aspect that should be mentioned with respect to the genomics data is that, in general, these are large datasets which are continuously being updated, hence increasing their sizes even more. This can ultimately present practical challenges, such as long processing times or even starvation of memory resources. We will not dive into major examples, like DNA sequencing analysis or building phylogenetic trees; packages such as ape, adegenet, phylocanvas, and others can be used for these and other analyses. One simple example we can present is the creation of dynamical categorization trees based on different elements of the sequencing data. We consider, for instance, the data for the nucleotides as reported by NCBI. The example in Lst. shows how to retrieve either nucleotide (or protein) data and generate categorization trees based on different elements, such as hosting organism, geographical location, sequence length, etc. In the examples we employed the collapsibleTree package, which generates interactive trees browsable through web browsers.
# retrieve the nucleotides data
nucx <- covid19.genomic.
data ( type = ' nucleotide ' , src = ' repo ') # identify specific fields to look at len . fld <-" length " acc . fld <-" accession " geoloc . fld <-" geo _ location " seq . fld <-" sequence _ type " host . fld <-" host " seq . limit <- seq . limit <- seq . limit <- # selection criteria , nucleotides with seq . length between and selec . ctr . <-nucx $ length < seq . limit & nucx $ length > seq . limit # remove nucletoides without specifying a " host " listing : example of how to generate a dynamic browsable tree using some of information included in the nucleotides dataset. some of these trees representations are shown in fig. . in this section we will present and discuss, how the covid .analytics dashboard explorer is implemented. the main goal is to provide enough details about how the dashboard is implemented and works, so that users could modify it if/as they seem fit or even develop their own. for doing so, we will focus in three main points: • the front end implementation, also know as the user interface, mainly developed using the shiny package • the back end implementation, mostly using the covid .analytics package • the web server installation and configuration where the dashboard is hosted the covid .analytics dashboard explorer is built using the shiny package [ ] in combination with the covid .analytics package. shiny allows users to build interactive dashboards that can work through a web interface. the dashboard mimics the covid .analytics package commands and features but enhances the commands as it allows users to use dropdowns and other control widgets to easily input the data rather than using a command terminal. in addition the dashboard offers some unique features, such as a personal protective equipment (ppe) model estimation, based on realistic projections developed by the us centers for disease control and prevention (cdc). the dashboard interface offers several features: . 
1. The dashboard can be run on the cloud/web, allowing multiple users to simultaneously analyze the data with no special software or hardware requirements. The shiny package makes the dashboard mobile- and tablet-compatible as well.
2. It aids researchers to share and discuss analytical findings.
3. The dashboard can be run locally or through the web server.
4. No programming or software expertise is required, which reduces technical barriers to analyzing the data. Users can interact with and analyze the data without any software expertise, and can therefore focus on the modeling and analysis. In these times the dashboard can be a monumental tool, as it removes barriers and allows a wider and more diverse set of users to have quick access to the data.
5. Interactivity. One feature of shiny and other graphing packages, such as plotly, is interactivity, i.e. the ability to interact with the data. This allows one to display complex data in a concise manner and to focus on specific points of interest. Interactive options such as zoom, panning and mouse hover all help in making the user interaction enjoyable and informative.
6. Fast and easy comparisons. One advantage of a dashboard is that users can easily analyze and compare the data quickly and multiple times. For example, users can change the slider or dropdown to select multiple countries and see the total daily counts effortlessly. This allows the data to be displayed and updated as the user's analysis requirements change.

The dashboard can be launched locally on a machine with R, either through an interactive R session or in batch mode using Rscript or R CMD BATCH, or through the web server by accessing the following URL: https://covid19analytics.scinet.utoronto.ca. For running the dashboard locally, the covid19.analytics package also has to be installed.
For running the dashboard within an R session, the package has to be loaded and the explorer then invoked using the following sequence of commands:

> library(covid19.analytics)
> covid19Explorer()

The batch mode can be executed using an R script containing the commands listed above. When the dashboard is run locally, the browser will open a connection on a port of the local machine, i.e. http://127.0.0.1:port. It should be noted that if the dashboard is launched interactively within an R session, a given port is used, while if this is done through an R script in batch mode the port used will be different. To implement the dashboard and enhance some of the basic functionalities offered, the following libraries were specifically used in its implementation:

• shiny [ ]: the main package that builds the dashboard.
• shinydashboard [ ]: a package that assists in building the dashboard with respect to themes, layouts and structure.
• shinycssloaders [ ]: this package adds loader animations to shiny outputs, such as plots and tables, when they are loading or (re)calculating. In general, these are wrappers around base CSS-style loaders.
• plotly [ ]: a charting library to generate interactive charts and plots. Although extensively used in the core functions of covid19.analytics, we reiterate it here as it is a great tool for developing interactive plots.
• DT [ ]: a DataTable library to generate interactive table output.
• dplyr [ ]: a library that helps to apply functions and operations to data frames. This is important for the calculations, specifically the PPE calculations.

The R shiny package makes developing dashboards easy and seamless and removes challenges. For example, setting the layout of a dashboard is typically challenging, as it requires knowledge of front-end technologies such as HTML, CSS and Bootstrap to be able to position elements and change their aesthetic properties.
Shiny simplifies this problem by using a built-in box controller widget which allows developers to easily group elements, tables, charts and widgets together. Many of the CSS properties, such as widths or colors, are input parameters to the functions of interest. The sidebar feature is simple to implement, and the shiny package makes it easy to remain compatible across multiple devices such as tablets or cellphones. The shiny package also has built-in layout properties, such as fluidRow or columns, making it easy to position elements on a page. The library does have some challenges as well. One challenge faced is theme design. shinydashboard does not make it easy to change the whole color theme of the dashboard beyond the white or blue themes provided by default. The issue is resolved by having the developer write custom CSS and change each of the various properties manually. The dashboard contains two main components: a sidebar and a main body. The sidebar contains a list of all the menu options. Options which are similar in nature are grouped in a nested format. For example, when the dashboard menu section called "Datasets and Reports" is selected, it displays a nested list of further options the user can choose, such as the world data or Toronto data. Grouping similar menu options together is important for helping the user understand the data. The main body displays the content of a page; which content is displayed depends on the menu option selected in the sidebar. There are three main generic elements needed to develop a dashboard: layouts, control widgets and output widgets. The layout options are components needed to lay out the features or components on a page. In this dashboard the layout widgets used are the following:

• Box: the boxes are the main building blocks of a dashboard. They allow us to group content together.
• tabPanels: tabPanels allow us to create tabs to divide one page into several sections.
This allows multiple charts or multiple types of data to be displayed on a single page. For example, in the indicators page there are four tabs which display four different charts, with the mosaic tab displaying the charts in different configurations.
• Header and title: these are used to display text and page titles in the appropriate sizes and fonts.

An example describing these elements and their implementation is shown in Lst. :

    ... table'),
    h('World data of all COVID-19 cases across the globe'),
    column( , selectInput(ns("category_list"), label = h("Category"), choices = category_list)),
    column( , downloadButton(ns('downloadData'), "Download")),
    withSpinner(DT::dataTableOutput(ns("table_contents")))
}

Listing: snippet of code that describes the various features used in generating a dashboard. The ns(id) is a namespaced id for inputs/outputs. withSpinner is the shinycssloaders call which generates a loading GIF while the chart is being loaded.

Shiny modules are used when a shiny application gets larger and more complicated; they can also be used to fix the namespacing problem. Shiny modules also allow for code reusability and code modularity, as the code can be broken into several pieces called modules. Each module can then be called in different applications, or even in the same application multiple times. In this dashboard we break the code into two main groups: user interface (UI) modules and server modules. Each menu option has its own dedicated set of UI and associated server modules. This makes the code easy to build and expand: for each new menu option, a new set of UI and server module functions is built. Lst. is also an example of a UI module, where it specifies the design and look of the element and connects with the active parts of the application. Lst. shows an example of a server function called reportServer.
This type of module can update and display charts, tables and valueBoxes based on the user selections. This same scenario occurs for all menu options with the UI/server paradigm. Another way to think about the UI/server separation is that the UI modules are in charge of laying down the look of a particular element in the dashboard, while the server is in charge of dynamically 'filling' the dynamical elements and the data to populate them. Control widgets, also called input widgets, are widgets which users use to input data, information or settings to update charts, tables and other output widgets. The following control widgets were used in this dashboard:

• numericInput: a textbox that only allows numerical input, used to select a single numerical value.
• selectInput: a dropdown that may be multi-select, allowing users to select multiple options, as in the case of the country dropdown.
• Slider: the slider in our dashboard is purely numerical, used to select a single numerical value from a given range (min and max).
• Download button: a button which allows users to download and save data in various formats, such as CSV.
• radioButtons: used to select only one from a limited number of choices.
• Checkbox: similar in purpose to radioButtons, allowing users to select one option from a limited number of options.

Figure: screenshot from the "covid19.analytics Dashboard Explorer", "Mosaic" tab from the 'Indicators' category. Four interactive figures are shown in this case: the trends (generated using the itrends function), totals (generated using the totals.plt function) and two world global representations of COVID-19 reported cases (generated using the live.map function). The two upper plots are adjusted and re-rendered according to the selection of the country and the category of the data from the input boxes.

Output control widgets are widgets that are used to display content/information back to the user.
There are three main output widgets used in this dashboard:

• plotlyOutput: this widget outputs and creates plotly charts. plotly is a graphical package library used to generate interactive charts.
• renderTable: an output that generates an interactive table with search, filter and sort capabilities provided out of the box.
• valueBox: a fancy textbox with border colors and descriptive font text, used to display descriptive values to users, such as the total number of deaths.

The dashboard contains the menus and elements shown in Fig. and described below:

• Indicators: this menu section displays different COVID-19 indicators to analyze the pandemic. There are four notable indicators, itrend, total plot, growth rate and live map, which are displayed in the various tabs. itrend displays the "trend" in a log-log plot; total plot shows a line graph of total numbers; growth rate displays the daily number of changes and the growth rate (as defined in Sec. ); live map shows a world map of infections in an aggregated or time series format. These indicators are shown together in the "Mosaic" tab.
• Models: this menu option contains a sub-menu presenting models related to the pandemic. The first model is the SIR (Susceptible-Infected-Recovered) model, which is implemented in the covid19.analytics package. SIR is a compartmental model used to model how a disease infects a population. The other two models are used to estimate the amount of PPE needed due to infectious diseases, such as Ebola and COVID-19.
• Datasets and Reports: this section provides reporting capabilities which output reports as CSV and text files for the data. The world data subsection displays all the world data as a table which can be filtered, sorted and searched. The data can also be saved as a CSV file. The Toronto data subsection displays the Toronto data in tabular format while also displaying the current pandemic numbers.
The data integrity subsection checks the integrity and consistency of the dataset, such as when the raw data contains negative numbers or when the cumulative quantities decrease. The report subsection is used to generate a report as a text file.
• References: the reference section displays information on the GitHub repo and documentation, along with an external dashboards section which contains hyperlinks to other dashboards of interest. Dashboards of interest are the vaccine tracker, which tracks the progress of vaccines being tested for COVID-19; the Johns Hopkins University dashboard; and the Canada dashboard built by the Dalla Lana School of Public Health at the University of Toronto.
• About us: contact information and information about the developers.

In addition to implementing some of the functionalities provided by the covid19.analytics package, the dashboard also includes a PPE calculator. The hospital PPE is a qualitative model designed to analyze the amount of PPE equipment needed for a single COVID-19 patient over a hospitalization duration. The PPE calculation implemented in the covid19.analytics Dashboard Explorer is derived from the CDC's studies for infectious diseases, such as Ebola and COVID-19. The rationale is that Ebola and COVID-19 are both contagious infections, and PPE is used to protect staff and patients and prevent transmission for both of these contagious diseases. The hospital PPE calculation estimates and models the amount of PPE a hospital will need during the COVID-19 pandemic. There are two analysis methods a user can choose to determine the hospital PPE requirements. The first method is to determine the amount of PPE needed for a single hospitalized COVID-19 patient. This first model requires two major components: the size of the healthcare team needed to take care of a single COVID-19 patient, and the amount of PPE equipment used by hospital staff per shift over a hospitalization duration. The model is based on the CDC Ebola crisis calculation [ ].
Although Ebola is a different disease compared to COVID-19, there is one major similarity: both COVID-19 and Ebola are diseases which require PPE for the protection of healthcare staff and the infected patient. To compensate for the differences, the user can change the amount of PPE a healthcare staff member uses per shift. That information can be adjusted by changing the slider values in the advanced settings tab. The calculation is straightforward: it takes the PPE amount used per shift and multiplies it by the number of healthcare staff, and then by the hospitalization duration. The first model has two tabs. The first tab displays a stacked bar chart of the amount of PPE equipment used by each hospital staff member over the total hospitalization duration of a single patient, with each PPE equipment type broken out in the stacks. The second tab panel, called advanced settings, has a series of sliders for each hospital staff type (for example, nurses), where users can use the slider to change the amount of PPE that the hospital staff member will use per shift. The second model is a more recent calculation developed by the CDC [ ]. This model calculates the burn rate of PPE equipment for hospitals over a one-week time period, and is designed specifically for COVID-19. The CDC has created an Excel file for hospital staff to input their information, and also an Android app which can be utilized. This model, also implemented in our dashboard, is simplified to calculate the PPE for a one-week setting. The one-week limit was implemented for two reasons: first, to limit the amount of input data a user has to enter into the system, as too much data can overwhelm and confuse a user; second, because the COVID-19 pandemic is a highly fluid situation, and for hospital staff to forecast their PPE and resource equipment for more than a one-week period may not be accurate. Note that this model is not accurate if the facility receives a resupply of PPE; for resupplied PPE, start a new calculation.
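The first model's calculation described above reduces to a simple product. The following is a minimal sketch of that arithmetic; the function name, argument names and sample values are ours and purely illustrative, not taken from the package:

```r
# Sketch of the first PPE model: total PPE for one hospitalized COVID-19 patient.
# ppe.per.shift : units of PPE one staff member uses per shift (adjustable via sliders)
# n.staff       : size of the healthcare team for a single patient
# n.shifts      : number of shifts over the hospitalization duration
ppe.single.patient <- function(ppe.per.shift, n.staff, n.shifts) {
  ppe.per.shift * n.staff * n.shifts
}

# hypothetical example: 4 units/shift, a team of 5, 28 shifts (14 days, 2 shifts/day)
ppe.single.patient(4, 5, 28)   # 560 units of PPE
```

Changing the per-shift usage, as the advanced settings sliders allow, simply rescales the estimate linearly.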
There are four tab panels in the burn rate calculation, which display charts and settings. The first tab, daily usage, displays a multi-line chart of the amount of PPE used daily, ∆PPE_daily. The calculation is a simple subtraction between two consecutive days, i.e. the second day (j+1) from the first day (j), as noted in Eq. ( ). The tab panel called remaining supply shows a multi-line chart of the number of days the remaining PPE equipment will last in the facility. The duration for which the PPE equipment will last in a given facility depends inversely on the number of COVID-19 patients admitted to the hospital. To calculate the remaining PPE, one computes the average amount of PPE used over the one-week duration and then divides the amount of PPE at the beginning of the day by the average PPE usage, as shown in Eq. ( ), where ⟨·⟩_T denotes the time average over a period of time T. The third panel, called PPE per patient, displays a multi-line chart of the burn rate, i.e. the amount of PPE used per patient per day. Eq. ( ) represents this calculation as the remaining PPE supply divided by the number of COVID-19 patients in the hospital during that exact day. The fourth tab, called advanced settings, is a series of show-and-hide "accordions" where users can input the amount of PPE equipment they have at the start of each day. There are six collapsible boxes, one for each PPE equipment type and one for the COVID-19 patient count. Expanding a box displays seven numericInput textboxes which allow users to input the number of PPE units or the patient count for each day. The equations describing the PPE needs are implemented in the shiny dashboard using the dplyr library. The dplyr library allows users to work with dataframe-like objects in a quick and efficient manner. The three equations are implemented using a single dataframe, and the advanced settings inputs of the burn rate analysis tab are saved into that dataframe.
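In compact form, the three relations described above for day j may be sketched as follows. This is our own notation, reconstructed from the verbal descriptions: PPE(j) denotes the supply at the start of day j, N_patients(j) the number of COVID-19 patients that day, and ⟨·⟩_T the time average over the period T.

```latex
\begin{align*}
\Delta\mathrm{PPE}_{\mathrm{daily}}(j) &= \mathrm{PPE}(j) - \mathrm{PPE}(j+1) \\[4pt]
\mathrm{days\ remaining}(j) &= \frac{\mathrm{PPE}(j)}{\left\langle \Delta\mathrm{PPE}_{\mathrm{daily}} \right\rangle_{T}} \\[4pt]
\mathrm{burn\ rate}(j) &= \frac{\mathrm{PPE}(j)}{N_{\mathrm{patients}}(j)}
\end{align*}
```

The first relation gives the daily usage chart, the second the remaining-supply chart, and the third the per-patient burn rate.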
The back-end implementation of the dashboard is achieved using the functions presented in Sec. within the server module of the dashboard. The main strategy is to take a particular function, connect it with the input controls to feed the needed arguments into the function, and then capture the output of the function and render it accordingly. Let's consider the example of the globe map representation shown in the dashboard, which is done using the live.map function. Lst. shows how this function connects with the other elements in the dashboard: the input elements are accessed using input$..., which in this case are used to control the particular options for displaying the legends or projections based on checkboxes. The output returned from this function is captured through the renderPlotly({...}) function, which is aimed at taking plotly-type plots and integrating them into the dashboard.

# livemap plot charts for the possible combinations
output$ts_livemap <- renderPlotly({
    legend <- input$sel_legend
    projections <- input$sel_projection
    live.map(covid19.data("ts-confirmed"),
             interactive.display = FALSE,
             no.legend = legend,
             select.projctn = projections)
})

Listing: example of how the live.map function is used to render the interactive figures displayed on the dashboard.

Another example is the report generation capability using the report.summary function, which is shown in Lst. . As mentioned before, the input arguments of the function are obtained from the input controls. The output in this case is rendered using the renderText({...}) function, as the output of the original function is plain text. Notice also that there are two invocations of report.summary: one for rendering on the screen, and a second one for making the report available for download, which is handled by the downloadHandler function.
reportServer <- function(input, output, session, result) {
    output$report_output_default <- renderText({
        # extract the variables from the inputs
        nentries <- input$txtbox_nentries
        geo_loc  <- input$geo_loc_select
        ts       <- input$ddl_ts
        capture.output(report.summary(graphical.output = FALSE,
                                      nentries = nentries,
                                      geo.loc = geo_loc,
                                      cases.to.process = ts))
    }, sep = '\n')

    report <- reactive({
        nentries <- input$txtbox_nentries
        geo_loc  <- input$geo_loc_select
        ts       <- input$ddl_ts
        report <- capture.output(report.summary(graphical.output = FALSE,
                                                nentries = nentries,
                                                geo.loc = geo_loc,
                                                cases.to.process = ts))
        return(report)
    })

    output$downloadReport <- downloadHandler(
        filename = function() { paste("report-", Sys.Date(), ".txt", sep = "") },
        content  = function(file) { writeLines(paste(report()), file) }
    )
}

Listing: report capabilities implemented in the dashboard using the report.summary function.

The final element in the deployment of the dashboard is the actual setup and configuration of the web server where the application runs. The actual implementation of our web dashboard, accessible through https://covid19analytics.scinet.utoronto.ca, relies on a virtual machine (VM) on a physical server located at SciNet headquarters. We should also note that there are other ways to "publish" a dashboard; in particular for shiny-based dashboards, the most common and perhaps most straightforward one is to deploy the dashboard on https://www.shinyapps.io. Alternatively, one could also implement the dashboard in a cloud-based solution, e.g. https://aws.amazon.com/blogs/big-data/running-r-on-aws/.
Each approach has its own advantages and disadvantages. For instance, depending on a third-party solution (like those previously mentioned) implies some cost to be paid to, or dependency on, the provider, but will certainly eliminate some of the complexity and special attention one must invest when running one's own server. On the other hand, a self-deployed server allows for full control, an in-principle cost-effective or cost-controlled expense, and full integration with the end application. In our case, we opted for a self-controlled and self-configured server as mentioned above. Moreover, it is quite a common practice to deploy (multiple) web services via VMs or "containers". The VM for our web server runs CentOS, with R compiled and installed from sources on the VM. After that, we proceeded to install the shiny server from sources, i.e. https://github.com/rstudio/shiny-server/wiki/Building-Shiny-Server-from-Source. After the installation of the shiny-server is completed, we proceed by creating a new user on the VM from which the server is going to be run. For security reasons, we recommend avoiding running the server as root. In general, the shiny server can use a user named "shiny". Hence a local account is created for this user and, logged in as this user, one can proceed with the installation of the required R packages in a local library for this user. All the packages needed for running the dashboard, as well as the covid19.analytics package itself, have to be installed. Lst. shows the commands used for creating the shiny user and finalizing the configuration and the details of the log files.
# place a shortcut to the shiny-server executable in /usr/bin
sudo ln -s /usr/local/shiny-server/bin/shiny-server /usr/bin/shiny-server
# create shiny user
sudo useradd -r -m shiny
# create log, config, and application directories
sudo mkdir -p /var/log/shiny-server
sudo mkdir -p /srv/shiny-server
sudo mkdir -p /var/lib/shiny-server
sudo chown shiny /var/log/shiny-server
sudo mkdir -p /etc/shiny-server

Listing: list of commands used on the VM to finalize the setup of the shiny user and server. Source: https://github.com/rstudio/shiny-server.

For dealing with the Apache configuration on the plain-HTTP port, we added the file /etc/httpd/conf.d/rewrite.conf as shown in Lst. :

RewriteCond %{REQUEST_SCHEME} =http
RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [QSA,R=permanent]

Listing: modifications to the Apache configuration, specified in the file rewrite.conf. These lines rewrite any incoming request from HTTP to HTTPS.

For handling the Apache configuration on the HTTPS port, we added the file /etc/httpd/conf.d/shiny.conf, as shown in Lst. . This VirtualHost receives the HTTPS requests from the internet, establishes the secure connection, and redirects all input to the shiny server port using plain HTTP. All requests to "/" are redirected to "http://127.0.0.1:port/app/", where app in this case is a subdirectory where a particular shiny app is located. There is an additional configuration file, /etc/httpd/conf.d/ssl.conf, which contains the configuration for establishing secure connections, such as protocols, certificate paths, ciphers, etc.

The main tool we use to communicate updates between the different elements involved in the development and maintenance of the covid19.analytics package and dashboard web interface is orchestrated via git repositories. In this way, we have version control systems in place, but are also decentralized with multiple replicas. Fig. shows a schematic of how our network of repositories and services is connected. The central hub for our package is located at the GitHub repo https://github.com/mponce0/covid19.analytics; we then have (and users can too) our own clones and local copies of this repo, which we usually use for development and testing. When a stable and substantial contribution to the package is reached, we submit it to the CRAN repository. Similarly, when an update is made to the dashboard, we can synchronize the VM via git pulls and deploy the updates on the server side.

In this paper we have presented and discussed the R covid19.analytics package, which is an open source tool to obtain, analyze and visualize data of the COVID-19 pandemic. The package also incorporates a dashboard to facilitate access to its functionalities for less experienced users. As of today, there are a few dozen other packages in the CRAN repository that allow users to gain access to different datasets of the COVID-19 pandemic. In some cases, packages just provide access to data from specific geographical locations, or the approach to the data structure in which the data is presented differs from the one presented here. Nevertheless, having a variety of packages from which users can choose, and probably combine, is an important and crucial element in data analysis. Moreover, different cases of data misuse/misinterpretation due to issues such as erroneous metadata or data formats have been reported [ ], in some cases ending in articles' retractions [ ].
Therefore, providing additional functionalities to check the integrity and consistency of the data, as the covid19.analytics package does, is paramount.

Figure: schematic of the different repositories and systems employed by the covid19.analytics package and dashboard interface: the GitHub central repository (https://github.com/mponce0/covid19.analytics), the shiny server running on a VM (https://covid19analytics.scinet.utoronto.ca), the CRAN repo (https://cran.r-project.org/package=covid19.analytics), the GitHub.io web rendering (https://mponce0.github.io/covid19.analytics/), and local copies and private instances.

This is especially true in a situation where the unfolding of events and the data availability are flowing so fast that sometimes it is even hard to keep track of all the changes. Moreover, the covid19.analytics package offers a modular and versatile approach to the data by allowing users to input their own data, to which most of the package functions can be applied when the data is structured using the time series format described in this manuscript. The covid19.analytics package is also capable of retrieving genomics data, and it does so by incorporating a novel, more reliable and robust way of accessing, and designing different pathways to, the data sources. Another unique feature of this package is the ability to incorporate models to estimate the disease spread using the actual data. Although a simple model, it has shown some interesting results in agreement with the data for certain cases. Of course there are more sophisticated approaches to shed light on the analysis of this pandemic; in particular, novel "community" approaches have been catalyzed by it too [ ]. However, all of these approaches face new challenges as well [ ], and in that regard, counting on a variety of tools, in particular open source tools with direct access to the data, might help on this front.
r: a language and environment for statistical computing, r foundation for statistical computing r: a language for data analysis and graphics covid .analytics: load and analyze live data from the covid- pandemic the biggest mystery: what it will take to trace the coronavirus source animal source of the coronavirus continues to elude scientists a pneumonia outbreak associated with a new coronavirus of probable bat origin the proximal origin of sars-cov- bat-borne virus diversity, spillover and emergence extrapulmonary manifestations of covid- opensafely: factors associated with covid- death in million patients considering how biological sex impacts immune responses and covid- outcomes coronavirus blood-clot mystery intensifies using influenza surveillance networks to estimate state-specific prevalence of sars-cov- in the united states consolidation in a crisis: patterns of international collaboration in early covid- research critiqued coronavirus simulation gets thumbs up from code-checking efforts timing social distancing to avert unmanageable covid- hospital surges special report: the simulations driving the world's response to covid- covid- vaccine design: the janus face of immune enhancement covidep: a web-based platform for real-time reporting of vaccine target recommendations for sars-cov- social network-based distancing strategies to flatten the covid- curve in a post-lockdown world asymptotic estimates of sarscov- infection counts and their sensitivity to stochastic perturbation evolutionary origins of the sars-cov- sarbecovirus lineage responsible for the covid- pandemic an interactive web-based dashboard to track covid- in real time pandemic publishing poses a new covid- challenge will the pandemic permanently alter scientific publishing? 
- how swamped preprint servers are blocking bad coronavirus research
- project, trainees, faculty, advancing scientific knowledge in times of pandemics
- covid- risk factors: literature database & meta-analysis
- coronawhy: building a distributed, credible and scalable research and data infrastructure for open science
- scinlp: natural language processing and data mining for scientific text
- the comprehensive r archive network
- covid- data repository by the center for systems science and engineering
- covid- : status of cases in toronto
- database resources of the national center for biotechnology information
- ape . : an environment for modern phylogenetics and evolutionary analyses in r
- rentrez: an r package for the ncbi eutils api
- a contribution to the mathematical theory of epidemics
- the sir model for spread of disease: the differential equation model, loci (originally convergence)
- exact analytical solutions of the susceptible-infected-recovered (sir) epidemic model and of the sir model with equal death and birth rates
- devtools: tools to make developing r packages easier
- shiny: web application framework for r
- shinydashboard: create dashboards with 'shiny', r package version
- shinycssloaders: add css loading animations to 'shiny' outputs
- interactive web-based data visualization with r, plotly, and shiny, chapman and hall/crc
- dt: a wrapper of the javascript library 'datatables', r package version
- dplyr: a grammar of data manipulation
- estimated personal protective equipment (ppe) needed for healthcare facilities
- personal protective equipment (ppe) burn rate calculator
- high-profile coronavirus retractions raise concerns about data oversight
- covid- pandemic reveals the peril of ignoring metadata standards
- artificial intelligence cooperation to support the global response to covid-
- the challenges of deploying artificial intelligence models in a rapidly evolving pandemic
the r script containing the shiny app to be run should be placed in /etc/shiny-server, and configuration details
about the shiny interface are adjusted in the /etc/shiny-server/shiny-server.conf file. permissions for the application file have to match the identity of the user launching the server, in this case the shiny user. at this point, if the installation was successful and all the pieces were placed properly, when the shiny-server command is executed, a shiny hosted app will be accessible from localhost: . since the shiny server listens on port in plain http, it is necessary to set up an apache web server to act as a reverse proxy to receive the connection requests from the internet on ports and , the regular http and https ports, and redirect them to port on the same host (localhost). mp wants to thank all his colleagues at scinet, especially daniel gruner for his continuous and unconditional support, and marco saldarriaga who helped us set up the vm for installing the shiny server. key: cord- - umv ox authors: ambrosio, benjamin; aziz-alaoui, m. a. title: on a coupled time-dependent sir models fitting with new york and new-jersey states covid- data date: - - journal: nan doi: . /preprints . .v sha: doc_id: cord_uid: umv ox this article describes a simple susceptible infected recovered (sir) model fitting with covid- data for the month of march in new york (ny) state. the model is a classical sir, but is non-autonomous; the rate of susceptible people becoming infected is adjusted over time in order to fit the available data. the death rate is also secondarily adjusted. our fitting is made under the assumption that, due to the limited number of tests, a large part of the infected population has not been tested positive. in the last part, we extend the model to take into account the daily fluxes between new jersey (nj) and ny states and fit the data for both states. our simple model fits the available data, and illustrates typical dynamics of the disease: exponential increase, apex and decrease.
the model highlights a decrease in the transmission rate over the period, which gives a quantitative illustration of how lockdown policies reduce the spread of the pandemic. the coupled model with ny and nj states shows a wave in nj following the ny wave, illustrating the mechanism of spread from one attractive hot spot to its neighbor. from a scientific perspective, the covid- pandemic has highlighted the crucial role of mathematical and statistical models in providing guidance for health policies. expressions such as "flatten the curve", the "apex", the "plateau" have been widely heard in the media and employed by decision makers to explain their choices regarding rules and policies during this critical period. in this short article, we first introduce a simple sir model, in which we adjust a key parameter k standing for a control on the susceptible-infected rate, and secondarily the death rate, in order to fit the data of the pandemic in ny state in march , and provide predictions for the near future. then, we add a node in the model to take into account the daily fluxes between ny and nj states. note that these two close states are, up to the day of redaction of this article, the most severely hit by the pandemic in the united states. of course, the coupling may be extended to other states. however, in this article, we restrict ourselves to ny and nj. accordingly, the main key points of this article are that 1) it highlights the dynamics and epidemiological characteristics which have been discussed in press and health policies; it highlights qualitatively how lockdown policies have decreased the spread of the virus and provides prediction and explanation of an upcoming apex, 2) it fits real data provided for the new york state, and 3) it fits the data of nj state by considering coupled equations taking into account the daily fluxes between ny and nj.
this provides a quantitative visualization of how the virus may spread from an attractive hot spot (new york city in ny state) towards close states through the daily fluxes of commuters. we especially focus on fitting the total number of cases tested positive for covid- as well as the number of deaths in both ny and nj states. we also give insights into the prediction of the number of people needing hospitalization in ny state. sir models are very classical in the literature. for the reader's convenience, we mention here some contextual elements and references. the simplest classical sir model is the kermack-mckendrick (kmck) model, which goes back to , see [ , ] . it writes: ṡ = −k s i, i̇ = k s i − r i, ṙ = r i. in this original (kmck) model, the population splits into three classes. the class s stands for susceptible, who can catch the disease, i stands for infective, who have the disease and can transmit it, and r stands for the removed, namely, those who have or have had the disease but do not transmit it anymore. note that our terminology is slightly different, as explained below. in ( ), the dynamics follow the scheme s → i → r with respective transfer rates between classes of k and r. sir type models, and more generally mathematical models of epidemics, have in fact a significant history. we refer to [ ] for a textbook on these models and a brief history of epidemics. models have become more sophisticated and may include more compartments such as exposed, infective asymptomatic, infective with symptoms, and also reservoirs such as bats, and include stochastic dynamics. recently, sir type models have been widely used in the context of the covid- pandemic, to model the spread all around the world, see for example [ , , , , , , ] and references therein. here are also some examples of references for sir models in other epidemic diseases: dengue [?], chikungunya [ , ] and ebola [ ] . see also references therein.
in the present article, we first consider the following model: ṡ = −k(t) (s/(s + i + r)) i, i̇ = k(t) (s/(s + i + r)) i − (r + d(t)) i, ṙ = r i. this simple model has classically three classes: s for susceptible, i for infected and r for recovered. specifically, the class i is intended to represent all the people who effectively bear the virus at a given time, and can transmit it if in contact with other people. it includes all infected people, with or without symptoms, reported or not. there are some differences with equation ( ) . first, it includes a death rate d(t), and even though the number of deaths does not appear explicitly as a variable, it is simply given by the integral ∫_0^t d(u) i(u) du. also note that in this expression the rate of contamination from s to i is proportional to the proportion of susceptible (s) in the whole population (s + i + r). this is a classical expression standing for the fact that the probability for each individual in the i class to spread the virus among the class s is proportional to the portion of s in the whole population, see for example [ , ] and references therein. this rate is corrected by a crucial coefficient k(t), which is intended to fit the real transfer rate and which contains the effects inherent to the properties of the virus (for example a change of propagation rate due to genetic mutation of the virus) or to specific policies (like quarantine, social distancing, lockdown...). this time dependence allows us to adjust the dynamics to fit the data. this is a specificity of our model and turns it into a non-autonomous equation. this time dependence of k is obviously relevant in our model, since the rate of transfer from s to i is the main target of health policies and is subsequently subject to vary over time. secondarily, we also allow the death rate to vary. many internal or external factors may affect the death rate, among which are concomitant lethal diseases, temperature, hospital conditions...
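the non-autonomous model above can be explored with a simple forward-euler integrator. the sketch below is our own illustration, not code from the article; the population size, initial conditions and the piecewise-constant schedules for k(t) and d(t) are hypothetical placeholders, since the actual fitted values are reported later in the text.

```python
def simulate_sir(k_schedule, d_schedule, r=0.05, s0=19_000_000.0, i0=100.0,
                 days=60, steps_per_day=100):
    """Forward-Euler integration of the non-autonomous SIR model
         S' = -k(t) * S/(S+I+R) * I
         I' =  k(t) * S/(S+I+R) * I - (r + d(t)) * I
         R' =  r * I
       where deaths d(t)*I leave the system entirely.
       k_schedule, d_schedule: lists of (start_day, value) pairs,
       interpreted as piecewise constant functions of time."""
    def value_at(schedule, t):
        val = schedule[0][1]
        for start, v in schedule:
            if t >= start:
                val = v
        return val

    dt = 1.0 / steps_per_day
    S, I, R = s0, i0, 0.0
    traj = [(0.0, S, I, R)]
    for n in range(days * steps_per_day):
        t = n * dt
        k, d = value_at(k_schedule, t), value_at(d_schedule, t)
        new_inf = k * S / (S + I + R) * I     # contamination term
        dS = -new_inf
        dI = new_inf - (r + d) * I
        dR = r * I
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
        traj.append(((n + 1) * dt, S, I, R))
    return traj
```

dropping k(t) at some date, as a lockdown would, visibly flattens the curve of infectious compared with a run where k stays at its initial value.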
more significantly, one has to note that the rate transfers considered here are instantaneous transfers between compartments, and the function d(t) is different from the case fatality rate (cfr). recall that the cfr is the death rate per confirmed case over a given period of time, and is a typical indicator for the death rate. in south korea, the country which led the highest number of tests, it has been reported to be of percent, see [ ] . in china as of february , this rate varied between . in the region of wuhan and . in other regions, see [ ] and also [ ] . using tables and gives a cfr in ny from march to april of . per cent. since our model fits the data, by definition, it fits the cfr. at the end of the epidemic, if the whole number of people that contracted the virus were tested positive, the cfr would provide the probability to die for an individual who catches the virus. however, during a growing phase such as the month of march considered here, the cfr has large variations. moreover, there is a delay between the time a person is tested positive and the time of death. the time between symptom onset and death has been reported to range from about to weeks, see [ ] , the typical average being days according to [ ] . one could for example introduce a time-integral death expression ∫_0^t d(u, t) i(u) du to take this information into account. however, the time window considered here is short and corresponds to the beginning of reported cases in ny. furthermore, in this short article, we wanted to focus on a simple model able to fit data, highlight relevant dynamics and provide estimations. since a person in the i compartment will either recover or die, the above remarks on the function d hold for the coefficient r. in the present work, for the sake of simplicity, we have set the r coefficient to the constant value . . this is a simplification which is classical in sir models.
note that setting the coefficient r to a constant value is equivalent to assuming that the number of people recovering between times t − 1 and t, which is given by r(t) − r(t − 1), would also write r ∫_{t−1}^t i(u) du according to equations ( ) . a more meaningful expression for r(t) − r(t − 1) would be ∫_0^t r(u, t) i(u) du, standing for the fact that people recovering between times t − 1 and t have been infected over a period ranging from 0 to t, with a transfer rate given by r(u, t). this would lead to a correspondingly modified equation. once r is set to a constant value, to fit data over a reasonable period, numerical tries provide a unique choice for the constants k and d. it was still possible to use different values of r to fit the data. however, different values of r result in different dynamics over time, and notably different times for the apex. our choice of r = . was made to provide dynamics that seem relevant to us regarding the timely dynamics beyond the data. in particular, too small values of r would provide dynamics with a late apex, less relevant regarding the timely effects of the disease. other studies have considered models with a constant r, with different values; see for example [ ] and references therein. remarkably, varying r, and making it depend on time, provides some freedom to later fit data over a longer period of time, taking into account the end of the first wave in ny. upon the above discussions, our strategy is rather simple: set r to a constant value. then choose a constant k which fits the data of positive cases during a period of time. next, choose a constant d to fit the deaths data for the same period of time. note that since the parameter k has much more effect than the small parameter d, this procedure is possible and efficient. then repeat the procedure over a subsequent period. the overall procedure results in a constant function r and two piecewise constant functions k(t) and d(t).
for our model and the given data, the procedure was efficient, allowing us to make these choices by successive tries. note that this could be done automatically by designing an algorithm to set the parameters following these guidelines. note finally that our assumptions do not include a birth rate for the susceptible population, but rather focus on the short-time effects of the disease. data on the total number of infected people and the total number of deaths in new york state is available at the new york times web site, see [ ] . we have downloaded the data from there from march to april , which makes days. we have reported the number of total cases in table and the number of deaths in table . for numerical simulations of equation ( ), the parameter r was set to . . then, in order to fit the data of the total number of infected people, we chose the values of k(t) given in ( ), and to fit the total number of deaths, we chose the values of d(t) accordingly. in figure -b, we plotted the total number of deaths from the model and compared it to the data. in figure -c, we have plotted four curves corresponding to various i(t) for different simulations of ( ). • the curve i_1(t) in red corresponds to the simulation of ( ) with k(t) = k_1 = . and d(t) = d_1 = . for all time. • the curve i_2(t) in green corresponds to the simulation of ( ) with k(t) = k_1 = . and d(t) = d_1 = . for ≤ t ≤ , and k(t) = k_2 = . and d(t) = d_2 = . for ≤ t < . • the curve i_3(t) in pink corresponds to the simulation of ( ) with k(t) as given in ( ), i.e. k(t) = k_i and d(t) = d_i for t ∈ [t_{i−1}, t_i), i ∈ { , ..., }, with t_0 = , t_1 = , t_2 = , t_3 = , t_4 = . this panel illustrates how the health policies flatten the curve. in figure -d, we have again plotted the solution i(t) for k(t) as in ( ), for a longer period. next, we have retrieved data corresponding to the number of people at the hospital between march and april . during this period, and after, ny state officially reported useful daily charts and statistics on the local spread.
we have then computed an estimation of the number of people effectively at the hospital on a given date. we denote by h(t_i), t_i ∈ { , , ..., }, the total number of hospitalizations at a given date. in order to fit the solution of ( ) to this data, we performed a linear regression between (i(t_i)), t_i ∈ { , , ..., }, and (h(t_i)), t_i ∈ { , , ..., }. the coefficients a and b of the linear regression were determined by the least-square method, leading to explicit formulas. [figure : numerical simulation of ( ) and how it fits the data. in (a), we have plotted the quantity . × (i + r) as a function of time in red. the blue dots correspond to the data retrieved from [ ] . analogously, in (b), we have plotted the quantity ∫_0^t d(s) i(s) ds, which represents the total number of deaths according to the model, as a function of time in red. the blue dots correspond to the data retrieved from [ ] . in (c), we have illustrated the quantity i(t) corresponding to different values of k(t), d(t): the curve i_1(t) in red corresponds to the simulation of ( ) with k(t) = k_1 and d(t) = d_1 for all time. the curve i_2(t) in green corresponds to the simulation of ( ) with k(t) = k_1 and d(t) = d_1 for ≤ t ≤ , and k(t) = k_2 and d(t) = d_2 for t ≥ . the curve i_3(t) in pink corresponds to the simulation of ( ) with k(t) as given in ( ).] the result is plotted in figure -a. it clearly shows a good approximation by two distinct lines, corresponding to a = . , b = − . and a = . , b = . . then, a prediction of the number of people needing hospitalization can be provided by the formula a × i(t) + b. the result is plotted in figure -b. summing up the equations in ( ) and looking for stationary solutions yields the following theorem. theorem . non-negative stationary solutions of the system are given by (s̄, 0, r̄), with s̄ ≥ 0 and r̄ ≥ 0. furthermore, s(t) is decreasing, r(t) is increasing, and the variation of i(t) is given by the sign of ( ). remark .
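returning to the linear regression between (i(t_i)) and the hospitalization counts (h(t_i)) used above: the least-squares coefficients a and b solve the 2x2 normal equations directly. the helper below is our own minimal sketch, not code from the article.

```python
def linfit(x, y):
    """Least-squares fit of y ≈ a*x + b via the 2x2 normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b
```

as in the text, fitting two separate segments of the time range yields two distinct lines, and a prediction of hospitalizations is then a × i(t) + b with the coefficients of the relevant segment.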
it is worth noting that this theorem, whose proof is relatively straightforward, provides two simple but relevant interpretations from the applicative point of view. the first is that the stationary solutions are given by (s̄, 0, r̄), with s̄ ≥ 0 and r̄ ≥ 0. this means that the stationary solutions all have 0 infected, but may take arbitrary non-negative values (within a bounded interval) for susceptible and recovered. this reflects the reality of the disappearance of the virus. the second thing we want to mention is that the variation of i is given by the sign of expression ( ) . in particular, during a classical wave, this sign will basically be positive before the apex and negative after. it is estimated that around people used to commute from nj to ny before the lockdown policies due to the pandemic. it is natural to integrate these effects in the model, since those commuters played the role of vectors for the virus before the lockdown. we therefore build a small network by coupling two nodes, representing ny (node 1) and nj (node 2). to this end, we couple two copies of the local model ( ) to take into account the daily fluxes between those two states. this leads to the following model. the assumptions are analogous to those given in section , but we aim here to take into account the daily fluxes between nj and ny. for the sake of simplicity, we consider that there is a flux of people coming from nj to ny in the morning and returning home at night. the coupling functions, which are non-autonomous here, are given, for node , by terms involving the functions c_ij, which stand for the densities of population coming from node j and going into node i. in the remainder of the manuscript we assume that these functions c_ij do not depend on s, i and r, and drop the superscripts. furthermore, we assume the functions c_ij to be periodic of period 1 (one day), with a gaussian profile and respective apexes at : am and : pm. note that the amplitude of c is multiplied by a coefficient greater than .
in comparison with c , to take into account the amount of population in ny and nj. we want to emphasize here the attractivity of nyc, and therefore assume that the fluxes are mainly from nj to ny in the morning and from ny to nj in the evening. the function l(t) integrates the lockdown policies: after march , in the model, the daily fluxes are divided by . therefore, the function l(t) is a piecewise constant function given by ( ). initial conditions were set to ( , , ) for node 1 and ( , , ) for node 2. note that there are no infected cases at the initial time in nj. this means that in our model, the initial spread in nj follows from infection in ny. the same methods as in section were used to fit the data for both ny and nj for the network model ( ) . illustrations are provided in figure , which shows how the model fits the data. in figure -a, we have plotted the quantity . × (i_2 + r_2) as a function of time in red. recall that, in the model, i_2 + r_2 represents the number of people in the population which have been infected by the virus and are still alive. the blue dots correspond to the data retrieved from [ ] and plot the total number of infected minus the number of total deaths. analogously, in figure -b we have plotted the quantity ∫_0^t d_2(u) i_2(u) du as a function of time, in red. the blue dots correspond to the data retrieved from [ ] . finally, figure -c illustrates i_1(t) and i_2(t), which represent respectively the infected population in ny and nj. it shows how the curve of nj follows the curve of ny, with a small attenuation. an analogous theoretical result as in section holds for solutions of ( ). theorem . we assume that the initial condition of system ( ) satisfies s_1(0) > 0, i_1(0) > 0, r_1(0) = 0, s_2(0) > 0, i_2(0) = 0 and r_2(0) = 0. then for t > 0, all variables remain bounded and positive: there exists a positive constant m < r( ) + i( ) such that for all t > 0. figure shows the numerical simulation of ( ) and how it fits the data: in (a), we have plotted the quantity . × (i_2 + r_2) as a function of time in red.
recall that, in the model, i_2 + r_2 represents the number of people in the population which have been infected by the virus and are still alive. the blue dots correspond to the data retrieved from [ ] and plot the total number of infected minus the number of total deaths. analogously, in (b) we have plotted the quantity ∫_0^t d_2(u) i_2(u) du as a function of time in red. the blue dots correspond to the total number of deaths in nj according to the data retrieved from [ ] . panel (c) illustrates i_1(t) and i_2(t), which represent respectively the infected in ny and nj. furthermore, s_1(t) + i_1(t) + r_1(t) + s_2(t) + i_2(t) + r_2(t) ≤ s_1(0) + i_1(0) + s_2(0) − ∫_0^t (d_1(s) i_1(s) + d_2(s) i_2(s)) ds. for any non-negative s_1(0), r_1(0), s_2(0), r_2(0), the following functions, satisfying for t > 0: i_1(t) = i_2(t) = 0, s_1(t) + s_2(t) = s_1(0) + s_2(0), r_1(t) + r_2(t) = r_1(0) + r_2(0), ṡ_1 = −ṡ_2 = l(t) (c_12(t) s_2 − c_21(t) s_1), ṙ_1 = −ṙ_2 = l(t) (c_12(t) r_2 − c_21(t) r_1), are solutions of ( ). furthermore, s_1(t) + s_2(t) is decreasing, r_1(t) + r_2(t) is increasing, and i_1(t) + i_2(t) is non-increasing if and only if i_1(t) k_1(t) s_1(t)/(s_1(t) + i_1(t) + r_1(t)) + i_2(t) k_2(t) s_2(t)/(s_2(t) + i_2(t) + r_2(t)) ≤ i_1(t)(r + d_1(t)) + i_2(t)(r + d_2(t)). ( ) remark . as in remark , we want to point out some interpretations of this theorem in the context of the pandemic. here, we have described some solutions with a free epidemic component (i_1(t) = i_2(t) = 0). note that in this case, however, the s_i and r_i are not constant, since they vary according to the fluxes between the two nodes. again, the last inequality quantifies the idea that when the spread of the virus becomes lower than death and recovery, the infected population starts to decrease. the difference here is that we take into account the two nodes together.
- report of the who-china joint mission on coronavirus disease (covid- )
- covid- projections. the institute for health metrics and evaluation
- we're sharing coronavirus case data for every u.s. county.
the new york times
- official oral report of new york governor on covid
- percent of nyc residents tested in state study have antibodies from covid-
- official oral report of new york governor on covid . pbs news
- a network model for control of dengue epidemic using sterile insect technique
- real estimates of mortality following covid- infection. the lancet infectious diseases
- a mathematical model for simulating the phase-based transmissibility of a novel coronavirus
- disease transmission models with density-dependent demographics
- severe acute respiratory syndrome
- a contribution to the mathematical theory of epidemics
- prediction of the epidemic peak of coronavirus disease in japan
- correcting under-reported covid- case numbers: estimating the true scale of the pandemic
- a conceptual model for the coronavirus disease (covid- ) outbreak in wuhan, china with individual reaction and governmental action
- understanding unreported cases in the covid- epidemic outbreak in wuhan, china, and the importance of major public health interventions
- predicting the cumulative number of cases for the covid- epidemic in china from early data
- the chikungunya disease: modeling, vector and transmission global dynamics
- optimal control of chikungunya disease: larvae reduction, treatment and prevention
- using early data to estimate the actual infection fatality ratio from covid- in france
- estimation of the transmission risk of the -ncov and its implication for public health interventions
- rigorous surveillance is necessary for high confidence in end-of-outbreak declarations for ebola and other infectious diseases
- global stability of a delayed sir epidemic model with density dependent birth and death rates
in this article, we have considered a simple non-autonomous sir model to fit the data of covid- in new york state. the model illustrates and quantifies how acting on the control k(t) allows one to flatten the curve of infected people over time.
from the model, using classical statistical methods, it is then possible to provide predictions of the number of people in need of hospitalization. lastly, we have fitted data from nj state thanks to a coupled sir model taking into account the daily fluxes between nj and ny. it allows one to predict similar dynamics in ny and nj, with a delay and a small attenuation. note that, despite its simplicity, our model fits the available data during the growing phase of march as well as more sophisticated models do. in a forthcoming work, we aim to fit data with the model over a longer period. this would provide an accurate estimation of the different parameters. key: cord- -erpoh he authors: schaback, robert title: on covid- modelling date: - - journal: nan doi: nan sha: doc_id: cord_uid: erpoh he this contribution analyzes the covid- outbreak by comparably simple mathematical and numerical methods. the final goal is to predict the peak of the epidemic outbreak per country with a reliable technique. this is done by an algorithm motivated by standard sir models and aligned with the standard data provided by the johns hopkins university. to reconstruct data for the unregistered infected, the algorithm uses current values of the infection fatality rate and a data-driven estimation of a specific form of the recovery rate. all other ingredients are data-driven as well. various examples of predictions are provided for illustration. this contribution starts in section with a rather elementary reconciliation of the standard sir model for epidemics, featuring the central notions like basic reproduction number, herd immunity threshold, and doubling time, together with some critical remarks on their abuse in the media. experts can skip over this completely. readers interested in the predictions should jump right away to section .
section describes the johns hopkins data source with its limitations and flaws, and then presents a variation of a sir model that can be applied directly to the data. it allows one to estimate basic parameters, including the basic reproduction number, but does not work for predictions of peaks of epidemics. to achieve the latter goal, section combines the data-compatible model of section with a sir model dealing with the unknown susceptibles and the unregistered infectious. this needs two extra parameters that should be extracted from the literature. the first is the infection fatality rate, as provided e.g. by [ , ] , combined with the case fatality rate that can be deduced from the johns hopkins
section describes the johns hopkins data source with its limitations and flaws, and then presents a variation of a sir model that can be applied directly to the data. it allows to estimate basic parameters, including the basic reproduction number, but does not work for predictions of peaks of epidemics. to achieve the latter goal, section combines the data-compatible model of section with a sir model dealing with the unknown susceptibles and the unregistered infectious. this needs two extra parameters that should be extracted from the literature. the first is the infection fatality rate, as provided e.g. by [ , ] , combined with the case fatality rate that can be deduced from the johns hopkins pointing out certain abuses of these notions. this will not work without calculus, but things were kept as simple as possible. readers should take the opportunity to brush up their calculus knowledge. experts should go over to section . the simplest standard "sir" model of epidemics (e.g. [ ] and easily retrievable in the wikipedia [ ] ) deals with three variables . susceptible (s), . infectious (i), and . removed (r). the removed cannot infect anybody anymore, being either dead or immune. this is the viewpoint of bacteria of viruses. the difference between death and immunity of subjects is totally irrelevant for them: they cannot proliferate anymore in both cases. the sir model cannot say anything about death rates of persons. the susceptible are not yet infected and not immune, while the infectious can infect susceptibles. the three classes s, i, and r are disjoint and add up to a fixed total population count n = s + i + r. all of these are ideally assumed to be smooth functions of time t, and satisfy the differential equationṡ s = −β s n i, where the dot stands for the time derivative, and where β and γ are positive parameters. the product s n i models the probability that an infectious meets a susceptible. 
note that the removed of the sir model are not the recovered of the johns hopkins data that we treat later, and the sir model does not account for the confirmed counted there. since ṅ = ṡ + i̇ + ṙ = 0, the equation n = s + i + r is kept valid at all times. the term β (s/n) i moves susceptibles to infectious, while γ i moves infectious to removed. thus β represents an infection rate while the removal rate γ accounts for either healing or fatality after infection, i.e. immunity. political decisions about reducing contact probabilities will affect β, while γ resembles the balance between the medical aggressivity of the infection and the quality of the health care system. as long as the infectious i are positive, the susceptibles s are decreasing, while the removed r are increasing. excluding the trivial case of zero infectious from now on, the removed and the susceptible will be strictly monotonic. qualitatively, the system is not really dependent on n, because one can multiply n, r, i, and s by arbitrary factors without changing the system. as an aside, one can also go over to the relative quantities s/n and i/n with two corresponding differential equations. figure shows some simple examples that will be explained in some detail below. we start by looking at the initial conditions. since everything is invariant under an additive time shift, we can start at time 0 and consider i̇(0) = β (s(0)/n) i(0) − γ i(0), and see that the infectious decrease right from the start if β s(0)/n < γ, and this keeps going on since s must decrease. there is no outbreak in this case, because there are not enough susceptibles at start time. the case s(0) = n means that there is no infection at all, and we ignore it, though it is a solution to the system, with i = r = 0, s = n throughout. if there is a time instance t_i (maybe t_i = 0 as above) where the infectious are positive and do not change, we have 0 = i̇(t_i) = β (s(t_i)/n) i(t_i) − γ i(t_i). if β < γ holds, this situation cannot occur, and i must be decreasing all the time, i.e.
the infection dies out. this is what everybody wants: there is no outbreak. in the case γ = β we go back to the initial situation of the previous section and see that there is no outbreak due to s(0)/n < 1, if there is an infection at all. the interesting case is β > γ. then the first part of ( ) shows that as soon as t is larger than the peak time t_i, the infectious will decrease due to ( ) . therefore the zero of i̇ must be a maximum, i.e. a peak, and it is unique. the infectious go to zero even in the peak situation. it is one of the most important practical problems in the beginning of an epidemic to predict • whether there will be a peak at all, • when the possible peak will come, and • how many infectious there will be at the peak. this can be answered if one has good estimates for β and γ, and we shall deal with this problem in the major part of this paper. in real life it is highly important to avoid the peak situation, and this can only be done by administrative measures that change β and γ to the situation β < γ. this is what management of epidemics is all about, provided that an epidemic follows the sir model. we shall see how countries perform. the quotient r_0 = β/γ is called the basic reproduction number. if it is not larger than one, there is no outbreak, whatever the initial conditions are. if it is larger than one, there is an outbreak provided that ( ) holds. in that case, there is a time t_i where i reaches a maximum, and ( ) holds there. when we discuss an outbreak in what follows, we always assume r_0 > 1 and ( ). if we later let r_0 tend to 1 from above, we also require that s(0) tends to n from below, in order to stay in the outbreak situation. both β and γ change under a change of time scale, but the basic reproduction number is invariant. physically, β and γ have the dimension time^(−1), but r_0 = β/γ is dimensionless. figure shows a series of test runs with s(0) = n · . and r(0) = 0, with fixed γ = . and β varying from . to . , such that r_0 varies from / to .
due to the realistically small i(0) being . % of the population, one cannot see the decaying cases near startup, but the tails of the blue i curves are decaying examples by starting value, due to s(t)/n < γ/β = 1/r_0 when started at time t. decreasing r_0 flattens the blue curves for i. one can observe that i always dies out, while s and r tend to fixed positive levels. we shall prove this below. from the system, one can also infer that r has an inflection point where i has its maximum, since r̈ = γ İ vanishes there. if only r were observable, one could locate the peak of i via the inflection point of r. figure shows an artificial case with a large starting value i(0) = n/ , fixed γ = . and β varying from . to . , letting r_0 vary from . to . . in contrast to figure , this example shows cases with small r_0 properly. the essence is that the infectious go down, whether they have a peak or not, and there will always be a portion of susceptibles. again, we shall prove this below. the herd immunity threshold of classical sir modeling is a number related to the basic reproduction number r_0 by following a special scenario. if a population is threatened by an infection with basic reproduction number r_0, what is the number of immune persons needed to prevent an outbreak right from the start? we can read this off equation ( ) in the ideal situation that i(0) = 0 and s(0) + r(0) = n, namely r(0)/n ≥ 1 − 1/r_0, with 1 − 1/r_0 as the threshold between outbreak and decay. this does not refer to a whole epidemic scenario, nor to an epidemic outbreak. it is a condition to be checked before anything happens, and it is useless within a developing epidemic, whatever the media say. in the peak situation of ( ), the fraction of the non-susceptible at the peak t_i of i is exactly the herd immunity threshold.
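the threshold condition can be written down as a one-liner; a sketch, with illustrative r_0 values:

```python
# herd immunity threshold 1 - 1/r0: fraction of immune persons needed
# at startup to prevent an outbreak; r0 values here are illustrative
def herd_immunity_threshold(r0):
    if r0 <= 1.0:
        return 0.0  # no outbreak possible, no immunity needed
    return 1.0 - 1.0 / r0
```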
thus it is correct to say that if the immune of a population are below the herd immunity threshold at startup, and if the basic reproduction number is larger than one, the sum of the immune and the infectious will rise up to the herd immunity threshold, and then the infectious will decay. this is often stated imprecisely in the media. furthermore, the herd immunity threshold has nothing to do with the important long-term ratio of susceptibles to removed. we shall address this ratio in section . . the most interesting questions during an outbreak with r_0 > 1 are
• at which time t_i will we reach the maximum of the infectious, and
• what is i(t_i), i.e. how many people will maximally be infectious at that time?
it will turn out that there are no easy direct answers. from ( ) we see that at the maximum of i the susceptibles s have the value s(t_i) = n/r_0, i.e. the portion 1/r_0 of the population is susceptible. from that time on, the infectious decrease. in terms of r and i, the value 1 − 1/r_0 of the non-susceptibles marks the peak of the infectious at the herd immunity threshold. "flattening the curve", as often mentioned in the media, is intended to mean making the maximum of i smaller, but this is not exactly what happens, since the maximum is described by the penultimate equation concerning the susceptibles, while for i(t_i) we only know ( ), yielding that the left-hand side gets smaller if r_0 gets closer to one. politically, this requires either making β smaller via reducing contact probabilities or making γ larger by improving the health system, or both. anyway, "flattening the curve" works by letting r_0 tend to 1 from above, but the basic reproduction number does not directly determine the time t_i of the maximum or the value there. we shall improve the above analysis in section . .
in the beginning of the outbreak, s/n is near to one, and therefore İ ≈ β i − γi models an exponential outbreak with exponent β − γ > 0, with the solution i(t) = i(0) e^{(β−γ)t}. if this is done in discrete time steps ∆t, one has i_{n+1} = i_n (1 + (β − γ)∆t). the severity of the outbreak is not controlled by r_0 = β/γ, but rather via β − γ. publishing single values i(t) does not give any information about β − γ. better is the ratio of two subsequent values, and if this gets smaller over time, the outbreak gets less dramatic because β − γ gets smaller. really useful information about an outbreak does not consist of values, and not of increments, but of increments of increments, i.e. of some second-derivative information. this is what the media rarely provided during the outbreak. another piece of information used by media during the outbreak is the doubling time, i.e. how many days it takes until daily values double. this is the number n in 2 i(t) = i(t + n∆t), i.e. n = log 2 / ((β − γ)∆t), so it is inversely proportional to β − γ. if political action doubles the "doubling time", it halves β − γ. if politicians do this repeatedly, they never reach β < γ, and they never escape an exponential outbreak if they do this any finite number of times. extending the doubling time will never prevent a peak, it only postpones it and hopefully flattens it. when presenting a "doubling time", media should always point out that this makes sense only during an exponential outbreak, and that it is not related to the basic reproduction number r_0 = β/γ, but rather to the difference β − γ. media often say that the basic reproduction number r_0 gives the number of persons an average infectious infects while being infectious. this is a rather mystical statement that needs underpinning. the quantity i/ṙ = i/(γ i) = 1/γ is a value that has the physical dimension of time. it describes the ratio between the current infectious and the current newly removed, and thus can be seen as the average time needed for an infectious to get removed, i.e. it is the average time that an infectious can infect others.
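the relation between doubling time and β − γ can be made explicit; a sketch, with illustrative rates:

```python
import math

# doubling time of the exponential outbreak i(t) = i(0)*exp((beta-gamma)*t),
# measured in steps of length dt; the rates below are illustrative
def doubling_time(beta, gamma, dt=1.0):
    growth = beta - gamma
    if growth <= 0:
        return math.inf  # no exponential outbreak, values never double
    return math.log(2.0) / (growth * dt)
```

halving β − γ doubles the doubling time, which illustrates the point in the text: repeatedly doubling the doubling time never reaches β < γ.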
correspondingly, β (s/n) i are the newly infected per unit time, and therefore i/(β (s/n) i) = n/(β s) can be seen as the time it needs for an average infectious to generate a new infectious. the ratio (β/γ)(s/n) then gives how many new infectious can be generated by one infectious while being infected, but this is only close to r_0 if s ≈ n, i.e. at the start of an outbreak. a correct statement is that r_0 is the average number of infections an infectious generates while being infectious, but within an unlimited supply of susceptibles. besides the peak in case r_0 > 1, it is interesting to know the portions of the population that either get removed (by death or immunity) or never get in contact with the infection. this concerns the long-term behavior of the removed and the susceptibles, see figures and . if we are at a time t_d behind the possible peak at t_i, or in a decay situation enforced by the starting value, like in ( ), we know that i must decrease exponentially to zero. this follows from (log i)˙ = β s/n − γ ≤ β s(t_d)/n − γ < 0, showing that log i must decrease linearly, or i must decrease exponentially. thus we get rid of the infectious in the long run, keeping only susceptibles and removed. surprisingly, this happens independently of how large r_0 is. dividing the first equation in ( ) by the third leads to ds/dr = −(β/γ)(s/n), and when setting σ = s/n and ρ = r/n, we get σ(ρ) = σ(0) e^{−r_0 ρ} when assuming r(0) = 0 at startup. since ρ is increasing, it has a limit 0 < ρ_∞ ≤ 1 for t → ∞, and in this limit σ(ρ_∞) = σ(0) e^{−r_0 ρ_∞} holds, together with the condition ρ_∞ + σ(ρ_∞) = 1, because there are no more infectious. the equation ρ_∞ = 1 − σ(0) e^{−r_0 ρ_∞} has a unique solution in (0, 1), dependent on σ(0) < 1 and r_0 = β/γ, see figure . qualitatively, we can use ( ) or ( ) in the form r_0 ρ = log(σ(0)/σ(ρ)) to see that the ratio of removed to susceptibles increases with r_0, but there is a logarithm involved. all of this has some serious implications, if the model is correct for an epidemic situation. first, the infectious always go to zero, but susceptibles always remain. this means that a new infection can always arise whenever some infected person enters the sanitized population.
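the limit ρ_∞ can be computed by fixed-point iteration on ρ = 1 − σ(0) e^{−r_0 ρ}; a sketch, with illustrative σ(0) and tolerance:

```python
import math

# fixed-point iteration for the final removed fraction rho_inf solving
#   rho = 1 - sigma0 * exp(-r0 * rho);
# sigma0 and the tolerance are illustrative choices
def final_removed_fraction(r0, sigma0=0.999, tol=1e-12, max_iter=10000):
    rho = 1.0  # start from above; the map contracts near the root for r0 > 1
    for _ in range(max_iter):
        new = 1.0 - sigma0 * math.exp(-r0 * rho)
        if abs(new - rho) < tol:
            return new
        rho = new
    return rho
```

as stated in the text, the removed fraction grows with r_0, e.g. final_removed_fraction(2.0) is smaller than final_removed_fraction(5.0).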
the outbreak risk is dependent on the portion σ_∞ = 1 − ρ_∞ of the susceptibles. this illustrates the importance of vaccination, e.g. against measles or influenza. the above analysis shows that large values of r_0 lead to large relative values of removed to susceptible in the limit. the consequence is that systems with large r_0 have a dramatic outbreak and lead to a large portion of removed. this is good news in case the rate of fatalities within the removed is low, but very bad news otherwise. when politicians try to "flatten the curve" by bringing r_0 below 1 from some time on, this will automatically decrease the asymptotic rate of removed and increase the asymptotic rate of susceptibles in the population. this is particularly important if the rate of fatalities within the removed is high, but by the previous argument the risk of re-infection rises due to the larger portion of susceptibles. the decay situation ( ) implies that ρ_∞ ≥ 1 − 1/r_0, therefore the final rate of the removed is not smaller than the herd immunity threshold. this is good news for possible re-infections, but only if the death rate among the removed is small enough. in a decay situation like in ( ), we get İ = (β σ − γ) i with σ → σ_∞ (figure : solving for ρ_∞ for fixed c( ) = . and varying r_0), to see that the exponential decay is not ruled by β − γ as in the outbreak case with r_0 > 1, but rather by −γ + β σ_∞. this also holds for large r_0 = β/γ, because σ_∞ counteracts. the bell shapes of the peaked i curves are not symmetric with respect to the peak. if we go back to analyzing the peak of i at t_i for r_0 > 1, we know s(t_i)/n = 1/r_0 and get r_0 ρ(t_i) = log(σ(0) r_0), leading to i(t_i)/n = 1 − 1/r_0 − log(σ(0) r_0)/r_0. for standard infections that have starting values σ(0) = s(0)/n very close to one, the maximal ratio of infectious is i(t_i)/n ≈ 1 − (1 + log r_0)/r_0. figure shows the behaviour of this function, and this is what "flattening the curve" is all about. a value of r_0 = gets a maximum of more than % of the population infectious at a single time. if % need hospital care, this implies that a country needs hospital beds for % of the population.
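the peak formula for σ(0) ≈ 1 is easy to evaluate; a sketch, with illustrative r_0 values:

```python
import math

# maximal fraction of simultaneously infectious for s(0) close to n:
#   i_max / n = 1 - (1 + ln r0) / r0;
# the r0 values used for testing are illustrative
def peak_infectious_fraction(r0):
    if r0 <= 1.0:
        return 0.0  # below the threshold there is no peak
    return 1.0 - (1.0 + math.log(r0)) / r0
```

the function is increasing in r_0, which is why pushing r_0 towards one "flattens the curve" of the infectious.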
the dotted line leaves the log term out, i.e. it marks the rate of the susceptibles at the peak, and by ( ) the difference is the rate r(t_i)/n of the recovered at the peak. to analyze the peak time t_i, we use İ ≤ (β − γ) i to get an upper bound for the exponential outbreak that implies a lower bound for t_i of the form ( ). this needs improvement. to get a quantitative result about "flattening the curve", we first evaluate the integral ∫ i dt = n ρ_∞/γ, assuming r(0) = 0, and set it equal to an integral over the constant value at the maximum, i.e. we squeeze the area under the curve into a rectangle of length b − a under the maximal value, i.e. n ρ_∞/γ = (b − a) i(t_i). if we "flatten the curve" by letting r_0 tend to 1 from above, we see that the length b − a of the above rectangle goes to infinity like r /( − r ), because ρ_∞ tends to 0. if there is no peak, e.g. if r_0 = β/γ is below 1 either at the beginning or after some political intervention, one can repeat the above argument, starting with the infectious at some time t and looking at the area under i from t to infinity. this needs improvement as well. here is a detour that is well known in the sir literature. the sir system can be written as σ̇ = −β σ (i/n), ρ̇ = γ (i/n), and in a new time variable τ with dτ = (i/n) dt, one gets the system σ′ = −β σ, ρ′ = γ. the beauty of this is that the roles of β and γ are perfectly split, like the roles of σ and ρ: in the new timescale, ρ increases linearly and σ decreases exponentially. the basic reproduction number then describes the fixed ratio ( ), and the result ( ) of section . comes back as σ(ρ) = σ(0) e^{−r_0 ρ} for the case ρ(0) = 0. this approach has the disadvantage of concealing the peak within the new timescale, and it is useless for peak prediction. if data for the sir model were fully available, one could solve ( ) for β = −n ṡ/(s i) and γ = ṙ/i, and we shall use this in section . . the validity of a sir model can be tested by checking whether the right-hand sides for β, γ and r_0 are roughly constant. if data are sampled locally, e.g.
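solving for β and γ from complete data can be done with finite differences in place of the derivatives; a sketch on synthetic series generated with known rates (the values 0.3 and 0.1 are illustrative):

```python
# recover beta and gamma from complete daily sir data by replacing
#   beta = -n * s'/(s*i),  gamma = r'/i
# with finite differences; the series fed in below are synthetic
def estimate_rates(s, i, r, n):
    betas, gammas = [], []
    for k in range(len(i) - 1):
        if i[k] > 0 and s[k] > 0:
            betas.append(-n * (s[k + 1] - s[k]) / (s[k] * i[k]))
            gammas.append((r[k + 1] - r[k]) / i[k])
    return betas, gammas
```

on data produced by the discrete sir update itself, the recovered rates are constant, which is exactly the validity test mentioned in the text.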
before or after a peak, the above technique should determine the parameters for the global epidemic, i.e. be useful for either prediction or backward testing. however, in pandemics like covid-19, the parameters β and γ change over time by administrative action. this means that they should be considered as functions in the above equations, and then their changes may be used for conclusions about the influence of such actions. this is intended when media say that "r has changed". from this viewpoint, one can go back to the sir model and consider β and γ as "control functions" that just describe the relation between the variables. but the main argument against using ( ) is that the data are hardly available. this is the concern of the next section. now we want to confront the modelling of the previous section with the available data. this is crucial for maneuvering countries through the epidemics [ ]. in this text, we work with the covid-19 data from the CSSEGISandData repository of the johns hopkins university [ ]. it is the only source that provides comparable data on a worldwide scale. the numbers there are given as cumulative integer-valued time series in days from jan. nd, . all these values are absolute numbers, not relative to a total population. note that the unconfirmed cases are not accessible at all, while the confirmed contain the dead and the recovered of earlier days. at this point, we do not question the integrity of the data, but there are many well-known flaws. in particular, the values for specific days partly belong to previous days, due to delays in the chains of data transmission in different countries. this is why, at some points, we shall apply some conservative smoothing to the data. finally, there are inconsistencies that possibly call for data changes. for an example, consider that covid-19 cases usually lead to recovery or death within a rather fixed period of k ≈ − days.
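since the repository delivers cumulative confirmed, recovered and dead counts, the non-cumulative infectious have to be computed by subtraction; a sketch with made-up numbers:

```python
# infectious i = c - r - d from cumulative confirmed, recovered and dead
# series, johns-hopkins style; the numbers used for testing are made up
def infectious(confirmed, recovered, dead):
    return [c - r - d for c, r, d in zip(confirmed, recovered, dead)]
```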
but some johns hopkins data have fewer newly infected at day n than the sum of recovered and dead at day n + k. and there are countries like germany that deliver data on the recovered in a very questionable way. the law in germany did not force authorities to collect data on the recovered, and the united kingdom did not report numbers of dead and recovered from places outside the national health system, e.g. from seniors' retirement homes. both strategies have changed somewhat in the meantime, as of early may, but the data still keep these flaws. we might assume that the dead plus the recovered of the johns hopkins data are the removed of the sir model, and that the infectious i = c − r − d of the johns hopkins data are the infectious of the sir model. but this is not strictly valid, because registration or confirmation get in the way. on the other hand, one can take the radical viewpoint that facts are not interesting if they do not show up in the johns hopkins data. except for the united kingdom, the important figures concern covid-19 casualties that are actually registered as such, others do not count, and serious cases needing hospitalization or leading to death should not go unregistered. if they do in certain countries, using such data will not be of any help, unless other data sources are available. if sir modelling does not work for the johns hopkins data, it is time to modify the sir technique appropriately, and this will be tried in this section. an important point for what follows is that the data come as daily values. to make this compatible with differential equations, we shall replace derivatives by differences. to get a first impression of the johns hopkins data, figure shows raw data up to day (may th, as of this writing). the presentation is logarithmic, because then linearly increasing or decreasing parts correspond to exponentially increasing or decreasing numbers in the real data.
many presentations in the media are non-logarithmic, and then all exponential outbreaks look similar. the really interesting data are the infectious i = c − r − d in black that show a peak or not. the other curves are cumulative. the data for other countries tell similar stories and are suppressed. the media, in particular german tv, present covid-19 data in a rather debatable way. when mentioning johns hopkins data, they provide c, d, and r separately without stating the most important figures, namely i = c − d − r, their change, and the change of their change. when mentioning data on the infectious from the robert koch institute alongside, they do not say precisely that these are non-cumulative and should be compared to the i = c − r − d data of the johns hopkins university. and, in most cases during the outbreak, they did not mention the change of the change. quite like all other media. one can see in figure that germany and south korea have passed the peak of the infectious, while france is roughly at the peak and the united states is still in an exponential outbreak. the early figures, below day , are rather useless, but then an exponential outbreak is visible in all cases. this outbreak changes its slope due to political actions, and we shall analyze this later. see [ ] for a detailed early analysis of slope changes. there are strange anomalies in the recovered (green). france seems not to have delivered any data between days and , germany changed its data delivery policy between days and , and the uk data for the recovered are a mess. it should be noted that the available medical results on the covid-19 disease often state that confirmed will die or survive after a more or less fixed number of days, roughly to . this would imply that the red curves for the dead and the green curves for the recovered should roughly follow the blue curves for the confirmed with a fixed but measurable delay. this is partially observable, but much less accurately for the recovered.
the idea is to define a model that works exclusively with the johns hopkins data, but comes close to a sir model, without being able to use s. since the sir model does not distinguish between recoveries and deaths, we set r_sir ⇔ d_jh + r_jh and let the infectious be comparable, i.e. i_sir ⇔ i_jh = c_jh − r_jh − d_jh, and we completely omit the susceptibles. from now on, we shall omit the subscript jh when we use the johns hopkins data, but we shall use sir when we go back to the sir model. we define time series γ_n and b_n that model γ and b = β · s_sir/n without knowing s_sir. this is equivalent to the model c_{n+1} − c_n = b_n i_n, i_{n+1} − i_n = b_n i_n − γ_n i_n, (r + d)_{n+1} − (r + d)_n = γ_n i_n that maintains c = i + r + d, and we may call it a johns hopkins data model. it is very close to a sir model if the time series b_n is not considered to be constant, but just an approximation of β · s_sir/n. by brute force, one can consider r_n = b_n/γ_n as a data-driven substitute for β/γ. then there is a rather simple observation: if r_n is smaller than one, the infectious decrease. it follows from the second equation of the model, but this is visible in the data anyway and not of much help. since r_n models r_0 s_sir/n, it always underestimates r_0. this underestimation gets dramatic when it must be assumed that s_sir gets seriously smaller than n. at this point, it is not intended to forecast the epidemics. the focus is on extracting parameters from the johns hopkins data that relate to a background sir-type model. figure shows estimates of r_0 s_sir/n via r_n for the last four weeks before day , i.e. march th. except for the united states, all countries were more or less successful in pressing r_0 below one. in all cases, s_sir/n is too close to one to have any influence. the variation in r_n is not due to the decrease in s_sir/n, but should rather be attributed to political action. as mentioned above, the estimates for r_0 by r_n are always too optimistic.
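the three difference equations of the johns hopkins data model can be inverted to extract b_n, γ_n and r_n directly from the data; a sketch with made-up counts:

```python
# time series b_n, gamma_n and r_n = b_n / gamma_n of the johns hopkins
# data model, extracted from cumulative confirmed c and removed rd = r + d:
#   c[n+1] - c[n] = b_n * i[n],   rd[n+1] - rd[n] = gamma_n * i[n]
# the counts used for testing are made up
def jh_rates(c, rd):
    i = [cn - rdn for cn, rdn in zip(c, rd)]  # infectious i = c - (r + d)
    b, g, rn = [], [], []
    for k in range(len(c) - 1):
        if i[k] > 0:
            b.append((c[k + 1] - c[k]) / i[k])
            g.append((rd[k + 1] - rd[k]) / i[k])
            rn.append(b[-1] / g[-1] if g[-1] > 0 else float("inf"))
    return b, g, rn
```

by construction, the extracted series reproduce the balance i_{n+1} − i_n = b_n i_n − γ_n i_n exactly.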
for the figure, the raw johns hopkins data were smoothed by a double action of a / , / , / filter on the logarithms of the data. this smoother keeps constants and linear sections of the logarithm invariant, i.e. it does not change local exponential behavior. this smoothing was not applied to figure . it was by far not strong enough to eliminate the apparent -day oscillations that appear frequently in the johns hopkins data, see the figure. data from the robert koch institute in germany have even stronger -day variations. as long as r_n is roughly constant, the above approach will always model an exponential outbreak or decay, but never a peak, because the difference equations are linear. it can only help the user to tell whether there is a peak ahead or behind, depending on r_n ≈ r_0 being larger or smaller than 1. if r_n is kept below one, the confirmed infectious will not increase, causing no new threats to the health system. then the s/n factor will not decrease substantially, and a full sir model is not necessary. on the other hand, if a country manages to keep r_n smaller than some r = b/γ < 1, it is clear that the infectious decay at least at a fixed geometric rate. as long as countries keep the r_n clearly below one, e.g. below 1/2, this would mean that r_0 ≈ r_n n/s_sir stays below one if s_sir ≥ n/2, i.e. as long as the majority of the population has not been in contact with the sars-cov-2 virus. this is good news. but observing a small r_n can conceal a situation with a large r_0 if s_sir/n is small. this is one reason why countries need to get a grip on the susceptibles nationwide. it is tempting to use the above technique for prediction in such a way that the b_n and the γ_n are fitted to a constant or a linear function, using the values of the fit for running the system into the future. this is very close to extending the logarithmic plots of figure by lines, using pencil and ruler, and thus hardly interesting. so far, the above argument cannot replace a sir model. it only interprets the available data.
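the smoother can be sketched as a symmetric three-tap filter applied to the logarithms; the exact coefficients are elided in the text above, so the weights 1/4, 1/2, 1/4 used here are an assumption:

```python
import math

# conservative smoothing of a positive daily series: repeated passes of a
# symmetric 3-tap filter on the logarithms; the weights (1/4, 1/2, 1/4)
# are an assumption, the paper's exact coefficients are not preserved here
def smooth_log(series, passes=2):
    x = [math.log(v) for v in series]
    for _ in range(passes):
        y = x[:]  # endpoints stay unchanged
        for k in range(1, len(x) - 1):
            y[k] = 0.25 * x[k - 1] + 0.5 * x[k] + 0.25 * x[k + 1]
        x = y
    return [math.exp(v) for v in x]
```

since the weights sum to one and are symmetric, linear sections of the logarithm, i.e. exact exponentials, pass through unchanged, which is the invariance property claimed in the text.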
however, monitoring the johns hopkins data in the above way will be very useful when it comes to evaluating the effectiveness of certain measures taken by politicians. it will be highly interesting to see how the data of figure continue, in particular when countries relax their contact restrictions. for cases where one still has to expect r_0 > 1, e.g. the uk, the us and sweden around day (see figure ), the challenge remains to predict a possible peak. using the estimates from the previous section is questionable, because they concern the subpopulation of the confirmed and systematically underestimate r_0. the "real" sir model will have different parameters, and it needs the susceptibles to model a peak or to make the r_n estimates realistic. so far, the johns hopkins data work in a range where s/n is still close to one, and the susceptibles are considered to be abundant. but the bad news for countries with r_n > 1 is that r_n underestimates r_0. anyway, if one trusts the above time series as approximations to β and γ, one can run a sir model, provided one is in the case r_0 > 1 and has reasonable starting values. but these are a problem, because the unconfirmed infectious and the unconfirmed recovered are not known, even if, for simplicity, one assumes that there are no unconfirmed covid-19 deaths. for an unrealistic scenario, consider total registration, i.e. all infected are automatically confirmed. then the susceptibles in the johns hopkins model would be s_n = n − c_n = n − i_n − r_n − d_n. now the estimate for r_0 must be corrected to r_n n/s_n = r_n n/(n − c_n) = r_n (1 + c_n/(n − c_n)), but this change will not be serious during the outbreak. one gets a crude prediction of the peak in case r_0 = β/γ > 1. figure shows results for two cases. the top shows the case of the united states, using data from day (may th) and estimating β and γ from the data one week before. the peak is predicted at day with a total rate of % infectious.
to see how crude the technique is, the second plot shows the case of germany using data up to day , i.e. before the peak, and the peak is predicted at day with about % infected. at day , r_0 was estimated at . , but a few days later the estimate went below 1 (figure ) by political intervention changing b_n considerably. see figure for a much better prediction using data only up to day . to get closer to what actually happens, one should combine the data-oriented johns hopkins data model with a sir model that accounts for what happens outside of the confirmed. we introduce the time series m_n of the unregistered infectious and h_n of the unregistered recovered. this implies that all deaths occur within the confirmed, though this is a highly debatable issue. it assumes that persons with serious symptoms get confirmed, and that nobody dies of covid-19 without prior confirmation. the removed from the viewpoint of a global sir model including h and m are h + c, and thus the sir model is ( ). to run this hidden model with constant n = s + m + h + c, one needs initial values and good estimates for β and γ, which are not the ones of the johns hopkins data model of section . . the johns hopkins variables d and r are linked to the hidden model via i = c − r − d. they follow an observable model with instantaneous case death and recovery rates γ_icd and γ_icr for the confirmed infectious. these rates can be estimated separately from the available johns hopkins data, and we shall do this below. we call these rates instantaneous, because they artificially attribute the new deaths or recoveries at day n + 1 to the previous day, not to earlier days. in this sense, they are rather artificial, and we shall address this question. they are case rates, because they concern the confirmed. the observable model is coupled to the hidden model only by c_n. any data-driven c_n from the observable model can be used to enter the h + c variable of the hidden model, but in an unknown ratio.
conversely, any version of the hidden model produces h + c values that do not determine the c part. summarizing, there is no way to fit the hidden model to the data without additional assumptions. various possibilities were tried to connect the hidden to the observable. two will be presented now. recall that the parameter γ_icd in the observable model ( ) relates case fatalities to the confirmed infectious of the previous day. in contrast to this, the infection fatality rate in the standard literature, denoted by γ_if here, relates to the infection directly, independent of confirmation, and gives the probability to die of covid-19 after infection with the sars-cov-2 virus. it is γ_if = . % by [ ] and . % by [ ], but specialized for china. recent data from the heinsberg study by streeck et al. [ ] give a value of . % for the heinsberg population in germany. the idea to use the infection fatality rate for information about the hidden system comes from [ ]. the way to use it will depend on how to handle delays, and it turned out to be difficult to use these rates in a stable way. let us focus on the probabilities to die either after an infection or after confirmation of an infection. the first is the infection fatality rate given in the literature, but what is the comparable case fatality rate γ_cf when using the johns hopkins data? it is clearly not the γ_icd in ( ), which gives the ratio of new deaths at day n + 1 as a fraction of the confirmed infectious at day n. it makes much more sense to use the fact that covid-19 diseases end after k days from confirmation with either death or recovery. let us call this the k-day rule. suggested values for k range between and . following [ ], one can estimate the probability p_i to die on day i after confirmation, and this works in a stable way per country, based only on c and d, not on the unstable r data.
in [ ] this approach was used to produce r values that comply with the k-day rule, but here we use it for estimating the case fatality. the technique of [ ] performs a fit d_{n+1} − d_n ≈ Σ_{i=1}^{k} p_i (c_{n−i} − c_{n−i−1}), i.e. it assigns all new deaths at day n + 1 to new infections on previous days in a hopefully consistent way, minimizing the error in the above formula under variation of the probabilities p_i. if the p_i are known for days 1 to k, the case fatality rate is γ_cf = 1 − Π_{i=1}^{k}(1 − p_i). this argument can also be seen the other way round: the new confirmed c_n − c_{n−1} at day n enter into d_{n+1} − d_n with probability q_1 = p_1, into d_{n+2} − d_{n+1} with probability q_2 = p_2(1 − p_1), and so on. the rest enters into the new recovered at day n + k with probability q_{k+1} if we set p_{k+1} = 1. thus the case fatality rate can be expressed as 1 − q_{k+1} like above. at this point, there is a hidden assumption. persons that are new to the confirmed at day n are not dead and not recovered. the change c_{n+1} − c_n to the confirmed is therefore understood as the number of newly registered infections. otherwise, one might replace c_{n−i} − c_{n−i−1} by i_{n−i} − i_{n−i−1} in ( ), but this would connect a cumulative function to a non-cumulative function. furthermore, this would use the unsafe data of the recovered. in fact, equation ( ) is unexpectedly reliable, provided one looks at the sum of the probabilities p_i. this follows from series of experiments that we do not document fully here. in [ ], data for k days backwards were used for the estimation, and results did not change much when more or less data were used or when k was modified. here, the range ≤ k ≤ was tested, with backlogs of up to days from day (as of this writing). see figure below for an example. larger k leads to somewhat larger fatality rates, because the method has more leeway to assign deaths to confirmations, but the increase is rather small when k is increased beyond . it is remarkable that k = suffices for a rather good fit for the us, the uk, sweden, and italy.
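the fit of the p_i can be sketched as a small non-negative least-squares problem, solved here by projected gradient descent; the step size, iteration count and test data are illustrative, and the reconstruction of the fit formula above is assumed:

```python
# sketch of the fit  d[n+1] - d[n] ~= sum_i p_i * (c[n-i] - c[n-i-1]),
# minimizing the squared error under the constraint p_i >= 0 by projected
# gradient descent; stdlib-only, step size and iterations are illustrative
def fit_death_probs(c, d, k, iters=5000):
    rows, rhs = [], []
    for n in range(k + 1, len(c) - 1):
        rows.append([c[n - i] - c[n - i - 1] for i in range(1, k + 1)])
        rhs.append(d[n + 1] - d[n])
    # safe step size below 1/L for the quadratic objective
    step = 1.0 / (2.0 * sum(sum(a * a for a in row) for row in rows))
    p = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for row, y in zip(rows, rhs):
            err = sum(a * b for a, b in zip(row, p)) - y
            for j in range(k):
                grad[j] += 2.0 * err * row[j]
        # gradient step, then projection onto p_i >= 0
        p = [max(0.0, pj - step * gj) for pj, gj in zip(p, grad)]
    return p
```

on synthetic data generated with known probabilities, the fit recovers them, and their sum gives the stable quantity mentioned in the text.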
in contrast to other countries, this means that days after confirmation are enough to assign deaths consistently to previous confirmations, indicating that a large portion of confirmations concern rather serious cases. in general, the fit gets better when the backlog is not too large, avoiding the use of outdated data, and the resulting rate does not change much when backlogs are shortened. roughly, the rule of a backlog of k days works well, together with k = to allow enough leeway to ensure a small fitting error. note that all variations of k and the backlog data have a rather small effect on the sum of the p_i, while the p_i themselves vary strongly. see the first column of table for estimates of case fatality rates for different countries, calculated on day (may th). these depend strongly on the strategy for confirmation. in particular, they are high when only serious cases are confirmed, e.g. cases that need hospital care. if many more people are tested, confirmations will contain plenty of much less serious cases, and then the case fatality rates are low. the values were entered manually after inspection of a plot of the rates as functions of k and the backlog. see figure for how the value . for italy was determined. the instantaneous case death rate γ_icd of ( ) for the johns hopkins data comes out around . for germany by direct inspection of the data, while the case fatality rate of table is about . . the deaths have to be attributed to different days using the k-day rule; they cannot easily be assigned to the previous day without making the rate smaller. if the case fatality rates γ_cf of table are used with the infection fatality rate γ_if = . , one should obtain an estimate of the total infectious. if the formula ( ) is written as d_{n+1} − d_n ≈ Σ_i q̃_i (s_{n−i−1} − s_{n−i}) in terms of the previous new infections s_{n−i−1} − s_{n−i} with infection fatality probabilities q̃_i, one should maintain Σ_i q̃_i = γ_if, and this works by setting q̃_i = q_i γ_if/γ_cf in general, without using the unstable p_i.
the quotient γ_if/γ_cf can be called the detection rate, stating the fraction of infections that enter confirmation. see the second column of table . note that all of this depends on good estimates of the infection fatality rate, and the new value by [ ] will roughly double the detection rate for germany. all of this is comparable to the findings of [ ] and uses the basic idea from there, but is executed with a somewhat different technique. a simple way to understand the quotient γ_if/γ_cf as a detection rate is to ask for the probability p(c) of confirmation. if the probability to die after confirmation is γ_cf, and if there are no deaths outside confirmation, then γ_if = p(c) γ_cf, i.e. p(c) = γ_if/γ_cf. it is tempting to replace s by m in ( ), but this would make m cumulative. under political changes of the parameters β and γ, the estimation of the case fatality rate should be made locally, not globally. using the experience of [ ] and section . . , we shall do this using a fixed k = for the k-day rule and data for a fixed backlog of k days. we need another parameter to make the model work. there are many choices, and after some failures we selected the constant γ_iir in the model equation h_{n+1} − h_n = γ_iir m_n. following what was mentioned about instantaneous rates in section . , this is an instantaneous infection recovery rate, relating the new unregistered recovered to the unregistered infectious of the day before. a good value of γ_iir could come out of an experiment that produces time series for m and h, i.e. for unregistered infectious and unregistered recovered; then the instantaneous infection recovery rate γ_iir could be obtained directly by γ_iir ≈ (h_{n+1} − h_n)/m_n. the infection recovery rate γ_ir = 1 − γ_if does not help much, because we need an instantaneous rate that has no interpretation as a probability. without additional input, we can look at the instantaneous case recovery rate that is available from the johns hopkins data and comes out experimentally to be rather stable.
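the detection rate and the implied size of the total infection can be computed directly; a sketch, where the rate values are illustrative and not taken from the paper's tables:

```python
# detection rate gamma_if / gamma_cf and the implied estimate of total
# infections from confirmed counts; the rate values used here are
# illustrative, not the paper's table entries
def detection_rate(gamma_if, gamma_cf):
    return gamma_if / gamma_cf

def estimated_total_infections(confirmed, gamma_if, gamma_cf):
    return confirmed / detection_rate(gamma_if, gamma_cf)
```

e.g. an infection fatality rate ten times smaller than the case fatality rate means that only a tenth of the infections show up as confirmed.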
The rate γ_iir must be larger, because we are now not in the subpopulation of the confirmed, and nobody can die without first entering the population of the confirmed. As long as no better data are available, we use a formula that accounts for two things: 1. the value γ_icr is increased by the ratio of recovery probabilities between the infected and the confirmed; 2. the value γ_ir is multiplied by a factor for the transition to instantaneous rates, this factor being the transition factor for the confirmed recovered. The above strategy is debatable and may be the weakest point of this approach; however, other strategies turned out to be worse, mainly due to instability of the results. In ( ) the rate γ_ir is fixed, and the rate γ_cr is determined locally as in the preceding section. The rate γ_icr follows from the time series (r_{n+1} − r_n)/i_n ≈ γ_icr as in ( ). This works for countries that provide useful data for the recovered. In that case, and in others to follow below, we can take the time series itself as long as we have data. For prediction, we estimate the constant from the time series using a fixed backlog of m days from the current day. Since many data sets have a weekly oscillation, because data are not properly delivered during weekends, the backlog should not be less than a week. But for certain countries, like the United Kingdom, the data for the recovered are useless. In such cases, we employ the technique of [ ] to estimate the recovered using the k-day rule and a backlog of k days, as for the case fatality rates above. We now have everything to run the hidden model, but we do it first for days that have Johns Hopkins data available. This leads to estimations of s, m, and h from the observed data of the Johns Hopkins source. With the parameters from above, we use the new relations: the first equation is a priori and determines s. One could run it over the whole range of available data if the parameter γ_cf were fixed, like γ_ci. But since we estimate it via the section
that resulted in the table, we should calculate γ_cf locally as described above. The second relation is a posteriori and lets h follow from m, but we postpone it. The third model equation in ( ) is handled by defining γ_n by brute force. We then set up the second model equation for m as a recursion that can be solved once an initial value is prescribed, and the first model equation in ( ) is used to define β_n. The model is run by executing ( ). Since the populations are large, the starting values for s are not important. The starting value for h is irrelevant for h itself, because only differences enter, but it determines the starting value for m via the balance equation. Anyway, it turns out experimentally that the starting values do not matter if the model is started early: the hidden model ( ) depends much more strongly on c than on the starting values. See the figure for an example. Along with the calculation of s, m, and h, we run the calculation of the time series β_n and γ_n using ( ) and ( ). These yield estimates for the parameters of the full SIR model that replace the earlier time series from the Johns Hopkins data model. The figures below show the original Johns Hopkins data together with the hidden variables s, m, and h calculated by the above technique. Note that the only ingredients besides the Johns Hopkins data are the number k for the k-day rule, the infection fatality rate γ_if from the literature, and the backlog m for the estimation of constants from time series. To let the combined model predict the future, or to check what it would have predicted if used at an earlier day, we take the completed model of the previous sections up to a day n and use the values s_n, m_n, h_n, c_n, i_n, r_n and d_n to start the prediction. With the variable hc := h + c, we use the recursion ( ). This needs fixed values of β and γ, which we estimate from the time series for β_n and γ_n using a backlog of m days, as above.
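The "brute force" definition of day-by-day rates can be illustrated by the generic discrete-SIR inversion below. The paper's own model equations are not reproduced in this copy, so this is only the standard form they resemble: rates chosen so that a discrete SIR recursion reproduces the given series exactly.

```python
# Sketch of a generic "brute force" inversion of a discrete SIR model:
# given series S, I, R on a population N, define day-by-day rates so that
# S[n+1] = S[n] - beta[n]*S[n]*I[n]/N and R[n+1] = R[n] + gamma[n]*I[n]
# hold exactly for the data.
def invert_sir(S, I, R, N):
    beta, gamma = [], []
    for n in range(len(S) - 1):
        gamma.append((R[n + 1] - R[n]) / I[n])
        beta.append(N * (S[n] - S[n + 1]) / (S[n] * I[n]))
    return beta, gamma
```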
The instantaneous rates γ_icr and γ_icd can be calculated via their time series, as in ( ) and ( ), using the same backlog. We do this at the starting point of the prediction, and then the model runs in what can be called a no-political-change mode. Examples follow below. The first part of the full model ( ) runs as a standard SIR model for the variables s, m and h + c, and inherits the properties described earlier. It does not use the γ_iir parameter of the second equation in ( ), and it uses the first equation the other way round, now determining c from s, not s from c. The bracket is positive under a mild condition, which is enough for practical purposes. The figure shows predictions on day (May th) for Germany, Sweden, the US and the UK, from the top. The plots for countries that are past their peak are rather similar to the one for Germany; the other three countries were selected because they still have to face their peak if no action is taken to change the parameters. The estimated R values are listed for the four countries respectively. Note that these are not directly comparable to the earlier figure, because they are constants fitted to the backlog of a week, using ( ) and ( ) instead of ( ). The black and magenta curves are the estimated m and h values, while the s values are hardly visible at the top. The hidden m and h, in black and magenta, follow roughly the observable i and c, in blue and cyan, but with a factor due to the detection rate that differs between countries; see the table. To evaluate the predictions, one should go back and start the predictions at earlier days, to compare with what happened later. The figure shows overplots of predictions for three days, each a fortnight apart. The starting points of the predictions are again marked by asterisks. Now each prediction has slightly different estimates for s, m, and h due to the different available data. Recall that the determination of these variables is done while Johns Hopkins data are available, following the section above
, and will therefore depend on the data-driven estimations described there. In particular, the case fatality rates and detection rates of the table change with the starting point of the prediction, and they determine s, m, and h backwards; see the figure for Sweden. All test runs were made with a fixed infection fatality rate γ_if = . , the delay k for estimating case fatalities, and a fixed backlog of days when estimating constants out of recent values of time series. The choice γ_if = . is somewhat between . % from [ ], . % from [ ], and . % from [ ]. New information on infection fatality rates should be included as soon as it is available. The Johns Hopkins data were smoothed by a double application of a three-point weighted-average filter on the logarithms, as for the earlier figure. For the UK and Sweden, the data for the confirmed recovered r were replaced by the k-day-rule estimation via [ ] and a backlog of days; the original data were too messy to be useful, unfortunately. For an early case in Germany, the figure shows the prediction based on data of day (March th). On the side, the figure contains a wide variation of the starting value h = n − s − c at the starting point, by a range of multipliers. This has hardly any effect on the results. The peak of the infected is predicted for May th, with roughly a million confirmed, together with estimates of the casualties at the peak and of the final count. Note the real death count on May th; the prediction of that day, shown in the figure, targets a final count below it. The parameter changes brought about by political measures turned out to be rather effective, as in many countries that applied similar strategies. But since parts of the population want to go back to their previous lifestyle, all of this is endangered, and the figures should be monitored carefully. Of course, all of this makes sense only under the assumption that reality follows the model, in spite of all attempts to design a model that follows reality.
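The "no political change" prediction mode described above freezes β and γ at their backlog-fitted values and runs a discrete SIR step forward from the last estimated state. A minimal sketch with illustrative variable names (not the paper's own recursion, which also tracks the hidden compartments):

```python
# Sketch of a forward prediction with frozen rates: a discrete SIR
# recursion on a population N, started from the last estimated state.
def predict_sir(s0, i0, r0, N, beta, gamma, days):
    S, I, R = [s0], [i0], [r0]
    for _ in range(days):
        new_inf = beta * S[-1] * I[-1] / N   # new infections this day
        new_rec = gamma * I[-1]              # new removals this day
        S.append(S[-1] - new_inf)
        I.append(I[-1] + new_inf - new_rec)
        R.append(R[-1] + new_rec)
    return S, I, R
```

The recursion conserves S + I + R, so the predicted compartments always partition the starting total.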
So far, the model presented here seems to be useful, combining theory with practically available data. It is data-driven to a very large extent, using only the infection fatality rate from outside for prediction, and the approximation ( ) for calibration. On the downside, there are quite a number of shortcomings:
• The model is driven by the latest data, but it needs changes as soon as new information on the hidden infections comes in.
• There may be better ways of estimating the hidden part of the epidemic. However, it will be easy to adapt the model to other parameter choices, and if time series for the unknown variables become available, the model can easily be adapted to being driven by the new data.
• The treatment of delays is unsatisfactory. In particular, newly infected persons become infectious immediately, and the k-day rule is not followed at all places in the model. But the rule is violated as well in the data [ ].
• There is no stochastics involved, except for simple things like estimating constants by least squares, or for certain probabilistic arguments on the side. It is not at all clear whether there are enough data to do a proper probabilistic analysis.
• As long as there is no probabilistic analysis, there should be more simulations under perturbations of the data and the parameters. A few were included, e.g. in the figures, and a large number was performed in the background when preparing the paper, making sure that the results are stable; but there are never too many test simulations.
• Other models were not considered, e.g. the classical ones with delays [ , ].
• Under certain circumstances, epidemics do not show an exponential outbreak, in particular if they hit only locally and a prepared population. See the figure for the COVID-19 cases in Göttingen and vicinity.
• Estimates for the peak time need improvement.
• The same holds for the underpinning of "flattening the curve".
MATLAB programs are available on request.
References for the paper above:
- Modellierung von Beispielszenarien der SARS-CoV-2-Epidemie in Deutschland
- Average detection rate of SARS-CoV-2 infections is estimated around six percent
- Inferring COVID-19 spreading rates and potential change points for case number forecasts
- The mathematics of infectious diseases
- A problem in the theory of epidemics, I
- A problem in the theory of epidemics, II
- COVID-19 repository at GitHub
- SARS-CoV-2 Steckbrief zur Coronavirus-Krankheit (COVID-19)
- Modelling recovered cases and death probabilities for the COVID-19 outbreak
- Preliminary results from the Heinsberg outbreak, cited after Göttinger Tageblatt
- Estimates of the severity of coronavirus disease: a model-based analysis, www.thelancet.com/infection, published online
- Bloß raus hier! Article in the weekly journal Die Zeit

key: cord- -o ru nr
authors: Tewari, A.
title: Temporal analysis of COVID-19 peak outbreak
date: - -
journal: nan
cord_uid: o ru nr

The intent of this research is to explore how a specific class of mathematical models, namely the susceptible-infected-removed (SIR) model, can be utilized to forecast the peak outbreak timeline of the COVID-19 epidemic in a population of interest, starting from the date of the first reported case. At the time of this research there was no effective and universally accepted vaccine to control the transmission and spread of this infection. COVID-19 spreads in a population primarily through respiratory droplets from an infected person's coughs and sneezes, which infect people in proximity, and it is spreading contagiously across the world. If health policy makers and medical experts could get early and timely insight into when the peak infection rate will occur after the first reported case, they could plan and optimize medical personnel, ventilator supply, and other medical resources without overtaxing the infrastructure. The predictions may also help policymakers devise strategies to control the epidemic, potentially saving many lives.
Thus, it can aid the critical decision-making process by providing actionable insights into the COVID-19 outbreak from the available data. Coronaviruses are a large family of viruses causing illness in animals and/or humans. Over the last decade or so, several other coronaviruses have been known to cause respiratory infections in humans, ranging from the common cold to more severe diseases such as Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS). The most recently discovered coronavirus causes the coronavirus disease COVID-19. Today COVID-19 is causing a global pandemic affecting almost all countries [ ]. Data from health organizations indicate that asymptomatic individuals can transmit the virus without themselves showing any signs of infection; disease control organizations across the globe are investing in research on this topic and on how often this happens. Recovery from the novel coronavirus usually takes days [ ]. About %–% of patients with COVID-19 infection require intensive care surveillance and ventilator support [ ]. This poses a challenge for health planners and administrators: how to optimally plan and allocate medical staff and other resources, such as ventilators, in a large country such as India. According to a joint report [ ] by Princeton University and the Center for Disease Dynamics, Economics & Policy (CDDEP), most of the beds and ventilators in India are concentrated in seven states only. The report also mentioned that bed capacity was saturated at hospitals and that any spike in COVID-19 cases would require a drastic expansion of hospital beds and ventilators. This problem represents the crux of the issue that the current research is trying to address by using mathematical modeling to predict the peak COVID-19 outbreak timeline in various states across India. Many previous studies have attempted to employ mathematical models to provide insights into the spread of influenza epidemics and pandemics [ ][ ][ ].
Many studies have investigated the historical pandemics of the past century [ ][ ][ ]. Modeling techniques have also been used to understand the influence of interventions in mitigating pandemics [ ]. One category of mathematical models is agent-based models (ABM), which represent a relatively recent approach to modeling complexity in a system composed of agents whose actions are described by simple rules. They differ from classical SIR mathematical models (which assume a homogeneous population), as agent-based models try to simulate individuals with distinct characteristics and in theory can provide more realistic results. A recent study used an agent-based model to evaluate COVID-19 transmission risks in facilities [ ]. However, there are several difficulties associated with creating ABMs, such as the integration of too many features, the choice of model parameters, and model results being either trivial or too complex [ ]. The spread of COVID-19 in India has been investigated in many studies, including [ ][ ][ ][ ], but they laid little emphasis on post-model validation of the peak COVID-19 timeline forecast. With this in mind, the SIR model is explored in the current research to forecast the peak COVID-19 outbreak over a large population in India. The SIR model was chosen because of its simplicity and its minimal computational and data requirements as compared to agent-based models. For the purpose of this research, the compartmental class of mathematical models is used in modeling COVID-19. Specifically, the Kermack-McKendrick susceptible-infected-removed (SIR) model [ ] is employed, which distributes the population into compartments with labels S, I, or R at any point of time. (This is a medRxiv preprint, which was not certified by peer review; the author/funder has granted medRxiv a license to display the preprint in perpetuity; this version was posted in September.)
S is the number of susceptible individuals, I is the number of infected individuals, and R is the number of individuals who have recovered and developed immunity to the infection. The numbers of S, I and R individuals may vary over time, but the total population remains constant. The model computes the predicted number of people infected with a contagious illness in a closed population over time, assuming a fixed-size homogeneous population with no social or spatial structure. An individual with COVID-19 is infectious for approximately days [ ]. Let us assume that during this period they can potentially pass COVID-19 on to a number of other people. These parameters determine the model inputs, viz. γ, the recovery rate (the reciprocal of the infectious period in days), and β, the rate of infection (the number of contacts infected per day of the infectious period). Using these parameters, the time to reach the peak COVID-19 outbreak starting from the first reported case is predicted by solving a system of three linked nonlinear ordinary differential equations in Python [ ]. Data: the COVID-19 statistics up to August used in this research were sourced from NDTV [ ]. Population figures for the largest states in India were taken from Statistics Times [ ]. Together these states constitute a major share of the total population of India.
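The three linked ODEs referred to above are the standard Kermack-McKendrick SIR system; the paper only states that the system was solved in Python, so the fixed-step Runge-Kutta integrator below is our choice of solver, with normalized (fractional) populations:

```python
# The SIR system in normalized form (S + I + R = 1):
#   dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I,
# integrated here with a simple fixed-step fourth-order Runge-Kutta scheme.
def sir_ode(s, i, beta, gamma):
    return -beta * s * i, beta * s * i - gamma * i

def solve_sir(s0, i0, beta, gamma, t_end, dt=0.1):
    s, i, t = s0, i0, 0.0
    traj = [(t, s, i)]
    while t < t_end:
        k1 = sir_ode(s, i, beta, gamma)
        k2 = sir_ode(s + dt / 2 * k1[0], i + dt / 2 * k1[1], beta, gamma)
        k3 = sir_ode(s + dt / 2 * k2[0], i + dt / 2 * k2[1], beta, gamma)
        k4 = sir_ode(s + dt * k3[0], i + dt * k3[1], beta, gamma)
        s += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        traj.append((t, s, i))
    return traj  # R follows from R = 1 - S - I
```

The peak of I in the returned trajectory is the quantity of interest for the peak-timeline forecasts discussed in this paper; β and γ here are illustrative inputs, not the paper's fitted values.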
Discussion. This research was conducted to evaluate the feasibility of applying the SIR model to predict the peak COVID-19 outbreak timeline, from the date of the first reported case, for the largest states in India, which together constitute a major share of the total population of India. The broader goal is to analyze and evaluate the SIR model as a way to provide early insights to public health agencies, which in turn can expedite an optimal response to the COVID-19 epidemic. The results indicate that for most of these states the SIR model could predict the peak COVID-19 outbreak timeline from the date of the first reported case with an error of +/− days or less, a standard deviation (SD) in error of . days, and a mean absolute deviation (MAD) in error of . days. The aim of this research paper was to predict the COVID-19 peak timeline in various Indian states using the SIR model. For most of the largest states in India included in the research, the chosen SIR model could predict the peak outbreak timeline from the date of the first reported case with an error of +/− days or less and a standard deviation (SD) in error of . days; these states constitute a major share of the total population of India. The model results present a potential opportunity for health policy makers and medical experts to gain early and timely insights into COVID-19 peak outbreak timelines for a large proportion of the population of India. They could use these insights to plan and optimize medical personnel and equipment, or to devise strategies to control the epidemic well before it hits its peak. While SIR models have been used extensively, there is little research validating their predictions; this research provided a pragmatic validation of SIR models over a large population. Compartmental models are in many ways preferable to more exotic models because of their simplicity and minimal computational requirements. However, SIR models rest on several assumptions [ ] that do not hold under real-world epidemic conditions.
The SIR model assumes that there is homogeneous mixing of the infected and susceptible individuals and that the total population is constant in time. In the classic SIR model, the susceptible population decreases monotonically towards zero. However, these assumptions are not strictly valid in the case of the COVID-19 outbreak, since new hot-spots spike at different times. Also, the effect of the social distancing measures enforced by the respective government and health agencies has not been considered. This research does not attempt an exhaustive study, because of the lack of suitable data as well as the uncertainty in different factors, namely the degree of home isolation, restrictions on social contact, the initial number of infected and exposed individuals, variations in incubation and infectious periods, and the fatality rate. Population density, geographic area, demographics such as age, and the effect of social isolation are possible parameters to consider when building more advanced epidemiological models for predicting the peak epidemic outbreak timeline. Disease-spread models can also be used to predict the number of infected individuals, to better manage epidemics. The author would like to thank the editor and the reviewers for their helpful comments and reviews, which contributed to improving the manuscript.
References for the paper above:
- Ventilation of COVID-19 patients in intensive care units
- The Center for Disease Dynamics, Economics & Policy
- A contribution to the mathematical theory of epidemics
- Statistics Times
- Seasonal influenza in the United States, France and Australia: transmission and prospects for control
- The effect of public health measures on the influenza pandemic in US cities
- Transmission dynamics of the great influenza pandemic in Geneva, Switzerland: assessing the effects of hypothetical interventions
- Transmissibility of pandemic influenza
- Analyses of the (Asian) influenza pandemic in the United Kingdom and the impact of school closures
- Predicting the global spread of new infectious agents
- Estimating the impact of school closure on influenza transmission from sentinel data
- Global stability of an SIR epidemic model with time delays
- An agent-based model to evaluate the COVID-19 transmission risks in facilities
- Communicating social simulation models to sceptical minds
- Forecasting COVID-19 impact in India using pandemic waves nonlinear growth models (medRxiv)
- A minimal and adaptive prediction strategy for critical resource planning in a pandemic (medRxiv)
- Possibilities of exponential or sigmoid growth of COVID data in different states of India (medRxiv)
- Recent update on COVID-19 in India: is locking down the country enough? (medRxiv)

key: cord- -qszvwqtj
authors: Bizet, Nana Cabo; Oca, Alejandro Cabo Montes de
title: Modelos SIR modificados para la evolución del COVID (Modified SIR models for the evolution of COVID)
date: - -
journal: nan
doi: nan
cord_uid: qszvwqtj

We study the SIR epidemiological model with a variable contagion rate, applied to the evolution of COVID in Cuba. It is highlighted that an increase in predictive power depends on understanding the dynamics of the temporal evolution of the contagion rate $\beta^*$.
A semi-empirical model for this dynamics is formulated, in which $\beta^* \approx 0$ is reached, due to isolation, after the mean duration of the disease $\tau = 1/\gamma$, during which the number of infected in the confined families has decreased. It is argued that $\beta^*(t)$ should show an abrupt decrease on the day confinement begins and then decrease until vanishing at the end of the interval $\tau$. The analysis appropriately describes the infection curve for Germany. The model is applied to predict an infection curve for Cuba, which estimates a bounded maximum number of infected in the middle of May, depending on the rigor of the isolation. This is suggested by the ratio between the daily detected cases and the total. We consider the ratio between the observed and the real infected cases ($k$) to be less than unity. A low value of $k$ decreases the maximum obtained when $\beta^* - \gamma > 0$. The observed evolution is independent of $k$ in the linear region. The value of $\beta^*$ is also studied by time intervals, fitted to the data of Cuba, Germany and South Korea. We compare the extrapolation of the evolution of Cuba with the current contagion rate with that obtained by a strict quarantine at the end of April. This model with variable $\beta^*$ correctly describes the observed infected evolution curves. We emphasize that the desired maximum of the SIR infected curve is not the standard maximum with constant $\beta^*$, but one achieved thanks to quarantine when $\tilde{R}_0 = \beta^*/\gamma < 1$. For the countries controlling the epidemic, the maxima lie in the region in which the SIR equations are linear. The pandemic associated with COVID is being investigated with great intensity today, and its singular properties are widely publicized. The present work is devoted to exploring aspects of this epidemic. We consider a deterministic model employed in the study of epidemics: the SIR model (for Susceptible, Infected, Recovered).
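The semi-empirical dynamics of $\beta^*(t)$ described in the abstract — an abrupt drop on the confinement day followed by a decay to zero over the interval $\tau = 1/\gamma$ — can be sketched as below. The size of the initial drop and the linear shape of the ramp are illustrative assumptions, not values taken from the paper.

```python
# Sketch: a piecewise contagion rate beta*(t) that stays at its free value
# until the confinement day t_c, drops abruptly by a factor, and then
# decays linearly to zero over the mean disease duration tau = 1/gamma.
def beta_star(t, beta0, t_c, gamma, drop=0.5):
    tau = 1.0 / gamma
    if t < t_c:
        return beta0                                   # no confinement yet
    if t < t_c + tau:
        return drop * beta0 * (1.0 - (t - t_c) / tau)  # linear ramp to 0
    return 0.0                                         # full isolation
```

Feeding such a time-dependent rate into the SIR equations is what produces the reduced, quarantine-induced maxima that the paper contrasts with the standard constant-rate maximum.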
The parameters that control these models are β and γ. The first defines the number of susceptible persons per unit time who fall ill, per susceptible person and per infected person; it is also used after multiplication by the population size, in the units β* = βN. The second determines the number of persons per unit time who recover, divided by the number of infected persons. We consider that applying the SIR model to effective prediction of the data requires clarifying a central aspect: how to estimate the temporal dependence of the parameters β and γ. In particular, here we focus on studying the evolution of the first of these. In Section I the equations of the SIR model are presented, together with its constants for the case of Cuba. The solutions obtained are fitted to the daily reported numbers of active infected cases. We highlight the conditions under which a maximum value of the number of infected is obtained in realistic circumstances. These features are known, but we believe it useful to insist on them; we stress that whenever the equations indicate infection, β* > γ holds. In that situation the maximum of the number of infected always reaches a value of the order of the population N, which for Cuba would amount to millions. We point out that such maxima are not the ones reported by the countries that are currently overcoming the epidemic (e.g. China, South Korea and Germany), which exhibit maxima of the number of infected much smaller than the population. In Section II the same SIR model is considered, but introducing an already recognized property of the present epidemic: the ratio k between the number of infected observed (by the health systems) and the total number of infected is a number estimated in the interval from 0.1 to 0.2 [ , ].
In this section the solution curves are shown after fitting the data for the real infected to the observed data divided by k. The enormous maximum numbers of observed sick persons are reduced as k decreases. This effect is a direct consequence of the nonlinearity of the system of equations. If the system were linear, the maximum should not change: upon dividing the initial conditions by k, the solution for the total number would be proportional to the previous one, and after multiplying the maximum total number of infected by k to obtain the observed one, the same value would be recovered. However, since the system of equations is nonlinear (the number of infected cannot exceed the population of the country), the nonlinearity is capable of reducing the maximum. Nevertheless, our interest lies in realistic conditions where the maximum is very far from approaching the population of the country, so we are in the linear region. This means that our prediction for the observed infected and recovered, if k is constant, is independent of its value. In Section III we discuss the conditions under which those reduced maxima appear. Section III begins by discussing the solutions of the SIR system in the case where the number of infected I is much smaller than the population N. In this situation the number of susceptibles can be approximated by the population of the country N. The well-known explicit solutions of the problem for constant β and γ are then presented. These are exponentials whose characteristic growth or decay time is determined by the single number β* − γ. If this quantity is positive, the solution of the problem grows inexorably, independently of the values of I and R at the given moment. This is an important property of the dynamics considered.
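The constant-k bookkeeping between observed and total infected amounts to a simple rescaling; a sketch, where k = 0.2 is only the order of magnitude quoted in the text:

```python
# Sketch: with a constant detection fraction k, the total (real) infected
# are recovered from the observed ones by I = I_obs / k, and model output
# is mapped back to observables by multiplying by k.
def observed_to_total(i_obs, k=0.2):
    return [x / k for x in i_obs]

def total_to_observed(i_total, k=0.2):
    return [x * k for x in i_total]
```

In the linear regime (I much smaller than N) the round trip through the model is proportional to the data, which is why the observed prediction is independent of the constant k, as argued above.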
For I to decrease in this zone of low values (I ≪ N) it is strictly necessary that the local values of β − γ become negative, or equivalently that the parameter satisfies R̃ < 1. Therefore, in all the countries where recovery has been achieved, what has been attained are negative values of this quantity. However, even if a country imposes total isolation starting on a certain day, the infected curves never begin to fall at once. It seems logical to suppose that such measures should force the validity of the condition β = 0, that is, the absence of transmissibility. The causes of this apparent contradiction are analyzed in the following section. Predictions are given for Cuba, considering two scenarios: in the first, the contagion rate remains the rate of the present day (April), the epidemic would extend into June, and the detected sick at the peak would be substantially larger. In the second scenario, a strict quarantine at the end of April achieves R̃ < 1, so the peak of the epidemic is reached in May and the detected sick at the peak number several thousand. It is verified that the quarantine in the cases of Germany and South Korea has already reached the region R̃ < 1 (β* − γ < 0), which is what is desired. These last results emphasize the importance of a strict quarantine. All of the above is summarized in our conclusions in the last section. The Susceptible-Infected-Recovered (SIR) model is one of the simplest and most important models in epidemiological studies of disease propagation [ ]. In it, the total population N is divided into three groups: susceptibles S, infected I and recovered R. Let us denote by N the population of the country considered. The differential equations of the model that govern the evolution of the epidemic are given below; the units of β are such that β* = βN has units [β*] = 1/T.
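In the low-infection regime discussed above (I ≪ N, so S ≈ N), the SIR equations linearize and the infected grow or decay exponentially with rate β* − γ, so the sign of that single number decides the outcome. A minimal sketch:

```python
import math

# Sketch of the linear (I << N) regime: I(t) = I0 * exp((beta* - gamma) t).
def infected_linear(i0, beta_star, gamma, t):
    return i0 * math.exp((beta_star - gamma) * t)

def doubling_time(beta_star, gamma):
    """Doubling time when beta* > gamma; negative (a halving time)
    when beta* < gamma."""
    return math.log(2.0) / (beta_star - gamma)
```

This makes concrete why curves only turn downward once quarantine drives β* − γ below zero (equivalently R̃ = β*/γ < 1), regardless of the current values of I and R.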
Something to highlight is that the quantities S, I and R in the equations are the numbers of susceptible, infected and recovered persons that actually exist in the whole population. However, the data available for solving the equations from their initial conditions are, in many cases, only the total population of the country and the numbers of infected and recovered detected by the health system. In this section we will consider that, among the observed data, the only exact one is the population of the country, which defines S at the start; the total populations of Cuba and of Germany are taken into account. As a toy model to begin with, consider the data presented in the figure: in the upper panel, the blue curve corresponds to the susceptibles, the orange one to the recovered and the yellow one to the active infected. The numbers obtained are gigantic, because neither the effect of the quarantine in reducing β* is considered, nor the fact that only a small fraction of the cases are detected worldwide. The lower panel shows a smaller time interval. This dependence is obtained by fitting to the experimental data, that is, to the dependence of the active infected on time at the beginning of the epidemic, assuming a fixed detection rate. This is a very primitive assessment of the pandemic, which must be corrected by taking a detection rate of the order quoted in [ ] and, in our opinion, by considering variable values of β to reflect the containment measures. The results are shown in the figure. Let us now consider that the total number of infected persons I(t) is unknown. This happens because there are people who fall ill and heal without being reported, due to presenting mild symptoms. Let us denote by I_o(t) the observed number of infected persons.
we will assume that the ratio k between i_o(t) and i(t) is a constant estimated in the literature [ , ], where r is the ratio between the number of observed infected and the number of unobserved ones, which has been estimated in reference [ ] at a value in the range (0.1, 0.2): the observed cases lie between one tenth and two tenths of the unobserved cases. we will adopt a fixed value of k close to this ratio, which is simply an estimate. the quotient k* between the number of observed recovered at a given time and the total number of recovered could also be a constant in the region of small times, since the solution there is exponential. we will not assume that k* coincides with the value of k, although in the plots we will also represent the magnitude corresponding to the observed recovered in the case k* = k. for this reason, that number does not constitute a prediction for the number of observed recovered. let us consider the solution of the system of model equations that approximately describes the list of values of the number of active infected and their daily increments observed by the cuban health system over the indicated period. the solution of the system must describe the values of this list after dividing them by the factor k. the data for the number of infected grow rapidly, in a way compatible with the well-known exponential evolution of sir solutions when the number of infected is small. the value of γ describes an exponential decay of the infected in the absence of contagion (β = 0). since the typical time in which each patient recovers is of the order of a couple of weeks, in this section we will adopt the corresponding estimate of γ as the inverse of that recovery time. values of β were obtained that allow an approximation of the observed data for i_o(t). the sir system of equations was then solved, fixing the initial condition for i(t) at time zero, i(0). these initial conditions were adjusted in order to reproduce the reported data.
the obtained parameter values, together with the resulting solution for i(t) and its derivative, are shown in figure ([ ]). the blue curve and points correspond to the predicted and observed infected, respectively, and the yellow curve and points to the predicted and observed time derivative. the predictions of the continuous curve for times beyond those used in this plot, when compared with the data appearing each day, give an idea of how well the confinement measures imposed in march are working. it can be observed that the evolution is exponential in the time region prior to the confinement, despite the fluctuations of the data. over a larger time interval, the temporal evolution of the number of infected shows a maximum, as illustrated in figure , which also presents the curves of the number of susceptibles and recovered (taking k* = k). the figure shows the infected curves associated with three different values of the ratio k between the observed and the total number of infected, corresponding, in order, to the curves with increasing maxima. owing to the nonlinearity of the sir system of equations, the maximum number of infected decreases as the number of unobserved infected per total infected decreases. this nonlinearity arises because the total number of infected can never exceed the population.
in this case the number of susceptibles can be approximated by the full population of the country, s(t) ≈ n. the system of model equations therefore simplifies and, since the resulting equations are linear, the solutions are exponentials, i(t) = i(0) exp((β* − γ) t), where i(0) is the only free parameter of the solution. for countries with millions of inhabitants, as mentioned before, the exponential solution is a very good approximation for times at which i(t) ≪ n. at this point it is of interest to stress that, if β* is considered in these equations as a time-dependent variable, the sign of β*(t) − γ determines the sign of the time derivative of i(t). that is, the quantity β*(t) − γ tells us something as important as whether the epidemic is growing or decaying at the instant considered. in the case of no contagion, β = 0, the value of γ predicts an exponential decay of the infected (the case of china). the typical time in which each patient recovers is of the order of a couple of weeks; therefore, in this section we adopt the corresponding estimate of γ as the inverse of that time. as noted in the introduction, the exponential solutions have a characteristic growth or decay time determined by the number β* − γ. if this quantity is positive, the solution of the problem grows unavoidably, independently of the specific values of i and r at the given moment. this is a relevant property of the dynamics of this problem. for i to decrease in the important region of low values, it is indispensable that the instantaneous value of β* − γ be negative. accordingly, in all the countries in which recovery from the epidemic has been achieved, it has been possible to make this quantity negative, yielding a maximum of active infected with i ≪ n. however, it is also well known that, even when a country imposes total isolation from a certain date, the infection curves never begin to decrease instantaneously.
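the linearized regime just described can be checked directly: for i ≪ n the infected curve is i(t) = i(0) exp((β* − γ) t), so the sign of β* − γ alone decides growth or decay. a minimal sketch, with illustrative parameter values that are assumptions and not fitted data:

```python
import math

# linearized sir for i << n: di/dt ≈ (beta_star - gamma) * i, whose solution is
# i(t) = i(0) * exp((beta_star - gamma) * t). the sign of beta_star - gamma
# decides growth or decay. all numbers below are illustrative assumptions.

def infected_linearized(i0, beta_star, gamma, t):
    return i0 * math.exp((beta_star - gamma) * t)

if __name__ == "__main__":
    gamma = 1 / 14.0                       # assumed recovery rate, 1/t
    growing = infected_linearized(100, beta_star=0.25, gamma=gamma, t=10)
    decaying = infected_linearized(100, beta_star=0.03, gamma=gamma, t=10)
    print(growing, decaying)   # grows when beta_star > gamma, decays otherwise
```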
that is, although it is logical to suppose that the measures should instantaneously enforce the condition β* = 0 (absence of transmissibility), the infection data contradict this property, and the slopes remain positive for days after the confinement date. let us analyze the causes of this behavior in the following section. we now describe a qualitative model that provides a reason why the value of β* does not vanish immediately after the isolation measures are taken. that is, starting a day after the implementation of the confinement, and over roughly the following twenty days, β* should decrease until it vanishes. the temporal dependence over this interval is not known. one can only expect the function to show a sharp drop on the very day the isolation begins, since the contact conditions of the infected with their surroundings change drastically. once the transmissibility vanishes, the decay curve of the number of infected i(t) should be exponential, with decay rate γ. as mentioned before, the infection curve of china reported on the website www.worldometer.info [ ] allows 1/γ to be estimated at around two weeks. let us make a final comment about the assumptions of the model. considering β* = 0 after one disease-duration period following the establishment of the isolation is valid if this isolation is assumed to be individual, that is, if each person is isolated. after this interval, the number of susceptibles cannot vary with time, since it is not possible to infect them. germany being one of the most developed countries, this condition could be approximately satisfied. as will be seen in the following subsection, the model works quite well for that country. however, in other situations it may happen that, at the end of a disease-duration period after the confinement, the contagion rate merely begins to decrease from a reduced value.
this is to be expected in countries where the isolation is far from individual, where there is crowding of people in the average household. in those cases, we will assume that, after the aforementioned time interval, the values of β* decrease exponentially, with a decay constant whose value controls the maximum of i_o(t). next, the qualitative model described for the evolution of β* under isolation conditions is applied to describe the infected curves of germany and cuba. in this subsection we consider a description of the infection curves of germany and cuba based on the simple model described. to that end, we first fix the parameters β* and γ at the beginning of the exponential growth of the infection in both countries. the value of the ratio k between the number of observed infected and the total was taken as the value adopted above. note that the continuous curve (exponential in the linear approximation) follows the observed one very closely over that interval. the variation of the function β* over the relevant range of days is shown in figure . note that the exponential fall of the number of observed infected begins approximately one disease-duration period after the isolation was adopted. by fitting the dependence of β* for times after the isolation instant, the infection curve of germany is then described satisfactorily, as shown in figure . it can be said that the case of germany provides an example of how the correct implementation of isolation measures can lead to an exponential decay of the pandemic after approximately one mean disease-duration period. we think this effect could be relevant to explain the cases of other countries in this stage, such as south korea, austria, switzerland, etc. the same model was subsequently applied to the infection curve of cuba, assuming a date in march for the imposition of the isolation.
taking the estimate of γ adopted above, values of the parameter β* were derived by fitting the daily infected data provided in reference [ ]. for times after the confinement, an exponential time dependence of β* is assumed, controlled by a decay parameter α, where in this case τ is taken as the mean duration time of the disease. the curves shown in figure for times greater than the confinement day plus τ correspond to several values of α, describing several forms of exponential decay of β*, with a view to representing the expected decrease of β* occurring after the isolation is established and one disease-duration time has elapsed. as discussed in connection with the model, in countries where the confinement is achieved with more difficulty, one can only expect β* to decay after waiting a time τ from the isolation day. it is then of interest to compare the observations provided by the ministry of public health with the infected curves for the various values of the parameter α. observe that for the first listed value of α the exponential is reduced substantially over the course of two days, so for that value one can consider that β* = 0 is effectively being imposed once a mean disease-duration period has passed. this can be regarded as a good approximation of a response similar to that of germany. for larger values of α, the curves of the figure predict maxima of larger value, appearing at correspondingly later times. the times at which they occur are described by the zeros of the function β*(t) − γ. as already mentioned, the figure shows the curves of i′(t)/i(t) = β*(t) − γ for the model and for the data, the latter estimated by the ratio between the number of patients reported daily and the total number of active patients. despite the fluctuations, due to the fact that the number of cases is still small (relative to the population), a certain coherence can be identified with the temporal dependence of β*(t) − γ assumed within the framework of the model considered.
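one plausible numerical reading of this decay model treats β* as constant until a delay τ after the confinement day and then lets it decay exponentially at rate α. the exact formula and parameter values are not reproduced in the text, so the functional form and every number below are assumptions for illustration only.

```python
import math

# assumed piecewise contagion rate: beta_star stays at beta0 until a delay tau
# after the lockdown day t_lock, then decays as beta0 * exp(-alpha*(t - t_lock - tau)).
# the linearized dynamics di/dt = (beta_star(t) - gamma) * i is valid while i << n.
# all parameter values are illustrative assumptions.

def beta_star(t, beta0, alpha, t_lock, tau):
    if t < t_lock + tau:
        return beta0
    return beta0 * math.exp(-alpha * (t - t_lock - tau))

def infected_curve(i0, beta0, gamma, alpha, t_lock, tau, days, dt=0.01):
    i, t, history = i0, 0.0, []
    for _ in range(int(days / dt)):
        i += (beta_star(t, beta0, alpha, t_lock, tau) - gamma) * i * dt
        t += dt
        history.append((t, i))
    return history

if __name__ == "__main__":
    hist = infected_curve(i0=10, beta0=0.25, gamma=1 / 14, alpha=0.1,
                          t_lock=20, tau=14, days=120)
    peak_t, peak_i = max(hist, key=lambda p: p[1])
    print(peak_t, peak_i)
```

the peak of i(t) occurs where β*(t) crosses γ, in line with the statement above that the maxima are located at the zeros of β*(t) − γ.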
the dependence beyond the april date on which this work was completed should decide how valid the assumption suggested by the german model is: considering that β*(t) must tend to zero after a mean disease-duration time. an important suggestion that can be extracted from the figures is that the blue data curve for i′(t)/i(t) should be expected to approach one of the exponentials plotted there. if this is so, it follows that the bounds of a maximum of cases, occurring before the indicated date in may, may turn out to be valid, as the figure indicates. figure : the curves of i′(t)/i(t) = β*(t) − γ are illustrated for the solution and for the data, the latter estimated by the ratio between the number of patients reported daily and the total number of patients who remain ill on that day (active patients). the plot shows a certain resemblance between the β*(t) assumed within the framework of the model and the observed one. the exponential curves shown correspond to the various values of α; the thick curve corresponds to one particular value. this figure and the related one suggest that the maximum number of cases could be bounded between the two values quoted above, if the blue curve of the observed i′(t)/i(t) crosses the horizontal axis before the indicated date in may. in this subsection we carry out an empirical study of the change of the contagion rate β for several countries. we compare these results with the model presented above, reaching the conclusion that this variable contagion rate indeed fits the data and is reduced by the quarantine process. the study of the dynamics of this contagion rate is therefore relevant. figure : number of infected predicted by the sir model as a function of time. a fixed detection rate k is considered [ , ].
in the first row, the plots of total and observed infected are given, respectively, considering a variable contagion rate fitted to the country's data. the extrapolation to large times is made using the contagion rate at the last fitted date. the second row shows the evolution of total and observed infected for the case of imposing a stricter confinement at the end of april (table). we built a sir model for cuba divided into four stages, considering time-dependent values of β*. the values are shown in table i. these local values of β* are used to construct two scenarios: the first with three values of β*, extrapolating the evolution to larger times with the last computed contagion rate; the second, assuming that at the end of april β* is reduced, giving r̄ < 1. figure shows the solution of the sir model, using the values of β* obtained by performing local exponential least-squares fits. the left panel of the figure shows the fit to the data, and the right panel highlights what would happen to the curve if, at the end of april, the contagion rate dropped to the level β* − γ < 0. the first column of the figure shows the hypothetical evolution of the sir model if the contagion rate remained the one observed at the last fitted date. the peak of the infection would be delayed until september and would reach a very large number of observed active infected. considering the detection rate k [ , ], this would mean that the real peak would involve a number of people of the order of millions. this scenario would imply that most of the country's population would fall ill, as can be seen in the left panel of the figure, and is not desirable. hence the need for confinement. the second column of the figure shows the hypothetical situation in which, owing to the confinement measures, a reduced value of β*, given in the fifth column of table i, is reached at the end of april.
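the local exponential least-squares fits mentioned above can be sketched as an ordinary log-linear regression over a window of daily counts: the slope of log i(t) against t estimates the local value of β* − γ. the synthetic data below are an illustrative assumption, not real case counts.

```python
import math

# sketch of a local exponential fit: over a window where i(t) ≈ i0*exp(rate*t),
# fitting log(i) against t by least squares recovers rate = beta_star - gamma
# as the regression slope. the synthetic counts below are illustrative.

def fit_growth_rate(days, counts):
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    den = sum((t - mean_t) ** 2 for t in days)
    return num / den   # slope = estimated beta_star - gamma

if __name__ == "__main__":
    days = list(range(10))
    counts = [50 * math.exp(0.18 * t) for t in days]   # synthetic growing window
    print(fit_growth_rate(days, counts))               # recovers the rate 0.18
```

on real, noisy counts the fitted slope fluctuates, which is why the text performs the fits locally, window by window, rather than over the whole series.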
in this case, the peak of active infected would be reached at the end of may, and the observed values would be of the order of thousands, with the real ones correspondingly larger. this can also be seen in the right panel of the figure, where it is clearly observed that the infected population would be a small fraction of the millions of inhabitants. this ideal scenario clearly shows that the maximum we want to reach is not the standard maximum of the sir evolution, but a maximum achieved when r̄ < 1. this is an important property to be kept in mind by those who in these times wish to begin analyzing this pandemic. to clarify our idea further, we show the curves of germany and south korea, the latter already in the exit phase of the epidemic. the figures show the evolution of the epidemic in these two nations using a sir model with a time-dependent contagion rate, with a view to reflecting the effect of the confinement. figure : in the first row, the left panel shows the evolution of the infected for germany according to the experimental data (blue points) and to a sir model with variable contagion rate β (orange line). the second plot of the first row shows the number i + r from the data (blue points) confronted with the model (orange line). the second row first shows the variation of the contagion rate in time, and in the second plot the fit of an exponential (line) to the local values of β (points) from the confinement onward is superimposed. the estimate of γ adopted above is used. germany lies in the region where β* − γ < 0. the solid lines represent the solutions of the differential equations of the sir model using the rate β(t) obtained by performing local fits. tables ii and iii show the evolution of the values of the contagion rates β, r̄ and γ(r̄ − 1). it is observed that both countries have reached the region r̄ < 1. in this work, two descriptions of the pandemic were explored.
one of them constitutes an empirical model in which it is considered that, from the point at which the confinement is established, the contagion rate must at least begin to decrease, tending to zero after one disease-duration period (1/γ). this model is capable of describing well the behavior of the epidemic in countries such as germany. the second is a sir model with a variable contagion rate, which is fitted empirically and whose variation is assumed to be produced as a consequence of the quarantine. these analyses suggest that the contagion rate behaves approximately as an exponential from the isolation day onward. both descriptions coincide, and the emphasis of both analyses lies in the fact that control of the epidemic is only achieved with a rigorous confinement. therefore, when analyzing the evolution of the epidemic, it is very important to monitor the instantaneous values of β in order to measure whether the confinement is being as strict as needed. that is, in order to control the pandemic with a number of infected far below the population, it is required that over a certain period of days the condition β* − γ < 0 (r̄ < 1) be fulfilled. in the most recent days analyzed, the value of r̄ for cuba shows that the required conditions have not yet been fully reached. figure : the left panel shows the evolution of the infected for south korea according to the experimental data (blue points) and to a sir model with variable contagion rate β. the right panel shows the variation of the contagion rate in time. the estimate of γ adopted above is used. south korea lies in the region where β* − γ < 0. the model for the evolution of β applied to the cuban case predicts that the epidemic could conclude as early as the end of april or mid-may, if the isolation measures taken around the march confinement date turn out to be effective.
in this optimistic case, the maximum number of active infected could remain small, with the maximum appearing between the end of april and may. the value of the maximum would grow the longer it takes to occur. also, in the models with piecewise-variable β in time, a later effective implementation of the isolation at the end of april was assumed, which could define at the end of may a peak of active infected of the order of thousands of people. in this optimistic situation, the maximum number of people in serious condition would be of the order of hundreds. in the case of failing to control the evolution of the epidemic, and continuing with the present contagion rate, the serious cases would be far more numerous and this peak would be reached in september. we would like to stress that the important message of this work is the fact that the dynamics of the parameters controlling the evolution of the epidemic is relevant. in particular, the evolution of the contagion rate β (or of r̄ − 1, or of i′(t)/i(t)) must be followed very closely.
references:
the sars-cov-2 in mexico: analysis of plausible scenarios of behavioral change and outbreak containment
an updated estimation of the risk of transmission of the novel coronavirus (2019-ncov)
modelo de infectados para el edo.
de guanajuato
covid-19: cómo usé las matemáticas para predecir covid-19 en el estado de guanajuato
substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov-2)
public health responses to covid-19 outbreaks on cruise ships worldwide, february, eurosurveillance
an sir epidemic model with time delay and general nonlinear incidence rate
infectious disease models with time-varying parameters and general nonlinear incidence rate
epidemic modeling: or why your covid-19 exponential fits are wrong
/ide/gida-fellowships/imperial-college-covid -europe-estimates-and-npi-impact
das mysterium um die ansteckungsrate, spiegel online
coronavirus covid-19 spreading in italy: optimizing an epidemiological model with dynamic social distancing through differential evolution
clinical features of patients infected with novel coronavirus in wuhan
estimating the asymptomatic proportion of coronavirus disease (covid-19) cases on board the diamond princess cruise ship
la pandemia covid-19-coronavirus en méxico y el mundo
covid cuba

key: cord- - wyxedn authors: dimarco, g.; perthame, b.; toscani, g.; zanella, m. title: social contacts and the spread of infectious diseases date: - - journal: nan doi: nan sha: doc_id: cord_uid: wyxedn

motivated by the covid-19 pandemic, we introduce a mathematical description of the impact of sociality on the spread of infectious diseases by integrating an epidemiological dynamic with a kinetic modeling of population-based contacts. the kinetic description leads to the study of the evolution over time of boltzmann-type equations describing the number densities of social contacts of susceptible, infected and recovered individuals, whose proportions are driven by a classical compartmental model in epidemiology. explicit calculations show that the spread of the disease is closely related to the mean number of contacts, thus justifying the lockdown strategies adopted by governments to prevent it.
furthermore, the kinetic model makes it possible to clarify how a selective control can be assumed to achieve a minimal lockdown strategy, by only restricting individuals undergoing a very large number of daily contacts. this, in turn, could make it possible to best maintain the economic activities which would seriously suffer from a total closure policy. numerical simulations confirm the ability of the model to describe different phenomena characteristic of the rapid spread of an epidemic. a last part is dedicated to fitting numerical solutions of the proposed model to experimental data coming from different european countries. introduction. our aim is a model which fits well with the experimental data. an easy way to understand epidemiology models is that, given a population composed of agents, they prescribe movements of individuals between different states based on some matching functions or laws of motion. according to the classical sir models [ ], agents in the system are split into three classes: the susceptible, who can contract the disease; the infected, who have already contracted the disease and can transmit it; and the recovered, who are healed, immune or isolated. inspired by the model considered in [ ] for describing a social attitude, and making use of the sir dynamics, we present here a model composed of a system of three kinetic equations, each one describing the time evolution of the distribution of the number of contacts for the subpopulation belonging to a given epidemiological class. these three equations are further coupled by taking into account the movements of agents from one class to the other as a consequence of the infection. the interactions which describe the social contacts of individuals are based on a few rules and can be easily adapted to take into account the behavior of agents in the different classes in the presence of the infectious disease. our joint framework is consequently based on two models which can be considered classical in their respective fields.
from the epidemic side, the sir and related models are widely used in most applications (cf. [ ] and the references therein). from the side of multi-agent systems in statistical mechanics, the linear kinetic model introduced in [ , ] has been shown to be flexible and able to describe, with suitable modifications, different problems in which human behavior plays an essential role, like the formation of social contacts. the study of the effects of the intensity of social contacts on the epidemic diffusion, among other consequences, allows one to obtain in a rigorous way a variety of nonlinear incidence rates of the infectious disease, as for instance recently considered in [ ]. it is also interesting to remark that the presence of nonlinearity in the incidence rate function, and in particular the concavity condition with respect to the number of infected, has been considered in [ ] as a consequence of psychological effects. namely, the authors observed that in the presence of a very large number of infected, the probability for an infective to transmit the virus further may decrease, because the population tends to naturally reduce the number of contacts. this fact will be embedded in the kinetic model, producing a change in the mean value of the number of daily contacts. the importance of reducing social contacts as much as possible to counter the advance of a pandemic is a well-known and studied phenomenon [ ]. while in normal life activity it is commonly assumed that a large part of the agents behaves in a similar way, in the presence of an extraordinary situation like the one due to a pandemic, it is highly reasonable to conjecture that the social behavior of individuals is strongly affected by their personal feeling in terms of safeness. in this work, we focus on the assumption that it is the degree of diffusion of the disease that changes people's behavior in terms of social contacts, in view both of personal perception and/or of external government intervention.
more generally, the model can clearly be extended to consider more realistic dependencies between an epidemic disease and the social contacts of individuals. however, this does not change the essential conclusions of our analysis, namely that there is a close interplay between the spread of the disease and the distribution of contacts, which the kinetic description is able to quantify. the encouraging results described in the rest of the article finally suggest that a similar analysis can be carried out, at the price of increasing difficulty in the computations, in more complex epidemiological models like the sidarthe model [ , ], recently used to simulate the covid-19 epidemic in italy, to validate and improve the eventual partial lockdown strategies of the government and to suggest future measures. the rest of the paper is organized as follows. section 2 introduces the system of three sir-type kinetic equations combining the dynamics of social contacts with the spread of an infectious disease in a system of interacting individuals. then, in section 3 we show that in a suitable asymptotic procedure the solution to the kinetic system tends towards the solution of a system of three sir-type fokker-planck equations with local equilibria of gamma type [ ]. once the system of fokker-planck type equations has been derived, in section 4 we close the sir-type system of kinetic equations around the gamma-type equilibria to obtain a sir model in which the presence, and consequently the evolution, of the social contacts leads to a nonlinear incidence rate of the infectious disease satisfying the compatibility conditions introduced in [ ]. last, in section 5, we investigate at the numerical level the relationships between the solutions of the kinetic system of boltzmann type, its fokker-planck asymptotics and the sir model.
these simulations confirm the ability of the model to describe different phenomena characteristic of the trend of social contacts in situations compromised by the rapid spread of an epidemic, and the consequences of various lockdown actions on its evolution. a last part is dedicated to fitting the model to the experimental data, by first estimating the parameters of the epidemic through the data and subsequently using these parameters in the kinetic model. our first goal is to build a kinetic system which suitably describes the spreading of an infectious disease under a dependence of the contagiousness parameters on the individual number of social contacts of the agents. as in classical sir models [ ], the entire population is divided into three classes: susceptible, infected and recovered individuals. aiming to understand the effects of social contacts on the dynamics, we will not consider in the sequel the role of heterogeneity in the disease parameters (such as the personal susceptibility to a given disease), which could be derived from the classical sir model, suitably adjusted to account for new information [ , , ]. consequently, agents in the system are considered indistinguishable [ ]. this means that the state of a person in each class at any instant of time t ≥ 0 is completely characterized by the number of contacts x ≥ 0, measured in some unit. while x is a natural positive number at the individual level, without loss of generality we will consider x in the rest of the paper to be a nonnegative real number, x ∈ r+, at the population level. we then denote by f_s(x, t), f_i(x, t) and f_r(x, t) the distributions at time t > 0 of the number of social contacts of the population of susceptible, infected and recovered individuals, respectively.
the distribution of contacts of the whole population is then recovered as the sum of the three distributions, f(x, t) = f_s(x, t) + f_i(x, t) + f_r(x, t). as outlined in the introduction, we do not consider, for simplicity of presentation, disease-related mortality or the presence of asymptomatic individuals. therefore, we can fix the total distribution of social contacts to be a probability density for all times t ≥ 0, so that ∫ f(x, t) dx = 1. as a consequence, the quantities s(t) = ∫ f_s(x, t) dx, i(t) = ∫ f_i(x, t) dx and r(t) = ∫ f_r(x, t) dx denote the fractions of susceptible, infected and recovered, respectively. for a given constant α > 0, we also denote by m_α(t) the moment of order α of the distribution of the number of contacts, and correspondingly the moments of order α of the distributions of the number of contacts in each class. unambiguously, we will indicate the mean values, corresponding to α = 1, by m(t) and m_j(t), j ∈ {s, i, r}, respectively. in what follows, we assume that the evolution of the densities is built according to the classical sir model [ ], and that the various classes in the model can act differently in the social process constituting the contact dynamics. the kinetic model then follows by combining the epidemic process with the contact dynamics, as modeled in [ , ]. this gives the system ∂f_s/∂t = −k(f_s, f_i)(x, t) + (1/τ) q_s(f_s)(x, t), ∂f_i/∂t = k(f_s, f_i)(x, t) − γ f_i(x, t) + (1/τ) q_i(f_i)(x, t), ∂f_r/∂t = γ f_i(x, t) + (1/τ) q_r(f_r)(x, t), where γ is the constant recovery rate, while the transmission of the infection is governed by the function k(f_s, f_i), the local incidence rate, expressed by k(f_s, f_i)(x, t) = f_s(x, t) ∫ κ(x, y) f_i(y, t) dy. in this expression, the contact function κ(x, y) is a nonnegative function growing with respect to the number of contacts y of the population of infected, such that κ(x, 0) = 0. then, the spreading of the epidemic depends heavily on the function κ(·, ·) used to quantify the rate of possible contagion in terms of the number of social contacts between the classes of susceptible and infected. the evolution of the mass fractions j(t), j ∈ {s, i, r}, then obeys the classical sir model when one chooses κ(x, y) ≡ β > 0 [ ], thus considering the spreading of the disease independent of the intensity of social contacts.
A leading example is obtained by choosing the rank-one contact function κ(x, y) = β (xy)^α, where α, β are positive constants, that is, by taking the incidence rate directly proportional to the product of the numbers of contacts of susceptible and infected people. When α = 1, for example, the incidence rate takes a simpler form. In equations ( ), the operators Q_J, J ∈ {S, I, R}, characterize the thermalization of the distribution of social contacts in the various classes in terms of repeated interactions [ , ]. The Q_J, J ∈ {S, I, R}, are integral operators that modify the distribution of contacts f_J(x, t). Their action on observable quantities is given by ( ), where b(x) measures the interaction frequency, ⟨·⟩ denotes mathematical expectation with respect to a random quantity, and x*_J denotes the post-interaction value of the number x of social contacts of the J-th population. Last, the constant τ in front of the interaction integral measures the time scale of the thermalization of the distribution of social contacts. Remark. The relaxation constant τ plays an important role in the time evolution of the model ( ), since it relates the time scale of the epidemic evolution to that of the statistical formation of social contacts. Small values of τ correspond to a fast adaptation of people to a steady situation. Hence, since it is reasonable to assume that the adaptation of people is faster than the dynamics of the epidemic, in what follows we will assume τ ≪ 1. The process of formation of the social contact distribution is obtained by taking into account typical aspects of human beings, in particular the search, in the absence of epidemics, of opportunities for socialization. In addition, social contacts are due to the common use of public transportation to reach schools, offices and, in general, places of work, and to basic interaction needs arising from work duties.
As shown in [ ], this leads individuals to stabilize on a characteristic number of daily contacts depending on the social habits of the different countries, represented by a suitable value x̄_M of the variable x, which can be considered as the mean number of contacts of the population under investigation. Then, following a consolidated path in the kinetic theory of social phenomena, one aims at obtaining the characterization of the distribution of social contacts in a multi-agent system (the macroscopic behavior) starting from some universal assumptions about the personal behavior of agents (the microscopic behavior). Indeed, as in many other human activities, the daily amount of social contacts is the result of repeated upgrading based on well-established rules. To this extent, it is enough to recall that the daily life of each person is based on a certain number of activities, and each activity carries a certain number of contacts. Moreover, for any given activity, the number of social contacts varies as a consequence of personal choice. A typical example is the use or not of public transportation to reach the place of work, or social attitudes, which scale with age. Clearly, independently of personal choices or needs, the number of daily social contacts contains a certain amount of randomness, which has to be taken into account. Also, while it is very easy to reach a high number of social contacts by attending crowded places for need or will, going below a given threshold is very difficult, since various contacts are forced by working or school activities, among other causes. This asymmetry between growth and decrease, as exhaustively discussed in [ , , ], can be suitably modeled by resorting to a value function description of the elementary variation of the variable x.
It is important to outline that, in the presence of an epidemic, the characteristic mean number of daily contacts x̄_M reasonably changes in time, even in the absence of an external lockdown intervention, by reason of the perception of danger linked to social contacts. Consequently, even if not always explicitly indicated, we will assume x̄_M = x̄_M(t). Also, this time-dependent value can differ depending on the class to which agents belong. A further, non-secondary aspect of the formation of the number of social contacts is that their frequency is not uniform with respect to the values of x. Indeed, it is reasonable to assume that the frequency of interactions is inversely proportional to the number of contacts x. This relationship takes into account that it is highly probable to have at least some contacts, and rare to reach a very high value of contacts x. The choice of a variable interaction frequency has been fruitfully applied in a different context [ ], namely to better describe the time evolution of the wealth distribution in a western society. There, the frequency of economic transactions was related proportionally to the values of the wealth involved, to take into account the low interest of trading agents in transactions with small values of traded wealth. As discussed in [ ], the introduction of a variable kernel into the kinetic equation does not modify the shape of the equilibrium density, but it allows a better physical description of the phenomenon under study, including an exponential rate of relaxation to equilibrium for the underlying Fokker–Planck type equation. Following [ , , , ], we will now illustrate the mathematical formulation of the previously discussed behavior. In full generality, we assume for the moment that individuals in different classes can have a different mean number of contacts.
The microscopic updates of the social contacts of individuals in class J, J ∈ {S, I, R}, will be taken of the form ( ). In a single update (interaction), the number x of contacts can be modified for two reasons, expressed by two terms, both proportional to the value x. In the first one, the coefficient Φ(·), which can assume both positive and negative values, characterizes the typical and predictable variation of the social contacts of agents, namely their personal social behavior. The second term takes into account a certain amount of unpredictability in the process. The usual choice is to assume that the random variable η has zero mean and bounded variance, expressed by ⟨η⟩ = 0, ⟨η²⟩ = λ, with λ > 0. Small random variations of the interaction ( ) are obtained simply by multiplying η by a small positive constant √ε, with ε ≪ 1, which produces the new (small) variance ελ. Notice that this formulation makes essential use of the choice x ∈ R+. This scaled random variable is the one appearing in ( ). The function Φ plays the role of the value function in the prospect theory of Kahneman and Tversky [ , ], and contains the mathematical details of the expected human behavior in the phenomenon under consideration. In particular, the hypothesis on which it is built is that it is normally easier to increase the value of x than to decrease it, relative to the mean value x̄_J, J ∈ {S, I, R}. In terms of the variable s = x/x̄_J we consider, as in [ ], the class of value functions given by ( ), where 0 < δ ≤ 1 and 0 < μ < 1 are suitable constants characterizing the intensity of the individual behavior, while the constant ε > 0 is related to the intensity of the interaction. Hence, this choice corresponds to small variations of the mean difference x*_J − x, and making ε common to both effects, randomness and adaptation, permits balancing their effects, as can be seen in Section .
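As a guide to the reader, in this family of models the elementary update and the class of value functions described above typically take a form like the following. This is a hedged reconstruction from the prose (sign convention, the exponential form of Φ, and the scaling of η are assumptions; the lost display may differ in detail):

```latex
x^{*}_{J} \;=\; x \;-\; \Phi^{\epsilon}_{\delta}\!\Big(\frac{x}{\bar{x}_J}\Big)\, x \;+\; \eta_{\epsilon}\, x,
\qquad
\Phi^{\epsilon}_{\delta}(s) \;=\; \mu\,
\frac{e^{\epsilon (s^{\delta}-1)/\delta} - 1}{e^{\epsilon (s^{\delta}-1)/\delta} + 1},
\qquad
\langle \eta_{\epsilon} \rangle = 0,\quad
\langle \eta_{\epsilon}^{2} \rangle = \epsilon\lambda .
```

The exponential form encodes the asymmetry discussed in the text: for s > 1 the pull of Φ towards the reference point s = 1 is weaker than the corresponding pull for 1/s below it.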
In ( ), the value μ denotes the maximal amount of variation of x that agents are able to obtain in a single interaction. Note indeed that the value function Φ_δ(s) is bounded; clearly, the choice μ < 1 implies that, in the absence of randomness, the value of x*_J remains positive if x is positive. As proven in [ ], the value function ( ) satisfies a set of monotonicity and asymmetry properties. These properties are in agreement with the expected behavior of agents, since deviations from the reference point (s = 1 in our case) are bigger below it than above it. Letting δ → 0 in ( ) allows one to recover the value function Φ_0(s) = μ (s^ε − 1)/(s^ε + 1), s ≥ 0, introduced in [ , ] to describe phenomena characterized by the lognormal distribution [ , ]. Once the elementary interaction ( ) is given, for any choice of the value function the study of the time evolution of the distribution of the number x of social contacts follows by resorting to kinetic collision-like models [ , ], which quantify at any given time the variation of the density of the contact variable x in terms of the interaction operators. For a given density f_J(x, t), J ∈ {S, I, R}, the action of the interaction operators Q_J(f)(x, t) in equations ( ) is fruitfully written in weak form. The weak form corresponds to saying that for all smooth functions ϕ(x) (the observable quantities) the action of the operator Q_J on ϕ is given by ( ), where the expectation ⟨·⟩ takes into account the presence of the random parameter η in the microscopic interaction ( ). The function b(x) measures the interaction frequency. The right-hand side of equation ( ) quantifies the variation in density, at a given time t > 0, of individuals in class J ∈ {S, I, R} that modify their value from x to x*_J (loss term with negative sign) and of agents that change their value from x*_J to x (gain term with positive sign). In [ ], the interaction kernel b(x) has been assumed constant.
This simplifying hypothesis, which is not always well justified from a modeling point of view, is the common assumption in the Boltzmann-type description of socio-economic phenomena [ , ]. In particular, the role of a non-constant collision kernel b(x) has been analyzed in its critical aspects in [ ]. Following the approach in [ , ], we express the mathematical form of the kernel b(x) by assuming that the frequency of changes which lead to an increase of the number x of social contacts is inversely proportional to x. Hence, we consider collision kernels of the form b(x) = a x^{−b}, for some constants a > 0 and b > 0. This kernel assigns a low probability to interactions in which individuals already have a large number of contacts, and a high probability to interactions in which the value of the variable x is small. The constants a and b can be suitably chosen by resorting to the following argument [ ]. For small values of the variable x, the rate of variation of the value function ( ) shows that the mean individual rate predicted by the value function is proportional to x^{δ−1}. Then, the choice b = δ corresponds to a collective rate of variation of the system independent of the parameter δ, which instead characterizes the individual rate of variation of the value function. A second important fact is that the individual rate of variation ( ) depends linearly on the positive constant ε, and is such that the intensity of the variation decreases as ε decreases. Then, the choice a = 1/ε is such that the collective rate of variation remains bounded even in the presence of very small values of the constant ε. With these assumptions, the weak form of the interaction integrals in ( ) follows. Note that, as a consequence of the choice made on the interaction kernel b(·), the evolution of the density f(x, t) is tuned by the parameter ε, which characterizes both the intensity of the interactions and the interaction frequency.
This choice implies that the actions of both the value function and the random part of the elementary interaction in ( ) survive in the limit ε → 0. Remark. The approach just described can easily be adapted to other compartmental models in epidemiology, like SEIR, MSEIR [ , , ] and/or SIDARTHE [ , ]. For all these models, we expect the fundamental aspects of the interaction between social contacts and the spread of the infectious disease not to change in a substantial way. The limit procedure induced by ( ) corresponds to the situation in which elementary interactions ( ) producing an extremely small modification of the number of social contacts (quasi-invariant interactions) are prevalent in the dynamics, while at the same time the frequency of these interactions is suitably increased so that their effect remains visible. In kinetic theory, this is a well-known procedure known as the grazing limit. We point the interested reader to [ , , ] for further details. Expanding the difference ϕ(x*_J) − ϕ(x) in Taylor series, and then passing to the limit on the right-hand side of ( ), one obtains ( ) for J ∈ {S, I, R}. If we impose at x = 0 the boundary conditions ( ) and a suitably rapid decay of the densities f_J(x, t) at x = +∞ [ ], the limit operators Q^δ_J, J ∈ {S, I, R}, coincide with Fokker–Planck type operators characterized by a variable diffusion coefficient and a time-dependent drift term. Indeed, in view of the Remark, we have x̄_J = x̄_J(t). Thus, system ( ), in the limit ε → 0, corresponds to the Fokker–Planck system in strong form ( ). The Fokker–Planck system ( ) is complemented with the boundary conditions ( ) at x = 0. Clearly, the steady distributions satisfy the ordinary differential equations corresponding to equations ( ) with the time derivatives set equal to zero.
It is interesting to remark that the explicit form of the equilibria is easily obtained when both β and γ are set equal to zero and the contact function κ(x, y) = 0, so that the incidence rate K(f_S, f_I) vanishes. In this simple case, assuming that the mean values x̄_J, J ∈ {S, I, R}, are constant, and setting ν = μ/λ, the equilibria are given by the functions ( ), J ∈ {S, I, R}. Fixing the mass of the steady state ( ) equal to one, in agreement with [ ], the consequent probability densities are generalized Gamma densities f_∞(x; θ, χ, δ), defined by [ , ] and characterized in terms of the shape χ > 0, the scale parameter θ > 0 and the exponent δ > 0, which in the present situation are given by ( ). It has to be remarked that the shape χ is positive only if the constant ν = μ/λ satisfies the bound ( ). Note that condition ( ) holds, independently of δ, when μ ≥ λ, namely when the variance λ of the random variation in ( ) is small with respect to the maximal variation μ of the value function. Note moreover that for all values δ > 0 the moments are expressed in terms of the parameters denoting, respectively, the mean x̄_J, J ∈ {S, I, R}, the variance λ of the random effects, and the values δ and μ characterizing the value function Φ_δ defined in ( ). The standard Gamma and Weibull distributions are included in ( ), and are obtained by choosing δ = 1 and, respectively, δ = χ. In both cases the shape is χ = ν, and no conditions are required for its positivity. The Fokker–Planck system ( ) contains all the information on the spreading of the epidemic in terms of the distribution of social contacts. Indeed, the knowledge of the densities f_J(x, t), J ∈ {S, I, R}, allows one to evaluate by integration all moments of interest.
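For the Gamma case δ = 1 used below, the steady state with shape χ = ν and scale θ = x̄/ν has mean x̄ and variance x̄²/ν. A quick sampling sanity check of these moments (the values ν = 2 and x̄ = 10 are illustrative assumptions, not taken from the paper):

```python
import random

# Sample the delta = 1 steady state: Gamma with shape chi = nu and
# scale theta = xbar / nu. Its mean is xbar and its variance xbar**2 / nu.
random.seed(0)
nu, xbar = 2.0, 10.0          # illustrative parameters
theta = xbar / nu
samples = [random.gammavariate(nu, theta) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((v - mean) ** 2 for v in samples) / len(samples)
print(round(mean, 2), round(var, 1))  # close to xbar = 10 and xbar**2/nu = 50
```

The same check, with θ = x̄/ν replaced by the appropriate scale, applies to each class J ∈ {S, I, R} separately.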
However, by reason of the presence of the incidence rate K(f_S, f_I), as given by ( ), the time evolution of the moments of the distribution functions is not explicitly computable, since the evolution of a moment of a certain order depends on the knowledge of higher-order moments, thus producing a hierarchy of equations, as in the classical kinetic theory of rarefied gases [ ]. We will come back to this question later on. Before discussing the closure, we fix some choices in the model in order to obtain results in agreement with the ones reported in [ ]. We fix the value δ = 1 and the incidence rate as in ( ). In particular, the choice δ = 1 gives, for J ∈ {S, I, R}, χ = ν and θ = x̄_J/ν, and the steady states of unit mass are, for J ∈ {S, I, R}, the Gamma densities ( ). With this particular choice, the mean value and the variance of the densities ( ), J ∈ {S, I, R}, are given by ( ). We now go back to the problem of modeling the time evolution of the densities f_J(x, t). We assume that, due to the presence of the epidemic, the population tends to reduce the typical average number of contacts it exhibits in standard situations. This can happen for two main reasons: on a voluntary basis, to prevent contagion, or through lockdown measures decided by the authorities. This average reduction effect can be introduced by assuming that the mean number of daily contacts x̄_J(t), J ∈ {S, I}, depends on the proportion of infected through ( ), where the function h(r) is decreasing with respect to r, 0 ≤ r ≤ 1, starting from h(0) = 1. In particular, in our model we do not consider a time-dependent mean number of social contacts for the class of recovered, since the behavior of the epidemic does not depend on it.
For J ∈ {S, I, R} and x̄_J(t) as in ( ), we now define ( ). With these notations, system ( ) with δ = 1 reads ( ). Integrating both sides of the equations in ( ) with respect to x, and recalling that the Fokker–Planck type operators are mass-preserving, we obtain the system ( ) for the evolution of the proportions J(t) defined in ( ), J ∈ {S, I, R}. As anticipated, unlike the classical SIR model, system ( ) is not closed, since the evolution of the mass fractions J(t), J ∈ {S, I, R}, depends on the mean values m_J(t). Similarly to the derivation of macroscopic equations in kinetic theory, the closure of system ( ) can be obtained by resorting, at least formally, to a limit procedure. In fact, as outlined in the introduction, the typical time scale involved in the social contact dynamics is τ ≪ 1, which identifies a faster adaptation of individuals to social contacts with respect to the evolution time of the epidemic. The choice of a small value of τ pushes the distribution function f_S(x, t) towards the Gamma equilibrium density with mass fraction S(t) and momentum m_S(t) = x̄_S S(t) h(I(t)), as can easily be verified from the differential expression of the interaction operator Q̄_S. Indeed, if τ is sufficiently small, one can easily argue from the exponential convergence of the solution towards the equilibrium f^∞_S(x; θ, ν) (see [ ] for details) that the solution f_S(x, t) remains sufficiently close to the Gamma density with mass S(t) and momentum given by ( ) for all times. This equilibrium distribution f^∞_S(x; θ, ν) can be plugged into the first equation of ( ), which can then be integrated with respect to the variable x. This procedure formally gives the first equation of ( ). An analogous analysis and formal limit procedure can be carried out with the second equation in system ( ), which leads to relaxation towards a Gamma density with mass fraction I(t) and momentum given by m_I(t) = x̄_I I(t) h(I(t)).
Consequently, we obtain the closure whose first equation reads ∂S(t)/∂t = −β̄ S(t) I(t) h²(I(t)). In system ( ) we defined β̄ = β x̄_I x̄_S. This identifies the classical transmission parameter of the SIR model; the difference is that this quantity is now not postulated but derived starting from microscopic considerations. In the following, we refer to this model as the social contact SIR model (S-SIR). Remark. The outlined closure strategy can also be obtained by resorting to the splitting method, a very popular numerical approach for the Boltzmann equation [ , ]. If at each time step (t, t + ∆t) we consider sequentially the population-based interaction and relaxation operators in the first equation in ( ), during this short time interval we recover the evolution of the density from the joint action of the relaxation step ( ) for ∂f_S(x, t)/∂t and the SIR interaction step ( ). We recall once again that the typical time scale to consider is τ ≪ 1, which identifies a faster adaptation of individuals to social contacts with respect to the evolution time of the epidemic. Since the operator Q̄_S is mass-preserving, in the considered time interval the relaxation ( ) with a small value of τ pushes the solution of equation ( ) towards the Gamma equilibrium density with the same mass fraction S(t) as the initial datum and momentum m_S(t + ∆t) = x̄_S S(t) h(I(t)), ( ), as can easily be verified from the differential expression of the interaction operator Q̄_S. Indeed, if τ is sufficiently small, one can easily argue from the exponential convergence of the solution of ( ) towards the equilibrium [ ] that the solution f_S(x, t + ∆t) is sufficiently close to the Gamma density with mass S(t) and momentum given by ( ), and this density can be used in the SIR step ( ) to close the split system ( )–( ).
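The closed S-SIR system can be integrated with elementary means. The sketch below assumes the contact function h(I) = 1/(1 + nI) used in the experiments further on and the quadratic appearance of h in the incidence (both reconstructed from garbled displays); all numerical values are illustrative, not the paper's calibration:

```python
def s_sir(beta_bar, gamma, n, i0=0.01, dt=0.01, T=100.0):
    """Forward-Euler integration of the closed S-SIR system
    dS/dt = -beta_bar*S*I*h(I)**2,  dI/dt = beta_bar*S*I*h(I)**2 - gamma*I,
    with contact function h(I) = 1/(1 + n*I); n = 0 recovers classical SIR.
    Returns the peak fraction of infected."""
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(int(T / dt)):
        h = 1.0 / (1.0 + n * i)
        new_inf = beta_bar * s * i * h * h * dt
        s -= new_inf
        i += new_inf - gamma * i * dt
        peak = max(peak, i)
    return peak

peak_no_contacts = s_sir(beta_bar=0.4, gamma=0.1, n=0)    # classical SIR
peak_with_contacts = s_sir(beta_bar=0.4, gamma=0.1, n=10)  # contact-aware S-SIR
print(peak_no_contacts, peak_with_contacts)  # contact awareness lowers the peak
```

This reproduces qualitatively the effect reported in the numerical section: when the population reduces contacts as the number of infected grows, the epidemic peak is lowered and spread out in time.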
An analogous procedure can be carried out with the second equation in system ( ), which leads to relaxation towards a Gamma density with mass fraction I(t) and momentum given by ( ). Consequently, substituting into system ( ), we obtain the closed system ( ). It remains to quantify the action of social contacts on the evolution of the epidemic. For this reason, let us introduce, for a given constant n, the decreasing function ( ) [ ], which describes a possible way in which, in the presence of the spread of the disease, the susceptible and infected populations tend to reduce their mean number of daily social contacts x̄_J, J ∈ {S, I}. This choice produces an SIR model with a global incidence rate that fulfils, for all S, I > 0, the properties required of the nonlinear incidence rates considered in [ ]. In addition to the form ( ), we can also consider the function ( ). This function satisfies the same properties as ( ) in terms of the incidence-rate requirements detailed in [ ], and takes into account memory effects in the population's behavior: people may adapt their lifestyle, in terms of possible daily contacts, in response to the history of the pandemic situation. To conclude, let us introduce the basic reproduction number R₀ of this model, i.e. the average number of secondary cases produced by a single infected agent introduced into an entirely susceptible population. This is given by ( ), where γ is the recovery rate of the infected. According to the analysis of [ ], an autonomous compartmental epidemiological model with the nonlinear incidence rate ( ), under the constant population size assumption, is stable: such a system has either a unique and stable endemic equilibrium state or no endemic equilibrium state at all. Since the incidence rate D(S, I) satisfies the conditions ( )–( ), if R₀ > 1 then the endemic equilibrium state Q* = (S*, I*) of system ( ) is asymptotically stable.
If R₀ ≤ 1, then there is no endemic equilibrium state, and the infection-free equilibrium state is asymptotically stable. In addition to analytic expressions, numerical experiments allow us to visualize and quantify the effects of social contacts on the SIR dynamics used to describe the time evolution of the epidemic. More precisely, starting from a given equilibrium distribution detailing, in a probabilistic setting, the daily number of contacts of the population, we show how the coupling between social behavior and the number of infected people may modify the epidemic by slowing down the number of encounters leading to infection. In a second part, we discuss how some external forcing, mimicking political choices and acting as restrictions on mobility, may further improve the reduction of the epidemic trend, avoiding a concentration in time of people affected by the disease and thus decreasing the probability of hospitalization peaks. In a third part, we focus on experimental data for the specific case of COVID-19 in different European countries, and extrapolate from them the main features characterizing the contact function h(·). The starting point is a population composed of % susceptibles and . % infected. The distribution of the number of contacts is described by ( ) with ν = , δ = and x̄_J = , while the epidemic parameters are κ(x) = . x̄_J x and γ = . . The kinetic model ( ) is solved by a splitting strategy together with a Monte Carlo approach in which the number of samples used to describe the population is fixed to M = . The time step is fixed to ∆t = − and the scaling parameter is τ = − . These choices are enough to observe the convergence of the Boltzmann dynamics to the Fokker–Planck dynamics, as shown in Figure , where the analytical equilibrium distribution is plotted together with the results of the Boltzmann dynamics. We also considered a uniform initial distribution, χ(·) being the indicator function.
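The splitting/Monte Carlo strategy described above can be sketched as follows. Each agent carries a compartment label and a contact number x; every step alternates a contact-relaxation update with an epidemic update. The linearized value function, the capped infection probability, and all parameter values are illustrative assumptions, not the paper's calibrated setup:

```python
import random

random.seed(2)
N, steps, dt = 2000, 300, 0.05
beta, gamma = 0.02, 0.1                      # contagion/recovery (illustrative)
mu, lam, eps, xbar = 0.5, 0.25, 0.01, 10.0   # contact dynamics (illustrative)

state = ["I" if k < N // 100 else "S" for k in range(N)]      # 1% infected
x = [random.gammavariate(2.0, xbar / 2.0) for _ in range(N)]  # initial contacts

for _ in range(steps):
    # 1) relaxation step: quasi-invariant update of every agent's contacts,
    #    with a linearized value function Phi(s) ~ (eps*mu/2)*(s - 1)
    for k in range(N):
        s = x[k] / xbar
        x[k] = max(x[k] * (1.0 - 0.5 * eps * mu * (s - 1.0)
                           + random.gauss(0.0, (eps * lam) ** 0.5)), 0.0)
    # 2) epidemic step: an agent's infection probability grows with its own
    #    contacts x and with the contact mass mI of the infected class
    mI = sum(xi for xi, st in zip(x, state) if st == "I") / N
    for k in range(N):
        if state[k] == "S" and random.random() < min(1.0, beta * x[k] * mI * dt):
            state[k] = "I"
        elif state[k] == "I" and random.random() < gamma * dt:
            state[k] = "R"

frac = {c: state.count(c) / N for c in "SIR"}
print(frac)  # mass is conserved across the splitting steps
```

Both sub-steps preserve the total number of agents, mirroring the mass-preservation of the interaction operators used in the closure argument.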
In the introduced setting, we then compare two distinct cases: in the first one we suppose that social contacts do not affect the solution, meaning h(I(t)) = 1, while the second includes the effects of the function h(I(t)) given in ( ) with n = . The results are depicted in Figure . The top right images show the time evolution of the distribution of the number of contacts for the two distinct cases, while the middle images report the corresponding evolution of the epidemic. For the second case, the function h(I(t)), as well as the distributions of contacts for the susceptible and the recovered, are shown at the bottom of the same figure. We clearly observe a reduction of the peak of infected in the case in which the dynamics depends on the number of contacts through h(I(t)) given by ( ). We now repeat the same simulation, changing only the epidemic parameters. In particular, we consider lower infection and recovery rates, given by κ(x) = . /x̄_J x and γ = , respectively. In Figure , we show the evolution of the two epidemic profiles in time, for the case in which social contacts do not affect the solution and for the case in which h(I(t)) is a function of the number of infected, as in the previous situation. The results show the same qualitative behavior: peak reduction and a spreading of the number of infected over time are observed when sociality is taken into account. The total number of infected is also reduced in the second case. Next, we compare the effects on the spread of the disease when the population adapts its habits with a time delay with respect to the onset of the epidemic. This kind of dynamics corresponds to modeling a possible lockdown strategy, whose effects are to reduce the mobility of the population and, correspondingly, the number of daily contacts. The setting is similar to the one introduced in Section .
We consider a switch from h = 1 to h(I(t)) = 1/(1 + n I(t)) when the number of infected increases. The social parameters are ν = , δ = and x̄_J = , as before, while the epidemic parameters are κ(x) = . /x̄_J x and γ = . ; the final time is fixed to T = . The initial distribution of contacts is also assumed to be of the form ( ). We consider three different settings: in the first one h = 1 up to t < T/ , in the second one up to t < T/ , while in the third one we prescribe a lockdown for a limited amount of time ( < t < ) and then relax back to h = 1. The results are shown in Figure for both the distribution of daily contacts over time and the SIR evolution. We can identify three scenarios. The first (top) only slightly changes the standard epidemic dynamics: we can observe, around t = , a reduction of the speed of the infection. For the second, we observe an inversion of the trend of the epidemic around t = , while for the third case we first observe an inversion and then a resurgence of the number of infected when the lockdown measures are relaxed. In this part, we consider data on the dynamics of COVID-19 in three European countries: France, Italy and Spain. For these three countries, the evolution of the disease, in terms of reported cases, evolved in rather different ways. The estimation of the epidemiological parameters of compartmental models is an inverse problem of generally difficult solution, for which different approaches can be considered; we mention in this direction a very recent comparison study [ ]. It is also worth mentioning that the data are often partial and heterogeneous with respect to their assimilation, see for instance the discussions in [ , , , ], which makes the fitting problem challenging and the results naturally affected by uncertainty. The data concerning the actual numbers of infected, recovered and deaths from COVID-19 are publicly available from the Johns Hopkins University GitHub repository [ ].
For the specific case of Italy, we considered instead the GitHub repository of the Italian Civil Protection Department [ ]. In the following, we present the adopted approach, which is based on a strategy with two optimisation horizons (pre-lockdown and lockdown time spans), depending on the different strategies enacted by the governments of the considered European countries. In detail, we first considered the time interval t ∈ [t , t ], where t is the day on which lockdown started in each country (Spain, Italy and France) and t is the day on which the reported cases hit units. The lower bound t has been imposed to reduce the effect of fluctuations caused by the way in which data are measured, which have a stronger impact when the number of infected is low. Once the time span has been fixed, we considered a least-squares problem based on the minimization of a cost functional J which takes into account the relative L norm of the difference between the reported number of infected and the reported total cases, Î(t), Î(t) + R̂(t), and the evolution of I(t) and I(t) + R(t) prescribed by system ( ) with h ≡ 1. In practice, we solved the constrained optimisation problem min_{β,γ} J(Î, R̂, I, R) ( ), where the cost functional J is a convex combination of the mentioned norms and assumes the form ( ). We choose p = . and look for a minimum under the constraints ≤ β ≤ . , . ≤ γ ≤ . . In Table we report the results of the performed parameter estimation, together with the resulting attack rate R₀ defined in ( ). Once the contagion parameters have been estimated in the pre-lockdown time span, we subsequently proceeded with the estimation of the shape of the function h from the data. To this end, a second optimisation problem has been solved, up to the last available data for each country, with daily time stepping h = and over a time window of three days.
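The pre-lockdown least-squares estimation can be illustrated on synthetic data. In the sketch below a brute-force grid search stands in for the MATLAB fmincon call used by the authors, the "observed" curves are generated by the model itself with known parameters, and the weight p = 0.5 is an illustrative assumption (the paper's value is not recoverable from the extraction):

```python
def sir_curves(beta, gamma, i0=0.001, dt=0.1, steps=300):
    """Forward-Euler classical SIR (h = 1); returns I(t) and I(t)+R(t)."""
    s, i, r = 1.0 - i0, i0, 0.0
    I, C = [], []
    for _ in range(steps):
        new_inf = beta * s * i * dt
        s -= new_inf
        i += new_inf - gamma * i * dt
        r += gamma * i * dt
        I.append(i)
        C.append(i + r)
    return I, C

I_obs, C_obs = sir_curves(0.30, 0.10)  # synthetic "data", known parameters

def cost(beta, gamma, p=0.5):
    """Convex combination of the relative misfits of I and I + R."""
    I, C = sir_curves(beta, gamma)
    jI = sum((a - b) ** 2 for a, b in zip(I, I_obs)) / sum(b * b for b in I_obs)
    jC = sum((a - b) ** 2 for a, b in zip(C, C_obs)) / sum(b * b for b in C_obs)
    return p * jI + (1 - p) * jC

grid = [round(0.02 * k, 2) for k in range(1, 26)]  # 0.02 ... 0.50
best = min(((b, g) for b in grid for g in grid), key=lambda bg: cost(*bg))
print(best)  # recovers the parameters used to generate the data
```

On real reported cases the misfit never vanishes, which is why the text stresses the uncertainty affecting the estimated parameters.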
This has been done with the aim of regularizing possible errors due to late-reported infected and of smoothing the shape of h. Both optimisation problems ( )–( ) have been tackled using the MATLAB function fmincon in combination with a Runge–Kutta integration method for the system of ODEs. In Figure , we present the result of this fitting procedure between the model ( ) and the experimental data. The evolution of the estimated h(t), t ∈ [t , t ], is presented in the left column of Figure . From this figure it can be observed, in the case of Italy, that even if the daily number of infected decreases after May 1st, the estimated h remains quite stable after this day. This behavior cannot be reproduced by using a function h(I(t)) of the form ( ). Instead, we considered a function h(t, I(t)) of the form ( ), which takes into account both the instantaneous number of confirmed infections and the total number of infected in the population, and we looked for the parameters a, b ∈ R that best fit the estimated curve; the results are presented in Table . We can easily observe in the right column of Figure how this function is capable of better explaining the estimated values of h, especially after the epidemic peak. To evaluate the goodness of fit we finally use the so-called coefficient R², as reported in Table . The results show that the function h(t, I(t)) appears more suitable in terms of fitting for all tested situations. This fact may indicate that people are rather fast to apply social distancing, and therefore to reduce their average number of contacts, whereas they tend to restore the pre-pandemic average contact rate more slowly, possibly due to memory effects. In this last part, we discuss the results of the S-SIR model when the contact function has the shape extrapolated in the previous paragraph.
In particular, we devote it to showing whether the extrapolated function h, which depends on the product between the current number of infected and the total number of infected, produces qualitative trends in agreement with the data. As discussed in Section . , it appears that the curve of infection may be better explained when the contact function is a function of both the instantaneous number of infected and the total number of people that have contracted the disease up to time t. In order to compare qualitatively the observed curve of infected with the theoretical one, we consider the following setting for the three countries under study: ν = , δ = and x̄_J = , ∆t = . , τ = . , M = . Moreover, we take S(t = 0) and I(t = 0) to match the relative numbers of susceptible and infected of each country at the time at which we start our comparison. Finally, we consider ( ), where we use the parameters of Table , recalled here for the sake of clarity: (a, b) = ( . , . ) in the case of France, and (a, b) = ( . , . ) in the case of Italy. The case of Spain will be discussed later. In Figure we show the profiles of the infected over time, together with the shape of the function h over time. The results show that, with the choices made for the contact function, it is possible to reproduce, at least qualitatively, the shape of the trend of infected during the pandemic observed in Italy and in France. It is worth remarking that the considered social parameters have been estimated only in the case of France, see [ ], and we assumed that the initial contact distribution is the same in the Italian case. We now consider the case of Spain. For this country, according to Figure , the trend of infected undergoes a deceleration during the lockdown period. This can also be clearly observed in Figure , where the extrapolated shape of the contact function h is shown.
Let us also observe that, while the global behavior of this function is captured by the fitting procedure, we nevertheless lose the minimum which occurs around the end of April. This minimum is responsible for the deceleration in the number of infected and can be traced back to a strong external intervention in the lifestyle of the Spanish population, with the aim of reducing hospitalizations. This effect can be reproduced by our model by imposing the same behavior on the function h. To that end, Figure reports the profile of the infected over time together with the shape of the function h, again over time, for this last case. The results show that in this case too the S-SIR model is capable of qualitatively reproducing the data. The development of strategies for mitigating the spread of a pandemic is an important public health priority. The recent case of the COVID-19 pandemic has seen as its main strategy restrictive measures on the social contacts of the population, obtained through household quarantine, school or workplace closures, restrictions on travel and, ultimately, a total lockdown. Mathematical models represent powerful tools for a better understanding of this complex landscape of intervention strategies and for a precise quantification of the relationships among potential costs and benefits of different options [ ]. In this direction, we introduced a system of kinetic equations coupling the distribution of social contacts with the spread of a pandemic driven by the rules of the SIR model, aiming to explicitly quantify the mitigation of the pandemic in terms of the reduction of the number of social contacts of individuals. The kinetic modeling of the statistical distribution of social contacts has been developed according to the recent results in [ ], which present an exhaustive description of contacts in the French population, divided by categories.
The numerical experiments then show that the kinetic system is able to capture most of the phenomena related to the effects of partial lockdown strategies and, eventually, to maintain the pandemic under control.

References (titles as extracted):
- Control with uncertain data of socially structured compartmental models
- The log-normal distribution
- The French connection: the first large population-based contact survey in France relevant for the spread of infectious diseases
- The theory of the nonlinear, spatially uniform Boltzmann equation for Maxwellian molecules
- Economic and social consequences of human mobility restrictions under COVID-19
- Mathematical models in epidemiology. With a foreword by Simon Levin
- Parameter estimation and uncertainty quantification for an epidemic model
- The Boltzmann equation and its applications
- Fitting dynamic models to epidemic outbreaks with quantified uncertainty: a primer for parameter uncertainty, identifiability, and forecasts
- Interaction of maturation delay and nonlinear birth in population and epidemic models
- On a kinetic model for a simple market economy
- On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations
- Mathematical epidemiology of infectious diseases: model building, analysis and interpretation
- Kinetic modeling of alcohol consumption
- Wealth distribution under the spread of infectious diseases
- An interactive web-based dashboard to track COVID-19 in real time.
(The Lancet Infectious Diseases)
- Strategies for mitigating an influenza pandemic
- Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in European countries
- Inferring the structure of social contacts from demographic data in the analysis of infectious diseases spread
- Fokker-Planck equations in the modelling of socio-economic phenomena
- Non-Maxwellian kinetic equations modeling the evolution of wealth distribution
- Relaxation schemes for nonlinear kinetic equations
- A simple SIR model with a large set of asymptomatic infectives
- Spread and dynamics of the COVID-19 epidemic in Italy: effects of emergency containment measures
- Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
- Call center service times are lognormal: a Fokker-Planck description
- Human behavior and lognormal distribution: a kinetic description
- The mathematics of infectious diseases
- Prospect theory: an analysis of decision under risk
- Choices, values, and frames
- Determining the best population-level alcohol consumption model and its impact on estimates of alcohol-attributable harms
- Non-linear incidence and stability of infectious disease models
- A physical basis for the generalized gamma distribution
- Log-normal distributions across the sciences: keys and clues
- The reproductive number of COVID-19 is higher compared to SARS coronavirus
- Social contacts and mixing patterns relevant to the spread of infectious diseases
- On the spread of epidemics in a closed heterogeneous population
- Time relaxed Monte Carlo methods for the Boltzmann equation
- Interacting multiagent systems: kinetic equations and Monte Carlo methods
- Dipartimento della Protezione Civile,
(GitHub: COVID-19 Italia - monitoraggio situazione)
- Gmel, Gerhard; Statistical modeling of volume of alcohol exposure for epidemiological studies of population health: the US example
- Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions
- Epidemic models with uncertainty in the reproduction
- A generalization of the gamma distribution
- Trails in kinetic theory: foundational aspects and numerical methods
- Entropy-type inequalities for generalized gamma densities
- Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission

key: cord- - ded t authors: kiamari, mehrdad; ramachandran, gowri; nguyen, quynh; pereira, eva; holm, jeanne; krishnamachari, bhaskar title: covid-19 risk estimation using a time-varying sir-model date: - - journal: nan doi: nan sha: doc_id: cord_uid: ded t

Policy-makers require data-driven tools to assess the spread of COVID-19 and inform the public of their risk of infection on an ongoing basis. We propose a rigorous hybrid model-and-data-driven approach to risk scoring, based on a time-varying SIR epidemic model that ultimately yields a simplified color-coded risk level for each community. The risk score $\Gamma_t$ that we propose is proportional to the probability of someone currently healthy getting infected in the next hours. We show how this risk score can be estimated using another useful metric of infection spread, $R_t$, the time-varying average reproduction number, which indicates the average number of individuals an infected person would infect in turn. The proposed approach also allows for quantification of uncertainty in the estimates of $R_t$ and $\Gamma_t$ in the form of confidence intervals. Code and data from our effort have been open-sourced and are being applied to assess and communicate the risk of infection in the City and County of Los Angeles.
The ongoing COVID-19 epidemic has forced governments and public authorities to employ stringent measures [ ], [ ], including closing businesses and implementing stay-at-home orders, to contain the spread. When making such decisions, policy-makers require tools to understand in "real time" how the virus is spreading in the community, as well as tools to help communicate the level of risk to citizens, so that they can be encouraged to take appropriate measures and take the public health directives seriously. One metric that has been found useful for authorities to assess the level of containment over time is the effective reproduction number [ ]. The effective reproduction number, R_t, indicates how many currently susceptible persons can, on average, be infected by a currently infected individual. The epidemic grows if this measure is above one. It is desirable to keep this value as far below one as possible over time in order to contain and, hopefully, eventually eliminate the virus from the community. While R_t is meaningful for understanding the rate at which the epidemic is spreading and has been proposed previously (for example, see https://rt.live/), what has been missing in the public discourse is a risk metric that is more suitable for communication to the wider public. One key requirement for such a metric is that it be something a citizen can relate to on an individual basis. Another requirement is that it be easy to communicate to a wide audience. We address both requirements in this work and make the following contributions. First, we obtain the daily effective reproduction number R_t of a time-varying SIR model, as well as the corresponding confidence interval. The confidence interval reflects uncertainty both in the parameters of the underlying model and in the data itself. Further, we present the mathematical derivation of the distribution of R_t.
Second, we propose a novel risk score Γ_t for a community that is proportional to the probability that an individual will get infected in the next hours. We show that the risk score can be calculated given estimates of four quantities: (a) an estimate of I_rep,new(t), the most recently reported count of new confirmed infectious cases; (b) an estimate of R_t, as discussed above; (c) an estimate of k, the ratio of true infectious cases to the number of confirmed cases; and (d) an estimate of S(t), the current number of susceptible individuals in the community. To make the score more meaningful, we normalize the probability of infection by a fixed multiplicative constant, so that a risk score of x indicates that there is, on average, a chance of x in that constant of an individual in the community becoming infected in the next hours. Third, we propose to convert the numerical risk score, which has the intuitive meaning indicated above, into a color-coded risk level based on suitably chosen thresholds. We propose the use of four color levels to indicate the corresponding risk level from low to high: green, yellow, orange, and red. Fourth, we have implemented software to estimate the risk level for any community and released it as open source. The code requires only time-series data on confirmed new cases, the population of the community, and an estimate of the ratio of true to confirmed (detected) COVID-19-positive cases. This software is being used at USC to process the daily data of communities within Los Angeles County, to estimate and generate maps of risk levels by community. The block diagram in Figure illustrates the key elements of our system design. Our data parser fetches the raw data from online data sources, cleans it up, and stores it in machine-friendly (CSV and JSON) formats. Our code for infection-risk calculation uses these data in conjunction with a time-varying SIR-based Bayesian mathematical model to obtain risk estimates and predictions for different communities.
The results are provided in CSV format and can also be used to generate a heatmap-type visualization. The risk-scoring model we describe in this work is now being used by the City of Los Angeles, which in turn is working with the County of Los Angeles and other partners to develop a publicly accessible tool that can be used by individuals and communities to grow awareness and mitigate the risk of infection. We believe that our risk-estimation approach will be similarly valuable to other communities around the world. II. Related work. As noted above, the calculation of the risk score requires an estimate of R_t. We show how this can be estimated using a time-varying SIR model, a generalization of the well-known SIR compartmental model [ ], [ ], which consists of three states, namely the susceptible state, the infected state, and the recovered state. While traditionally this model is assumed to have a constant interaction/infection-rate parameter, one recent work has used a time-varying SIR model to recover the time-varying effective reproduction number [ ]. Going beyond that work, here we also show how to derive a confidence interval for R_t. Further, the authors of [ ] make strong assumptions on the number of susceptible individuals by approximating it as a constant fraction of the entire population. This assumption may not be accurate when the number of infected individuals is high compared to the total population of a community; we therefore take a more general approach. Another recent work, by Systrom [ ], has presented a Bayesian prediction approach to obtain confidence intervals for R_t. However, Systrom's work builds on [ ], where the definition of the infection rate R_t is not based on a time-varying contact rate of the SIR model; instead, their approach estimates the infection rate probabilistically based on the number of new cases alone.
We are not aware of prior work that has proposed defining risk for COVID-19 or other epidemics in terms of an individual's probability of infection, which we argue is more meaningful for communicating risk to the public. III. Methodology. Compartmental mathematical models for epidemic spread, including the well-known SIR model, have been used since the seminal work of Kermack and McKendrick. In the SIR model, each member of a given population is in one of three states at any time: susceptible, infectious, or recovered. Any individual that is susceptible may become infected, with some probability, upon coming into contact with an infected individual. Any individual that is infectious eventually recovers (in the context of COVID-19, note that when applying the SIR model the category of recovered individuals also includes individuals removed due to death, which can be modeled as a constant fraction of all individuals in this category). In the classical SIR model, the number of susceptible individuals that become infected depends on the rate at which infected and susceptible individuals encounter each other, and this rate is assumed to be constant. A well-known parameter of the classical SIR model is R0, the effective reproductive number, which measures the average number of infections caused by infectious individuals at the beginning of the epidemic. In our work, we have extended the SIR model to a time-varying model, in which the rate of encounters and the infection probability between individuals in the population are assumed to be time-varying. This better reflects the reality of the present epidemic, in which interventions such as stay-at-home orders have been put in place and relaxed at various times, and compliance with recommendations such as wearing masks and maintaining physical distance has also varied over time.
Based on this model, we are able to define and derive a new approach to calculating a time-varying version of the effective reproductive number, which we refer to as R_t. A particularly innovative aspect of our model is that it is a Bayesian model, allowing the incorporation of various sources of uncertainty, including uncertainty in the actual numbers of infected individuals (since, as studies [ ] have shown, not every infected individual has been tested), uncertainty in recovery times, and uncertainty in the choice of parameters for de-noising the empirical data. This allows us not only to generate an estimate of R_t, but also to quantify confidence in the estimate from a rigorous statistical perspective. In this section, we elaborate on the SIR model in detail. The SIR model is one of the simplest and best-known epidemic models [ ], [ ], in which each person belongs to one of the following three states: the susceptible state, the infected state, and the recovered state. Individuals in the susceptible state have not had the virus yet; however, they may get infected if exposed to an infected individual. As far as the infected state is concerned, a susceptible person carries the virus after being exposed to infected individuals. Finally, a person enters the recovered state when the individual either heals or dies. One important point about this model is that a recovered person does not become susceptible again. The SIR model follows the differential equations dS/dt = -β S I / N, dI/dt = β S I / N - σ I, dR/dt = σ I, where S(t), I(t), and R(t) respectively represent the numbers of susceptible, infected, and recovered people in a population of size N at time t. The parameter σ is the recovery rate after being infected and equals 1/d_I, where d_I represents the average number of infectious days. The parameter β is known as the effective contact rate, i.e., the average number of contacts an individual has with others.
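One day of the discretized SIR dynamics just described can be sketched as follows, assuming daily time steps; the parameter values in the usage example are illustrative only.

```python
def sir_step(s, i, r, beta, sigma, n):
    """One day of the discretized SIR model with effective contact rate
    beta and recovery rate sigma (= 1 / average infectious days)."""
    new_infections = beta * s * i / n
    new_recoveries = sigma * i
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

# Simulate 30 days with a constant beta, purely for illustration.
n = 10_000
s, i, r = 9_990.0, 10.0, 0.0
for _ in range(30):
    s, i, r = sir_step(s, i, r, beta=0.4, sigma=1 / 7, n=n)
```

Note that the three pool sizes always sum to N, since each step only moves individuals between pools.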
In analyzing whether any pandemic is contained, it is crucial to obtain the parameter β. We next show how β can be derived from the aforementioned differential equations. 1) Obtaining β_t and R_t for the SIR model: In the SIR model, we can express the number of susceptible individuals in terms of the population size and the number of infected persons as S(t) ≈ N - I(t). Replacing S(t) with N - I(t) in the second differential equation of ( ), we obtain dI/dt = β I (1 - I/N) - σ I. We can rewrite this by separating variables; taking a definite integral from time t to t+1 and assuming β to be constant over this interval leads to ( ). One can easily check that ( ) has a unique solution for β, owing to the monotonic behavior of the β - σ term and of the log term. An epidemic occurs when the number of infected individuals increases, i.e., dI(t)/dt > 0. In the early stage of an epidemic almost everyone is susceptible except for very few cases; therefore N - I(t) ≈ N and condition ( ) reduces to β/σ > 1. The variable R0 ≜ β/σ is defined as the effective reproduction number. It is a useful metric of epidemic growth: if R0 > 1 the epidemic grows exponentially, while R0 < 1 indicates that the epidemic is contained and will decline and eventually die out. For discrete-time cases, such as daily reports of the number of infected cases, the time-varying effective contact rate β_t, i.e., the contact rate for time slot t, can be derived by solving the corresponding discrete equation ( ). The time-varying effective reproduction number is then defined as R_t ≜ β_t/σ. Since it is difficult to write a closed-form solution for β_t in ( ), we take a simpler approximation to β_t based on ( ), and then estimate R_t as β_t/σ.
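The one-day approximation of β_t described above can be sketched in code. Since the paper's exact discretization is elided, this implements one plausible reading of it, β_t ≈ (I(t+1) − I(t) + σ I(t)) / (I(t) (1 − I(t)/N)), with R_t = β_t/σ; the synthetic series below only checks internal consistency.

```python
def estimate_rt(cases, n, sigma):
    """Estimate the daily effective reproduction number R_t from a time
    series of currently infected counts I(t), via the one-day reading
    beta_t ~ (I(t+1) - I(t) + sigma*I(t)) / (I(t) * (1 - I(t)/n))."""
    rts = []
    for i0, i1 in zip(cases, cases[1:]):
        beta_t = (i1 - i0 + sigma * i0) / (i0 * (1 - i0 / n))
        rts.append(beta_t / sigma)
    return rts

# Synthetic series generated with a constant beta = 0.3, sigma = 0.1,
# so the estimator should return R_t = 3 at every step.
n, sigma, beta = 1_000_000, 0.1, 0.3
cases = [100.0]
for _ in range(10):
    i = cases[-1]
    cases.append(i + beta * i * (1 - i / n) - sigma * i)
rts = estimate_rt(cases, n, sigma)
```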
) Obtaining the confidence interval for R_t: Since there is uncertainty about the parameter d_I (or, equivalently, σ) and about the number of infected cases I(t), we now derive a confidence interval for the parameter R_t. To model the ambiguity in the number of infected cases, we express the actual number of infected cases as a factor of the reported ones, i.e., I(t) = k · I_rep(t), where k is a constant greater than 1. The main intuition behind this factor is to take into account two phenomena, namely the lack of a sufficient number of tests (especially at the beginning of the pandemic) and asymptomatic cases (mild infections that might not even be noticed). To derive the confidence interval, we first need to find the marginal distribution of R_t. Taking f_D(d) and f_K(k) as the probability density functions (pdf) of the parameters d_I and k, respectively, their joint pdf is f_D(d) f_K(k), owing to the independence of d_I and k. We derive the pdf of R_t by applying a transformation to the parameters d_I and k and introducing an auxiliary variable Z. Since the transformation from (Z, R_t) to (d_I, k) is one-to-one, the joint pdf of Z and R_t is f_{Z,R_t}(z, r) = |J| f_{D,K}(d, k), where J is the Jacobian of the transformation ( ). Substituting the corresponding parameter values and the Jacobian, and then integrating ( ) over the parameter z, gives the marginal pdf of R_t. Remark: one reasonable assumption regarding the pdfs of the parameters d_I and k is that both are Gaussian. Taking d_I ∼ N(µ_D, σ_D) and k ∼ N(µ_K, σ_K), the pdf of R_t can be simplified to an expression involving φ_{µ_c,σ_c}(·), the pdf of a normal distribution with mean µ_c and variance σ_c; by a change of variables in the integral, ( ) can be rewritten in terms of Φ_{µ_c,σ_c}(·), which
represents the cumulative distribution function (CDF) of a normal distribution with mean µ_c and variance σ_c. The confidence interval is (R̂_t - δ, R̂_t + δ), where R̂_t ≜ E[R_t] = ∫ r f_{R_t}(r) dr and δ is chosen so that P(|R_t - R̂_t| ≤ δ) = ∫ from R̂_t-δ to R̂_t+δ of f_{R_t}(x) dx = 1 - ε, for some small ε > 0. ) Estimating the risk score: We propose a novel risk-score metric for a given community that is proportional to the probability of someone in that community becoming infected in the next time period (typically, hours). The risk score can be derived as the average number of people in that community likely to be infected in the next hours by the currently infectious people, divided by the current number of susceptible individuals. We further normalize this probability by a fixed multiplicative constant, so that a score of x implies, on average, a chance of x in that constant of getting infected. Mathematically, the risk score is defined as Γ_t ∝ k · I_rep,new(t) · R_t / N, where I_rep,new(t) indicates the most recently reported count of new confirmed infectious cases, k refers to the ratio of true cases to reported cases, R_t is the time-varying reproduction number, and N is the total population size of the community. The approximation follows from the fact that I_rep,new(t) is approximately equal to I(t)/(d_I · k), and that S(t), the number of susceptible people in the community, is approximately equal to N in the early stages of the epidemic. Confidence intervals for the risk score Γ_t can be obtained numerically by a process similar to the one described for R_t, accounting also for the uncertainty in k. Note that, since k may not be known for a given community, it may be helpful to use the following normalized form of the risk score: Γ_t/k, which is still proportional to the probability of infection for an individual.
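The confidence-interval construction for R_t described above can also be approximated numerically rather than through the closed-form marginal. A minimal Monte Carlo sketch under the Gaussian assumptions on d_I and k: the mapping rt_of from a parameter draw to the implied R_t, and all numbers in the usage example, are placeholder assumptions.

```python
import random

def mc_confidence_interval(rt_of, mu_d, sd_d, mu_k, sd_k,
                           alpha=0.05, draws=20_000, seed=0):
    """Monte Carlo (1 - alpha) confidence interval for R_t when the
    infectious duration D_I ~ N(mu_d, sd_d^2) and the under-reporting
    factor k ~ N(mu_k, sd_k^2) are uncertain; rt_of(d, k) maps one
    draw of the two parameters to the implied R_t."""
    rng = random.Random(seed)
    samples = sorted(rt_of(rng.gauss(mu_d, sd_d), rng.gauss(mu_k, sd_k))
                     for _ in range(draws))
    lo = samples[int(alpha / 2 * draws)]
    hi = samples[int((1 - alpha / 2) * draws) - 1]
    return lo, hi

# Illustration: suppose the point estimate of the contact rate is 0.3,
# so R_t = 0.3 * D_I (independent of k here, purely for simplicity).
lo, hi = mc_confidence_interval(lambda d, k: 0.3 * d,
                                mu_d=7.0, sd_d=1.0, mu_k=5.0, sd_k=1.0)
```

The interval endpoints are simply sample quantiles, so the same routine works unchanged for any rt_of that also uses k.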
) Color-coded risk levels: To further simplify the presentation of the risk score to a wider audience, we propose to classify the risk into four color-coded levels: green, yellow, orange, and red. The risk level is determined by evaluating the normalized risk score Γ/k against three pre-specified thresholds θ1 < θ2 < θ3: when Γ/k < θ1 the risk level is green; when θ1 ≤ Γ/k < θ2 it is yellow; when θ2 ≤ Γ/k < θ3 it is orange; and when Γ/k ≥ θ3 it is red. The software for data collection, infection-rate estimation, and prediction has been implemented and made available as open-source software (at the following repository: https://github.com/anrgusc/covid risk estimation). The software is written in Python using standard data-processing libraries such as NumPy and SciPy. We have acquired COVID-19 case data from LA County's Department of Public Health using a Python-based data parser we wrote (open-sourced at the following link: https://github.com/anrgusc/lacounty covid data). We have been updating this repository with the latest data every day since mid-March, and we also make available plots of the number of cases, the number of fatalities, the top communities by number of cases, the infection rate for the entire LA County, and the top communities by infection rate, at the following link: http://anrg.usc.edu/www/covid .html. The following data sources are used for the infection-rate estimation and prediction: • the COVID-19 case information collected through LA County's daily press releases (accessible through the following website: http://publichealth.lacounty.gov/media/coronavirus/); • recovery information provided by the World Health Organization; • population data from the LA County census, available online (from lacounty.gov/government/geography-statistics/cities-and-communities/).
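The risk score and its color-coded discretization described above can be sketched together. The normalization constant (10,000) and the thresholds θ1, θ2, θ3 below are placeholder assumptions, since the paper's values are elided from this extraction.

```python
def risk_score(new_cases, k, r_t, population, scale=10_000):
    """Risk score: scale times the probability that a susceptible
    individual gets infected in the next period, approximated (taking
    S(t) ~ N early in the epidemic) as scale * k * new_cases * R_t / N.
    The scale of 10,000 is an assumed placeholder."""
    return scale * k * new_cases * r_t / population

def risk_level(gamma_over_k, thresholds=(1.0, 5.0, 10.0)):
    """Map the normalized risk score Gamma/k to a color level, given
    three increasing thresholds (placeholder values)."""
    t1, t2, t3 = thresholds
    if gamma_over_k < t1:
        return "green"
    if gamma_over_k < t2:
        return "yellow"
    if gamma_over_k < t3:
        return "orange"
    return "red"

score = risk_score(new_cases=500, k=5.0, r_t=1.2, population=10_000_000)
```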
The City of Los Angeles is currently using the risk model described in this work, developed by researchers at USC, to help assess location-based risk of COVID-19 infection. The City is working with the County and other partners to develop a publicly accessible tool that can be used by individuals and communities to mitigate the risk of infection. The goal is to change behaviors to reduce the risk of infection and to promote a greater understanding of the factors that increase COVID risk. A color-coded COVID-19 threat-level tool that can be used by citizens has also been unveiled by the Mayor of the City of LA, online at https://corona-virus.la/covid- -threat-level. We present below plots from our analysis of LA County community case data using the estimation approach described in this work. Figure shows plots of the estimated expected reproduction number R_t and the estimated risk score for the entire LA County. These plots are based on a -day moving average applied to the daily number of confirmed cases. In accordance with LA County daily press releases, there is a sharp jump in both R_t and the risk score around the beginning of July. Note that the risk score at the beginning of July is higher than the risk score during the last week of March, despite the same R_t, because there are significantly more confirmed cases in July than in March. Figure shows the risk-score estimates over time for four representative communities within LA County. Figure shows the color-coded risk levels for communities in LA County on selected dates over the past months. We have proposed a new risk metric Γ_t that can be used by individuals in any community to assess their probability of getting infected by COVID-19. The metric builds on the estimation of R_t, the average reproduction number, which is obtained from a time-varying extension of the classical SIR model. We also show how to evaluate the uncertainty in both metrics.
In future work, we plan to generalize the approach to the SEIR model, which additionally models an incubation period. We have released code implementing the estimation of the risk score that can be used for any community worldwide, as long as time-series data for confirmed new cases and the population are known. We have also proposed the use of simple color-coded risk levels to inform and guide the public, as has been adopted by the City of Los Angeles.

References (titles as extracted):
- Evaluation of the lockdowns for the SARS-CoV-2 epidemic in Italy and Spain after one month follow up
- The lockdowns worked, but what comes next
- The reproductive number of COVID-19 is higher compared to SARS coronavirus
- The Kermack-McKendrick epidemic model revisited
- Networks: an introduction
- A time-dependent SIR model for COVID-19
- The metric we need to manage COVID-19
- Real time Bayesian estimation of the epidemic potential of emerging infectious diseases

key: cord- -ivhqeu authors: battiston, pietro; gamba, simona title: covid-19: $R_0$ is lower where outbreak is larger date: - - journal: nan doi: nan sha: doc_id: cord_uid: ivhqeu

We use daily data from Lombardy, the Italian region most affected by the COVID-19 outbreak, to calibrate a SIR model individually on each municipality. These are all covered by the same health system and, in the post-lockdown phase we focus on, all subject to the same social distancing regulations. We find that municipalities with a higher number of cases at the beginning of the period analyzed have a lower rate of diffusion, which cannot be imputed to herd immunity. In particular, there is a robust and strongly significant negative correlation between the estimated basic reproduction number ($R_0$) and the initial outbreak size, in contrast with the role of $R_0$ as a predictor of outbreak size.
We explore different possible explanations for this phenomenon and conclude that a higher number of cases causes changes of behavior, such as a stricter adoption of social distancing measures among the population, that reduce the spread. This result calls for a transparent, real-time distribution of detailed epidemiological data, as such data affect the behavior of populations in areas affected by the outbreak. The basic reproduction number, or R0, represents the average number of secondary cases produced by a single infected case in an otherwise susceptible population, and it is typically used as a reference value to assess the transmissibility of an infectious disease in a given population. Given a number of individuals susceptible to infection, a disease with higher R0 will infect a larger number of individuals. There is hence an obvious positive relationship between R0 and the resulting size of an outbreak (Tildesley and Keeling, ). However, the value of R0 during an outbreak does not depend only on ex-ante features of a virus or a population, but potentially also on the response of both the population and the authorities to the outbreak. This is particularly true in the context of the COVID-19 pandemic, to which most countries in the world have reacted with some form of social distancing measures, or lockdown. In the absence of a vaccine or effective drugs, these measures are the best weapon to reduce the number of deaths, as well as the number of intensive-care-unit beds required (Kucharski et al., ; Flaxman et al., ; Ferguson et al., ; Greenstone and Nigam, ). In the present study, we analyze data on the diffusion of COVID-19 in Lombardy, the region of Italy most heavily affected by the pandemic (prior work provides an accurate description of the early phase of the outbreak in this region). Specifically, we employ daily data on the number of individuals positive to COVID-19 at the municipality level, focusing on a period in which the entire country was subject to a lockdown.
All municipalities under analysis share the same public health system and, in the period considered, were subject to the same social distancing regulations. However, at the start of the period they were characterized by strong heterogeneity in the number of cases, both in absolute terms and in terms of cases per capita. We study a period beginning on March , that is, more than two weeks after the lockdown regulation was put in place, and ending on April , when such regulations still held: this means that movements across municipalities were severely restricted, requiring any traveler to present a valid (typically work- or health-related) justification for their journey. We fit a susceptible-infected-recovered (SIR) model on data from each municipality and find that the estimated R0 is negatively correlated with the prevalence in the municipality at the beginning of our period. This result holds both when considering the absolute and the per capita number of cases, and it is robust to different specifications and sample disaggregations. We present and compare different complementary explanations for this finding. Early and widespread testing increases the reported number of cases and might allow the authorities to slow the spread of the pandemic by isolating known cases. At the same time, where the number of cases is higher, the population might comply more strictly with the lockdown measures, thus reducing the rate of spread: we show in Section . why the latter mechanism is most likely to drive our results. We employ count data of per-municipality recorded cases, updated daily and distributed by the regional authorities. We do not rely on data on recovered and deceased individuals, as such data are not available at the required geographical disaggregation. Data are available starting from March , and cover a period of twenty-one days during which lockdown measures were always in place.
We verify that only minimal deviations appear between regional data and the aggregation of municipal data. Out of the municipalities in Lombardy, had at least one recorded COVID-19 case as of this date. Figure a displays the number of cases (size of the dots) and the cases per capita (color of the dots) as of March for each of these municipalities. Similarly, Figure b displays the number of new cases (size of the dots) and the number of new cases per capita (color of the dots) recorded in each municipality between March and April . It should be noted that official data concerning the COVID-19 outbreak in Italy have been found to be strongly incomplete, both in terms of positive individuals and of casualties: several researchers have estimated an outbreak size much larger than that suggested by official numbers (Flaxman et al., ), while others have corroborated this with an analysis of anomalies in death rates. Moreover, local testing strategies are known to have deviated from WHO guidelines and to have changed over time, also depending on available resources: towards the end of our period of interest, more subjects with mild symptoms were tested. For this reason, some researchers have put forward adaptations of the SIR model that account for a threshold in the
Indeed, the focus of the present work is to document differences in response across municipalities, rather than to precisely estimate the epidemiological parameters or the expected duration of the COVID-19 outbreak in Lombardy. Data on population size are obtained from the Italian National Institute of Statistics (ISTAT). We fit an SIR model on each municipality over the period of twenty-one days beginning in March. Given the short time span considered, we employ a simplified SIR model which does not account for the natural rate of mortality. Hence, the model is entirely defined by setting a few parameters: β, which determines the rate at which susceptible (S) individuals become infected (I); γ, which determines the rate at which infected individuals become recovered (R); the initial number of infected and recovered individuals; and the population size N (= S + I + R). We take population size from official statistics. We hence consider a discretized version of the continuous SIR model, each period corresponding to a day, and automatically explore the parameter space for β, γ, and the initial values for I and R, looking for the combination that provides the best fit. Specifically, the goodness of fit is maximized by minimizing the sum of squared residuals between the case counts and the sum of the sizes of the I and R pools. The initial values for the free parameters are set to those calibrated on the entire Lombardy region. Note: fit between data and the corresponding SIR model for the Lombardy region (left) and for the most affected municipalities at the beginning of our period of interest in absolute and per capita terms, respectively (center, right). Given that the SIR model assumes a non-null initial population of infected individuals, we only consider the municipalities satisfying this condition. We further drop municipalities which had new cases recorded on only one or two dates, hence reducing the sample to  municipalities.
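The discrete-time SIR recursion and the sum-of-squared-residuals criterion described above can be sketched in a few lines. This is a minimal illustration: the case counts and parameter values below are made up, and the actual fitting procedure (detailed in the appendix) also searches over γ and the initial sizes of the I and R pools.

```python
def sir_discrete(beta, gamma, i0, r0, n, steps):
    """Discrete-time (daily) SIR model without vital dynamics."""
    s, i, r = n - i0 - r0, i0, r0
    path = []
    for _ in range(steps):
        new_inf = beta * s * i / n   # susceptibles becoming infected today
        new_rec = gamma * i          # infected becoming recovered today
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        path.append((s, i, r))
    return path

def ssr(beta, gamma, i0, r0, n, cases):
    """Sum of squared residuals between cumulative case counts and I + R."""
    path = sir_discrete(beta, gamma, i0, r0, n, len(cases))
    return sum((c - (i + r)) ** 2 for c, (_, i, r) in zip(cases, path))

cases = [12, 15, 20, 26, 33, 41]   # toy daily cumulative case counts
print(ssr(0.3, 0.1, 10, 0, 10_000, cases))
```

The fit reported in the paper minimizes exactly this kind of criterion, with the parameter search performed by the local-search routine described in the appendix.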
Although this sample selection might in principle affect our results, we show in Section  that this is not the case. Figure a displays the fit between data at the regional level and the corresponding simulated SIR model. Figures b and c are the equivalent for Milan and Castiglione d'Adda: these are the two municipalities which, at the beginning of our period of interest, had been most heavily hit in absolute and per capita terms, respectively. Note that a weekly fluctuation can be observed for all municipalities: this is in line with documented evidence that fewer tests are processed during the weekend, an effect which reverberates on the number of positive detected cases with a delay of two to three days. We expect these fluctuations to affect the entire region homogeneously. Once we find the best SIR parameters for each municipality, we regress the estimated R0 (the ratio of the estimated β and γ) on the outbreak size within the municipality as of March . We focus on the per capita number of cases, as we expect any effect to be related to the prevalence of the outbreak: the same number of cases will be perceived in very different ways in Milan and in a small municipality. Figure  shows the distribution of the estimated values of R0: the mean estimated value is  ( when weighted on population), while the median is . A strong heterogeneity (which can be partly attributed to statistical noise, as several municipalities count only a few cases each) can be observed across municipalities: in what follows, unless differently specified, we trim the data by dropping % of outliers on each side of the distribution of R0, hence reducing the sample to  municipalities. In only a few of these is the value of R0 larger than the critical threshold of 1: that is, in the vast majority of municipalities, the outbreak is expected to die out spontaneously without requiring herd immunity. Table  presents the results of the regression analysis.
We see a negative and strongly significant relationship between the initial number of cases per one thousand inhabitants and the estimated R0 (column ( )); this relationship is robust to controlling for population size (column ( )), and to controlling for both the absolute number of cases and the inverse of population size (column ( )), i.e., a full interaction model where the per capita count represents the interaction term (Kronmal). The coefficient for the per capita number of cases can be interpreted as the reduction in R0 resulting from an increase of one case per one thousand individuals in the prevalence of the outbreak. The value of - observed in column ( ), which we consider as our baseline specification, indicates a sizeable effect: for reference, given that the prevalence in Milan as of March  was around %, the above-mentioned result suggests that had it been %, the average R0 would have been around  instead of the observed . The same negative and strongly significant effect is observed if we focus on the absolute number of cases as the explanatory variable, controlling for population size (column ( )). It should be noted that any intrinsic characteristic of municipalities (such as demography, location, or the structure of the economy) which might explain a larger outbreak size should also favor a larger R0 (Mills). Thus, controlling for such characteristics is expected to further reduce the coefficient for cases%. There are a few reasons that might explain why a larger outbreak should result in a subsequently lower R0. The first might be related to herd immunity, by which areas where the outbreak is initially more present have less scope for further spread, because a large share of individuals have already caught, and possibly developed immunity to, the virus.
This is in principle not a problem for our approach, as the SIR model accounts for this effect and should estimate an R0 net of it; in other terms, R0 describes the evolution of the outbreak in a hypothetical situation in which the pool of susceptible individuals is never reduced. However, the problem might still arise if the count data employed severely underestimate the actual spread of the virus: the number of positive cases could actually be much larger than the detected one, leading to an estimated R0 lower than the real one because of the undetected effect of herd immunity in reducing the rate of contagion. The underestimation of the infected population might also suggest an alternative explanation of the result, related to test capacity: to the extent that a lower detected prevalence reflects a lower ability of the authorities to identify infected individuals, it should then correlate with a lower ability to isolate, hospitalize and cure them, and hence with faster outbreak growth. A third, social, explanation is instead that wherever the local population is aware of a larger prevalence of the disease, it reacts by changing its behavior towards a stricter application of social distancing rules, thus leading to a lower R0. In what follows, we provide evidence in favor of this hypothesis. We start by analyzing the first possible explanation: several sources have argued that the actual size of the infected population might lie between four and ten times the officially reported numbers. In the most affected municipalities in our sample during the period analyzed,  infections per one thousand inhabitants have been recorded, and according to the most pessimistic estimates this would mean that up to % of the population was infected.
While most municipalities have a number of recorded cases per one thousand inhabitants that is orders of magnitude lower, to rule out the possibility that an even partial herd immunity effect might be driving the results, we re-estimate our main model on subsamples of municipalities grouped according to their initial number of cases per capita. Specifically, we split the sample according to quartiles of cases per capita on March . The results, presented in columns ( ) to ( ) of Table , show that our findings are not driven by herd immunity, as the coefficient for cases% is negative in each quartile. The absolute value of this coefficient is actually much larger for municipalities with a low prevalence than for those with a higher prevalence, and it is strongly significant in the first two quartiles, hence including municipalities with  cases per one thousand inhabitants or fewer. We then consider the second possible explanation: that a lower detected prevalence signals a lower detection ability, and that this naturally correlates with a lower ability to track and quarantine infected subjects, hence raising the subsequent rate of diffusion. In order to disentangle this test capacity explanation from the third, social, one, we sketch two simple models of how each would be expected to affect R0. Let us denote by u_t the unknown real number of infected subjects per one thousand inhabitants at time t in a given municipality, and by i_t the corresponding known number. We are interested in the extent to which unidentified infected subjects (of whom there are u_t − i_t per one thousand inhabitants) will raise the R0 for the municipality in the subsequent period. More specifically, we can assume that identified and unidentified patients form two different pools of infected subjects, and that the latter has a much higher β (probability of infecting susceptible individuals), which leads to a correspondingly higher R0.
Since β enters linearly in R0, and assuming for simplicity that γ is constant, the relationship between u_t − i_t and R0 would be expected to be linear. Moreover, it is well known that not only are identified patients subject to a stronger form of isolation, but close contacts of such patients (some of whom are not infected) are also recommended to self-quarantine: this does not happen in municipalities with a larger number of undetected cases, which implies that the effect of each unidentified patient should be more than linear in increasing the R0. This would imply a linear or concave relationship between cases% and R0. Vice versa, any social explanation is based on the assumption that inhabitants react to the news of the cases in their municipality. Given any concave function describing this reaction, the same increase in per capita cases will be perceived as more important if the initial number of cases is lower. That is, we can expect the inhabitants of two towns with respectively  and  known cases per one thousand inhabitants to differ in their compliance with social distancing prescriptions more than the inhabitants of two towns with respectively  and  known cases per one thousand inhabitants: the same difference of one percentage point in prevalence will have a weaker effect on people's behavior where prevalence is higher. This alternative explanation predicts a convex relationship (given the negative sign) between cases% and R0. To disentangle the test capacity and the social explanations, we enrich our basic model by introducing a quadratic term in cases%. This is done in column ( ) of Table . We see that the quadratic term has a positive sign and is strongly significant, while the sign of the linear term is still negative and has increased in absolute terms. Hence, while this does not allow us to exclude that the other explanations might play a role, we can conclude that the social explanation is the main driver of the negative relationship between cases% and R0.
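The convexity test described above amounts to adding a quadratic term to the regression of R0 on cases per thousand. A minimal sketch of fitting such a model by ordinary least squares (the data below are hypothetical, not the paper's estimates; the OLS solver is a plain normal-equations implementation for illustration):

```python
def ols(X, y):
    """Ordinary least squares via normal equations and Gaussian elimination."""
    k, n = len(X[0]), len(X)
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for c in range(k):                       # forward elimination with pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for q in range(c, k):
                A[r][q] -= f * A[c][q]
            b[r] -= f * b[c]
    beta = [0.0] * k                         # back substitution
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][q] * beta[q] for q in range(r + 1, k))) / A[r][r]
    return beta

# hypothetical convex relationship: R0 = 1.2 - 0.5*cases + 0.02*cases^2
cases_per_k = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0, 10.0]
r0_vals = [1.2 - 0.5 * c + 0.02 * c ** 2 for c in cases_per_k]
X = [[1.0, c, c ** 2] for c in cases_per_k]
const, b_lin, b_quad = ols(X, r0_vals)
print(b_lin, b_quad)   # negative linear term, positive quadratic term
```

A negative linear and positive quadratic coefficient, as in the block above, is the signature of the convex relationship the paper attributes to the social explanation.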
Jones et al. describe two possible opposite reactions to the COVID-19 outbreak: a precautionary attitude that leads to a stricter adherence to guidelines, and a "fatalism effect" according to which an individual who is more likely to be infected in the future "reduces her incentives to be careful today". Our results provide strong evidence in favor of the first mechanism. In addition to the quartile analysis previously described, we verify that our main result also holds consistently across the provinces (lower-level administrative regions) of which Lombardy is composed. Results are displayed in Figure a. We see that, for each province, the effect of cases% on R0 is negative: although the small sample size results in only a few provinces reaching statistical significance, it is clear that no specific area of Lombardy is alone responsible for our findings. In order to verify that our results do not strictly depend upon the period considered, we replicate our analysis over different -day moving windows within our period of analysis. For each subperiod, we fit the R0 for each municipality and regress it on the number of cases per thousand individuals at the beginning of that subperiod. In accordance with the selection procedure described in Section , we restrict this analysis to the municipalities that feature at least one case on March  and, in each window, have new cases recorded on at least two dates. The results are shown in Figure b. For comparability, we also display the value of the coefficient estimated for the entire time period on the same restricted sample of municipalities. We find that the effect of interest is robust, that is, the coefficient for cases% is consistently negative and strongly significant for each subperiod. Its absolute value is significantly decreasing over time; that is, the effect of the number of cases on the R0 in the following days appears to be stronger in the earlier days of the outbreak.
While there might be multiple explanations for this, we only remark that the rate of growth of the epidemic has been consistently decreasing: whether individual behavior reacts not just to outbreak size, but also to its change over time, is an issue for further research. Finally, we verify that all results reported in Table , including statistical significance, are virtually unchanged if we do not trim the data as previously described. While an accurate prediction of the date of extinction of the outbreak deserves more sophisticated epidemiologic models (Riccardo et al.) that are out of the scope of the present paper, we can analyze to some extent the relationship between the predicted date of extinction and the number of initial cases. In general, the relationship between the infected population at time t and the expected date of extinction of the outbreak within an SIR model depends on the size of R0: if the latter is smaller than 1, i.e., the outbreak is spontaneously slowing, then a smaller outbreak will become extinct sooner; vice versa, if R0 > 1, a larger outbreak will sooner reach a level of herd immunity, and hence die out. Since, according to our data, most municipalities in Lombardy display an R0 < 1, we focus on this case. While for a given level of R0 we expect the predicted time to extinction to increase with the initial outbreak size, the fact that R0 is negatively related to initial outbreak size, and that a lower R0 leads to a quicker extinction, leaves theoretically undetermined the relationship between initial outbreak size and duration of the outbreak. In order to shed light on this indeterminacy, we simulate the SIR model for each municipality until the predicted size of the infected population decreases below either (i)  cases per one thousand inhabitants or (ii)  cases, and we consider the number of periods elapsed as the outbreak duration.
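The duration exercise above can be sketched by iterating a discrete-time SIR model with R0 < 1 and counting the days until prevalence falls below a chosen extinction threshold. Parameter values here are illustrative, not the paper's estimates:

```python
def outbreak_duration(r0, gamma, i0_per_k, threshold_per_k, pop=10_000):
    """Days until prevalence (cases per one thousand inhabitants) falls below a threshold."""
    beta = r0 * gamma
    i = i0_per_k * pop / 1000.0       # initial infected count
    s = pop - i
    days = 0
    while i / pop * 1000.0 >= threshold_per_k:
        new_inf = beta * s * i / pop
        new_rec = gamma * i
        s, i = s - new_inf, i + new_inf - new_rec
        days += 1
        if days > 10_000:             # safety cap against non-terminating loops
            break
    return days

# with R0 < 1 the outbreak decays; a lower extinction threshold delays the date
d_high_threshold = outbreak_duration(0.8, 0.1, 5.0, 0.5)
d_low_threshold = outbreak_duration(0.8, 0.1, 5.0, 0.05)
print(d_high_threshold, d_low_threshold)
```

As the paper notes, the computed duration depends materially on the threshold chosen: the second call, with a threshold ten times lower, returns a substantially later extinction date.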
We then regress the outbreak duration, defined in these two different ways, on the initial (i) number of cases per capita (columns ( ) and ( ) of Table ) and (ii) absolute number of cases (columns ( ) and ( ) of Table ), respectively. The results in Table  show that the relationship between outbreak size and extinction date is non-trivial. First, the relatively few municipalities with R0 > 1 do significantly influence the results; as already discussed, the expected effect of outbreak size for a given R0 is reversed in such cases. Second, if we restrict to R0 < 1, the relationship is positive and significant when reasoning in per capita terms, but not in absolute terms. It should also be mentioned that the results depend on the thresholds adopted in the definition of outbreak extinction. In general, given that R0 < 1 determines an exponential decay, a lower threshold will mean that the date of extinction is further away for municipalities with a relatively low number of cases and a relatively high R0. Summing up, the results on predicting the extinction date are to be interpreted with caution: municipalities with smaller outbreaks might get rid of them sooner than others with more infected individuals (column ( ) of Table ), but this result does not generalize to the absolute outbreak size (columns ( ) and ( ), the former even featuring a negative sign). Plans for a gradual exit from lockdown should take into account that the relationship between outbreak size and expected outbreak duration is difficult to pinpoint, as well as the possibility that a larger outbreak might bring the population closer to herd immunity, making it more resistant to a new outbreak. We show that in Lombardy, during a lockdown, the basic reproduction number for COVID-19 reacts negatively to the initial size of an outbreak at the municipality level, an effect which cannot be explained by the population having reached herd immunity.
Limited test capacity, and hence a limited ability by health authorities to isolate and treat affected individuals, appears to have at most a marginal role in explaining our result. (SIR models by design tend to zero infected subjects only for t → ∞, and different authors pick different thresholds as denoting outbreak extinction; notice that the most appropriate value crucially depends also on the extent to which the outbreak is underestimated by the available data.) Instead, we show that the population's behavior is key to slowing down the contagion, and in particular that information about local outbreaks impacts diffusion rates. This effect is consistent across all provinces, and it is robust to the sample period considered. The fact that the effect is particularly strong in municipalities characterized by a smaller outbreak suggests that individuals react more strongly to the first few cases. This aspect is confirmed by the convex relationship we find between the initial size of the outbreak and the R0: the marginal effect of each new case on behavior seems to decrease in the number of cases. Our results provide evidence in favor of a precautionary rather than fatalistic individual attitude towards the outbreak. They call for considering the population as an integral part of the decision-making process, and for a timely and transparent provision of epidemiologic data.
References
How deadly is COVID-19? Understanding the difficulties with estimation of its fatality rate.
Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand.
Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in European countries. Imperial College COVID-19 Response Team.
Does social distancing matter?
University of Chicago, Becker Friedman Institute for Economics working paper.
Potential short-term outcome of an uncontrolled COVID-19 epidemic in Lombardy, Italy.
Optimal mitigation policies in a pandemic: social distancing and working from home.
Spurious correlation and the fallacy of the ratio standard revisited.
Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases.
Demographic science COVID-19.
Is R0 a good predictor of final epidemic size: foot-and-mouth disease in the UK.
Optimization procedure
For simplicity, the procedure for fitting the SIR model is implemented over the parameters R0 and γ rather than β and γ, where R0 = β/γ. For each parameter (including the initial values I_0 = Î and R_0 = R̂), the procedure is initialized by deriving reasonable values based on tuning the model to regional aggregated data. The procedure then works as follows (i denotes an iteration; δ_π,0 is set to  for each parameter π):
1. Compute the values π_i,L = π_i × (1 − δ_π,i) and π_i,R = π_i × (1 + δ_π,i) (the left and right candidate values for parameter π).
2. Compare the count data and the simulations obtained with each of the three candidates π_i,L, π_i and π_i,R by computing the sum of squared residuals.
3. Select the candidate value which results in the smallest error as the new parameter value π_i+1.
4. If the value did not change (that is, π_i+1 = π_i), …
Toda, Alexis Akira: "Susceptible-Infected-Recovered (SIR) Dynamics of COVID-19 and Economic Impact".
I estimate the susceptible-infected-recovered (SIR) epidemic model for the coronavirus disease (COVID-19). The transmission rate is heterogeneous across countries and far exceeds the recovery rate, which enables a fast spread. In the benchmark model, % of the population may be simultaneously infected at the peak, potentially overwhelming the healthcare system. The peak reduces to .
% under the optimal mitigation policy that controls the timing and intensity of social distancing. A stylized asset pricing model suggests that the stock price temporarily decreases by % in the benchmark case, but shows a W-shaped, moderate but longer bear market under the optimal policy. The novel coronavirus disease (COVID-19), first reported in Wuhan, China in December, is quickly spreading around the world. As of March , the total number of cases exceeds , and the disease has claimed more than  lives globally. Since March , while new cases in China appear to have settled down, the number of cases is growing exponentially in the rest of the world. To prevent the spread of the new virus, many governments have introduced draconian measures such as restricting travel, ordering social distancing, and closing schools, bars, restaurants, and other businesses. In a time of such extreme uncertainty, making economic decisions becomes challenging because pandemics are rare. The most recent comparable episode is the Spanish flu of 1918 (Trilla et al.), so pandemics are likely to occur at most once during one's lifetime. Nevertheless, individuals need to make everyday decisions such as how to manage inventories of staples, how much to consume and save, and when to buy or sell stocks, and these decisions depend on expectations of how long and severe the epidemic will be. Governments must also make decisions such as to what extent to impose travel restrictions, social distancing, and closure of schools and businesses, and for how long (Anderson et al.). When past experience or data are not so relevant in new situations such as the COVID-19 pandemic, simple mathematical models are useful for analyzing the current situation and predicting the near future.
This paper aims to help decision making by building a mathematical epidemic model, estimating it using up-to-date data on COVID-19 cases around the world, making out-of-sample predictions, and discussing optimal policy and economic impact. The model is the Kermack and McKendrick susceptible-infected-recovered (SIR) model and is relatively simple. An infected individual interacts with other agents and transmits the disease at a certain rate if the other agent is susceptible. An infected individual also recovers (or dies) at a certain rate. The model can be described as a system of ordinary differential equations, which is nonlinear due to the interaction between the infected and the susceptible. The behavior of the model is completely determined by the transmission rate (β), the recovery rate (γ), and the initial condition. Despite the nonlinearity, the model admits an exact analytical solution in parametric form (Harko et al.), which is convenient for estimation and prediction. Using this model, I theoretically derive the condition under which an epidemic occurs and characterize the peak of the epidemic. I next take this model to the data. Because the situation and policies surrounding COVID-19 are rapidly evolving, I use the most recent two weeks ( days) of cases and estimate the model parameters by nonlinear least squares. Except for China, Japan, and Korea, which are early epicenters of the outbreak, the transmission rate β is around  and heterogeneous across countries. The estimated transmission rates far exceed the recovery rate γ, which is about  based on the clinical course of COVID-19. Due to the high transmission rate and the lack of herd immunity, in the absence of mitigation measures such as social distancing, the virus spreads quickly and may infect around  percent of the population at the peak of the epidemic.
Using the model, I conduct an experiment in which the government introduces temporary mitigation measures and succeeds in reducing the transmission rate. If the mitigation measures are taken too early, the peak is delayed but the epidemic restarts, with no effect on the peak, because the population does not acquire herd immunity. Assuming the government can take drastic measures for up to  weeks, the optimal policy is to start mitigation measures once the number of cases reaches % of the population. Under the optimal policy, the peak infection rate falls to %. Therefore, unless vaccines are expected to be developed in the near future, the draconian measures currently taken in many countries may be suboptimal, and it may be desirable to postpone them. To evaluate the potential economic impact of COVID-19, I build a stylized production-based asset pricing model. Capitalists hire labor in competitive markets, and infected workers are unable to work. Because the epidemic (temporarily) drastically reduces the labor supply, output goes down, and the model calibration suggests that the stock market crashes by % during the epidemic, though the crash is short-lived. Under the optimal policy, the stock price exhibits a W-shaped pattern and remains undervalued by about % relative to the steady state for about half a year. I first present the compartment model of epidemics, following Kermack and McKendrick. The society consists of N individuals, among whom S are susceptible to an infectious disease (they are neither infected nor immune) and I are infected. (We ignore population growth because an epidemic occurs over a relatively short interval.) Let R = N − S − I be the number of individuals who are immune (possibly because they are vaccinated, infected and recovered, or dead). Suppose that individuals meet each other randomly and that, conditional on an infected individual meeting a susceptible individual, the disease is transmitted with some probability.
Let β > 0 be the rate at which an infected individual meets another person and transmits the disease if that person is susceptible. Let γ > 0 be the rate at which an infected individual recovers or dies. Then the following differential equations hold: Ṡ = −βSI/N, İ = βSI/N − γI, Ṙ = γI, where ẋ = dx/dt denotes the time derivative. Although this system of differential equations is nonlinear, Harko et al. obtain an exact analytical solution in parametric form; the solution can be parametrized as in their equations (the parametrization has been changed slightly for convenience, and the proof follows their paper). Using this result, we can study the qualitative properties of the epidemic. Proposition. Let everything be as above. Then the following statements are true. 1. In the long run, a fraction v* ∈ (0, 1) of susceptible individuals will not be infected (a fraction 1 − v* will be infected), where v* is the unique solution to equation ( ). 2. If βx_0 > γ, then there is an epidemic. The number of infected individuals reaches its maximum when βx(t_max) = γ, at which point the fraction of the population given in ( ) is infected. The maximum infection rate y_max is increasing in x_0 and y_0 and decreasing in γ/β. Because f can be approximated by a linear function around v*, it follows from simple algebra, for θ = γ/β, that y_max is increasing in x_0 and y_0 and decreasing in θ = γ/β. This proposition has several policy implications for dealing with epidemics. First, the policy maker may want to prevent an epidemic. This is achieved when the condition βx_0 ≤ γ holds. Since before the epidemic the fraction of infected individuals y_0 is negligible, we can rewrite the no-epidemic condition as β(1 − z_0) ≤ γ. Unlike bacterial infections, for which a large variety of antibiotics are available, there is generally no curative care for viral infections. Therefore the recovery/death rate γ is generally outside the policy maker's control.
Hence the only way to satisfy the no-epidemic condition β(1 − z_0) ≤ γ is either (i) to control transmission (reduce β), for example by washing hands, wearing protective gear, restricting travel, or social distancing, or (ii) immunization (increase z_0). The required minimum immunization rate to prevent an epidemic is z_0 = 1 − γ/β. Second, the policy maker may want to limit the economic impact once an epidemic occurs. Because the supply of healthcare services is inelastic in the short run, it is important to keep the maximum infection rate y_max within the capacity of the existing healthcare system. This is achieved by lowering the transmission rate β. In this section I estimate the SIR model and use it to predict the evolution of the COVID-19 pandemic. The number of COVID-19 cases is provided by the Center for Systems Science and Engineering at Johns Hopkins University (henceforth CSSE). The cumulative numbers of confirmed cases and deaths can be downloaded from the GitHub repository. The time series starts on January  and is updated daily. Because countries are added as new cases are reported, the cross-sectional size increases every day. For the majority of countries, the CSSE data are at the country level. However, for some countries, such as Australia, Canada, and China, regional data at the level of province or state are available. In such countries, I aggregate across regions and use the country-level data. Figure  shows the number of COVID-19 cases in early epicenters, namely China, Iran, Italy, Japan, and Korea. Estimation of the model poses significant challenges because the situation of COVID-19 is rapidly evolving. The model parameters are likely time-varying because new policies are introduced on a day-to-day basis, temperature and weather may affect the virus's activity, and the virus itself may genetically mutate. For this reason, I only use the data from the two most recent weeks ( days).
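The epidemic condition and the peak infection rate discussed above follow from the standard conserved quantity of the SIR model, y + x − (γ/β) ln x: since infections peak when the susceptible fraction x falls to γ/β, the peak can be computed in closed form. A small sketch (parameter values are illustrative, not the paper's estimates):

```python
from math import log

def peak_infected(beta, gamma, x0, y0):
    """Peak infected fraction y_max in the continuous SIR model.

    Uses conservation of y + x - (gamma/beta)*ln(x): the infected fraction
    peaks when the susceptible fraction x reaches theta = gamma/beta.
    """
    theta = gamma / beta
    if beta * x0 <= gamma:     # no epidemic: y decreases from the start
        return y0
    return y0 + x0 - theta + theta * log(theta / x0)

# illustrative values: beta = 0.3, gamma = 0.1, nearly everyone susceptible
print(peak_infected(0.3, 0.1, 1 - 1e-6, 1e-6))
```

Consistent with the proposition, the returned peak grows as β rises (θ = γ/β falls), and the function returns the initial y_0 whenever the no-epidemic condition βx_0 ≤ γ holds.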
I estimate the model parameters by nonlinear least squares, minimizing the distance between the model outputs (x, y, z) and the data. Because the CSSE data contain only confirmed cases and deaths, while the SIR model abstracts from death, I define c = y + z = 1 − x to be the fraction of infected or recovered cases in the model. The counterpart in the data is c = C/N, where C is the number of confirmed cases and N is the population. Because the number of cases grows by many orders of magnitude within a short period of time, I define the loss function using log cases. Since I only include c in the loss function, the parameters γ and z_0, which govern the dynamics of the fraction of recovered z, are not identified. Therefore I fix these two parameters exogenously. For the recovery rate γ, because the majority of patients with COVID-19 experience mild symptoms that resemble a common cold or influenza (Zhou et al.), which take about  days to recover from, I set γ = / = . For z_0, I set it to one divided by the population. Although the fraction of cases c(t) is likely significantly underestimated, because infected individuals do not appear in the data unless they are tested, this does not cause problems for estimating the parameter of interest (the transmission rate β), because under-reporting is absorbed by the constant y_0, which only affects the onset of the epidemic by a few weeks without changing the overall dynamics (see Figure ). To sum up, I estimate the remaining parameters β and y_0 by numerically minimizing the loss function. Standard errors are calculated using the asymptotic theory of M-estimators. See Appendix A for the solution algorithm of the SIR model. I estimate the SIR model for all countries that meet the following inclusion criteria: (i) the number of confirmed cases as of March  exceeds , and (ii) the number of confirmed cases at the beginning of the estimation sample exceeds .
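The estimation step described above, minimizing a least-squares loss on log cases over the transmission rate, can be sketched with a simple grid search on synthetic data (a stand-in for the paper's nonlinear least squares; the values of β, γ and y_0 below are illustrative):

```python
from math import log

def simulate_c(beta, gamma, y0, days):
    """Fraction ever infected, c(t) = y(t) + z(t) = 1 - x(t), in a daily discrete SIR model."""
    x, y = 1.0 - y0, y0
    cs = []
    for _ in range(days):
        new_inf = beta * x * y
        x, y = x - new_inf, y + new_inf - gamma * y
        cs.append(1.0 - x)
    return cs

def log_loss(beta, gamma, y0, data):
    """Sum of squared residuals in log cases, as in the paper's loss function."""
    model = simulate_c(beta, gamma, y0, len(data))
    return sum((log(m) - log(d)) ** 2 for m, d in zip(model, data))

# recover beta from synthetic 14-day data generated with beta = 0.25
gamma, y0 = 0.1, 1e-5
data = simulate_c(0.25, gamma, y0, 14)
grid = [b / 100 for b in range(5, 61)]      # candidate beta in 0.05 .. 0.60
best = min(grid, key=lambda b: log_loss(b, gamma, y0, data))
print(best)   # -> 0.25
```

Because the loss is evaluated on log cases, large early under-reporting shifts the level of c without much affecting the slope, which is why β remains well identified in the paper's setup.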
these countries are mostly early epicenters (china, japan, korea), european countries, and north america. table shows the estimated transmission rate (β), its standard error, the fraction of infected individuals at the peak (y max ), number of days to reach the peak (t max ), and the fraction of the population that is eventually infected. (note: the table presents the estimation results of the sir model in section . β (s.e.): the transmission rate and standard error; ymax: the fraction of infected individuals at the peak in ( . ); tmax: the number of days to reach the peak; "total": the fraction of the population that is eventually infected.) figure shows the time evolution of covid- cases in italy, which is the earliest epicenter outside east asia. we can make a few observations from table . first, the estimated transmission rates are heterogeneous across countries. while β is low in china, the origin of covid- , and the neighboring countries (japan and korea), where the virus spread first, β is very high at around . - . in other countries, and the no-epidemic condition βx ≤ γ fails. despite the short time series ( days), the transmission rate is precisely estimated in most countries. although current data is insufficient to draw any conclusion, there are a few possible explanations for the heterogeneity of β. first, the transmission rate β may artificially appear high in later epicenters such as europe and north america just because these countries were slow in adopting tests of covid- and the testing (hence reporting) rate is increasing. second, the heterogeneity in β may be due to the fact that early epicenters have already taken mitigation measures against covid- . for example, while japan closed all schools starting on march , many states in the us have implemented similar measures such as closing schools, bars, and restaurants only around march , so we may not have yet seen the effect of such policies. finally, it is possible that there are cultural differences.
for example, school children in japan are taught to wash their hands before eating and to gargle after returning home, which they practice, and (from personal experience) japanese cities tend to be much cleaner than most cities in the world. second, according to the model, countries other than china, japan, and korea are significantly affected by the epidemic. if the current trend in the transmission rate β continues, the epidemic will peak in may , at which point around percent of the population will be infected by the virus simultaneously. by the time the epidemic ends, more than percent of the population is eventually infected. these numbers can be used to do a back-of-the-envelope calculation of health outcomes. in february , the cruise ship diamond princess was put under quarantine for two weeks after covid- was detected. all passengers were tested and tracked, among whom tested positive and died. although this is not a representative sample because the cruise ship passengers tend to be older and wealthier, the mortality of covid- should be around % for this group and possibly lower for the general population. zhou et al. ( ) document that patients died among that required hospitalization in two hospitals in wuhan. therefore the ratio of patients requiring hospitalization to death is / = . . thus, based on the model, the fraction of people requiring hospitalization at the peak is y max × . × . = . % assuming y max = %, the median value in table . using the estimated model parameters, we can predict the course of the epidemic. for this exercise, i consider the following scenario. the epidemic starts with the initial condition (y , z ) = ( − , ). the benchmark transmission rate is set to the median value in table , which is β = . . when the number of total cases c = y + z exceeds − , the government introduces mitigation measures such as social distancing, and the transmission rate changes to either β = . or β = . . 
mitigation measures are lifted after weeks and the transmission rate returns to the benchmark value. i also consider the optimal mitigation policy, where the government chooses the threshold of cases c̄ at which to introduce mitigation measures as well as the transmission rate β so as to minimize the maximum infection rate y max . figure shows the fraction of infected and recovered over time. when the government introduces early but temporary mitigation measures (left panel), the epidemic is delayed but the peak is unaffected. this is because the maximum infection rate y max in ( . ) is mostly determined by β and γ since (x0, y0) ≈ (1, 0), and the epidemic persists until the population acquires herd immunity so that the no-epidemic condition βx ≤ γ holds. while early drastic mitigation measures might be useful to buy time to develop a vaccine, they may not be effective in mitigating the peak unless they are permanent. the right panel in figure shows the course of the epidemic under the optimal policy, which is to introduce mitigation measures such that β = . when the number of cases reaches c̄ = . % of the population. under this scenario, only y max = . % of the population is simultaneously infected at the peak as opposed to % under the benchmark scenario. the intuition is that by waiting to introduce mitigation measures, a sufficient fraction of the population is infected (and acquires herd immunity), which reduces the peak. to evaluate the economic impact of the covid- epidemic, in this section i solve a stylized production-based asset pricing model. the economy consists of two agent types, capitalists and workers, who respectively own the capital stock and labor. the capital stock at time t is denoted by k_t. the capital growth rate is exogenous, lognormal, and i.i.d. over time: log(k_{t+1}/k_t) = µ + σε_{t+1} with ε_{t+1} ∼ n(0, 1). capitalists hire labor at competitive markets and produce a perishable good using a cobb-douglas production technology y = k^α l^(1−α), where α ∈ (0, 1) is the capital share.
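the contrast drawn above, that early temporary mitigation delays the peak without lowering it, can be reproduced with a small simulation. all parameter values (baseline β = 0.3, halved during a 28-day intervention triggered at a 1e-4 case threshold, γ = 0.1) are assumptions for illustration, not the paper's estimates.

```python
def peak_infected(beta_fn, gamma=0.1, y0=1e-6, days=500, h=0.1):
    """euler sir; beta_fn(t, c) can change the transmission rate over time."""
    x, y, z = 1.0 - y0, y0, 0.0
    peak, t = y, 0.0
    for _ in range(int(days / h)):
        beta = beta_fn(t, y + z)
        dx, dy, dz = -beta * x * y, beta * x * y - gamma * y, gamma * y
        x, y, z = x + h * dx, y + h * dy, z + h * dz
        t += h
        peak = max(peak, y)
    return peak

benchmark = peak_infected(lambda t, c: 0.3)   # permanent beta = 0.3 (assumed)

# temporary mitigation: halve beta for 28 days once cases pass 1e-4
state = {"start": None}
def mitigated(t, c):
    if state["start"] is None and c > 1e-4:
        state["start"] = t
    if state["start"] is not None and t < state["start"] + 28:
        return 0.15
    return 0.3

temporary = peak_infected(mitigated)
# the peak is delayed but essentially unchanged: temporary is close to benchmark
```

because almost no herd immunity accumulates during the 28-day window, lifting the measures lets the epidemic resume from nearly the same susceptible pool, so the two peak heights nearly coincide.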
the labor supply is exogenous, deterministic, and normalized to 1 during normal times. during an epidemic, workers are either susceptible, infected, or recovered, and only non-infected agents can supply labor. for simplicity, i assume that workers are hand-to-mouth and consume the entire wage. the financial market is complete, and capitalists maximize the constant relative risk aversion (crra) utility e_0 ∑_{t=0}^∞ β^t c_t^(1−γ)/(1 − γ), where β > 0 is the discount factor and γ > 0 is the relative risk aversion coefficient. a stock is a claim to the representative firm's profit k^α l^(1−α) − wl, where w is the wage. given the sequence of labor supply {l_t}_{t=0}^∞, we can solve for the equilibrium stock price semi-analytically as follows. the first-order condition for profit maximization implies w = (1 − α)(k/l)^α. hence the firm's profit, which by market clearing must equal consumption of capitalists, is c = αk^α l^(1−α). ( . ) because the marginal buyer of the stock is a capitalist, the stochastic discount factor of the economy is given by m_{t+1} = β(c_{t+1}/c_t)^(−γ). letting p_t be the stock price, the no-arbitrage condition implies p_t = e_t[m_{t+1}(p_{t+1} + c_{t+1})]. ( . ) dividing both sides of ( . ) by c_t, letting v_t = p_t/c_t be the price-dividend ratio, and using ( . ), we obtain v_t = β e_t[(c_{t+1}/c_t)^(1−γ)](1 + v_{t+1}). because capital growth is i.i.d. normal and labor supply is deterministic, we can rewrite the price-dividend ratio as v_t = κ(l_{t+1}/l_t)^((1−α)(1−γ))(1 + v_{t+1}), where κ = βe^(α(1−γ)µ + [α(1−γ)]²σ²/2). in normal times, we have l_t ≡ 1 and v_t ≡ κ/(1 − κ), where we need κ < 1 for convergence. during an epidemic, it is straightforward to compute the price-dividend ratio by iterating ( . ) using the boundary condition v_∞ = κ/(1 − κ). i calibrate the model at daily frequency. i set the capital share to α = . and the relative risk aversion to γ = , which are standard values. i assume a % annual discount rate, so β = exp(− . /n_d), where n_d = . is the number of days in a year. to calibrate capital growth and volatility, note that in normal times we have l = 1 and hence y = k^α.
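the backward iteration of the price-dividend ratio described above can be sketched as follows, assuming the recursion v_t = κ(l_{t+1}/l_t)^((1−α)(1−γ))(1 + v_{t+1}) implied by the cobb-douglas setup; all calibration numbers here (capital share, risk aversion, discount rate, growth moments, the labor dip) are assumptions for illustration, not the paper's calibration.

```python
import math

# assumed daily calibration
alpha, gamma_r = 0.38, 3.0
beta_d = math.exp(-0.05 / 365.25)                     # daily discount factor
mu, sigma = 0.02 / 365.25, 0.02 / math.sqrt(365.25)   # daily capital growth
kappa = beta_d * math.exp(alpha * (1 - gamma_r) * mu
                          + (alpha * (1 - gamma_r)) ** 2 * sigma ** 2 / 2)
v_inf = kappa / (1 - kappa)   # price-dividend ratio in normal times

def price_dividend(labor):
    """backward-iterate v_t = kappa * (l_{t+1}/l_t)**((1-alpha)*(1-gamma))
    * (1 + v_{t+1}) from the boundary condition v = kappa/(1-kappa)."""
    v = v_inf
    path = [v_inf] * len(labor)
    for t in range(len(labor) - 2, -1, -1):
        growth = labor[t + 1] / labor[t]
        v = kappa * growth ** ((1 - alpha) * (1 - gamma_r)) * (1 + v)
        path[t] = v
    return path

# labor dips while part of the workforce is infected, then recovers
labor = [1.0] * 30 + [0.95] * 60 + [1.0] * 30
path = price_dividend(labor)
```

with constant labor the iteration stays at the fixed point κ/(1 − κ), which is a useful sanity check on the implementation.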
taking the log difference, we obtain log(y_{t+1}/y_t) = α log(k_{t+1}/k_t). therefore, according to the model, the capital growth rate and volatility are 1/α times those of output. i calibrate these parameters from the us quarterly real gdp per capita in q - q and obtain µ = . and σ = . at the annual frequency. for the transmission rate, using the point estimates in section , i consider β = . . the recovery rate is γ = . . the initial condition is (y0, z0) = ( − , ). figure shows the stock price relative to potential output p_t/y*_t, where y*_t = k_t^α is the full-employment output. the left and right panels are under the benchmark case and optimal policy, respectively. in the benchmark model, the stock price decreases sharply during the epidemic by about %. however, the stock market crash is short-lived and prices recover quickly after the epidemic. this observation is in sharp contrast to the prediction from rare disasters models (rietz, ; barro, ), where shocks are permanent. under the optimal policy, because the infection rate y has two peaks, the stock price shows a w-shaped pattern. however, the decline is much more moderate at around %. because the situation with covid- is rapidly evolving, any analysis based on current data will quickly become out of date. however, any analysis based on available data is better than no analysis. with these caveats in mind, i draw the following conclusions from the present analysis. the covid- epidemic is spreading except in china, japan, and korea. in many countries the transmission rate at present (march , ) is very high at around β = . . this number implies that it takes only 1/β ≈ days for a patient to infect another individual. since it takes around days to recover from the illness, the number of patients will grow exponentially and may overwhelm the healthcare system if no actions are taken.
if the current trend continues, the epidemic will peak in early may in europe and north america, at which point around percent of the population will be infected. because the recovery rate γ is an uncontrollable biological parameter, the only way to control the epidemic is to reduce the transmission rate β, perhaps by restricting travel or social distancing. however, temporary measures only slow the onset of the epidemic and have no effect on the peak, because the epidemic persists until the population acquires herd immunity. the optimal policy that minimizes the peak is to wait to introduce mitigation measures until a sufficient fraction of the population is infected, which can reduce the peak to . %. policy makers in affected countries may also want to look at measures taken in china, japan, and korea, which are the countries relatively successful at controlling the spread so far. using the estimated transmission rates, i have solved a stylized production-based asset pricing model. the model predicts that the stock price decreases by % during the epidemic, but recovers quickly afterwards because the epidemic is a short-lived labor supply shock. under the optimal policy, the stock price exhibits a w-shaped pattern and remains about % undervalued relative to the steady-state level for half a year. in principle, solving the sir model numerically is straightforward using the following algorithm. 1. given the parameters (β, γ) and initial condition (x0, y0), solve for v* as the unique solution to ( . ). 2. take a grid 1 = v_0 > v_1 > v_2 > ··· > v_N > v*. for each n = 1, . . . , N, compute the integral i_n numerically. 3. define t_0 = 0 and t_n = ∑_{k=1}^n i_k for n ≥ 1. compute (x_n, y_n, z_n) using ( . ) evaluated at v = v_n. then {t_n, (x_n, y_n, z_n)}_{n=0}^N gives the numerical solution to the sir model. although the above algorithm is conceptually straightforward, there are two potential numerical issues. first, the integrand in ( . ) is not well-behaved near ξ = 1.
in fact, setting ξ = we obtain g( ) = /βy , which is typically a very large number since y (the fraction of infected at t = ) is typically small, say of the order − . this makes the numerical integral i n in (a. ) inaccurate for small n. second, for applications we would like the dates {t n } n n= to be well-behaved (say approximately evenly spaced), which requires an appropriate choice of the grid {v n } n n= . to deal with the first issue, let us express g as g = h + h , where h has a closed-form primitive function and h is well-behaved near ξ = . since log ξ ≈ ξ − near ξ = , a natural candidate is h (ξ) := ξ(βx ( − ξ) + βy + γ(ξ − )) = β(x +y )−γ ξ + βx −γ (βx −γ)( −ξ)+βy , (β(x + y ) = γ) βy ξ . (β(x + y ) = γ) then by simple algebra, ( . ) becomes where h (ξ) := ξ(βx ( − ξ) + βy + γ log ξ) − ξ((βx − γ)( − ξ) + βy ) . because h (ξ) is approximately to the first order around ξ = , we can calculate the numerical integrals in (a. ) accurately. to deal with the second issue, consider the sir model ( . ) with γ = . then ( . a) becomesẋ = −βx( − x), and by separation of variables we obtain the analytical solution x(t) = x x + ( − x )e βt . using ( . a), for the case γ = , time t and parameter v are related as v = finally, define v n = x + ( − x )e βt * n/n . figure shows the dynamics of sir model when (β, γ) = ( . , . ), y = − , − , − , and z = . for this example, − v * = . % of the population is eventually infected, and y max = . % of the population is infected at the peak of the epidemic. the initial condition (y ) affects the timing of the epidemic but not its dynamics. figure : dynamics of sir model when (β, γ) = ( . , . ), y = − , − , − , and z = . smaller y corresponds to later onset of epidemic. economic activity and the spread of viral diseases: evidence from high frequency data how will country-based mitigation measures influence the course of the covid- epidemic? 
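step 1 of the appendix algorithm solves for v*, which pins down the fraction of the population that escapes infection. a sketch using the standard sir final-size relation x∞ = x0 exp(−(β/γ)(1 − x∞ − z0)), solved by fixed-point iteration; the β and γ values are assumptions for illustration.

```python
import math

def fraction_never_infected(beta, gamma, x0=1.0 - 1e-6, z0=0.0, tol=1e-12):
    """fixed-point iteration for x_inf = x0 * exp(-(beta/gamma)*(1 - x_inf - z0)),
    the standard sir final-size relation; 1 - x_inf is eventually infected."""
    x = 0.0
    for _ in range(10000):
        x_next = x0 * math.exp(-(beta / gamma) * (1.0 - x - z0))
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x_inf = fraction_never_infected(beta=0.3, gamma=0.1)   # assumed beta, gamma
eventually_infected = 1.0 - x_inf   # roughly 94% when beta/gamma = 3
```

the iteration converges quickly here because the map is a contraction near the fixed point when the epidemic is supercritical.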
rare disasters and asset markets in the twentieth century
the macroeconomics of epidemics
exact analytical solutions of the susceptible-infected-recovered (sir) epidemic model and of the sir model with equal death and birth rates
the impact of the spanish flu epidemic on economic performance in sweden
a contribution to the mathematical theory of epidemics
antiviral drugs for viruses other than human immunodeficiency virus
the equity risk premium: a solution
the "spanish flu" in spain
hua chen, and bin cao. clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study

key: cord- - vtxz
authors: cooper, ian; mondal, argha; antonopoulos, chris g.
title: a sir model assumption for the spread of covid- in different communities
date: - -
journal: chaos solitons fractals
doi: . /j.chaos. .
sha: doc_id: cord_uid: vtxz

in this paper, we study the effectiveness of the modelling approach on the pandemic due to the spreading of the novel covid- disease and develop a susceptible-infected-removed (sir) model that provides a theoretical framework to investigate its spread within a community. here, the model is based upon the well-known susceptible-infected-removed (sir) model with the difference that a total population is not defined or kept constant per se and the number of susceptible individuals does not decline monotonically. to the contrary, as we show herein, it can be increased in surge periods! in particular, we investigate the time evolution of different populations and monitor diverse significant parameters for the spread of the disease in various communities, represented by countries and the state of texas in the usa. the sir model can provide us with insights and predictions of the spread of the virus in communities that the recorded data alone cannot.
our work shows the importance of modelling the spread of covid- by the sir model that we propose here, as it can help to assess the impact of the disease by offering valuable predictions. our analysis takes into account data from january to june, , the period that contains the data before and during the implementation of strict and control measures. we propose predictions on various parameters related to the spread of covid- and on the number of susceptible, infected and removed populations until september . by comparing the recorded data with the data from our modelling approaches, we deduce that the spread of covid- can be under control in all communities considered, if proper restrictions and strong policies are implemented to control the infection rates early from the spread of the disease. in december , a novel strand of coronavirus (sars-cov- ) was identified in wuhan, hubei province, china causing a severe and potentially fatal respiratory syndrome, i.e., covid- . since then, it has become a pandemic declared by world health organization (who) on march , which has spread around the globe [ , , , , ] . who published in its website preliminary guidelines with public health care for the countries to deal with the pandemic [ ] . since then, the infectious disease has become a public health threat. italy and usa are severely affected by covid- [ , , ] . millions of people are forced by national governments to stay in self-isolation and in difficult conditions. the disease is growing fast in many countries around the world. in the absence of availability of a proper medicine or vaccine, currently social distancing, self-quarantine and wearing a face mask have been emerged as the most widely-used strategy for the mitigation and control of the pandemic. 
in this context, mathematical models are required to estimate disease transmission, recovery, deaths and other significant parameters separately for various countries, that is for different, specific regions of high to low reported cases of covid- . different countries have already taken precise and differentiated measures that are important to control the spread of the disease. however, still now, important factors such as population density, insufficient evidence for different symptoms, transmission mechanism and unavailability of a proper vaccine, makes it difficult to deal with such a highly infectious and deadly disease, especially in high population density countries such as india [ , , ] . recently, many research articles have adopted the modelling approach, using real incidence datasets from affected countries and, have investigated different characteristics as a function of various parameters of the outbreak and the effects of intervention strategies in different countries, respective to their current situations. it is imperative that mathematical models are developed to provide insights and make predictions about the pandemic, to plan effective control strategies and policies [ , , ] . modelling approaches [ , , , , , , ] are helpful to understand and predict the possibility and severity of the disease outbreak and, provide key information to determine the intensity of covid- disease intervention. the susceptible-infected-removed (sir) model and its extended modifications [ , , , ] , such as the extended-susceptible-infected-removed (esir) mathematical model in various forms have been used in previous studies [ , , ] to model the spread of covid- within communities. here, we propose the use of a novel sir model with different characteristics. one of the major assumptions of the classic sir model is that there is a homogeneous mixing of the infected and susceptible populations and that the total population is constant in time. 
in the classic sir model, the susceptible population decreases monotonically towards zero. however, these assumptions are not valid in the case of the spread of the covid- virus, since new epicentres spring up around the globe at different times. to account for this, the sir model that we propose here does not consider the total population and takes the susceptible population as a variable that can be adjusted at various times to account for new infected individuals spreading throughout a community, resulting in an increase in the susceptible population, i.e., to the so-called surges. the sir model we introduce here is given by the same simple system of three ordinary differential equations (odes) with the classic sir model and can be used to gain a better understanding of how the virus spreads within a community of variable populations in time, when surges occur. importantly, it can be used to make predictions of the number of infections and deaths that may occur in the future and provide an estimate of the time scale for the duration of the virus within a community. it also provides us with insights on how we might lessen the impact of the virus, what is nearly impossible to discern from the recorded data alone! consequently, our sir model can provide a theoretical framework and predictions that can be used by government authorities to control the spread of covid- . in our study, we used covid- datasets from [ ] in the form of time-series, spanning january to june, . in particular, the time series are composed of three columns which represent the total cases i d tot , active cases i d and deaths d d in time (rows). these datasets were used to update parameters of the sir model to understand the effects and estimate the trend of the disease in various communities, represented by china, south korea, india, australia, usa, italy and the state of texas in the usa. 
this allowed us to estimate the development of covid- spread in these communities by obtaining estimates for the number of deaths d, susceptible s, infected i and removed r m populations in time. consequently, we have been able to estimate its characteristics for these communities and assess the effectiveness of modelling the disease. the paper is organised as following: in sec. , we introduce the sir model and discuss its various aspects. in sec. , we explain the approach we used to study the data in [ ] and in sec. , we present the results of our analysis for china, south korea, india, australia, usa, italy and the state of texas in the usa. section discusses the implications of our study to the "flattening the curve" approach. finally, in sec. , we conclude our work and discuss the outcomes of our analysis and its connection to the evidence that has been already collected on the spread of covid- worldwide. . the sir model that can accommodate surges in the susceptible population the world around us is highly complicated. for example, how a virus spreads, including the novel strand of coronavirus (sars-cov- ) that was identified in wuhan, hubei province, china, depends upon many factors, among which some of them are considered by the classic sir model, which is rather simplistic and cannot take into consideration surges in the number of susceptible individuals. here, we propose the use of a modified sir model with characteristics, based upon the classic sir model. in particular, one of the major assumptions of the classic sir model is that there is a homogeneous mixing of the infected i and susceptible s populations and that the total population n is constant in time. also, in the sir model, the susceptible population s decreases monotonically towards zero. these assumptions however are not valid in the case of the spread of the covid- virus, since new epicentres spring up around the globe at different times. 
to account for this, we introduce here a sir model that does not consider the total population n , but rather, takes the susceptible population s as a variable that can be adjusted at various times to account for new infected individuals spreading throughout a community, resulting in its increase. thus, our model is able to accommodate surges in the number of susceptible individuals in time, whenever these occur and as evidenced by published data, such as those in [ ] that we consider here. our sir model is given by the same, simple system of three ordinary differential equations (odes) with the classic sir model that can be easily implemented and used to gain a better understanding of how the covid- virus spreads within communities of variable populations in time, including the possibility of surges in the susceptible populations. thus, the sir model here is designed to remove many of the complexities associated with the real-time evolution of the spread of the virus, in a way that is useful both quantitatively and qualitatively. it is a dynamical system that is given by three coupled odes that describe the time evolution of the following three populations: . susceptible individuals, s(t): these are those individuals who are not infected, however, could become infected. a susceptible individual may become infected or remain susceptible. as the virus spreads from its source or new sources occur, more individuals will become infected, thus the susceptible population will increase for a period of time (surge period). furthermore, it is assumed that the time scale of the sir model is short enough so that births and deaths (other than deaths caused by the virus) can be neglected and that the number of deaths from the virus is small compared with the living population. 
based on these assumptions and concepts, the rates of change of the three populations are governed by the following system of odes, which constitutes our sir model: ds(t)/dt = −a s(t) i(t), di(t)/dt = a s(t) i(t) − b i(t), dr_m(t)/dt = b i(t), where a and b are real, positive, parameters of the initial exponential growth and final exponential decay of the infected population i. it has been observed that in many communities, a spike in the number of infected individuals, i, may occur, which results in a surge in the susceptible population, s, recorded in the covid- datasets [ ] , what amounts to a secondary wave of infections. to account for such a possibility, s in the sir model ( ), can be reset to s surge at any time t s that a surge occurs, and thus it can accommodate multiple such surges if recorded in the published data in [ ] , what distinguishes it from the classic sir model. the evolution of the infected population i is governed by the second ode in system ( ), where a is the transmission rate constant and b the removal rate constant. we can define the basic effective reproductive rate r_e = a s(t)/b, as the fate of the evolution of the disease depends upon it. if r_e is smaller than one, the infected population i will decrease monotonically to zero and if greater than one, it will increase, i.e., di(t)/dt = (a s(t) − b) i(t) > 0 if and only if r_e > 1. thus, the effective reproductive rate r_e acts as a threshold that determines whether an infectious disease will die out quickly or will lead to an epidemic. at the start of an epidemic, when r_e > 1 and s ≈ 1, the rate of the infected population is described by the approximation di(t)/dt ≈ (a − b) i(t) and thus, the infected population i will initially increase exponentially according to i(t) = i(0) e^((a−b)t). the infected population will reach a peak when the rate of change of the infected population is zero, di(t)/dt = 0, and this occurs when r_e = 1. after the peak, the infected population will start to decrease exponentially, following i(t) ∝ e^(−bt). thus, eventually (for t → ∞), the system will approach s → 0 and i → 0.
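the system above, including the reset s → s_surge at a surge time t_s, can be integrated with a first-order euler scheme of the kind used later in the paper. the parameter values and the surge day below are illustrative assumptions, not fitted values.

```python
def sir_with_surges(a, b, s0, i0, days, surges=None, h=0.1):
    """euler integration of ds/dt = -a*s*i, di/dt = a*s*i - b*i, drm/dt = b*i.
    `surges` maps a day t_s to a new susceptible level s_surge, implementing
    the reset s -> s_surge used to model secondary waves."""
    surges = surges or {}
    s, i, rm = s0, i0, 0.0
    out = []
    steps = int(round(1 / h))
    for day in range(days):
        if day in surges:
            s = surges[day]                     # reset s at a surge time
        out.append((day, s, i, rm))
        for _ in range(steps):
            ds, di, drm = -a * s * i, a * s * i - b * i, b * i
            s, i, rm = s + h * ds, i + h * di, rm + h * drm
    return out

# illustrative values: one surge on day 60 reignites the epidemic
traj = sir_with_surges(a=0.4, b=0.15, s0=1.0, i0=1e-4, days=200,
                       surges={60: 0.8})
```

without the entry in `surges`, the infected population decays to zero after the first peak; with it, r_e jumps above one again and a second wave follows.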
interestingly, the existence of a threshold for infection is not obvious from the recorded data, however can be discerned from the model. this is crucial in identifying a possible second wave where a sudden increase in the susceptible population s will result in r e > , and to another exponential growth of the number of infections i. the data in [ ] for china, south korea, india, australia, usa, italy and the state of texas (communities) are organised in the form of time-series where the rows are recordings in time (from january to june, ), and the three columns are, the total cases i d tot (first column), number of infected individuals i d (second column) and deaths d d (third column). consequently, the number of removals can be estimated from the data by since we want to adjust the numerical solutions to our proposed sir model ( ) to the recorded data from [ ] , for each dataset (community), we consider initial conditions in the interval [ , ] and scale them by a scaling factor f to fit the recorded data by visual inspection. in particular, the initial conditions for the three populations are set such that s( ) = (i.e., all individuals are considered susceptible initially), is the maximum number of infected individuals i d . consequently, the parameters a, b, f and i d max are adjusted manually to fit the recorded data as best as possible, based on a trial-and-error approach and visual inspections. a preliminary analysis using non-linear fittings to fit the model to the published data [ ] provided at best inferior results to those obtained in this paper using our trial-and-error approach with visual inspections, in the sense that the model solutions did not follow as close the published data, what justifies our approach in the paper. 
a prime reason for this is that the published data (including those in [ ] we are using here) are data from different countries that follow different methodologies to record them, with not all infected individuals or deaths accounted for. in this context, s, i and r m ≥ at any t ≥ . system ( ) can be solved numerically to find how the scaled (by f ) susceptible s, infected i and removed r m populations (what we call model solutions) evolve with time, in good agreement with the recorded data. in particular, since this system is simple with well-behaved solutions, we used the first-order euler integration method to solve it numerically, and a time step h = / = . that corresponds to a final integration time t f of days since january, . this amounts to double the time interval in the recorded data in [ ] and allows for predictions for up to days after january, . obviously, what is important when studying the spread of a virus is the number of deaths d and recoveries r in time. as these numbers are not provided directly by the sir model ( ), we estimated them by first, plotting the data for deaths d d vs the removals r d m , where r d m = d d + r d = i d tot − i d and then fitting the plotted data with the nonlinear function where d and k are constants estimated by the non-linear fitting. the function is expressed in terms of only model values and is fitted to the curve of the data. thus, having obtained d from the non-linear fitting, the number of recoveries r can be described in time by the simple observation that it is given by the scaled removals, r m from the sir model ( ), less the number of deaths, d from eq. ( ), the rate of increase in the number of infections depends on the product of the number of infected and susceptible individuals. an understanding of the system of eqs. ( ) explains the staggering increase in the infection rate around the world. 
infected people traveling around the world has led to the increase in infected numbers and this results in a further increase in the susceptible population [ ] . this gives rise to a positive feedback loop leading to a very rapid rise in the number of active infected cases. thus, during a surge period, the number of susceptible individuals increases and as a result, the number of infected individuals increases as well. for example, as of march, , there were infected individuals and by april, , this number had grown to a staggering [ ] . understanding the implications of what the system of eqs. ( ) tells us, the only conclusion to be drawn using scientific principles is that drastic action needs to be taken as early as possible, while the numbers are still low, before the exponential increase in infections starts kicking in. here, we have applied the sir model ( ) considering data from various countries and the state of texas in the usa provided in [ ] . assuming the published data are reliable, the sir model ( ) can be applied to assess the spread of the covid- disease and predict the number of infected, removed and recovered populations and deaths in the communities, accommodating at the same time possible surges in the number of susceptible individuals. figures - show the time evolution of the cumulative total infections i tot , current infected individuals, i, recovered individuals, r, dead individuals, d, and normalized susceptible populations, s for china, south korea, india, australia, usa, italy and texas in the usa, respectively. the crosses show the published data [ ] and the smooth lines, solutions and predictions from the sir model. the cumulative total infections plots also show a curve for the initial exponential increase in the number of infections, where the number of infections doubles every five days. the figures also show predictions, and a summary of the sir model parameters in ( ) and published data in [ ] for easy comparisons. 
we start by analysing the data from china and then move on to the study of the data from south korea, india, australia, the usa, italy and texas. the number of infections peaked in china about february, and since then it has slowly decreased. from the plots shown in figs. and , it is obvious that the south korean government has done a wonderful job in controlling the spread of the virus. the country has implemented an extensive virus testing program. there has also been heavy use of surveillance technology: closed-circuit television (cctv) and tracking of bank cards and mobile phone usage, to identify who to test in the first place. south korea has achieved a low fatality rate (currently one percent) without resorting to such authoritarian measures as in china. the most conspicuous part of the south korean strategy is simple enough: implementation of repeated cycles of test and contact-trace measures. to match the recorded data from india with predictions from the sir model ( ), it is necessary to include a number of surge periods, as shown in fig. . this is because the sir model cannot predict accurately the peak number of infections if the actual numbers in the infected population have not yet peaked. it is most likely that the spread of the virus as of early june, is not contained and there will be an increasing number of total infections. however, by adding new surge periods, a higher and delayed peak can be predicted and compared with future data. in fig. , a consequence of the surge periods is that the peak is delayed and higher than if no surge periods were applied. the model predictions for september, including the surges, are: total infections, active infections and deaths, whereas if there were no surge periods, there would be total infections, active infections and deaths, with the peak of , which is about % of the current number of active cases, occurring around may .
thus, the model can still give a rough estimate of future infections and deaths, as well as the time it may take for the number of infections to drop to safer levels, at which time restrictions can be eased, even without an accurate prediction of the peak in active infections (see figs. and ). a surge in the susceptible population was applied in early march, in the country. the surge was caused by passengers disembarking from the ruby princess cruise ship in sydney and then returning to their homes around australia. more than passengers and crew became infected and died. two government enquiries have been established to investigate what went wrong. also, at this time many infected overseas passengers arrived by air from europe and the usa. the australian government was too slow in quarantining arrivals from overseas. from mid-march, until mid-may, , the australian governments introduced measures of testing, contact tracing, social distancing, a stay-at-home policy, closure of many businesses and encouraging people to work from home. from figs. and , it can be observed that the actions taken were successful, as reflected in the actual number of infections. as of early june, , the peak number of infections in the usa has not been reached. when a peak in the data is not reached, it is more difficult to fit the model predictions to the data. in the model, it is necessary to add a few surge periods. this is because new epicentres of the virus arose at different times. the virus started spreading in washington state, followed by california, new york, chicago and the southern states of the usa. the need to add surge periods shows clearly that the spread of the virus is not under control. in the usa, by the end of may, , the number of active infected cases has not yet peaked and the cumulative total number of infections keeps getting bigger. this can be accounted for in the sir model by considering how the susceptible population changes with time in may.
during that time, to match the data to the model predictions, surge periods were used where the normalized susceptible population s was reset to . every four days. what is currently happening in the usa is that as susceptible individuals become infected, their population decreases, but these infected individuals mix with the general population, leading to an increase in the susceptible population. this is shown in the model by the variable for the susceptible population, s, varying repeatedly from about . to . during may. until this vicious cycle is broken, the cumulative total infected population will keep growing at a steady rate and will not reach an almost steady state. the fluctuating normalized susceptible variable provides clear evidence that government authorities do not have the spread of the virus under control (see figs. and ). the plots in figs. and show that the peak in the total cumulative number of infections has not been reached as of early june, ; however, the peak is probably not far away. if there are no surges in the susceptible population, then one could expect that by late september, , the number of infections will have fallen to very small numbers and the virus will be well under control, with the total number of deaths in the order of . in mid-may, , some restrictions were lifted in the state of texas. the sir model can be used to model some of the possible scenarios if the early relaxation of restrictions leads to an increasing number of susceptible individuals. if there is a relatively small increase in the future number of susceptible individuals, no serious impacts occur. however, if there is a large outbreak of the virus, then the impacts can be dramatic. for example, at the end of june, , if s was reset to . (s = . ), a second wave of infections occurs, with the peak number of infections occurring near the end of july, and the second wave peak being higher than the initial peak number of infections.
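the surge mechanism described above, resetting the normalized susceptible population upward at chosen instants, can be sketched as follows; the surge time and reset level are illustrative assumptions chosen so that a second wave appears, not values fitted to any country:

```python
# Sketch of a "surge": the normalized susceptible population s is reset
# upward at an assumed instant, producing a second wave of infections.
# All parameter values here are illustrative, not fitted quantities.

def simulate_with_surge(beta=0.3, gamma=0.1, h=0.05, days=400,
                        surge_time=150.0, surge_level=0.5):
    s, i = 1.0 - 1e-4, 1e-4
    surged = False
    infected = []                       # time series of i
    for k in range(int(days / h)):
        if not surged and k * h >= surge_time:
            s = surge_level             # surge: susceptible pool jumps up
            surged = True
        new_infections = beta * s * i
        s += h * (-new_infections)
        i += h * (new_infections - gamma * i)
        infected.append(i)
    return infected

series = simulate_with_surge()
```

whether the second peak exceeds the first depends on how far the first wave depleted the susceptible pool and on the reset level; in the texas scenario of the text, the suppressed first wave makes the second peak higher.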
subsequently, the number of deaths will rise from about to nearly , as shown in figs. and . if governments start lifting their containment strategies too quickly, then it is probable there will be a second wave of infections with a larger peak in active cases, resulting in many more deaths. figure shows clearly that the peak of the pandemic has been reached in italy and, without further surge periods, the spread of the virus is contained and the number of active cases is declining rapidly. the plots in panels (a), (b) in fig. are a check on how well the model can predict the time evolution of the virus. these plots also assist in selecting the model's input parameters. the term flattening the curve has rapidly become a rallying cry in the fight against covid- , popularised by the media and government officials. claims have been made that flattening the curve results in: (i) a reduction in the peak number of cases, thereby helping to prevent the health system from being overwhelmed, and (ii) an increase in the duration of the pandemic, with the total burden of cases remaining the same. this implies that social distancing measures and management of cases, with their devastating economic and social impacts, may need to continue for much longer. the picture which has been widely shown in the media is reproduced in fig. (a). the idea presented in the media, as shown in fig. (a), is that by flattening the curve the peak number of infections will decrease, while the total number of infections will be the same and the duration of the pandemic will be longer. hence, the conclusion was that flattening the curve will lessen the impact upon the demands on hospitals. figure (b) gives the scientific meaning of flattening the curve. by governments imposing appropriate measures, the number of susceptible individuals can be reduced, which, combined with isolating infected individuals, will reduce the peak number of infections.
when this is done, it actually shortens the time the virus impacts the society. thus, the second claim has no scientific basis and is incorrect. what is important is reducing the peak in the number of infections; when this is done, it shortens the duration in which drastic measures need to be taken, rather than lengthening the period as stated in the media and by government officials. figure shows that reducing the peak number of infections actually reduces the duration of the impact of the virus on a community. mathematical modelling theories are effective tools to deal with the time evolution and patterns of disease outbreaks. they provide us with useful predictions in the context of the impact of intervention in decreasing the number of infected-susceptible incidence rates [ , , ] . in this work, we have augmented the classic sir model with the ability to accommodate surges in the number of susceptible individuals, supplemented by recorded data from china, south korea, india, australia, the usa and the state of texas, to provide insights into the spread of covid- in communities. in all cases, the model predictions could be fitted to the published data reasonably well, with some fits better than others. for china, the actual number of infections fell more rapidly than the model prediction, which is an indication of the success of the measures implemented by the chinese government. there was a jump in the number of deaths reported in mid-april in china, which resulted in a less robust estimate of the number of deaths predicted by the sir model. the susceptible population dropped to zero very quickly in south korea, showing that the government was quick to act in controlling the spread of the virus. as of the beginning of june, , the peak number of infections in india has not yet been reached. therefore, the model predictions give only minimum estimates of the duration of the pandemic in the country, the total cumulative number of infections and deaths.
the case study of the virus in australia shows the importance of including a surge where the number of susceptible individuals can be increased. this surge can be linked to the arrival of infected individuals from overseas and infected people from the ruby princess cruise ship. the data from the usa is an interesting example, since there are multiple epicentres of the virus that arose at different times. this makes it more difficult to select appropriate model parameters and surges where the susceptible population is adjusted. the results for texas show that the model can be applied to communities other than countries. italy provides an example where there is excellent agreement between the published data and model predictions. thus, our sir model provides a theoretical framework to investigate the spread of the covid- virus within communities. the model can give insights into the time evolution of the spread of the virus that the data alone does not. in this context, it can be applied to communities, given that reliable data are available. its power also lies in the fact that, as new data are added to the model, it is easy to adjust its parameters and provide best-fit curves between the data and the predictions from the model. in this context, it can then provide estimates of the number of likely deaths in the future and time scales for the decline in the number of infections in communities. our results show that the sir model is well suited to predicting the epidemic trend due to the spread of the disease, as it can accommodate surges and be adjusted to the recorded data. by comparing the published data with predictions, it is possible to assess the success of government interventions. the data considered were taken between january and june, , and contain datasets recorded before and during the implementation of strict control measures. our analysis also confirms the successes and failures of the control measures taken in some countries.
strict, adequate measures have to be implemented to further prevent and control the spread of covid- . countries around the world have taken steps to decrease the number of infected citizens, such as lock-down measures, awareness programs promoted via media, hand sanitization campaigns, etc., to slow down the transmission of the disease. additional measures, including early detection approaches, isolation of susceptible individuals to avoid mixing them with asymptomatic and self-quarantined individuals, traffic restrictions, and medical treatment, have been shown to help prevent the increase in the number of infected individuals. strong lockdown policies can be implemented in different areas, if possible. in line with this, necessary public health policies have to be implemented in countries with high rates of covid- cases as early as possible to control its spread. the sir model used here is only a simple one and thus the predictions that come out of it might not be accurate enough, something that also depends on the published data and their trustworthiness. however, as the model data show, one thing that is certain is that covid- is not going to go away quickly or easily.
references:
health organization, coronavirus disease (covid- ) outbreak
nowcasting and forecasting the potential domestic and international spread of the -ncov outbreak originating in wuhan, china: a modelling study
novel coronavirus (covid- ) cases, provided by jhu csse
estimation of the transmission risk of the -ncov and its implication for public health interventions
the effect of human mobility and control measures on the covid- epidemic in china
naming the coronavirus disease (covid- ) and the virus that causes it
an epidemiological forecast model and software assessing interventions on covid- epidemic in china
extended sir prediction of the epidemics trend of covid- in italy and compared with hunan, china
quantifying the effect of quarantine control in covid- infectious spread using machine learning
predictions for covid- outbreak in india using epidemiological models
covid- : india imposes lockdown for days and cases rise
mohfw, coronavirus disease (covid- ). available online
on the predictability of infectious disease outbreaks
the effect of travel restrictions on the spread of the novel coronavirus (covid- ) outbreak. science
epidemics with mutating infectivity on small-world networks
early dynamics of transmission and control of covid- : a mathematical modelling study. the lancet infectious diseases
modified seir and ai prediction of the epidemics trend of covid- in china under public health interventions
analysis and forecast of covid- spreading in china, italy and france
a data-driven network model for the emerging covid- epidemics in wuhan, toronto and italy
estimation of covid- dynamics on a back-of-envelope: does the simplest sir model provide quantitative parameters and predictions
modeling the impact of mass influenza vaccination and public health interventions on covid- epidemics with limited detection capability
three basic epidemiological models
the mathematics of infectious diseases
the basic epidemiology models: models, expressions for r , parameter estimation, and applications
the sir model and the foundations of public health
global analysis of the covid- pandemic using simple epidemiological models
a modified sir model for the covid- contagion in italy
mathematical modeling of covid- transmission dynamics with a case study of wuhan
modelling the covid- epidemic and implementation of population-wide interventions in italy
the effectiveness of quarantine of wuhan city against the corona virus disease (covid ): a well-mixed seir model analysis
declaration of competing interest: i am attaching herewith a copy of our manuscript entitled "a sir model assumption for the spread of covid- in different communities" co-authored by ian cooper and chris g. antonopoulos in favor of publication in your esteemed journal chaos, solitons & fractals. the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work. am is thankful for the support provided by the department of mathematical sciences, university of essex, uk to complete this work.
key: cord- -xpfbw lo authors: molnar, tamas g.; singletary, andrew w.; orosz, gabor; ames, aaron d.
title: safety-critical control of compartmental epidemiological models with measurement delays date: - - journal: nan doi: nan sha: doc_id: cord_uid: xpfbw lo we introduce a methodology to guarantee safety against the spread of infectious diseases by viewing epidemiological models as control systems and by considering human interventions (such as quarantining or social distancing) as control input. we consider a generalized compartmental model that represents the form of the most popular epidemiological models, and we design safety-critical controllers that formally guarantee safe evolution with respect to keeping certain populations of interest under prescribed safe limits. furthermore, we discuss how measurement delays originating from the incubation period and testing delays affect safety, and how delays can be compensated via predictor feedback. we demonstrate our results by synthesizing active intervention policies that bound the number of infections, hospitalizations and deaths for epidemiological models capturing the spread of covid- in the usa. the rapid spreading of covid- across the world forced people to change their lives and practice mitigation efforts at a level never seen before, including social distancing, mask-wearing, quarantining and stay-at-home orders. these human actions played a key role in reducing the spreading of the virus, although such interventions often have economic consequences, loss of jobs and psychological effects. therefore, it is important to focus mitigation efforts and determine when, where and what level of intervention needs to be taken. this research provides a methodology to determine the level of active human intervention needed to provide safety against the spreading of infection while keeping mitigation efforts minimal. we use compartmental epidemiological models to describe the spreading of the infection [ ] , [ ] , and we view these models as control systems where human intervention is the control input.
viewing epidemiological models as control systems has been proposed in the literature recently [ ] , [ ] , [ ] , and various models with varying transmission rates [ ] , [ ] , [ ] , [ ] have appeared to quantify the level of human interventions in the case of covid- . in this paper, we build on our recent work [ ] and use a safety-critical control approach to synthesize control strategies that guide human interventions so that certain safety criteria (such as keeping infection, hospitalization and death below given limits) are fulfilled with minimal mitigation efforts. (fig. caption: illustration of the sir model as a control system and its fit to us covid- data [ ] . model parameters were estimated from compartmental data (right) by accounting for a measurement delay τ. the transmission rate and the corresponding control input (left) were fitted to mobility data.) the approach is based on the framework of control barrier functions [ ] , [ ] , which leverages the theory of set invariance [ ] for dynamical [ ] , [ ] and control systems [ ] , [ ] , [ ] . we take into account that data about the spreading of the infection may involve significant measurement delays [ ] , [ ] , [ ] , [ ] due to the fact that infected individuals may not show symptoms and get tested for quite a few days. we use predictor feedback control [ ] , [ ] , [ ] to compensate for these delays, and we provide safety guarantees against errors in delay compensation. the outline of the paper is as follows. section ii introduces a generalized compartmental model, which covers the class of the most popular epidemiological models. section iii introduces safety-critical control without considering measurement delays, while sec. iv is dedicated to delay compensation. conclusions are drawn in sec. v. compartmental models describe how the sizes of certain populations of interest evolve over time.
consider n + m compartments, given by x ∈ r^(n+m), which are separated into two groups: n so-called multiplicative compartments, given by w ∈ r^n, and m outlet compartments, given by z ∈ r^m. the evolution of these compartments over time t can be given by the following generalized compartmental model: ẇ(t) = f(w(t)) + g(w(t)) u(t), ż(t) = q(w(t)) + r(z(t)), where x = [w^T z^T]^T, initial conditions are x(0) = x_0, and f, g : r^n → r^n, q : r^n → r^m and r : r^m → r^m depend on the choice of the model; see examples , and . in model ( ), the multiplicative compartments w are populations that essentially describe the transmission of the infection. the transmission can be reduced by active interventions, whose intensity is quantified by a control input u ∈ u ⊂ r. the outlet compartments z, on the other hand, do not actively govern the transmission, but rather indicate its effects, as they are driven by the evolution of the multiplicative compartments. example . sir model. one of the most fundamental epidemiological models is the sir model [ ] , [ ] , which consists of susceptible, s, infected, i, and recovered, r, populations. the sir model captures the spread of the infection based on the interplay between the susceptible and infected populations. thus, s and i are multiplicative compartments, while r, which measures the number of recovered (or deceased) individuals, is an outlet compartment. the model uses three parameters: the transmission rate β > 0, the recovery rate γ > 0 and the total population n. active interventions given by the control input u ∈ [0, 1] allow the population to reduce the transmission to an effective rate β(1 − u), where u = 0 means no intervention and u = 1 means total isolation of infected individuals. this puts the sir model with active intervention into the form ( ). example . seir model. the seir model [ ] , [ ] is an extension of the sir model that incorporates an exposed population e apart from the s, i and r compartments.
the exposed individuals are infected but not yet infectious over a latency period given by 1/σ > 0. since the latency affects the transmission, e is a multiplicative compartment. the seir model can be described by ( ). example . sihrd model. the sihrd model [ ] adds two more outlet compartments to the sir model: hospitalized population h and deceased population d. their evolution is captured by three additional parameters: the hospitalization rate λ > 0, the recovery rate ν > 0 in hospitals and the death rate µ > 0. equation ( ) yields the sihrd model. there exist several other compartmental models of form ( ) which involve further compartments, such as the sird [ ] , sirt [ ] , sixrd [ ] or sidarthe [ ] models. more complex models can provide higher fidelity, although they involve more parameters that need to be identified. in what follows, we show applications of the sir and sihrd models and we discuss the occurrence of time delays related to incubation and testing. we omit further discussion of latency, the seir model and other more complex models. fig. shows the performance of the sir model in capturing the spread of covid- for the case of us national data. the parameters β = . day^−1, γ = . day^−1 and n = × of the sir model were fitted following the algorithm in [ ] to the recorded number of confirmed cases i + r [ ] between march and august , , while the control input u(t), which represents the level of quarantining and social distancing, was identified from mobility data [ ] based on the median time people spent at home. the fitted control input (blue) follows the trend of the mobility data (gray) well, especially in march where stay-at-home orders came into action. while the fit (blue) captures the data about confirmed cases (gray), the model also has predictive power (orange); see more details about forecasting in [ ].
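the generalized structure ẇ = f(w) + g(w)u, ż = q(w) + r(z) can be instantiated in code; a minimal sketch for the sir case (example above), with illustrative parameter values rather than the fitted us values:

```python
# The generalized compartmental model w' = f(w) + g(w)u, z' = q(w) + r(z),
# instantiated for the SIR example: w = (s, i) are the multiplicative
# compartments and z = (r,) the outlet. Parameter values are illustrative.
import numpy as np

beta, gamma = 0.3, 0.1   # transmission and recovery rates (n scaled to 1)

def f(w):
    s, i = w
    return np.array([-beta * s * i, beta * s * i - gamma * i])

def g(w):
    s, i = w
    # u in [0, 1] reduces transmission to the effective rate beta*(1 - u)
    return np.array([beta * s * i, -beta * s * i])

def q(w):
    return np.array([gamma * w[1]])   # outflow into the outlet compartment r

def r_out(z):
    return np.array([0.0])            # r has no internal dynamics

def euler_step(w, z, u, h):
    return w + h * (f(w) + g(w) * u), z + h * (q(w) + r_out(z))
```

with u = 1 (total isolation) the infected compartment can only decay, while u = 0 recovers the uncontrolled sir dynamics; the total population is conserved by construction.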
note that once an individual gets infected by covid- , it takes a few days of incubation period to show symptoms and an additional few days to get tested for the virus [ ] , [ ] , [ ] , [ ] . therefore, the measured number of confirmed cases represents a delayed state of the system, and thus we involved a time delay τ in the model identification process, which was found to be τ = days by fitting [ ] . the delay-free counterpart of the fit (purple) shows that the measurement delay can lead to a significant error in identifying the true current level of infection. the effects of the delay τ on safety-critical control and its compensation will be discussed in sec. iv. formally, safety can be translated into keeping system ( ) within a safe set s ⊂ r^(n+m) that is the 0-superlevel set of a continuously differentiable function h : r^(n+m) → r. function h prescribes the condition for safety: for example, if one intends to keep the infected population i under a limit i_max for the sir, seir or sihrd models, the safety condition is h(x) = i_max − i ≥ 0. to guarantee safety, we design a controller that ensures that the set s in ( ) is forward invariant under the dynamics ( ), i.e., if x(0) ∈ s (h(x(0)) ≥ 0), then x(t) ∈ s (h(x(t)) ≥ 0) for all t > 0. below we use the framework of control barrier functions [ ] , [ ] to synthesize controllers that are able to keep certain compartments of interest within prescribed limits. first, we consider safety for multiplicative compartments, and then for outlet compartments. consider keeping the i-th multiplicative compartment (1 ≤ i ≤ n) below a safe limit given by c_i, i.e., we prescribe h(x) = c_i − w_i ≥ 0, where c_i is an upper bound for w_i. a lower bound could also be considered similarly, by taking h(x) = w_i − c_i. theorem : consider dynamical system ( ), function h in ( ) and the corresponding set s given by ( ) .
the following safety-critical active intervention controller guarantees that s is forward invariant (safe) under dynamics ( ) if g_i(w) ≠ 0, ∀w ∈ r^n: where relu(·) = max{0, ·} is the rectified linear unit, and α > 0. furthermore, the controller is optimal in the sense that it has minimum-norm control input. proof. according to [ ] , the necessary and sufficient condition of forward set invariance is ḣ(x(t)) ≥ −αh(x(t)), ∀t ≥ 0, where the derivative is taken along the solution of ( ). if there exists a control input u(t) so that ( ) is satisfied, then h is called a control barrier function. substitution of ( ) and ( ) into ( ) gives the safety condition where ϕ_i is given by ( ) . the control input u(t) must satisfy ( ) for all t ≥ 0. to keep control efforts minimal, one can achieve this by solving the quadratic program: based on the kkt conditions [ ] , the explicit solution, valid if g_i(w(t)) ≠ 0, can be simplified to ( ) . we remark that if g_i(w) = 0, safety can be ensured by the help of extended control barrier functions, as discussed for the safety guarantees of outlet compartments in sec. iii-b. for example, to keep the infected population i below the limit i_max for the sir model given by ( ) , one shall prescribe h(x) = i_max − i, and ( ) leads to the controller ( ). fig. shows the dynamics of the closed control loop for the covid- model fitted in fig. by prescribing i_max = , and using α = γ/ . (more precisely, α must be chosen as an extended class-k function [ ] , but we use a constant for simpler discussion and without loss of generality.) (fig. caption: safety-critical active intervention control of the sir model fitted in fig. to us covid- data. the controller keeps the infected population under the prescribed limit i_max, as opposed to the second wave of infection that was experienced during the summer of .)
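for the sir case, substituting the dynamics into the barrier condition ḣ ≥ −αh with h(x) = i_max − i yields a min-norm input of relu form; a simulation sketch with illustrative parameters (not the fitted us values):

```python
# Safety-critical controller for the SIR control system, keeping i <= Imax.
# From the barrier condition dh/dt >= -alpha*h with h = Imax - i, the
# min-norm input works out to
#     u = relu((beta*s*i - gamma*i - alpha*(Imax - i)) / (beta*s*i)).
# All parameter values are illustrative assumptions.

def run_sir_cbf(beta=0.3, gamma=0.1, Imax=0.05, alpha=0.05, h=0.01, days=400):
    s, i = 1.0 - 1e-4, 1e-4
    peak = i
    for _ in range(int(days / h)):
        authority = beta * s * i        # control authority (n scaled to 1)
        u = min(1.0, max(0.0, (authority - gamma * i
                               - alpha * (Imax - i)) / authority))
        ds = -(1.0 - u) * beta * s * i
        di = (1.0 - u) * beta * s * i - gamma * i
        s += h * ds
        i += h * di
        peak = max(peak, i)
    return peak

peak_infected = run_sir_cbf()
```

while the constraint is active, the closed loop satisfies di/dt = α(i_max − i), so i approaches the limit from below without crossing it, and u returns to zero once the epidemic no longer threatens the bound.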
indeed, the safety-critical controller (red) applied from june , is able to keep the level of infection under the safe limit (red dashed), while gradually reducing mitigation efforts to zero. meanwhile, the us experienced a second wave of infections (gray) in the summer of , which was caused by the drop in mitigation efforts in june (see the blue control input). now consider the case where the j-th outlet compartment (1 ≤ j ≤ m) needs to be kept within the safe limit c_j, i.e., h(x) = c_j − z_j ≥ 0. in the following theorem, we use a dynamic extension of control barrier functions to guarantee safety. theorem : consider dynamical system ( ), function h in ( ) and the corresponding set s given by ( ) . the following safety-critical active intervention controller guarantees that s is forward invariant (safe) under dynamics ( ) if ḣ(x(0)) + αh(x(0)) ≥ 0 and if l_g q_j(w) ≠ 0, ∀w ∈ r^n, with α > 0, α_e > 0. furthermore, the controller is optimal in the sense that it has minimum-norm control input. proof. we again use ( ) as the necessary and sufficient condition for safety, where the following expression appears: h_e(x(t)) := ḣ(x(t)) + αh(x(t)) = −(q_j(w(t)) + r_j(z(t))) + α(c_j − z_j(t)), ( ) which puts the safety condition into the form h_e(x(t)) ≥ 0, ∀t ≥ 0. however, the control input does not explicitly show up in ( ) . still, if there exists a control input that satisfies ḣ_e(x(t)) ≥ −α_e h_e(x(t)), then h_e is an extended control barrier function [ ] , [ ] , whose 0-superlevel set is forward invariant, that is, h_e(x_0) ≥ 0 implies h_e(x(t)) ≥ 0, ∀t > 0. substitution of ( ), ( ) and ( ) into ( ) gives the extended safety condition where ϕ_j^e is defined by ( ) . this can be satisfied by a min-norm controller obtained from the quadratic program: the explicit solution of the quadratic program is which is equivalent to ( ) . as an example of keeping outlet compartments safe, consider limiting the number of hospitalizations below h_max and deaths below d_max for the sihrd model given by ( ) .
by choosing h(x) = h_max − h, one can guarantee safety in terms of hospitalization based on ( ) by the controller ( ), whereas prescribing h(x) = d_max − d ensures safety by upper bounding deaths via ( ). having synthesized controllers that keep selected compartments safe, let us now guarantee safety for multiple compartments at the same time: a set of multiplicative compartments i ⊂ {1, . . . , n} and a set of outlet compartments j ⊂ {1, . . . , m}. to formulate the safety condition, one can utilize ( ) for any multiplicative compartment i ∈ i and ( ) for any outlet compartment j ∈ j. then, one needs to solve the corresponding quadratic program subject to all these constraints. in general, the quadratic program can only be solved numerically and one may need relaxation terms to satisfy multiple constraints [ ] . however, analytical solutions can be found in some special cases, such as the one given by the following assumption. assumption . assume that the following terms have the same sign: sign(g_i(w(t))) = sign(l_g q_j(w(t))) = −1, ∀i ∈ i, ∀j ∈ j, ∀t ≥ 0. this assumption often holds for models where compartments need to be upper bounded for safety, e.g., the assumption holds for keeping e, i, r, h or d below a safe limit in the sir, seir or sihrd models. under this assumption, one can state the following proposition. proposition : consider dynamical system ( ) with assumption and the controllers ( ) and ( ) that keep individual multiplicative compartments w_i, i ∈ i ⊂ {1, . . . , n} and outlet compartments z_j, j ∈ j ⊂ {1, . . . , m} safe, using the control barrier functions in ( ) and ( ) . the following safety-critical active intervention controller guarantees safety for all compartments at the same time: that is, one needs to take the maximum of the individual control inputs that keep each individual compartment safe. proof.
if assumption holds, the safety conditions in ( ) and ( ) can be combined into one inequality: then, one can solve the quadratic program: in the form: this can be simplified to ( ) based on ( ) and ( ) . fig. shows the closed loop response of the sihrd model given by ( ) that was fitted to us covid- data [ ] . the data about confirmed cases were scaled by the cube root of the positivity rate (positives per total tests) to account for the significant under-reporting of cases during the first wave of the virus (the cube root was applied to scale less aggressively). starting from june , safety-critical active intervention control is applied to limit both the hospitalizations below h_max = , and the deaths below d_max = , . based on ( ), we utilize the controller a_hd(x) = max{a_h(x), a_d(x)}, where a_h and a_d are given by ( ) and ( ) . the model and controller parameters are β = . day^−1, γ = . day^−1, λ = . day^−1, ν = . day^−1, µ = . day^−1, n = × , τ = days, α_d = α_d^e = α_h = (γ + λ + µ)/ and α_h^e = ν/ . safety-critical control is able to reduce mitigation efforts while keeping the system below the prescribed hospitalization and death bounds and preventing a second wave of the virus. controller ( ) in sec. iii is designed based on feeding back the instantaneous state x(t) of the compartmental model. however, data about certain compartments are measured with delay due to the incubation period and testing delays. thus, the instantaneous state x(t) may not be available for feedback, but the delayed state x(t − τ) with measurement delay τ shall be used. if one implements a(x(t − τ)) instead of a(x(t)) for active intervention, a significant discrepancy between the delayed and instantaneous states can endanger safety. for example, the delay was identified to be τ = days for the us covid- data in fig. , while the infected population grew from a few thousand to more than a hundred thousand within days in mid-march.
This discrepancy significantly impacts safety-critical control. Thus, we propose a method to compensate the delays by predicting the instantaneous state from the delayed one, and we analyze how the prediction error affects safety. We use the idea of predictor feedback control [ ], [ ], [ ] to overcome the effect of delays. Namely, at each time moment t we use the data that are available up to time t − τ and we calculate a predicted state x_p(t) that approximates the instantaneous state: x_p(t) ≈ x(t). Then, we use the predicted state in the feedback law by applying A(x_p(t)) ≈ A(x(t)). If the prediction is perfect (i.e., x_p(t) = x(t)), safety is guaranteed even in the presence of delay according to Sec. III. Below we analyze how errors in the prediction affect safety. The prediction can be done by any model-based or data-based method; see the Example below for instance. At this point we only assume that the prediction error, defined by e(t) = x_p(t) − x(t), is bounded in the sense that ‖e(t)‖_∞ ≤ ε for some ε ≥ 0. The prediction error leads to an input disturbance relative to the nominal control input u(t) = A(x(t)), which yields the closed control loop

ẇ(t) = f(w(t)) + g(w(t))(u(t) + d(t)),
ż(t) = q(w(t)) + r(z(t)).

Note that for a Lipschitz continuous controller A with Lipschitz constant C, the input disturbance is upper-bounded by ‖d(t)‖_∞ ≤ C‖e(t)‖_∞ ≤ Cε =: δ. The following theorem summarizes how the disturbance affects safety via the notion of input-to-state safety [ ]. For simplicity, we state this theorem only for the safety of multiplicative compartments.

Theorem. Consider the closed-loop dynamical system ( ), the function h in ( ) and the corresponding set S given by ( ). Assume that the nominal controller u(t) guarantees safety without the input disturbance d(t) by satisfying ( ), while the input disturbance d(t) defined by ( ) is bounded by ‖d(t)‖_∞ ≤ δ.
Then, the set S is input-to-state safe in the sense that a larger set S_d ⊇ S given by ( ) is forward invariant (safe) under dynamics ( ).

Proof. Similarly to ( ) and ( ), the necessary and sufficient condition for the invariance of S_d is given by ( ). Substituting ( ) and taking the derivative along the solution of ( ) yields

−ϕ_i(w(t)) − g_i(w(t))(u(t) + d(t)) + δ‖g_i(w(t))‖_∞ ≥ 0,

which indeed holds, since ( ) and ‖d(t)‖_∞ ≤ δ hold.

How much larger the set S_d is compared to the set S depends on the size δ of the disturbance, which is related to the prediction error ε. If the prediction is perfect (x_p(t) = x(t)), then ε = 0, δ = 0 and S_d recovers S. However, if one implements a delayed state-feedback controller without prediction (x_p(t) = x(t − τ)), then ε and δ can be large, and S_d can be significantly larger than the desired set S.

Example. A possible model-based prediction can be done as follows. At each time moment t, we take the most recent available measurement x(t − τ) and calculate the predicted state x_p(t) by numerically integrating the ideal delay-free closed loop over the delay interval θ ∈ [t − τ, t]:

ẇ_p(θ) = f(w_p(θ)) + g(w_p(θ))A(x_p(θ)),
ż_p(θ) = q(w_p(θ)) + r(z_p(θ)),

where x_p = [w_p^T z_p^T]^T and the initial condition for the integration is x_p(t − τ) = x(t − τ). The results in Figs.  and  involve this kind of predictor feedback to compensate the delay τ. The simulations were carried out without considering uncertainties in the delay or other parameters; therefore the predicted state x_p(t) matched the instantaneous state x(t) up to the numerical accuracy of the integration. This allowed us to guarantee safety even in the presence of a significant delay.

V. CONCLUSIONS

In this paper, we viewed compartmental epidemiological models as control systems in which human actions (such as quarantining or social distancing) are considered as control input.
Using the framework of control barrier functions, we synthesized optimal safety-critical active intervention policies that formally guarantee safety against the spread of infection while keeping mitigation efforts minimal. We highlighted that time delays arising during state measurement can significantly affect safety-critical control, and we proposed predictor feedback to compensate the delays while preserving a certain level of input-to-state safety. We demonstrated our results on compartmental models fitted to US COVID-19 data, where we synthesized controllers to keep infection, hospitalization and deaths within prescribed limits. These controllers can help guide policy makers in deciding when and by how much mitigation efforts shall be reduced or increased.

REFERENCES
- Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
- Early dynamics of transmission and control of COVID-19: a mathematical modelling study
- Optimal control of an SIR model with delay in state and control variables
- Time-optimal control strategies in SIR epidemic models
- Can the COVID-19 epidemic be controlled on the basis of daily test reports
- Estimating the impact of COVID-19 control measures using a Bayesian model of physical distancing
- Quantifying the effect of quarantine control in COVID-19 infectious spread using machine learning
- A feedback SIR (fSIR) model highlights advantages and limitations of infection-based social distancing
- Modeling shield immunity to reduce COVID-19 epidemic spread
- Safety-critical control of active interventions for COVID-19 mitigation
- Control barrier function based quadratic programs with application to adaptive cruise control
- Control barrier function based quadratic programs for safety critical systems
- Control barrier functions: theory and applications
- On a characterization of flow-invariant sets
- Barrier certificates for nonlinear model validation
- Viability theory
- Set-theoretic methods in control
- A time delay dynamical model for outbreak of 2019-nCoV and the parameter identification
- Initial simulation of SARS-CoV-2 spread and intervention effects in the continental US
- Risk assessment of novel coronavirus COVID-19 outbreaks outside China
- Time delay compensation in unstable plants using delayed state feedback
- Compensation of infinite-dimensional input dynamics
- Predictor feedback for delay systems: implementations and approximations
- The mathematics of infectious diseases
- Estimation of the final size of the COVID-19 epidemic
- Effects of latency and age structure on the dynamics and containment of COVID-19
- A modified SEIR model to predict the COVID-19 outbreak in Spain and Italy: simulating control scenarios and multi-scale epidemics
- Estimating and simulating a SIRD model of COVID-19 for many countries, states, and cities
- A metapopulation network model for the spreading of SARS-CoV-2: case study for Ireland
- SafeGraph
- Convex optimization
- Exponential control barrier functions for enforcing high relative-degree safety-critical constraints
- American Control Conference (ACC)
- Input-to-state safety with control barrier functions

ACKNOWLEDGMENTS
The authors would like to thank Franca Hoffmann for her insights into compartmental epidemiological models and Gábor Stépán for discussions regarding non-pharmaceutical interventions in Europe. This research is supported in part by the National Science Foundation, CPS award # .

Palladino, A.; Nardelli, V.; Atzeni, L. G.; Cantatore, N.; Cataldo, M.; Croccolo, F.; Estrada, N.; Tombolini, A.: Modelling the spread of COVID in Italy using a revised version of the SIR model

In this paper, we present a model to predict the spread of the COVID-19 epidemic and apply it to the specific case of Italy.
We started from a simple susceptible-infected-recovered (SIR) model and added the condition that, after a certain time, the basic reproduction number $R_0$ exponentially decays in time, as empirically suggested by world data. Using this model, we were able to reproduce the real behavior of the epidemic with an average error of  %. Moreover, we illustrate possible future scenarios, associated with different intervals of $R_0$. This model has been used since the beginning of March , predicting the Italian peak of the epidemic in April with about .  detected active cases. The real peak of the epidemic happened on the  th of April , with .  active cases. This result shows that the model had predictive power for the Italian case.

At the beginning of , a previously unknown respiratory-tract disease was reported in China [ ]. This event is having a huge negative impact worldwide, not only from a healthcare perspective, but also from the economic, social and cultural ones. SARS-CoV-2 has been identified as the causative agent of the pandemic outbreak. It is a newly encountered member of the coronavirus family, belonging to the RNA viruses. Its behaviour is comparable to that of influenza viruses or of SARS-CoV, the causative agent of the pandemic outbreak of /  [ ] [ ]. As soon as virus particles get into a host (human), they start invading cells (in this case, predominantly the ones in the respiratory tract), which then replicate the genome of the virus. Virus particles get into the host's saliva, and humans infect each other by talking to infected individuals, touching hands and close face-to-face interaction [ ] [ ]. A number that is a good landmark for the transmission rate, or the infectiousness of any infectious disease, is the basic reproduction number ($R_0$): this defines the average number of people that are infected by a single carrier over a defined period of time.
$R_0$ is an indicator of the transmissibility of the epidemic and has been defined as the average number of secondary cases that a single case can generate in a completely susceptible population. $R_0$ is of course dependent on the characteristics of the epidemic itself; however, it also depends on the population sample we are considering. The higher the human interaction in a population is, the higher the value of $R_0$ will be. Among the factors that can influence $R_0$ in a given population there are therefore social habits and social organization. The basic reproduction number is therefore mutable depending on these aspects, and the analysis of its variation in time can be crucial to monitor the trend of the transmissibility within a single population. For our specific case of study, i.e. the spread of the COVID-19 epidemic in Italy, it is important to study the variation of $R_0$ in time before and after the implementation of lockdown measures, as well as after their removal. In our modified SIR model, we let $R_0$ vary in time as a consequence of quarantine. As we will show later, this also allows us to better reproduce real data, as well as to have a predictive view of the whole trend of the epidemic. Symptoms of coronavirus disease (COVID-19) are widespread: from asymptomatic patients to patients with flu-like symptoms, up to a severe pneumonia leading to acute respiratory distress syndrome (ARDS) [ ] [ ], making the ventilation of patients unavoidable. Given the lack of reliable, long-term data regarding the incubation period, virulence, contagiousness and other transmission parameters [ ] for the novel coronavirus SARS-CoV-2, and the lack of reliable drugs and vaccines [ ], containment measures, the tracking of infected people and the treatment of patients in the early stage of the illness remain the only feasible options to face the ongoing outbreak of the virus, which is leading to a collapsing health system with thousands of deaths, as seen in hotspots.
The impact of COVID-19 in Italy has been and still is very severe, with a notably high death toll. Up to now, more than .  people have been tested positive in Italy and more than .  have died due to the coronavirus. In this paper, we present the model used by the CovStat group [ ] to model the spread of the epidemic. Using this model, we have been able to predict with good accuracy the peak of the active infected, both in its location in time (the date of the peak) and in its amplitude (the maximum number of active infected). In this paragraph we present the SIR model, which is used as the basis for the development of further models by the CovStat group. We point out that the SIR model is not predictive. In Sec. .  we present the ingredients that represent a novelty here and have been included to improve the performance of the model and to describe the real data in a reasonable manner. SIR is one of the simplest models to describe the spread of an epidemic [ ]. The SIR model is based on the assumption of a totally susceptible population at time t₀, i.e. the beginning of the spreading. In the SIR model, the overall population of N individuals is divided into three categories: susceptibles (S), infected (I) and removed (R). Hence, at a given time t from the beginning of the spreading of the epidemic, I(t) and S(t) are the number of infected people present in the population and the number of vulnerable people that have not yet contracted the virus, respectively, while R(t) is the sum of the ones that have developed immunity (recovered) or are deceased and are therefore removed from the susceptible count. It is straightforward to notice that, at any time t, S(t) + I(t) + R(t) = N. The purpose of the SIR model is to describe the variation in time of S(t), I(t) and R(t), meaning the migration in time of individuals among these categories. The model consists of three categories: S for the susceptible people, I for the infected people and R for the sum of recovered and deceased people.
The classic SIR model is described by three ordinary differential equations:

dS/dt = −β S I / N,
dI/dt = β S I / N − γ I,
dR/dt = γ I,

where β is related to the velocity of diffusion of the virus and γ is related to the time required for infected people to become removed (recovered or dead). Both parameters have the dimension of time⁻¹. The three equations written above can be interpreted in the following manner:

• At the beginning of the epidemic, the entire population is susceptible to the infection (S(0) = N). If there is a single infected person, other people can get the infection, going from category S to category I. The strength (speed) of the spread of the virus is determined by the parameter β;

• the number of infected people increases when susceptible people get infected. After a typical timescale equal to 1/γ, infected people I go into the third category R;

• the category R includes the sum of people that recovered or died after the infection.

Therefore, in the classic SIR model there are only two free parameters that can be used to fit real data: β and γ. Both β and γ have the dimension of time⁻¹. From here on, we will use days as the unit of time. In this model, the basic reproduction number R₀ is a dimensionless number obtained from a combination of the previous parameters:

R₀ = β/γ.

R₀ provides direct and quantitative information on the spread of the epidemic. If R₀ ≤ 1 the epidemic will stop spontaneously, while with R₀ > 1 it will continue spreading. In Fig.  we show two different evolution patterns of the pandemic, to demonstrate the effect of different values of β. Both simulations start with the initial condition of one infected person, for a population of N =  people in total, fixing γ = .  (i.e. a typical duration of the illness of  days). In both panels the blue line represents the number S of susceptible people, which at time t = 0 coincides with almost the entire population (for our simulation: S(0) = N − 1) and then decreases as both the recovered and infected numbers (green and red lines, respectively) increase.
After an interval of time, in both the simulations, the infected number I reaches a peak, the value of which depends on the parameter β. After that time, the number of infected decreases while the number of recovered increases, ultimately approaching the total number of individuals. In the simulation on the left we set β = . , while on the right we assume β = . , to simulate the implementation of social-distancing actions. We notice that in the left panel the peak of infected people comes after  days, with roughly half the population infected. In the right panel the peak is shifted, coming after  days, and the number of infected people at the peak is less than half that of the previous case. This shows the importance of social distancing: social distancing and quarantines reduce the parameter β, helping to reduce the number of infected people at the peak of the epidemic. This is indispensable to avoid the collapse of the healthcare system, and especially of the intensive care units. Although a simulation with the standard SIR appears adequate to describe an epidemic spreading in a sample where all the initial conditions remain constant throughout the period of time, it is not sufficient when it comes to a more complex and realistic situation such as the population of a given country, where the parameters of the model are influenced by other external factors. Hence the necessity to modify the model for our case of study. In the classic SIR model the parameter β is constant, so it cannot account for the effect produced by quarantine actions, which would also have a dynamical impact on the parameter R₀(t). For this reason, it is useful to go beyond the classic SIR model, as explained in the next section. In the classic SIR model, the parameter β is constant in time. This means that it cannot account for the slowdown of the spread due to the quarantine. To simulate a more realistic scenario, it is necessary to go beyond the classic SIR model.
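The effect of β on the peak described above can be reproduced with a few lines of forward-Euler integration. This is an illustrative sketch: the parameter values below are stand-ins, since the exact β values of the figure are elided in the text.

```python
def simulate_sir(beta, gamma, n, i0=1.0, days=300, dt=0.1):
    # Forward-Euler integration of the classic SIR model; returns the
    # peak number of infected and the day on which it occurs.
    s, i = n - i0, i0
    peak_i, peak_t = i, 0.0
    for k in range(int(days / dt)):
        new_inf = beta * s * i / n
        s, i = s - dt * new_inf, i + dt * (new_inf - gamma * i)
        if i > peak_i:
            peak_i, peak_t = i, (k + 1) * dt
    return peak_i, peak_t

# Same recovery rate, two transmission rates (illustrative values only)
peak_fast, t_fast = simulate_sir(beta=0.5, gamma=0.1, n=1000.0)
peak_slow, t_slow = simulate_sir(beta=0.25, gamma=0.1, n=1000.0)
# With the lower beta, the peak comes later and is lower
```

The comparison mirrors the two panels discussed above: reducing β both delays the peak and lowers its height.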
We denote this model as SIR . . The open-source code is available in the CovStat repository [ ]. Compared to the classic SIR it contains the following new features:

• The parameter β(t) changes in time, to account for the effects due to quarantine and social distancing. In particular, β(t) = β₀ before a time t_th, while it exponentially decays for t > t_th:

β(t) = β₀ for t ≤ t_th,
β(t) = β₀ exp(−(t − t_th)/τ) for t > t_th,

where t_th and τ are two additional parameters of the model. The time t_th represents the starting time of the quarantine actions, while τ refers to the decay period and has the dimension of time. This assumption is driven by empirical observations, since this behavior of R₀(t) has been observed in several countries during the COVID-19 pandemic;

• we take into consideration the possible presence of asymptomatic patients. The virologist Ilaria Capua has suggested that /  of patients in Italy might be asymptomatic [ ]. The study conducted on the population of Vo' Euganeo [ ] reaches similar conclusions. Therefore, it is reasonable to assume that the total number of infected people is roughly  times higher than the number of people that were tested positive.

Compared to the classic SIR model, in this revised version there are two more parameters, t_th and τ. Therefore SIR .  is characterized by four parameters in total: β₀, γ, t_th and τ.

In this paper we present and discuss various results obtained using the model described above. We first focus on the curve of the active infected in Italy. Then we discuss possible future scenarios and how to compute the parameter R₀ for Italy and the Italian regions. From here on, we will always use data from the  th of February to the  th of May. Using the model SIR .  it is possible to reasonably describe the spread of COVID-19 in Italy. We used the data of Protezione Civile [ ], starting on February  th,  (therefore, this date corresponds to time zero of our model). We use N = .
× people as population and, as initial conditions, we choose R(0) = 0 and I(0) = , corresponding to the number of infected people on the  th of February. In order to find the best model, we minimize the mean squared error between predictions and real data, allowing the free parameters to vary. We consider as the last update the  th of May.

[Figure: simulation of future scenarios, fixing a certain value of R₀ at the  th of May. We identify three regions, corresponding to low-risk (green), mid-risk (orange) and high-risk (red) situations.]

The best fit is given by the following set of parameters: β₀ = . , γ = . , t_th = , τ = . . The best model has an average error of .  % compared to the data and is shown in Fig.  as the dashed blue line. On the home page of [ ] the best-fit model is updated daily. It is important to recall that the predictions during the pandemic were stable: since the beginning of March the model has predicted the peak of the infected people in April, with about .  infected people (see Palladino's talk [ ]). Phase 2 started in Italy on the  th of May with the progressive removal of the quarantine. The conditions are different from phase 1; however, since the lockdown has not been completely released, we cannot assume the initial conditions are fully restored. Even when the country's lockdown is formally over, the local legislation about social distancing and the civic consciousness of the population will be dramatically different than before. On the other side, we still do not have enough information about the immunity of those who have contracted the virus. Therefore, it is hard, or even impossible, to make accurate predictions without a reasonable model for the evolution of R₀(t). However, it is interesting to understand which future scenarios are associated with different intervals of the parameter R₀(t). In order to do that, we fit the past data with SIR . , as explained in the previous section.
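The decaying-β(t) mechanism introduced above can be sketched in a few lines. The parameter values here are illustrative assumptions, not the fitted values (which are elided in the text):

```python
import math

def beta_of_t(t, beta0, t_th, tau_q):
    # beta(t) = beta0 before t_th, exponential decay afterwards
    return beta0 if t <= t_th else beta0 * math.exp(-(t - t_th) / tau_q)

def simulate(beta0, gamma, n, t_th, tau_q, i0=1.0, days=300, dt=0.1):
    # Forward-Euler SIR integration with a time-dependent transmission rate
    s, i = n - i0, i0
    history = []
    for k in range(int(days / dt)):
        t = k * dt
        new_inf = beta_of_t(t, beta0, t_th, tau_q) * s * i / n
        s, i = s - dt * new_inf, i + dt * (new_inf - gamma * i)
        history.append((t + dt, i))
    return history

hist = simulate(beta0=0.3, gamma=0.1, n=6.0e7, t_th=20.0, tau_q=10.0)
t_peak, i_peak = max(hist, key=lambda p: p[1])
```

Because β(t) eventually falls below γ, R₀(t) = β(t)/γ drops below 1 and the infected curve turns over shortly after the quarantine takes effect, which is the behavior the fit exploits.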
Then, we use the basic SIR model for the new future predictions, fixing a certain ratio β(t)/γ on the last day of data. Therefore, the value of R̃₀ at t_d = the  th of May is given by

R̃₀ = (β(t_d)/γ) (S(t_d)/N),

where S(t_d) is the number of susceptible people on t_d. Then R₀(t) evolves as

R₀(t) = (β(t_d)/γ) (S(t)/N).

Since in our model we assume that the total number of infected people is  times higher than the number of tested positives I(t), the previous expressions must be evaluated with this correction. The epidemic starts decreasing when R₀(t) < 1; this condition is always satisfied when R̃₀ < 1. In Fig.  we report different regions corresponding to different intervals of R̃₀, as explained in the legend of the figure.

In this section we focus on the computation of R₀(t) for Italy and the Italian regions. In order to do that, we focus on a subset of data. In particular, we use the last  days of data for Italy and the last  days of data for the Italian regions (due to the smaller statistics). This is done to avoid fast oscillations of R₀ and to better understand the general trend of the epidemic during the last week. We use the standard version of the SIR model and assume an average duration of the illness of  days, i.e. γ = 1/ . Then we minimize the mean squared error between real data and model, varying R₀(t), with the usual definition given for the SIR model. Let us notice that, even assuming that the total number of infected people is  times larger than the tested positive ones, the number of susceptible people is still very high. Therefore the value of R₀(t) corresponds, in very good approximation, to

R₀(t) ≈ β(t)/γ.

With the present numbers, this remains true even if the number of infected people were  times larger. Indeed, up to now of order  k cases have been tested positive. So, even if the true number of infected people were .  million, the ratio S(t_d)/N = . , very close to 1. The evolution of R₀(t) in Italy is shown in the right panel of Fig. . In the left panel of the same figure, we represent R₀(t) computed over the last  days.
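Under the approximation S ≈ N used above, I(t) grows as exp((β − γ)t), so R₀ = β/γ can be estimated from a window of active-case counts by a log-linear fit (R₀ = 1 + r/γ, with r the fitted growth rate). This is a simplified stand-in for the authors' mean-squared-error minimization, shown here with synthetic data:

```python
import math

def estimate_r0(active_cases, gamma, dt=1.0):
    # Fit log I(t) = const + r * t by least squares, then R0 = 1 + r / gamma.
    logs = [math.log(c) for c in active_cases]
    ts = [k * dt for k in range(len(logs))]
    t_mean = sum(ts) / len(ts)
    l_mean = sum(logs) / len(logs)
    slope = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs)) \
        / sum((t - t_mean) ** 2 for t in ts)
    return 1.0 + slope / gamma

# Synthetic check: data generated with beta = 0.2 and gamma = 1/14
gamma = 1.0 / 14.0
beta = 0.2
data = [100.0 * math.exp((beta - gamma) * t) for t in range(8)]
r0_est = estimate_r0(data, gamma)  # recovers beta / gamma = 2.8
```

On exactly exponential data the fit recovers β/γ; on noisy case counts the window length trades off bias against the fast oscillations mentioned above.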
In the central panel of the same figure we report the last value of R₀(t) for the Italian regions, updated to the  th of May. In this case we use a time window of  days, to compensate for the smaller statistics and to reduce the oscillations of R₀(t). Using this procedure, R₀(t) becomes a good indicator of the behavior of the pandemic during the last week.

We have built a model to predict the spread of the COVID-19 epidemic in Italy. We started from a simple SIR model and added the condition that, after a threshold time, the basic reproduction number R₀(t) exponentially decays in time, as empirically suggested by the spread of the epidemic in different countries. Using this model we were able to predict the peak of the epidemic .  months before it happened, with an error of  week on the period and an error smaller than  % on the absolute numbers. We have also presented possible future scenarios, assuming different intervals of the parameter R₀(t) after the  th of May, i.e. when the lockdown in Italy was released. We conclude by explaining our procedure to compute R₀(t), as a function of time, for Italy and the Italian regions. This paper shows that the model has good predictive power when a period of quarantine is observed with fixed conditions.

REFERENCES
- The origin, transmission and clinical therapies on coronavirus disease (COVID-19) outbreak: an update on the status
- A review of coronavirus disease-2019 (COVID-19)
- Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): the epidemic and the challenges
- Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China
- The COVID-19 epidemic
- Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
- Contributions to the mathematical theory of epidemics I-II-III
- Suppression of COVID-19 outbreak in the municipality of Vo, Italy
- medRxiv
- Link to the DESY seminar

Maltezos, S.
Methodology for modelling the new COVID-19 pandemic spread and implementation to European countries

After the outbreak of the disease caused by the new virus COVID-19, the mitigation stage has been reached in most of the countries in the world. During this stage, a more accurate data analysis of the daily reported cases and other parameters became possible for the European countries and has been performed in this work. Based on a proposed parametrization model appropriate for implementation to an epidemic in a large population, we focused on the disease spread; we studied the obtained curves and investigated probable correlations between a country's characteristics and the parameters of the parametrization. We have also developed a methodology for coupling our model to the SIR-based models, determining the basic and the effective reproductive number in the parameter space. The obtained results and conclusions could be useful in the case of a recurrence of this repulsive disease in the future.

The disease of the new virus COVID-19, which has been pandemic in the world for about  days, has reached its mitigation stage: the "wavefront" of infection has slowed. Therefore, this is the time to turn our thoughts not only to the subsequent, painful and serious implications of this pandemic [ ], [ ], [ ], [ ], but also to analysing the way the disease grew among the countries until the  th of May , as well as to correlating this with the main parameters that likely played a significant role. In particular, we consider it extremely useful to study the specific characteristics of each country that played a role, the financial level, or even the genetic behaviour against the new coronavirus and the associated disease. Some of these characteristics were used as mathematical parameters for performing correlation studies.
The results of this study could give us information for preparing more effective defensive strategies or practical "tools" for a possible future return of the pandemic, which constitutes the central goal of the present work. The outline of the paper is as follows: in Section  we present a theoretical methodology for parametrizing an epidemic, in Section  we explain how to couple the present parametrization model with the basic SIR model, in Section  we give results relevant to the end-to-end epidemic growth, and in Section  we discuss the conclusions.

Our methodology is based on the parametrization of the growth of the COVID-19 disease that we used in our recent work [ ] and also in [ ] and [ ], which we call the "semi-Gaussian of n-th degree". It was used for fitting the disease's growth in various indicative countries, and it belongs to the model category of "regression techniques" for epidemic surveillance.
this function, which could be called "large population epidemic semi-gaussian model" (lpe-sg), is the following where a ij are arbitrary amplitudes, n ij are the degrees of the model (assumed fractional in a general case) and τ ij are the time constants representing in our case a mean infection time respectively. also, m and n are the finite number of terms to be included. it is easy to prove that the "peaking time" of the function of each term depends on the product of n ij by τ ij , that is, t pij = n ij τ ij . in practice, the number of the required terms should be determined according to the shape of the data and the desired achievable accuracy. after investigation of the fitting performance we concluded that, at most, two terms of the above double sum are adequate for our purpose. also, the cross terms, with indices ij and ji, cannot help more the flexibility of the model. in particular, a) the degree of the model can "cover" any early or late smaller outbreak of the daily cases, while b) the mean infection time is a characteristic inherent parameter of the disease under study and thus should be essentially constant. for these reasons, the expression with one term was adequate in most of the cases, whereas, the free parameters allow a very good flexibility for the fitting. therefore, we can write for the fitting procedure, we have used two alternative tactics, based on either the daily model or on the cumulative integral of it. the decision depends on the goodness of the fit in each case based on the criterion of minimizing the χ /dof, as we have done in our previous work. [ ] . the starting date (at t = ) was one day before the first reported case (or cases). the cumulative parametrization model and the fitting model take the following forms respectively where the symbol Γ represents the gamma function and the Γ c the upper incomplete gamma function at time t. 
The above generalized mathematical model has the advantage of providing more flexibility when the raw data include a composite structure or a superposition of more than one growth curve, which could be coexisting. This can happen due to restriction measures applied during the evolution of an epidemic. Regardless of the number of terms used, the obtained parameters must be well understood, in the sense of their role in the problem. Let us consider a country where the disease starts to break out due to a small number of infected individuals. In this stage we assume that the country has been isolated to a relatively high degree, but of course not ideally. At this point, the disease starts with a transmission rate that depends on the dynamics of the spread in each city and village, while other inherent properties of the disease affect its dynamics (e.g. the immune reaction, the incubation time and the recovery time). Here we must also clarify the issue of the "size" of the epidemic. The SIR-based models assume that the size N, that is, the total number of individuals exposable to the disease, is unchanged during its evolution, a fact which cannot be exactly true. On the other hand, a fraction of the size concerns individuals who are in quarantine for different reasons (due to tracing or for precautionary reasons). Therefore, the size cannot be absolutely constant, and forecasting in the first stage (during the growth of the epidemic) is very uncertain. In the second stage (around the turning point), the situation is clearer thanks to a more accurate parametrization, although high fluctuations could still be present. In the third stage (mitigation or suppression), an overall parametrization can be made, and any attempt at forecasting concerns a likely future comeback of the epidemic. In any of the three stages, regardless of the level of uncertainty, the parametrization specifies the associated parameters according to the epidemic model used.
it is known that the "basic reproductive number" symbolized by r is a very important parameter of the spreading of the epidemic. in the sir-based models it is determined at the first moments of the epidemic (mathematically at t = ) and is related to the associated parameters. moreover, as it is proven in the next, this parameter doesn't depend on the size n of the epidemic. by using the present parametrization model we assume that the size of the epidemic is, not only unknown, but also much smaller than the population of the country or city under study, that is, it constitutes an unbiased sample of the potentially exposable generic population. once the epidemic is pretty much eliminated, the size could be also estimated "a posteriori" by the help of a sir-based model. however, in this case, the parameters of the spread, as well as, the reproductive number are already determined by the methodology given in the next. we consider that this more generic approach facilitates the fitting process and improves the accuracy because of the existence of an analytical mathematically optimal solution. coupling with the sir-based models the classical model for studying the spread of an epidemic, sir, belongs to the mathematical or state-space category of models along with a large number of other types which are analytically described in [ ] . our model belongs to the continuum deterministic sir models in the special case applied at the earlier stage of an epidemic, assuming that the population is much greater than the infected number of people [ ] . this model can be applied also when the epidemic is in each latest stage where the total number of infected individuals is an unbiased sample of the population. under these assumptions, this model can be related to the classical sir model, or even with its extensions (seir and sird), by means of correlating their parameters. below, we give a brief description of the basic sir epidemic model. 
Let us briefly describe the three state equations of the SIR model: the function S = S(t) represents the number of susceptible individuals, I = I(t) the number of infected individuals, and R = R(t) the number of recovered individuals, all per unit time, usually measured in days. The constant factor a is the transmission rate, the constant factor β is the recovering rate, and N is the size of the system (the total number of individuals, assumed constant in time), so that S + I + R = N at every time. The assumed initial and final conditions are S(0) ≈ N, S(∞) > 0, and I(∞) = 0. This model has no analytical mathematical solution, even though the two parameters a and β are constant during the spread of the epidemic. A solution can be derived only under the approximation βR/N ≪ 1, i.e. when the epidemic essentially concerns a small number of recovered individuals compared to the total number of individuals; in that case a Taylor expansion of an exponential function is used. In our study we treat the general case without this assumption. In our basic model, C(t), the integral of the function c(t), must be compared to the total number of infected individuals I, while the parameter a undertakes the scaling of the particular data. The parameter τ does not necessarily coincide with the inverse of the "mean infection rate" a; rather, 1/τ expresses an "effective transmission rate" in our model. The parameter n cannot be equated to any of the parameters of the SIR model; it contributes, however, to the key parameter of the epidemic spread, the so-called "basic reproductive number" R₀, which is defined at t = 0 and equals R₀ = (a/β)(S(0)/N), where β is the "mean recovering rate". Because S(0) ≈ N, this becomes R₀ ≈ a/β. The ratio S(0)/N represents a basic threshold, the so-called "population density", above which the epidemic is initiated and grows when R₀ = a/β > 1.
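The SIR state equations just described can be integrated numerically with a simple forward-Euler scheme. A minimal Python sketch (parameter values a, β, N, I₀ are illustrative assumptions, not the paper's fitted values):

```python
def simulate_sir(a, beta, N, I0, dt=0.01, days=200):
    """Forward-Euler integration of the SIR equations:
       dS/dt = -a*S*I/N,  dI/dt = a*S*I/N - beta*I,  dR/dt = beta*I.
       Returns the final state plus the peak of I(t) and its time."""
    S, I, R = N - I0, I0, 0.0
    steps = int(days / dt)
    peak_I, peak_t = I, 0.0
    for k in range(steps):
        new_inf = a * S * I / N * dt   # new infections in this step
        rec = beta * I * dt            # recoveries in this step
        S, I, R = S - new_inf, I + new_inf - rec, R + rec
        if I > peak_I:
            peak_I, peak_t = I, (k + 1) * dt
    return S, I, R, peak_I, peak_t
```

For R₀ = a/β > 1 an outbreak occurs; S + I + R = N is conserved at every step, as the text requires.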
moreover the "effective reproductive number", r e , a variable as a function of time is also defined by the same way as follows because the condition for creating an epidemic is r e > , the corresponding condition should be s/n > /r . also, at t = should be r e ( ) ≡ r , at the peaking time t = t p should be r e (t p ) = and at t = ∞ takes the value r t = r s n (∞) < r . by using the expressions of eq. , including only the value of i and its derivative, one can estimate roughly the r e at any time t. it can be also shown, that the s(∞)/n can be determined by solving the following transcendental equation numerically. therefore, we can conclude that for r e , at the outbreak of the epidemic (rising branch of the curve) we have, < r e < r , exactly at the peak of the curve, r e = (because s = n/r , as we explain in the next) and at the mitigation stage (leading branch of the curve), r e < and tends to a minimum value at the asymptotic tail of the curve which is once the above relationship is achieved, the r can be determined by solving the derived algebraic equation. indeed, this was our initial motivation to perform the following analysis. the methodology for accomplishing it, was based on the idea to exploit the property of our model at its maximum at the peaking time, which is t p = nτ , as it can be easily proven by differentiation. on the other hand, in the sir model a peak is expected some time for the function i, as the typical effect of the epidemic's spread. considering that both models can be fitted to the data, in the vicinity of the peak must agree, and therefore, we must claim that i p = c(t p ). let us first find an expression of the s, r and i at the peaking time, symbolizing them by, s p , r p and i p respectively. in order to relate s with r, we replace i from eq. into eq. , we obtain from which, taking into account that s( ) ≈ n , we derive the solution the later result at the peaking time gives us an expression of r p also, according to eq. 
( ), at the peaking time we have the corresponding relation. From this equation, using the definition of R₀, we obtain an expression from which, based on Eq. ( ), we calculate R_p. Adding the three functions at the peaking time, S_p, I_p, and R_p, we derive an algebraic equation. In order to obtain an equation independent of the size N, we must express I_p/N as a function of the model parameters, using the maximum value of the model curve [ ] and the corresponding SIR equation integrated up to infinity, which yields the term aβτ^(−n) Γ(n + 1) + S(∞), where Γ represents the gamma function, n and τ are the particular values obtained from a fit, and τ = 1/β. Substituting this expression into Eq. ( ) and setting s_n = S(∞)/N, we obtain a transcendental equation that can be solved only numerically for R₀; the combined unknown s_n is also found numerically from the other transcendental Eq. ( ), using multiple iterations that converge to an accurate solution within a small number of loops. The parameter n of the model is essentially the expresser of R₀, while the obtained value of R₀ concerns a hypothetical SIR model fitted to the daily reported cases (DRC). From the obtained solution for R₀ we can also calculate the parameter a of the SIR model, a = βR₀, where β can be calculated from the peak value of the daily reported recovered individuals via β = R_p/I_tot,p, with I_tot,p the integral of the DRC curve up to the peaking time t_p. Implementing the above methodology with home-made software codes written on the MATLAB platform [ ], we obtained the fit of the DRC curve for Greece at the mitigation stage, shown in Fig. and Fig. , with fitted parameters n = . and τ = . and solutions R₀ = . and s_n = . . For the parameter β we used a typical average value found in the literature, β = . , and the same value was used for the other analyzed countries.
Two characteristic parametrizations, of very large and of very small normalized size ( times smaller), namely Belgium and Malta respectively, are given in Fig. and Fig. . Without seeing the vertical scale, one cannot distinguish which corresponds to the large and which to the small normalized size; the only difference visible at a glance is the peaking time ( and days, respectively). Study of the end-to-end epidemic growth: for the data analysis we selected the countries of the EU, plus Switzerland and the UK, with data obtained from [ ]. The characteristics of the countries relevant to our study are summarized in Table . In particular, we used the population density, the estimated normalized total number of infected individuals (determined from the number of deaths by using a typical constant factor), and the gross domestic product (GDP), nominal per capita. The degree of correlation among these characteristics and the model parameters was studied via the "theoretical Pearson linear correlation coefficient", given by the following formula, where X and Y are considered normal random variables, σ_X and σ_Y are the corresponding standard deviations, and cov(X, Y) is their covariance. In practice, however, we calculated the "sampling Pearson coefficient" (SPC) r(X, Y) for n observed random pairs (x₁, y₁), …, (x_n, y_n), where X represents the first selected variable and Y the second. The correlation study concerned eight pairs, as illustrated in Table . The conclusions of the linear correlation study are the following: (1) for the population density d, no correlation was found with the other parameters; (2) for the model parameters n and τ, a strong anti-correlation was found; (3) for the peaking time t_p, a very strong anti-correlation was found with the basic reproductive number R₀. The scatter plot of the basic reproductive number R₀ versus the peaking time t_p is shown in Fig. .
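The sampling Pearson coefficient used for the correlation study is straightforward to compute; a self-contained Python sketch (the data pairs here are made up for illustration, not the paper's country data):

```python
import math

def pearson_r(xs, ys):
    """Sampling Pearson coefficient r(X, Y) = cov(X, Y) / (s_x * s_y)
       for paired observations (x_i, y_i)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value near -1, as reported for the pair (R₀, t_p), indicates strong linear anti-correlation.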
This correlation conveys the following message: a higher R₀ results in a smaller delay of the upcoming peak of the DRC curve. The slope obtained from the linear fit was − . ± . days. On the other hand, R₀ among the analyzed countries presents statistical fluctuations from about to . , roughly obeying a Gaussian distribution with mean value . and standard deviation . (or % relative to the mean). The parameters n and τ also fluctuate, as we can see in Fig. , while the peaking time t_p shows stochastic characteristics, similarly obeying a Gaussian distribution with mean . days and standard deviation . days (or % relative to the mean). Since R₀ fluctuates (and in turn t_p, due to their linear correlation) among the different countries randomly, without presenting any correlation with their associated characteristics, we conclude that the normalized size of the epidemic can be explained only by taking into account other reasons and aspects: the way citizens interact and behave, the degree of social distancing, and mobility or transport within a country's major cities. A crucial role was certainly also played by the degree of quarantine and likely by individual biological differences (genetic and other related characteristics). The capability of surveying the epidemic spread during the three main stages is very important and can be based on the daily data and the mathematical modelling we presented. In the mitigation stage the surveying is even more useful and crucial, when the restriction measures are starting to be relaxed. The crucial condition for a new epidemic reappearance is based on the effective reproductive number R_e, as well as on the corresponding population threshold. However, because of the large statistical fluctuations caused by the poor statistics of the data, and because of the low slope of the epidemic curve in this stage, it is very hard to obtain accurate numbers; only a qualitative estimate is possible, as follows.
R_e can be estimated from the expressions in Eq. ( ), using average numerical approximations of the slope dI/dt. An alternative, practical formula based on the parametrization model LPE-SG can easily be proven: R_e = 1 + (nτ/t − 1)/β. This formula is valid only in the vicinity of the peak, namely in the narrow range from about . t_p to . t_p, because the fitted model and the SIR one differ in the slopes at the two tails. Once R_e is estimated, the population density threshold can in turn be calculated and should equal S(t)/N = 1/R_e, assuming that the normalized size N can be estimated with a similar level of accuracy. Therefore the crucial condition in the mitigation stage is written with the derivative calculated as an average slope ∆I/∆t, preferably over at least one week. Assuming that this slope is I_w and the corresponding average number of cases in a week is I_av, the crucial condition takes the simplified form below, where we used the typical values β = . days⁻¹ and N/S(∞) ≈ for obtaining a practical result as a case study. This simplified formula, combined with one-week measurements, should be very useful because the relative fluctuations of the DRC are expected to be very large. A systematic analysis of the epidemic characteristics of the spread of the new-virus disease COVID-19 has been presented in this work. For the mathematical analysis we used a model we call LPE-SG, which facilitates the parametrization by an analytical mathematical description. We also presented a methodology for coupling it with the SIR-based models, aiming to determine the basic and effective reproductive numbers from the fitted parameters. Analysing the daily reported cases of European countries, we found no correlation between the population density, normalized size, or GDP of the countries and the spreading characteristics. Another important finding of our study was a statistically significant strong anti-correlation between the basic reproductive number and the peaking time.
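One consistent reading of the slope-based estimate above follows from the SIR identity dI/dt = β(R_e − 1)I, which gives R_e = 1 + (dI/dt)/(βI). A small Python sketch of this estimator (the helper name and the finite-difference choice are illustrative assumptions; in practice the slope would be an average over at least a week, as the text recommends):

```python
def effective_R(I_series, beta, t):
    """Estimate R_e(t) from the SIR identity dI/dt = beta*(R_e - 1)*I,
       i.e. R_e = 1 + (dI/dt) / (beta * I), using a centred finite
       difference for the slope of the infected-count series."""
    dIdt = (I_series[t + 1] - I_series[t - 1]) / 2.0
    return 1.0 + dIdt / (beta * I_series[t])
```

On a rising branch (dI/dt > 0) this gives R_e > 1, at the peak R_e = 1, and on the descending branch R_e < 1, matching the qualitative picture given earlier.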
Moreover, we found that the basic reproductive number in the epidemics studied showed a uniform distribution with a wide range of values. This means that it is mainly influenced by many factors and generic characteristics of the society in a country.

References:
- Data-based analysis, modelling and forecasting of the COVID-19 outbreak
- Epidemic analysis of COVID-19 in China by dynamical modeling
- A robust stochastic method of estimating the transmission potential of 2019-nCoV
- Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data
- Parametrization model motivated from physical processes for studying the spread of COVID-19 epidemic
- Polynomial growth in branching processes with diverging reproductive number
- Fractal kinetics of COVID-19 pandemic
- A contribution to the mathematical theory of epidemics
- Mathematical modeling of infectious disease dynamics

Acknowledgments: I would like to thank my daughter V. Maltezou, a graduate of the Department of Agriculture of the Aristotle University of Thessaloniki and of the Athens School of Fine Arts, for our discussions on the global epidemiological problem, which gave me the warmth and the motivation for doing this work. I also thank my colleagues, Prof. Emeritus E. Fokitis and E. Katsoufis, for their insightful comments and our useful discussions.

key: cord-id m mfk authors: te Vrugt, Michael; Bickmann, Jens; Wittkowski, Raphael title: Containing a pandemic: nonpharmaceutical interventions and the "second wave" cord_uid: id m mfk

In response to the worldwide outbreak of the coronavirus disease COVID-19, a variety of nonpharmaceutical interventions such as face masks and social distancing have been implemented. A careful assessment of the effects of such containment strategies is required to avoid excessive social and economical costs as well as a dangerous "second wave" of the pandemic.
In this work, we combine a recently developed dynamical density functional theory model and an extended SIRD model with hysteresis to study the effects of various measures and strategies using realistic parameters. Depending on the intervention thresholds, a variety of phases with different numbers of shutdowns and deaths are found. Spatiotemporal simulations provide further insights into the dynamics of a second wave. Our results are of crucial importance for public health policy. The rapid spread of the coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [ ], has led governments across the globe to impose severe restrictions on social life, typically denoted "shutdown" or "lockdown". While these have been found to be very effective in reducing the number of infections, they have also been accompanied by high social and economical costs. Moreover, it can be expected that infection numbers rise again after the shutdown has ended ("second wave"). Consequently, the development of an effective containment strategy that avoids a collapse of both the economy and the healthcare system, and that takes into account the problem of multiple outbreaks, is of immense public interest. For this reason, a significant amount of research is currently being performed on the effects of various nonpharmaceutical interventions (NPIs) [ ] and intervention strategies [ ] on the spread of infectious diseases. From a political perspective, the costs associated with different containment measures make it necessary to obtain a detailed understanding of the benefits of various strategies, of the effectiveness of different combinations of NPIs, and of whether one type of intervention can compensate for another. Of particular importance is the question at which stage social restrictions should be imposed and lifted in order to avoid multiple outbreaks and a large number of deaths.
A useful theory for such investigations is the susceptible-infected-recovered (SIR) model developed by Kermack and McKendrick [ ], which has been generalized in a large variety of ways in order to incorporate, e.g., governmental interventions [ ]. Recently [ ], we proposed an extension of the SIR model based on dynamical density functional theory (DDFT) [ ] that incorporates social distancing in the form of a repulsive interaction potential. It allows one to treat different types of NPIs separately and therefore provides more detailed insights into containment strategies than the simple SIR model, while being computationally more efficient than individual-based models. A further recent development in SIR theory is the description of adaptive containment strategies, which are relevant for the current pandemic [ ], using hysteresis loops [ ]. In this work, we use the SIR-DDFT model and an extended susceptible-infected-recovered-dead (SIRD) model with hysteresis to investigate the effects of various containment strategies, with model parameters adapted to the current COVID-19 outbreak in Germany. We compare the effects of face masks and social distancing/isolation and of various threshold values (of the number of infected persons) for imposing and lifting restrictions. Our simulations reveal the existence of various phases with different numbers of outbreaks. This effect needs to be taken into account when making political decisions on shutdown thresholds, as it can significantly affect both the total length of the shutdowns and the number of deaths. Moreover, we show that a second wave can also arise if only one type of restriction is lifted. Finally, it is found that second waves tend to have a different spatial distribution than first waves, an effect that is a current public health concern [ ]. Our results thereby extend the work in Refs. [ ], as they are based on methods from statistical mechanics that allow for deeper insights.
Moreover, we break new ground in soft matter physics by developing a DDFT model with a time- and history-dependent interaction potential, leading to interesting novel dynamical behavior. This model allows one to test a large variety of shutdown strategies and their consequences for the "second wave" using our freely available code [ ]. The most widely used theory for modeling disease outbreaks is the SIR model [ ]. It assumes that the population consists of three groups, namely susceptible (S), infected (I), and recovered (R) individuals. Susceptible persons are infected at a rate c_eff Ī, where c_eff is the effective transmission rate [ ]. Infected persons recover at a rate w. Recovered persons are immune to the disease. An extension is the SIRD model [ ], in which infected persons additionally die at a rate m. The governing equations of the SIRD model are given below; we use overbars to distinguish, e.g., the total number of infected persons (Ī) from the number of infected persons per unit area (i). Due to its simplicity, the SIR(D) model has become very popular and is used in modeling the current coronavirus outbreak, incorporating real data [ ]. At present, it is not clear whether persons who have recovered from COVID-19 are immune to it, but experiments on rhesus macaques have found that a SARS-CoV-2 infection induces protective immunity against rechallenge [ ]. A drawback of the standard SIR(D) model is the fact that it does not include spatiotemporal dynamics. Moreover, it does not allow one to treat various types of NPIs, such as face masks and social distancing, separately. This is possible in individual-based models, which, however, are computationally very expensive. Therefore, an intermediate approach that combines the simplicity of the simple SIR model with the flexibility of individual-based models is very promising in this context. Such an approach is given by the SIR-DDFT model developed in Ref. [ ].
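The SIRD rate equations referred to above can be sketched numerically. A minimal forward-Euler integration in Python with the population normalized to 1, as in the paper's plots (the rate values here are illustrative assumptions, since the paper's values for w and m are elided in this copy):

```python
def simulate_sird(c_eff, w, m, I0=1e-4, dt=0.01, days=400):
    """Forward-Euler SIRD with populations normalized to 1:
       dS/dt = -c_eff*S*I,  dI/dt = c_eff*S*I - (w + m)*I,
       dR/dt = w*I,         dD/dt = m*I."""
    S, I, R, D = 1.0 - I0, I0, 0.0, 0.0
    for _ in range(int(days / dt)):
        inf = c_eff * S * I * dt   # new infections
        rec = w * I * dt           # recoveries
        die = m * I * dt           # deaths
        S -= inf
        I += inf - rec - die
        R += rec
        D += die
    return S, I, R, D
```

Note that S + I + R + D = 1 is conserved, and deaths and recoveries accrue in the fixed ratio m/w.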
It describes the densities s, i, and r of susceptible, infected, and recovered persons, respectively, as fields on spacetime governed by equations with time t, mobility Γ_φ for the field φ = s, i, r, free energy F, and transmission rate c. The free energy F = F_id + F_exc + F_ext consists of a term F_id describing noninteracting persons ("ideal gas free energy"), a term F_exc for social interactions, i.e., social distancing and self-isolation of infected persons, and a term F_ext for an "external potential" describing, e.g., travel restrictions (not considered in this work). In comparison to the SIR model, social distancing is therefore incorporated explicitly, based on a microscopic model of individual persons staying away from each other. The interaction strength is measured using two parameters c_sd and c_si for social distancing and self-isolation, respectively (which are negative if the interactions are repulsive). This model is an extension of the reaction-diffusion SIR model, which has been found to give accurate predictions for the spread of the Black Death in Europe [ ]. Mathematical details on the SIR-DDFT model are given in the supplementary materials; DDFT is reviewed in Ref. [ ]. As discussed in Ref. [ ], the transmission rate c should be distinguished from the effective transmission rate c_eff appearing in the standard SIR model: the former measures the transmission rate given contact, where the amount of contact is determined by the interactions that incorporate social distancing and self-isolation, whereas the rate c_eff depends on both c and the number of contacts. Consequently, it is possible in the SIR-DDFT model (but not in the SIR model) to treat these two factors separately.
This is an important advantage, since it allows one to distinguish the effects of two of the main NPIs implemented against the COVID-19 outbreak: face masks and other hygiene measures, such as frequent hand washing, reduce c, i.e., they decrease the probability of an infection in case of contact; repulsive interactions, on the other hand, reduce the number of contacts. Hence, performing a parameter scan in c and the interaction strength allows one to distinguish the effects of the two types of measures, and thereby provides insight into the question to which extent they can supplement or replace each other. To obtain the phase diagram, we have solved the SIR-DDFT model (Eqs. ( )-( )) numerically in two spatial dimensions with w = . /d and m = . /d. These parameter values are adapted to the outbreak in Germany (see supplementary materials). Moreover, we set Γ_s = Γ_i = Γ_r. We measure time in days (d) and everything else in dimensionless units. Population numbers shown in the plots are normalized such that the initial total population size is one. Details on the simulations can be found in the supplementary materials. The resulting phase diagram, shown in Fig. , visualizes the dependence of the normalized maximal number of infected persons Ī_max,n on c and on the strength of the repulsive interactions c_sd = c_si. It is found that both a reduction of c and an increase of |c_sd| can decrease infection numbers. The model exhibits three phases, characterized by low (no outbreak), intermediate (contained outbreak), and large (uncontained outbreak) infection numbers, respectively. Infection numbers are small if c is below w (indicated by a green line in Fig. ). The outbreak can be (partially) contained by large values of |c_sd| (even if c is also large) or by intermediate values of c and c_sd.
Therefore, it is possible, to a certain extent, to reduce the amount of contact restrictions (i.e., to decrease |c_sd|) without increasing the infection numbers if the transmission rate c is also reduced, which is possible through hygiene measures. Consequently, the model shows that face masks allow one to re-open a society after a shutdown in a controlled way. The way in which the parameter c is changed by implementing face masks depends on their efficacy and on the adherence in the population; a strong reduction of c is possible if both are large (see Ref. [ ] for a quantitative estimate of the effect of face masks). Up to now, we have assumed that the mitigation measures are imposed in the same way at all times, i.e., that the model parameters are constant. In practice, however, they will be imposed and lifted adaptively, depending on whether infection numbers rise above or fall below certain thresholds. Strategies of this form are of significant importance for the COVID-19 outbreak [ ]. We now discuss how such approaches can be described mathematically, starting with the simple SIRD model. Let us assume that a shutdown is started once the number of infected persons is larger than a threshold value Ī_start, and stopped once it falls below a value Ī_stop with Ī_stop ≤ Ī_start. Mathematically, this corresponds to a non-ideal relay operator (also called "rectangular hysteresis loop" or "lazy switch"), which was incorporated into the SIR model by Chladná et al. [ ]. Here, we extend the model from Ref. [ ] by also taking into account the fact that the infection rate will not jump immediately when a threshold is crossed, since a society requires some time to implement restrictions. Thus, we assume that the effective transmission rate c_eff converges exponentially [ ] to a value c₁ or c₀ in the presence or absence of interventions, respectively. These considerations lead to the dynamical equation below. Here, α is a constant parameter, and the form of Eq.
( ) ensures convergence to c₁ or c₀, depending on the infection numbers and the history of the system. Usually, the initial condition will be c_eff(0) = c₀, since social distancing measures are not present at the beginning of an outbreak. As discussed in the supplementary materials, realistic parameter choices for the outbreak in Germany (which we use for our simulations) are given by α = . /d, c₀ = . /d, and c₁ = . /d. The political decision that has to be made is then the choice of Ī_start and Ī_stop, i.e., what the infection numbers should be in order for a shutdown to be started and stopped, respectively. To investigate this problem, we have solved Eqs. ( )-( ) and ( ) numerically with parameter values w = . /d and m = . /d for different values of Ī_start and Ī_stop in order to obtain the phase diagrams. Details on the simulations can be found in the supplementary materials. We adapted the values of w and m to the current COVID-19 pandemic (see supplementary materials). As an initial condition, we used the number of confirmed infections in Germany on the th of March (reported in Ref. [ ]) normalized by the population of Germany. The results are shown in Fig. , visualizing (a) the normalized maximal number of infected persons Ī_max,n, (b) the normalized number of susceptibles S̄_∞,n at the end of the pandemic (i.e., the number of persons who have never been infected), and (c) the normalized total number of deaths D̄_∞,n. As can be seen from Fig. a, the maximal peak Ī_max,n depends only on Ī_start. Hence, for avoiding a large number of infected persons at the same time, and thus a collapse of the healthcare system, it is primarily important to start the shutdown sufficiently early; the point at which it is lifted again is less relevant. This observation is in agreement with results from Ref. [ ]. A different and much more complex result is found when considering the final number of susceptibles S̄_∞,n and the total number of deaths D̄_∞,n.
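The adaptive shutdown dynamics just described, a non-ideal relay on Ī combined with exponential relaxation dc_eff/dt = −α(c_eff − c_target), can be sketched as follows in Python. All numerical values here (c₀, c₁, α, w, m, thresholds) are illustrative assumptions, since the paper's calibrated values are elided in this copy:

```python
def simulate_hysteresis(c0, c1, alpha, I_start, I_stop, w, m,
                        I0, dt=0.01, days=600):
    """Normalized SIRD with an adaptive transmission rate: a shutdown
       begins when I rises above I_start (relay switches the target rate
       to c1 < c0) and ends when I falls below I_stop, with c_eff
       relaxing exponentially: dc_eff/dt = -alpha*(c_eff - target)."""
    S, I, R, D = 1.0 - I0, I0, 0.0, 0.0
    c_eff, shutdown, n_shut = c0, False, 0
    for _ in range(int(days / dt)):
        # Non-ideal relay ("lazy switch") on the infected fraction
        if not shutdown and I > I_start:
            shutdown, n_shut = True, n_shut + 1
        elif shutdown and I < I_stop:
            shutdown = False
        target = c1 if shutdown else c0
        c_eff += -alpha * (c_eff - target) * dt  # exponential relaxation
        inf = c_eff * S * I * dt
        rec = w * I * dt
        die = m * I * dt
        S -= inf
        I += inf - rec - die
        R += rec
        D += die
    return S, I, R, D, n_shut
```

With Ī_stop well below Ī_start, each lifted shutdown can be followed by a renewed outbreak, producing the multiple-shutdown phases discussed in the text.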
Here, various distinct phases can be observed, with a staircase-shaped boundary that depends on both Ī_start and Ī_stop. As can be expected, large values of S̄_∞,n correspond to small values of D̄_∞,n and vice versa (if fewer people are infected, fewer people die). For large values of Ī_start (i.e., in the phase on the right), the number of deaths is large. However, within each phase, the number of deaths increases upon reducing Ī_start at fixed Ī_stop. This is a remarkable and surprising result, since one would intuitively expect a smaller shutdown threshold to be beneficial. When reducing Ī_stop at fixed Ī_start within a phase, the number of deaths becomes smaller, although it jumps to a larger value if a phase boundary is crossed from above. An explanation for the complexity of the phase diagrams can be found in Figs. d, e, and f, which show the number of waves of the pandemic N_waves, the number of shutdowns N_shut, and the total shutdown time T_shut as functions of Ī_start and Ī_stop. The difference between the various phases in Figs. b and c is the number of shutdowns N_shut. Increasing Ī_stop at fixed Ī_start leads to a larger number of waves and shutdowns and a reduced total shutdown time (in agreement with Ref. [ ], where Ī_start and Ī_stop were not distinguished). However, increasing Ī_start at fixed Ī_stop reduces N_waves and N_shut. Finally, a very interesting observation is that the phase boundaries for N_waves and N_shut are not at the same positions. While a larger number of shutdowns generally corresponds to a larger number of waves, reducing Ī_start below the critical value for n shutdowns (with n ∈ ℕ) does not immediately lead to n + 1 waves, because the critical value of Ī_start for n + 1 waves is slightly smaller.
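The wave count N_waves used in these phase diagrams is, as stated later in the text, the number of local maxima of Ī(t). A small Python helper illustrating that counting rule on a sampled trajectory (the detection logic, rising-then-falling, is an assumed but natural implementation):

```python
def count_waves(series):
    """Count the local maxima of a sampled trajectory I(t): a wave is
       registered each time the series stops rising and starts falling."""
    waves = 0
    rising = False
    prev = series[0]
    for x in series[1:]:
        if x > prev:
            rising = True
        elif x < prev and rising:
            waves += 1
            rising = False
        prev = x
    return waves
```

Applied to a simulated Ī(t) from a hysteresis run, this distinguishes, e.g., the two-wave and three-wave scenarios discussed below.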
Since it is, as far as the number of deaths is concerned, beneficial to be slightly below the critical value of Ī_start separating regions with n and n − 1 shutdowns, choosing Ī_start in such a way that wave n + 1 is avoided requires careful adjustment. This requires, of course, awareness of the difference between the phase boundaries for N_waves and N_shut, which makes our results highly relevant for political decisions on shutdown thresholds. In Figs. g-j, the time evolutions of S̄, Ī, R̄, D̄, and c_eff are shown for different combinations of Ī_start and Ī_stop; shutdown periods are shaded in yellow. Figure g, corresponding to Ī_start = . % and Ī_stop = %, shows two shutdowns. After the first, longer shutdown, infection numbers rise again ("second wave"), such that a second shutdown is necessary; a third wave after the second shutdown is not observed. As can be seen from Figs. d and e, Ī_start = . % and Ī_stop = % corresponds to a choice of parameters between the phase boundaries for N_waves and N_shut. In Fig. h, results are shown for the same Ī_stop = % and a smaller Ī_start = . %. Although these parameters also lead to two shutdowns, a third wave of the pandemic is observed after the second shutdown; therefore, the final number of deaths D̄_∞,n is larger in this case. Figure i shows the time evolution for Ī_start = . % and Ī_stop = . %, i.e., Ī_stop is increased at fixed Ī_start compared to Fig. g. Here, the two shutdowns are shorter and closer to each other, D̄_∞,n is larger than in Fig. g, and a third wave is also observed. Finally, Fig. j gives results for Ī_start = . % and Ī_stop = %, i.e., Ī_stop is the same as in Figs. g and h, but Ī_start is increased into the region with N_shut = 1. Consequently, there is only one shutdown; after it ends, a relatively large second wave occurs, leading to a relatively high overall number of deaths. Our results have important consequences for political decisions on intervention strategies.
of course, the best strategy for keeping both Ī_max,n and d̄_∞,n small is to start the shutdown early and stop it late (bottom left corner of the phase diagram). however, this is not always possible due to the social and economic costs associated with a shutdown (as can be seen in fig. f, the total shutdown time t_shut is very long in this case). (the number of waves is measured by the number of local maxima of the function Ī(t).) in practice, a political decision has to be made regarding the question of when to start and end a shutdown given limited resources. when making a political decision on when to start and end a shutdown (choosing Ī_start and Ī_stop), one needs to take into account the existence of the various phases shown in fig. . a small variation of the threshold values can lead to a different phase, which changes the number of outbreaks and shutdowns and thus significantly affects the total number of deaths. the optimal strategy depends on what one is aiming for:
• if the main goal is to keep Ī_max,n small to avoid a collapse of the healthcare system, one should start the shutdown early (small Ī_start).
• as far as d̄_∞,n is concerned, it is beneficial to choose Ī_start and Ī_stop close to a phase boundary in such a way that a slight increase of Ī_start or decrease of Ī_stop would reduce the number of shutdowns by one.
• the choice of Ī_stop also corresponds to a trade-off between d̄_∞,n and t_shut. increasing it within a phase at constant Ī_start leads to a larger number of deaths and a shorter shutdown time.
• remarkably, strategies with multiple shutdowns can have advantages over strategies with a single shutdown. while in many cases more shutdowns correspond to more waves, an additional wave can be avoided after a further shutdown if the threshold values are chosen close to a phase boundary.
in practice, contact restrictions will typically be removed earlier than hygiene requirements such as face masks if infection numbers decrease.
consequently, more detailed insights can be gained using the sir-ddft model, in which effects of face masks and contact restrictions can (as shown in fig. ) be modeled separately. for this purpose, we introduce a dynamic equation for the interaction strength of the form , where i = sd, si. the form of eq. ( ) has been chosen in analogy to eq. ( ). changing the interaction strength according to eq. ( ) while keeping c constant models a scenario in which (figs. g-j). in both cases, a second wave of the pandemic is observed after the first shutdown. for Ī_start = %, a second shutdown is initiated to inhibit the second outbreak, whereas no second shutdown is observed for Ī_start = %. the simulation results thereby confirm the observations from the simpler model. however, they also add an important new aspect: a second wave can also occur if, after a shutdown, only contact restrictions are lifted while other measures are kept in place (constant c). an effect of this type was observed in germany: while face masks are still mandatory in public places (constant c), contact restrictions have been relaxed after the initial shutdown. in consequence, infection numbers have risen again [ ]. the extended sir-ddft model allows for a detailed investigation of a variety of shutdown strategies by adapting the values of the model parameters. in fig. , we have chosen c in such a way that it allows one to recover the effective reproduction number measured in germany in early march (corresponding to an infrequent use of face masks). the choice c_sd, = c_si, = − corresponds to the assumption that there is moderate social distancing in the no-shutdown phase that does not distinguish between healthy and infected persons (which can arise if infected persons cannot be easily identified as such, as is the case for covid- [ , ]).
in the case of a shutdown, a strong increase of |c_si| (large value of |c_si, |) then reflects both an increased amount of testing (allowing for a specific isolation of infected persons) and stronger physical isolation. other possible scenarios include a lower value of c (increased use of face masks), larger values of |c_sd, | and |c_si, | (stricter social distancing in the no-shutdown phase), and larger values of |c_sd, | and |c_si, | with a smaller ratio c_si, /c_sd, (strict physical distancing in the shutdown phase without testing). hence, the extended sir-ddft model is a flexible and useful tool for analyzing under which conditions and in which way a second wave will occur for a certain combination of measures. using our freely available code [ ], simulations can be easily performed for any policy whose consequences one wishes to investigate. snapshots from the time evolution of the density i(x, y, t) of infected persons as a function of position r = (x, y)ᵀ are shown in fig. b for Ī_start = % and Ī_start = %. the complete time evolutions are shown in the supplementary movies s and s . initially, the infected persons are concentrated in the middle of the domain and spread outwards radially (t = d). afterwards (at t = d), a phase separation effect is observed where the infected persons arrange into separated spots. this pattern formation, which was discussed in ref. [ ], can be interpreted as infected persons self-isolating at their houses. when the shutdown ends, the strength of the interactions is reduced such that phase separation is no longer present (t = d). for Ī_start = % (but not for Ī_start = %), phase separation is observed a second time at t = d during a second shutdown. the second phase separation differs from the first one in that it emerges from a distribution that is already rather homogeneous and not from an accumulation of infected persons in the middle. finally, at t = d, there are almost no infected persons left in both simulations.
these findings are very interesting for public health policy, since they show that the first and second wave do not only differ by the initial values for s̄, Ī, and r̄ (the only aspect that can be captured in the simpler model), but also by their different spatial distributions. this can be seen when comparing the distributions at t = d and t = d, which represent the initial stages of the first and second wave, respectively. the first wave starts after a radial spread from the center, i.e., the infection is initially localized. before the second wave, however, the disease has already spread over the entire area. this difference is also relevant for the current spread of covid- in germany: the first wave was a consequence of infected persons arriving by travel, and therefore started at isolated positions. in contrast, the second wave emerges from a more homogeneous spatial distribution [ ]. from our model, this can be expected to be a common feature of second waves. initially, a disease will always break out at single spots, which corresponds to an inhomogeneous initial condition i(r, 0). if contact restrictions (repulsive interactions) are lifted, the sir-ddft model describes a purely diffusive dynamics that typically leads to a homogeneous distribution. therefore, the initial condition for the second wave is more homogeneous than for the first one. on the other hand, as can be seen from fig. a, the overall infection numbers are smaller for the second wave. the snapshot for t = d in the bottom row of fig. b shows that phase separation does not occur at the center, where the concentration of infected persons is lower at t = d (initial stage of the second wave). physically, this corresponds to a shutdown that is locally restricted as a consequence of infection numbers becoming large only in certain regions. in summary, we have employed the sir-ddft model and an extended sird model with hysteresis to study the effects of different containment strategies.
we have found that lifting contact restrictions can be partially compensated for by stricter hygiene measures. investigating adaptive strategies showed that different combinations of thresholds lead to various phases. they differ by the number of waves and shutdowns and, consequently, by the number of deaths and the total shutdown time, making this effect immensely important for public health policy. spatiotemporal simulations have revealed that a second wave can also arise if only contact restrictions are lifted, and that it tends to have a different spatial distribution than the first wave. by adapting parameter values, the model allows one to study the effects of a large variety of containment strategies in any country. possible extensions of this work include the investigation of further strategies, such as partial shutdowns or isolation of specific groups. moreover, the sir-ddft model could be extended to include vaccination [ ]. here, we describe the sir-ddft model following ref. [ ]. dynamical density functional theory (ddft), reviewed in ref. [ ], describes the time evolution of a density field ρ(r, t). for a single-component fluid, it is given by ∂_t ρ = ∇ · (Γ ρ ∇(δf/δρ)), with a mobility Γ and a free energy f. equation (s ) can be derived from the microscopic dynamics of overdamped brownian particles using the adiabatic approximation, which approximates the pair correlations of the nonequilibrium system by those of an equilibrium system with the same one-body density [ ]. in the case of multiple fields {ρ_i}, eq. (s ) generalizes to ∂_t ρ_i = ∇ · (Γ_i ρ_i ∇(δf/δρ_i)). in our present work, the fields are given by s, i, and r (the densities of susceptible, infected, and recovered persons, respectively). in addition, we need to add reaction terms to the ddft equation (s ) (as done, with other physical motivations, in refs. [ , ]), since the "particles" can change their species, i.e., persons can get infected or recover. the reaction terms are obtained from the sird model.
this leads to the model with transmission rate c, recovery rate w, and death rate m. the free energy f has three terms: first, the ideal gas free energy describes a system of noninteracting particles with the rescaled inverse temperature β, number of spatial dimensions d, and thermal de broglie wavelength Λ. in the case f = f_id, eq. (s ) simply gives the standard diffusion equation with diffusion coefficient d = Γβ⁻¹. the term f_ext describes the influence of an external potential and is set to zero throughout this work. finally, the excess free energy f_exc describes interactions. it is not known exactly and needs to be approximated. in our case, the interactions are social interactions such as social distancing and self-isolation. the basic idea is that persons practicing social distancing can be described as repulsively interacting particles [ ]. we assume that the repulsive interactions can be described by a soft (gaussian) pair potential. the reason for this is that, even in the case of social distancing, there will still be a certain (although reduced) amount of contact. hence, soft potentials are more appropriate than hard-core interactions. for interaction potentials as chosen here, the mean-field approximation is known to give good results [ ]. assuming that the excess free energy f_exc contains a contribution for social distancing f_sd and a contribution for self-isolation f_si then gives the corresponding mean-field expressions. here, c_sd and c_si determine the strength and σ_sd and σ_si the range of the interactions. inserting eqs. (s ), (s ), and (s )-(s ) into eqs. (s )-(s ) gives the final model equations [ ], e.g., for the infected field, ∂_t i = d_i ∇²i − Γ_i ∇ · (i ∇(c_si k_si ⊛ (s + i + r))) + c s i − w i, (s ), where d_φ = Γ_φ β⁻¹ for φ = s, i, r are the diffusion coefficients, k_sd and k_si are the gaussian kernels, and ⊛ denotes the spatial convolution. the parameters for the simulations presented in the main text have been chosen in such a way that their order of magnitude is realistic for the current covid- outbreak in germany.
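in the noninteracting limit f = f_id mentioned above, the ddft equation reduces to the standard diffusion equation with d = Γβ⁻¹. a minimal explicit finite-difference sketch on a periodic 1-d grid demonstrates this limit and the mass conservation of the scheme; all numeric values here are illustrative, not the paper's simulation parameters:

```python
def diffuse_step(rho, D, dx, dt):
    """One explicit finite-difference step of the diffusion equation
    d(rho)/dt = D * d2(rho)/dx2 on a periodic 1-d grid; this is the
    noninteracting (F = F_id) limit of the DDFT equation."""
    n = len(rho)
    return [rho[i] + D * dt / dx ** 2
            * (rho[(i + 1) % n] - 2.0 * rho[i] + rho[(i - 1) % n])
            for i in range(n)]

dx = 0.1
rho = [1.0 if 20 <= i < 30 else 0.0 for i in range(50)]  # a density plateau
for _ in range(200):  # D*dt/dx**2 = 0.1 satisfies the stability bound 0.5
    rho = diffuse_step(rho, D=0.01, dx=dx, dt=0.1)
```

the plateau spreads out while the total mass sum(rho)*dx stays fixed; the full sir-ddft scheme adds the convolution-based interaction flux and the reaction terms on top of this diffusive part.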
we use days (d) as the unit of time and dimensionless units for all other quantities. for germany, the effective reproduction number r_eff, which in our model is given by [ ] r_eff = c_eff s̄ / w, is estimated by the robert koch institute (rki), the central public health institute of the german federal government, on a daily basis. if we assume s̄ ≈ n = with the total population size n, we get the corresponding relation. from eq. ( ) of the main text, we can infer that the approach of r_eff(t) to its shutdown value will be governed by a function of the form given in eq. (s ), with r_eff, = (c − c )/w and r_eff, = c /w. as shown in fig. s , choosing r_eff, = . , r_eff, = . , and α = . /d gives a good agreement with empirical data. furthermore, we assume w = . /d, which is consistent with the mean infection duration of days reported in ref. [ ]. from this, we can infer c ≈ . /d and c ≈ . /d. moreover, following ref. [ ], we assume that the probability of dying from covid- in the case of available intensive care is p_d = . . this result is given by the probability of hospitalization ( . ) multiplied by the probability of requiring intensive care given hospitalization ( . ) multiplied by the probability of dying during intensive care ( . ), which results in p_d = . × . × . = . . given the probability p_d of dying during time t, we can obtain the death rate m (which is needed for the sird model) as in ref. [ ]: assuming that persons are infected for t = d and die at a constant rate during this time (which, of course, is a strong simplification) gives m ≈ . /d.
fig. s : comparison of the function (s ) for the time evolution of the effective reproduction number r_eff with data from the robert koch institute [ ] (interval: - - to - - ). the error bars of the data points are smaller than the data points. oscillations in the empirical data arise from differences in reported case numbers between different days of the week.
for the extended sir-ddft model, the same parameter values for w, m, and α can be used.
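the conversion from a death probability p_d over an infection period t to a constant death rate m, described above, can be written with the standard exponential rate-probability relation. the input probabilities below are placeholders, since the paper's numeric values were lost in extraction:

```python
import math

def death_rate(p_d, t):
    """Constant hazard rate m such that the probability of dying within
    time t is p_d, via p_d = 1 - exp(-m * t)."""
    return -math.log(1.0 - p_d) / t

# illustrative probability chain: hospitalisation x intensive care x death;
# these factors are placeholders, not the paper's values
p_d = 0.045 * 0.25 * 0.5
m = death_rate(p_d, t=10.0)  # assuming a 10-day infection duration
```

for small p_d this agrees with the simpler linear approximation m ≈ p_d / t, which is what "dying at a constant rate during the infection period" amounts to.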
the parameter c of the sir-ddft model is not identical to the parameter c_eff of the sir model, which is why we discuss here how the value of c can be obtained (which is important for practical applications of the sir-ddft model to regions other than germany). in the simplest case of a homogeneous distribution of the population, the relation between c and c_eff is given by c = c_eff a, with the domain area a [ ]. in this work, we use a = . if we set the total population size to n = and assume s̄ ≈ n, eq. (s ) gives the corresponding expression. comparing eqs. (s ) and (s ) shows that the value used for c_eff( ) can also be used for c under the approximation c_eff( ) = c/a if the population size is set to a rather than to . in particular, using c ≈ . /d in the spatiotemporal simulations allows one to recover, in the limiting case of a completely homogeneous distribution, the value of r_eff that corresponds to the values measured in germany in early march (inserting c = . /d into eq. (s ) gives r_eff = . , which is approximately equal to the result r_eff( ) = . obtained from eq. (s )). the simulations for figs. and have been performed in two spatial dimensions on a quadratic domain [−l/ , l/ ] × [−l/ , l/ ] with size l = and periodic boundary conditions. we have solved the equations of the sir-ddft model using an explicit finite-difference scheme with spatial step size dx = . for fig. and dx = . for fig. , and adaptive time steps. the shutdown state was updated explicitly every . d. as an initial condition, we have used a gaussian distribution with amplitude ≈ . and variance l / centered at (x, y) = ( , ) for s(x, y, 0), i(x, y, 0) = . s(x, y, 0), and r(x, y, 0) = , such that the mean overall density was . regarding parameter values not specified in the main text, we have set d_s = d_i = d_r = . and σ_sd = σ_si = . the simulations for fig. were also solved via an explicit finite-difference scheme with adaptive time steps, while the shutdown state was updated every . d.
as an initial condition, we used s̄(0) = 1 − Ī(0), Ī(0) = /( × ) (the number of confirmed infections in germany on the th of march [ ], normalized by the approximate population of germany), r̄(0) = 0, d̄(0) = 0, and c_eff(0) = c .
references:
- a new coronavirus associated with human respiratory disease in china
- a pneumonia outbreak associated with a new coronavirus of probable bat origin
- effects of non-pharmaceutical interventions on covid- cases, deaths, and demand for hospital services in the uk: a modelling study
- estimating the burden of sars-cov- in france
- impact of non-pharmaceutical interventions (npis) to reduce covid- mortality and healthcare demand, imperial college covid- response team
- inferring change points in the spread of covid- reveals the effectiveness of interventions
- effective containment explains subexponential growth in recent confirmed covid- cases in china
- assessing the impact of coordinated covid- exit strategies across europe
- optimal control of an epidemic through social distancing, preprint ( )
- a contribution to the mathematical theory of epidemics
- global dynamics of sir model with switched transmission rate
- effects of social distancing and isolation on epidemic spreading modeled via dynamical density functional theory
- the nature of the liquid-vapour interface and other topics in the statistical mechanics of non-uniform, classical fluids
- dynamic density functional theory of fluids
- dynamical density functional theory and its application to spinodal decomposition
- classical dynamical density functional theory: from fundamentals to applications
- dynamics of sir model with vaccination and heterogeneous behavioral response of individuals modeled by the preisach operator
- memory effects in population dynamics: spread of infectious disease as a case study
- ein plan für den herbst
- the basic reproductive number of ebola and the effects of public health measures: the cases of congo and uganda
- a simple mathematical model for ebola in africa
- sars-cov- infection protects against rechallenge in rhesus macaques
- geographic and temporal development of plagues
- face masks against covid- : an evidence review
- täglicher lagebericht des rki zur coronavirus-krankheit- (covid- )
- der puls steigt
- substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov )
- global stability analysis for a generalized delayed sir model with vaccination and treatment
- mechanism for the stabilization of protein clusters above the solubility curve
- controlling the microstructure and phase behavior of confined soft colloids by active interaction switching
- a hybrid multi-scale model of covid- transmission dynamics to assess the potential of non-pharmaceutical interventions
- binary gaussian core model: fluid-fluid phase separation and interfacial properties
- modellierung von beispielszenarien der sars-cov- -epidemie in deutschland
- rates and probabilities in economic modelling
we thank benedikt bieringer, markus dertwinkel, and julian jeggle for helpful discussions. r.w. is funded by the deutsche forschungsgemeinschaft (dfg, german research foundation) -wi / - . the simulations for this work were performed on the computer cluster palma ii of the university of münster. the code used for performing the simulations underlying this work, as well as the source data for figs. - and s , are provided at zenodo [ ].
key: cord- -itviia v authors: chandra, vedant title: stochastic compartmental modelling of sars-cov- with approximate bayesian computation date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: itviia v
in this proof-of-concept study, we model the spread of sars-cov- in various environments with a stochastic susceptible-infectious-recovered (sir) compartmental model. we fit this model to the latest epidemic data with an approximate bayesian computation (abc) technique. within this sir-abc framework, we extrapolate long-term infection curves for several regions and evaluate their steepness.
we propose several applications and extensions of the sir-abc technique. the sir model (kermack & mckendrick ) traces trajectories in phase space: susceptible (s), infectious (i), and recovered (r) members of the population. the transmission rate β represents the number of disease transmissions per unit time, per infected host. the recovery rate γ is simply the number of recoveries per unit time. the disease lifetime is exponential, with a wait time scaling as e^(−γt). the expectation of the disease duration is hence 1/γ. these parameters govern the disease model through the corresponding sir differential equations. we use an implementation of the gillespie algorithm (gillespie ) to generate stochastic trajectories of s, i, and r from these differential equations. armed with the ability to generate stochastic infection and recovery curves from starting parameters, we turn to fitting the starting parameters from real-world epidemic data. since the models are stochastic in nature, there isn't a simple analytical form that we can minimize. additionally, rather than fitting for only the parameters themselves, we would also like to quantify how certain we are about those parameters. we therefore employ an approximate bayesian computation (abc) technique to compare our simulations to observations and recover the posterior distributions of β and γ (figure ). this technique was previously used to fit initial mass functions to nearby galaxies (gennaro et al. ). the general goal of abc is to sample the posterior distributions of simulation parameters such that the simulations match the observed data. in practice, it is impossible for simulations to exactly match data due to noise and ill-posed models. additionally, if the observable space is continuous, then the probability of simulations exactly matching observations is exactly zero. therefore, we define some distance d between simulations and observations, as well as a tolerance ε.
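the gillespie-type trajectory generation described above can be sketched as a minimal stand-alone implementation; the transmission and recovery propensities below follow the standard density-dependent sir formulation, and the rates and population size are illustrative rather than fitted values:

```python
import random

def gillespie_sir(beta, gamma, S0, I0, seed=None):
    """Exact stochastic simulation (Gillespie algorithm) of the SIR model
    with two reactions: infection S + I -> 2I with propensity beta*S*I/N,
    and recovery I -> R with propensity gamma*I."""
    rng = random.Random(seed)
    N = S0 + I0
    S, I, R = S0, I0, 0
    t = 0.0
    times, infected = [0.0], [I0]
    while I > 0:
        a_inf = beta * S * I / N
        a_rec = gamma * I
        a_tot = a_inf + a_rec
        t += rng.expovariate(a_tot)        # exponential waiting time
        if rng.random() * a_tot < a_inf:   # pick which reaction fires
            S -= 1
            I += 1
        else:
            I -= 1
            R += 1
        times.append(t)
        infected.append(I)
    return times, infected, R

times, infected, R_final = gillespie_sir(0.3, 0.1, S0=990, I0=10, seed=1)
```

each run gives a different stochastic infection curve, which is exactly what the abc fitting procedure repeatedly compares against the observed data.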
we accept those parameters that produce simulations a distance d < ε away from the observed data. by initially sampling from the prior distributions of the parameters and iteratively shrinking the tolerance up to some stopping criterion, we 'shrink' the prior into the posterior. the dong et al. ( ) epidemic data consist of a two-dimensional time series comprising the number of confirmed cases and the number of recovered cases per day (r). (this study was made available as a medrxiv preprint under a cc-by-nc-nd international license and was not peer-reviewed.) we subtract these two quantities to derive the number of infectious cases per day, i. given a simulated epidemic and the observed data, we quantify the difference between both the infectious and recovered population curves to obtain a distance. rather than a-priori assuming the initial susceptible population s, we marginalize over it as a nuisance parameter in our abc procedure. therefore, our abc algorithm fits for three parameters: β, γ, and s. we use the pyabc package in python (klinger et al. ) for our abc procedure. we employ a simple particle filter algorithm (sequential monte carlo) that accepts or rejects sampled particles based on the selection criterion d < ε, until p particles have been accumulated. the first iteration samples uniform priors on each parameter, and each subsequent iteration samples the posterior of the previous iteration. we shrink ε by setting ε_i of the i-th iteration equal to the median of all the sampled distances d from the (i − 1)-th iteration. as the parameters converge to their posterior, the shrinkage of ε slows down, and the sampler has to reject progressively more particles in order to accumulate p particles with d < ε. we choose a stopping criterion such that the acceptance ratio (number of total particles sampled in order to accumulate p valid particles) is %.
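the tolerance-shrinking scheme described above (sample, accept if d < ε, set the next ε to the median accepted distance) can be sketched as follows. this is plain rejection abc with a shrinking tolerance; the full pyabc procedure additionally resamples and perturbs accepted particles (sequential monte carlo). the toy model, function names, and all numbers here are illustrative:

```python
import random

def abc_shrink(simulate, distance, observed, prior_sample,
               n_particles=100, n_iters=4, seed=0):
    """Rejection ABC with an iteratively shrinking tolerance: accept
    n_particles draws whose simulated data lie within eps of the
    observations, then set the next eps to the median accepted distance."""
    rng = random.Random(seed)
    eps = float("inf")
    particles = []
    for _ in range(n_iters):
        accepted, dists = [], []
        while len(accepted) < n_particles:
            theta = prior_sample(rng)
            d = distance(simulate(theta, rng), observed)
            if d < eps:
                accepted.append(theta)
                dists.append(d)
        dists.sort()
        eps = dists[len(dists) // 2]  # shrink the tolerance
        particles = accepted
    return particles, eps

# toy problem: recover theta = 2 from a noisy scalar observation
post, eps = abc_shrink(
    simulate=lambda th, rng: th + rng.gauss(0.0, 0.05),
    distance=lambda sim, obs: abs(sim - obs),
    observed=2.0,
    prior_sample=lambda rng: rng.uniform(0.0, 4.0),
)
```

in the actual fit, `simulate` would be a stochastic sir run, `distance` the combined rms difference over the infectious and recovered curves, and `prior_sample` would draw (β, γ, s) jointly.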
we find that the models are well-converged at this point, and sampling further does not improve the parameter posteriors. the root mean square difference defined in eqn. is around d ∼ for the converged models. each fit takes ∼ minutes to complete on a regular laptop computer.
figure : extrapolated infection curves for the worst-affected chinese provinces. we allow the epidemic solution to continue until no active infections remain.
we fit our model to the provinces in china worst affected by sars-cov- , with the exception of hubei due to the lack of early-stage data there. we recover posterior densities of β, γ, and the number of susceptible citizens s (figure ). we present epidemic curves with our model simulations overlaid for all regions in the appendix. for most provinces, there is an excellent agreement between the sir-abc model and the total number of confirmed cases. the fit is less perfect for the individual infected-recovered curves. this is to be expected, since the real world obviously does not truly follow an sir model. there are various externalities like spatial effects and government/healthcare responses. our simple sir model also lacks vital statistics like births and deaths. for a fatal illness like sars-cov- , it would be valuable to add these parameters to the model. however, for the purpose of this proof-of-concept study, we estimate that adding these parameters will negligibly affect the goodness-of-fit of the total confirmed cases (chen & li ). we extrapolate the model for each region by allowing it to run until no active infections remain (fig. ). we find a consistent extrapolated infection profile for all the provinces under study. this indicates a similar level of government response after the first infections were reported, despite differing population sizes in each region.
we quantify the 'steepness' of the infection curve by dividing the maximum number of active infected patients by the total length of the extrapolated infection curve, i.e., the duration of the epidemic. we compare the steepness of different chinese provinces in fig. . we find a strong correlation (p < . ) between the steepness of the infection curve and the fitted initial number of susceptible patients. this is likely not a significant finding, but rather an intrinsic collinearity between these measures.
figure : relative 'steepness' of the extrapolated infection curves in fig. , for beijing, heilongjiang, sichuan, chongqing, shandong, jiangsu, jiangxi, anhui, guangdong, zhejiang, and henan.
in this proof-of-concept study, we apply approximate bayesian computation to fit stochastic epidemic models to real-world data. we encourage researchers to improve and adapt these methods to other problems. an interesting extension of our analysis would be characterizing the reproduction rate r of different regions. however, we use a non-standard parameterization of the sir model for the benefit of our abc optimization. therefore, our derived r = β/γ lacks interpretability and cannot be compared to other studies. we invite other researchers to repeat our analysis with the standard sir parameterization. additionally, whilst parameter fits are poorly constrained in populations where the infection has not already peaked, it would be interesting to explore epidemic forecasting on those datasets. the gillespie algorithm can be optimized to work faster with larger numbers of patients.
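the steepness measure defined above (peak active infections divided by the epidemic duration) is straightforward to compute from an extrapolated curve; a small helper sketch with a hypothetical function name:

```python
def steepness(infected_curve, dt=1.0):
    """Peak number of active infections divided by the duration of the
    extrapolated epidemic (number of samples times the sampling step)."""
    return max(infected_curve) / (len(infected_curve) * dt)

# toy five-day curve peaking at 3 active infections
s = steepness([0.0, 1.0, 3.0, 2.0, 0.0])  # 3.0 / 5.0 = 0.6
```

applied to each province's extrapolated curve, this gives the quantity compared across regions in the figure above.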
our parameterization of the sir model can also be modified to include vital statistics like births and deaths. abc generalizes well to these higher-dimensional parameter spaces. specific to sars-cov- , age-structured models would also be a valuable development, as would models that include vaccinations and acquired immunity. this study uses the data repository for the novel coronavirus visual dashboard operated by the johns hopkins university center for systems science and engineering (jhu csse), supported by the esri living atlas team and the johns hopkins university applied physics lab (jhu apl).
key: cord- - b zk q authors: lesniewski, andrew title: epidemic control via stochastic optimal control date: - - journal: nan doi: nan sha: doc_id: cord_uid: b zk q
we study the problem of optimal control of the stochastic sir model. models of this type are used in mathematical epidemiology to capture the time evolution of highly infectious diseases such as covid- . our approach relies on reformulating the hamilton-jacobi-bellman equation as a stochastic minimum principle. this results in a system of forward backward stochastic differential equations, which is amenable to numerical solution via monte carlo simulations. we present a number of numerical solutions of the system under a variety of scenarios. the classic sir model, originally proposed in [ ], is the standard tool of mathematical epidemiology [ ] for quantitative analysis of the spread of an epidemic.
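the classic deterministic sir dynamics, written here in population fractions, can be integrated with a few lines of forward euler. this is a minimal sketch; the rates and initial conditions are illustrative and not taken from any of the papers:

```python
def sir_euler(beta, gamma, S0, I0, dt=0.01, T=100.0):
    """Forward-Euler integration of the classic SIR system
    dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I,
    with S, I, R interpreted as population fractions (S + I + R = 1)."""
    S, I, R = S0, I0, 1.0 - S0 - I0
    for _ in range(int(T / dt)):
        inf = beta * S * I * dt   # new infections this step
        rec = gamma * I * dt      # recoveries
        S -= inf
        I += inf - rec
        R += rec
    return S, I, R

S, I, R = sir_euler(beta=0.3, gamma=0.1, S0=0.999, I0=0.001)
```

the update preserves S + I + R = 1 by construction, which is the conservation law of the model; with β/γ > 1 the infected fraction rises, peaks when S falls to γ/β, and then decays to zero.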
it describes the state of the affected population in terms of three state variables, traditionally denoted by s, i, and r:
(i) 0 ≤ s ≤ 1, the fraction of individuals who are susceptible to the disease.
(ii) 0 ≤ i ≤ 1, the fraction of individuals who are infected with the disease.
(iii) 0 ≤ r ≤ 1, the fraction of individuals who have been removed and are immune to the disease.
note that, in this simplified model, r includes the individuals who have left the population through mortality. the model dynamics is given by the following dynamical system: ṡ(t) = −βs(t)i(t), i̇(t) = βs(t)i(t) − γi(t), ṙ(t) = γi(t), where the constant parameters β > 0 and γ > 0 are called the infection rate and recovery rate, respectively. the evolution above is subject to the initial conditions ( ). notice that this dynamics obeys the conservation law s(t) + i(t) + r(t) = 1, consistent with the assumption that the variables s, i, and r represent population fractions. this means that the variable r is, in a way, redundant, as its current value does not affect the dynamics of s and i, and it can be computed in a straightforward manner from ( ). it is thus natural to consider the two-dimensional system defined by s and i only. eventually every epidemic comes to a natural halt, but its impact on the population may be very serious. the overall objective of epidemic control is to slow down the spread of infection in a way that it does not overwhelm the healthcare system and allows the economy to function. all of this should be done within the limits of available resources. in this note we study the problem of optimal control of an epidemic modeled by means of a stochastic extension of the sir model (see section for definition). we assume that the controlling agent ("government") has the ability to impact the spread of the disease through one of the following policies (or a combination of both):
(i) vaccination of susceptible individuals, with rate v. this makes the fraction vs of the susceptible population immune to the disease.
(ii) isolation of infected individuals, with rate i.
this removes the fraction ii of the infected population and prevents it from spreading the disease. the controlled dynamics of the sir model reads [ ] , [ ] : r(t) = v(t)s(t) + (γ + i(t))i(t). for efficiency, we will be using the notation x = s and x = i throughout the remainder of the paper. mathematical models are only as good as (i) their analytic specifications, and (ii) the data that fuel them. during initial phases of an epidemic the available data tend to be of limited usefulness: because of the lack of reliable large scale testing, it is not really known what fractions of the population fall into the different compartments s, i, and r. this may lead to a panic reaction of the population and a chaotic and economically devastating public health response to the epidemic. in the absence of an effective vaccine (which would allow to immunize a portion of the population) the optimal policy is to isolate at least a significant fraction of the infected individuals so that the basic reproduction ratio r can be brought significantly below one. since we lack the knowledge who is infected and who is not, the public health response is to try to isolate everyone, whether susceptible, infected or immune. these circumstances impose a serious limitation on practical applicability of the approach to optimal epidemic control discussed in this paper, as well as other quantitative approaches. unless the inputs to the model (β, γ, and the current snapshots of s and i) can reliably be estimated from the available data, the model's output is unreliable . the paper is organized as follows. in section we review the formulation of the continuous time stochastic sir model. the optimal control problem for the stochastic sir model is formulated in section . the optimal control problem is recast as the stochastic minimum principle problem and formulated in terms of a system of forward backward stochastic differential equations (fbsde). 
We present an algorithm for solving this system, together with the results of a number of numerical experiments involving the three mitigation policies within various cost function regimes.

"This is indeed a mystery," Watson remarked. "What do you imagine that it means?" "I have no data yet," Holmes replied. "It is a capital mistake to theorise before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts." [ ]

Acknowledgement. I would like to thank Nicholas Lesniewski for numerous discussions.

We consider a continuous-time stochastic extension of the deterministic SIR model; see [ ] and the references therein. Let W_t denote the standard Brownian motion, and let Ẇ_t denote the white noise process. We assume that the infection rate β is subject to random shocks, rather than being constant, while the recovery rate γ remains constant. Here σ > 0 is a constant volatility parameter. This leads to a system of stochastic differential equations (SDE) driven by W_t, with the initial conditions ( ). The third component of the process, x₃ = r, follows the dynamics ( ), which implies that the conservation law continues to hold in the stochastic model. Notice that, under the stochastic SIR model ( ), an epidemic eventually comes to a natural end. More precisely, the solution (x₁,t, x₂,t) = (0, 0) of ( ) is stable in probability [ ]. In order to see it, we set V_ρ(x₁, x₂) for 0 ≤ x_i ≤ 1, i = 1, 2, and fixed 0 < ρ < β/γ. Then V_ρ(0, 0) = 0, and V_ρ(x₁, x₂) > 0 in a neighborhood of (0, 0). Furthermore, denoting by L the generator of the process ( ), we verify that LV_ρ ≤ 0; in other words, V_ρ(x₁, x₂) is a Lyapunov function for ( ), and our claim follows from the theorem in [ ]. The model ( ) is a one-factor model, driven by a single source of randomness. There is a natural two-factor version of the stochastic SIR model [ ], in which the recovery rate γ is also allowed to be subject to white noise shocks.
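A minimal Euler-Maruyama simulation of the one-factor stochastic SIR model described above can be sketched as follows; the noise enters through the transmission term, since β receives white-noise shocks. The parameter values, flooring convention, and step size are illustrative assumptions.

```python
# Hedged sketch of the one-factor stochastic SIR model: the infection
# rate is perturbed as beta dt -> beta dt + sigma dW, so the transmission
# term s*i picks up a stochastic increment. Values are illustrative.
import math
import random

def em_step(s, i, beta, gamma, sigma, dt, rng):
    dw = rng.gauss(0.0, math.sqrt(dt))
    trans = beta * s * i * dt + sigma * s * i * dw  # stochastic transmission
    s_new = s - trans
    i_new = i + trans - gamma * i * dt
    # Floor/cap so that the fractions stay in [0, 1].
    s_new = min(max(s_new, 0.0), 1.0)
    i_new = min(max(i_new, 0.0), 1.0)
    return s_new, i_new

def simulate(seed=0, beta=0.3, gamma=0.1, sigma=0.1, dt=0.01, steps=30000):
    rng = random.Random(seed)
    s, i = 0.99, 0.01
    for _ in range(steps):
        s, i = em_step(s, i, beta, gamma, sigma, dt, rng)
    return s, i
```

Running the simulation over a long horizon illustrates the stability-in-probability statement in the text: the infected fraction eventually decays toward zero, and the epidemic comes to a natural end.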
For simplicity, our analysis will focus on the one-factor model ( ). We frame the problem of epidemic control as a stochastic control problem [ ]. We denote by u = (u₁, u₂) ≡ (v, i) the vaccination and isolation controls, and we denote the controlled process by X^u_t. Generalizing the deterministic specification ( ) to the stochastic case, we assume that the dynamics of X^u_t is given by ( ), subject to the initial conditions ( ). Two special cases of the controlled process are of interest. If a vaccine against the disease is unavailable, we set u₁ = 0 in the equation above, which yields the controlled process ( ); we will refer to this policy as an isolation policy. Similarly, we can consider a vaccination policy, for which u₂ = 0; in this case the controlled dynamics reads ( ). We assume a finite time horizon T < ∞. The controlling agent's objective is to minimize a running cost function c(x_t, u_t) and a terminal value function g(x_T); in other words, we are seeking a policy u* such that ( ) holds. We consider the following cost function ( ), where, for i = 1, 2, the running cost of vaccination is assumed to be proportional to the number of susceptible individuals, while the cost of isolation is assumed to be proportional to the number of infected individuals. The coefficients l_i > 0, m_i, n_i are determined by the overall cost of following the mitigation policy; in particular, they should be selected so that the running cost functions are strictly positive. As the terminal value function we take the transmission rate of the infection [ ]. We now invoke stochastic dynamic programming; see e.g. [ ], [ ]. The key element of this approach is the value function J(t, x, y). It is determined by two requirements: (B1) it satisfies Bellman's principle of optimality for all 0 ≤ t < T, where E_t denotes the conditional expectation with respect to the information set at time t, and where the minimum is taken over all admissible controls u_t [ ].
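Since the displayed cost function is elided, the following is a hedged sketch consistent with the text: a quadratic polynomial in each control with coefficients l, m, n, weighted by the relevant population fraction (susceptibles for vaccination, infected for isolation). The numeric coefficient values are illustrative assumptions.

```python
# Hedged sketch of a running cost of the form described in the text:
#   c(x, u) = (l1*u1^2 + m1*u1 + n1)*x1 + (l2*u2^2 + m2*u2 + n2)*x2,
# with x = (susceptible, infected), u = (vaccination, isolation).
# Coefficient values are illustrative, not the paper's.

def running_cost(x, u, l=(1.0, 1.0), m=(0.1, 0.1), n=(0.5, 0.5)):
    s, i = x
    v, iso = u
    cost_vacc = (l[0] * v ** 2 + m[0] * v + n[0]) * s
    cost_isol = (l[1] * iso ** 2 + m[1] * iso + n[1]) * i
    return cost_vacc + cost_isol
```

With l_i > 0 and n_i > 0 as required in the text, the running cost is strictly positive and increases with the intensity of either control, which is what makes the trade-off between epidemic suppression and mitigation cost nontrivial.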
(B2) It satisfies the terminal condition ( ). Using Itô's lemma, we verify that these two conditions lead to a nonlinear partial differential equation for the value function, namely the stochastic Hamilton-Jacobi-Bellman (HJB) equation ( ), subject to the terminal condition ( ). As the first step towards solving the HJB equation ( ), we let u = u* denote the minimizer of the expression inside the curly parentheses in ( ); in other words, u* satisfies ( ), which leads to the first-order condition ( ) on u*. Substituting u* back into the HJB equation yields the equation ( ). We do not believe that the solution to this equation can be explicitly represented in terms of standard functions. It is a three-dimensional partial differential equation, and solving it numerically may pose challenges. Rather than following this path, we shall invoke the stochastic minimum principle and reformulate the problem as a system of FBSDEs. Among the advantages of this approach is that it might be amenable to a deep learning treatment via the method advocated in [ ]. The stochastic minimum principle (see [ ] and the references therein) offers an alternative approach to stochastic optimal control. It is a stochastic version of Pontryagin's minimum principle, introduced in the context of deterministic optimal control [ ]. It also offers an effective numerical methodology for solving the HJB equation ( ) via Monte Carlo simulations. In this approach, the key object is the Hamiltonian function H = H(x, u, y, z) of four arguments, together with a system of stochastic differential equations, both forward and backward, determined by H.
Specifically, for the case of the controlled stochastic SIR model ( ), we have x, u, y, z ∈ R², and the Hamiltonian function reads ( ). We consider the following system of stochastic Hamilton's equations ( ) and ( ), where equation ( ) is merely an alternative way of writing the underlying controlled diffusion process ( ), while equation ( ) reflects the dynamics of the control variables given the running cost function. Note that while the first of the equations ( ) is a standard (forward) SDE, the second one is a backward stochastic differential equation (BSDE); see e.g. [ ]. Explicitly, these four equations can be stated as ( ), subject to the boundary conditions in ( ). Then u* is an optimal control, i.e. it satisfies the optimality condition ( ). In fact, there is a direct link between the Hamiltonian function H and the value function J (also known in classical dynamics as the action function): namely, we set [ ], [ ]: if u* is an optimal control and X* denotes the corresponding optimal diffusion process, then the pair (Y, Z) is the solution to the BSDE ( ). Going back to the main line of reasoning, we find that u* has to satisfy ( ). From equations ( ) we see that, up to a simple linear transformation, Y_t is essentially the optimal policy. Furthermore, the process Z_t can be thought of as the sensitivity of the optimal policy to the underlying process X_t (multiplied by the instantaneous volatility of that process). Substituting this expression into ( ), we find that the optimal process has to follow the dynamics ( ), subject to the boundary conditions in ( ) and ( ). In particular, the isolation-only and vaccination-only policies are given by the systems of FBSDEs ( ) and ( ), respectively. Even though derived in the context of a meaningful underlying dynamics, there is no a priori reason why these equations should have solutions: the drivers of the backward equations above contain terms quadratic in Y, and so the standard existence theorems [ ] do not apply.
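Since the displayed equations are elided in this extraction, the following is the standard template of the stochastic Hamiltonian system the text refers to; the concrete drift b, volatility σ, running cost c, and terminal cost g of the paper's model are not reproduced here, so this is a generic sketch rather than the paper's exact formulas.

```latex
% Generic stochastic Pontryagin (minimum principle) template:
H(x,u,y,z) \;=\; b(x,u)\cdot y \;+\; \sigma(x)\cdot z \;+\; c(x,u),
\qquad
\begin{aligned}
dX_t &= \nabla_y H(X_t,u_t,Y_t,Z_t)\,dt + \sigma(X_t)\,dW_t, \\
dY_t &= -\,\nabla_x H(X_t,u_t,Y_t,Z_t)\,dt + Z_t\,dW_t,
\qquad Y_T = \nabla g(X_T),
\end{aligned}
\qquad
u_t^{*} \;=\; \operatorname*{arg\,min}_{u}\; H(X_t,u,Y_t,Z_t).
```

The forward equation for X is the controlled diffusion, the backward equation for (Y, Z) is the adjoint BSDE with terminal condition at T, and the optimal control minimizes H pointwise, which is the structure the text summarizes.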
We proceed in the following assuming that these systems do, in fact, have solutions. In this section we discuss a numerical algorithm for solving the system ( ). Applying this methodology in a number of numerical experiments, we present compelling evidence that solutions to ( ) exist and are meaningful over a wide range of model parameters. We start by describing a method for discretizing the basic FBSDE system ( ). We notice first that the two-dimensional system defined by the state variables s, i is in fact Hamiltonian [ ]: namely, we define new (canonical) variables ( ), and it is easy to verify that their time derivatives take the Hamiltonian form ( ) under the dynamics ( ). The system ( ) can be explicitly written in Hamiltonian form ( ). Notice also that the Hamiltonian function is separable, i.e. it is of the form H(p, q) = T(p) + V(q). A convenient discretization of ( ) can be formulated in terms of the canonical variables (p_t, q_t) = (−log x₁,t, −log x₂,t) as follows [ ]. Choose the number of steps N, define the time step δ = T/N, set t_n = nδ for n = 0, 1, …, N, and use the shorthand (p_n, q_n) ≡ (p_{t_n}, q_{t_n}). This yields the Euler scheme ( ) for n = 0, 1, …, N − 1. Here, the Brownian motion increments ΔW = W_{(n+1)δ} − W_{nδ} are independent variates drawn from the normal distribution N(0, δ). At each iteration step we also have to floor q_{n+1} and p_{n+1} at 0, q_{n+1} = max(q_{n+1}, 0), p_{n+1} = max(p_{n+1}, 0), so that e^{−q_n}, e^{−p_n} ≤ 1. Approximating the backward equations of the system ( ) leads to the backward Euler scheme ( ), where ( ) denote the generators of the two BSDEs. Starting with the terminal condition, we move backward in time, computing Y_n and Z_n for n = N − 1, …, 0. Notice that two difficulties arise while doing so: (i) the Y_n's in ( ) are not necessarily adapted, and (ii) they depend on Z_n. These two problems can be solved by taking conditional expectations E_n( · ) = E( · | X₀, X₁, …, X_n) at time t_n.
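The forward Euler step in the canonical variables p = −log s, q = −log i, with the flooring at 0 described above, can be sketched as follows. The drift terms here come from applying Itô's lemma to the uncontrolled stochastic SIR dynamics with noisy transmission; since the paper's explicit scheme is elided, this is a hedged reconstruction with illustrative parameter values.

```python
# Hedged sketch of the Euler-Maruyama step in canonical variables,
# derived from ds = -beta*s*i dt - sigma*s*i dW and
# di = (beta*s*i - gamma*i) dt + sigma*s*i dW via Ito's lemma:
#   dp = (beta*i + 0.5*sigma^2*i^2) dt + sigma*i dW,
#   dq = (gamma - beta*s + 0.5*sigma^2*s^2) dt - sigma*s dW.
# Values are illustrative.
import math
import random

def canonical_step(p, q, beta, gamma, sigma, delta, dw):
    s, i = math.exp(-p), math.exp(-q)
    p_new = p + (beta * i + 0.5 * sigma**2 * i**2) * delta + sigma * i * dw
    q_new = q + (gamma - beta * s + 0.5 * sigma**2 * s**2) * delta - sigma * s * dw
    # Floor at 0 so that e^{-p}, e^{-q} <= 1, as in the text.
    return max(p_new, 0.0), max(q_new, 0.0)

def simulate_canonical(seed=1, beta=0.3, gamma=0.1, sigma=0.1,
                       delta=0.01, n_steps=10000):
    rng = random.Random(seed)
    p, q = -math.log(0.99), -math.log(0.01)
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(delta))
        p, q = canonical_step(p, q, beta, gamma, sigma, delta, dw)
    return p, q
```

The flooring guarantees that the recovered fractions e^{−p} and e^{−q} remain valid population fractions at every step, which is exactly the role it plays in the scheme described in the text.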
This leads to the condition ( ). This is an implicit scheme, which may slow down the computations; however, we can easily replace it with an explicit scheme of the same order ( ). In order to determine Z_{i,n}, i = 1, 2, we multiply ( ) by an increment ΔW_n and take conditional expectations. This yields ( ), and hence we obtain the expression ( ) for Z_{i,n}. In summary, we are led to the discrete-time scheme ( ) for solving ( ), for n = N − 1, …, 0. Note that simulating this system requires numerical estimation of the conditional expected values E_n( · ) in the formulas above; we discuss this issue in the following section. A practical and powerful method for computing the conditional expected values in ( ) is the Longstaff-Schwartz regression method, originally developed for pricing American options [ ]. We use a variant of this method that involves the Hermite polynomials, and that was used for a similar purpose in [ ]. This choice is natural, as conditional expectations of Hermite polynomials of a Gaussian random variable lead to simple closed-form expressions. Let He_k(x), k = 0, 1, …, denote the k-th normalized Hermite polynomial corresponding to the standard Gaussian measure dμ(x) = (2π)^{−1/2} e^{−x²/2} dx. These functions form an orthonormal basis of the Hilbert space L²(R, dμ). The key property of He_k(x) is the addition formula ( ) for χ ∈ [0, 1] and w, x ∈ R. Consequently, integrating over x with respect to the measure μ yields the conditioning rule ( ); here, w and x are independent standard normal random variables. We shall use this relation to estimate the conditional expected values in ( ). We set W_{t_i} = √t_i w_i, for i = 1, …, M, where w_i is an N-dimensional standard normal random variable. We notice that ( ), where χ_i = t_i/t_{i+1}, and where x_i is standard normal and independent of w_i. In the following, we shall use this decomposition in conjunction with ( ).
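The normalized probabilists' Hermite polynomials and the conditioning rule described above can be sketched and verified numerically. Since the displayed addition formula is elided, the code below checks the standard form of the conditioning rule, E[He_k(√χ·w + √(1−χ)·X) | w] = χ^{k/2} He_k(w) with X standard normal, by Monte Carlo; the sample size and test points are illustrative.

```python
# Hedged sketch of the normalized probabilists' Hermite polynomials
# He_k(x)/sqrt(k!), built from the recurrence
#   He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x),
# plus a Monte Carlo check of the Gaussian conditioning rule.
import math
import random

def hermite_normalized(k, x):
    """Normalized probabilists' Hermite polynomial He_k(x)/sqrt(k!)."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, x  # He_0, He_1
    for n in range(1, k):
        h_prev, h = h, x * h - n * h_prev
    return h / math.sqrt(math.factorial(k))

def mc_conditional(k, w, chi, n_samples=200000, seed=7):
    """Monte Carlo estimate of E[He_k(sqrt(chi)*w + sqrt(1-chi)*X)]
    over X ~ N(0,1); should equal chi^{k/2} * He_k(w)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        acc += hermite_normalized(k, math.sqrt(chi) * w + math.sqrt(1 - chi) * x)
    return acc / n_samples
```

In the regression scheme this rule is what makes conditioning cheap: projecting onto the Hermite basis turns the conditional expectation into a simple rescaling of coefficients, as the next paragraph of the text exploits.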
Now, we assume the linear architecture ( ), where K is the cutoff value of the order of the Hermite polynomials. This is simply a truncated expansion of the random variable Y_{i+1} in terms of the orthonormal basis He_k(w_{i+1}). The values of the Fourier coefficients are estimated by means of an ordinary least squares regression. Then, as a consequence of the conditioning rule ( ), conditioning Y_{i+1} on w_i is equivalent to multiplying its Fourier coefficients g_k by the factor χ_i^{k/2}. This allows us to calculate the first term on the right-hand side of the third equation in ( ). In order to calculate the second term, we substitute the explicit formula for Z_n into the generators ( ) and repeat the calculations in ( ) and ( ) with Y_{n+1} replaced by Y_{n+1} + f(X_n, Y_n, Z_n)δ. For the numerical experiments within the framework developed above, we assume a time horizon of one year ( days), and choose the SIR model parameters β = . , γ = . , s = . . These parameters are purely hypothetical, and they do not arise from calibration to any actually reported data. The corresponding value of the basic reproduction ratio r₀ = s₀β/γ is . , and it indicates a highly infectious disease such as COVID-19. We choose the diffusion parameter σ = . . A typical scenario generated by this model is graphed in the corresponding figure. For solving the FBSDE system ( ) we generate , scenarios (Monte Carlo paths) using the low-variance spectral decomposition method. Each scenario is based on daily sampling, i.e. N = . For calculating the conditional expectations in the Longstaff-Schwartz regression we use the cutoff value K = ; the expectation behind this choice is that we obtain an accuracy of four sigma. We attempt to construct a numerical solution to ( ) through the following iterative procedure. We start by generating initial Monte Carlo paths X_t^{(0)} using the (discretized version of the) raw process ( ).
For the initial guess of Y_t we take Y^{(0)} as in ( ); notice that this is not a solution of the backward equations in ( ), it merely satisfies the terminal condition. After this, we iterate for k = 1, 2, …, until the stopping criterion is satisfied or the maximum number of iterations is reached. For the stopping criterion we choose the condition that the average L²-norm change of a Monte Carlo path falls below a small tolerance threshold. We make various choices of the coefficients l, m, n defining the running cost functions. Depending on the values of these coefficients, the iterative process described above converges to a meaningful solution or it diverges. At this point, it is unclear which choices of the coefficients lead to which outcomes, but it appears that there are well-defined basins of convergence in the space of the parameters. We first consider a high-cost policy. The corresponding figures show the graphs of the solutions assuming particular parameters of the quadratic polynomial in the running cost function c(u₂, x₂); unlike the single-scenario figure above, the curves in all the graphs below show the averages of the corresponding quantities over the Monte Carlo paths. Under this running cost function, the optimal policy is to implement a draconian isolation regime, which leads to a rapid drop in infections, while keeping the susceptible fraction of the population at a very high level. On the other hand, the figures for a low-cost policy, with different parameters of the quadratic polynomial in the running cost function c(u₂, x₂), show that the optimal policy is a moderate isolation regime: following this policy, the isolation rate is high early on, while infections are low, and it then declines over time as the epidemic develops. Unlike the policy above, this leads to a gradual decline in both the infected and susceptible fractions of the population. Consider now the case of an optimal vaccination strategy.
Again, we make two choices of the running cost function: high cost and low cost. For the high-cost case we choose the parameters as in ( ). The results of the Monte Carlo simulations for this cost function are plotted in the corresponding figures; they parallel the results presented above in the case of the high-cost isolation mitigation. The optimal policy is a massive vaccination campaign that dramatically reduces the susceptible fraction of the population and leads to significantly lower infections.

References:
Population Biology of Infectious Diseases: Part I
Dynamic Programming
Time-Optimal Control Strategies in SIR Epidemic Models
Mathematical Models in Epidemiology
Stochastic Epidemic Models: A Survey
Deterministic and Stochastic Models for Recurrent Epidemics
Discrete Time Approximation and Monte-Carlo Simulation of Backward Stochastic Differential Equations, Stochastic Processes and their Applications
A Scandal in Bohemia, included in The Adventures of Sherlock Holmes
Backward Stochastic Differential Equations in Finance
Controlled Markov Processes and Viscosity Solutions
A Stochastic Differential Equation SIS Epidemic Model
Geometric Numerical Integration Illustrated by the Störmer-Verlet Method
Optimal Control of Epidemics with Limited Resources
Solving High-Dimensional Partial Differential Equations Using Deep Learning
A Contribution to the Mathematical Theory of Epidemics
Stochastic Stability of Differential Equations
Simulating Hamiltonian Dynamics
Options on Infectious Diseases
Managing Counterparty Credit Risk via Backward Stochastic Differential Equations
Valuing American Options by Simulation: A Simple Least-Squares Approach
Continuous-Time Stochastic Control and Optimization with Financial Applications
The Mathematical Theory of Optimal Processes
Stochastic Controls: Hamiltonian Systems and HJB Equations

key: cord- - g scnj
authors: Harko, Tiberiu; Mak, Man Kwong
title: Series Solution of the Susceptible-Infected-Recovered (SIR) Epidemic Model with Vital Dynamics via the
Adomian and Laplace-Adomian Decomposition Methods

The susceptible-infected-recovered (SIR) epidemic model, as well as its generalizations, is extensively used for the study of the spread of infectious diseases and for the understanding of the dynamical evolution of epidemics. Among SIR-type models, only the model without vital dynamics has an exact analytic solution, which can be obtained in an exact parametric form. The SIR model with vital dynamics, the simplest extension of the basic SIR model, does not admit a closed-form representation of the solution. However, in order to perform the comparison with the epidemiological data, accurate representations of the time evolution of the SIR model with vital dynamics would be very useful. In the present paper, we first obtain the basic evolution equation of the SIR model with vital dynamics, which is given by a strongly nonlinear second-order differential equation. Then we obtain a series representation of the solution of the model, by using the Adomian and Laplace-Adomian decomposition methods to solve the dynamical evolution equation of the model. The solutions are expressed in the form of infinite series. The series representations of the time evolution of the SIR model with vital dynamics are compared with the exact numerical solutions of the model, and we find that, at least for a specific range of parameters, there is good agreement between the Adomian and Laplace-Adomian semianalytical solutions, containing only a small number of terms, and the numerical results. The study of epidemic mathematical models in different formal formulations has proved to be of crucial interest for the understanding of the dynamics, spread, and control of epidemic diseases [ ] [ ] [ ] [ ] [ ] [ ].
The simplest of the epidemic models are the so-called deterministic compartmental models, consisting of at least three compartments: the number of susceptible individuals (S), the number of infectious individuals (I), and the number of removed (immune) or deceased individuals (R). The first compartmental epidemiological models were proposed and investigated in [ ]. Despite their apparent phenomenological simplicity, the mathematical equations describing SIR-type models are essentially nonlinear, and this nonlinearity raises a number of important and interesting mathematical problems in the study of even the simplest models of epidemics. On the other hand, SIR-type models still have a powerful predictive and investigative power, and many of them have been used to investigate the recent COVID-19 pandemic [ ], as, for example, in [ ]-[ ]. The basic equations of the SIR model are given by the nonlinear system of ordinary differential equations ( ) [ , ], where x(t) > 0, y(t) > 0 and z(t) > 0, ∀t ≥ 0. The system of equations ( ) must be integrated with the initial conditions x(0) = N₁ ≥ 0, y(0) = N₂ ≥ 0 and z(0) = N₃ ≥ 0, respectively, where N_i ∈ ℜ, i = 1, 2, 3. The time evolution of the SIR epidemic model, as well as its intrinsic dynamics, is determined by only two epidemiological parameters: the infection rate β and the mean recovery rate γ, which are assumed to be positive constants. The exact solution of the SIR model ( ) can be obtained in an exact parametric form, and it is given by [ ] x = x₀u, where N = ∑ᵢ Nᵢ, x₀ = N₁ e^{(β/γ)N₃}, and u is a parameter.
In these equations x(t) gives the number of individuals not yet infected with the disease at time t, i.e. those susceptible to the infection; y(t) represents the number of individuals who have already been infected with the disease, and hence are able to spread it to persons in the susceptible category; while z(t) denotes the individuals who have been infected but recovered from the disease. For a pedagogical discussion of the SIR model and of its exact solution see [ ]. However, due to the integral representation of the time variable, some alternative representations and methods have been used for the study of the solution of the SIR model [ ]-[ ], including the variational iteration method and the homotopy perturbation method. A powerful method for obtaining semianalytical solutions of strongly nonlinear differential equations is the Adomian decomposition method [ ]-[ ], whose basic idea is to decompose the nonlinear terms appearing in a differential equation in terms of the Adomian polynomials, which are constructed recursively. The Adomian decomposition method, as well as its very effective version, the Laplace-Adomian decomposition method, has been extensively used to study the approximate semianalytical solutions of different types of differential equations and physical models [ ]-[ ]. The Adomian decomposition method was used for the study of the SIR model in [ ], while a series solution of the SIR model by means of the Laplace-Adomian decomposition method was obtained in [ ], where the solutions were expressed in the form of infinite series. The series representations of the time evolution of the SIR compartments obtained in [ ] were compared with the exact numerical solutions of the model, and, for a specific range of the model parameters, a good agreement between the Laplace-Adomian semianalytical solutions, containing only three terms, and the numerical results was obtained.
One of the simplest generalizations of the SIR model is the SIR model with vital dynamics [ , , , ], in which births and deaths are included in the model via a new constant μ; usually it is assumed that the death rate is equal to the birth rate. Even if the mathematical modifications of the initial SIR model with μ = 0 look minimal at first sight, the SIR model with vital dynamics differs from the SIR model ( ) in important ways. The SIR model without vital dynamics, with μ = 0, has a first integral N = S + I + R, which is the total number of individuals in the given population. Moreover, it also has a second first integral G(S, I, R) = βR + γ ln S [ ]. Hence the SIR system with μ = 0 is a completely integrable system with two functionally independent first integrals. On the other hand, in the case μ ≠ 0, it was proved in [ ] that the SIR model has no polynomial or proper rational first integrals; this result is obtained by studying the invariant algebraic surfaces. Moreover, although the SIR model with μ ≠ 0 is not integrable, and hence does not have an exact solution, the global dynamics of the SIR model with vital dynamics can be studied based on the existence of an invariant algebraic surface. In [ ] it was shown that the generalization of the SIR model with μ ≠ 0, including births and deaths and described by a nonlinear system of differential equations, can be reduced to an Abel-type equation. The reduction of the SIR model with vital dynamics to an Abel-type first-order differential equation greatly simplifies the analysis of its properties. An approximate solution of the Abel equation was obtained by using a perturbative approach, in power-series form. Moreover, the general solution of the SIR model with vital dynamics can be represented in an exact parametric form.
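The two first integrals of the μ = 0 model quoted above can be checked numerically: both N = S + I + R and G(S, I, R) = βR + γ ln S should be constant along the flow, since dG/dt = βγI + γ(−βSI)/S = 0. The sketch below verifies this with an RK4 integrator; parameter values are illustrative.

```python
# Hedged numerical check of the two first integrals of the SIR model
# without vital dynamics (mu = 0):
#   N = S + I + R   and   G(S, I, R) = beta*R + gamma*ln(S).
# RK4 integration; values are illustrative.
import math

def rhs(state, beta, gamma):
    s, i, r = state
    return (-beta * s * i, beta * s * i - gamma * i, gamma * i)

def rk4_step(state, beta, gamma, dt):
    def shift(u, v, c):
        return tuple(a + c * b for a, b in zip(u, v))
    k1 = rhs(state, beta, gamma)
    k2 = rhs(shift(state, k1, dt / 2), beta, gamma)
    k3 = rhs(shift(state, k2, dt / 2), beta, gamma)
    k4 = rhs(shift(state, k3, dt), beta, gamma)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

def first_integrals(state, beta, gamma):
    s, i, r = state
    return s + i + r, beta * r + gamma * math.log(s)

beta, gamma, dt = 0.5, 0.2, 0.01
state = (0.99, 0.01, 0.0)
n0, g0 = first_integrals(state, beta, gamma)
for _ in range(5000):
    state = rk4_step(state, beta, gamma, dt)
n1, g1 = first_integrals(state, beta, gamma)
# Both N and G are conserved along the numerical trajectory.
```

The conservation of two functionally independent quantities is what makes the μ = 0 system completely integrable, in contrast with the μ ≠ 0 case discussed next.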
In the present work we consider the possibility of obtaining accurate semianalytical solutions of the equations of the SIR model with vital dynamics by using the Adomian and the Laplace-Adomian decomposition methods, respectively. As a first step in our analysis we reduce the SIR model with μ ≠ 0 to a basic second-order differential equation describing the evolution of the individuals infected with the disease. Then, once the solution of the basic equation is known, the general solution of the SIR model with vital dynamics can be obtained in terms of the variable u, related to the number y of infected individuals. In order to obtain approximate solutions of the basic evolution equation we apply to it both the Adomian and the Laplace-Adomian decomposition methods, and we obtain in each case the recurrence relations giving the successive terms in the Adomian series representation as functions of the Adomian polynomials. In the case of the Adomian decomposition method the iterative solution can be evaluated exactly only for the first two terms of the series expansion, while for the Laplace-Adomian decomposition method all the terms in the series expansion can be obtained exactly. We also perform a careful comparison of the semianalytical results with the exact numerical solutions, and it turns out that for certain ranges of the model parameters (β, γ, μ) both the Adomian and the Laplace-Adomian decomposition methods give a good description of the numerical results. The present paper is organized as follows. The basic equation describing the dynamical evolution of the SIR model with vital dynamics is obtained in Section II. The semianalytical solutions of the SIR system for μ ≠ 0 are obtained, by using the Adomian and the Laplace-Adomian decomposition methods, in Section III. The comparison with the exact numerical solutions is performed in Section IV. We discuss and conclude our results in Section V.
In the present section we obtain the basic equation describing the dynamics of the SIR model with vital dynamics. Then, by using this equation, we obtain semianalytical solutions of the model, still of high numerical precision, represented in the form of Adomian-type series containing exponential terms. The strongly nonlinear system of equations describing the SIR model with vital dynamics is given by ( )-( ) [ , , , ], where x(t) > 0, y(t) > 0 and z(t) > 0, ∀t ≥ 0. The system of strongly nonlinear differential equations ( )-( ) must be integrated with the initial conditions ( ). The time evolution of the model is determined by three epidemiological parameters: the infection rate β, the mean recovery rate γ, and the natural death rate μ, which in the present investigation is assumed to be equal to the birth rate. In the following, β, γ, and μ are assumed to be positive constants. By adding Eqs. ( )-( ) and integrating the resulting equation we immediately obtain ( ), where N is an arbitrary integration constant. In order to ensure that the total number of individuals in the group is constant, we must fix the integration constant N as zero, N = 0. In the following, in order to significantly simplify the mathematical formalism, we first introduce a new function u(t), related to y(t) by ( ) and satisfying the initial condition ( ), thus obviously giving y(0) = N₂. Then from Eq. ( ) we obtain for x(t) the simple expression ( ). By substituting Eqs. ( ) and ( ) into Eq. ( ), the latter becomes ( ), which can be reformulated as Eq. ( ). Eq. ( ) represents the basic dynamical evolution equation of the SIR model with vital dynamics; it must be solved with the initial conditions ( ). By representing z(t) as ( ), Eq. ( ) becomes ( ), giving ( ) and ( ), respectively, where we have used the condition w(0) = z(0) = N₃. The general solution of Eq. ( ) cannot be obtained in a closed (exact) form.
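The system described above, with equal birth and death rates μ, can be sketched numerically in population fractions; the total population then stays constant, which is the conservation property used to fix the integration constant. Parameter values below are illustrative assumptions.

```python
# Hedged sketch of the SIR model with vital dynamics, in fractions:
#   dx/dt = mu - beta*x*y - mu*x
#   dy/dt = beta*x*y - (gamma + mu)*y
#   dz/dt = gamma*y - mu*z
# With x + y + z = 1 initially, the total remains 1 (the right-hand
# sides sum to mu*(1 - (x+y+z))). Values are illustrative.

def vital_sir_step(x, y, z, beta, gamma, mu, dt):
    dx = mu - beta * x * y - mu * x
    dy = beta * x * y - (gamma + mu) * y
    dz = gamma * y - mu * z
    return x + dx * dt, y + dy * dt, z + dz * dt

def simulate_vital(beta=0.4, gamma=0.1, mu=0.02, dt=0.01, steps=50000):
    x, y, z = 0.99, 0.01, 0.0
    for _ in range(steps):
        x, y, z = vital_sir_step(x, y, z, beta, gamma, mu, dt)
    return x, y, z
```

Unlike the μ = 0 case, for β/(γ + μ) > 1 the infection does not die out but approaches an endemic level, which is the qualitative difference between the two models emphasized in the text.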
In the present section we consider the application of the Adomian decomposition method to the SIR model with vital dynamics. In order to obtain a semianalytical solution of the model, we first consider the solution of the basic evolution equation ( ), obtaining the Adomian-type recursive relations for solving Eq. ( ) in both the standard Adomian and the Laplace-Adomian decomposition methods. We begin our investigation of Eq. ( ) by applying the standard Adomian decomposition method. We integrate Eq. ( ) between 0 and t; with the use of the initial conditions we obtain ( ). Integrating Eq. ( ) once more between 0 and t, we find ( ). Now we apply the Adomian decomposition method to Eq. ( ). For this we assume ( ) and ( ), where A_n(t) are the Adomian polynomials, defined for an arbitrary function f(u(t)) according to the general formula ( ) [ ]. Substituting Eqs. ( ) and ( ) into Eq. ( ) we obtain ( ); explicitly, we can write this equation as ( ). Now we take ( ), and therefore we have obtained the iteration ( ). We can now use the Cauchy formula for repeated integration, which gives ( ); thus Eq. ( ) becomes ( ). In the following we denote ( ); hence the first Adomian polynomial is given by ( ). For u₁(t) we obtain ( ), where erf(z) = (2/√π) ∫₀^z e^{−t²} dt is the error function. The next term u₂ cannot be obtained in an exact form, and hence we will not present it. In the following we look for a series solution of Eq. ( ) by using the Laplace-Adomian decomposition method. In this method we first apply the Laplace transformation operator L, defined as ( ); by using the properties of the Laplace transform, we obtain for the Laplace transform of u(t) the equation ( ). We assume now that the function u(t) can be represented in the form of an infinite series ( ), where all the terms v_n(t), n = 0, 1, 2, … can be computed recursively.
As for the nonlinear operator f(u(t)) = e^{u(t)}, we assume that it can be decomposed as ( ), where A_n(t) are the Adomian polynomials, defined according to Eq. ( ). The first five Adomian polynomials can be obtained in the form ( ). Substituting Eqs. ( ) and ( ) into Eq. ( ) we obtain ( ). The matching of both sides of Eq. ( ) gives the iterative algorithm ( ) for obtaining the power-series solution of the basic evolution equation of the SIR model with vital dynamics. By applying the inverse Laplace transformation to Eq. ( ), we obtain the expression of u₀(t) as ( ). We expand the term (βN₂/μN) e^{−μt} in a power series, and hence in the first-order approximation we obtain ( ). The first Adomian polynomial is given by ( ); therefore, within the adopted approximation, we immediately obtain ( ). The second Adomian polynomial is obtained as ( ), and thus we find ( ). Similarly, after computing the third Adomian polynomial, given by ( ), we obtain u₃(t). The higher-order terms in the Laplace-Adomian power-series representation of u(t) can be obtained by following the same approach, but due to their length we do not present them here. Hence, once the expressions of the terms u_n(t), n = 0, 1, 2, … in the Adomian decomposition are known, the approximate semianalytical solution of the SIR model with vital dynamics can be obtained as ( ). The present Laplace-Adomian series solution is valid only for specific ranges of the model parameters (β, γ, μ) and of the initial conditions (N₁, N₂, N₃). If (N₁ + N₂)/N > (γ + μ)/β, the exponential functions in Eq. ( ) diverge in the long-time limit, and singularities develop in the Adomian series.

In the present section we compare the semianalytical predictions of the Adomian and Laplace-Adomian decomposition methods, as applied to the SIR model with vital dynamics, with the exact numerical solutions obtained by numerically integrating the system of equations ( )-( ).
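For the exponential nonlinearity f(u) = e^u used above, the Adomian polynomials are the Taylor coefficients of exp(u(λ)) with u(λ) = Σ_k u_k λ^k, and they satisfy a standard power-series recurrence. The sketch below computes them numerically and checks the well-known closed forms A₁ = u₁e^{u₀} and A₂ = (u₂ + u₁²/2)e^{u₀}; the test values of u_k are illustrative.

```python
# Hedged sketch of the Adomian polynomials for f(u) = e^u, via the
# power-series recurrence for the exponential of a series:
#   A_0 = e^{u_0},   A_n = (1/n) * sum_{k=1}^{n} k * u_k * A_{n-k}.
import math

def adomian_exp(u, n_terms):
    """Adomian polynomials A_0..A_{n_terms-1} for f(u) = e^u,
    evaluated at numeric values u = [u_0, u_1, ...]."""
    a = [math.exp(u[0])]
    for n in range(1, n_terms):
        s = sum(k * u[k] * a[n - k] for k in range(1, n + 1))
        a.append(s / n)
    return a
```

This recurrence reproduces the closed forms quoted in the literature term by term, which makes it a convenient cross-check when implementing the iterative Laplace-Adomian scheme described in the text.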
in the case of the adomian decomposition method we will approximate the solution by using only the first two terms in the adomian iterative scheme. however, we will also add to the series solution a truncation of the term u , approximated by u (t) ≈ −µ ∫_0^t u (t') dt'. hence we approximate the adomian solution of the basic evolution equation of the sir model with vital dynamics as ( ). the comparison of the numerical and of the semianalytical adomian approximate solutions is presented, for different values of the model parameters β, γ and µ, in fig. .

while finding a cure (or a vaccine) for it is mostly a medical/biological/virological problem, the understanding of the spread of the epidemic may lead to the adoption/imposition of quarantine or safety measures that could drastically reduce its intensity. in this context mathematical epidemiological models could play an important role, since a successful modeling of the spread of the disease could essentially contribute, at the level of society, to the implementation of the best policies that could guarantee the maximal safety of the citizens. despite their (apparent) mathematical simplicity, the compartmental epidemiological models did play an important role in the analysis of the covid pandemic. these models contain the basic features of the evolution of an epidemic, and once the parameters of the model are fixed from the epidemiological data they can provide accurate predictions for the evolution of infectious diseases. from a mathematical point of view the sir model is exactly integrable, which simplifies its analysis, since its exact solution is known [ ]. on the other hand the sir model with vital dynamics is non-integrable, and therefore it can be investigated only by using numerical or semianalytical methods. in the present paper we have introduced the powerful adomian and laplace-adomian decomposition methods for the study of this model.
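the exact numerical solutions against which the semianalytical series are compared come from direct integration of the model; a minimal rk4 integrator is sketched below. the specific formulation (equal birth and death rate µ, frequency-dependent incidence β x y / n) is an assumption, since the paper's normalization conventions are not reproduced here:

```python
def sir_vital(beta, gamma, mu, x0, y0, z0, t_end, dt=0.01):
    # rk4 integration of a sir model with vital dynamics:
    # x' = mu*N - beta*x*y/N - mu*x
    # y' = beta*x*y/N - (gamma + mu)*y
    # z' = gamma*y - mu*z
    # note x + y + z = N is conserved exactly by these equations.
    N = x0 + y0 + z0
    def f(state):
        x, y, z = state
        return (mu * N - beta * x * y / N - mu * x,
                beta * x * y / N - (gamma + mu) * y,
                gamma * y - mu * z)
    state, t = (x0, y0, z0), 0.0
    while t < t_end:
        k1 = f(state)
        k2 = f(tuple(s + dt / 2 * k for s, k in zip(state, k1)))
        k3 = f(tuple(s + dt / 2 * k for s, k in zip(state, k2)))
        k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
        t += dt
    return state
```

conservation of the total population provides a cheap sanity check on the integration.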
however, in obtaining the explicit form of the adomian polynomials one must use some approximations for their estimation. in the present approach we have approximated the u (t) term by its first order series expansion, leading to the first adomian polynomial as given by eq. ( ). in the large time limit, if β (n + n )/n > γ + µ, a diverges. hence the laplace-adomian decomposition method works optimally in the present case if the condition (n + n )/n < (γ + µ)/β is satisfied. in the present paper we have presented a series solution of the non-integrable sir epidemiological model with vital dynamics, by using both the adomian and the laplace-adomian decomposition methods. the application of these methods allows one to obtain the explicit time dependencies of x(t), y(t) and z(t). the semianalytical solutions have a simple mathematical form, and they describe very precisely the numerical behavior of the model for a large range of the model parameters (β, γ, µ). the time dependence of the three compartments in the semianalytical solution obtained by using the laplace-adomian decomposition method is given by a sum containing exponential terms. such a representation may simplify the fitting with the epidemiological results. the series, truncated to a small number of terms, gives a very good description of the numerical results for a large range of values of the model parameters and of the initial conditions. in fact, the two-term approximation obtained with the use of the adomian decomposition method also gives a good approximation of the results of the numerical integration of the sir model with vital dynamics for a large range of parameter values. exact solutions of the epidemiological models are important for epidemiologists because they allow the study of the spreading of infectious diseases in different situations. they are also helpful in the design of the best social strategies for their control.
hopefully the results obtained in the present paper may also contribute to the investigations of the dynamics, evolution and long term impact of the present and of the future epidemics.

references:
- mathematical biology: i. an introduction
- epidemic modeling: an introduction
- modeling infectious diseases
- contribution to the mathematical theory of epidemics
- clinical features of patients infected with novel coronavirus in
- effective containment explains subexponential growth in recent confirmed covid- cases in china
- simulating the spread of covid- via spatially-resolved susceptible-exposed-infected-recovered-deceased (seird) model with heterogeneous diffusion
- size and timescale of epidemics in the sir framework, phys. d nonlinear phenom.
- computing the density function of complex models with randomness by using polynomial expansions and the rvt technique. application to the sir epidemic model
- susceptible-infected-recovered (sir) dynamics of covid- and economic impact
- inferring change points in the covid- spreading reveals the effectiveness of interventions
- how to reduce epidemic peaks keeping under control the time-span of the epidemic
- forecasting covid growth in india using susceptible-infected-recovered (sir) model
- the challenges of modeling and forecasting the spread of covid-
- a feedback sir (fsir) model highlights advantages and limitations of infection-based social distancing
- estimation of covid- spread curves integrating global data and borrowing information
- phenomenological dynamics of covid- pandemic: meta-analysis for adjustment parameters
- on the emergence of a power law in the distribution of covid- cases
- optimal control of an sir epidemic through finite-time non-pharmaceutical intervention
- understanding the covid infection curves - finding the right numbers
- exact analytical solutions of the susceptible-infected-recovered (sir) epidemic model and of the sir model with equal death and birth rates
- a mathematical model of epidemics - a tutorial for students, mathematics
- variational iteration method for solving the epidemic model and the prey and predator problem
- solution of the epidemic model by homotopy perturbation method
- a new method for solving epidemic model
- a review of the decomposition method in applied mathematics
- solving frontier problems of physics: the decomposition method
- a comparison between the variational iteration method and adomian decomposition method
- solving new fourth-order emden-fowler-type equations by the adomian decomposition method
- a reliable algorithm for positive solutions of nonlinear boundary value problems by the multistage adomian decomposition method
- computation of the general relativistic perihelion precession and of light deflection via the laplace-adomian decomposition method
- analytical and numerical treatment of falkner-skan equation via a transformation and adomian's method
- solving the nonlinear biharmonic equation by the laplace-adomian and adomian decomposition methods, surveys in mathematics and its applications
- vortex solutions in atomic bose-einstein condensates via the adomian decomposition method
- solution of the epidemic model by adomian decomposition method
- a simple computational approach to the susceptible-infected-recovered (sir) epidemic model via the laplace-adomian decomposition method
- on the integrability of the sir epidemic model with vital dynamics

key: cord- - unrcb f
authors: gaeta, giuseppe
title: social distancing versus early detection and contacts tracing in epidemic management
date: - -
journal: chaos solitons fractals
doi: . /j.chaos. .
sha:
doc_id: cord_uid: unrcb f

different countries – and sometimes different regions within the same countries – have adopted different strategies in trying to contain the ongoing covid- epidemic; these mix in variable parts social confinement, early detection and contact tracing.
in this paper we discuss the different effects of these ingredients on the epidemic dynamics; the discussion is conducted with the help of two simple models, i.e. the classical sir model and the recently introduced variant a-sir (arxiv: . ) which takes into account the presence of a large set of asymptomatic infectives. different countries are tackling the ongoing covid- epidemic with different strategies. awaiting a vaccine to become available, the three tools at our disposal are contact tracing, early detection and social distancing. these are not mutually exclusive, and in fact they are used together, but the accent may be more on one or the other. within the framework of classical sir [ ] [ ] [ ] [ ] [ ] and sir-type models, one could say (see below for details) that these strategies aim at changing one or the other of the basic parameters in the model. in this note we want to study - within this class of models - what the consequences of acting in these different ways are. we are interested not only in the peak of the epidemic, but also in its duration. in fact, it is everybody's experience these days that social distancing - with its consequence of stopping all kinds of economic activities - has a deep impact on our life, and in the long run is producing impoverishment and thus a decline in the living conditions of a large part of the population. in the present study we will not focus specifically on covid, but discuss the matter in general terms and by means of general-purpose models. our examples and numerical computations will however use data and parameters applying to (the early phase of) the current covid epidemic in northern italy, in order to have realistic examples and figures; we will thus use data and parameters arising from our analysis of epidemiological data in the early phase of this epidemic [ , ]. unavoidably, we will also here and there refer to the covid case.

[author e-mail address: giuseppe.gaeta@unimi.it]
some observations deviating from the main line of discussion - or which we want to pinpoint for easier reference to them - will be presented in the form of remarks. the symbol marks the end of remarks. in the sir model [ ] [ ] [ ] [ ] [ ] , a population of constant size (this means the analysis is valid over a relatively short time-span, or we should consider new births and also deaths not due to the epidemic) is subdivided into three classes: susceptibles, infected (and by this also infectives), and removed. the infected are supposed to be immediately infective (if this is not the case, one considers the so-called seir model to take the delay into account), and the removed may be recovered, dead, or isolated from contact with susceptibles. we stress that while in usual textbook discussions of the sir model [ ] [ ] [ ] [ ] the removed are either recovered or dead, in the framework of covid modeling the infectives are removed from the infective dynamics - i.e. do not contribute any more to the quadratic term in the eqs. ( ) below - through isolation. this means in practice hospitalization in cases where the symptoms are heavy and a serious health problem develops, and isolation at home (or in other places; e.g. in some countries or regions specific hotels were used to this aim) in cases where it is estimated that there is no relevant risk for the health of the infective. in this sense, the reader should pay attention to the meaning of r in the present context. the nonlinear equations governing the sir dynamics are written as

ds/dt = − α s i
di/dt = α s i − β i    ( )
dr/dt = β i.

these should be considered, in physicists' language, as mean field equations; they hold under the (surely not realistic) assumption that all individuals are equivalent, and that the numbers are sufficiently large to disregard fluctuations around mean quantities.
note also that the last equation amounts to a simple integration, r(t) = r(0) + β ∫_0^t i(y) dy; thus we will mostly look at the first two equations in ( ). we also stress, however, that epidemiological data can only collect time series for r(t): so this is the quantity to be compared to experimental data [ ]. in fact, as stressed in remark , in the case of a potentially dangerous illness (as covid), once the individuals are identified as infective, they are effectively removed from the epidemic dynamics through hospitalization or isolation. according to our eqs. ( ), s(t) is always decreasing as long as there are infectives. the second equation in ( ) immediately shows that the number of infectives grows if s is above the epidemic threshold

γ = β/α. ( )

thus to stop an epidemic once the numbers are too large to isolate all the infectives, we have three (non mutually exclusive) choices within the sir framework: (a) do nothing, i.e. wait until s(t) falls below the epidemic threshold; (b) raise the epidemic threshold above the present value of s(t) by decreasing α; (c) raise the epidemic threshold above the present value of s(t) by increasing β. in practice, any state will try to both raise β and lower α, and if this is not sufficient, wait until s falls below the attained value of γ. in order to understand how this is implemented, it is necessary to understand what α and β represent in concrete situations. the parameter β represents the removal rate of infectives; its inverse β^(−1) is the average time the infectives spend being able to spread the contagion. raising β means lowering the time from infection to isolation, hence from infection to detection of the infected state. the parameter α represents the infection rate, and as such it includes many things.
it depends both on the infection vector characteristics (how easily it spreads around, and how easily it infects a healthy individual who gets in contact with it), but it also depends on the occasions of contacts between individuals. so, roughly speaking, it is proportional to the number of close enough contacts an individual has with other ones per unit of time. it follows that - if properly implemented - social distancing results in reducing α. each of these two actions presents some problems. there is usually some time for the appearance of symptoms once an individual is infected, and the first symptoms can be quite weak. so early detection is possible only by fast tracing and laboratory checking of all the contacts of those who are known to be infected. this has a moderate cost (especially if compared to the cost of an intensive care hospital stay) but requires an extensive organization. on the other hand, social distancing is cheap in immediate terms, but produces a notable strain on societal life, and in practice - as many of the contacts are actually work related - requires stopping as many production and economic activities as possible, i.e. has a formidable cost in the medium and long run. moreover, it cannot be pushed too far, as a number of activities and services (e.g. those carrying food to people, urgent medical care, etc.) cannot be stopped. let us come back to ( ); using the first two equations, we can study i in terms of s, and find out that

i = i(0) + s(0) − s + γ log(s/s(0)).

as we know that the maximum i* of i will be reached when s = γ, this allows immediately to determine the epidemic peak. in practice, i(0) is negligible and for a new virus s(0) corresponds to the whole population, s(0) = n; thus

i* = n − γ + γ log(γ/n).

note that only γ appears in this expression; that is, raising β or lowering α produces the same effect as long as we reach the same γ. on the other hand, this simple formula does not tell us when the epidemic peak is reached, but only that it is reached when s has the value γ.
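the sir peak value i* = n − γ + γ log(γ/n) (valid for i(0) ≈ 0, s(0) = n, γ < n) can be cross-checked against a direct integration of the equations; the sketch below uses forward euler and illustrative toy parameters, not the paper's fitted values:

```python
import math

def sir_peak_analytic(n, gamma):
    # i* = n - gamma + gamma*log(gamma/n), assuming i(0) ~ 0 and s(0) = n
    return n - gamma + gamma * math.log(gamma / n)

def sir_peak_numeric(alpha, beta, n, i0=1.0, dt=0.005, max_steps=10**6):
    # forward-euler integration of ds/dt = -alpha*s*i, di/dt = alpha*s*i - beta*i;
    # returns the maximum of i(t) along the trajectory
    s, i = n - i0, i0
    ipk = i0
    for _ in range(max_steps):
        if i < 1e-6 * n:
            break
        ds, di = -alpha * s * i, alpha * s * i - beta * i
        s, i = s + dt * ds, i + dt * di
        ipk = max(ipk, i)
    return ipk
```

with, say, n = 10^4, α = 3·10^(-5) and β = 0.1 (so γ ≈ 3333), the two estimates agree to within a couple of percent.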
but if measures are taken, these should be effective for the whole duration of the epidemic, and it is not irrelevant - in particular if the social and economic life of a nation is stopped - to be able to evaluate how long this will be for. acting on α or on β to get the same γ will produce different timescales for the dynamics; see fig. , in which we have used values of the parameters resulting from our fit of early data for the northern italy covid- epidemic [ ].

[fig. : i(t), numerically integrated and plotted in arbitrary units, for given initial conditions and α, β parameters (solid), the maximum i* being reached at t = t*; then for the same initial conditions but raising β by a factor ϑ = / (dashed), with maximum i_β = r i* reached at time t_β = σ_β t*; and lowering α by the same factor ϑ = / (dotted), with maximum i_α = i_β reached at time t_α = σ_α t*. time unit is one day, α = ( / ) * − , β = / ; these parameters arise from our fitting of data from the early phase of the covid epidemic in northern italy [ ]; the population of the most affected area in the initial phase is about million, that of the whole of italy is about million. the numerical simulation is run with n = * .]

this observation can be made more precise considering the scaling properties of ( ). in fact, consider the scaling (α, β, t) → (λ α, λ β, t/λ). it is clear that under this scaling γ remains unchanged, and also the equations are not affected; thus the dynamics is the same but with a different time-scale. the same property can be looked at in a slightly different way. first of all, we note that one can write α = β/γ; moreover, α appears in ( ) only in connection with s, and it is more convenient to introduce the variable ϑ = s/γ. now, let us consider two sir systems with the same initial data but different sets of parameters, and let us for ease of notation just consider the first two equations of each.
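the statement that the same γ gives the same peak height but different timescales can be checked numerically; a sketch in python (toy parameters, chosen only so that γ < s(0), not the paper's fitted values): raising β by a factor 2 and lowering α by the same factor yield (approximately) equal peaks, with the lowered-α epidemic running twice as slowly:

```python
def sir_peak(alpha, beta, s0, i0=1.0, dt=0.01, max_steps=10**6):
    # forward-euler integration of ds/dt = -alpha*s*i, di/dt = alpha*s*i - beta*i;
    # returns (peak height, peak time)
    s, i, t = s0, i0, 0.0
    ipk, tpk = i0, 0.0
    for _ in range(max_steps):
        if i < 1e-6 * s0:
            break
        ds = -alpha * s * i
        di = alpha * s * i - beta * i
        s, i, t = s + dt * ds, i + dt * di, t + dt
        if i > ipk:
            ipk, tpk = i, t
    return ipk, tpk
```

the two strategies reach the same γ, hence essentially the same i*, but the peak time for the lowered-α run is about twice that of the raised-β run, consistently with the scaling argument above.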
thus we have the two systems

dϑ/dt = − α ϑ i,   di/dt = β (ϑ − 1) i

and

dϑ'/dt = − α' ϑ' i',   di'/dt = β' (ϑ' − 1) i'.

we can consider the change of variables τ = t/λ (λ > 0); with this, the primed system becomes dϑ'/dτ = − λ α' ϑ' i', di'/dτ = λ β' (ϑ' − 1) i'. we can thus eliminate the factor λ in both equations by absorbing it into the parameters. however, if we had chosen λ = β/β', we get β̂ = λ β' = β; if moreover γ' = γ, the resulting equation is just the unprimed system. but we had supposed the initial data for { s, i } and for { s', i' } (and hence also for ϑ and ϑ') to be the same. we can thus directly compare the two systems. we observe that { ϑ', i' } have thus exactly the same dynamics in terms of the rescaled time τ as { ϑ, i } in terms of the original time t. in particular, if the maximum of i is reached at time t*, the maximum of i' is reached at τ* = t*, and hence at t'* = λ τ* = λ t*. ( ) analytical results on the timescale change induced by a rescaling of the α and β parameters have recently been obtained by m. cadoni [ ]; see also [ ]. we have supposed infected individuals to be immediately infective. if this is not the case an "exposed" class should be introduced. this is not qualitatively changing the outcome of our discussion, so we prefer to keep to the simplest setting. (moreover, for covid it is known that individuals become infective well before developing symptoms, so that our approximation is quite reasonable.)
note that in this case removal amounts to healing; so while the removal time β^(−1) for known infected corresponds to the time from infection to isolation, thus in general slightly over the incubation time t_i (this is t_i ≈ . days for covid), the removal time η^(−1) for unrecognized infects will correspond to incubation time plus healing time. in the model, it is supposed that symptomatic and asymptomatic infectives are infective in the same way. this is not fully realistic, as one may expect that somebody having the first symptoms will however be more retired, or at least other people will be more careful in contacts; but this assumption simplifies the analysis, and is not completely unreasonable considering that for most of the infection-to-isolation time β^(−1) the symptoms do not show up. the equations for the a-sir model [ ] are

ds/dt = − α s (i + j)
di/dt = ξ α s (i + j) − β i
dj/dt = (1 − ξ) α s (i + j) − η j
dr/dt = β i
du/dt = η j.

note that here too we have a "master" system of three equations (the first three), while the last two equations amount to direct integrations. the parameter ξ ∈ [ , ] represents the probability that an infected individual is detected as such, i.e. falls in the class i. in the absence of epidemiological investigations to trace the contacts of known infectives, this corresponds to the probability of developing significant symptoms. in the first (arxiv) circulated version [ ] of our previous work [ ], some confusion about the identification of the class j was present, as this was sometimes considered to be the class of asymptomatic infectives, and sometimes that of not registered ones. while this is not too much of a problem considering the "natural" situation, it becomes so when we think of action on this situation. actually, and unfortunately, this confusion has a consequence exactly on one of the points we want to discuss here, i.e. the effect of a campaign of chasing the infectives, e.g. among patients with light symptoms or within social contacts of known infectives; let us thus discuss this point briefly.
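a minimal forward-euler integration of the a-sir model (taking ds/dt = −αs(i+j), di/dt = ξαs(i+j) − βi, dj/dt = (1−ξ)αs(i+j) − ηj, dr/dt = βi, du/dt = ηj); the splitting of the initial infective between i and j according to ξ, and the parameter values in the usage note, are assumptions made for illustration:

```python
def asir_run(alpha, beta, eta, xi, n, i0=1.0, t_end=300.0, dt=0.01):
    # forward-euler integration of the a-sir equations; returns final (s, i, j, r, u).
    # the total s + i + j + r + u is conserved exactly by the scheme.
    s, i, j, r, u = n - i0, xi * i0, (1 - xi) * i0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        k = i + j
        ds = -alpha * s * k
        di = xi * alpha * s * k - beta * i
        dj = (1 - xi) * alpha * s * k - eta * j
        dr, du = beta * i, eta * j
        s, i, j = s + dt * ds, i + dt * di, j + dt * dj
        r, u = r + dt * dr, u + dt * du
    return s, i, j, r, u
```

with a small detection probability ξ, the unregistered removed u end up far outnumbering the registered removed r, since a fraction ξ of all infections passes through class i.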
if j is considered to be the set of asymptomatic virus carriers, then a rise in the fraction of these who are known to be infective, and thus isolated, means that the average time for which asymptomatic infectives are not isolated is decreasing. in other words, we are lowering η^(−1) and thus raising η. on the other hand, in this description ξ is the probability that a new infective is asymptomatic, and this depends only on the nature of the virus and its interactions with the immune system of the infected people; thus in this interpretation ξ should be considered as a constant of nature, and it cannot be changed. (this is the point of view taken in [ ]; however some of the assumptions made in its first version [ ] were very reasonable only within the concurrent interpretation, described in a moment.) on the other hand, if j is the class of unknown infectives, things are slightly different. in fact, to be in this class it is needed ( a ) that the individual has no or very light symptoms; but also ( b ) that he/she is not traced and analyzed by some epidemiological campaign, e.g. due to contacts with known infected or because belonging to some special risk category (e.g. hospital workers). in this description, η is a constant of nature, depending on the nature of the virus and on the response of the "average" immune system of (asymptomatic) infected people, while efforts to trace asymptomatic infectives will act on raising the probability ξ. we want to discuss the effect of early detection of infectives, or tracing their contacts, within the second mentioned framework. note that a campaign of tracing contacts of infectives is useful not only to uncover infectives with no symptoms; if accompanied by effective isolation of the contacts of known infectives, and thus of those who are most likely to be infective, it will also reduce the removal time of "standard" (i.e. symptomatic) infectives, possibly to a time smaller than the incubation time itself.
in this sense, we will look at an increase in ξ as early detection of infectives, and at an increase in both β and η (thus a reduction in the removal times β^(−1) and η^(−1)) as tracing contacts of infectives. this should be kept in mind in our final discussion about the effect of different strategies. as mentioned above, one should also avoid any confusion between asymptomatic and pre-symptomatic infection. in our description, pre-symptomatic infectives - i.e. individuals who are infective and who do not yet display symptoms, but who will at a later stage display them - are counted in the class of "standard" infectives, i.e. those who will eventually display symptoms and hence be intercepted by the health system with no need for specific test or contact tracing campaigns, exactly due to the appearance of symptoms. actually one expects that, except for the early phase of the epidemic in the countries which were first hit in a given area (such as china for asia, or italy for europe), when symptoms could be attributed to a different illness, most infections by symptomatic people are actually pre-symptomatic, as with the appearance of symptoms people are either hospitalized or isolated at home; and even before any contact with the health system they will avoid contacts with others - and other people will surely do their best to avoid contacts with anybody displaying even light covid symptoms. in the case of asymptomatic infectives, instead, unless they are detected by means of a test or contact tracing campaign - see the forthcoming discussion - they remain infective until they recover, so that in this case removal is indeed equivalent to (spontaneous) recovery. this approach, indeed, was taken in one of the areas of early explosion of the contagion in northern italy, i.e. in vò euganeo; this had the advantage of being a small community (about residents), and all of them were tested twice while the embargo was in operation.
in fact, this was the first systematic study showing that the number of asymptomatic carriers was very high, quite above the expectations [ ]. apart from its scientific interest, the approach proved very effective in practical terms, as new infectives were quickly traced and in that specific area the contagion was stopped in a short time. while testing everybody is not feasible in larger communities, the "follow the contacts" approach could be used on a larger scale, especially with the appearance of new, very quick kits for ascertaining positivity to covid. the model will thus react to a raising of ξ by raising the fraction of i within the class of infectives, i.e. in k = i + j; but at the same time, as critical patients are always the same, i.e. represent always the same fraction of k, we should pay attention to the fact that they will now represent a lower fraction of i. the chinese experience shows that critical patients are about % of hospitalized patients (i.e. of those with symptoms serious enough to require hospitalization); and hospitalized patients represented about half of known infected, the others being cured and isolated at home. similar percentages were observed in the early phase of the covid epidemic in italy; the fraction of infectives isolated at home has afterwards diminished, but it is believed that this was due to a different policy for lab exams, i.e. checking with priority patients with multiple symptoms suggesting the presence of covid rather than following the contacts. actually this policy was followed in most of italy, but in one region (veneto) the tracking of contacts and lab exams for them were pursued, and there the percentages were much more similar to those known to hold for china. in our previous work [ ] we have considered data for the early phase of the covid epidemic in italy, and found the value of β^(−1) that best fits them, while the estimate of η^(−1) was considered as a working hypothesis.
this same work found as value of the contact rate in the initial phase α ≈ . * − , and we will use this in our numerical simulations. it should be stressed that the extraction of the parameter α from epidemiological data is based on the number s(0) ≈ n of susceptibles at the beginning of the epidemic; thus α, and hence γ, depend on the total population. the value given above was obtained considering n = * , i.e. the overall population of the three regions (lombardia, veneto and emilia-romagna) which were mostly affected in the initial phase. our forthcoming discussion, however, does not want to provide a forecast on the development of the covid epidemic in northern italy; we want instead to discuss - with realistic parameters and framework - what the differences would be if acting with different strategies in an epidemic with the general characteristics of the covid one. thus we will adopt the aforementioned parameters as "bare" ones (different strategies consisting indeed in acting on one or the other of these) but will apply these to a case study initial condition; this will be given by ( ). one important parameter is missing from this list, i.e. the detection probability ξ. following li et al. [ ], we assumed in previous work that ξ is between / and /. later works (and a general public interview by the head of the government agency handling the epidemic [ ]) suggested that the lower bound is nearer to the truth; moreover a lower ξ will give us greater opportunity to improve things by acting on it (we will see this is not the best strategy, so it makes sense to consider the setting more favorable to it). we will thus run our simulation starting from a "bare" value ( ). as for the total population, we set n = * . with these choices we get a projection of what could have happened if no action was undertaken. a note by an oxford group [ ], much discussed (also in the general press [ ]) upon its appearance, hinted that in italy and uk this fraction could be as low as ξ = /.
we have ascertained that with this value of ξ, and assuming α was not changed by the restrictive measures adopted in the meanwhile, the a-sir model fits quite well the epidemiological data available to the end of april. however, despite this, we do not trust this hypothesis - at least for italy - for various reasons, such as (in order of increasing relevance): ( i ) a viral infection showing effects in only % of affected individuals would be rather exceptional; ( ii ) albeit in our opinion the effect of the social distancing measures adopted in italy is sometimes overestimated, we trust that there has been some effect; ( iii ) if only % of infected people were detected, in some parts of italy the infected population would be over %. on the other hand, the main point made by this report [ ], i.e. that only a large scale serological study, checking if people have covid antibodies, will be able to tell how diffuse the infection is - and should be performed as soon as possible - is by all means true and correct. see also [ ]. a look at eqs. ( ) shows that i will grow provided s > (γ/ξ) x, where again γ = β/α, and we have introduced the ratio x(t) = i/(i + j) of known infectives over total infectives. in other words, now the epidemic threshold γ_i = (γ/ξ) x depends on the distribution of infectives in the classes i and j. note that if x = ξ (as one would expect to happen in early stages of the epidemic), then γ_i = γ. needless to say, we have a similar result for j, i.e. j will grow as far as s > γ_j; thus the epidemic threshold for unregistered infectives is γ_j = (η/α) (1 − x)/(1 − ξ). it is important to note that x is evolving in time. more precisely, by the equations for i and j we get

dx/dt = α s (ξ − x) − (β − η) x (1 − x).

the behavior observed in fig. , which displays x(t) and related quantities on a numerical solution of eq. ( ), can be easily understood intuitively.
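setting dx/dt = 0 gives a quadratic for the equilibria of x; the sketch below assumes the form dx/dt = αs(ξ − x) − (β − η)x(1 − x), which follows from the a-sir equations, and the sample values (αs = 0.3, β = 1/7, η = 1/21, ξ = 0.1) are illustrative assumptions, not the paper's fitted numbers:

```python
import math

def x_fixed_points(alpha_s, beta, eta, xi):
    # equilibria of dx/dt = c*(xi - x) - d*x*(1 - x), with c = alpha*s, d = beta - eta,
    # i.e. the roots of d*x**2 - (c + d)*x + c*xi = 0 (assumes beta > eta)
    c, d = alpha_s, beta - eta
    disc = (c + d) ** 2 - 4 * d * c * xi
    rt = math.sqrt(disc)
    return ((c + d - rt) / (2 * d), (c + d + rt) / (2 * d))
```

for the sample values one root lies in (0, 1) (the stable one) and the other above 1, matching the qualitative description of the fixed points in the text.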
in the first phase of the epidemic, there is an exponential growth of both i and j; due to the structure of the equations, they grow at the same rate, so their ratio remains constant; on the other hand, once the dynamics gets near to the epidemic peak, the difference in the permanence time of the two (that is, the time individuals remain in the infected class) becomes relevant, and we see (plots (a) and (b) of fig. ) that not only the peak for j is higher than the one for i, but it occurs at a slightly later time. moreover, descending off the peak is also faster for i, as β^(−1) < η^(−1), and thus x further decreases, until it reaches a new equilibrium while both classes i and j go exponentially to zero. if we look at ( ) we see that for fixed s the variable x would have two equilibria (one stable with 0 < x < 1 and one unstable with x > 1, stability following from β − η > 0), easily determined by solving dx/dt = 0. numerical simulations show that - apart from an initial transient - x(t) actually stays near, but in general does not really stick to, the stable fixed point determined in this way. a relevant point should be noted here. if we consider the sum k = i + j ( ) of all infectives, the a-sir model can be cast as a sir model in terms of s, k, and q = r + u as

ds/dt = − α s k,   dk/dt = α s k − b(x) k,   dq/dt = b(x) k,   with b(x) = x β + (1 − x) η.

as x varies in time, this average removal rate is also changing. on the other hand, the basic reproduction number (brn) ρ (this is usually denoted as r , but we prefer to change this notation in order to avoid any confusion with the initial data for the known removed r(t)) for this model will be

ρ̂ = α n / b(x) = α n / [x β + (1 − x) η].

in other words, not taking the asymptomatic infectives into account leads to an underestimation of the brn.
If the standard SIR model predicts a BRN of ρ, the A-SIR model yields a BRN ρ̂ given by ( ). This means that the epidemic will develop faster, and possibly much faster, than what one would expect on the basis of an estimate of ρ based only on registered cases, which in the initial phase are a subset of the symptomatic cases, as the symptoms may easily lead to a wrong diagnosis (in the case of COVID they lead to a diagnosis of standard flu). With our COVID-related values β = / , η = / , and assuming that in the early phase x ≈ ξ, there is thus a good reason for being surprised by the fast development of the epidemic: the actual BRN is substantially higher than the one estimated from symptomatic infections [ ]. More generally, one may wonder what the effect of the "hidden" infectives J(t) is on the dynamics of the known infectives I(t), which, we recall, include the relevant class of seriously affected infectives. It appears that there are at least two contrasting effects: (1) on the one hand, the hidden infectives speed up the contagion spread and hence the rise of I(t); (2) on the other hand, they contribute to group immunity, so the larger this class, the faster (and the lower the level of I at which) group immunity will be reached. The discussion above shows that the balance of these two factors leads to a much lower epidemic peak, and a shorter epidemic time, than those expected on the basis of the standard SIR model (albeit in the case of COVID with no intervention these are still awful numbers). On the other hand, we would like to understand whether uncovering a larger number of cases (thus having prompt isolation of a larger fraction of the infectives) by early detection, i.e. raising ξ, would alter the time-span of the epidemic. It appears that this effect can only be marginal, as it shows up only past the epidemic peak. We stress that this statement refers to "after incubation" analysis; if we were able to isolate cases before they test positive, i.e.
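The underestimation of the BRN can be quantified under the assumption, consistent with the averaging argument above, that ρ̂/ρ = ξ + (1 - ξ)β/η, i.e. the mean infectious period weighted over the detected and undetected classes. The values used below are hypothetical COVID-like ones, not the paper's fitted parameters.

```python
# Ratio of the true BRN to the one estimated from registered cases only,
# assuming rho_hat / rho = xi + (1 - xi) * beta / eta  (average infectious
# time weighted over detected and undetected classes). Values illustrative.

def brn_ratio(beta, eta, xi):
    return xi + (1.0 - xi) * beta / eta

# hypothetical: 7-day removal for detected, 21-day for undetected, 10% detected
r = brn_ratio(beta=1 / 7, eta=1 / 21, xi=0.10)
print(f"rho_hat / rho = {r:.2f}")  # undetected infectives inflate the BRN
```

With these made-up numbers the true BRN is nearly three times the naive estimate, which illustrates why the early growth can look surprisingly fast.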
to substantially reduce β⁻¹, the effect could be different. We will discuss this point, related to contact tracing, later on. An ongoing epidemic is not a laboratory experiment, and apart from not having controlled external conditions, i.e. constant parameters, the very collection of data is of course not the top priority of doctors fighting to save human lives. There has been considerable debate on what would be the most reliable indicator to overcome at least the second of these problems. One suggestion is to focus on the number of deaths; but this is itself not reliable, as in many cases COVID is lethal for individuals who already had some medical problem, and registering these deaths as due to COVID or to some other cause depends on the protocol adopted, and in some cases also on political choices, e.g. in order to reassure citizens (or, at the other extreme, to stress that great care must be taken to avoid contagion). Another proposed indicator, possibly the most reliable for monitoring the development of the epidemic, is the number of patients in intensive care units. This appears to be sufficiently stable across different countries; e.g., in this respect the Italian data tend to reproduce the Chinese ones, at least in regions where the sanitary system is not overstretched. In this case, IC patients are about % of the total number of hospitalized cases; in China, and for a long time also in Italy (while the protocols for choosing would-be cases to be subjected to laboratory analysis remained stable), hospitalized cases have been about half of the known infection cases, the others having shown only minor symptoms and having been cured (and isolated) in their homes. The other, more widely used, indicator is simply the total number of known cases of infection. In view of the presence of a large class of asymptomatic infectives, this is itself strongly dependent on the protocols for chasing infectives. On the other hand, it is the most readily available indicator: e.g., the W.H.O.
situation reports [ ] provide these data. Each of these indicators, thus, has advantages and disadvantages. We will just use the WHO data on known infected. In particular, in the case of COVID we expect that, with ξ the "bare" constant describing the probability that an infection is detected, out of the class I(t) we will have a % of infected with little or no symptoms (I_L), a % of standard-care hospitalized infected (I_H), and a % of IC hospitalized infected (I_IC). Needless to say, this last class is the most critical one, also in terms of strain on the health system. More generally, we say that, with ξ the "bare" constant describing the probability that the infection under study is detected, there is a fraction χ (of the detected infections) belonging to the I_IC class; that is, I_IC(t) = χ I(t). We stress that this depends on the protocol used to trigger laboratory tests; in our general theoretical discussion, this is any such protocol, and we want to discuss the consequences of changing it in the sense of more extensive tests. We are now ready to discuss how modification of one or the other of the different parameters (α, β, ξ) on which we can act by various means will affect the A-SIR dynamics. As should be expected, this will give results similar to those holding for the SIR model, but now we have one more parameter to be considered and thus a richer set of possible actions. Fig. : the effect of a change in ξ on the I_IC class. We have used β = / , η = / , and α = . * − as in Fig. , with a total population of N = * , and run simulations with ξ = / (solid curve) and with ξ = / (dashed curve). The substantial increase in ξ produces a reduction in the epidemic peak and a general slowing down of the dynamics, but both these effects are rather small.
A more extensive test campaign will raise ξ, say from ξ₁ to ξ₂; but of course this will not change the number of the most serious cases, as these get to hospital anyway and are detected as being due to the infection in question. Thus the new fraction χ₂ of detected infections which need special care will be such that χ₂ ξ₂ = χ₁ ξ₁. In order to describe the result of raising ξ, we should thus compare plots of χ₁ I(t; ξ₁) and χ₂ I(t; ξ₂); this is what we do, indeed, in Fig. . Raising ξ corresponds to having more infectives detected, and has some advantages from the point of view of the epidemic dynamics. In practical terms, this means extending tests to a larger class of subjects, and being able to isolate a larger fraction of asymptomatic infectives with the same speed and effectiveness as symptomatic ones. A different strategy for rapid action is also possible, and it consists of rapid isolation of subjects who had contacts with people known to have been infected, or who have themselves been in contact with known infectives (and so on). In other words, the strategy would be to isolate would-be infection carriers before any symptom could show up. This means that β⁻¹ could be even smaller than the usual infection-to-isolation time (about seven days for COVID) for symptomatic infectives, and even shorter than the incubation time (about five days for COVID). It should be stressed that, as each of these "possible infected" might have a small probability of being actually infected (depending on the kind of contact chain leading to him/her from known infectives), here "isolation" does not necessarily mean top-grade isolation, but might amount to a very conservative lifestyle, also, and actually especially, within the home, where a large part of registered Chinese contagions took place. (The same large role of in-home contagion was observed in Italy in the course of the lockdown.) We have thus run a simulation in which ξ is not changed, but β is raised from β = / to β = / ; the result is shown in Fig. .
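The rescaling of χ when ξ is raised can be sketched as follows, assuming, as argued above, that the absolute number of IC-bound cases is unchanged by detecting more mild cases, so that χ₂ξ₂ = χ₁ξ₁; all numbers below are hypothetical.

```python
# When the detection probability is raised from xi1 to xi2, the number of
# IC-bound cases is unchanged (they reach hospital anyway), so the fraction
# chi of detected cases needing IC rescales as chi2 = chi1 * xi1 / xi2.
# The values are purely illustrative.

def rescale_chi(chi1, xi1, xi2):
    return chi1 * xi1 / xi2

chi1, xi1, xi2 = 0.04, 0.10, 0.20  # hypothetical: doubling detection
chi2 = rescale_chi(chi1, xi1, xi2)
print(f"chi2 = {chi2:.2f}")  # same absolute IC load, spread over more cases
```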
In this case we have a marked diminution of the epidemic peak, and a very slight acceleration of the dynamics. Fig. : the effect of a change in β on the I_IC class. We have used ξ = / , η = / , and α = . * − as in Fig. , with a total population of N = * , and run simulations with β = / (solid curve) and with β = / (dashed curve). The substantial increase in β produces a marked reduction in the epidemic peak and a very slightly faster pace of the dynamics. We have so far not discussed the most basic tool in epidemic containment, i.e. social distancing. This means acting on the parameter α by reducing it. Direct measurements on the epidemiological data for northern Italy show that this parameter can be reduced to about % of its initial value with relatively mild measures. In fact, albeit the media speak of a generalized lockdown in Italy, the measures have closed schools and a number of commercial activities, but for the rest were actually aimed more at limiting leisure walks and sports, and somewhat reducing contacts in shops or in the work environment, than at a real lockdown as adopted in Wuhan. This is a basic action to be undertaken, and in fact it is being taken by all nations. It is also the simplest one to organize (albeit with high economic and social costs in the long run), and an action which can be taken together with other ones. No doubt it should be taken immediately when an epidemic is starting, and accompanied by other measures, such as those discussed above. But here we want to continue our study of what it means by itself in terms of modification of the epidemic dynamics. It is not clear what can be achieved in terms of reduction of social contacts. In fact, once the epidemic starts, most of the dangerous contacts are the unavoidable ones, such as those arising from essential services and production activity (e.g. production and distribution of food or pharmaceutical goods), contacts at home, and above all contacts in hospitals.
Thus, after a first big leap downward, corresponding to the closing of schools and universities on the one side, of a number of unessential commercial activities on the other, and to restrictions on travel, it is difficult to further reduce social contacts; not to say that this would have huge economic and social costs, and also a large impact on general health in terms of sedentariness-related illness (and possibly mental health). A number of countries tried to further reduce social contacts by forbidding citizens to leave their homes; this makes good sense in densely populated areas, but is useless in many other areas. The fortunate slogan "stay home" risks hiding from the general public that the problem is not to seclude oneself in self-punishment, but to avoid contacts. We point out that there is a further obstacle to reducing social contacts: as seen in the context of the simple SIR model, reducing α will lower the epidemic peak, but it will also slow down the whole dynamics. While this allows us to gain precious time to prepare hospitals to stand the big wave, there is some temporal limit to an extended lockdown, and thus this tool cannot be used to too large an extent. (Fig. : the effect of a change in α on the I_IC class. We have used β = / , ξ = / , η = / , with a total population of N = * , and run simulations with α = . * − (solid curve) and with α = . * − (dashed curve). The reduction in α produces a marked reduction in the epidemic peak and also a marked slowing down of the dynamics.) We have thus run a simulation in which β and ξ are not changed, while α is reduced by a factor . (smaller factors, i.e. smaller α, produce an untenable length of the critical phase); the result is shown in Fig. . In this case we have a significant diminution of the epidemic peak, and also a marked slowing down of the dynamics. An important remark is needed here.
It may seem, looking at this plot, that social distancing is less effective than other ways of coping with the epidemic. But these simulations concern a SIR-type model; this means in particular that there is no spatial structure in our model [ ]. The travel ban is the most effective way of avoiding the spreading of contagion from one region to the others; while the "local" measures of social distancing can (and should) be tuned to find a balance with other needs, a travel ban is the simplest and most effective way of protecting the communities which have not yet been touched by the epidemic. We can thus compare the different strategies we have been considering. This is done in Fig. , where we plot together I_IC(t) for all our different simulations, and in Table , where we compare the height of the epidemic peak, again for I_IC(t), and the time at which it is reached. Fig. : the effect of different strategies. We plot I_IC(t) for N = * in the "bare" case, i.e. for α = . * − , β = / , ξ = / , η = / , and in cases where (only) one of the parameters is changed. In particular we have the bare case (solid line), the case where ξ is changed into ξ = / (dotted), the case where β is changed to β = / (dashed), and that where α is changed to α = . * − (solid, blue). We also plot a horizontal line representing a hypothetical maximal capacity of IC units. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Table : epidemic peak (for I_IC) and time for reaching it (in days) as observed in our numerical simulations. All simulations were run with N = * and η = . In Fig. we have also drawn a line representing the hypothetical maximal capacity of IC units. This stresses that the different actions not only lower the epidemic peak, but also, and to an even larger extent, reduce the number of patients who cannot be conveniently treated.
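The comparison of strategies summarized in the table can be reproduced in a rough numerical sketch. We again assume standard A-SIR equations and purely illustrative parameters (a hypothetical baseline, then doubled ξ, doubled β, and α reduced by a quarter); the point is the qualitative ordering of peaks and peak times, not the numbers themselves.

```python
# Compare containment levers on the peak of detected infectives, using a
# forward-Euler A-SIR sketch (assumed equations; all values illustrative).

def peak_info(a, b, e, xi, N=6.0e7, K0=10.0, days=400, dt=0.05):
    """Return (peak of I, time of peak in days) for the assumed A-SIR model."""
    S, I, J = N - K0, xi * K0, (1.0 - xi) * K0
    best_I, best_t = I, 0.0
    for n in range(1, int(days / dt) + 1):
        K = I + J
        S, I, J = (S - a * S * K * dt,
                   I + (a * xi * S * K - b * I) * dt,
                   J + (a * (1.0 - xi) * S * K - e * J) * dt)
        if I > best_I:
            best_I, best_t = I, n * dt
    return best_I, best_t

base = dict(a=3.0e-9, b=1.0 / 7, e=1.0 / 21, xi=0.10)  # hypothetical baseline
p0, t0 = peak_info(**base)
p_xi, _ = peak_info(**{**base, "xi": 0.20})            # more extensive testing
p_b, _ = peak_info(**{**base, "b": 2.0 / 7})           # faster isolation
p_a, t_a = peak_info(**{**base, "a": 0.75 * 3.0e-9})   # social distancing

print(f"baseline peak {p0:.2e} at day {t0:.0f}")
print(f"xi x2 peak {p_xi:.2e}; b x2 peak {p_b:.2e}; a x0.75 peak {p_a:.2e} at day {t_a:.0f}")
```

Consistently with the discussion above, raising β lowers the peak at essentially unchanged timing, while lowering α both lowers the peak and delays it (note that when comparing the ξ runs in terms of IC load, the rescaling of χ must be applied).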
In looking at this plot, one should remember that the model does not really discuss permanence in IC units, and that I_IC are the infected who, once detected, will require IC treatment; this treatment may go on for a long time, which is the reason why IC units get saturated when treating COVID patients. So the plots are purely indicative, and a more detailed analysis (also with real parameters) would be needed to estimate the IC needs in the different scenarios. It should be stressed that the strategies of contact tracing and early detection are usually played together; but as confusion could arise on this point, let us briefly discuss it. We have tried to stress that these two actions are not equivalent: one could conduct random testing, thus uncovering a number of asymptomatic infectives, and just promptly isolate them without tracing their contacts; or, at the other extreme, one could just isolate everybody who had a (direct or indirect) contact with a known infective, without bothering to ascertain whether they are themselves infective or not. This second strategy would be as effective in containing the contagion as, and less costly in terms of laboratory tests than, that of tracking contacts, testing them (after a suitable time, so that the infection can develop and the test turn positive if this happens), and isolating only those who really turn out to be infective. The difference is that if we isolate everybody, this would involve a huge number of people (e.g. all those who have been in the same supermarket on the same day as an infective, plus their families and contacts, and so on); so in this context early detection should actually be intended as early detection of non-infectives, so that cautionary quarantine can be kept reasonably short in all the cases where it is not really needed.
Finally, we recall the trivial point, already mentioned in the introduction, that in real situations one does not have to choose between acting on one or the other of the parameters, and all kinds of actions should be pursued simultaneously. The numerical computations of the previous subsections suggest that increasing ξ, that is, detecting a larger fraction of asymptomatic infectives, is not a very efficient strategy to counter the diffusion of an infection with a large number of asymptomatic infectives, while prompt isolation of infectives is a more effective action. It should be recalled, however, that in our computations, and in particular in Fig. , where their outcomes are compared, we are focusing on the number of patients needing IC support, i.e. the most critical parameter from the point of view of the health system. In order to substantiate our conclusions, it is worth considering also different ways to evaluate the effect of the different strategies. We have thus considered a different indicator as well, i.e. the total number of infectives K(t) = I(t) + J(t). We have run several simulations, with total population N = * and with parameters α′ = μ_α α, β′ = μ_β β, η′ = μ_η η, ξ′ = μ_ξ ξ. The outcome of these simulations is displayed in Fig. ; see its caption for the parameter (that is, the modulation factor) values in the different runs. We see from Fig. that action on α substantially slows down the epidemic dynamics and reduces the epidemic peak, while action on ξ or on β alone produces only a moderate effect. On the other hand, actions affecting the value of η (alone or together with the value of β) substantially reduce the epidemic peak and slightly slow down the dynamics. It may be noted that the shapes of I_IC(t) (see Fig. ) and of K(t) (see Fig. ) are different; in particular, the decay of I_IC(t) after attaining its peak is faster than the decay of K(t). This corresponds to what is observed in the epidemiological data for Italy.
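The effect of modulating η on the total infectives K(t) = I(t) + J(t) can be sketched in the same spirit; the A-SIR equations and the modulation factor μ_η = 2 below are illustrative assumptions, not fitted values.

```python
# Effect on the peak of the total infectives K(t) = I(t) + J(t) of doubling
# eta, i.e. isolating asymptomatic infectives faster (as contact tracing
# would do). Forward-Euler A-SIR sketch; assumed equations, made-up values.

def peak_total(a, b, e, xi, N=6.0e7, K0=10.0, days=400, dt=0.05):
    S, I, J = N - K0, xi * K0, (1.0 - xi) * K0
    peak = I + J
    for _ in range(int(days / dt)):
        K = I + J
        S, I, J = (S - a * S * K * dt,
                   I + (a * xi * S * K - b * I) * dt,
                   J + (a * (1.0 - xi) * S * K - e * J) * dt)
        peak = max(peak, I + J)
    return peak

k_base = peak_total(a=3.0e-9, b=1.0 / 7, e=1.0 / 21, xi=0.10)   # mu_eta = 1
k_trace = peak_total(a=3.0e-9, b=1.0 / 7, e=2.0 / 21, xi=0.10)  # mu_eta = 2
print(k_base, k_trace)  # faster removal of J lowers the peak of K
```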
We have considered epidemic dynamics as described by "mean field" models of the SIR type; more specifically, we have first considered the classical Kermack-McKendrick SIR model [ ] [ ] [ ] [ ] [ ] and then a recently introduced modified version of it [ ], taking into account the presence of a large set of asymptomatic, and thus most frequently not detected, infectives. These models depend on several parameters, and different types of measures can to some extent change these parameters and thus the epidemic dynamics. In particular, such action can affect two basic characteristics of the dynamics, i.e. the height of the epidemic peak and the time-span of the epidemic. While it is clear that in facing a real lethal epidemic (such as the ongoing COVID epidemic) all actions which can counter it should be deployed at the same time, in this paper we have considered the result, within these models, of the different tools at our disposal, i.e. (generalized) social distancing, early detection (of asymptomatic infectives), and contact tracing (of symptomatic and asymptomatic infectives). It turns out that, both in the classical SIR model and in the modified A-SIR one, social distancing is effective in reducing the epidemic peak, and moreover it slows down the epidemic dynamics. On the other hand, early detection of asymptomatic infectives seems to have only a moderate effect on the reduction of the epidemic peak as far as critical cases are concerned, and also a very small effect on the temporal development of the epidemic. In contrast, contact tracing has a strong impact on the epidemic peak, also in terms of critical cases, and does not substantially alter the temporal development of the epidemic, at least as concerns the curve describing the most serious cases. Remark : the conclusion that early detection of asymptomatic infectives has only a moderate effect may appear paradoxical, and requires some further discussion.
First of all, we should recall that we are here actually talking about an increase of the parameter ξ (see Remark ), while in a real situation early detection of asymptomatic infectives will most likely go together with early detection of symptomatic ones, and hence a reduction in β as well. The increase of ξ per se means that some fraction of the asymptomatic will be recognized as infective and isolated on the same timescale β⁻¹ as the symptomatic infectives, while the other asymptomatic individuals will escape recognition and remain infective over a timescale η⁻¹. On the other hand, a realistic contact tracing campaign will lead to prompt isolation of symptomatic and asymptomatic alike, and thus correspond to a reduction in both β⁻¹ and η⁻¹; we have seen that this action is indeed the most effective one in terms of countering the spread of the epidemic. In other words, our result suggests that the key to fighting COVID lies not so much in detection, but in prompt isolation of infectives, and most notably of asymptomatic ones. This can be achieved only by contact tracing, as already suggested by experienced epidemiologists. Slowing down the epidemic dynamics can be a positive or negative feature depending on the concrete situation and on the desired effects. It is surely positive as far as getting ready to face the epidemic peak is concerned, in particular in the presence of a faltering health system. On the other hand, it may be negative in that maintaining a generalized lockdown for a long time can have extremely serious economic and social consequences. Balancing these two aspects is not a matter for the mathematician or the scientist, but for the decision maker; so we will not comment further on this. It should also be recalled that our analysis was conducted in terms of very simple SIR-type models, with all their limitations. In particular, we have considered no age, geographical or social structure, and only considered a population of "equivalent" individuals.
In particular, as we have noted above, in the early stage of an epidemic, which presumably develops in very populated areas, a generalized travel ban can simply stop the contagion from propagating to other (possibly less well equipped, in medical terms) areas; moreover, social distancing measures can be implemented very simply, basically by a government order (albeit, if we look at the goal of these measures, i.e. reducing the occasions for exchanging the virus, a substantial role would be played by individual protection devices, such as facial masks; in many European countries these were simply not available to the general public, and in some cases not even to medical operators, thus substantially reducing the impact of these measures), and they are thus the first action to be taken. In fact, in relation to the ongoing COVID epidemic, one of the reproaches made to many governments is usually to have been too slow or too soft in stopping crowd gatherings, surely not the contrary. On the other hand, we hope that this study makes clear what the consequences of the different options are. In particular, our study shows that contact tracing, followed by prompt isolation of would-be infected people, is the only way to reduce the impact of the epidemic without having to live with it for an exceedingly long time. The Veneto experience [ ] shows that this strategy can be effectively implemented without hurting privacy or personal freedom. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. We are now going to briefly discuss these matters; we point out that this appendix was inserted in the revised version of this paper, so it can make use of knowledge not available at the time of writing the first submitted version, and mentions papers that appeared after the first submission. Compartment models, i.e.
SIR-type ones in this context, are based on several implicit and explicit assumptions, which are not realistic in many cases, and surely not when attempting to describe the COVID epidemic, in particular in a full country. That is, among other aspects, SIR-type models are (in physics' language) mean field (averaged) models, and as such describe the dynamics and the underlying system as if:
• all individuals are equivalent in the medical sense, i.e. they all have an equivalent pre-existing health status and an equivalent immune system, and react in the same way to contact with the pathogen;
• in particular, as we know that COVID is statistically more dangerous for older people, we are completely disregarding the age structure of the population, as well as the existence of other high-risk classes related to pre-existing pathologies: all these contribute to an average over the whole population;
• all individuals are equivalent in the social sense, i.e. they all have an equivalent social activity and hence the same number and intensity of contacts with other members of the group, thus the same exposure to (possible) infectives;
• in particular, this means we are completely disregarding any geographical structure in the population, and consider in the same way people living in large cities or in remote villages, just lumping them into the same global average;
• similarly, we do not consider that work can cause some people to be especially exposed through contact with a large number of people (e.g. shop cashiers) or even with a large number of infected people (e.g. medical doctors or nurses).
Thus one cannot hope to capture, through such models, effects like the faster spreading of the infection in more densely populated areas, or the especially serious consequences of the COVID infection among older people. We stress that these could be obtained by including geographical, age or social structures in the model, i.e.
increasing the number of considered classes; in principle this should provide a finer and more realistic description of the epidemic dynamics, and in fact this is done in cases for which there is a large set of data, e.g. for influenza. Such structured models would of course lose the main attraction of the SIR model, i.e. its simplicity, which also allows one to understand the mechanisms at work in qualitative terms. In particular, a relevant intermediate class of models is that of SIR-type models on networks: these take into account geographical and social structures, and make use of known information about contacts between different groups of individuals and about the different health characteristics of different groups. The problem with these networked, or otherwise more structured, models is that the network should be inferred from data. In this respect, it could be objected that influenza monitoring over many years could give us the relevant data for the reconstruction of such a network; but it is by now everybody's experience that the social behavior of people is completely different when dealing with a well-known and not so serious (except for certain categories) illness like influenza, or with an unknown and potentially lethal one like COVID; not to mention that the restrictive measures put into effect by many governments have completely changed the interaction patterns among people, so that previously accumulated data cannot be used in the present situation. When thinking of COVID, it should also be kept in mind that, even in the countries which were first hit by the epidemic, we only have data over some months; e.g. for Italy we have about days of data.
If we were trying to give the model a geographical structure at the level of departments (which are themselves administrative units, mostly with a very varied internal geographical structure), then, as there are departments in Italy, this would require in the simplest form evaluating a × interaction matrix, and I cannot see any way to reliably build this out of such a scarce set of data. Moreover, the epidemiological data are to some extent not reliable, especially around the epidemic peak, in that they are collected in an emergency situation, when other priorities are present in hospitals (e.g. in Italy the data show a weekly modulation, which appears to be due simply to the procedure of data collection); so an even larger amount of data would be needed to filter out statistical noise and random fluctuations. In this sense, the weak point of SIR-type models, i.e. their being based on an average over the whole population, turns out to be an advantage: they contain few parameters (two for the SIR, four for the A-SIR) and are thus statistically more robust, in that fluctuations are averaged efficiently with less data than for more refined models with a large number of parameters. Similar considerations hold when one compares SIR-type models to a purely statistical description or to an "emerging behavior" approach. These approaches are extremely powerful, but they are effective when one has a large database to build on and against which to compare the outcome of the "experiment" (in this case the epidemic) under consideration. When we deal with a completely new pathogen, as with COVID, we simply don't have a database, and we can only rely on the very general features of infective dynamics, which are well encoded by SIR and SIR-like models. In other words, we are not claiming the SIR approach to be superior to others, but only that it is appropriate when we have few data, as for COVID.
Within the SIR-type class, the A-SIR model is especially simple; from the theoretical point of view its appeal lies in that it is the simplest possible model taking into account the presence of a large class of asymptomatic infectives; thus it focuses on the effect of this fact without the complications of a more detailed model. But, of course, it makes sense to rely on this model only if it is able to give a good, or at least reasonable, agreement with the observed data. Of course each infective agent has its own characteristics, and using only the general SIR model would completely overlook them, apart from the different values of the α and β parameters. Thus we have to do something more than just evaluating the SIR parameters. In our study we have identified the presence of a large class of asymptomatic infectives as one of the key problems in facing the COVID epidemic, and we have considered a simple model which allows us to focus precisely on this aspect. One should be aware that the A-SIR model focuses on this and does not consider other features of COVID; indeed, other more detailed SIR-type models for COVID have been formulated and studied (see also below). Here we take an approach which is classical in mathematical physics and mathematical modeling, i.e. to build and study the simplest model describing the phenomenon of interest. This will give results which are quantitatively worse than those of more detailed models, but which are qualitatively good, in that the model is simple enough to show clearly what the mechanisms at work are, and to make understandable the qualitative features of the dynamics and the qualitative outcome of any intervention able to modify the parameters of the model. Having said that, it remains true that, as mentioned above, it makes sense to rely on this model only if it is able to give a reasonable agreement with the observed data.
This is not the argument of this paper, and it was discussed in a previous paper [ ]; the success of the model there was the justification for the present paper, i.e. for the discussion developed here. However, the first version of this paper was submitted in mid-May, hence with two and a half months of data available, while at the time of preparing this revised version we have four months of data; this represents a substantial increase in the available data, and it makes sense to wonder if the model still describes the COVID epidemic in Italy. This is indeed the case, as shown in Fig. , which represents the epidemiological data as communicated by the Italian Health Ministry and by the WHO (and widely available online through the standard COVID databases) against a numerical integration of the A-SIR Eqs. ( ). We refer to Gaeta [ ] for a discussion of the parameters and their determination. Note that the contact rate α is assumed to vary in response to the restrictive measures (and to the availability of individual protection devices); as these measures were taken in different steps, we also have different values of α in different time intervals. More precisely, the equations were integrated for a total population of N = * for the period February through June , with initial data at day . We stress that the parameter values are the same as in [ ], even for the most recent period, not considered in that paper: the model continues to describe reasonably well the development of the epidemic in Italy. We focused on a specific SIR-type model, but several models of this type have been considered in the context of COVID modeling. Here we give a very short overview of these, with no attempt at completeness, which cannot even be imagined in such a rapidly evolving field. First of all, we note that other researchers have considered, motivated by the ongoing COVID epidemic, the temporal aspects of the standard SIR dynamics.
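A piecewise-constant contact rate α(t) of the kind described here can be handled by simply switching the value of α during the integration. The sketch below uses the assumed A-SIR equations, and made-up step times and reduction factors, not the fitted Italian values.

```python
# Sketch of integrating the A-SIR model with a piecewise-constant contact
# rate alpha(t), mimicking restrictive measures taken in steps. Equations
# and all numbers are illustrative assumptions, not the fitted values of [ ].

def run_piecewise(alpha_of_t, b, e, xi, N=6.0e7, K0=10.0, days=200, dt=0.05):
    """Forward-Euler run with time-dependent alpha; returns final (S, I, J)."""
    S, I, J = N - K0, xi * K0, (1.0 - xi) * K0
    for n in range(int(days / dt)):
        a = alpha_of_t(n * dt)
        K = I + J
        S, I, J = (S - a * S * K * dt,
                   I + (a * xi * S * K - b * I) * dt,
                   J + (a * (1.0 - xi) * S * K - e * J) * dt)
    return S, I, J

A0 = 3.0e-9  # hypothetical unrestricted contact rate
free = run_piecewise(lambda t: A0, b=1.0 / 7, e=1.0 / 21, xi=0.10)
# measures in two steps: alpha cut to 50% at day 30, then to 30% at day 45
stepped = run_piecewise(
    lambda t: A0 if t < 30 else (0.5 * A0 if t < 45 else 0.3 * A0),
    b=1.0 / 7, e=1.0 / 21, xi=0.10)
print(free[0], stepped[0])  # more susceptibles remain under stepped measures
```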
We mention in particular Cadoni [ ] (a related, but quite involved, approach had earlier been considered by Harko, Lobo and Mak [ ]) and Barlow and Weinstein [ ], who obtained an exact solution of the SIR equations in terms of a divergent but asymptotic series [ ]; see also [ , ] for a different approach to exact solutions of SIR and SIR-type models. We also note that nonlinear modifications of the bilinear infection term of the standard SIR model have been proposed, explicitly or implicitly, in attempts to relate the standard SIR model to COVID dynamics [ , ]. We find [ ] of special interest, as this work introduces a model of the epidemic dynamics coupled to the immune system, and is thus able to take into account aspects related to the viral charge of infectives. Extensions of the SIR model allowing time-dependence of the parameters, also to account for shifting public attitudes, have likewise been considered [ ]. As mentioned above (see Remark ), considering the delay between infection and the beginning of infectiveness would lead to SEIR-type models. The problem of the temporal aspects of the dynamics for this class of models has been considered by Bacaër [ ]. The role of asymptomatic transmission in this class of models has also been considered [ , ]. The approach to SIR by Barlow and Weinstein [ ] leading to an exact solution has been extended to the SEIR model [ ]. A generalization of the A-SIR model, allowing for different infectiveness of symptomatic and asymptomatic infectives, has been considered by Neves and Guerrero [ ]. More elaborate compartment models, with a larger number of compartments, have been considered by a number of authors. We would like to mention in particular two papers which we consider especially significant, i.e.,
the work by the Pavia group, in which mathematicians, statisticians and medical doctors collaborated [ ], and the work by Fokas, Cuevas-Maraver and Kevrekidis [ ], in which such a model (involving five compartments, like the present paper, but chosen in a different way) is used to discuss, as in the present paper, exit strategies from the COVID lockdown. As mentioned in the main text, one could, and should, consider epidemic dynamics on networks [ ]. Attempts to analyze the COVID epidemic in this way have of course been pursued, both on a small scale, with a network structure which can be determined by direct sociological study [ ], and on a nationwide scale [ ], where the network structure has to be estimated. This latter study [ ] also attempted to evaluate the effect of the containment measures; such a matter is of course very relevant and has been considered by many authors in many countries; even a cursory mention of these is impossible, and we will just mention one study applying to Italy [ ]. We also stress that many of the papers mentioned above, see in particular [ , ], aim at using the models they study to evaluate the effect of interventions and containment measures. Finally, we would like to end on a positive note: while on the one hand it was found that the presence of asymptomatics means that the basic reproduction number of COVID is higher than initially estimated [ , , ], the fact that the social contact rate is not uniform in the population means that the herd immunity level should be lower than predicted on the basis of standard SIR-type models [ ]; this is an especially nice result of the analysis on networks, as it depends only on general, and very reasonable, properties of the network and not on its detailed structure, thus overcoming the low-statistics problem mentioned in the Remark above.

References:
- Contributions to the mathematical theory of epidemics
- Mathematical Biology. I: An Introduction
- Essential Mathematical Biology
- The Mathematics of Infectious Diseases
- Mathematical Models in Biology (SIAM)
- arXiv: . ; Data analysis for the COVID-19 early dynamics in northern Italy
- A simple SIR model with a large set of asymptomatic infectives
- A simple SIR model with a large set of asymptomatic infectives
- How to reduce epidemic peaks keeping under control the time-span of the epidemic
- Accurate closed-form solution of the SIR epidemic model
- Asymptomatic transmission, the Achilles' heel of current strategies to control COVID-19 (editorial)
- Presumed asymptomatic carrier transmission of COVID-19
- Pre- and asymptomatic individuals contribute up to % of COVID-19 transmission
- Evidence supporting transmission of severe acute respiratory syndrome coronavirus 2 while presymptomatic or asymptomatic
- Temporal dynamics in viral shedding and transmissibility of COVID-19
- The rate of underascertainment of novel coronavirus (2019-nCoV) infection: estimation using Japanese passengers data on evacuation flights
- Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19)
- Estimating the asymptomatic proportion of coronavirus disease (COVID-19) cases on board the Diamond Princess cruise ship
- Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
- COVID-19: four fifths of cases are asymptomatic, China figures indicate
- Presymptomatic SARS-CoV-2 infections and transmission in a skilled nursing facility
- Prevalence of asymptomatic SARS-CoV-2 infection
- Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo'
- Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
- Interview to the newspaper La Repubblica
- Fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the SARS-CoV-2 epidemic
- Asymptomatic infectives and R_0 for COVID-19
- Exact analytical solutions of the susceptible-infected-recovered (SIR) epidemic model and of the SIR model with equal death and birth rates
- On the summation of divergent, truncated, and underspecified power series via asymptotic approximants
- Path integral approach to uncertainties in SIR-type systems
- Should the rate term in the basic epidemiology models be second-order?
- Immuno-epidemiological model of two-stage epidemic growth
- A time-dependent SIR model for COVID-19 with undetectable infected persons
- Un modèle mathématique des débuts de l'épidémie de coronavirus en France (A mathematical model of the beginnings of the coronavirus epidemic in France)
- Accounting for symptomatic and asymptomatic in a SEIR-type model of COVID-19
- COVID-19 pandemic: a mobility-dependent SEIR model with undetected cases in Italy
- Analytic solution of the SEIR epidemic model via asymptotic approximant
- Predicting the evolution of the COVID-19 epidemic with the A-SIR model: Lombardy, Italy and São Paulo state
- Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
- A quantitative framework for exploring exit strategies from the COVID-19 lockdown
- Spread of epidemic disease on networks
- Heterogeneous contact networks in COVID-19 spreading: the role of social deprivation
- Spread and dynamics of the COVID-19 epidemic in Italy: effects of emergency containment measures
- The impact of a nation-wide lockdown on COVID-19 transmissibility in Italy
- The impact of undetected cases on tracking epidemics: the case of COVID-19
- The disease-induced herd immunity level for COVID-19 is substantially lower than the classical herd immunity level

The work was carried out in lockdown at SMRI. I am also a member of GNFM-INdAM.

Our discussion was based on SIR-type models, and in particular on the A-SIR model. This raises several kinds of questions, which we address in this appendix.

key: cord- -ajdkasah
authors: Rojas, S.
title: Comment on "Estimation of COVID-19 dynamics 'on a back-of-envelope': does the simplest SIR model provide quantitative parameters and predictions?"
date: - -
journal: nan
doi: . /j.csfx. .
sha: doc_id: cord_uid: ajdkasah

This comment shows that data on cumulative confirmed cases from the coronavirus COVID-19 disease outbreak, in the period December , - June , , for some countries reported by the European Centre for Disease Prevention and Control, can be fitted by the exact solution of the Kermack-McKendrick approximation of the SIR epidemiological model.

Departamento de Física / Sartenejas, August ,

Chief Editor, Chaos, Solitons and Fractals

Dear Editor, the submitted article:
• provides the correct numerical solution of the Kermack and McKendrick approximation missed in the article inspiring our comment [ ];
• shows that data from nine countries are described by the full SIR epidemiological model.

With kindest regards, Professor Sergio Rojas, Departamento de Física.

In a recent article published in this journal [ ], after some (unnecessary) considerations, the author presents the logistic function (Equation ( ) in [ ]) as an alternative solution of the differential equation known as the Kermack and McKendrick approximation [ ] of the SIR epidemiological model [ , ], in order to fit data on the cumulative confirmed COVID-19 infected cases of some countries. Clearly, the proposed logistic function (Equation ( ) in [ ]) does not satisfy the initial condition R(0) = 0. In this note we show that the data of the countries discussed in reference [ ], and of a few other countries (see Figure ), can also be fitted using the R(t) solution obtained from the Kermack and McKendrick approximation of the SIR model, making the use of the Verhulst (logistic) equation ( ) in [ ] unnecessary. The SIR model considers a population of size N in which, at time t, S(t) individuals are susceptible to infection because I(t) individuals are already infected and can transmit or spread the disease to the susceptible population.
The number of individuals R(t) represents those who have recovered from the disease (which, if lethal, also includes dead individuals) and cannot be reinfected. Thus, the dynamics of the disease, introduced in 1927 by Kermack and McKendrick [ ], is modeled by the set of differential equations ( )-( ). In these equations, the parameters β (the infection rate) and γ (the recovery or removal rate of infectives) are constants: β controls the transition between S and I, Equation ( ), while γ controls the transition between I and R, Equation ( ). For an epidemic to occur [ , , , ], the number of infected individuals needs to increase from the initial number of infected individuals I_0. This happens if, at time zero, S_0 > S_c = ρ = γ/β. That is, ρ represents a critical value for an epidemic to occur, and the SIR model thus exhibits a threshold phenomenon [ ]. From a dimensional point of view, assigning no units to S, I, R, and N, the parameters β and γ have units of inverse time (measured typically in days, weeks or months in epidemiological records). Quantitatively, while the interaction in the form of the product SI makes it difficult to determine the parameter β from observed epidemiological data, from Equation ( ) the inverse of the parameter γ gives a measure of the time spent by individuals in the infectious stage. Consequently, by carefully observing the development of an infectious disease, the parameter γ can be estimated (as the inverse of the recovery or infectious period) by epidemiologists from epidemiological records. One should be aware that neither of the parameters β nor γ remains constant as the infection evolves [ , , , ]. Moreover, the assumptions on which the model is built are no longer valid as soon as sanitary interventions are applied to control the infection.
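The differential equations themselves were lost in extraction. As a sketch only, the unnormalized Kermack-McKendrick form implied by the text (β multiplies the product SI directly, so that ρ = γ/β has population units and the threshold reads S_0 > ρ) can be integrated with SciPy, which this comment reports using for its numerical work; the function names here are ours:

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, beta, gamma):
    # Unnormalized Kermack-McKendrick SIR: beta has units 1/(individuals*time),
    # so the epidemic threshold is S0 > rho = gamma / beta.
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

def run_sir(s0, i0, r0, beta, gamma, days):
    # Integrate the SIR system and sample it once per day.
    t_eval = np.arange(days + 1)
    sol = solve_ivp(sir_rhs, (0, days), [s0, i0, r0],
                    args=(beta, gamma), t_eval=t_eval)
    return sol.t, sol.y
```

For example, with S_0 = 990, γ = 0.1 and β = 5e-4, ρ = 200 < S_0, so an epidemic occurs: I(t) rises above I_0 and then decays, while S + I + R stays constant.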
As discussed in the epidemiological literature [ , , ], a straightforward combination of the SIR model equations ( )-( ) leads to a nonlinear differential equation for dR/dt, interpreted as the properly counted individuals removed (either because they have recovered or died) from medical units. For epidemics that are not severe, Kermack and McKendrick (1927) [ ] considered R(t)/ρ < and proposed an approximation for dR/dt; considering that the resulting equation admits a closed-form solution [ , ], R(t) can be written in terms of tanh(x), the hyperbolic tangent of x, and tanh⁻¹(x), its inverse. From this solution we also obtain the Kermack and McKendrick (1927) approximate solution (or KM approximation) of the SIR model [ ], with dR/dt expressed in terms of sech(x), the hyperbolic secant of x. Kermack and McKendrick were able to study the Bombay plague of 1905-06 using this solution. Using cumulative confirmed-case data reported by the European Centre for Disease Prevention and Control [ ] on the coronavirus COVID-19 pandemic outbreak, we used computing routines to fit the data using Equation ( ) written in the form of Equation ( ), setting c = c tanh(c ) to meet the initial condition R(t = 0) = 0. To find a numerical solution of the SIR model, Equations ( )-( ), in addition to the parameters β and γ we also need to know the initial conditions S_0 = S(t = 0), I_0 = I(t = 0), and R_0 = R(t = 0). As required by the SIR model, we set R_0 = 0. As already mentioned, an estimate for γ can be obtained from epidemiological records, as its inverse (1/γ) determines the average infectious period of the disease [ ]. According to the European Centre for Disease Prevention and Control, regarding the coronavirus COVID-19 pandemic [ ], the infectious period is "estimated to last for - days in moderate cases and up to two weeks on average in severe cases." Accordingly, the γ values used for computation in this comment were set to yield an infectious period in that range. For β, S_0, and I_0 it is not easy to obtain observed estimates.
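The fitted functional form is only partially legible above. Assuming a tanh shape whose constant term is fixed so that R(0) = 0 (the parameter names c1, c2, c3 below are ours, not the paper's), such a fit can be set up with scipy.optimize.curve_fit:

```python
import numpy as np
from scipy.optimize import curve_fit

def km_curve(t, c1, c2, c3):
    # Hypothetical KM-style fitting form: the constant c1*tanh(c3) is chosen
    # so that km_curve(0, ...) = 0, mirroring the R(t = 0) = 0 constraint.
    return c1 * (np.tanh(c2 * t - c3) + np.tanh(c3))

# Fit synthetic noise-free data generated from known parameters.
t = np.arange(0, 120.0)
data = km_curve(t, 5000.0, 0.08, 4.0)
popt, _ = curve_fit(km_curve, t, data, p0=[4000.0, 0.05, 3.0])
```

On real cumulative-case data the initial guess p0 matters, since the tanh fit has a mirrored second optimum at negated parameters.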
To find reasonable starting values for them, we applied a heuristic approach which turned out to be helpful in finding a numerical solution of the full SIR model that matches the data fitted by the Kermack and McKendrick solution in Equation ( ).

[Figure : the graph shows the KM approximation R(t) given in Equation ( ) (with fitting parameters shown in Table , according to Equation ( )) and the full numerical solution of the SIR epidemiological model defined by Equations ( )-( ) (with integration parameters compiled in Table ); both, with reasonable estimated absolute and relative RMSE values, are observed to fit the reported cumulative confirmed COVID-19 cases for a number of countries. Data source (available in [ ]) covers the range December , - June , .]

Then, by a standard trial-and-error approach, we were able to find suitable parameters for which the solution of the full SIR model matches the COVID-19 cases reported in this comment. To gauge how well each fit matches the data, we use the root mean square error (RMSE) and the relative root mean square error (RMSErel): the RMSE is the square root of the mean of the squared differences between observed and fitted values, and RMSErel is the RMSE divided by max(O). Here O_i is the i-th observation in the considered data set O, F_i is the corresponding value obtained by the fitting method, and max(O) is the maximum value in the data set O. As the uncertainty in the observed values O_i is unknown [ ], it is unrealistic to emphasize any further statistical measure characterizing the estimated parameters used in the analysis of the COVID-19 pandemic data set. It should be mentioned that the numerical computational work in this comment was carried out via the Python scripting language and the NumPy/SciPy/Matplotlib libraries described elsewhere [ , ]. The data for the analysis come from the European Centre for Disease Prevention and Control [ ], and the period covered at the moment this note was started was (for most countries) December , - June , .
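The two error measures reduce to short functions; this sketch follows the verbal definitions above (RMSErel is the RMSE scaled by the largest observed value):

```python
import numpy as np

def rmse(obs, fit):
    # Root mean square error between observations O_i and fitted values F_i.
    obs, fit = np.asarray(obs, float), np.asarray(fit, float)
    return float(np.sqrt(np.mean((obs - fit) ** 2)))

def rmse_rel(obs, fit):
    # Relative RMSE: RMSE divided by max(O), as defined in the text.
    return rmse(obs, fit) / float(np.max(obs))
```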
Compiled in Table are the parameters that best fit the data (for each studied country) to the function R(t) of Equation ( ) expressed in the form of Equation ( ); the results are shown in Figure . The reported RMSE and relative RMSE values indicate that a reasonable fit has been attained. The fit also indicates that it is unnecessary to use the logistic function (i.e., Equation ( ) in [ ]).

[Table : the values above were used to fit the COVID-19 confirmed-case data for each country shown in Figure ; the fit corresponds to R(t) defined via Equation ( ), written in the form of Equation ( ).]

Similarly, in Table we compiled the values of the quantities required to find the numerical solution of the full SIR epidemiological model defined by Equations ( )-( ) for each country, with the results given in Figure . Again, the reported RMSE and relative RMSE values indicate that a reasonable match has been attained. The results indicate that the SIR model is a good choice for obtaining a better understanding of COVID-19 data. We were also able to show that the full SIR model can be solved numerically so as to match the analyzed data. Since other, more complex, alternative approaches to the problem have been proposed [ ], at this point it is hard to establish for sure which model best describes the evolution of the coronavirus COVID-19 pandemic [ ]. Consequently, given that the SIR model captures some of the COVID-19 data behavior, it can provide guidance toward better insight into the evolution of the pandemic, as the only two parameters (β and γ) entering the model are more or less well understood by epidemiologists and can be estimated from the data.
Consequently, before considering more complex models (requiring many more parameters than the SIR model), it is clear that a better qualitative understanding of the parameters β and γ, in addition to the initial conditions I_0 and S_0 (restricted to N = I_0 + S_0), is necessary to give an appropriate quantitative account of an epidemic. We are confident that the methodology applied in the development of this comment could also be extended to analyze other sets of data. The author has no competing interests to declare.

References:
- Estimation of COVID-19 dynamics "on a back-of-envelope": does the simplest SIR model provide quantitative parameters and predictions?
- Estimation of COVID-19 dynamics "on a back-of-envelope": does the simplest SIR model provide quantitative parameters and predictions?
- A contribution to the mathematical theory of epidemics
- Mathematical Biology. I: An Introduction
- Modeling Infectious Diseases in Humans and Animals
- An Introduction to Mathematical Modeling of Infectious Diseases
- Introduction to Phase Transitions and Critical Phenomena
- Geographic distribution of COVID-19 cases worldwide
- COVID-19 pandemic modeling is fraught with uncertainties
- Learning SciPy for Numerical and Scientific Computing
- Numerical and Scientific Computing with SciPy (book-video)
- Analysis and forecast of COVID-19 spreading in China, Italy and France

The author is grateful to an anonymous referee who kindly provided useful comments toward improving this article.

key: cord- -ho q e
authors: Huang, Tongtong; Chu, Yan; Shams, Shayan; Kim, Yejin; Allen, Genevera; Annapragada, Ananth V; Subramanian, Devika; Kakadiaris, Ioannis; Gottlieb, Assaf; Jiang, Xiaoqian
title: Population stratification enables modeling effects of reopening policies on mortality and hospitalization rates
date: - -
journal: nan
doi: nan
sha: doc_id: cord_uid: ho q e

Objective: we study the influence of local reopening policies on the composition of the infectious population and their impact on future hospitalization and mortality rates.
Materials and methods: we collected datasets of daily reported hospitalizations and cumulative mortality of COVID-19 in Houston, Texas, from May , until June , . These datasets come from multiple sources (USA Facts, the Southeast Texas Regional Advisory Council COVID-19 report, TMC daily news, and the New York Times county-level mortality reporting). Our model, risk-stratified SIR-HCD, uses separate variables to model the dynamics of the low-contact (e.g., work from home) and high-contact (e.g., work on site) subpopulations while sharing parameters to control their respective R_0(t) over time. Results: we evaluated our model's forecasting performance in Harris County, TX (the most populated county in the greater Houston area) during the Phase I and Phase II reopening. Not only did our model outperform other competing models, it also supports counterfactual analysis to simulate the impact of future policies in a local setting, which is unique among existing approaches. Discussion: local mortality and hospitalization are significantly impacted by quarantine and reopening policies. No existing model has directly accounted for the effect of these policies on local trends in infections, hospitalizations, and deaths in an explicit and explainable manner. Our work is an attempt to close this important technical gap to support decision making. Conclusion: despite several limitations, we think this is a timely effort to rethink how best to model the dynamics of pandemics under the influence of reopening policies. COVID-19 has taken the international community by surprise [ ]. At the time of writing this paper, the COVID-19 pandemic has surpassed million confirmed cases and , deaths worldwide [ ]. COVID-19 is having a dramatic impact on health care systems in even the most developed countries [ ]. Without effective vaccines and treatments in sight, the only effective actions include policies of containment, mitigation, and suppression [ ].
The infection, hospitalization, and mortality trends of COVID-19 across different countries vary considerably and are affected mainly by policy-making and resource mobilization [ ]. Predicting the local trends of the epidemic is critical for the timely adjustment of medical resources and for the evaluation of policy changes in an attempt to curtail the economic impact [ ]. In the United States, policies vary by state and city, and therefore robust local models are essential for learning fine-grained changes that meet the needs of local communities and policymakers. Under appropriate intervention, early studies observe a trajectory of consumption recovery near the end of the eight-week post-outbreak period (following the classical epidemiology models) [ ]. However, traditional models do not account for the impact of local policies, such as a multiphase reopening. The recent rebounds in Texas indicate different trends in different counties, which motivated the need to study the underlying impact of policy on local mortality and hospitalization trends. In this paper, we present the design of our regional model and demonstrate its use by applying it to the Houston, TX area, marking its difference from global trend-estimation models. Owing to the lack of consistent and accurate estimates of infection rates in asymptomatic individuals (using, e.g., random serological testing [ ]), we focus on mortality and hospitalization. We present the development of a forecasting model using local fine-grained hospital-level data to track the changes in hospitalization and mortality rates owing to reopening orders in the greater Houston area, encompassing nine counties in the state of Texas, USA. The modeled area consists of , km², incorporating a population of , , adults and , , children (by the census) and includes over hospitals with a total bed capacity of , [ , ]. Our methodological contribution is directly modeling the impact of phased reopening.
We achieve this by splitting the target population into low-contact and high-contact groups (determined by the subpopulations that return to work at different phases of the reopening). The mechanism adjusts the proportion of infectious subpopulations (depending on their category of jobs) to quantitatively represent the policy impact on the epidemiological dynamical system (please refer to Figure for a high-level overview). It can be built into most existing epidemiological models with ease, offering additional explainability and better predictive efficacy. We demonstrate our new approach using a policy-aware risk-stratified susceptible-infectious-recovered-hospitalized-critical-dead (SSIR-HCD) model, which compares favorably to existing methods (including our neural-network latent-space model, a nonlinear extension of SIR-HCD). There are many predictive models for COVID-19 trend prediction. The Centers for Disease Control and Prevention (CDC) also hosts different trend predictors [ , ] that forecast total deaths. There are several broad categories:
• Purely data-driven models (with no modeling of disease dynamics), which include regression-based parametric and non-parametric models (auto-regressive integrated moving average or ARIMA, support vector regression, random forest), neural-network (deep learning) trend predictors (e.g., GT-DeepCOVID [ ]), etc.
• Epidemiology-based dynamic models that group populations into a discrete set of compartments (i.e., states) and define ordinary differential equation (ODE) rate equations describing the movement of people between compartments: SEIR (susceptible, exposed, infected, recovered) models and their myriad variants are examples in this category.
• Individual-level network-based models: the finest-grained modeling of a population through agent simulation, such as the models built in NetLogo by Marathe et al. [ ] and NotreDame-FRED [ ].
• Various ensemble and hybrid models: including the Imperial College London short-term ensemble forecaster [ ] and the IHME model [ ], which combines a mechanistic disease-transmission model with a curve-fitting approach.
Among existing models, the ODE compartment-based models occupy a middle ground between network models at the individual level and purely count-driven statistical analyses that are agnostic to disease dynamics; these will be our main interest in this paper. Compartment models, which originated in the early 20th century [ ], still represent the mainstream in epidemiological studies of infectious disease. They make a critical mathematical simplification by decomposing the entire population into compartments (i.e., states), e.g., susceptible, infectious, recovered, and use ODEs to model the transitions between the compartments (Table ). These compartment models assume that the observation counts in the various compartments naturally reflect a reproduction number that changes over time. The recent COVID-19 pandemic, however, has introduced the need to incorporate lockdown policy interventions (i.e., how long the population will remain at home), which existing compartment models have not considered. We observe different patterns of hospitalization and mortality even within a single metropolitan area such as Houston, TX, which means traditional epidemiological systems might not be sufficient to explain the dynamics. Many have speculated that local policies (shutdown and reopening) could have introduced perturbations to the disease dynamics; still, it is not clear how to quantify their impacts and provide counterfactual reasoning to support future policy decisions. Our SSIR-HCD is a unique effort to close this modeling gap by using appropriate data to enrich the established compartment models.
The only other relevant model [ ] focused on anti-contagion policies, which differs significantly from our phased-reopening model in that we consider stratified risks in the population (related to people who might have more chances of exposure, depending on the phase of the reopening policy). We collected experimental datasets of the daily reported hospitalizations and cumulative mortality of COVID-19 in Houston, Texas, from May , (the start date of Phase reopening in Houston, TX) until June , . Population data were collected from USA Facts [ ], industry employment data were gathered from the U.S. Bureau of Labor Statistics [ ], and the hospitalization data originate from the Southeast Texas Regional Advisory Council (SETRAC) COVID-19 report [ ]. We used TMC daily news [ ] to set the initial length of hospitalization for our model. We also used mortality data from the New York Times county-level report [ ]. Note that the New York Times data combine confirmed and suspected cases in their reporting of mortality; to be consistent, we used SETRAC hospitalization reporting, which contains both confirmed and suspected cases. In this study, we focused on data from Harris County, the county with the largest population among the nine counties of the greater Houston, TX area. We propose a forecast model based on SIR-HCD with a novel variant of the compartments to address differences in local policy. In SIR-HCD, the entire population is divided into six subgroups: susceptible population S, infectious population I, recovered population R, hospitalized population H, critical population C, and dead population D. The transitions between subgroups are governed by nonlinear ordinary differential equations. Please refer to Table for our nomenclature. We use SIR-HCD to model the state transitions; the model is a simplification of SEIR-HCD.
We decided to drop the exposed state (E), which cannot be reliably modeled for COVID-19 because the CDC definition of exposure, staying within less than six feet for more than fifteen minutes of a person with known or suspected COVID-19 [ ], is too short a time period to be modeled adequately. Thus a simpler SIR-HCD model, which assumes the possibility of direct transitions between the susceptible state and the infectious state, is more suitable for COVID-19. In the SIR-HCD model, some susceptible people become infectious after the incubation period. Infectious people either get hospitalized or recover after a certain period of time. A proportion of the hospitalized people may be admitted to the intensive care unit (ICU), while the rest recover in the hospital. Similarly, among the critical cases (i.e., ICU patients), some people may die while the others recover. Thus, the SIR-HCD model follows a series of nonlinear ODEs to model the state transitions. Note that the reproduction number R_0(t) is dynamically changing (reflecting the several changes of quarantine policy announced in Houston). A further parameter denotes the average incubation period of COVID-19. In the equations that model S, I, H, C, and D, one parameter represents the average time a patient spends in the hospital before either recovering or becoming critical, and another denotes the average time a patient spends in a critical state before either recovering or dying. Additional parameters give the asymptomatic rate in the infected population I, the critical rate in the hospitalized population H, and the death rate in the critical population C. This model is more robust than SIR, as the introduction of more reliable observations of H, C, and D provides extra stabilization to the dynamical system. Figure illustrates the SIR-HCD model with its basic states and the transitions implied by the ODE system.
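The ODE system itself is not reproduced in the extracted text. A plausible sketch of the transitions just described (the parameter names t_inf, t_hosp, t_crit and the rates m, c, f are our labels for the quantities named above, not the paper's notation) is:

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_hcd_rhs(t, y, r_t, t_inf, t_hosp, t_crit, m, c, f, n):
    """Sketch of the SIR-HCD transitions: m is the fraction of infectious who
    recover without hospitalization, c the fraction of hospitalized who turn
    critical, f the fraction of critical cases who die."""
    s, i, h, crit, rec, d = y
    beta = r_t / t_inf                       # effective transmission rate
    ds = -beta * s * i / n
    di = beta * s * i / n - i / t_inf
    dh = (1 - m) * i / t_inf - h / t_hosp
    dcrit = c * h / t_hosp - crit / t_crit
    drec = m * i / t_inf + (1 - c) * h / t_hosp + (1 - f) * crit / t_crit
    dd = f * crit / t_crit
    return [ds, di, dh, dcrit, drec, dd]
```

The six rates sum to zero, so the total population is conserved, and D is nondecreasing, matching the cumulative-mortality series the model is fitted to.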
With reopening policies in place, there are more interactions between people, and so the likelihood of spread increases. Our expectation is either that the reproduction number remains constant (because people maintain safe distances and follow CDC protocols) or, more likely, that it increases with spotty compliance with pandemic protocols. To make the computation tractable, we model it by the inverse operation of an exponential Hill decay equation, with one parameter controlling the rate of decay and another controlling the shape of the decay; when the shape parameter equals one, the equation reduces to a monotonically increasing linear function. We set the starting point t = 0 as the reopening date, May , . The initial states H(t = 0) and D(t = 0) are the numbers of reported hospitalized cases and cumulative mortality in Harris County on that date. We decided not to rely on confirmed cases, assuming that the actual infected population is larger than the reported number (such an effect has been reported in California [ ] and New York [ ]). Since only a fraction of the actually infected patients were hospitalized on the first day, the initial infectious population I(t = 0) is estimated as a positive constant multiple of the initially hospitalized number H(t = 0). Some studies have suggested that the true positive infectious cases may be - times more numerous than the reported positives [ , ]. In the Harris County projection, we set this coefficient so that I(t = 0) is approximately equal to the "known positives" on the first day. To estimate the recovery rate, we divided Harris County's case mortality rate (the number of confirmed deaths on the current day) by the number of confirmed cases days before, as reported by the New York Times [ ]. The average mortality rate starting from May , was %; therefore, we have an estimated recovery rate of %. Accordingly, the initially recovered individuals R(t = 0) were set to this recovery fraction times the confirmed cases days earlier than the starting date (i.e., April , ).
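The exact growth law for the reproduction number is not legible in the extracted text. One reading consistent with the stated properties (an "inverse" of a Hill-type decay that reduces to a monotonically increasing linear function when the shape parameter is 1) is the following; this form is our assumption, not the paper's actual equation:

```python
import numpy as np

def r_t_growth(t, r0, rate, k):
    # Hypothetical inversion of the Hill decay r0 / (1 + (rate*t)**k):
    # an increasing curve starting at r0 on the reopening date; with k = 1
    # it is the linear ramp r0 * (1 + rate*t), the special case noted above.
    return r0 * (1.0 + (rate * np.asarray(t, dtype=float)) ** k)
```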
the number of critical individuals ( = ) is set to % of hospitalized individuals ( = ), based on the average proportion of icu usage among covid- hospitalizations in texas [ , ] . the initial susceptible population ( = ) is , where is the total population in the county. in this section, we introduce the unique aspect of our model that differentiates it from existing ones. our intuition is that people get infected either through family transmission or through social (including job) activities. in the transition from a strict stay-at-home order to reopening, the population is subject to changes in social activities, which impact their probability of infection as well as their risk of transmitting to family members. therefore, we divide the total population in harris county into two groups: a low-contact group, which includes people in industries that were still closed (e.g., the working-from-home subpopulation and their families, including those who are unemployed but not homeless), and a high-contact group, which includes people in industries that were reopened due to the economic restart (e.g., the working-on-site subpopulation and their families). intuitively, the subpopulation of people who work from home continues to stay at home and has limited chances of contacting the working subpopulation. the two groups share the same fitted parameters , ℎ , , , , , as well as the same constant incubation period , but they estimate different . we set the initial ( = ) for the low-contact group slightly lower than that of the high-contact group, and the low-contact slightly higher than or equal to that of the high-contact group. this keeps the low-contact group differentiated from the high-contact group over time. this coupling strategy makes it possible to directly reflect the impact of policy in ssir-hcd (the superscript means squared, as we model two subpopulations in the joint ssir-hcd model).
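the initial-state construction described in the preceding passages (scaled infectious count, mortality-based recovery rate, icu fraction of hospitalizations, and the susceptible remainder) can be sketched as follows. every numeric default is an illustrative assumption, since the actual values are elided in the extraction.

```python
def estimate_initial_states(hospitalized0, deaths0, confirmed_lagged,
                            infect_scale=10.0, total_pop=4_700_000,
                            icu_fraction=0.25):
    """build initial compartment values from reported counts (a sketch;
    the scaling factor, county population, lag, and icu fraction here are
    illustrative assumptions, not the paper's elided values)."""
    I0 = infect_scale * hospitalized0            # infections assumed under-reported
    mortality_rate = deaths0 / confirmed_lagged  # deaths / cases some days earlier
    recovery_rate = 1.0 - mortality_rate
    R0_comp = recovery_rate * confirmed_lagged   # recovered ~ rate * lagged cases
    H0 = hospitalized0
    C0 = icu_fraction * hospitalized0            # critical = icu share of hospitalized
    D0 = deaths0
    S0 = total_pop - (I0 + R0_comp + H0 + C0 + D0)
    return {"S": S0, "I": I0, "R": R0_comp, "H": H0, "C": C0, "D": D0}
```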
according to reopening announcements released on the texas government website [ ] and the houston employment rates by industry (reported by the greater houston partnership research [ , ] ), necessary industries such as transportation, utilities, government, and a subset of the health services kept running before and during the reopening of the economy, accounting for . % of the population in houston. after the release of the reopening phase i policies (may , ), % of the essential industries reopened, in addition to % of health services, % of professional and business services, and % of leisure and hospitality, constituting a working-on-site (high-contact) subpopulation proportion of . % after subtracting the unemployment rate of . % [ ] . the proportion of the high-contact population after reopening phase ii (may , ) was a combination of % of the essential industries, % of health services, % of professional and business services, and % of the leisure and hospitality industries. hence, the high-contact proportion during reopening phase ii was . % after subtracting the unemployment rate. our model accounts for the change of the low-contact and high-contact subpopulations between reopening phase i and reopening phase ii, therefore directly modeling the policy's impact on epidemiological data over time. our training process uses msle to minimize the errors in curve fitting. additionally, we evaluated the mean squared error (mse), but it was not used for the curve-fitting process. as the training period is very short and the observation data are highly volatile, we do not directly use the raw daily reported data for training and forecasting. similar to early work conducted by the school of public health at uthealth [ ] , we also observed some data bumps (i.e., a large number of cases counted on one date instead of spread over time) in the reported hospitalization and mortality.
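the msle objective named here, together with the rolling-average smoothing of the raw daily reports, can be sketched as follows. the paper's loss weighting and window length are elided in the extraction, so the 7-day window and the unweighted loss are assumptions.

```python
import math

def msle(y_true, y_pred):
    """mean squared logarithmic error (standard definition; log1p guards
    against zero counts). the paper's time weighting is not reproduced."""
    assert len(y_true) == len(y_pred)
    return sum((math.log1p(a) - math.log1p(b)) ** 2
               for a, b in zip(y_true, y_pred)) / len(y_true)

def rolling_average(series, window=7):
    """trailing rolling average used to smooth noisy daily reports; the
    window length is elided in the text, so 7 days is an assumption."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        chunk = series[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```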
following the same consideration, to avoid the influence of unreliable data on our modeling we used a -day rolling average to smooth the raw inputs (generating the training hospitalization and mortality data used in the experiments). the hospitalization curve represents a delayed epidemic effect since the publication of the strict stay-at-home order on march , . after reopening policies were issued in texas (may , ), their effects start to shape the dynamics in reopening phase i and reopening phase ii. our local hospitalization and mortality modeling aims to fit the most recent phases (i.e., reopening phase i and reopening phase ii), starting from may , , to june , . we validated the accuracy with data between june and june . for comparison, our baselines were time-series regression models (exponential smoothing, autoregression, and arima) and vanilla sir-hcd. we predicted hospitalization and mortality separately in the time-series regression models because they lack the capability to account for hospitalization and mortality together in one model. we also included our own neural network sir-hcd model, which is as flexible as ssir-hcd. interested readers can find the details in the appendix. trained with harris county cumulative hospitalization and mortality data in reopening phase i and reopening phase ii, our ssir-hcd model fits the trends in the training data well: reopening phase i (mse= . for hospitalization, mse= . for mortality) and reopening phase ii (mse= . for hospitalization, mse= . for mortality). as figure shows, the local hospitalization and mortality training curves are very close to the reported data, and the test curves also follow the data trends closely, which indicates that our model is not overfitting the training period. table shows the prediction accuracy of the baseline models and the risk-stratified sir-hcd (ssir-hcd) model. for the hospitalization prediction, the proposed ssir-hcd model had a significantly higher accuracy (mse= .
) compared to the baselines (mse= . , . , . for the three time-series regressions, and mse= . for vanilla sir-hcd). for mortality prediction, we found that the time-series regression models generally predict well, and our proposed model had comparable accuracy. this high accuracy of the general time-series models in mortality prediction is mainly because the mortality rates were more stable than the hospitalization curve over time. table displays the fitted values of the eight training parameters in the ssir-hcd equations for the low-contact group and the high-contact group. these fitted parameter values correspond well to the values obtained in previous studies of covid- [ ] , [ ] , [ ] , and the ratio of hospitalizations turning critical is close to the average icu proportion among hospitalizations in harris county, which was % in our initial state settings [ , ] . the constant parameter is set to . in both groups based on the values suggested by the world health organization (who) [ ] and the cdc [ ] . as a sanity check, the values in the low-contact group are indeed lower than those in the high-contact group, indicating a lower expected number of cases directly infected by individuals in the low-contact group. figure displays the ssir-hcd model's counterfactual analysis of what would have happened in the absence of reopening policies after days on may , . on the x-axis, day refers to may , , day refers to may , , and day refers to june , . we restored the proportions of low-contact people and high-contact people to the no-reopening status (corresponding to a . % high-contact proportion of the population) while keeping all the trained parameters the same. upon excluding all changes resulting from the reopening policies, both modeled hospitalization and mortality curves become dramatically flat. the hospitalization curve with intervention reaches its peak on day , reducing nearly , existing cases.
this demonstrates that quarantine policies are effective in controlling the spread of coronavirus as well as reducing the number of hospitalizations and mortality rates. similarly, figure displays the counterfactual estimation of what would have happened if the texas government had not continued to reduce limitations in reopening phase ii. in figure , the presumed reopening policies in reopening phase i represent moderate control of the hospitalization and mortality curves, reducing nearly , existing cases. since a long stay-at-home order is not economically practical, our counterfactual analysis demonstrates that moderate reopening policies, keeping essential quarantine measures (such as mask-order adoption) and opening several industries at lower capacity, may offer a reasonable middle ground between strict quarantine and a fully open economy. the chart of dynamic values shows how the dynamic differentiates the low-contact group from the high-contact group, such that the modeled hospitalization and mortality curves would be flattened by increasing the proportion of the low-contact population. the model does not use a single reproduction number value to measure the overall transmission rate, as the two subgroups have different levels of risk of getting infected. our ssir-hcd model forecasts fine-grained covid- hospitalization and mortality by accounting for the impact of local policies. one challenge is that the ssir-hcd model is very sensitive to the initial values of , , , , , and , as the number of infectious agents is nonzero at the initial time point. we have managed to avoid overfitting the local time-series curve by deploying values based on the accumulated knowledge of these initial variables and also by using a time series smoothed with a rolling -day average to alleviate the fluctuations.
after variable adjustment, the predictive results obtained a low error rate, while also yielding parameters that are close to real-world values, such as the asymptomatic rate, which is close to the . % in the covid- scenarios outcome summary [ ] . in the publicly reported data, the cumulative mortality in reopening phase ii does not perfectly follow the hospitalization trends. our expectation was that it would lag the hospitalization cases by approximately days. the actual mortality rate fluctuated in the middle of reopening phase ii (even though we had already smoothed the curve) when the number of hospitalization cases started to increase rapidly. nonetheless, our ssir-hcd model still approximates the hospitalization and mortality trends better than competing models. thus, our model is advantageous over the baseline regressions. it can fit epidemiological data with complicated shapes, such as the harris hospitalization data, based on the proportions of the low-contact and high-contact groups, and it can consider several epidemiological states together in one model that makes predictions for one or more subpopulations simultaneously. in addition to forecasting, our model offers another unique functionality to support counterfactual analysis, which can be useful in supporting critical decision-making. however, our ssir-hcd model inherits from sir-hcd the assumption of a monotonically increasing . this assumption limits forecasts into a future in which economic reopening might be paused due to the overestimation of (in the face of a large susceptible population). for example, if a local policy were to clamp down on exposure (e.g., mandating masks and other means to influence infectivity), it would not be reflected in ssir-hcd, which is an obvious weakness. one possible strategy is to introduce an adjustable control into the model, such as our extended model, called neural network sir-hcd (see appendix), which learns the quarantine strength over time to determine the change of .
additionally, our model interprets the recovered population as those who can no longer infect other individuals, under the condition that the number of susceptible individuals keeps decreasing over time. we did not consider the possibility that some covid- survivors may be reinfected after they have recovered, which could influence the modeled coronavirus transmission rate. several of these aspects involve controversial discussions in the scientific community, but a powerful model should be able to accommodate different assumptions. there are other real-world constraints that our model does not take into consideration. for example, the number of daily hospitalizations and critical patients cannot increase without limit, due to the total bed capacity in hospitals. in fact, the texas medical center reported that it reached % of icu base capacity on june , [ ] . our model did not consider hospitalization and icu delays when some hospitals are fully loaded, which would need more model parameters. yet another limitation of our model is the lack of full consideration of population density, demographic composition, daily inbound/outbound traffic flows, and medical resource disparities. for example, many patients in harris county might come from other counties, but they are treated in the texas medical center (in harris county), so the total hospitalization and mortality might not completely match the local infection rates. joint consideration of multiple counties and decomposition of hospitalized patients in terms of their residency would produce more accurate predictions. we have presented a proof of concept of a policy-aware compartmental dynamical epidemiological model that stratifies populations into low- and high-risk groups based on people's affiliated industries during the reopening phases at the county level, using limited data. we believe it is an important effort to better understand the dynamic feedback of this stratification through an ode control system.
there are many limitations and future directions that we have exposed through this exploration. we will further explore these challenges with more data and better assumptions to improve existing models.

references:
international trends of combating covid- : present and future perspectives: technium conference
covid- trend estimation in the elderly italian region of sardinia
mask wearing to complement social distancing and save lives during covid-
unfolding trends of covid- transmission in india: critical review of available mathematical models
estimation of reproduction numbers of covid- in typical countries and epidemic trends under different prevention and control scenarios
assessing the tendency of -ncov (covid- ) outbreak in china
the impact of the covid- pandemic on consumption: learning from high frequency transaction data
searching for the peak google trends and the covid- outbreak in italy
centers for disease control and prevention
home -covid forecast hub
a simulation study of coronavirus as an epidemic disease using agent-based modeling
institute for health metrics and evaluation
a contribution to the mathematical theory of epidemics
the effect of large-scale anti-contagion policies on the covid- pandemic
first-principles machine learning modelling of covid-
preliminary analysis of covid- spread in italy with an adaptive seird model
anjum. seir-hcd model. kaggle.
covid- in the united states
covid- data report
tmc daily new covid- hospitalizations -texas medical center
covid- ) data in the united states
covid- antibody seroprevalence
actual coronavirus infections vastly undercounted, c.d.c. data shows. the new york times
estimating covid- antibody seroprevalence
algorithm : l-bfgs-b: fortran subroutines for large-scale bound-constrained optimization
covid- kaggle community contributions
covid- scenarios
epidemic analysis of covid- outbreak and counter-measures in france
coronavirus disease (covid- ) situation report - .
who website
texas won't specify where hospital beds are available as coronavirus cases hit record highs. the texas tribune
quantifying the effect of quarantine control in covid- infectious spread using machine learning
a method for stochastic optimization
github link: https://github.com/shayanshams /nn-sir-hcd [neural network sir-hcd]

neural network sir-hcd model (with adjusted quarantine control). the controlling parameter within the sir-hcd model does not account for specific quarantine effects. in reality, it is reasonable to consider quarantine factors when adjusting the free parameters in our existing sir-hcd model. therefore, we utilize a multilayer perceptron (mlp) architecture [ ] , [ ] to estimate the hidden variable and augment the epidemiological estimation process. the augmented model introduces a quarantine strength term and a quarantined population . we designed ( ) as an n-layer mlp network with a weight vector , and the input vector ( ) = ( ( ), ( ), ( ), ( ), ( ), ( ), ( )). therefore, the hidden variable ( ) is estimated as ( ) = ( ( ), ). the original reproduction number at each timestep is a constant value. we aim to adjust the value of by adding the time-varying quarantine strength term ( ) so that the curve can respond more flexibly to policy changes. we utilized a multilayer perceptron (mlp) network with two hidden layers in our implementation. the deep-learning-adjusted sir-hcd model is trained by minimizing the weighted mean squared log error loss function using the adam optimizer [ ] for iterations. the loss function calculates the weighted average squared error, in which the weight at a later time has a higher value. the optimization continues until the loss value converges.

key: cord- -t dk syq authors: nadini, matthieu; zino, lorenzo; rizzo, alessandro; porfiri, maurizio title: a multi-agent model to study epidemic spreading and vaccination strategies in an urban-like environment date: - - journal: appl netw sci doi: .
/s - - - sha: doc_id: cord_uid: t dk syq
worldwide urbanization calls for a deeper understanding of epidemic spreading within urban environments. here, we tackle this problem through an agent-based model, in which agents move in a two-dimensional physical space and interact according to proximity criteria. the planar space comprises several locations, which represent bounded regions of the urban space. based on empirical evidence, we consider locations of different densities and place them in a core-periphery structure, with higher density in the central areas and lower density in the peripheral ones. each agent is assigned a base location, which represents where their home is. through analytical tools and numerical techniques, we study the formation mechanism of the network of contacts, which is characterized by the emergence of heterogeneous interaction patterns. we put forward an extensive simulation campaign to analyze the onset and evolution of contagious diseases spreading in the urban environment. interestingly, we find that, in the presence of a core-periphery structure, the diffusion of the disease is not affected by the time agents spend inside their base location before leaving it, but it is influenced by their motion outside their base location: a strong tendency to return to the base location favors the spreading of the disease. a simplified one-dimensional version of the model is examined to gain analytical insight into the spreading process and support our numerical findings. finally, we investigate the effectiveness of vaccination campaigns, supporting the intuition that vaccination in central and dense areas should be prioritized. the number of people living in urban areas has already exceeded billions and is estimated to reach billions by (ritchie and roser ) . global urbanization poses new challenges in different sectors, ranging from transportation to energy supply, environmental degradation, and healthcare (cohen ) .
among these challenges, understanding how urban environments shape the evolution of epidemic outbreaks and designing effective containment strategies have recently drawn considerable attention from researchers and the media. paradigmatic are the examples of recent outbreaks, such as sars (smith ), mers (de groot et al. , ), and the ongoing covid- (chang et al. ; chinazzi et al. ; ferguson et al. ). analyzing how diseases spread within urban environments has been the topic of various experimental and theoretical studies (eubank et al. ; satterthwaite ; alirol et al. ; neiderud ; telle et al. ; massaro et al. ) . experimental studies have offered a detailed analysis of urban environments (satterthwaite ; neiderud ) , suggesting specific preventive measures for both urban residents and travelers (alirol et al. ) . theoretical studies have provided insights on how to contain outbreaks (eubank et al. ) , as well as on possible key drivers of contagion, such as the role of human mobility patterns (massaro et al. ) and socio-economic risk factors (telle et al. ) . despite the importance of urban environments in the global diffusion of diseases (brockmann and helbing ) , how epidemic outbreaks unfold therein is yet to be fully elucidated. some attempts to mathematically describe the diffusion of diseases within and among cities can be found in metapopulation models (colizza and vespignani ; colizza and vespignani ; liu et al. ) . in these models, a fixed network of spatial localities is used to model the mobility patterns between cities, where homogeneously-mixed populations are affected by the epidemic. while metapopulation models can be, at least partially, tackled through analytical methods (colizza and vespignani ; colizza and vespignani ; liu et al. ) , considerable experimental evidence challenges the assumption of homogeneously-mixed populations, which could yield misleading estimates of the extent of epidemic outbreaks (pastor-satorras et al. ) .
on the other side of the spectrum of epidemic models, agent-based models (eubank et al. ; degli atti et al. ; gilbert ) constitute a valuable framework to offer a realistic description of how diseases diffuse within urban environments. currently, this class of models is being leveraged to predict the diffusion of the covid- (chang et al. ; ferguson et al. ) , informing the design and implementation of timely containment measures. however, those advantageous features are accompanied by some drawbacks, including the need of mobility data and models, the use of massive computational resources when the system size scales up, and the lack of analytical techniques for model characterizations. a viable approach to agent-based modeling is based on two-dimensional representations, where agents move and interact according to proximity criteria (frasca et al. ; frasca et al. ; zhou and liu ; buscarino et al. ; yang et al. ; buscarino et al. ; huang et al. ; peng et al. ) . as a first approximation, the motion of the agents can be described according to a random walk with sporadic long range jumps (frasca et al. ) . building on this approximation, it is possible to include realistic features such as nonhomogeneous infection rates (buscarino et al. ) and heterogeneous radii of interaction (huang et al. ; peng et al. ) . much work is needed, however, to fully capture and describe realistic patterns of human mobility, which are shaped by the complex structure of urban environments (alessandretti et al. ) . here, we contribute to the field of agent-based modeling by presenting a twodimensional model that is capable of reproducing a spatially inhomogeneous urban-like environment, in which a heterogeneous population follows realistic rules of mobility. inspired by previous theoretical studies (huang et al. ; peng et al. 
) , we assume that agents have a heterogeneous radius of interaction, which accounts for variations among individuals in their involvement in social behavior and activities. we consider an urban-like environment composed of multiple locations, each of them representing a well-defined region of the urban space (that is, a neighborhood of a city). through this spatial organization, our model is able to encapsulate two key features of urban environments. first, it can reproduce typical core-periphery structures, where central regions are more densely populated than peripheral ones (makse et al. ; de nadai et al. ) . second, it allows us to mimic the inhomogeneity in the movement patterns of humans, where people tend to spend most of their time in a few neighborhoods; for example, experimental studies suggest that individuals spend most of their time either at home or at work, while only sporadically visiting other neighborhoods (brasche and bischof ; schweizer et al. ; matz et al. ) . to reproduce realistic conditions for agents' movement patterns, we posit two different mobility schemes, applied within and outside the agents' base location (that is, where their home is). while the homogeneous mixing assumption seems reasonable within the agents' base location, we assume that agents tend to move outside of their base location following a gravity model and a biased random walk. hence, agents are more likely to explore regions close to their base location rather than remotely-located regions (brasche and bischof ; schweizer et al. ; matz et al. ) . from this mobility pattern, we construct a network of contacts, whose topology is examined in this study. through mathematical derivations and numerical simulations, we seek to identify analogies between the proposed agent-based model and existing temporal network approaches, where spatial mobility is lumped into nodal parameters (perra et al. ; zino et al. ; zino et al. ; nadini et al. ; nadini et al. ) .
we adopt the proposed framework to study how urban-like environments shape the diffusion of infectious diseases, using the illustrative epidemic models with the possibility of reinfection (susceptible-infected-susceptible, or sis) or permanent removal (susceptible-infected-removed, or sir) (keeling and rohani ) . our results confirm the intuition that agents' density plays a critical role in the diffusion of both sis and sir processes. in the limit case where the entire urban area consists of one location, agents that move outside the location only seldom interact with other agents, thereby hindering the contagion process. in the more realistic scenario of a core-periphery structure with multiple locations, we unexpectedly find that the time spent by agents in their base location does not influence the endemic prevalence in the sis model or the epidemic size in the sir model, which are measures of the overall fraction of the population affected by the disease. a possible explanation for this counterintuitive phenomenon may be found in the agents' mobility rules. in fact, commuting patterns that bring agents from central areas to peripheral ones may yield a reduction of the diffusion in the central areas. contrarily, commuting patterns from peripheral to central areas lead to the opposite effect. to detail the working principles of this unexpected result, we present a minimalistic one-dimensional version of the model, which is amenable to a complete analytical treatment, thereby offering some preliminary analytical insight into the role of the model parameters on epidemic spreading. we also explore the interplay between the agents' radius of interaction and their positioning in the core-periphery structure. we find that when agents with larger radii are assigned to the less dense and peripheral locations, the endemic prevalence (in the sis model) and the epidemic size (in the sir model) strongly decrease with respect to a random assignment.
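the discrete-time sis/sir dynamics referenced here (infection probability per contact and recovery probability per unit time, as defined in the corpus header) can be sketched as a synchronous update on a contact network. this is a minimal illustration, not the authors' simulation code; switching the recovered state 'R' back to 'S' turns it into the sis variant.

```python
import random

def sir_step(neighbors, status, lam, mu, rng=random.Random(1)):
    """one discrete-time sir step: each infected node transmits to each
    susceptible neighbor with probability lam, then recovers with
    probability mu. `neighbors` maps node -> list of neighbor ids;
    `status` maps node -> 'S' | 'I' | 'R'. a minimal sketch."""
    new_status = dict(status)
    for node, st in status.items():
        if st != 'I':
            continue
        for nb in neighbors[node]:
            # transmission attempts are based on the state at the start of
            # the step, so newly infected nodes do not transmit this step
            if status[nb] == 'S' and rng.random() < lam:
                new_status[nb] = 'I'
        if rng.random() < mu:
            new_status[node] = 'R'
    return new_status
```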
moreover, when agents with larger radii are assigned to denser (and central) locations, the fraction of the population affected by the disease is not appreciably increased. hence, our results support the intuition that more central areas are the crossroads of individuals commuting in a city and are critical for the spread of diseases. finally, we numerically analyze the effect of targeted vaccination strategies, which consist of immunizing a portion of the population in a specific location prior to the disease onset. consistent with the intuition that central locations play a key role in the spread of epidemic diseases, we find that the best strategy is to prioritize the vaccination of agents belonging to central urban areas. the rest of the manuscript is organized as follows. in table , we summarize the notation and the nomenclature used throughout the paper. in "model", we introduce the model of agents' mobility. in "temporal network of contacts", we describe and analyze the temporal network formation mechanism. in "epidemic processes", we analytically and numerically study the spread of epidemic processes and compare several vaccination strategies. in "discussion and conclusion", we discuss our main findings and propose further research directions. we consider n ≥ agents, labeled by positive integers v := { , . . . , n} . agents move in a square planar space with side length d > and with periodic boundary conditions (frasca et al. ) , that is, when an agent exits through one side of the square planar space, it re-appears on the opposite side. the position of agent i ∈ v at the discrete time t ∈ z ≥ in a cartesian reference frame is denoted by ( . we deploy the n agents over l locations, each of them representing a bounded portion of the square space. the set of all locations is l = { , . . .
, l} and each location ∈ l occupies a convex region of the planar space. we assume that all the locations are mutually disjoint, and we order them in ascending order according to their area, that is, a ≤ · · · ≤ a l . we hypothesize that a l d , that is, each location is much smaller than the whole square space. each agent is assigned a specific base location (that is, their home) according to a map β : v −→ l; we assume that each base location is associated with the same number of agents, n = n/l. as a result, the density of agents assigned to location varies with the location. also, locations are sorted in descending order of density, that is, ρ ≥ · · · ≥ ρ l . for simplicity, in the numerical simulations implemented throughout this paper, the convex regions are taken as circles with nondecreasing radii ≤ · · · ≤ l . inspired by empirical and theoretical studies (witten jr and sander ; vicsek ; makse et al. ; de nadai et al. ) , the radii of the locations are extracted from a power law distribution with probability density function p( ) ∝ −γ , with bilateral cutoffs such that ∈ [ min , max ] for any ∈ l.
table : notation and nomenclature
exponent of the power law distribution of locations' radii
min , max : lower and higher cut-offs of the distribution of locations' radii
g(σ ): probability density function of agents' radii of interaction
ω: exponent of the power law distribution of radii of interaction
σ min , σ max : lower and higher cut-offs of the distribution of radii of interaction
k i : degree of agent i
λ: infection probability per contact
μ: recovery probability per unit time
fraction of susceptible, infected, and recovered agents in the system
fraction of susceptible, infected, and recovered agents in
fraction of susceptible, infected, and recovered agents at distance d from the closest location
(t): contagion probability
in out,d (t): contagion probability at a distance d from the closest location
· : statistical average of the quantity "·"
expected value of the quantity "·"
probability of an event "·"

the upper bound guarantees that all locations fit in the square, and the lower bound sets a maximum to the locations' density. since the radii are power law distributed with exponent −γ , the areas of the locations are also power law distributed with exponent − γ and cutoffs such that a ∈ π min , π max . empirical studies on urban environments suggest that cities are constructed according to a core-periphery structure, whereby locations with smaller areas and denser population are located in their center, while locations with larger areas and sparser population pertain to peripheral areas (makse et al. ; de nadai et al. ) , as shown in fig. a . we implement a heuristic algorithm to generate a locations' layout according to a core-periphery structure and qualitatively reproduce empirical results. figure b shows the output generated by our algorithm, whose structure is qualitatively consistent with the empirical observations reported in fig. a . details of the algorithm used to create such a core-periphery structure are presented in appendix a.
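sampling the locations' radii from a power law with bilateral cutoffs, as described above, can be done by inverse-transform sampling. this is a sketch; the paper's exponent and cutoff values are elided in the extraction, so the test values below are illustrative.

```python
import random

def sample_radius(gamma, r_min, r_max, rng=random.Random(42)):
    """inverse-transform sample from p(r) proportional to r^(-gamma) on
    [r_min, r_max]. the gamma == 1 case is log-uniform; other exponents
    use the closed-form inverse cdf."""
    u = rng.random()
    if gamma == 1.0:
        return r_min * (r_max / r_min) ** u
    a = 1.0 - gamma
    return (r_min ** a + u * (r_max ** a - r_min ** a)) ** (1.0 / a)
```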
at the present time, the technical literature has yet to empirically study the relationship between the agents' radius of interaction and the density of their base location. in this paper, we explore different scenarios, aiming to offer a first theoretical understanding of the impact of this potential relationship on the evolution of disease processes. unless otherwise specified, we consider that the n members of each location are chosen at random, independently of their radius of interaction. we also examine the cases in which there is a correlation (positive or negative) between the agents' radius of interaction and the density of their base location: a positive correlation means that agents with a larger radius are assigned to denser (central) locations, while a negative correlation identifies the case in which agents with a larger radius are placed in the less dense (peripheral) locations.

fig. qualitative comparison between real datasets from an experimental study (de nadai et al. ) and the output of our algorithm. a experimental results on human digital activity density in the cities of milan and rome, italy. the highest density is registered in central areas, while lower densities are observed in peripheral ones. b using our algorithm, we generate l = , circular locations distributed in rings of decreasing densities. the first few rings contain the denser locations (darker central regions) and may parallel the city center of an urban environment, while the outer rings are less dense and represent peripheral areas (light gray regions). source of a: (de nadai et al. ) . parameters used to generate b: d = , , min = , max = , and γ = . . details of the generative algorithm are available in appendix a.

agents' positions evolve according to a discrete-time dynamics; hence, their positions are updated at each discrete time-step t ∈ z ≥ . the law of motion of the generic agent i depends on whether it is outside or inside its base location β(i) ∈ l. if agent i ∈ v is outside its base location, that is, (x i (t), y i (t)) ∉ β(i) , it performs a biased random walk toward its base location; on the contrary, if it is inside its base location, it can move to a random position (within its base location) or exit according to a probabilistic mechanism. specifically, if the agent is not in its base location, then x i (t + 1) = x i (t) + v cos θ i (t) and y i (t + 1) = y i (t) + v sin θ i (t). here, v > 0 is the (constant) speed and θ i (t) is an angle, determined as the sum of two terms. the first term, φ i (t), represents the angle of the direction of the shortest path from (x i (t), y i (t)) to the region β(i) , that is, to the agent's base location. the second term, α θ̃ i (t), is modulated by θ̃ i (t), a random variable that takes values uniformly in [−π, π] and is extracted independently at every time t and for every agent i, and by α ∈ [0, 1], a randomness parameter that regulates how much the agents tend to deviate from the shortest path back to their base location when they are outside it. when α = 1, the agent moves completely at random, while, when α = 0, it moves along the shortest path toward its base location. when the agent is in its base location, (x i (t), y i (t)) ∈ β(i) , the law of motion is defined as follows. given a parameter p ∈ [0, 1] (constant in time and equal for all agents), with probability 1 − p the agent moves to a position chosen uniformly at random within its base location, so that its position is completely uncorrelated with the previous one. otherwise, with probability p, the agent jumps outside its base location, landing in a position of the remaining space according to a distance-decay law. in particular, we assume that the distance from the border of the base location at which an agent jumps is the realization of an exponentially distributed random variable with rate c > 0, with probability density function p jump (d) = c e^(−c d) for d ≥ 0.
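the law of motion described above can be sketched as a single update function; this is a simplified reading of the model, assuming a circular base location given as (center, radius), and all names are illustrative.

```python
import math
import random

def step(pos, base, v, alpha, p, c, rng=random):
    """One discrete-time move of an agent. `base` = (cx, cy, R) is the
    circular base location; v, alpha, p, c are the speed, randomness,
    jump probability, and jump-decay rate described in the text."""
    x, y = pos
    cx, cy, R = base
    inside = math.hypot(x - cx, y - cy) <= R
    if not inside:
        # biased random walk: shortest-path direction plus a random deviation
        phi = math.atan2(cy - y, cx - x)
        theta = phi + alpha * rng.uniform(-math.pi, math.pi)
        return (x + v * math.cos(theta), y + v * math.sin(theta))
    if rng.random() < p:
        # jump outside: distance from the border ~ exponential with rate c
        d = rng.expovariate(c)
        theta = rng.uniform(-math.pi, math.pi)
        return (cx + (R + d) * math.cos(theta), cy + (R + d) * math.sin(theta))
    # otherwise, relocate uniformly at random within the base location
    r = R * math.sqrt(rng.random())
    theta = rng.uniform(-math.pi, math.pi)
    return (cx + r * math.cos(theta), cy + r * math.sin(theta))
```

with alpha = 0 an outside agent moves exactly one step of length v along the shortest path toward its base location, matching the deterministic limit discussed later.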
hence, the expected distance at which an agent jumps is equal to 1/c. a sensible choice of the exponent in the law in eq. ( ) yields a typical behavior observed in many empirical studies (levinson and el-geneidy ; boussauw et al. ) , whereby agents tend to gravitate within and around their base location, while sporadically initiating journeys toward farther locations (liu et al. ) . two salient snapshots of the agents' motion are illustrated in fig. a and c.

fig. in a and c, the direction and modulus of the agents' velocity are drawn with solid red arrows. the position where an agent will jump is indicated with a dotted red arrow. the four arrows around an agent indicate that it will move to a random position inside its own location. in b and d, we show the temporal network formation mechanism. agents' radii of interaction are represented with solid circles, and undirected links are represented with solid blue lines.

building on the mobility model, we construct the network of contacts, which is the means through which the disease is transmitted. in this vein, agents create undirected temporal links based on proximity with other agents. specifically, agent i ∈ v contacts all other agents located within a circle of radius σ i centered at its current position (x i (t), y i (t)). we assume that agents have heterogeneously distributed radii, extracted from a power law distribution with probability density function g(σ) ∝ σ^(−ω) for σ ∈ [σ min , σ max ]. an undirected temporal link between two agents i and j is created when the euclidean distance at time t between the position of agent i, (x i (t), y i (t)), and the position of agent j, (x j (t), y j (t)), is less than or equal to the maximum of the two radii σ i and σ j . figure b and d show two consecutive instances of the network formation process. toward modeling epidemics in urban environments, our model allows agents inside a location to interact with agents outside the location; see, for example, agents and in fig. a and b.
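the proximity rule above can be implemented directly; the sketch below builds one time-step's link set by brute force over all pairs (function and variable names are illustrative).

```python
import math

def contact_links(positions, sigmas):
    """Undirected proximity links at one time-step: agents i and j are in
    contact when their euclidean distance does not exceed the larger of
    their two radii of interaction, max(sigma_i, sigma_j)."""
    links = set()
    n = len(positions)
    for i in range(n):
        xi, yi = positions[i]
        for j in range(i + 1, n):
            xj, yj = positions[j]
            if math.hypot(xi - xj, yi - yj) <= max(sigmas[i], sigmas[j]):
                links.add((i, j))
    return links
```

note that the max rule makes interactions symmetric: an agent with a small radius still receives links passively when it falls inside the radius of a larger-radius agent, which is the source of the passive-degree effect discussed below.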
the intricacy of the motion patterns and the nonsmooth process for generating the network of contacts hinder the analytical tractability of the model in its general formulation. however, in some cases it is possible to establish analytical insight into some model features. in appendix b, we analyze the system in two specific cases: i) a free space without any location (l = 0), and ii) a deterministic law of motion of the agents outside their base locations (α = 0) with locations uniformly distributed in the plane. in these two cases it is possible to apply a mean-field approach in the limit of large systems (n → ∞) to analytically study the number of connections generated by the agents, which represent potential paths of infection throughout the population. therein, numerical simulations for large systems are provided to validate the theoretical findings. the general case of a core-periphery structure and stochasticity in the motion outside the locations is treated only through numerical simulations, in which we record all the interactions and use their time-evolution over sufficiently long time-windows (t, where t is the duration of the observation) to study key topological features (average degree and clustering coefficient). in fig. a , we consider the case without locations. our numerical results are consistent with the analytical predictions in appendix b, which are exact in the thermodynamic limit of large systems n → ∞. specifically, the expected degree of agent i is given by eq. ( ), so that agents with a larger radius of interaction have a greater average degree. note that when the agent radius is close to the minimum, that is, σ i ≈ σ min , eq. ( ) is dominated by the second summand, while when the radius is close to the maximum, that is, σ i ≈ σ max , the right-hand side of eq. ( ) scales with σ i .
equation ( ) highlights a nontrivial relationship between the expected degree of an agent and its radius of interaction, which is due to the links passively received by the agent when it is located within the radii of interaction of other agents. this relationship is different from the case of directed interactions analyzed in huang et al. ( ) and peng et al. ( ) , where e[ k i ] is proportional to σ i . in fig. b , we examine the case of multiple locations uniformly distributed in the plane. based on the theoretical derivations in appendix b, we obtain an expression, eq. ( ), for the expected number of interactions that agent i establishes in its own location in the thermodynamic limit of large systems n → ∞. we observe that e[ k in,i ] is inversely proportional to the square of the radius of the base location β(i) . in fig. b , we multiply the numerical estimation of each agent's average degree by the square of the radius of the corresponding location, to allow a graphical comparison between numerical estimations and analytical predictions. numerical results in finite-size systems are in close agreement with the analytical predictions of eq. ( ), which are exact in the limit of large systems. in order to offer insight into the influence that a core-periphery structure has on the agents' average degree, we analyze three different scenarios.

fig. eq. ( ) provides the expected degree, which is numerically estimated by tracking the corresponding agent in time. numerical results are presented using different colors and markers, corresponding to each of the locations (numerical findings share a common trend, which is well captured by the theory). in the simulations, we use the following parameters: l = , d = , min = , , max = , , σ min = , σ max = , p = . , and α = . agents are initially inside their base location and interactions are recorded after steps to allow agents to reach a steady-state configuration. other parameter values are n = , , v = , c = · − , ω = . , γ = . , and t = ,

first, we study the case in which agents are strongly tied to their base location, such that they have a low probability of jumping outside their base location (small p) and a low probability of deviating from the shortest path to return to the base location when they are outside (small randomness α), in fig. a . second, we examine the case in which the probability of jumping outside the base location and the agents' randomness in the random walk are intermediate, in fig. b . finally, we investigate the case in which agents tend to spend most of their time outside their base location (large p and α), in fig. c . as expected from the formulation of the model, we determine that agents with larger radii of interaction tend to have larger average degrees. also, agents with larger radii of interaction are more likely to contact agents outside of their base location, thereby leading to lower clustering coefficients c (saramäki et al. ) , a measure of the agents' tendency to form clusters. the results of our simulations are reported in figs. d-f. during the evolution of an epidemic process, agents with large radii might act as "superspreaders" (lloyd-smith et al. ) , which are known to play a key role in disease spreading by creating many connections and infecting agents from different locations. less expected are the relationships between agents' radii of interaction and their base location, and between agents' clustering coefficients and their radii of interaction. among the agents with a small radius of interaction, those assigned to central locations have a larger average degree than those assigned to peripheral locations. this result is independent of the time spent outside the base location (that is, independent of p and α). interestingly, the same argument does not apply when agents have a large radius of interaction. in this situation, agents assigned to peripheral locations may have a larger degree than agents assigned to central locations, because their large radius of interaction allows a multitude of interactions, independent of the position of their base location.

fig. influence of the location radius on the agents' average degree (a-c) and clustering coefficient (d-f), for three different parameter settings. average degree and clustering coefficient are numerically estimated by tracking every agent in the system. darker circles represent agents assigned to more peripheral locations, while brighter ones indicate agents belonging to more central locations. we set: a-d p = . and α = , b-e p = . and α = . , and c-f p = . and α = . . agents are initially inside their base location and contacts are recorded after steps to allow the agents to reach a steady-state configuration. other parameter values are l = , n = , , d = , min = , max = , , σ min = , σ max = , , v = , c = · − , ω = . , γ = . , and t = ,

in addition, agents assigned to central locations have a lower clustering coefficient than agents assigned to peripheral locations. this is because the former group interacts with more agents and creates less tight clusters than the latter. further, we note that the time spent outside the base location (regulated by p and α) is inversely related to the dispersion of the agents' degrees. in fact, the largest dispersion in the agents' degrees is registered when the probability of jumping outside the base location and the agents' randomness are small, in fig. a . the dispersion decreases as the probability of jumping outside the base location and the agents' randomness increase, in fig. b and fig. c . a possible explanation for this phenomenon can be based on the following argument.
the more time agents spend inside their base location, the more they remain isolated from the other agents in the system. on the contrary, agents' isolation is reduced when they spend more time outside their base location: they are able to interact with all the agents in the system and, as a consequence, the dispersion in their degrees decreases. here, we investigate the spreading of epidemics over spatially-distributed populations that behave according to the presented agent-based model. even though the complexity of the mobility mechanism and the presence of a geographical structure hinder the general mathematical treatment of the epidemics, some mathematical insight can be obtained by studying a simplified, one-dimensional version of the model. we start by presenting the one-dimensional simplification, discussing our main analytical results and validating them against numerical simulations. specifically, we focus on the impact of three salient model characteristics on epidemic processes, namely: i) the random exploration of the space governed by the parameter α, ii) the probability p of jumping outside the base location, and iii) the presence of multiple locations. interestingly, when multiple locations are present, the time spent inside the base location does not play an important role in the evolution of the contagion process. then, we consider the two-dimensional agent-based model and explore the effect of the same three salient model characteristics. we determine that the results are qualitatively equivalent to those obtained in the one-dimensional case. we continue our numerical campaign on the two-dimensional agent-based model by studying whether the disease spreading is influenced by the presence of agents with larger radii in specific regions of the core-periphery structure.
to this end, we study the presence of agents with a greater radius of interaction in either the more central or the more peripheral locations, thereby discovering that central locations are important for sustaining the overall diffusion. finally, we analyze the outcome of vaccination strategies, finding that the highest beneficial effect for the entire population is registered when the vaccination of agents in central locations is prioritized. we consider an infectious disease with the possibility of re-infection (sis model) or immunization (sir model) after the contraction of the infection. in the sis model, agents can be either susceptible to the disease or infected (keeling and rohani ) . two mechanisms characterize the epidemic dynamics: infection propagation and the recovery process. the former occurs when an infected agent contacts a susceptible one, who may become infected with probability λ, independently of the others. the latter consists of the spontaneous transition from the infected state to the susceptible one and occurs with probability μ at each unit time, independently of the others. in the sir model, instead, individuals who recover cannot be infected again and transition from the infected state to a removed state with probability μ per unit time (keeling and rohani ) . in the sis model, we examine the endemic prevalence (that is, the number of active cases in the long-term), which typically has two possible outcomes: either it quickly dies out and tends to zero, or it fluctuates around a quantity greater than zero, denoted by i * , for a nonnegligible amount of time. for the sir model, instead, the fraction of infected individuals in the system always goes to zero in the long-run. however, the total fraction of individuals who have been infected may vary, depending on the model parameters. the sir epidemic size, denoted as r ∞ , is defined as the fraction of recovered individuals at the end of the epidemic process.
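the two transition mechanisms described above can be sketched as one synchronous update over the current contact links; the function is a generic sis/sir step of this kind, not the paper's simulation code, and all names are illustrative.

```python
import random

# agent states; in the SIS variant recovery returns agents to S instead of R
S, I, R = 0, 1, 2

def epidemic_step(states, links, lam, mu, sir=True, rng=random):
    """One synchronous update: each infected agent transmits along each of
    its current contacts to susceptible neighbors independently with
    probability lam, and recovers with probability mu (to R in the SIR
    model, back to S in the SIS model)."""
    new = list(states)
    for i, j in links:                      # infection along current contacts
        for a, b in ((i, j), (j, i)):
            if states[a] == I and states[b] == S and rng.random() < lam:
                new[b] = I
    for i, s in enumerate(states):          # spontaneous recovery
        if s == I and rng.random() < mu:
            new[i] = R if sir else S
    return new
```

because recovery is checked against the old state, an agent infected in the current step cannot recover in the same step, which matches a discrete-time synchronous scheme.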
here, we propose a one-dimensional model that provides some analytical intuition on the influence that the randomness α, the probability p of jumping outside the base location, and the presence of a core-periphery structure have on the evolution of sis and sir epidemic processes. this model simplifies the two-dimensional case study by constraining agents to move on a discrete, infinitely long, one-dimensional lattice with periodic boundary conditions (that is, a ring). the l locations occupy consecutive positions on the lattice (labeled from 1 to l), and a fixed number of n = n/l agents belong to each one as their base location. to generate a contact, agents should occupy the same position along the lattice. agents belong to a unique base location in the lattice, which they may leave with probability p. we use a geometric distribution (chung and zhong ) to describe the agents' law of motion, that is, the probability of jumping to a distance d from the base location decays geometrically in d with constant rate c ∈ (0, 1), similar to eq. ( ). once outside their base location, agents move toward it by making one step in its direction, similar to the two-dimensional model with α = 0. a schematic representation of the one-dimensional model is provided in fig. . we remark that this one-dimensional model maintains some key features of the original two-dimensional agent-based model, that is: i) the presence of closely-spaced base locations, ii) a stochastic mechanism that governs the probability of jumping outside the base location, and iii) a gravity law that biases the agents to jump close to the base location according to an exponentially decaying law. a key feature that is not captured by this simplified model is the heterogeneity in the locations' density and in the agents' radii of interaction, which are numerically investigated in the two-dimensional model.
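the one-dimensional law of motion can be sketched as follows; the geometric sampling with P(d = k) ∝ c^(k−1) and the symmetric choice of jump direction are assumptions of this sketch (the paper only states a geometric decay with rate c).

```python
import random

def lattice_step(pos, base, p, c, rng=random):
    """One move on the one-dimensional lattice. An agent at its base
    location jumps away with probability p, landing at a geometrically
    distributed distance (decay rate c) in a random direction; an agent
    away from its base steps one site back toward it."""
    if pos == base:
        if rng.random() < p:
            d = 1
            while rng.random() < c:      # geometric distance: P(d=k) ∝ c^(k-1)
                d += 1
            return base + rng.choice((-1, 1)) * d
        return base                      # relocation within the base location
    return pos + (1 if pos < base else -1)   # one step toward the base
```

in this sketch the expected jump distance is 1/(1 − c), the discrete analogue of the expected distance 1/c of the continuous exponential law.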
we start our analysis by reporting the probability q that a generic agent i ∈ v is inside location , which is explicitly derived in appendix c. similarly, the probability q out,d that a generic agent is in a position that is not occupied by any location and at a distance d from the closest location is computed in appendix c, where we assume that the closest location is = . by a simple change of variables, we can write an equivalent expression when the closest location is = l. in the sir and sis processes, the disease propagates from infected agents to susceptible ones occupying the same position of the one-dimensional lattice. we denote by s(t), i(t), and (for the sir model only) r(t) the fractions of susceptible, infected, and recovered agents at time t, respectively. for large-scale systems, we can compute the fractions of susceptible, infected, and recovered agents along the lattice by using the law of large numbers (chung and zhong ) . in the thermodynamic limit of large systems n → ∞, the fractions of susceptible, infected, and recovered agents inside location are s (t) = q s(t), i (t) = q i(t), and r (t) = q r(t), respectively. similarly, the fractions of susceptible, infected, and recovered agents at a distance d from the closest location are s out,d (t) = q out,d s(t), i out,d (t) = q out,d i(t), and r out,d (t) = q out,d r(t), respectively. in the thermodynamic limit of large systems n → ∞, the evolution of the fraction of infected agents at time t + 1 is determined by a balance equation, eq. ( ), in which (t) is the contagion probability of an agent inside its base location at time t and out,d (t) is the contagion probability of an agent at distance d from the closest location at time t. the derivation of these expressions is reported in appendix c. the evolution of the fraction of infected agents in eq.
( ) depends on four terms: i) the fraction of infected agents at time t, ii) the fraction of newly recovered agents, iii) the fraction of newly infected agents in any location, and iv) the fraction of newly infected agents outside all the locations. the evolution of the sis model is fully determined by eq. ( ), since s(t) = 1 − i(t). for the sir model, instead, eq. ( ) should be coupled with a further equation, eq. ( ), which describes the evolution of the fraction of recovered agents, and with the conservation constraint s(t) = 1 − r(t) − i(t). the evolution of the fraction of recovered agents depends only on the fraction of recovered agents at time t and the fraction of newly recovered agents. in order to gain qualitative insight into the behavior of the sis and sir epidemic processes described by eqs. ( ) and ( ), we compute the epidemic threshold of both processes by studying the stability of the disease-free equilibrium of eq. ( ). we linearize eq. ( ) and expand the expressions for the contagion probabilities in eqs. ( ) and ( ) about the disease-free equilibrium i * = 0. the epidemic threshold is then computed by imposing i(t + 1) ≤ i(t) in the linearized equation. in the case of one location, l = 1, the threshold in eq. ( ) reduces to a closed-form expression, obtained by substituting the explicit terms for q and q out,d from eqs. ( ) and ( ), respectively, and computing the sum of the resulting series. from inspection of eq. ( ), we observe that increasing the probability p of jumping outside the location contributes to increasing the epidemic threshold and thus lowers the endemic prevalence and the epidemic size. when many locations are present, that is, l → ∞, the second term in the denominator yields a marginal contribution to the epidemic threshold in eq. ( ), so that the epidemic threshold becomes independent of the probability p of jumping outside the location.
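the structure of the recursion (current infected, minus newly recovered, plus newly infected) can be illustrated with a coarse stand-in that lumps the location-dependent contagion probabilities into a single effective per-step infection rate; this is not the paper's full recursion from the appendix, only a homogeneous sketch of its four-term balance.

```python
def sis_mean_field(i0, lam_eff, mu, steps=2000):
    """Iterate a simplified discrete-time SIS mean-field recursion,
        i(t+1) = i(t) - mu*i(t) + lam_eff*(1 - i(t))*i(t),
    where lam_eff lumps the location-dependent contagion probabilities
    into one effective rate (a coarse stand-in for the full model)."""
    i = i0
    for _ in range(steps):
        i = i - mu * i + lam_eff * (1.0 - i) * i
    return i
```

imposing i(t+1) ≤ i(t) near i = 0 in this simplified recursion gives the threshold lam_eff = mu: below it the prevalence decays to zero, above it the iteration settles at the endemic value 1 − mu/lam_eff.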
we conclude the analysis of the one-dimensional model by numerically studying the effect of the agents' randomness α and of the probability p of jumping outside the location on the sis endemic prevalence and the sir epidemic size. these numerical simulations extend our analytical predictions, which are limited to the case α = 0. we consider two scenarios, one with l = locations, presented in fig. , and the other with l = locations, illustrated in fig. . our simulations suggest that increasing the agents' randomness α hinders the diffusion of both sis and sir epidemic processes. when only one location is present, increasing the probability of jumping outside the location (that is, shortening the time spent inside the base location) hinders both sis and sir epidemic processes. interestingly, when multiple locations are present, increasing p does not impact the evolution of the epidemic processes. our numerics for α = 0 in figs. b,d and b,d indicate the potential use of the analytical expressions in eqs. ( ) and ( ) for systems of finite size, with n = , agents. we now consider the two-dimensional agent-based model and numerically study the influence of the randomness α, the probability p of jumping outside the base location, and the presence of a core-periphery structure on the evolution of sis and sir epidemic processes. we start our analysis by exploring the case of a space containing one location, that is, l = , which is the base for all the agents.

fig. influence of the agents' randomness, α, and probability of jumping outside the location, p, on the sis endemic prevalence, a-b, and sir epidemic size, c-d. theoretical values of the sis endemic prevalence, b, and sir epidemic size, d, are computed by evaluating the steady state in eqs. ( ) and ( ), respectively. curves represent the median of independent simulations; % confidence bands are displayed in gray. agents are initially inside their base location and the infection starts after steps to allow the agents to reach a steady-state configuration. the fraction of randomly chosen initial infected agents is . . other parameter values are l = , n = , , d = , , r = . , c = . , λ = . , and μ = .

agents can be either inside or outside their base location. their motion is constrained by the boundary of the location when they are inside it, while it is governed by the parameters α and p when they are outside. our results reveal that increasing either α or p reduces the impact of the epidemic disease, both in the case of possible reinfection (sis), as shown in fig. a , and in the case of immunization after recovery (sir), as illustrated in fig. b . specifically, in the sis process, the endemic prevalence i * is high when α and p are low because agents spend more time inside the location, which is the densest region of the entire space, thus favoring interactions between agents. on the contrary, when agents spend more time outside the location (by increasing either α or p), they explore a less dense region of the space and interactions become more sporadic. as a result, the likelihood that the disease spreads is lower. from our numerical simulations, we observe that there is a threshold for α (close to ᾱ = . ), beyond which the disease spreading is halted. simulations with different values of the parameters show a similar behavior, with varying values of the threshold. hence, beyond this threshold, in the sis dynamics the disease is not able to spread and the endemic prevalence tends to zero, as shown in fig. a ; a similar behavior is observed for the sir process. similar results are obtained for the one-dimensional lattice, as illustrated in fig. .

fig. influence of the agents' randomness, α, and probability of jumping outside the location, p, on the sis endemic prevalence, a-b, and sir epidemic size, c-d. theoretical values of the sis endemic prevalence, b, and sir epidemic size, d, are computed by evaluating the steady state in eqs. ( ) and ( ), respectively.
curves represent the median of independent simulations; % confidence bands are displayed in gray. agents are initially inside their base location and the infection starts after steps to allow the agents to reach a steady-state configuration. the fraction of randomly chosen initial infected agents is . . other parameter values are l = , n = , , d = , , r = . , c = . , λ = . , and μ = .

next, we consider the case in which multiple locations are present, forming a core-periphery structure, as described in appendix a and illustrated in fig. b . agents that exit their base location are likely to jump inside another location and interact with other agents occupying a different portion of the urban environment. we investigate a scenario with l = locations, as illustrated in fig. . our numerical results suggest that increasing the agents' randomness α still reduces the endemic prevalence i * (in the sis model) and the epidemic size r ∞ (in the sir model), similar to the case of a single location. numerical results in fig. a and c, however, seem to display a nonmonotonic behavior of the fraction of the population affected by the disease, whereby small values of α may favor the epidemic outbreak instead of hindering its inception. we record the existence of a threshold for α (in our simulations, close to . ) at which a sharp transition takes place for both the endemic prevalence (in the sis model) and the epidemic size (in the sir model). according to eq. ( ), by increasing α, the agents' randomness is increased and, as a consequence, agents tend to explore a larger portion of the urban environment and to occupy peripheral locations with a lower density of agents. hence, they are less likely to interact with each other and to support disease spreading.

fig. influence of the agents' randomness, α, and the probability of jumping outside the location, p, on the endemic prevalence of the sis model, a-b, and the epidemic size of the sir model, c-d.
curves represent the median of independent simulations; % confidence bands are displayed in gray. agents are initially inside their base location and the infection starts after steps to allow the agents to reach a steady-state configuration. the fraction of randomly chosen initial infected agents is . . other parameter values are l = , n = , , d = , min = , max = , , σ min = , σ max = , , v = , c = · − , ω = . , γ = . , λ = . , and μ = .

surprisingly, we observe that the probability p of jumping outside the base location seems to have a negligible effect on the outcome of the sis and sir disease processes, similar to the predictions from the one-dimensional simplified version of the model in eq. ( ) and fig. . a reason for this phenomenon may be found in the following intuition. the core-periphery structure analyzed in our work, illustrated in fig. , allows two contrasting effects to occur simultaneously. on the one hand, agents moving outside the central areas are likely to end up in peripheral ones, decreasing the agents' density in the central regions and increasing the density in the peripheral ones. on the other hand, agents moving outside the peripheral areas are likely to end up in the central ones, thereby increasing the density in the central regions and decreasing the density in the peripheral ones. overall, these two opposite effects tend to balance each other. here, we study the impact of the correlation between the radius of interaction of agent i, σ i , and the density of its base location, ρ β(i) . we compare the uncorrelated case (analyzed earlier in fig. a and c) , where agents are randomly assigned to locations, with the cases of either positive or negative correlation. in the case of positive correlation, agents with larger radii are assigned to the denser (and central) locations; in the case of negative correlation, agents with larger radii belong to the less dense (and peripheral) locations. we consider a scenario with l = locations, whose results are illustrated in fig. . both the endemic prevalence, i * , and the epidemic size, r ∞ , are marginally affected by a positive correlation, while they strongly diminish if the radii and the density of locations are negatively correlated, as shown in fig. a and b, respectively. in both the positively-correlated and uncorrelated cases, agents with larger radii occupy the central locations, thereby sustaining the diffusion of the disease. on the other hand, if agents with large radii are relegated to peripheral and sparser areas, it is more difficult for them to create connections and fuel the diffusion process. finally, we examine the effect of different vaccination strategies applied to our population. specifically, we consider a purely randomized strategy and two targeted vaccination policies. in the three strategies, we assume that a fraction of the population is vaccinated and is thus immune to the disease. in the "random" vaccination mechanism, we vaccinate a fraction of the population, sampled uniformly at random. in the "center" targeted mechanism, we select such a fraction starting from the agents assigned to the most central locations. in the "peripheral" targeted mechanism, we choose such a fraction starting from the agents assigned to the most peripheral locations.

fig. influence of the agents' randomness, α, and the probability of jumping outside the location, p, on the endemic prevalence (sis model), a-b, and the epidemic size (sir model), c-d. curves represent the median of independent simulations; % confidence bands are displayed in gray. agents are initially inside their base location and the infection starts after steps to allow the agents to reach a steady-state configuration. the fraction of randomly chosen initial infected agents is . . other parameter values are l = , n = , , d = , min = , max = , , σ min = , σ max = , , v = , c = · − , ω = . , γ = . , λ = . , and μ = .

fig.
impact of different ways of assigning agents to their locations on the endemic prevalence (sis model), a, and the epidemic size (sir model), b. the "uncorrelated" case represents a random assignment. in the "pos. correlated" case, agents with larger radii are assigned to the denser (central) locations, while, in the "neg. correlated" case, agents with larger radii belong to the less dense (peripheral) locations. curves, confidence bands, initialization, and parameter values as in the previous figures.
from fig. , we observe that prioritizing the vaccination of agents assigned to the most central locations has the most beneficial effect for the prevention of the diffusion of the epidemic disease, while the worst-performing strategy targets vaccination to peripheral areas. as detailed in fig. , agents assigned to more central base locations tend to have a larger expected degree than agents assigned to more peripheral locations, thereby potentially acting as "superspreaders" (lloyd-smith et al. ). also, agents whose base locations are in the center can easily reach all portions of the environment, thereby contacting the majority of the agents. by focusing the vaccination on central areas, the contacts generated by these agents do not contribute to the spread, thereby significantly reducing the contagion.
in this paper, we studied a class of agent-based models (frasca et al. ), in which agents move in a two-dimensional space and interact according to proximity criteria. we extended this class of models by incorporating a core-periphery structure, typical of urban environments (makse et al. ; de nadai et al.
), where central areas are more densely populated than peripheral ones. our urban-like environment is partitioned into several closely spaced locations, each of them representing a restricted portion of the space. when agents are inside their base location, they take a random position within the base location at every time-step. when outside, they tend to move back to their base location by following a biased random walk.
the contribution of the study is fourfold. first, we analytically and numerically studied the temporal network formation mechanism, demonstrating that heterogeneously distributed radii of interaction in the population generate heterogeneity in the degree distribution of the temporal network of contacts, similar to what is observed in many real-world systems (albert et al. ; barrat et al. ). the role of the interaction radius is also evident in the study of the clustering coefficient, whereby we found that agents with larger degree have a lower clustering coefficient.
fig. effect of different vaccination strategies on the endemic prevalence (sis model), a-b-c, and epidemic size (sir model), d-e-f. the vaccination coverage represents the fraction of immune agents prior to the disease onset. in "random", we select the fraction of agents to vaccinate at random; in "center", we vaccinate first the agents that are assigned to central base locations, while in "peripheral", we prioritize vaccination for agents assigned to the most peripheral base locations. we set: a-d p = . and α = . , b-e p = . and α = . , and c-f p = . and α = . . curves, confidence bands, initialization, and parameter values as in the previous figures.
second, we investigated the role of the urban-like environment in the spread of epidemic outbreaks. specifically, we considered the epidemic prevalence in the susceptible-infected-susceptible (sis) model and the epidemic size in the susceptible-infected-recovered (sir) model. we found that both these quantities, which measure the fraction of the system that is affected by the disease, are lowered by increasing the randomness of the agents' law of motion. in fact, increasing the agents' randomness improves the chance that agents randomly explore peripheral urban areas, where fewer agents are present and fewer contacts are thus created. a lower number of interactions hinders the contagion process. interestingly, we discovered that the endemic prevalence and epidemic size have nontrivial relationships with the probability of jumping outside the base location. when the entire urban environment is modeled as a unique location, larger probabilities of jumping outside hinder the epidemic diffusion. in fact, the density of agents is much higher inside the location than outside it; as a consequence, interactions involving agents that have jumped outside are rare, slowing down the disease spread. instead, when multiple locations are arranged in a core-periphery structure, our numerical results suggest that epidemic prevalence and size are independent of the probability of jumping outside the base location. a possible explanation for this phenomenon might be that, when agents in central locations jump outside them, they are likely to end up in peripheral locations, diminishing the fraction of agents in central areas. this event is compensated by agents from peripheral locations that jump into central ones. our numerical results are in agreement with the theoretical findings in the simplified, one-dimensional version of our agent-based model. third, we found that central locations play a key role in the diffusion of epidemic diseases.
in particular, we studied the influence of the correlation between agents' radius and locations' density. when these quantities are negatively correlated, agents with larger radius belong to less dense (peripheral) locations, while when positively correlated, agents with larger radius belong to denser (central) locations. the endemic prevalence (in the sis model) and the epidemic size (in the sir model) are only marginally favored by the presence of many agents with large radius in the more central locations (positive correlation), while the diffusion of the epidemic is hindered if central locations are mostly assigned to agents with small radius of interaction (negative correlation). finally, we studied the effect of targeted vaccination strategies. we found that the vaccination of agents that belong to central locations is the most beneficial approach for the entire population, leading to the smallest epidemic prevalence. our analysis corroborates our previous observation that central (and denser) locations are crucial in the diffusion of disease processes. we emphasize that the proposed vaccination strategy can be implemented with information about the system at the mesoscopic level of locations, that is, without any information on the specific properties of single individuals (for instance, their radius of interaction). with information at the individual level, the proposed policy may be improved by combining knowledge about locations and radii of interaction, prioritizing vaccination of central agents with large radius of interaction, who act as "superspreaders." a main limitation of our work resides in the assumption that each agent belongs to a unique location, while the remaining urban area, occupied by other locations, is only seldom explored. a more realistic approach could consider agents that may be assigned to multiple locations.
our theoretical study of the one-dimensional case provides insight into some aspects of epidemic processes in urban environments. however, a general mathematical theory is missing. we believe that our preliminary results constitute a starting point for performing a more general theoretical analysis of the two-dimensional model. furthermore, variations of the proposed model can be easily generated. for instance, the gravity law in our model could be replaced by other laws, such as the well-established radiation law (simini et al. ) or the one recently proposed in (schläpfer et al. ). overall, our work determines that central urban areas are critical in the diffusion of epidemic diseases within a city, as they are the crossroads of most of the urban population, and thus should be carefully included into mathematical models of epidemic outbreaks. by vaccinating individuals in central urban areas, we can halt the overall contagion better than by randomly distributing limited vaccination supplies. our proposed vaccination strategy may offer practitioners and epidemiologists general guidelines for emergency situations, complementing other strategies (braha and bar-yam ; génois et al. ) toward effective containment measures and herd immunity (fine ) within urban environments.
when the system is in its steady state, the expected number of other agents within a distance σ_i from agent i is proportional to the ratio between the area of a circle of radius σ_i and the whole planar space. hence, the expected number of interactions created by agent i is given in eq. ( ). further, agent i can form undirected interactions with other agents if it is located within their radii of interaction. to avoid double counting and exclude connections that are already counted in eq. ( ), the radius of agent j should be greater than that of agent i, and the distance between the two agents should be greater than σ_i but smaller than σ_j.
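the elided expressions here can plausibly be reconstructed from the surrounding text (the circle area over the total area of the square space of side d, times the n − 1 other agents); a hedged sketch, with c_i denoting the set of agents with radius of interaction larger than σ_i (introduced below):

```latex
\mathbb{E}\!\left[k_i^{+}\right] = (n-1)\,\frac{\pi\sigma_i^{2}}{d^{2}},
\qquad
\mathbb{E}\!\left[k_i^{-}\right] = \sum_{j \in C_i} \frac{\pi\left(\sigma_j^{2}-\sigma_i^{2}\right)}{d^{2}},
\qquad
\mathbb{E}[k_i] = \mathbb{E}\!\left[k_i^{+}\right] + \mathbb{E}\!\left[k_i^{-}\right].
```

these formulas are a reconstruction consistent with the steady-state probability π(σ_j² − σ_i²)/d² used in the next step, not a verbatim quote of the original equations.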
when the system reaches the steady state, the probability of such an event is π(σ_j² − σ_i²)/d². let us introduce the set c_i of agents with radius of interaction greater than σ_i and let us define σ̄_i² = |c_i|⁻¹ Σ_{j∈c_i} σ_j² as their average square radius. the expected number of connections formed by agent i with other agents beyond those included in eq. ( ) is given in eq. ( ). by summing eqs. ( ) and ( ), we conclude that the average number of agents that an agent interacts with in a unit time, termed its average degree k_i, is given in eq. ( ). the computation of the quantities |c_i| and σ̄_i² can be performed in the limit of large systems, n → ∞, by means of the strong law of large numbers (chung and zhong ). we start by explicitly writing the probability density function g(σ) of the power law distribution of the radii of interaction with cutoffs σ ∈ [σ_min, σ_max], where ω is the exponent. from the expression of g(σ), we compute |c_i| using the strong law of large numbers (chung and zhong ), which ensures that |c_i|/n converges almost surely. we then define the conditional probability density function of σ given σ > σ_i, where the first equality holds due to the scale invariance of the power law distribution, and the explicit computation is performed using the expression of g(σ). using again the strong law of large numbers (chung and zhong ) and eq. ( ), we compute σ̄_i² almost surely. finally, by substituting eqs. ( ) and ( ) in eq. ( ), the expected degree of agent i in the limit of large systems, n → ∞, almost surely reads as in eq. ( ), neglecting the terms of smaller order in n. now, we consider the limit case in which agents move straight toward their base location, that is, α = 0, and we assume that locations are uniformly distributed in the planar space. we consider a generic agent i that belongs to location β(i). since a_β(i) ≪ d², we use the approximation d → ∞. the probability for this agent to be in its base location, q_in, can be computed by introducing the following partition of the planar space into sets c_h^(i), for any h ∈ ℤ≥0.
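the free-space expected-degree computation sketched above can be checked with a small monte carlo experiment. everything below is illustrative only: it assumes uniform agent placement on a periodic square of side D (so boundary effects vanish) and inverse-cdf sampling of the truncated power law; the parameter values are arbitrary, not the paper's.

```python
import math
import random

def sample_radius(rng, s_min, s_max, omega):
    # inverse-CDF sample from a truncated power law g(s) proportional to s**(-omega)
    u = rng.random()
    a, b = s_min ** (1 - omega), s_max ** (1 - omega)
    return (a + u * (b - a)) ** (1.0 / (1 - omega))

def torus_dist(p, q, D):
    # distance with periodic boundary conditions (an assumption of this sketch)
    dx = abs(p[0] - q[0]); dx = min(dx, D - dx)
    dy = abs(p[1] - q[1]); dy = min(dy, D - dy)
    return math.hypot(dx, dy)

def degree_check(n=200, D=10.0, trials=300, seed=1):
    rng = random.Random(seed)
    sigma = [sample_radius(rng, 0.5, 1.5, 2.0) for _ in range(n)]
    # prediction for agent 0: E[k_0] = (pi/D^2) * sum_j max(sigma_0, sigma_j)^2,
    # i.e. (pi/D^2) * ((n-1-|C_0|) * sigma_0^2 + sum over C_0 of sigma_j^2)
    theory = math.pi / D ** 2 * sum(max(sigma[0], s) ** 2 for s in sigma[1:])
    acc = 0
    for _ in range(trials):
        # redraw uniform positions; count agent 0's interactions (distance
        # at most the maximum of the two radii, as in the model definition)
        pos = [(rng.random() * D, rng.random() * D) for _ in range(n)]
        acc += sum(1 for j in range(1, n)
                   if torus_dist(pos[0], pos[j], D) <= max(sigma[0], sigma[j]))
    return acc / trials, theory
```

with these settings the empirical mean degree of agent 0 should fall close to the prediction; the comparison only tests the geometric part of the derivation, not the strong-law limits.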
note that c_h^(i) is the region of the plane from which agent i reaches its base location β(i) in exactly h time-steps. consequently, when h = 0 agents are inside their base location, that is, c_0^(i) coincides with β(i). using the mapping z^(i), for each agent i ∈ v, we define the stochastic process z_i(t) : ℤ≥0 → ℤ≥0 as z_i(t) := z^(i)(x_i(t), y_i(t)). since α = 0, when an agent is outside its base location, its law of motion is purely deterministic and it moves in the direction of the location. therefore, if z_i(t) = h ≠ 0, then z_i(t + 1) = h − 1. if z_i(t) = 0, the agent is inside its base location, from which it exits only through a jump, which is statistically characterized by eq. ( ). hence, with probability 1 − p the process z_i(t) remains in state 0 at the following time-step. else, if a jump occurs, the process z_i evolves to state h with probability q_h. the transition probabilities of z_i(t) depend only on the state h in which the process is and on the model parameters. the process z_i(t) is thus a markov chain, whose structure is illustrated in fig. and whose transition matrix is m. we observe that, if p > 0, then the markov chain is ergodic and it converges to its unique stationary distribution π, which can be computed as the left eigenvector of m associated with the eigenvalue 1 (levin et al. ). when the system has reached its steady state, the probability for each agent to be inside its base location is q_in = π_0, which is derived from the left eigenvector equation for m in eq. ( ) (with unit eigenvalue). from eq. ( ), the expression of q_h in eq. ( ), and using that Σ_{h=0}^∞ π_h = 1, we derive q_in = π_0 = (e^{cv} − 1)/((1 + p)e^{cv} − 1). when the system reaches its stationary state, the number of agents in location ℓ is equal to the sum of two contributions. the first one consists of agents whose base location is ℓ and are in that location, that is, on average, nq_in. the second one is due to agents whose base location is not ℓ, but are in ℓ.
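the closed form q_in = (e^{cv} − 1)/((1 + p)e^{cv} − 1) can be cross-checked numerically. the sketch below assumes a geometric jump-length distribution, q_h = (1 − e^{−cv}) e^{−cv(h−1)} for h ≥ 1 (an assumption on our part, chosen because it is consistent with the closed form), and power-iterates a truncated version of the chain z_i(t):

```python
import math

def stationary_pi0(p, cv, H=300, iters=20000):
    r = math.exp(-cv)
    # assumed geometric jump-length distribution q_h for h = 1..H
    # (the negligible tail beyond H is folded into the last entry)
    q = [(1 - r) * r ** (h - 1) for h in range(1, H + 1)]
    q[-1] += 1.0 - sum(q)
    v = [1.0] + [0.0] * H                 # start the chain in state 0
    for _ in range(iters):
        new = [0.0] * (H + 1)
        new[0] = (1 - p) * v[0] + v[1]    # stay in 0, or return from state 1
        for h in range(1, H + 1):
            # jump from 0 to h, or deterministic step down from h + 1
            new[h] = p * q[h - 1] * v[0] + (v[h + 1] if h < H else 0.0)
        v = new
    return v[0]
```

for example, with p = 0.3 and cv = 0.5 the iterated value agrees with the closed form to high precision; the agreement supports the reconstruction of the garbled formula, under the stated distributional assumption.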
the second contribution is relatively small, since locations are placed randomly in the entire space d × d, and we discard it when the system is large. the steady-state density in location ℓ can thus be approximated by considering only the agents assigned to it. hence, the expected number of connections of agent i within its base location is approximated by eq. ( ), where c_{i,ℓ} and σ̄²_{i,ℓ} are the set of agents with radius greater than σ_i in location ℓ and their average square radius, respectively. assuming the distribution of the radii of interaction to be independent of the agents' base locations, c_{i,ℓ} is a proportional subsample of c_i and σ̄²_{i,ℓ} = σ̄²_i. under this assumption, in the limit of large systems, n → ∞, combining eqs. ( ) and ( ) into eq. ( ), we obtain the approximation e[k_{in,i}] ≈ q_in (n − 1) multiplied by the geometric factor in eq. ( ). when a core-periphery structure is present, as in fig. , locations are not uniformly distributed in space and often are close to each other. for instance, a central location ℓ is surrounded by other locations, and the interactions generated by agents whose base location is not ℓ cannot be neglected. this case is discussed in the main text by means of numerical simulations. the probability that an agent occupies location ℓ is the sum of three terms: the first term refers to the probability that the agent is in its base location and its base location is ℓ, while the second and third terms correspond to the probability that the agent belongs to another base location and it occupies location ℓ. similarly, we compute the probability that agents are in a position not occupied by any location and at a distance d from the closest location, where we assume that the closest location is ℓ = 1. through a simple change of variables, we can write an equivalent expression when the closest location is ℓ = l. substituting the expressions in eqs. ( ) and ( ) in eqs. ( ) and ( ) yields the two expressions reported in the main text, that is, eqs. ( ) and ( ). we now compute the probability that an agent becomes infected at time t. we first consider the probability of not being infected.
in location ℓ, such a probability is equal to 1 − λi_ℓ(t) for each contact. since, on average, an agent contacts q_ℓ n other agents, the probability of not being infected in location ℓ is equal to (1 − λi_ℓ(t))^{q_ℓ n}. similarly, the probability of not being infected at a distance d from the closest location is equal to (1 − λi_{out,d}(t))^{q_{out,d} n}. thus, the contagion probability of an agent inside its base location is the complement of the former probability, as given in eq. ( ).
references
internet: diameter of the world-wide web
multi-scale spatio-temporal analysis of human mobility
urbanisation and infectious diseases in a globalised world
emergence of scaling in random networks
the architecture of complex weighted networks
minimum commuting distance as a spatial characteristic in a non-monocentric urban system: the case of flanders
from centrality to temporary fame: dynamic centrality in complex networks
daily time spent indoors in german homes - baseline data for the assessment of indoor exposure of german occupants
the hidden geometry of complex, network-driven contagion phenomena
effects of motion on epidemic spreading
local and global epidemic outbreaks in populations moving in inhomogeneous environments
solving circle packing problems by global optimization: numerical results and industrial applications
modelling transmission and control of the covid-19 pandemic in australia
the effect of travel restrictions on the spread of the novel coronavirus (covid-19) outbreak
a course in probability theory
urbanization in developing countries: current trends, future projections, and key challenges for sustainability
epidemic modeling in metapopulation systems with heterogeneous coupling pattern: theory and simulations
commentary: middle east respiratory syndrome coronavirus (mers-cov): announcement of the coronavirus study group
the death and life of great italian cities: a mobile phone data perspective
mitigation measures for pandemic influenza in italy: an individual based model considering different scenarios
circle packing for origami design is hard
modeling disease outbreaks in realistic urban social networks
impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand
herd immunity: history, theory, practice
a packing problem with applications to lettering of maps
dynamical network model of infective mobile agents
synchronization of moving chaotic agents
data on face-to-face contacts in an office building suggest a low-cost vaccination strategy based on community linkers
agent-based models
epidemic spreading in random walkers with heterogeneous interaction radius
modeling infectious diseases in humans and animals
the minimum circuity frontier and the journey to work
contagion dynamics in time-varying metapopulation networks
uncovering patterns of inter-urban trip and spatial interaction from social media check-in data
superspreading and the effect of individual variation on disease emergence
modeling urban growth patterns with correlated percolation
exact solution of the two-dimensional finite bin packing problem
assessing the interplay between human mobility and mosquito borne diseases in urban environments
urban-rural differences in daily time-activity patterns, occupational activity and housing characteristics
urban-like environment
epidemic spreading in temporal and adaptive networks with static backbone
epidemic spreading in modular time-varying networks
how urbanization affects the epidemiology of emerging infectious diseases
epidemic processes in complex networks
an sis epidemic model with vaccination in a dynamical contact network of mobile individuals with heterogeneous spatial constraints
activity driven modeling of time varying networks
urbanization. our world in data
generalizations of the clustering coefficient to weighted complex networks
the transition to a predominantly urban world and its underpinnings
indoor time-microenvironment-activity patterns in seven regions of europe
a universal model for mobility and migration patterns
responding to global infectious disease outbreaks: lessons from sars on the role of risk perception, communication and management
new approaches to circle packing in a square: with program codes
the spread of dengue in an endemic urban milieu - the case of delhi
diffusion-limited aggregation, a kinetic critical phenomenon
traffic-driven epidemic spreading on networks of mobile agents
epidemic spreading in communities with mobile agents
continuous-time discrete-distribution theory for activity-driven networks
modeling memory effects in activity-driven networks
publisher's note: springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
received: april accepted: august
appendix a: algorithm to generate a core-periphery structure
from a practical point of view, packing all l convex regions (locations) in the square space d × d is a nondeterministic polynomial-time hard (np-hard) problem (martello and vigo ; demaine et al. ), often requiring approximate methods (formann and wagner ; szabó et al. ; castillo et al. ). in our study, we aim at reproducing the core-periphery structure present in real urban areas, as shown in fig. a, while minimizing the space between locations. agents that exit from their base location occupy nearby locations, thereby interacting with agents that are assigned to different regions of the urban area. we developed a heuristic algorithm that unfolds according to the following steps. 1. place the center of the densest location, (x_c, y_c), in the center of the square space, (x_c, y_c) = (d/2, d/2). initialize ℓ ← 1, σ_in ← 0, and σ_out ← Δσ (the crown width).
2. create a circular crown centered in (d/2, d/2) with internal radius σ_in and external radius σ_out. 3. randomly place the center of location ℓ + 1 in the crown and check for overlaps. i) if location ℓ + 1 does not overlap with other locations, then the location is placed. increase the index by 1, that is, ℓ ← ℓ + 1. if ℓ = l, then terminate the algorithm; otherwise, resume it from step 2. ii) if an overlap occurs, then repeat the current assignment in step 3. after a number of consecutive failed attempts (we set this limit to ), stop the current iteration and move to step 4. 4. set σ_in ← σ_out and σ_out ← σ_out + Δσ, and resume the algorithm from step 2.
here, we detail the analytical derivations of eqs. ( ) and ( ). to this end, we analyze the formation of the temporal network of interactions in two specific cases: a free space, without any location (l = 0); and the case in which the law of motion of the agents outside their base locations is deterministic (α = 0) and the locations are uniformly distributed in the plane. we begin our analysis by considering the case of a free space, that is, l = 0, where agents perform simple random walks with constant velocity equal to v in the plane. in this scenario, eq. ( ) should be intended without the component associated with the location β(i) and with α = 1. according to eq. ( ), at time t, agent i creates undirected interactions with other agents if their euclidean distance is less than or equal to the maximum of their radii of interaction. in practice, the expected number of interactions of agent i, e[k_i], is equal to the sum of two contributions: the expected number of interactions that are generated by agent i with agents that are in its radius of interaction, denoted as e[k_i⁺], and the expected number of interactions that are generated by other agents with i, denoted as e[k_i⁻].
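the crown-placement heuristic of appendix a can be sketched in a few lines of code. everything here is illustrative: the list of location radii (densest/largest first), the crown width d_sigma, and the retry limit are assumed inputs, not the paper's actual values.

```python
import math
import random

def pack_locations(radii, D, d_sigma, max_tries=1000, seed=0):
    """Place non-overlapping circular locations in expanding crowns around the center."""
    rng = random.Random(seed)
    centers = [(D / 2.0, D / 2.0)]        # step 1: densest location at the center
    s_in, s_out = 0.0, d_sigma            # initial crown bounds
    for r in radii[1:]:
        placed = False
        while not placed:
            for _ in range(max_tries):    # step 3: random placement in the crown
                rho = rng.uniform(s_in, s_out)
                theta = rng.uniform(0.0, 2.0 * math.pi)
                x = D / 2.0 + rho * math.cos(theta)
                y = D / 2.0 + rho * math.sin(theta)
                # overlap check: center distances at least the sum of radii
                if all(math.hypot(x - cx, y - cy) >= r + radii[i]
                       for i, (cx, cy) in enumerate(centers)):
                    centers.append((x, y))
                    placed = True
                    break
            else:                          # step 4: too many failures, widen the crown
                s_in, s_out = s_out, s_out + d_sigma
    return centers
```

by construction every accepted placement satisfies the non-overlap condition, so the sketch mirrors steps 2-4 of the heuristic without attempting to minimize inter-location space as carefully as the original.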
here, we compute the contagion probability of an agent inside its base location at time t, and the contagion probability of an agent at a distance d from the closest location at time t. we start our analysis by computing the probability that agents are in their base location, denoted by ψ_in, or in a position at a distance d from it, ψ_d, when the system is at steady state. for p > 0, the system is ergodic and we can compute ψ_in and ψ_d at steady state (levin et al. ). similar to appendix b, from the steady-state equation we derive a recursive system of equations, where a factor 2 appears because there are two positions at a distance d from any location ℓ ∈ l, as in fig. . from eq. ( ), the expression of p_jump(d) in eq. ( ), and using that ψ_in + Σ_{d=1}^∞ ψ_d = 1, we derive eqs. ( ) and ( ). given that each agent is randomly assigned to one of the l locations, the probability that a generic agent i ∈ v is inside location ℓ is given in eq. ( ). likewise, the contagion probability when the agent is at a distance d from the closest location is the complement of the corresponding probability of not being infected.
a sample of the algorithms generated is available at (nadini ). the entire set of algorithms is available upon request. the authors declare that they have no competing interests.
key: cord- -crmfwjvf
authors: bodova, katarina; kollar, richard
title: emerging polynomial growth trends in covid-19 pandemic data and their reconciliation with compartment based models
date: - -
journal: nan
doi: nan
sha:
doc_id:
cord_uid: crmfwjvf
we study the reported data from the covid-19 pandemic outbreak in january - may in countries. we observe that the time series of active cases in individual countries (the difference of the total number of confirmed infections and the sum of the total number of reported deaths and recovered cases) display a strong agreement with polynomial growth and, at a later epidemic stage, also with a combined polynomial growth with exponential decay.
our results are also formulated in terms of compartment type mathematical models of epidemics. within these models the universal scaling characterizing the observed regime in an advanced epidemic stage can be interpreted as an algebraic decay of the relative reproduction number $r_0$ as $t_m/t$, where $t_m$ is a constant and $t$ is the duration of the epidemic outbreak. we show how our findings can be applied to improve predictions of the reported pandemic data and to estimate some epidemic parameters. note that although the model shows a good agreement with the reported data, we do not make any claims about the real size of the pandemic, as the relation of the observed reported data to the total number of infected in the population is still unknown. the coronavirus disease 2019 (covid-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (sars-cov-2) is accompanied by an unprecedented challenge for its mathematical modeling. most of the difficulties stem from an extremely high level of uncertainty in the available data: • the methodology for reporting all types of data (number of confirmed positive cases, hospitalizations, recovered individuals, and even confirmed deaths) is not systematic and varies from one country to another [ ]. • different types of tests and test protocols used for covid-19 detection have their own limitations both in sensitivity and specificity; testing procedures differ in the methodology of sample selection and in the testing sample size in different countries. in addition, different types of tests detect different phases of individual infections and their results differ based on the clinical stage of the infection [ ]. • the relation of the reported data to the real (unobserved) number of infected in a population is not understood, and the estimates for the ratio of total cases in the population to the number of observed cases vary even in order of magnitude [ , , , , , , , ]. the extent of uncertainty in the data prohibits designing and validating mathematical models that would provide a verifiably accurate description of the dynamics of the pandemic. that in turn has serious consequences for pandemic spread control and efficient epidemiological decision making. a particular difficulty for mathematical modeling is that traditional compartment type models, often referred to as "sir models" with susceptible-infected-recovered compartments, which succeeded in an accurate description of previous epidemics, produce scenarios that do not closely match the observed data [ , ] and fail to capture significant trends observed in the data; see also [ , ] for other early models. while it is still too early to tell whether these models fail to capture the real dynamics of the pandemic (as the observed data may not completely correspond to the number of infected in the whole population), there is an urgent need to understand the available data and its relation to the sir models. we consider it very important to emphasize the limited scope of our analysis, as in the current literature some of these limitations are blurred and may eventually lead to misinterpretations. to overcome the difficulties with mathematical modeling and the high degree of uncertainty in the data, we propose a simple descriptive model that captures the dynamics of the observed data rather than a detailed mechanistic model of the pandemic dynamics in the whole population. therefore our results need to be interpreted with high precaution. we only present a systematic mathematical description of the observed reported data. we provide neither an explanation of the pandemic's spreading, nor any claim about the real scope of the pandemic.
however, if one conjectures that the observed reported data capture the extent of the pandemic size in the real population (e.g., if the data systematically report a fixed percentage of the total infected population), then our results can be used to identify the nature of the emerging trends in pandemic spreading in individual countries. furthermore, we only model the first wave of the pandemic and we do not make any predictions about the extent and timing of next waves, although we discuss how our method can be used to study the effect of next epidemic waves. we are aware that we ignore multiple key factors that may influence the relation between the observed data and the real epidemic size (the level of detection of infected by testing, delays in test reporting, variation in clinical aspects of the infection, etc.). note that in the text we use the term covid-19 for all reported infections based on a positive pcr test, and thus we do not distinguish between the presence of the virus in the respiratory tract of an infected individual and the disease it causes. throughout the whole text we model the time series of the total number of active cases, that is, the difference of the reported total number of confirmed infections and the sum of the total number of reported deaths and recovered cases. we have identified a universal trend in the data reported by individual countries that helps to improve predictions for the final observed epidemic size and the related time scales. the universal trend is observed despite inhomogeneities and uncertainties in the data time series and the various types and levels of mitigation policies applied. there is a transition in the time series from an exponential growth (eg) to a polynomial growth (pg), and at a later stage to a combined polynomial growth with an exponential decay (pged), in the number of active cases across almost all countries world-wide (with a sufficient size of current data).
our choice of the form of the trend in the data is motivated by the theoretical results of [ , , , ] and by the data analysis performed in [ , ] that relate the observed transition to structural changes in the population contact networks. we analyze the reported data and estimate the parameters for individual countries. this allows us to categorize the countries into groups according to their advance through the first wave of the pandemic, and also to detect a possible divergence into an eventual upcoming second wave in some countries. we rank a selected group of countries according to one of the model parameters that captures their ability to identify, test, and isolate infected individuals from the rest of the population and thus prevent further spreading. we also provide a reconciliation of the pged regime with the sir model that allows pged to be built directly into existing sir models, despite the fact that pged is inconsistent with the sir models (in their traditional form). the pged regime corresponds to an explicit algebraic decay of the relative reproduction number r_0 in time t of the form r_0 ≈ t_m/t with a constant t_m. the trend is in general agreement with the current estimates of the evolution of r_0 in many individual countries [ ]; see also [ ] for data supporting the observed dependence. a model based on the pged regime can also aid forecasting of the future dynamics of the observed data, including an analysis of eventual next epidemic waves. the keystones of mathematical modeling of epidemic spread are the compartment based models, also referred to as sir models, introduced by kermack and mckendrick [ , , ]; see also [ , ] for an extensive literature overview. we only review basic properties of the sir model relevant for our further analysis.
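the pged trend i(t) ≈ a t^α e^{−t/t_g} can be fitted to an active-cases series by ordinary least squares in log space, since log i is linear in (1, log t, −t). the sketch below is illustrative (the names a, alpha, t_g are ours, not the paper's notation) and solves the 3 × 3 normal equations directly:

```python
import math

def fit_pged(ts, Is):
    """Least-squares fit of I(t) = A * t**alpha * exp(-t/T_G) in log space.
    Linear model: log I = b0 + b1 * log t + b2 * (-t), with b2 = 1/T_G."""
    rows = [[1.0, math.log(t), -float(t)] for t in ts]
    ys = [math.log(I) for I in Is]
    # normal equations X^T X beta = X^T y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    M = [XtX[i] + [Xty[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    for c in range(3):
        piv = max(range(c, 3), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        beta[i] = (M[i][3] - sum(M[i][j] * beta[j] for j in range(i + 1, 3))) / M[i][i]
    b0, alpha, inv_T = beta
    return math.exp(b0), alpha, 1.0 / inv_T
```

on noiseless synthetic data generated from the same functional form, the original parameters are recovered; on real reported data the fit quality itself is what signals whether a country is in the pged regime.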
the basic sir model describes the dynamics of susceptible (s = s(t)), infected (i = i(t)), and recovered (r = r(t)) populations at time t by the coupled system of ordinary differential equations. here n is the total population, β > 0 is the infection rate, and γ > 0 is the removal rate of infections. the key assumptions behind this mechanistic model are:
• the population is well mixed, thus the likelihood of a contact and transmission of the infection from any infected to any susceptible is the same;
• the populations s, i and r are large enough to be well approximated by real variables instead of integers.
the basic model can be readily extended to account for an incubation period (seir model), for age-structured and geographically structured populations, etc. also, the assumption of the constant total population can be removed. however, the well-mixed population assumption cannot be completely removed unless one considers sir models on networks. in that case a good knowledge of the underlying contact network is necessary to calibrate the model to any type of data. the current extent of epidemic spread in the system is typically measured by the relative reproduction number r = r(t) that is related to the instantaneous rate of growth of the infected population. the equation for i can be written as di/dt = γ(r − 1)i, thus r = 1 corresponds to the epidemic peak, the tipping point that characterizes the moment when the population of infected individuals is stationary, di/dt = 0. note that di/dt > 0 for r > 1 and di/dt < 0 for r < 1 (for i > 0). for practical reasons related to the covid-19 outbreak it is useful to rewrite the expression for r as r = β t inf s, where s = s(t) = s(t)/n is the proportion of the susceptible in the population at time t. the time scale t inf is associated with the removal of an average individual from the infected population, i.e., the typical length of the period during which the infected individual infects the susceptible population. it is related to γ by t inf = 1/γ.
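the basic sir system and the reproduction number r = βs/(γn) described above can be sketched numerically; the following python snippet integrates the equations with scipy (the parameter values are illustrative assumptions, not fitted to any country):

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, beta, gamma, n):
    """Right-hand side of the basic SIR system."""
    s, i, r = y
    ds = -beta * s * i / n
    di = beta * s * i / n - gamma * i
    dr = gamma * i
    return [ds, di, dr]

def run_sir(beta=0.4, gamma=0.1, n=1e6, i0=10.0, t_end=200.0):
    """Integrate SIR from a small seed of infected individuals."""
    y0 = [n - i0, i0, 0.0]
    sol = solve_ivp(sir_rhs, (0.0, t_end), y0,
                    args=(beta, gamma, n), rtol=1e-8)
    return sol
```

at the stored solution points one can verify the tipping-point property: the infected curve peaks where r = βs/(γn) crosses 1, i.e., where s = γn/β.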
the infection rate β can be expressed as β = p/t sus , where t sus is the typical time scale associated with the occurrence of interactions between susceptible and infected in the population where infection transfer may happen. the total number of transmission contacts between the susceptible and infected population is given by si/t sus . finally, p is the probability of infection of a susceptible individual through a single random meeting with an infected. in the sir model the number of active cases decreases due to a decrease of r below 1. as β and t inf are constant, this is achieved through a reduction of the proportion of the susceptible population s below the threshold 1/(βt inf ). that in general requires a significant decrease of the susceptible population through vaccination, gained natural immunity, infection (herd immunity) or through a long-term quarantine of a large fraction of the susceptible population. however, the pandemic size apparently has not reached such a high level yet. thus r needs to be controlled through a decrease in β or t inf . the parameter β is an obvious candidate, as strict public mitigation measures and social distancing decrease p and increase t sus . these measures are typically associated with a large economic cost, and the parameters of this structural change on the level of the contact network are still unknown, so their direct effect on r cannot be accurately quantified. a decrease of the parameter t inf does not require widespread mitigation measures. although the ability to decrease t inf solely based on the observation of disease symptoms is limited from below by the length of the incubation period, t inf can be significantly reduced by active contact tracing. until mitigation measures are applied we expect that an epidemic outbreak is governed by the sir model - . that implies an exponential growth (eg) of i = i(t) during the early stages of the epidemic. for completeness we present the asymptotic behavior of (s, i, r) in - for t → + .
let r 0 = r(0) and (s(0), i(0), r(0)) = (s 0 , i 0 , 0). then s(t) ≈ s 0 for small t, and the equation for i reduces to di/dt ≈ γ(r 0 − 1)i; consequently from and s + i + r = n we obtain , which is consistent with . note that eg was not observed in data in multiple countries (e.g., slovakia, lithuania) as these countries introduced mitigation measures at very early stages of the epidemic (zero or only a few confirmed cases of infection). after eg we observe a systematic transition to polynomial growth (pg) in data during early epidemic stages. we support our claim here by fig. , which shows the number of active cases in selected countries during the pandemic outbreak (the data source [ ] , reported data from may , ). the figure displays particular illustrative cases, see section for a systematic survey of all observed countries. for each country displayed we show the time series of active cases both on the semilogarithmic and on the double logarithmic scales. on the semilogarithmic plot we detect a divergence from the initial exponential trend, while on the double logarithmic plot a polynomial growth (represented as a linear trend) is emerging after the initial eg. the basic sir model is not consistent with a systematic approximate polynomial growth in the infected population: if i(t) ≈ a t^p , p > 1, on an interval t ∈ [t , t ], then di/dt ≈ (p/t)i. consequently from it follows that . however, the approximation on an interval t ∈ [t , t ] of a significant length is inconsistent with the assumption p > 1. note that modifications and extensions of the sir model can eventually agree with the observed polynomial growth phase in the infected population; however, we do not explore this question here.
polynomial growth with exponential decay regime (pged) during late epidemic stages. in individual countries we systematically observe a transition to a universal scaling form (ansatz) for the number of active cases in all considered countries.
in this phase the epidemic wave reaches its peak, after which the number of active cases decays. the scaling has the form i(t) = a (t/t g )^α exp(−t/t g ), where a, t g and α are the model parameters. the scaling is a combination of a polynomial growth factor (t/t g )^α with an exponential decay exp(−t/t g ). therefore we refer to it as polynomial growth with exponential decay (pged). it was derived for the size of the infected population by vazquez [ ] who used a branching process to describe the epidemic dynamics in a population interacting on a scale-free contact network. ziff & ziff [ ] use the pged scaling on reported covid-19 data and claim that the public measures and social distancing enforced yield a fractal-type contact network on which the epidemic transmission is strongly limited by the network topology. we note that the polynomial growth models in the literature discussed in section . can also be adapted to account for the exponential decay factor by an inclusion of a constant rate of loss from the infected population (similarly as in [ ] ). we use this form to match the observed pandemic data, particularly the number of active infection cases as reported in [ ] . note again that no prefactors are used here, so the model only describes the dynamics of the reported data. the function i = i(t) has a maximum at t = t m = αt g where it reaches the value p = a(α/e)^α . note that the inflection points of the function i = i(t) are located at t i ± = (α ± √α) t g ; particularly the time t = t i − plays an important role in the observed epidemic data as it corresponds to the moment at which the growth of the number of active cases reaches its maximum and starts to decrease. in the sir model the ratio of the infected population at the inflection point (during the growth phase) and at the point of maximum is equal to / . however, in the pged model this ratio is given by ((α − √α)/α)^α e^{√α} . this expression is equal to / for α . = . . for many countries we observe α <
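the pged ansatz and its checkpoints (peak time t m = αt g and inflection points t i ± = (α ± √α) t g ) are easy to reproduce numerically; the following sketch (function names are ours) encodes them directly:

```python
import numpy as np

def pged(t, a, alpha, t_g):
    """Active cases under polynomial growth with exponential decay."""
    return a * (t / t_g) ** alpha * np.exp(-t / t_g)

def pged_checkpoints(alpha, t_g):
    """Peak time t_m and the two inflection points of the PGED curve."""
    t_m = alpha * t_g
    t_i_minus = (alpha - np.sqrt(alpha)) * t_g
    t_i_plus = (alpha + np.sqrt(alpha)) * t_g
    return t_m, t_i_minus, t_i_plus
```

at the peak the curve reaches p = a(α/e)^α, which can be checked by direct evaluation.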
and thus the slowdown of the epidemiological curve (inflection point) occurs at a (often significantly) smaller fraction of the population than predicted by the sir model. an interpretation of the parameters a, α and t g is not completely straightforward and thus we reparametrize the model by parameter combinations that correspond to naturally observed quantities. the equation rewritten using the parameters p, t m and α has the form i(t) = p (t/t m )^α e^{α(1−t/t m )} . the model parameters were inferred by nonlinear least squares optimization in matlab, see the tables for the values obtained for individual countries. the polynomial and exponential decay factors motivate the use of logarithmically rescaled data (log i(t) or both log i(t) and log t) in the optimization. however, we use non-rescaled values instead, as the pged trend is present only in later phases of the epidemic, particularly after the implementation of mitigation measures and social distancing (with a delay for its manifestation in the reported data). using non-rescaled data allows us to globally fit the whole data set, as the early epidemic data has only a small weight in the optimization due to its relative magnitude, unlike in the case of the rescaled data. the fit thus does not require any prior (or fitting) for the time of transition to pged. in general, fitting polynomial growth to data is very sensitive to the choice of the origin of the fitted time series. therefore we have set the data starting point in all considered countries systematically. to eliminate the effect of stochasticity in small data we have disregarded all the data points in time series before the infected population in a country reached a set threshold n . for italy we set the threshold to , while for all other countries the threshold was normalized: proportionally increased or decreased in agreement with the ratio of the population size of the studied country to the population of italy (with a minimum threshold set to for countries with a small population).
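the nonlinear least-squares step can be reproduced with scipy's `curve_fit` (a stand-in for the matlab optimization used by the authors; the starting values below are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import curve_fit

def pged(t, a, alpha, t_g):
    """Active cases under polynomial growth with exponential decay."""
    return a * (t / t_g) ** alpha * np.exp(-t / t_g)

def fit_pged(days, active_cases, p0=(1000.0, 3.0, 10.0)):
    """Fit PGED parameters to non-rescaled active-case counts.

    Returns the best-fit parameters and their covariance matrix,
    which is later useful for confidence-interval construction."""
    popt, pcov = curve_fit(pged, days, active_cases, p0=p0, maxfev=20000)
    return popt, pcov
```

fitting the non-rescaled counts, as the text argues, automatically down-weights the early (small-magnitude) data relative to a fit of log-rescaled values.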
to eliminate obvious irregularities in daily reporting we smooth the studied data. the irregularities appear naturally as the testing procedures and protocols impose systematic nonuniformity: populations in large clusters are discovered simultaneously, there is a systematic delay in contact-tracing testing, limited testing capacity, batch testing, etc. we use linear smoothing on the increments and decrements of the number of active cases via moving averages through seven days (weights: ( , , , , , , )/ ) that corresponds to three iterations of local averaging of three consecutive days. polynomial growth is not consistent with the sir model for an extended period. however, we consider it very instructive and useful to reconcile the sir models with the pged regime, as such a reconciliation would allow one to build pged directly into the sir-type models. therefore we study whether the form (equivalent to ) can solve a sir-type model. equation implies . we can now compare with and identify . thus, the exponential decay term e^{−t/t g } in can be seen as a consequence of the infected removal rate γ = 1/t g in the sir model. furthermore, the term βs/n corresponds to α/t. finally, we express this dependence in terms of the relative reproduction number r : r (t) = t m /t, where t m = αt g = α/γ is a constant, see section for its interpretation as the time of the epidemic peak. therefore pged implies an algebraic decay of r in the sir model. note that this result is in agreement with the argument about the reduction of infection transmissibility in [ ] and also with the r analysis in study [ ] of the impact of non-pharmaceutical interventions in european countries. there the change of the reproduction number r is modeled as an average percentage reduction per specific type of intervention and the analysis is based on a bayesian approach using data on the number of infected and the number of deaths, which are more reliable. also note that as t → 0+ the relative reproduction number diverges to ∞.
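the smoothing step can be made concrete: three iterations of a three-day local average are equivalent to a single seven-day convolution, and the kernel below is derived from that stated construction (the explicit weight list is garbled in the text, so the values here follow from convolving the uniform three-day window with itself twice):

```python
import numpy as np

def smoothing_weights(iterations=3, window=3):
    """Seven-day kernel from repeated local averaging of consecutive days."""
    w = np.ones(window) / window
    weights = w
    for _ in range(iterations - 1):
        weights = np.convolve(weights, w)
    return weights  # for the defaults: (1, 3, 6, 7, 6, 3, 1) / 27

def smooth_increments(daily_changes):
    """Apply the smoothing to daily increments/decrements of active cases."""
    return np.convolve(daily_changes, smoothing_weights(), mode="same")
```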
this is in agreement with the model of [ ] where the fat-tail power-law distribution of r in the population initially has an infinite mean that with the epidemic outbreak reduces to finite values. the average r further reduces during the epidemic by a gradual elimination of the individuals with high values of r from the susceptible population: the individuals with high values of r are easily infected and removed from the susceptible class first. based on the presented analysis we expect three phases of a single epidemic wave.
• initial exponential phase. during the initial phase (with no applied mitigation measures) we expect the exponential growth of the infected population i = i(t) as discussed in section .
• polynomial growth phase. after the initial phase we expect a short transient phase during which i = i(t) smoothly transitions from the initial exponential phase to the polynomial growth phase described in section .
• final polynomial growth with exponential decay phase. after an introduction of the mitigation measures and social distancing, and a subsequent delay necessary for their appearance in the reported data, we expect a transition of i = i(t) from the pg phase to the pged phase described in section .
note that in some countries the mitigation measures were applied when the number of infected was low or zero (slovakia, lithuania) and the initial exponential phase was too short to be identified in the data. also note that during the pg phase the function i = i(t) is convex (p > 1 for all observed countries). for such a function to reach its local maximum it must first go through an inflection point. unless the inflection point appears during the transient phase between pg and pged, the pg phase must connect to the pged phase before the pged phase reaches its inflection point, i.e., before the time t = t i − .
this phenomenon was also observed in the data of all surveyed countries and it helps to improve predictions for countries that have not reached the pged phase yet. we conducted a systematic survey of covid-19 pandemic data ([ ], the last reporting day may , ) for all countries where the time series are sufficiently long to display a consistent trend (in total countries). in each country (together with all its territories) we consider the number of active cases equal to the total number of reported confirmed infections decreased by the sum of the reported total numbers of recovered and deaths. for the characterization of the epidemic progression we use the following phases: the initial exponential phase (eg), the polynomial growth phase (pg), and the polynomial growth with exponential decay phase (pged). the final pged phase has two checkpoints, the inflection point (i) and the epidemic peak, the point of maximum of the active infected population (m), after which the number of infected consistently decreases (d). our results are presented in tables - that summarize the stage of the epidemic in all surveyed countries. if a country reached the pged regime we report the estimated values of the related pged parameters. see section . for the methodology and remarks on exceptions made for some individual countries. we support our results by figs. - that display data from selected countries. the figures show the total active cases time series for countries organized by the epidemic phase. for each country presented we show the data on both linear and semilogarithmic plots (countries in the eg and pg phases) and also on a double logarithmic plot (countries in the pged phase). for the countries in the pged regime we also show the best pged fit to the data.
in figure we present two groups of countries in the early stages of the epidemic: the countries in the eg phase (afghanistan, bolivia, colombia, el salvador, india) and countries in the pg phase (argentina, indonesia, qatar, russia, somalia). to demonstrate evidence of eg and pg, respectively, we compare the recent data with a straight line in the respective plot. countries in the pged phase close to and past the epidemic peak ( -may- ): selected countries close to and past the epidemic peak are displayed in figure and figures - , respectively. a consistent approximate exponential decay in the number of reported active cases in many countries may serve as a sign of a successful strategy against the further spread of the coronavirus. however, due to factors such as abatement of the strict mitigation measures, fatigue of following social distancing, or simply reintroduction of the virus into the community, further spreading may occur. such a trend, demonstrated by a sudden slowdown of the exponential decay (austria, australia, vietnam) or even a sign of the next epidemic wave (azerbaijan, burkina faso, chile, djibouti, iran, iraq, jordan, kyrgyzstan, lebanon, madagascar), also appears in the reported data, see selected countries in figure . for singapore we only model its second epidemic wave. the parameter t g of pged characterizes the typical time scale of removal of infected individuals, particularly those who would eventually be tested positive for sars-cov-2 infection by a pcr test, i.e., the time period during which an average infected individual can infect others (in the susceptible population). a smaller value of t g corresponds to a fast decay after a country reaches its epidemic peak, while large values of t g indicate very flat epidemic peaks and thus a very slow gradual decay of the active cases.
in practice, t g is influenced by multiple factors; among them, the ability to identify, test and quarantine positive cases in the population and contact-tracing procedures play a prominent role. while a larger and better selected infection testing sample can significantly decrease t g in countries with a higher degree of epidemic spread, in countries with a small number of active cases finding and testing a small number of infected in the whole population can be very difficult. therefore contact tracing and the prevention of an import of infections from other countries can be the key measures to lower the value of t g . see fig. for the graphical display of the sorted values of t g for countries that are at or beyond the epidemic peak. countries classified as undergoing the second wave are not included in the plot as the presence of an apparent second wave may be a sign of spurious data. note that the countries that are currently close to their epidemic peak have higher values of t g than countries that are already further in the decay phase (with the exception of jamaica). the data suggest that countries with small values of t g (mauritius * , iceland, ireland, australia, austria, new zealand) are very efficient in testing and isolation of the individuals who will be tested positive, thus preventing them from further spreading the infection. on the other hand, the data for countries with large values of t g (slovenia, norway, uruguay) do not show an indication of efficient testing and isolation of infected (or they may not properly report recovered cases data). however, this interpretation needs to be taken with caution with regard to the note of precaution in section . . particularly, the interpretation of t g in countries marked with * in fig. and tab. - can be influenced by the special adjustment of the time series mentioned in previous sections.
the simple pged model, i.e., the universal scaling and nonlinear fitting of the parameters from the data, can be used as a predictive tool for the number of reported active cases, particularly in countries in the growth phase. once again keep in mind the note of precaution we formulated in section . . no verifiable connection of the number of active cases to the total number of infected in the population has been established so far. therefore all the predictions only concern the reported data. we demonstrate the predictive capabilities of the pged model using the available data for eight selected countries (belgium, belarus, czechia, israel, italy, portugal, switzerland, and the us) that are in different epidemic phases and also display a variable accuracy of predictions. testing was performed by comparing the values predicted by the pged model (based on incomplete data in which we have removed up to data points from the end of the time series) to the withheld data. for each choice of the number of withheld days we calculated the % confidence interval for the inferred parameters by the matlab function nlpredci.m. particularly, we were interested in the confidence intervals for the location of the epidemic peak and the number of active cases at the peak. the predictive power of the model can be visualized by plotting the bounding boxes corresponding to the confidence interval around the inferred location of the epidemic peak. a good model should provide a consistent position of the confidence intervals, with smaller boxes indicating a large degree of certainty in the predictions. note that the analysis using the bounding boxes can also be considered a study of the sensitivity of the fit to the data. the countries in fig. are among those past the inflection point, at the epidemic peak or past the peak. in the latter case we withheld sufficiently many data points so that the estimates would be nontrivial.
overall, we have found that the short-term prediction (up to - weeks prior to the peak) tends to predict the location and value of the peak relatively well (belgium, portugal, israel, switzerland); however, the confidence intervals may be quite large if not enough data beyond the inflection point are available (belarus). when the inflection point (and the pged regime) has not been reached yet, the information about an eventual exponential decay is not directly detectable in the data and the location of and the value at the peak thus cannot be estimated; see the remark at the end of section for a discussion of a possible prediction improvement for countries in the pg phase, i.e., before they reach the pged regime. we also illustrate two common situations: while for the us, italy and portugal the prediction with less information underestimates the severity of the infection, for czechia a forecast with less data overestimates it. however, in both cases the fits were changing monotonically with the number of data points included in the analysis. note that some of these trends may be due to variation in testing procedures and protocols. in fig. we also show how % confidence intervals can be calculated for the whole future data trajectory. this is not straightforward as the % confidence intervals for the estimated parameters are not independent. however, using the covariance structure of the inferred parameters it is possible to sample the parameters from the multivariate normal distribution and display the confidence intervals systematically for all times, as shown in the figure. the presented model can also be visualized using the web-based tool [ ]. it allows the general public to explore the data for various countries, including validation of the model predictions.
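the sampling-based construction of the trajectory-wide confidence band can be sketched as follows (function and variable names are ours; the authors use matlab's nlpredci and an equivalent multivariate-normal sampling of the fitted parameters):

```python
import numpy as np

def prediction_band(model_fn, t_grid, popt, pcov, n_samples=2000, level=95):
    """Pointwise confidence band obtained by sampling parameters from
    the multivariate normal implied by the fit covariance matrix."""
    rng = np.random.default_rng(0)
    samples = rng.multivariate_normal(popt, pcov, size=n_samples)
    curves = np.array([model_fn(t_grid, *p) for p in samples])
    tail = (100 - level) / 2
    lo, hi = np.percentile(curves, [tail, 100 - tail], axis=0)
    return lo, hi
```

this respects the parameter correlations that make per-parameter intervals alone insufficient for a band over the whole future trajectory.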
using the pged model we also successfully constructed a prediction on march , , for the reported covid-19 data in slovakia that estimated an epidemic peak of about active cases in early may and that very closely matched the observed data, see the reference in the national media [ ] . at that point in time the prediction differed by orders of magnitude from the predictions of compartment-based models. subsequently, the pged model was incorporated into the main epidemic (sir-type) model in slovakia maintained by the analytic unit of the ministry of health, which serves as a reference tool for the government crisis management team decision making during the covid-19 outbreak in slovakia [ ]. the state of exponential decay of the infected population is often viewed by policy makers as the ultimate goal. however, without reaching the state of herd immunity the epidemiological situation is unstable with respect to secondary infections caused by rare infected individuals, new imported cases, and related superspreading events. we illustrate such a situation in the numerical example in figure . as an example we consider the reported data in austria (over the period march -april ). for simplicity we match the data using the eg regime first (using the sir model with inferred parameters β and γ). the simulation is initialized on march ( infected, recovered, total population approx. . mil.). the sir dynamics is applied for the first days, reflecting the lack of measures in the early stages of the infection spread (note that the measures reflect in the reported data with a delay). after days we match the rest of the data with the pged model (with inferred parameters p , α and t m and a continuous r ; the values are similar to the parameters for austria reported in table ). we then continue the pged model until may ( days after the considered initial date).
the number of recovered during the pged regime period is calculated from , and the number of susceptible as the complement of infected and recovered in the population. the remaining population of infected individuals is estimated to be on may . in the studied scenario we lift the mitigation measures completely on may . the dynamics then returns to the standard sir model and undergoes an eg phase. we study the impact of an early detection of the emerging situation (upcoming second wave) and the consequent implementation of mitigation measures. we considered three alternatives: mitigation measures fully implemented after , , and weeks (see shaded regions in fig. ). we observe that an early implementation of the mitigation measures dramatically reduces the next epidemic peak. qualitatively similar progress can be seen in the case of imported infections (we add new infected cases at the time of released mitigation measures). the numerical results indicate how essential it is to implement mitigation measures as early as possible, which requires efficient tools for an early detection of infected individuals. this example shows how the very simple pged model can be used for analytics of the covid-19 pandemic in individual countries. reported data on covid-19 display systematically identifiable regimes: exponential growth, polynomial growth, and polynomial growth with exponential decay. the observed universal scaling is a bit surprising, as the pandemic mitigation and social distancing measures, the testing procedures and protocols, and many other aspects vary significantly from one country to another. nevertheless, the scaling appears to be a strong attractor of the reported active cases dynamics globally. an important feature of the pg and pged regimes is that they both contribute to a slowdown of the epidemic growth in the reported data compared to the expected dynamics driven by the sir model.
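the piecewise scenario described above (sir growth, pged after the measures take effect, plain sir again after lifting them) can be sketched in code; all numbers below (population, rates, switch times) are illustrative assumptions rather than the inferred austrian values:

```python
import numpy as np
from scipy.integrate import solve_ivp

def second_wave_scenario(n=8.9e6, i0=10.0, beta=0.35, t_g=12.0,
                         alpha=3.5, t_measures=14.0, t_lift=80.0, t_end=200.0):
    """SIR growth, PGED decay under measures, SIR again after lifting."""
    gamma = 1.0 / t_g  # removal rate implied by the PGED exponential decay

    def rhs(t, y):
        s, i, r = y
        return [-beta * s * i / n, beta * s * i / n - gamma * i, gamma * i]

    # phase 1: uncontrolled SIR until the measures show up in the data
    p1 = solve_ivp(rhs, (0.0, t_measures), [n - i0, i0, 0.0], rtol=1e-8)

    # phase 2: PGED continuation, matched continuously at t_measures
    t2 = np.linspace(t_measures, t_lift, 400)
    shape = (t2 / t_g) ** alpha * np.exp(-t2 / t_g)
    i2 = p1.y[1, -1] * shape / shape[0]
    # recovered accumulate with rate gamma during the PGED phase (trapezoid rule)
    r2 = p1.y[2, -1] + np.concatenate(
        ([0.0], np.cumsum(0.5 * (i2[1:] + i2[:-1]) * np.diff(t2) * gamma)))

    # phase 3: measures lifted, back to the plain SIR dynamics
    y0 = [n - i2[-1] - r2[-1], i2[-1], r2[-1]]
    p3 = solve_ivp(rhs, (t_lift, t_end), y0, rtol=1e-8)
    return (t2, i2), p3
```

because the susceptible pool is barely depleted during the pged phase, lifting the measures restarts exponential growth, which is the instability the text warns about.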
note that we have only considered the active cases data, but our preliminary data analysis confirms that the pg trends are present in the reported deaths and, where available, also in reported hospitalizations. therefore we conjecture that the observed transition between the different regimes is comparable to phase transitions in physics; thus one expects that the universal scalings in data are a consequence of some unidentified fundamental properties related to the pandemic. the lack of reliable and detailed data that would allow one to discriminate between their eventual sources, and the high complexity of the studied system that involves the virus/disease (its medical, chemical, and physical properties), the behavior of individuals in the population, and the enforcement of the mitigation measures (see [ ] for a summary of some related questions), do not allow us to identify the underlying factors for the observed pg and pged regimes. here we only list eventual candidates (or their combinations):
• significant changes in the effective contact network (social distancing and other mitigation measures), including a low infection transmission probability in the majority of contacts due to imposed safety measures (personal protection items such as face masks, gloves, disinfectants, etc.);
• limitations of testing procedures and selection of the sample used for testing, including a high level of uncertainty in test sensitivity (related to the limit of detection and difficulties with sample collection) and specificity of all types of tests, over- and undersampling of various groups in testing, failure to identify and test asymptomatic carriers, and delays in test reporting;
• limited understanding of the details of infection spread mechanisms, particularly the role of individual and temporal variation of the viral load in infected individuals and their ability to infect others, related to various clinical stages of the disease, and a lack of understanding of the mechanisms of superspreading events.
note that the observed pg and pged regimes are not in agreement with the traditional sir-type models that typically form a base for pandemic spread predictions published in the media, unless their parameters are modified from their expected values; particularly, the total population has to be decreased to a significantly lower effective total population. therefore we conclude that although this work does not provide an understanding of the full extent of the pandemic, as it only models the reported data, it still may provide a useful source for decision making, for a comparison of different countries, or for economic predictions by governments, epidemiologists, and economists.
table : the data is shown from the epidemic onset until may , [ ].
figure : predictions based on the pged model. for each country we remove the last n data points or the last n data points before the epidemic peak (if already reached), ≤ n ≤ d (see plot labels for the values of d), while always keeping the data points in black. pged model parameters are inferred for each n. % confidence intervals for two pged parameters (time and population of the epidemic peak) inferred by nonlinear regression are shown as bounding boxes around the mean in red. the presented data display: small uncertainty, small overlapping confidence intervals (a); large uncertainty, not enough data (b); monotonicity, additional data shifts the peak earlier (c); well predicted location of the peak and the data past it (d); monotonicity, additional data shifts the peak later (e, f, g); well predicted location of the peak but data past the peak not well captured (h). the data is shown from the epidemic onset to may , [ ].
figure : the inferred values of the parameter t g of the pged regime for countries close to or beyond their epidemic peak (except those observing an apparent second epidemic wave).
lower t g corresponds to efficient identification, testing and isolation/removal of infected. stars indicate a modification of the data to account for data reporting irregularities, see the text.
figure : confidence regions for all future times for selected countries. the data (black) are used for inference of the pged parameters p , α and t m and their covariance matrix. the best fit is displayed (solid green line). symmetric % confidence intervals obtained by sampling parameters from the multivariate normal distribution with the same covariance structure are displayed as shaded regions.
references:
• methodological challenges of analysing covid-19 data during the pandemic
• the british society for immunology and the academy of medical sciences (akbar a ed.), covid-19
• estimating the covid-19 infection rate: anatomy of an inference problem
• the significance of the detection ratio for predictions on the outcome of an epidemic - a message from mathematical modelers, preprints
• impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand, online report
• coronavirus: up to % of germany covid-19 antibody seroprevalence
• estimating covid-19 antibody seroprevalence in
• suppression of covid-19 outbreak in the municipality of vo
• covid-19 prevalence, sora institut
• ihme covid-19 health service utilization forecasting team, murray cjl: forecasting the impact of the first wave of the covid-19 pandemic on hospital demand and deaths for the usa and european economic area countries, medrxiv
• early dynamics of transmission and control of covid-19: a mathematical modelling study, the lancet infectious diseases
acknowledgments: this work has been supported by the slovak research and development agency under the contract no. apvv- - (rk) and by the scientific grant agency of the slovak republic under the grants no. / / and / / . the authors would like to thank vlado boža, lukáš poláček, michal burger and the modelling team of the institute of health policy for their useful comments and help.
particular thanks goes to robert ziff for an inspiration and charlie doering for pointing us in the right direction. key: cord- -t v gs authors: barwolff, gunter title: prospects and limits of sir-type mathematical models to capture the covid-19 pandemic date: - - journal: nan doi: nan sha: doc_id: cord_uid: t v gs for the description of a pandemic, mathematical models could be interesting, both for physicians and politicians, as a basis for decisions on how to treat the disease. the responsible estimation of parameters is a main issue of mathematical pandemic models. especially a good choice of β, the number of others that one infected person encounters per unit time (per day), influences the adequateness of the results of the model. for the actual covid-19 pandemic some aspects of the parameter choice will be discussed. because of the incompatibility of the data of the johns hopkins university with the data of the german robert-koch-institut, we use the covid-19 data of the european centre for disease prevention and control (ecdc) as the basis for the parameter estimation. two different mathematical methods for the data analysis will be discussed in this paper and possible sources of trouble will be shown. the data of the usa and the uk serve as examples for the parameter choice. the resulting parameters will be estimated and used in w. o. kermack and a. g. mckendrick's sir model. strategies for commencing and ending social and economic shutdown measures are discussed. the numerical solution of the ordinary differential equation system of the modified sir model is done with a runge-kutta integration method of fourth order. at the end, the applicability of the sir model is essentially demonstrated. suggestions about appropriate points in time at which to commence with lockdown measures, based on the acceleration rate of infections, conclude the paper. let us recollect something about the model.
i denotes the infected people, s stands for the susceptible and r denotes the recovered people. the dynamics of infections and recoveries can be approximated by the ode system below. we understand β as the number of others that one infected person encounters per unit time (per day). γ is the reciprocal value of the typical time from infection to recovery. n is the total number of people involved in the epidemic disease, and n = s + i + r holds. the even (homogeneous) distribution of the members of the groups s, i and r is an important assumption of the sir model. the empirical data currently available suggests that the corona infection typically lasts for some days; this means γ = / ≈ , . the choice of β is more complicated and will be considered in the next section. it should be noted that there are many modifications of the sir model which add compartments other than i, s or r, but the main behavior of the model stays the same. we use the european centre for disease prevention and control [ ] as the data source for the covid-19 infected people for the period from december st to april th. at the beginning of the pandemic the quotient s/n is nearly equal to one. also, at the early stage no-one has yet recovered. thus we can describe the early regime by the equation of exponential growth of i. we look for periods in the spreadsheets of infected people per day where the course can be described by a function of this type ( ). we solved this non-linear minimum problem with the damped gauss-newton method (see [ ]). after some numerical tests we found the subsequent results for the considered countries, choosing different periods for the different countries with the aim of approximating the infection course with good quality. the following figures show the graphs and the evaluated parameters for the usa and the uk. it must be said that the evaluated β-values relate to the stated periods.
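the sir system and the fourth-order runge-kutta integration the author describes can be sketched as follows; this is a minimal illustration, and the parameter values (β = 0.3 per day, γ = 1/14 per day, n = 66 million, i0 = 10) are assumptions chosen for the example, not the paper's fitted values.

```python
import numpy as np

def sir_rhs(t, y, beta, gamma, N):
    # dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    S, I, R = y
    new_inf = beta * S * I / N
    return np.array([-new_inf, new_inf - gamma * I, gamma * I])

def rk4_step(f, t, y, h, *args):
    # one classical fourth-order Runge-Kutta step
    k1 = f(t, y, *args)
    k2 = f(t + h / 2, y + h / 2 * k1, *args)
    k3 = f(t + h / 2, y + h / 2 * k2, *args)
    k4 = f(t + h, y + h * k3, *args)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate_sir(beta, gamma, N, I0, days, h=0.1):
    y = np.array([N - I0, float(I0), 0.0])
    traj = [y]
    for i in range(int(days / h)):
        y = rk4_step(sir_rhs, i * h, y, h, beta, gamma, N)
        traj.append(y)
    return np.array(traj)

# illustrative run; N is roughly the UK population, as used later in the text
traj = simulate_sir(beta=0.3, gamma=1 / 14, N=66e6, I0=10, days=365)
```

since the right-hand sides sum to zero, the total s + i + r stays constant along the trajectory, which is a useful sanity check on any implementation.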
for the iterative gauss-newton method we guessed the respective periods for every country by a visual inspection of the graphs of the infected people over the days. especially in medicine, psychology and other life sciences, the logarithmic representation of data is readily used. instead of the above table of values, the following logarithmic one was used: day vs. log(number of infected people). taking the logarithm of ( ) leads to log i(t) = log i + βt, and based on the logarithmic table the corresponding functional is to be minimized. the solution of this linear optimization problem is trivial and it is available in most computer algebra systems as a "black box" of the logarithmic-linear regression. the following figures show the results for the same periods as above for the usa and the uk. figures - show that the logarithmic-linear regression yields poor results. thus, the non-linear optimization problem ( ) is to be chosen as the favored method for the estimation of i and β. we found some notes on the parameters of italy in the literature, for example β = , , and we are afraid that this is a result of the logarithmic-linear regression. our result for italy is pictured in fig. and fig. . neither the data from the ecdc, nor the data from the german robert-koch-institut, nor the data from the johns hopkins university are correct, for we have to reasonably assume that there is a number of unknown cases. it is guessed that the data covers only % of the real cases. considering this, we get slightly changed results, and in the subsequent computations we will add an estimated number of unknown cases to the initial values of i. for the uk we use the β-value , (see fig. ) and γ = , ; we get the course pictured in fig. . n was set to millions. in all countries concerned by the corona pandemic a lockdown of social life is discussed. in germany the lockdown started on march th. the effects of social distancing to decrease the infection rate can be modeled by replacing β with κ(t)β, where κ is a function with values in [ , ].
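the two estimation routes compared in the text (nonlinear least squares for i(t) = i0·e^{βt} via a damped gauss-newton iteration, versus logarithmic-linear regression) can be sketched on synthetic data; the exponential model is from the paper, while the noise level, the "true" parameters and the use of the log-linear fit as the starting guess are assumptions made for this illustration.

```python
import numpy as np

def fit_exponential(t, y, iters=50, damp=0.5):
    """Fit y ~ I0*exp(beta*t): a log-linear regression gives a first guess,
    then a damped Gauss-Newton iteration refines the nonlinear least squares."""
    b, a = np.polyfit(t, np.log(y), 1)        # log-linear: log y = a + b*t
    p = np.array([np.exp(a), b])              # initial (I0, beta)
    for _ in range(iters):
        f = p[0] * np.exp(p[1] * t)           # model prediction
        J = np.column_stack([f / p[0], t * f])  # Jacobian: df/dI0, df/dbeta
        step, *_ = np.linalg.lstsq(J, y - f, rcond=None)
        p += damp * step                      # damped Gauss-Newton update
    return p, (np.exp(a), b)

# synthetic "reported infected" counts with multiplicative noise
t = np.arange(25, dtype=float)
rng = np.random.default_rng(1)
y = 5.0 * np.exp(0.18 * t) * rng.lognormal(0.0, 0.05, t.size)

(nl_I0, nl_beta), (ll_I0, ll_beta) = fit_exponential(t, y)
```

on clean synthetic data both estimates land close to the true growth rate; the paper's point is that on real, irregular counts the nonlinear fit (which weights large counts properly) behaves better than the log-linear shortcut.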
for example, t > t , t < t means a reduction of the infection rate by % in the period [t , t ] (∆t = t − t is the duration of the temporary lockdown in days). a good choice of t and t is going to be complicated. if we respect the chosen starting day of the usa lockdown, march st (this corresponds to the th day of the concerned year), we get the result pictured in fig. . the numerical tests showed that a very early start of the lockdown, resulting in a reduction of the infection rate β, causes the typical gaussian-like curve of i to be delayed; however, the amplitude (maximum value of i) doesn't really change. one knows that the development of the number of infected people looks like a gaussian curve. the interesting points in time are those where the acceleration of the number of infected people increases or decreases, respectively. these are the points in time where the curve of i changes from a convex to a concave behavior or vice versa. the convexity or concavity can be controlled by the second derivative of i(t). let us consider equation ( ). by differentiating ( ) and using ( ) we find that the i-curve will change from convex to concave if the corresponding relation is valid; from it the switching time follows. a lockdown starting at t (assigning β* = κβ, κ ∈ [ , [) up to a point in time with ∆t as the duration of the lockdown in days will be denoted as a dynamical lockdown (for t > t , β* is reset to the original value β). t means the point in time up to which the growth rate increases and from which on it decreases. fig. shows the result of such a computation of a dynamical lockdown. we got t = (κ = , ); the result is significant. in fig. a typical behavior of the second derivative of i is plotted. the result of a dynamical lockdown of days for the uk is shown in fig. , where we found t = (κ = , ). data from china and south korea suggests that the group of infected people with an age of or more is of magnitude %.
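a temporary lockdown window in which β is multiplied by κ can be sketched as follows; this is a rough forward-euler illustration (cruder than the paper's fourth-order runge-kutta), and all parameter values (β0 = 0.25, κ = 0.5, the window [60, 90], γ = 1/14, n = 10 million) are assumptions chosen so that the lockdown starts near the uncontrolled peak, not the paper's values.

```python
import numpy as np

def simulate_sir_lockdown(beta0, kappa, t1, t2, gamma, N, I0, days, h=0.5):
    # beta(t) = kappa*beta0 inside the lockdown window [t1, t2), beta0 otherwise
    S, I, R = N - I0, float(I0), 0.0
    I_hist = [I]
    for step in range(int(days / h)):
        t = step * h
        beta = kappa * beta0 if t1 <= t < t2 else beta0
        new_inf = beta * S * I / N * h     # forward-Euler increments
        rec = gamma * I * h
        S, I, R = S - new_inf, I + new_inf - rec, R + rec
        I_hist.append(I)
    return np.array(I_hist)

# no intervention vs. a 30-day lockdown starting near the uncontrolled peak
no_ld = simulate_sir_lockdown(0.25, 1.0, 0, 0, 1 / 14, 1e7, 10, 400)
ld = simulate_sir_lockdown(0.25, 0.5, 60, 90, 1 / 14, 1e7, 10, 400)
```

with these assumed values the lockdown both delays the peak of i and lowers its amplitude, which matches the paper's observation that a well-timed lockdown (rather than a very early one) flattens the curve.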
this group has a significantly higher mortality rate than the rest of the infected people. thus we can presume that α = % of i must be especially sheltered and possibly medicated very intensively as a high-risk group. this result proves the usefulness of a lockdown or of strict social distancing during an epidemic disease. we observe a flattening of the infection curve as requested by politicians and health professionals. with strict social distancing for a limited time, one can buy time to find vaccines and to improve the possibilities of helping high-risk people in hospitals. to see the influence of social distancing we look at the uk situation without a lockdown and with a dynamical lockdown of days (κ = , ) for the % high-risk people, shown in fig. . the computations with the sir model show that social distancing with a lockdown will only be successful if it starts at a time greater than or equal to t , found by the evaluation of the second derivative of i (formula ( )). if the lockdown is started at a time less than t , the effect of such social distancing is not significant. if we write ( ) or ( ), respectively, in the form below, we realize that the number of infected people decreases if the corresponding condition is complied with. the relation ( ) shows that there are two possibilities for the rise of infected people to be inverted and the medical burden to be reduced. a) one possibility is the increase of the recovery rate γ, for example by medical treatment. b) a second possibility is the reduction of the infection rate κβ. this can be achieved by strict lockdowns, social distancing at appropriate times, or rigid sanitary measures. with respect to point a) it is important to note that a lot of possible positive precautions by physicians and politicians cannot be covered by mathematical models like the sir one. also, the infected people are not distributed in the same way at all locations of a country. it is also possible and necessary to concentrate the modeling on hot spots like new york in the usa, madrid in spain or bavaria in germany to get a higher resolution of the pandemic behavior.
the results are, in total, pessimistic with respect to a successful fight against the covid-19 virus. hopefully reality is a bit more merciful than the mathematical model. but we would rather err on the pessimistic side and be surprised by more benign developments. note again that the parameters β and κ are guessed very roughly. also, the percentage α of the group of high-risk people is possibly overestimated. depending on the capabilities and performance of the health systems of the respective countries, those parameters may look different. the interpretation of κ as a random variable is thinkable, too. in the end, all precautions (for example social distancing, isolation of high-risk people) lead to a prolongation of the pandemic period with respect to the awaited and necessary herd immunity. but the decrease of the peak of the curve of infected people generates time for the improvement of the health systems and heightens the possibilities of saving lives. a contribution to the mathematical theory of epidemics numerics for engineers, physicists and computer scientists a contribution to the mathematical modeling of the corona/covid-19 pandemic. medrxiv understanding the present status and forecasting of covid-19 in wuhan. medrxiv key: cord- -jxbw wl authors: prasad, j. title: a data first approach to modelling covid-19 date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: jxbw wl the primary data for the covid-19 pandemic is in the form of time series for the number of confirmed, recovered and dead cases. this data is updated every day and is available for most countries from multiple sources. in this work we present a two-step procedure for model fitting to covid-19 data. in the first step, time dependent transmission coefficients are constructed directly from the data and, in the second step, measures of those (minimum, maximum, mean, median etc.) are used to set priors for fitting models to data. we call this approach a "data driven approach" or "data first approach".
this scheme is complementary to bayesian approach and can be used with or without that for parameter estimation. we use the procedure to fit a set of sir and sird models, with time dependent contact rate, to covid- data for a set of most affected countries. we find that sir and sird models with constant transmission coefficients cannot fit covid- data for most countries (mainly because social distancing, lockdown etc., make those time dependent). we find that any time dependent contact rate, which falls gradually with time, can help to fit sir and sird models for most of the countries. we also present constraints on transmission coefficients and basic reproduction number r ~ as well as effective reproduction number r(t). the main contributions of our work are as follows. ( ) presenting a two step procedure for model fitting to covid- data ( ) constraining transmission coefficients as well as r ~ and r(t), for a set of most affected countries and ( ) releasing a python package pycov that can used to fit a set of compartmental models with time varying coefficients to covid- data. at present the world is going through an unprecedented crisis of pandemic covid- caused by a novel form of coronavirus, named sars-cov- which was passed to the human from bats in the wuhan city of china, some time in december [org a, org b, ea h, ea t, ea v, ea d, ea r, ea l] . till the middle of may the virus has reached almost all the parts of the world resulting in more than four million people infected and more than a quarter million deaths [wor ] . the measures to contain the virus medically by developing a vaccine are going on war footing. however, the success is still expected to be a few years away [ea f] . till a fraction of the population develop (herd) immunity or the vaccine is ready, the only means to contain the pandemic are social measures (social distancing, contact tracing etc.,) and enhanced hygiene practices [ea , ea s, ea p] . 
some of the most important problems related to covid- research are ( ) estimating the controlling parameters of the pandemic, ( ) making short term predictions using mathematical-statistical modeling which can help in mitigating policies ( ) simulating the growth of the epidemic by taking into account as many contributing effects as possible and ( ) quantifying the impact of mitigation measures, such as lockdown etc [ea j] . modeling covid- pandemic with compartmental models of kermack and mckendrick (for an introduction see [kr , li , bc ] ) has been one of the most active problems in the recent times [ea p, ea a, ea c, ea e, fp , ea m, oli ] . there have been alternative approaches also such as [rmi ] where statistical considerations are being taken into account for predictions. in one of the studies [fp ] it is argued that the data for the confirmed, recovered and dead, all three can easily fit a power law model with similar coefficients. the main attractive feature of these data driven approaches is that the complexity of the model being considered is determined by the data and not by theoretical expectations. in the present work we follow a middle approach and fit two compartmental models, named sir and sird with some modification, to the covid- data. one of the main reasons to consider these models has been that the covid- data is available only for the susceptible, infected, recovered and dead compartments (for the notations used here and other places in the present work see table ( )). it may be true that a large fraction of the population which may be exposed (defined later) play an important role in the dynamics of the pandemic however, it is hard to get reliable numbers for that. apart from that, a large number of undocumented cases [ea l] may have significant influence on the spread of the pandemic. a brief summary of the work presented here is as follows. 
in § we give a brief introduction to the compartmental models and introduce the notations and variables used in the work. in particular, we discuss the sir model in § . and the seir and the sird models in § . and § . respectively. one of the major parts of the work presented here is to study the time dependence of the contact rate β, we introduce a set of parametric models of β(t) in § . . we discuss the time series data used in the study in the § by giving an example of italy which is one of the most affected countries. the main results of our work are given in § and in § . in § we discuss the reconstruction (regression) procedure for the set of transmission coefficients as well as for the effective reproduction (defined later), number r(t). parameter estimation is discussed in § . the main conclusions of our work with a summary and some important points are discussed in § . mckendrick (see [nc , kr , li ] for an introduction) is still the main framework most commonly used. the main idea of the kermack and mckendrick's compartmental models is that every individual in a society belongs to one of the m compartments and the total number of individuals belonging to different compartments keep changing with time. the minimum value m can have is two, for the susceptible-infected-susceptible (sis) model, in which the recovery does not guarantee that one will not get the infection again [kr ] . during an epidemic phase an individual can go through many stages from being perfectly healthy to the recovered one after an infection, with or without any immunity (short or long term) or may die. if we represent every stage with a compartment and keep the track of the number of individuals in each compartment then we can easily model the dynamics of the epidemic. this approach is very similar to the approach taken in astronomy where we count the number of stars in different stages of their life to understand the stellar evolution. 
in principle we can have any number of logical compartments but in practice we should consider only those compartments for which we have the counts data, in particular for model fitting. taking into account the fact that we have data only for the number of confirmed, recovered and dead population, the only compartmental model that meets the requirement is the sird model. if we consider the recovered and dead together we get the sir model as is discussed in the next section. one of the important compartments that also is commonly considered is the 'exposed' one and represents the population which have received the infection but cannot pass to others, before a certain period called the incubation period. if we consider exposed population also then we get the seir model that also is discussed below. three compartmental models sir, sird, and seir are shown in the (a), (b) and (c) panels of figure ( ) respectively (for more detail one can refer to [kr , het , ea , oli ] ). if we identify the compartments with the nodes of a graph then the transmission between different compartments, as is represented by a set of coefficients, can be considered the edges of the graph. some of the nodes may have multiple edges and some of the edges could be bi-directional also. the main challenge of the modeling a pandemic like covid- is not the scarcity of mathematical models but it is of the reliable data for the compartments being considered. considered. if we consider these compartments as nodes of a graph then there are transmission coefficients for every connecting edge that determine how effective that edge is in changing the population of the connected compartments. in (a) and (c) representing sir and seir models, the compartments are connected in a linear way, however, for the case (b), representing the sird model, there is a branching also. since the total population must remain a constant so the rates of change along all the connecting edges must add to zero. 
notation description here β and γ are the transmission coefficients, also called the contact rate and the recovery removal rate respectively, and /β and /γ represent the mean duration of infectiousness and the average period of infectivity (see [het , ea a] ), respectively. in general there is some time lag between acquiring an infection and becoming infectious. however, in the sir model it is ignored and an assumption is made that individuals become infectious immediately upon getting an infection. this is a very strong assumption and the main reasons for all rights reserved. no reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted may , . . https://doi.org/ . making this is that we do not know reliably how many people are actually 'exposed', or have the virus but are still not infectious (cannot pass it to others). one of the ways to address this problem could be contact tracing and assuming that anyone who has come into contact with an infected person is an exposed one. however, this assumption is as strong as the assumption made in the sir model. if we do not consider the birth, death and movement of people then the following condition must be satisfied. here s, i and r is the population of the s, i and r compartments respectively. in equation ( ) the transmission coefficient β is one of the most important parameters of the epidemic dynamics and can be written as the product of the contact rates (the average number of contacts per person per time) and the transmission probability (the probability of disease transmission on contact between a susceptible and an infectious person). 
as has been mentioned that the transmission coefficient γ can be identified with the recovery rate which is nothing but the inverse of the infectious period (during which an infected person can pass the virus to other healthy people). in general, the equations ( ) is solved with the following initial conditions: the second equation from can be written as: and for s/n > γ/β we get a positive infection rate. here we define one of the most important parameters of an epidemic in terms of the ratio β/γ, called the basic reproduction number r , when considered a constant, and called the effective reproduction number r(t), when considered a function of time. the most common definition [kr ] of it is that it is the average number of secondary cases arising from an average primary case in an entirely susceptible population. note that in the text we may also use just "reproduction number" and the meaning of it will depend on the context. some of the studies such as [ea a] call r and r(t)both as basic reproduction number, however, we follow the convention used in [ea b, ea a, cob ] . the basis reproduction number r is the main measure which quantifies the transmissibility of the virus and r > , sets a chain of transmissions leading an exponential growth of the pandemic. we can keep r < , by minimizing the contact rates (social distancing etc.,), lowering the infectiousness of the infected people (by treating them or putting them in a quarantine etc.) and reducing the susceptibility of the healthy people by vaccination etc., (for detail see [ea ] ). the sir model is one of the most basic models and can be easily generalized by one or more of the following ways: . adding more compartments: depending on the type of pandemic and other details we are interested in we can add more compartments to the sir model. 
these compartments can fit in between the existing ones (for example as shown in figure( ) (c) for seir case) or can branch out from the existing once (as shown in figure( all rights reserved. no reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. if we relax the assumption that the people who get the infection become infectious instantly and consider a latent period to the the onset of infectiousness there is a fraction of population (compartment) which has been exposed to the virus but will become infectious only after some latent period /σ, then the model is called susceptible-exposed-infected-recovered (seir) model represented by the following set of equations [kr ] note that if we combine the second and third equation above we get: from the above equation we can see that population in the e and i compartments together can grow with time only when the fraction of the susceptible population is greater than the inverse of the reproduction number : there are many forms of seir equations which are in common use (see [bc , ea p, ea u, ea b, p. , ea n] ) however, equation ( ) is the simplest one and does not include natural deaths. one of the common practices with the seir model has been to consider the incubation period /σ a constant, and estimates it from some other observations. the seir modal is quite complex as compared to the sir model and we cannot find the number of exposed people exactly at time t = for evolving the equations and so the approach used to define r no longer works. thanks to the new generation matrix models [bc ] it is still possible to write r in a close form for this case also. one of the serious drawbacks of the sir model is that people who recover and who die are treated in the same way -there are no separate compartments for the dead and recovered people. 
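the seir system described above can be sketched as follows; the discretization (simple explicit stepping) and the parameter values (β = 0.3, σ = 1/5, γ = 1/10, n = 1 million, e0 = 5) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def seir_step(y, beta, sigma, gamma, N, h):
    # S' = -beta*S*I/N ; E' = beta*S*I/N - sigma*E ; I' = sigma*E - gamma*I ; R' = gamma*I
    S, E, I, R = y
    inf = beta * S * I / N
    return np.array([S - h * inf,
                     E + h * (inf - sigma * E),
                     I + h * (sigma * E - gamma * I),
                     R + h * gamma * I])

def run_seir(beta, sigma, gamma, N, E0, days, h=0.2):
    y = np.array([N - E0, float(E0), 0.0, 0.0])
    hist = [y]
    for _ in range(int(days / h)):
        y = seir_step(y, beta, sigma, gamma, N, h)
        hist.append(y)
    return np.array(hist)

# illustrative run with R0 = beta/gamma = 3 and a 5-day latent period
hist = run_seir(beta=0.3, sigma=1 / 5, gamma=1 / 10, N=1e6, E0=5, days=300)
```

as the text notes, e + i grows only while the susceptible fraction exceeds the inverse of the reproduction number, so by the end of the epidemic s/n has been driven below 1/r0.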
this drawback can be addressed by separating the compartments for the dead and recovered population as is done in the sird model described with the following set of equations (for a detail discussion see [ea a, vil ] ). all rights reserved. no reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted may , . . https://doi.org/ . / . . . doi: medrxiv preprint here a new transmission coefficient δ has been introduced which we can identify with the death rate. one of the advantages of the sird model is that it has three transmission coefficients β, γ and δ and we have the data for three time series i(t), r(t) and d(t) available so it is possible to compute the time dependency of all the three coefficients as well as the reproduction number r. the aim of any mitigation measures may be one or more of the followings: . lower the contact or infection rate β. . lower the mortality rate δ. . increase the recovery rate γ. the sird model provides us a framework to estimate or fit all these parameters. in one of the coming sections we will discuss how we can reconstruct the transmission coefficients β, δ and γ as well as r from the data by a direct reconstruction approach. the basic reproduction rate for the sird model can be written in the following way [ea a] : or, where r γ = β/γ and r δ = β/δ. if apart from death and recovery there is some other channel that can lower the population in the i compartment, for example if infected people move out from that region with transmission coefficient η then we can write: with r η = β/η. a more realistic model will have multiple compartments (nodes), either connected in series or some branching out from others, with data to constrain the transmission coefficients (edges). 
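a minimal sketch of one sird update step and of the combined basic reproduction number, which by the harmonic relation quoted above (1/r0 = 1/rγ + 1/rδ) reduces to r0 = β/(γ + δ); the numerical values used below are illustrative assumptions.

```python
def sird_step(y, beta, gamma, delta, N, h):
    # S' = -beta*S*I/N ; I' = beta*S*I/N - (gamma+delta)*I ; R' = gamma*I ; D' = delta*I
    S, I, R, D = y
    inf = beta * S * I / N
    return (S - h * inf,
            I + h * (inf - (gamma + delta) * I),
            R + h * gamma * I,
            D + h * delta * I)

def r0_sird(beta, gamma, delta):
    # 1/R0 = 1/R_gamma + 1/R_delta  with  R_gamma = beta/gamma, R_delta = beta/delta,
    # which simplifies to R0 = beta / (gamma + delta)
    return beta / (gamma + delta)

# example: beta=0.3, gamma=0.1, delta=0.05 gives R_gamma=3, R_delta=6, R0=2
print(r0_sird(0.3, 0.1, 0.05))
```

the same pattern extends to the extra removal channel mentioned in the text: adding a coefficient η simply adds another term 1/rη to the harmonic sum, i.e. r0 = β/(γ + δ + η).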
apart from this, realistic models may also require to consider different transmission coefficients of different subgroups (based on age etc.,). incorporating, all these considerations will lead to very complex models having very less connect with the actual data we have. all rights reserved. no reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted may , . . https://doi.org/ . / . . . doi: medrxiv preprint . time dependent β models as a pandemic triggers various containment measures [ea s, dg , ea k, ts , rr , ea c] such as lockdown, social distancing, improved hygiene practices etc., are taken and that lead to transmission coefficients such as β becoming time dependent [gg , ea m, fp , ea e, ea i] . apart from this, the drop in the susceptible population also decreases β (see [ea a, cob ] ). lockdown has been one of the most common mitigation measures followed all over the world and, in its extreme form, we can assume that once it starts the contact rate between susceptible and infectious people drops to zero. in general, the lockdown starts on a fixed day t l and has a duration (time scale) we call τ (we will be using both τ and corresponding decay rate µ = /τ in the discussion). we can incorporate these two parameters into the modeling of β(t) in many different ways and a set of three common choices is given below: this model is discussed here just for an example and we do not expect the variation of β(t) as slow as linear one. this expression shows that β(t) starts with an initial value β and after time t l it starts decreasing linearly with a constant rate of µ = /τ and finally becomes β ( − µ) at t = ∞. all rights reserved. no reuse allowed without permission. 
(which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted may , . this form of suppression of β(t) starts with a constant value β at some t = t l and keeps decaying for period represented by τ and finally settles to a final value β ( − α) as is shown in figure ( ). this can be written in the following way also: from equation ( ) we can also write : equations ( ) and equation ( ) are important to find the priors for α and µ once we know the priors for β and this will be discussed again in § and will be used in parameter estimation in § . . exponential suppression [fp , ea o, ea n] : this model is similar to the tanh model and in this case also β(t) starts from some initial value β and after decreasing for a period and finally approaches to a constant value β α at t = ∞ as all rights reserved. no reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted may , . . https://doi.org/ . / . . . doi: medrxiv preprint is shown in figure ( ) . note that the transmission coefficient β may decay with time without any intervention also as is discussed in [ea ] for plants. in this case also we can write : and these equations also will be used to find the priors for parameter estimation. in one of the studies [ea o] it has been argued that even the time of recovery /γ may also vary with time due to the improvement in medical understanding of the epidemic and facilities and that also can be modeled as an exponential function. there have been other physically well motivated exponentially decaying forms also such as given in [vil ] in which β starts from starting value β and decay with rate /τ finally becomes β . 
the author argues that β depends on the policy decisions leading to behavioral changes. this model is different from the model we are considering only in the respect that it considers the "lockdown" from the beginning i.e., t = . the time dependent β models as are discussed above and shown in figure ( ) share a common property that before a certain time t l , that we can identify with the day on which lockdown starts, β has a constant value β and after that it starts decreasing with a rate that depend on the parameter µ = /τ . the effect of the suppression in β is controlled by the parameter µ and for its zeros values all the models become constant β models. from figure ( ) we can conclude that different models can lead to the same amounts of "flattening" of the curve with a different choice of parameters so there is no preferred model for the suppression. the sir model with constant transmission coefficients is applicable only in the situation when the pandemic is let to grow without any intervention. in the real world once a pandemic starts interventions of different kinds (social, medical etc.,) are considered to reduce the rate at which the the epidemic spreads. these interventions can be easily taken into account by considering a time dependent (decaying) growth rate (β). as we can see from the above figure that a decaying (exponentially) β helps to contain the disease by lowering the height of the peak as shown in figure ( ). the primary data for covid- is in terms of three times series for the count of confirmed c(t) , recovered r(t) and dead d(t), persons for every country. by definition all the three times series all rights reserved. no reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted may , . . 
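the tanh and exponential suppression families described above can be sketched as follows; the exact functional forms are garbled in this copy of the paper, so these are plausible reconstructions that only share the stated limits (constant β0 before t_l, relaxation on timescale 1/µ, and the quoted asymptotic values β0(1 − α) for the tanh form and β0·α for the exponential form).

```python
import numpy as np

def beta_tanh(t, beta0, alpha, mu, t_l):
    # ~beta0 well before t_l, relaxing to beta0*(1 - alpha) on timescale 1/mu
    return beta0 * (1 - 0.5 * alpha * (1 + np.tanh(mu * (t - t_l))))

def beta_exp(t, beta0, alpha, mu, t_l):
    # exactly beta0 before t_l, then exponential decay towards beta0*alpha
    t = np.asarray(t, dtype=float)
    decayed = beta0 * (alpha + (1 - alpha) * np.exp(-mu * (t - t_l)))
    return np.where(t < t_l, beta0, decayed)
```

with µ = 0 both forms reduce to a constant-β model, matching the remark in the text that the suppression strength is controlled entirely by µ (and α).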
there are many factors, known and unknown, which determine the behavior of these time series. the time series i(t) for the population in compartment i can be obtained by subtracting r and d from c; for a set of countries the time series of i(t) are shown in figure ( ). δ, therefore it is a good measure which we can fit to a compartmental model, such as sir or sird, and from which we can obtain constraints. in this and the next section we present the main results of the study in the form of a reconstruction procedure for the time-dependent transmission coefficients β(t), γ(t), δ(t) and the effective reproduction number r(t). we consider the examples of italy and india for this procedure. note that this approach is common and can be used to understand the variation of the transmission coefficients with time as a result of interventions.
the main advantage of this approach is that there are no parameters to adjust, so the results are easy to reproduce. the approach we use here is similar to that used in [ea e, gg ] . in this approach the evolution equations are written in a discretized form, as shown in equation ( ). from the third equation we can write: and using this and the second equation from ( ) we get: note that by definition r t+ ≥ r t , so γ(t) ≥ ; however, we may have i t+ ≤ i t as well, and β(t) may become negative once the population in compartment i starts decreasing. here an important assumption is being made: the fraction of susceptible population s/n is close to unity, which may be true at the beginning of the epidemic. once we have expressions for the time-dependent β and γ we can also write an expression for the time-dependent reproduction number in the following way: following a similar procedure we can write the sird equations in the following discretized form: from these equations we can write: and can write the expression for the reproduction number: where ∆x t = x t+ − x t with x = i, r and d. this equation is identical to equation ( ) if we do not count dead and recovered separately, i.e., replace ∆r t + ∆d t with ∆r t . one of the interpretations of r is that it is a ratio of two rates, so in case we are interested in finding two separate measures for γ and δ, we can also write: the procedure discussed above can thus be used to obtain the variation of the transmission coefficients β, γ, δ and the effective reproduction number r(t) with time. in order to follow this procedure we need to discard the first few data points, which have very high noise. as explained above, occasionally we may also obtain negative values of r(t).
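the reconstruction just described amounts to a few array operations on the daily counts. the sketch below assumes daily time series and s/n ≈ 1, as in the text; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def reconstruct_sird(I, R, D):
    """Reconstruct time-dependent SIRD coefficients from daily counts,
    assuming S/N ~ 1 (valid early in the epidemic).
    I, R, D: 1-d arrays of infected, recovered and dead counts."""
    I, R, D = (np.asarray(x, dtype=float) for x in (I, R, D))
    dI, dR, dD = np.diff(I), np.diff(R), np.diff(D)
    It = I[:-1]
    gamma = dR / It              # recovery rate, gamma(t) = dR_t / I_t
    delta = dD / It              # death rate, delta(t) = dD_t / I_t
    beta = (dI + dR + dD) / It   # transmission rate (may go negative once I falls)
    r_eff = beta / (gamma + delta)
    return beta, gamma, delta, r_eff
```

for the sir variant, the same formulas apply with d ≡ 0, i.e. ∆r t + ∆d t reduces to ∆r t , matching the remark above.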
in figure ( ) we show the reconstruction of β(t), γ(t) and r(t) for italy with the sir model, and in figure ( ) the same is shown for β(t), γ(t) and δ(t) in the case of the sird model. we show r(t) as well as r γ (t) and r δ (t) for italy in the case of the sird model in figure ( ). similar figures are also shown for india in figures ( ), ( ) and ( ). in all the figures the vertical red dashed lines mark the date of lockdown. one of the main attractive features of the split of r into γ and δ parts is that it is sensitive to the individual values of γ and δ and not just their sum γ + δ. for the case when γ = δ we recover the usual reproduction number. one of the important uses of the reconstruction procedure discussed here is to find the prior (minimum, maximum and best fit) values for the parameters to be fitted. once we have the estimates for β(t), γ(t) and δ(t) from the above procedure we can easily find the x min , x max and x values (with x = β, γ, δ). here, x is the approximate starting point for the parameter that is needed in many optimization procedures which iteratively find the solution. since in the present work we use a parametric form of β(t), we need priors for the parameters of β(t), i.e., β , α, µ and τ, which can be found from the reconstructed β(t) (see § . for details). we consider a set of six compartmental models, three belonging to the sir and three to the sird class. the models differ from each other in the choice of the epidemiological class (sir or sird) or the model for the contact rate β(t) (see § . for details). a summary of the models is given in table ( ).
note that in models ( ) and ( ), β(t) starts decaying from the very beginning (instead of starting from a particular day representing the date of the lockdown) with a constant rate µ. in any fitting procedure the choice of the loss function depends on what we wish to fit. in common least-squares fitting we use the sum of the squares of the offsets as the loss function. however, with the data we have there is a problem with that choice. the time series we wish to fit have small values at the beginning and very large values at a later stage, so the fitting is biased towards the points which have large values. one of the solutions for this could be to fit the log of the time series, but then the fitting becomes biased towards small values in the beginning (or at the later stage). we decided to use the loss function of ordinary least squares, which fits the data points close to the peak (having higher values) more accurately than the other data points. we found this useful for the following two reasons: first, the peak in the time series is an important feature, in particular its location and height, therefore any loss function biased towards it is justified. second, for short-term predictions only the data points close to the dates of prediction are important, so using a loss function that fits later points (having higher values) more accurately than the noisy data points in the beginning is favorable.
the loss function which we used for fitting the data to the sir and sird models is given below. the variables used in the above equations are defined in table ( ). for the sir and sird models we fit multiple time series together, so we must weight the sum of the squares of the offsets for the different time series, since they have very different values: the value of i(t) is generally a few orders of magnitude higher than r(t) and d(t). we use the following weights for this: where x̄ l is the average of the time series x(t). we use the solve_ivp and minimize modules from scipy [sci ] for integrating the differential equations and minimizing the cost function, respectively. the loss function given by equation ( ) represents the root mean square deviation (rmsd); we use its final value as a measure of the goodness of the fit, shown by the points of different colors (for the different models) in figure ( ). from this figure we can see that it is hard to conclude which of the models fits the data best (always has the lowest rmsd). we can also notice from the figure that model ( ) is less sensitive to the choice of a country for fitting (has the smallest fluctuations). a list of fitting parameters for the different models is given in table ( ). for the sir class of models, models ( ) and ( ), we have five fitting parameters, named γ, β , α, µ and t l , and for the sird models, models ( ) and ( ), we have six fitting parameters, named γ, β , α, µ, t l and δ. note that for models ( ) to ( ) four of the parameters are associated with β(t), and for models ( ) and ( ) the variation of β(t) is controlled by just two parameters: β , the initial value of β, and its decay rate µ = /τ.
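the fitting setup described here can be sketched in a few lines for a constant-coefficient sird model, using solve_ivp and minimize from scipy. the per-series weights 1/mean(x) and the nelder-mead optimizer are assumptions for illustration, since the exact weighting expression and optimizer settings are not reproduced in this text.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def fit_sird(t, I_obs, R_obs, D_obs, N, p0):
    """Weighted least-squares fit of constant (beta, gamma, delta)."""
    # weight each series by the inverse of its mean so that series of very
    # different magnitude contribute comparably to the loss
    w = [1.0 / np.mean(x) for x in (I_obs, R_obs, D_obs)]

    def rhs(t, y, beta, gamma, delta):
        S, I, R, D = y
        return [-beta * S * I / N,
                beta * S * I / N - (gamma + delta) * I,
                gamma * I,
                delta * I]

    def loss(p):
        beta, gamma, delta = p
        y0 = [N - I_obs[0], I_obs[0], R_obs[0], D_obs[0]]
        sol = solve_ivp(rhs, (t[0], t[-1]), y0, t_eval=t,
                        args=(beta, gamma, delta), rtol=1e-6)
        sq = 0.0
        for wi, obs, model in zip(w, (I_obs, R_obs, D_obs), sol.y[1:]):
            sq += wi * np.mean((model - obs) ** 2)
        return np.sqrt(sq)   # weighted root mean square deviation

    res = minimize(loss, p0, method="Nelder-Mead")
    return res.x, res.fun
```

the returned res.fun plays the role of the rmsd goodness-of-fit measure discussed above; fitting a parametric β(t) instead of a constant β only changes the rhs and the parameter vector.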
the best-fit values of the fitting parameters with their % ci (standard deviation), as well as the median values, are given in table ( ). the tables also give the estimate for the effective reproduction number r(t), which is a derived quantity here. note that for computing r we have extrapolated the value of r(t) to the last date for which the data is used here. a histogram of the effective reproduction number for the different models being considered is shown in figure ( ), and detailed values for the different countries, including the average values as well as % ci (stddev), are given in table ( ). covid- is a global crisis, and understanding its impact on different systems of modern human life (medical, social, economic, etc.) and the responses presented is an important exercise to carry out.
we understand that, despite being a global phenomenon, the impact of covid- in terms of the loss of life and the resources being exhausted depends on the local conditions as well as on the mitigation measures taken locally. however, we believe that the global picture of the crisis does help to plan and take policy decisions at the local scale as well. a full understanding of any pandemic, in particular one like covid- which does not have any other example in history (in terms of scale and impact), may become available only when it is over, and the facts and figures presented here may have a very short life. however, we still believe that any quick, timely insight may help a lot in terms of planning for the worst. knowing very well that all mathematical models are wrong but some are useful, we believe that the mathematical models presented in this work may help to develop some insight into the crisis. a brief summary of the work presented here is as follows. in § we have given a very brief introduction to the problem being addressed and reviewed some of the key works about covid- which motivated the present work. a brief introduction to the mathematical framework used in the work is given in § ; in particular we have reviewed a set of compartmental models, sir, seir and sird, in § . we have also discussed a set of parametric models for one of the transmission coefficients, β(t), in § . we have discussed the data being used in the work in § . the main results of the present work are discussed in § and § . in § we have reviewed a reconstruction procedure for the transmission coefficients and the basic reproduction number r(t). this procedure does not depend on the choice of any parameter and can easily be generalized to other similar models. we have presented the best-fit values of the parameters with their % ci in § in the form of a set of tables. we have presented the values of the parameters in the following two forms: model based and
country based. all the fitting parameters for the models being considered are summarized in table ( ). the country-based parameters are given in terms of a set of two tables ( ), where every row represents a country. from the country-based table we can see that the estimates are quite consistent: they do not change much from one model to another. instead of presenting all the parameters for the countries, we present only the basic reproduction number corresponding to the different models. we present table ( ) in the appendix, which has some basic information about the countries being considered for the modeling. in order to see how the different models fit the data, we have given the plots for all six models for a set of countries in the figures from ( ). the work presented here assumes that the spreading of a pandemic like covid- happens homogeneously in space and time; however, we know that this is far from true. as experience [ea b] shows, "super-spread" events (sses), or rare events where one particular infectious person interacts with a very large number of susceptible people over a short period of time, have the maximum impact. in these situations average measures like r are not very informative. in the present work we use data for a set of countries to constrain the parameters of the sir and sird models; a similar exercise with the sird model for india is done in [ea q] .
an introduction to compartmental modeling for the budding infectious disease modeler
modeling infectious disease dynamics
early forecasts of the evolution of the covid- outbreaks and quantitative assessment of the effectiveness of countering measures
analysis and fitting of an sir model with host response to infection load for a plant disease
transmission dynamics and control of severe acute respiratory syndrome
transmission dynamics of the etiological agent of sars in hong kong: impact of public health interventions
strategies for containing an emerging influenza pandemic in southeast asia
strategies for mitigating an influenza pandemic
data-based analysis, modelling and forecasting of the covid- outbreak
accounting for symptomatic and asymptomatic in a seir-type model of covid-
assessing the efficiency of different control strategies for the coronavirus (covid- ) epidemic. arxiv e-prints
epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study
a time-dependent sir model for covid- with undetectable infected persons
a strategic approach to covid- vaccine r&d
an interactive web-based dashboard to track covid- in real time
the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov-
monitoring the spread of covid- by estimating reproduction numbers over time
inferring change points in the spread of covid- reveals the effectiveness of interventions
early dynamics of transmission and control of covid- : a mathematical modelling study
substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov- )
preliminary analysis of covid- spread in italy with an adaptive seird model
a modified seir model to predict the covid- outbreak in spain and italy: simulating control scenarios and multi-scale epidemics
a data driven analysis and forecast of an seiard epidemic model for covid- in mexico
the effect of control strategies to reduce social mixing on outcomes of the covid- epidemic in wuhan, china: a modelling study
studying the progress of covid- outbreak in india using sird model. medrxiv
covid- : epidemiology, evolution, and cross-disciplinary perspectives
the global impact of covid- and strategies for mitigation and suppression. imperial college covid- response team
a new coronavirus associated with human respiratory disease in china
modified seir and ai prediction of the epidemics trend of covid- in china under public health interventions
a pneumonia outbreak associated with a new coronavirus of probable bat origin
analysis and forecast of covid- spreading in china, italy and france
mapping -ncov novel coronavirus (covid- ) cases
extracting the effective contact rate of covid- pandemic
the mathematics of infectious diseases
data on covid- (coronavirus) confirmed cases, deaths, and tests
space-time dependence of corona virus (covid- ) outbreak. arxiv e-prints
modeling infectious diseases in humans and animals
an introduction to mathematical modeling of infectious diseases
mathematical models of infectious disease transmission
refined compartmental models, asymptomatic carriers and covid-
novel coronavirus ( -ncov) situation report -
report of the who-china joint mission on coronavirus disease
a time-dependent seir model to analyse the evolution of the sars-covid- epidemic outbreak in portugal
a python package for fitting covid- data
data-driven modeling reveals a universal dynamic underlying the covid- pandemic under social distancing
age-structured impact of social distancing on the covid- epidemic in india
scientific computing tools for python
sk shahid nadim. assessment of days lockdown effect in some states and overall india: a predictive mathematical study on covid- outbreak
estimating and simulating a sird model of covid- for many countries, states, and cities
covid- coronavirus pandemic

the author would like to thank dr. gaurav goswami for comments and feedback.
at present the author works as an independent researcher and data scientist, and the work presented here is not supported by any public or private agency. the author will be thankful to any agency or individuals who come forward to sponsor/support this and other similar works on covid- . key: cord- -ziwfr dv title: testing informed sir based epidemiological model for covid- in luxembourg date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: ziwfr dv the interpretation of the number of covid- cases and deaths in a country or region is strongly dependent on the number of performed tests. we developed a novel sir-based epidemiological model (sivrt) which allows the country-specific integration of testing information and other available data. the model thereby enables a dynamic inspection of the pandemic and allows estimating key figures, like the number of overall detected and undetected covid- cases and the infection fatality rate. as proof of concept, the novel sivrt model was used to simulate the first phase of the pandemic in luxembourg. an overall number of infections of . and an infection fatality rate of , % were estimated, which is in concordance with data from population-wide testing. furthermore, based on the data as of the end of may and assuming partial deconfinement, an increase in cases is predicted from mid-july on. this is consistent with the currently observed rise and shows the predictive potential of the novel sivrt model. the pandemic disease covid- , caused by the coronavirus sars-cov- , became in a few months one of the leading causes of death worldwide, with now over . fatalities and million reported cases (dong et al., ; johns hopkins dashboard july , , n.d.) .
the total number of cases and recovered patients is unknown, as a fraction of the virus carriers show only mild or no symptoms and hence escape any diagnostics, or could not get tested, especially at the onset of the crisis, due to a lack of infrastructure and test material. to a lesser extent, the number of cases is likely to be underestimated in countries that did not count deaths outside care facilities, whereas other countries like belgium included every fatality that tested positive for the virus regardless of the cause of death (https://www.politico.eu/article/why-is-belgiums-death-toll-so-high/). the lack of consistency among testing strategies and case counts prevents the reliable and comparable calculation of simple measures, such as the infection fatality rate (ifr) or the effective reproduction number rt_eff, which are required to better assess the virulence and spread of the disease. a unified large-scale testing strategy and a more rigorous integration of the testing information would enable more precise political decisions on measures, beyond following the all-or-nothing example of china, which imposed a lockdown on its population to avoid a breakdown of the healthcare system due to a saturation of icu beds by covid- patients. among the countries first affected by the virus, only those that had experienced the previous mers-cov outbreak, such as south korea and singapore, and hence had mitigation strategies in place, or that had established large test infrastructures, like iceland, could avoid strict containment strategies. other countries like sweden and england attempted to find a balance between lockdown and uncontrolled spread, slowing the contagion to protect the healthcare system and the elderly without facing the economic harm caused by a full lockdown. in luxembourg, the first case and death were reported on the th of february and the th of march , respectively.
on the th of march, schools were closed and all non-crucial workers shifted to remote work or were furloughed. two days later, the state of emergency was declared, in-person gatherings were prohibited, and restaurants and bars were closed. over . workers were furloughed and . more took leave for family reasons to homeschool their children. in parallel, within the con-vince study, serologic tests were performed to assess the presence of igg and iga in the plasma, as well as nose and mouth swabs, on a random set of . inhabitants to assess the spread of the disease in the luxembourgish population. around , % of the samples had antibodies and people tested positive, indicating that luxembourg was far away from herd immunity (snoeck et al., ) . epidemic models such as compartment models have proven to be a useful tool in other outbreaks to assess the efficiency of mitigation strategies and to plan the timing and strength of interventions. more specifically, sir (susceptible, infected and removed) and seir (susceptible, exposed, infected, and removed) models, formulated as ordinary differential equations (odes), allow determining when social distancing, hand washing, testing, and voluntary remote working should be sufficient to prevent an exponential growth of the cases, and when a significant portion of the population has to return into lockdown (song et al., ; tang et al., ; wangping et al., ; yang et al., ) . adaptations and extensions of sir and seir models have already been published for covid- (giordano et al., ; siwiak et al., ; tang et al., ) . such models allow describing the dynamics of mutually exclusive states such as susceptible (s), which for covid- is assumed to be the entire population of a country, a region or a city, the number of infected (i) and removed (r), which often combines deaths and recovered, as well as the number of exposed (e) for seir models.
the variables i and r are often unknown, as the numbers of cases and announced recovered patients only account for a fraction of the real values, a fraction that depends on the testing performed within a country. therefore the numbers of susceptible and exposed, which equal the total population minus infected and removed in seir models, are also undetermined. several studies extended the number of considered states in such models to further differentiate between detected and undetected cases (susceptible (s), infected (i), diagnosed (d), ailing (a), recognized (r), threatened (t), healed (h) and extinct (e)) (gaeta, ; giordano et al., ) or took the severity of the disease into account in relation to the age of the infected person (balabdaoui & mohr, ; wu et al., ) . however, with an increase in the number of states and parameters describing the transitions between these states, more data is required to calibrate the model, i.e. to estimate the model parameters. roda et al. (roda et al., ) showed that an sir model seems to represent data obtained from case reports better than seir models. notably, sir models captured a link between the transmission rate β and the case-infection ratio that was missed by seir models. the underestimation of the infected, deaths, and removed due to not considering country-specific testing information causes sir models to predict ifr and effective reproduction number (rt_eff) values that vary drastically across countries with different testing and might often be overestimated. to overcome this issue, we propose an extended sir model (sivrt) which is informed by the number of performed tests and also takes the number of hospitalizations into account to parametrize the model. this allows for a better prediction of the evolution of the disease and the estimation of key pandemic parameters, as well as the analysis of different deconfinement and testing strategies. the novel sivrt model (figure ( )) includes states up to detected death cases (d d ).
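for reference, the plain sir dynamics discussed in this paragraph can be integrated in a few lines; this is the textbook model only, not the sivrt implementation (which was built in the iqm toolbox under matlab).

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate_sir(beta, gamma, N, I0, days):
    """Integrate dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I, starting from I0 infected in a population of N."""
    def rhs(t, y):
        S, I, R = y
        return [-beta * S * I / N, beta * S * I / N - gamma * I, gamma * I]
    t = np.linspace(0.0, days, days + 1)
    sol = solve_ivp(rhs, (0.0, float(days)), [N - I0, I0, 0.0], t_eval=t)
    return t, sol.y
```

the behavior is governed by the basic reproduction number beta/gamma: above one the infected compartment grows into an outbreak, below one it decays monotonically, which is the threshold property the compartmental models above share.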
all these transitions are modeled with first-order laws with rate constants kir, kiv, kvr, kvd, kidr, kidvd, kvdr, and kvdd, respectively. the rate constants for detected and non-detected states are assumed to be equal. regarding the testing, it is presumed that (i) severe cases s are tested with high probability compared to asymptomatic cases, as it is more likely that the severe cases will be spotted within the population; (ii) susceptible s and recovered people rcum (=r+rd) are tested with the same probability. testing of severe cases s is modelled as a first-order term as well (kvvd), and the remaining performed tests are distributed among infected (i) and the sum of susceptible and recovered (s+rcum). the ratio between these two groups is adjusted with the parameter ktivss, which is also subject to optimization. for luxembourg, data on people tested positive, death cases, hospitalisations and performed tests were obtained from the website of the luxembourgish government https://coronavirus.gouvernement.lu/en.html and are summarized in appendix . the model was implemented in the iqm toolbox (sunnåker & schmidt, ) . for the predicted no-lockdown scenario (figure ( )), the lockdown event on day was removed. for the predicted light-lockdown scenario (figure ( )), the lockdown event on day was kept, but the infection rate parameter (ksi) was increased by % of the difference between its value during the full lockdown and its value before the lockdown. for the predicted partial lifting of the lockdown scenario as of the end of may (figure ( )), the infection rate parameter (ksi) was increased on day of the simulation by % of the difference between its value during the full lockdown and its value before the lockdown. the testing rate was kept constant.
for the predicted lifting of the lockdown scenario as of the end of may with increased testing, approximately matching the luxembourgish strategy of testing (figure ( )), the infection rate parameter (ksi) was set on day of the simulation to its value before the lockdown, and the testing rate was increased to . tests per day. as the number of performed tests strongly influences the dynamic analysis of the covid- pandemic in a country or region, we developed a novel sir-based epidemiological model (sivrt, figure ( )) which allows the integration of this key information. the model consists of two layers describing the undetected and detected cases, whereby the transition between these layers is realized by testing. the model distinguishes severe from non/less symptomatic cases. the probability for severe cases to get tested is assumed to be higher. the model consists of states and has been implemented in the iqm toolbox within matlab (methods & appendix ). importantly, it allows fitting to epidemiological data, among others to detected cases and deaths. the estimates are in agreement with data as of march , , from a country which had one of the largest numbers of cases and tests performed at the onset of the pandemic (kim et al., ), and, more importantly, with the estimated ifr, after adjusting for the delay from confirmation to death, obtained on the diamond princess cruise ship (russell et al., ). in the no-lockdown scenario, deaths are predicted to have occurred in luxembourg, with only around . deaths being detected and assigned to the pandemic (figure ( )). as the same number of tests was assumed in this simulation as performed in reality, a high number of deaths would not have been detected. a lighter lockdown, in turn, could already have led to a second infection wave, as shown in an example simulation (figure ( )). the model thus also supports, in comparison with alternative scenarios, the strong necessity of the performed lockdown. example simulation showing a reduced risk of a second infection wave arising around mid-july (day ), compared to lower testing as shown in figure ( ).
legend as in figure ( ). in summary, the novel testing-informed sivrt model structure allows describing and analyzing the covid- pandemic data of luxembourg in dependence on the number of performed tests. this enables the estimation of the overall and recovered cases, including detected and non-detected cases, and thereby the estimation of the infection fatality rate (ifr). it is furthermore possible to perform predictions on past and future scenarios of combinations of lockdown lifting and testing. simulations of the novel sivrt model with parameters estimated from the data of the early pandemic in luxembourg give a full dynamic picture including detected and non-detected cases. in particular, the overall number of cases until the end of may is estimated at around . , representing , % of the population, and the ifr at , %. this is in line with the , % of volunteers in the con-vince study that had igg antibodies against sars-cov- in their plasma and the estimated ifr on the diamond princess cruise ship of , ( % ci: . - . ) (russell et al., ) . the sivrt model also allowed predicting the appearance of a second wave in a time frame of days after a partial lifting of the lockdown. this is in concordance with the rise in cases seen in luxembourg as of mid-july.

age-stratified model of the covid- epidemic to analyze the impact of relaxing lockdown measures: nowcasting and forecasting for switzerland
an interactive web-based dashboard to track covid- in real time
a simple sir model with a large set of asymptomatic infectives
modelling the covid- epidemic and implementation of population-wide interventions in italy
understanding and interpretation of case fatality rate of coronavirus disease
why is it difficult to accurately predict the covid- epidemic?
estimating the infection and case fatality ratio for coronavirus disease (covid- ) using age-adjusted data from the outbreak on the diamond princess cruise ship
from a single host to global spread: the global mobility based modelling of the covid- pandemic implies higher infection and lower detection rates than current estimates
prevalence of sars-cov- infection in the luxembourgish population: the con-vince study
an epidemiological forecast model and software assessing interventions on covid- epidemic in china
iqm tools: efficient state of the art modeling across pharmacometrics and systems pharmacology
estimation of the transmission risk of the -ncov and its implication for public health interventions
extended sir prediction of the epidemics trend of covid- in italy and compared with hunan
estimating clinical severity of covid- from the transmission dynamics in wuhan, china
short-term forecasts and long-term mitigation evaluations for the covid- epidemic in hubei province

key: cord- -fy oebs authors: amaro, j. e.; dudouet, j.; orce, j. n. title: global analysis of the covid- pandemic using simple epidemiological models date: - - journal: nan doi: nan sha: doc_id: cord_uid: fy oebs several analytical models have been used in this work to describe the evolution of death cases arising from coronavirus (covid- ). the death or `d' model is a simplified version of the sir (susceptible-infected-recovered) model, which assumes no recovery over time and allows the transmission-dynamics equations to be solved analytically. the d-model can be extended to describe various focuses of infection, which may account for the original pandemic (d ), the lockdown (d ) and other effects (dn). the evolution of the covid- pandemic in several countries (china, spain, italy, france, uk, iran, usa and germany) shows a similar behavior in accord with the d-model trend, characterized by a rapid increase of death cases followed by a slow decline, which are affected by the earliness and efficiency of the lockdown effect.
these results are in agreement with more accurate calculations using the extended sir model with a parametrized solution and more sophisticated monte carlo grid simulations, which predict similar trends and indicate a common evolution of the pandemic with universal parameters.

the sir (susceptible-infected-recovered) model is widely used as a first-order approximation to the viral spreading of contagious epidemics [ ], mass immunization planning [ , ], marketing, informatics and social networks [ ]. its cornerstone is the so-called "mass-action" principle introduced by hamer, which assumes that the course of an epidemic depends on the rate of contact between susceptible and infected individuals [ ]. this idea was extended to a continuous-time framework by ross in his pioneering work on malaria transmission dynamics [ ] [ ] [ ], and finally put into its classic mathematical form by kermack and mckendrick [ ]. the sir model was further developed by kendall, who provided a spatial generalization of the kermack and mckendrick model in a closed population [ ] (i.e. neglecting the effects of spatial migration), and bartlett, who - after investigating the connection between the periodicity of measles epidemics and community size - predicted a traveling wave of infection moving out from the initial source of infection [ , ]. more recent implementations have considered the typical incubation period of the disease and the spatial migration of the population. the pandemic has ignited the submission of multiple manuscripts in the last weeks. most statistical distributions used to estimate disease occurrence are of the binomial, poisson, gaussian, fermi or exponential types. despite their intrinsic differences, these distributions generally lead to similar results, assuming independence and homogeneity of disease risks [ ].
in this work, we propose a simple and easy-to-use epidemiological model - the death or d model [ ] - that can be compared with data in order to investigate the evolution of the infection and deviations from the predicted trends. the d model is a simplified version of the sir model with analytical solutions under the assumption of no recovery - at least during the time of the pandemic. we apply it globally to countries where the covid- coronavirus has spread widely and caused thousands of deaths [ , ]. additionally, d-model calculations are benchmarked against more sophisticated and reliable calculations using the extended sir (esir) and monte carlo planck (mcp) models - also developed in this work - which provide similar results, but allow for a more coherent space-time disentanglement of the various effects present during a pandemic. a similar esir model has recently been proposed by squillante and collaborators for infected individuals as a function of time, based on the ising model - which describes ferromagnetism in statistical mechanics - and a fermi-dirac distribution [ ]. this model also reproduces a posteriori the covid- data for infestations in china as well as other pandemics such as ebola, sars, and influenza a/h n .

the sir model considers the three possible states of the members of a closed population affected by a contagious disease. it is, therefore, characterized by a system of three coupled non-linear ordinary differential equations [ ], which involve three time-dependent functions:

• susceptible individuals, s(t), at risk of becoming infected by the disease.
• infected individuals, i(t).
• recovered or removed individuals, r(t), who were infected and may have developed immunity or died.

the sir model describes well a viral disease, where individuals typically go from the susceptible class s to the infected class i, and finally to the removed class r.
recovered individuals cannot go back to the susceptible or infected classes, as is, potentially, the case for bacterial infections. the resulting transmission-dynamics system for a closed population is described by

ds/dt = -λ s i/n,
di/dt = λ s i/n - β i,
dr/dt = β i,

where λ > 0 is the transmission or spreading rate, β > 0 is the removal rate, and n is the fixed population size, which implies that the model neglects the effects of spatial migration. currently, there is no vaccination available for covid- , and the only way to reduce the transmission or infection rate λ - which is often referred to as "flattening the curve" - is by implementing strong social distancing and hygiene measures. the system can be reduced to a first-order differential equation, which does not possess an explicit solution, but can be solved numerically. the sir model can then be parametrized using actual infection data to solve for i(t), in order to investigate the evolution of the disease.

in the d model, we make the drastic assumption of no recovery in order to obtain an analytical formula to describe - instead of infestations - the death evolution by covid- . this can be useful as a fast method to foresee the global behavior as a first approach, before applying more sophisticated methods. we shall see that the resulting d model describes well enough the data of the current pandemic in different countries. the main assumption of the d model is the absence of recovery from coronavirus, i.e. r(t) = 0, at least during the pandemic time interval. this assumption may be reasonable if the spreading time of the pandemic is much faster than the recovery time, i.e. λ ≫ β. the sir equations are then reduced to the single equation of the well-known si model,

di/dt = (λ/n) i (n - i),

which represents the simplest mathematical form of all disease models, where the infection rate is proportional to both the infected individuals, i, and the susceptible individuals, n - i. this equation is trivially solved by multiplying by dt and dividing by (n - i) i; integrating from an initial time t = 0 to a final time t we obtain

ln[ i (n - i0) / (i0 (n - i)) ] = λ t,

where i0 = i(0).
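the three coupled sir equations above can be integrated numerically, as done throughout this work. a minimal sketch, with illustrative (not fitted) parameter values:

```python
import numpy as np
from scipy.integrate import solve_ivp

N = 1_000_000            # closed population (illustrative)
lam, beta = 0.25, 0.1    # transmission (lambda) and removal (beta) rates, 1/day

def sir(t, y):
    S, I, R = y
    return [-lam * S * I / N,            # ds/dt
            lam * S * I / N - beta * I,  # di/dt
            beta * I]                    # dr/dt

sol = solve_ivp(sir, (0, 365), [N - 10, 10, 0], max_step=1.0)
S, I, R = sol.y
```

since lam/beta = 2.5 > 1 here, the infection takes off, peaks, and dies out, with the final r(t) approaching the classical final-size value of the closed population.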
taking the exponential on both sides,

i (n - i0) / (i0 (n - i)) = e^{λt},

and solving this algebraic equation we obtain the solution i(t), which can be written in the form

i(t) = n c0 e^{t/b} / (1 + c0 e^{t/b}),

where we have defined the constants b = 1/λ and c0 = i0/(n - i0). the parameter b is the characteristic evolution time of the initial exponential increase of the pandemic. the constant c0 is the initial infestation rate with respect to the total population n; assuming c0 ≪ 1, c0 ≃ i0/n. in order to predict the number of deaths in the d model we assume that the number of deaths at some time t is proportional to the infestation at some former time t - τ, that is,

d(t) = μ i(t - τ),

where μ is the death rate and τ is the death time. with this assumption we can finally write the d-model equation as

d(t) = a e^{t/b} / (1 + c e^{t/b}),

where a = μ i0 e^{-τ/b}, c = c0 e^{-τ/b}, and a/c yields the total number of deaths predicted by the model. this is the final equation for the d-model, which presents a similar shape to the well-known woods-saxon potential for the nucleons inside the atomic nucleus, or to the bacterial growth curve. the rest of the parameters, μ, τ, i0 and n, are embedded in the parameters a, b, c, which represent space-time averages and can be fitted to the timely available data.

in fig. , we present the fit of the d-model to the covid- death data for china, where its evolution has apparently been controlled and the d function has reached the plateau zone, with few increments over time, or fluctuations that are beyond the model assumptions. this plot shows the duration of the pandemic - about two months to reach the top end of the curve - and the agreement, despite the crude assumptions, between the data and the evolution trend described by the d-model. this agreement encourages the application of the d model to other countries in order to investigate the different trends. in order to get insight into the stability and uncertainty of our predictions, fig. ii shows the evolution of a, b, and c and other model predictions from fits to the daily data in spain.
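the closed-form d(t) above is cheap to evaluate, which is what makes the model useful for fast fits. a minimal sketch, with purely illustrative parameter values (not fitted to any country):

```python
import numpy as np

def d_model(t, a, b, c):
    """cumulative deaths d(t) = a e^{t/b} / (1 + c e^{t/b});
    the asymptote d(inf) = a/c is the expected total number of deaths."""
    x = np.exp(np.asarray(t) / b)
    return a * x / (1 + c * x)

a, b, c = 10.0, 5.0, 3e-4        # illustrative parameters
t = np.arange(0, 121)
deaths = d_model(t, a, b, c)     # early rise ~ a e^{t/b}, plateau at a/c
```

in practice a, b, c would be obtained by least-squares fitting of this function to the cumulative death counts of a given country.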
the meaning of these quantities is explained below:

• the parameter a is the theoretical number of deaths on the day corresponding to t = 0. in general, it differs from the experimental value and can be interpreted as the expected value of deaths that day. note that experimental data may be subject to unknown systematic errors and different counting methods.
• the parameter b, as mentioned above, is the characteristic evolution time. during the initial exponential behavior, it indicates the number of days for the number of deaths to double. moreover, 1/b is proportional to the slope of the almost linear behavior in the mid region of the d function. that behavior can be obtained by a taylor expansion around t0 = -b ln c, which gives

d(t) ≃ (a/c) [ 1/2 + (t - t0)/(4b) ].

• the parameter c is called the inverse death factor because d(t → ∞) = a/c provides the asymptotic or expected total number of deaths.

figure ii shows the stable trend of the parameters between days and (corresponding to march - ), right before reaching the peak of death cases, which occurred in spain around april . such stability validates the d-model predictions during this time. however, a rapid change of the parameters is observed, especially for a, once the peak is reached, drastically changing the prediction of the number of deaths given by a/c. this sudden change results in the slowing down of deaths per day and longer time predictions t and t . the parameters of the d model correspond to average values over time of the interaction coefficients between individuals, i.e. they are sensitive to additional external effects on the pandemic evolution. these may include the lockdown effect imposed in spain in march and other effects such as new sources of infection or a sudden increase of the total susceptible individuals due to social migration and large mass gatherings [ ].
it is not possible to identify a specific cause because its effects are blurred by the stochastic evolution of the pandemic, which is why any reliable forecast presents large errors. one can also determine deaths/day rates by applying the first derivative to the d-model equation, which allows for a determination of the pandemic peak and of the evolution after its turning point. the d model describes well the cumulative deaths because the sum of discrete data reduces the fluctuations, in the same way as the integral of a discontinuous function is a continuous function. however, the daily data required for d′ have large fluctuations - both statistical and systematic - which normally gives a slightly different set of parameters when compared with the d model. using the d model fitted to cumulative deaths, deaths/day can be computed as

deaths/day ≃ d(t) - d(t - ∆t),

where ∆t = 1 day. figure shows that both expressions yield similar parameters, as the time increment is small enough compared with the time evolution of the d(t) function. hence, the first derivative d′(t) can be used to describe deaths per day. in addition, fig. shows that the parameters may be different for the d and d′ functions using cumulative and daily deaths, respectively, as shown for spain on april . it is also important to note that b is directly proportional to the full width at half maximum (fwhm) of the d′(t) distribution:

fwhm = 4 ln(1 + √2) b ≃ 3.53 b.

the b parameter presents typical values between and for most countries undergoing the initial exponential phase, which yields a minimum and maximum time of and days, respectively, between the two extreme values of the fwhm.

c. dn model with two or more channels of infection

some models [ ] include changes in the transmission rate due to various interventions implemented to contain the outbreak.
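both the finite-difference deaths/day and the fwhm relation above can be checked numerically. a sketch with the same illustrative (non-fitted) parameters as before; the fwhm relation fwhm = 4 b ln(1 + √2) follows from the logistic form of d(t):

```python
import numpy as np

def d_model(t, a, b, c):
    x = np.exp(np.asarray(t) / b)
    return a * x / (1 + c * x)

def d_prime(t, a, b, c):
    # analytic derivative of d(t): the deaths-per-day curve
    x = np.exp(np.asarray(t) / b)
    return (a / b) * x / (1 + c * x) ** 2

a, b, c = 10.0, 5.0, 3e-4
t0 = -b * np.log(c)                    # position of the daily peak

# finite difference with dt = 1 day vs the analytic derivative at the peak
fd = d_model(t0 + 0.5, a, b, c) - d_model(t0 - 0.5, a, b, c)

# fwhm of the daily curve, measured on a fine grid
t = np.linspace(t0 - 20, t0 + 20, 4001)
daily = d_prime(t, a, b, c)
above = t[daily >= daily.max() / 2]
fwhm = above[-1] - above[0]            # should be ~ 3.53 b
```

the small difference between `fd` and the analytic derivative confirms that, for dt = 1 day, the two descriptions of deaths/day are practically equivalent.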
the simple d model does not allow one to do this explicitly, but changes in the spread can be taken into account by considering the total d or dn function as the sum of two or more independent d-functions with different parameters, which may reveal the existence of several independent sources, or virus channels. an example is shown in fig. , where the two-channel function has been fitted with six parameters to the spanish data up to april . the fit reveals a second, smaller death peak, which substantially increases the number of deaths per day and the duration of the pandemic. this is equivalent to adding a second, independent source of infection several weeks after the initial pandemic. the second peak may as well represent a second pandemic phase driving the effects of quarantine during the descendant part of the curve. additionally, the cumulative d-function can also be computed with a two-channel function, which provides, as shown in fig. , a more accurate prediction for the total number of deaths and clearly illustrates the separate effect of both source peaks. it is interesting to note that for large t, a ≈ a , c ≈ c and b ≈ b. in such a case, the total number of deaths expected during the pandemic is given by d(∞) = a/c.

the d-model can also be used to estimate i(t) using the initial values of i0 = i(0) and the total number of susceptible people n = s(0). the initial value of n is unknown, and not necessarily equal to the population of the whole country, since the pandemic started in localized areas. here, we shall assume n = , although plausible values of n can be tens of millions. note that the no-recovery assumption of the d model is unrealistic, and this calculation only provides an estimation of the number of individuals that were infected at some time, independently of whether they recovered or not. from the definition of d(t) in eq.
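a two-channel dn function is simply the sum of two d-functions, and its asymptote is the sum of the two channel asymptotes a1/c1 + a2/c2. a sketch with illustrative parameters (a first, larger channel plus a later, smaller one - not the fitted spanish values):

```python
import numpy as np

def d_model(t, a, b, c):
    x = np.exp(np.asarray(t) / b)
    return a * x / (1 + c * x)

def d2_model(t, p1, p2):
    # sum of two independent d-functions (e.g. pre- and post-lockdown channels)
    return d_model(t, *p1) + d_model(t, *p2)

p1 = (10.0, 5.0, 1e-3)   # first channel: daily peak at t0 = -b ln c ~ 34.5 days
p2 = (2.0, 8.0, 5e-4)    # second channel: smaller, slower, peaking later
t = np.arange(0, 201)
total = d2_model(t, p1, p2)
```

fitting such a six-parameter sum to real data is what reveals the second peak discussed above; the distance between the two channel maxima then measures the delay of the second source.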
the following relations among the parameters of the model can be extracted:

a = μ i0 e^{-τ/b}, c = (i0/n) e^{-τ/b}, a/c = μ n.

solving the first two equations for μ and i0 we obtain

μ = a/(c n), i0 = c n e^{τ/b}.

hence, μ can be computed by knowing n. however, to obtain i0 one needs to know the death time τ. this has been estimated to be about to days for covid- cases, which can be used to compute two estimates of i(t). these are given in fig. for the case of spain. since there is no recovery in the d model, the total number of infected people is i ∼ n for large t, i.e. n = in our case. in fig. the ratio d/i is shown, which also depends on n and τ. for n = , the ratio d/i increases similarly to the separate functions d and i between the initial and final values. these results depend on the total susceptible population n. however, the ratio of infected to susceptibles, i/n, is independent of n. this function depends only on τ and is shown in the bottom panel of fig. for τ = and days, which reveals the rapid spread of the pandemic. accordingly, between % and % of the susceptibles were infected in march , and one month later (april ), when the fit was made, all susceptibles had been infected. this does not mean that the full population of the country got infected, since the number n is unknown and, for instance, excludes individuals in isolated regions; it may additionally change because of spatial migration, not considered in the model.

d-model predictions can be compared with more realistic results given by the complete sir model [ , ], with initial conditions r(0) = 0, i(0) = i0, s(0) = n - i0. the sir system of dynamical equations can be reduced to a non-linear differential equation. first, dividing the equation for s by the equation for r one obtains

ds/dr = -λ s/(β n),

which yields the following exponential relation between the susceptible and the removed functions,

s(t) = s(0) e^{-λ r(t)/(β n)}.

moreover, the closed-population condition provides a relation between the infected and the removed functions, i(t) = n - s(t) - r(t), which yields, by inserting into the equation for r,
the final sir differential equation. in order to obtain r(t) we only need to solve this first-order differential equation with the initial condition r(0) = 0, so that s + i + r = n; then r(t) verifies

dr/dt = β [n - r - (n - i0) e^{-λ r/(β n)}],

which can be solved numerically, or by approximate methods in some cases. in ref. [ ], a solution was found for small values of the exponent λ r/(β n). for the coronavirus pandemic, however, this number is expected to increase and be close to one at the pandemic end. at this point, we propose a modification of the standard sir model. instead of solving the equation numerically and fitting the parameters to data, the solution can be parametrized as

r(t) = a e^{t/b} / (1 + c e^{t/b}),

which presents the same functional form as the d-model and, conveniently, provides a faster way to fit the model parameters by avoiding the numerical solution of the differential equation. in fact, numerical solutions of the sir model present a similar step function for r(t). additionally, one can assume that d(t) is proportional to r(t), and can also be written as

d(t) = a′ e^{t/b′} / (1 + c′ e^{t/b′}),

where a′, c′ = s(0) and b′ = β/(λ n) are unknown parameters to be fitted to deaths-per-day data, together with the three parameters of the r(t)-function: a, b, c. figure shows fits of the esir model to daily deaths in spain during the coronavirus spread. using no boundary condition for the number of deaths (left panel) does not yield an exact solution of the sir differential equation. a way to solve this problem is to impose the condition d′(∞) = 0, as the number of deaths must stop at some time. numerically, it is enough to choose a small value of d′(t) for an arbitrarily large t. the middle and right panels of fig. show different boundary conditions of d′( ) = and d′( ) = , respectively, which yield the same results and the expected behavior for a viral disease spreading and declining. it is also consistently observed (e.g. see middle and right panels of fig.
), that at large t, r(t) → a/c, which essentially means that most of the susceptible population n recovers, as we previously inferred from the d model. this, together with the fact that c can always be rescaled, leaves the esir model with essentially free parameters to fit to the daily death data, i.e. the same number of parameters as the original sir model. as shown in fig. , esir fits reproduce well the long flattening behavior observed in uk, usa, germany or iran, whereas they fail to reproduce the more pronounced double-peak structure typically observed in countries like france, italy, spain or belgium. as previously done with the d model, one can also expand the esir model to accommodate this apparent failure to take lockdown effects into account. similarly, the esir model is proposed as the sum of two r-functions, where we have assumed that a = a′ and c = a to accommodate that r(∞) → and c = . hence, we are left with five free parameters. finally, fig. shows the comparison between the esir and d′ fits to real data for countries where covid- has widely spread: belgium, france, germany, iran, italy, spain, uk and usa. death data are taken from refs. [ , , ] and consider -day average smoothing to correct for anomalies in data collection, such as the typical weekend staggering observed in various countries, where weekend data are counted at the beginning of the next week. real error intervals are extracted from the correlation matrix. as discussed in section . , the reduced d′ model has been used with a = a and c = c . although arising from different assumptions, both models provide similar data descriptions and predictions, with slightly better values of χ² per degree of freedom for the esir model. it is also interesting to note that the reduced esir model with five parameters yields similar results to the full esir model with eight parameters. as data become available, daily predictions vary for both the esir and d′ models.
this is because the model parameters are actually statistical averages over the space-time properties of the complex system. no model is able to predict changes over time of these properties if the physical causes of these changes are not included. the values of the model parameters are only well defined when the disease spread is coming to an end and time changes in the parameters have little influence. more sophisticated calculations can be compared with the esir and d′ predictions. in particular, monte carlo (mc) simulations have also been performed in this work for the spanish case [ ], which consist of a lattice of cells that can be in four different states: susceptible, infected, recovered or dead. an infected cell can transmit the disease to any other susceptible cell within some random range r. the transmission mechanism follows principles of nuclear physics for the interaction of a particle with a target. each infected particle interacts a number n of times over the interaction region, according to its energy. the number of interactions is proportional to the interaction cross section σ and to the target surface density ρ. the discrete energy follows a planck distribution law depending on the 'temperature' of the system. for each interaction, an infection probability is applied. finally, time-dependent recovery and death probabilities are also applied. the resulting virus spread for different sets of parameters can be adjusted to covid- pandemic data. in addition, parameters can be made time dependent in order to investigate, for instance, the effect of an early lockdown or of large mass gatherings at the rise of the pandemic. as shown in fig. , our mc simulations present results similar to the d′ model, which validates the use of the simple d-model as a first-order approximation. more details on the mc simulation will be presented in a separate manuscript [ ].
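a stripped-down version of such a lattice monte carlo can be sketched as follows. this is a sketch under simplifying assumptions, not the mcp model itself: the planck-distributed energies and cross-section machinery are omitted, and the grid size, interaction range and all probabilities below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

L = 60                                  # lattice side; states: 0=S, 1=I, 2=R, 3=D
grid = np.zeros((L, L), dtype=int)
grid[L // 2, L // 2] = 1                # single seed infection at the centre

p_inf, p_rec, p_die, r_max = 0.3, 0.1, 0.005, 3   # illustrative rates

def step(grid):
    new = grid.copy()
    for x, y in zip(*np.nonzero(grid == 1)):
        r = int(rng.integers(1, r_max + 1))       # random interaction range
        xs = slice(max(0, x - r), min(L, x + r + 1))
        ys = slice(max(0, y - r), min(L, y + r + 1))
        block = new[xs, ys]
        hit = (grid[xs, ys] == 0) & (rng.random(block.shape) < p_inf)
        block[hit] = 1                            # infect susceptibles in range
        if rng.random() < p_die:
            new[x, y] = 3                         # death
        elif rng.random() < p_rec:
            new[x, y] = 2                         # recovery
    return new

for day in range(80):
    grid = step(grid)
```

a lockdown can be mimicked by reducing `r_max` from a fixed day onward, which is the mechanism used in the text to broaden and flatten the deaths-per-day curve.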
interestingly, mc simulations follow the data trend up to may without any changes in the parameters for nearly two weeks. an app for android devices, where the monte carlo planck model has been implemented to visualize the simulation, is available from ref. [ ]. in order to investigate the universality of the pandemic, it is interesting to compare all countries by plotting the d model in terms of the variable (t - t0)/b, where t0 = -b ln(c) is the position of the maximum of the daily curve. by shifting the d function by t0 and dividing by its asymptotic value dmax = a/c, the normalized d function is given by

d(t)/dmax = 1 / (1 + e^{-(t-t0)/b}).

the left panel of fig. shows similar trends for the normalized d curves of different countries, which suggests a universal behavior of the covid- pandemic. only iran seems to deviate slightly from the global trend, which may indicate an early and more effective initial lockdown. a similar approach can be applied to the daily data using the normalized peaks.

the global models considered in this work present some differences with respect to other existing models. first, in this work we have tried to keep the models as simple as possible. this allows the use of theoretically inspired analytical expressions or semi-empirical formulae to perform the data analysis. the use of semi-empirical expressions for describing physical phenomena is recurrent in physics; one of the most famous is the semi-empirical mass formula of nuclear physics. of course, the free parameters need to be fitted from known data, but this allowed predictions to be obtained for unknown elements. in our case we were inspired by the well-known statistical sir-kind models, slightly modified to obtain analytical expressions that carry the leading time dependence. we have found that the d and dn models allow a fast and efficient analysis of the pandemic in its initial and advanced stages.
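the collapse onto a universal curve can be verified directly: for any choice of (a, b, c), shifting by t0 = -b ln c and dividing by a/c reduces d(t) to the same logistic 1/(1 + e^{-(t-t0)/b}). a sketch with two illustrative "countries":

```python
import numpy as np

def d_model(t, a, b, c):
    x = np.exp(np.asarray(t) / b)
    return a * x / (1 + c * x)

def reduced(t, a, b, c):
    # reduced variable u = (t - t0)/b and normalized curve d / (a/c)
    t0 = -b * np.log(c)
    u = (t - t0) / b
    return u, d_model(t, a, b, c) / (a / c)

def universal(u):
    return 1 / (1 + np.exp(-u))

t = np.linspace(0, 150, 500)
u1, y1 = reduced(t, 10.0, 5.0, 3e-4)   # 'country 1' (illustrative parameters)
u2, y2 = reduced(t, 4.0, 8.0, 1e-3)    # 'country 2' (illustrative parameters)
```

plotting y1 against u1 and y2 against u2 puts both curves exactly on top of the universal logistic, which is the collapse observed for the real country data.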
our results show that the time dependence of the pandemic parameters due to the lockdown can be effectively simulated by the sum of two d-functions with different widths and heights, centered at different times. the distance between the maxima of the two d-functions should be a measure of the time between the effective pandemic beginning and the lockdown. in the spanish case this is about days. taking into account that the lockdown started in march , this marks the pandemic starting time as about february . had the lockdown started on that date, the number of deaths would have been greatly reduced. the smooth blending between the two peaks provides a transition between the two statistical regimes (or physical phases) with and without lockdown.

the monte carlo simulation results are in agreement with our previous analysis with the d and dn models. the monte carlo generates events in a population of individuals in a lattice or grid of cells. we simulate the movement of individuals outside of the cells and their interactions with the susceptible individuals within a finite range. the random events follow statistical distributions based on the exponential laws of statistical mechanics for a system of interacting particles, driven by macroscopic magnitudes such as the temperature, and by interaction probabilities between individuals that can be related to interaction cross sections. the monte carlo simulation spreads the virus in space-time, and also allows a space-time dependence of the parameters. in this work we have made the simplest assumptions, only allowing for a lockdown effect by reducing the range of the interaction starting on a fixed day. this simple modification allowed us to reproduce nicely the spanish deaths-per-day curve. the lockdown produces a relatively long broadening of the curve and a slow decay. similar mc calculations can be performed for several countries to infer the devastating effect of a late lockdown as compared with early lockdown measures.
the latter is the case of south africa and other countries which have not reached the exponential growth. the death and extended sir models are simple enough to provide fast estimations of the pandemic evolution by fitting space-time averaged parameters, and present a good first-order approximation to understand secondary effects during the pandemic, such as lockdown and population migrations, which may help to control the disease. similar models are available [ , ], but challenges in epidemiological modeling remain [ ] [ ] [ ] [ ]. this is a very complex system, which involves many degrees of freedom and millions of people, and even assuming consistent disease reporting - which is rarely the case - there remains an important open question: can any model predict the evolution of an epidemic from partial data? or similarly, is it possible, at any given time and with the available data, to measure the validity of an epidemic growth curve? we finally hope that we have added new insightful ideas with the death, extended sir and monte carlo models, which can now be applied to any country which has followed the initial exponential pandemic growth.

references:
- discussion: the kermack-mckendrick epidemic threshold theorem
- stability analysis of sir model with vaccination
- seasonality and the effectiveness of mass vaccination
- application of sir epidemiological model: new trends
- epidemic disease in england - the evidence of variability and of persistency of type
- report on the prevention of malaria in mauritius
- an application of the theory of probabilities to the study of a priori pathometry - part i
- an application of the theory of probabilities to the study of a priori pathometry - part iii
- a contribution to the mathematical theory of epidemics
- discussion of measles periodicity and community size
- measles periodicity and community size
- deterministic and stochastic models for recurrent epidemics
- basic models for disease occurrence in epidemiology
- the d model for deaths by covid-
- the continuing -ncov epidemic threat of novel coronaviruses to global health
- the latest novel coronavirus outbreak in wuhan, china
- who. coronavirus disease
- attacking the covid- with the ising-model and the fermi-dirac distribution function
- the sir model and the foundations of public health
- impact of non-pharmaceutical interventions against covid- in europe: a quasi-experimental study
- inferring covid- spreading rates and potential change points for case number forecasts
- special issue on challenges in modelling infectious disease dynamics
- modeling infectious disease dynamics in the complex landscape of global health
- mathematical epidemiology: past, present, and future
- true epidemic growth construction through harmonic analysis

acknowledgments: the authors thank useful comments from emmanuel clément, araceli lopez-martens, david jenkins, ramon wyss, liam gaffney and hans fynbo. this work was supported by the spanish ministerio de economía y competitividad and european feder funds (grant fis - -c - -p), junta de andalucía (grant fqm- ) and the south african national research foundation (nrf) under grant .

key: cord- -hs uf u authors: adwibowo, a. title: flattening the covid curve in susceptible forest indigenous tribes using sir model date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: hs uf u

covid is a global threat and is spreading globally. international cooperation involving indigenous peoples and local communities is urgently required in joint prevention to control the epidemic. currently, many indigenous populations continue to face covid .
this study is concerned with the dynamics of the covid pandemic among indigenous populations living in remote amazon rainforest enclaves. using the susceptible-infectious-recovered (sir) model, the spread of covid under intervention scenarios (low, moderate, high) is simulated and predicted in indigenous tribe populations. the sir model forecasts that without intervention, the epidemic peak may be reached within days. nonetheless, the peak can be reduced with strict interventions. under low intervention, the covid cases are reduced to % and % of the total populations, while in the scenario of high intervention the covid peaks can be reduced to values ranging from % to %. to conclude, the simulated interventions tested with the sir model reduce the pandemic peak and flatten the covid curve in indigenous populations. nonetheless, it is mandatory to strengthen all mitigation efforts, reduce exposures, and decrease the transmission rate as much as possible for covid containment.

the vulnerability of indigenous populations to communicable diseases is widely recognized. on the australian continent, communicable diseases explained % of the health gap among aboriginal and torres strait islander populations (vos et al.). the most comprehensive study on the vulnerability of indigenous populations to communicable diseases can be found in the work by walker et al., who reported epidemics that affected different indigenous societies living in the amazon rainforest and caused over deaths between and . those epidemics included measles, influenza, and malaria, with proportions of %, %, and %, respectively. the other epidemics that were below % included tuberculosis ( %), hepatitis ( %), smallpox ( %), chicken pox, pertussis, polio, and cholera. studies of covid among amazon indigenous populations are still limited, and only a few are available (baumgartner et al., ortiz-prado et al.). even though the literature is still limited, the available studies have conveyed the magnitude of the covid pandemic among indigenous populations.
for those reasons, this study aims to investigate the dynamics of the covid pandemic among indigenous populations living in remote amazon rainforest enclaves. the investigations are made using the susceptible infectious recovered (sir) model combined with several intervention scenarios. the study populations were the indigenous tribes living in the rainforest enclaves along the amazon river (figure ). those enclaves included lagartococha, yasuni, and callarú. the data collected from the indigenous tribes were the total population and confirmed covid cases. the data were obtained from the published data provided by the indigenous people articulation, the regional organization of indigenous peoples of the east, and the national institute of statistics and informatics (inei) websites. the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license. this version posted may , . https://doi.org/ . in this study, a well known susceptible infectious recovered (sir) model was adopted (jo et al., zhao et al.). in dealing with the current covid pandemic, sir is considered a versatile compartmental mathematical tool to model any pandemic dynamic. the model helps us understand the kinetics of a pandemic in its specific aspects, and the sir model is also simple to understand and has clear interpretations (wagh et al.). in the standard sir model (waqas et al.), the total population (n) is divided into three compartments: the susceptible (s), the fraction of the total population that is vulnerable and at risk of being infected; the infected (i), the population who have been infected; and the recovered (r), the fraction of the total population that recovers. in the sir model, several input parameters need to be considered. those parameters are denoted β and γ. according to belfin et al.
( ), β is the effective transmission rate and γ is the removal (recovery) rate. according to mahmud and lim ( ), γ is denoted as /d, where d is the duration of recovery in days. the specific description of the sir model is as follows: ds/dt = −βsi, di/dt = βsi − γi, dr/dt = γi. the scenario analysis was used to observe the shape of the covid sir curve under several simulated interventions. likewise, simulations were conducted based on scaled epidemic prevention and control intervention intensities ranging from low ( %) and moderate ( %) to high ( %). figure . the covid sir model of indigenous tribe populations living in remote yasuni rainforest enclaves with simulated % (low), % (moderate), and % (high) interventions (x axis: days, y axis: proportion of total population). the indigenous tribe populations and covid cases in the amazon rainforest enclaves, including lagartococha, callarú, and yasuni, are presented in figure . the highest tribe population was observed in callarú, and lagartococha has the lowest population. regarding the covid cases, the callarú population has the highest cases in comparison to the other populations. until may , , there were cases reported. the second largest number of covid cases among tribe populations was observed in the lagartococha enclaves. since may , , there were already cases in the lagartococha population. the yasuni enclave has the lowest cases. among yasuni's population of people, the first case was reported in late may , .
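as a companion to the sir description above, the equations and the intervention scaling can be sketched in a few lines of code. the transmission and recovery rates, initial infected fraction, and intervention levels below are illustrative assumptions, not the study's fitted values.

```python
# Minimal SIR sketch (illustrative parameters, not the study's fitted values).
# An "intervention" of intensity p scales the transmission rate beta by (1 - p).

def sir(beta, gamma, i0, days, dt=0.1, intervention=0.0):
    """Integrate ds/dt=-b*s*i, di/dt=b*s*i-g*i, dr/dt=g*i with forward Euler."""
    b = beta * (1.0 - intervention)
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(int(days / dt)):
        new_inf = b * s * i * dt
        new_rec = gamma * i * dt
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        peak = max(peak, i)
    return peak, r  # peak infected fraction, final epidemic size

# Stronger interventions flatten the curve: lower peak and smaller final size.
for p in (0.0, 0.25, 0.50, 0.75):
    peak, final = sir(beta=0.5, gamma=0.1, i0=0.01, days=365, intervention=p)
    print(f"intervention {p:.0%}: peak {peak:.3f}, final size {final:.3f}")
```

raising the intervention intensity lowers the peak and shrinks the final size, which is the flattening effect the study describes.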
figure shows the sir model applied to forecast the covid cases in the lagartococha and yasuni rainforest enclaves based on the available data. from figure , it is clear that the covid cases will grow rapidly over time, resulting in a vastly infected population both in lagartococha and yasuni. for the small tribe population observed in lagartococha, the covid epidemic is estimated to reach its peak in less than days. after that the infection rate will decrease and the recovery rate will increase, although at slow rates. similar patterns are also observed in the larger population, as can be seen in the sir model for the yasuni population. this tribe population is also estimated to experience a high increase of covid cases between and days, which is longer than for the small tribe population, in this case lagartococha. based on the sir model, it is estimated that at its peak, the highest cases for the lagartococha population are between % and % of the total population of people, while for the yasuni population, the highest cases are equal to % of its total population ( people). the simulated sir models showing the flattened covid curve are presented in figure for the lagartococha and figure for the yasuni populations. the simulations were made under the assumption that the interventions were applied. the model used %, %, and % intervention scenarios. as the model shows, an increase in the interventions is associated with a decrease in the infected cases and a flattening of the covid curve. besides, the model shows a decrease in the proportion of the total population infected by covid, and the disease lasts for a shorter period. for the lagartococha populations, the effect is clear for the % and % interventions. the peak of infected cases is reduced from % to % under % interventions. likewise, cases are reduced to % under the % intervention scenario, or almost half of the infected cases without any intervention.
a flattened curve is observed for the yasuni population as well (figure ). for the % intervention scenario, the covid peak is already reduced from % to %. the reductions keep increasing for the % and % interventions as well. for the % intervention scenario, the proportion of infected cases is only % of the total population, while the covid peak is reduced to only % under the % intervention scenario. indigenous tribes account for significant populations living in the amazon rainforest. it was estimated that . million indigenous people lived in the amazon (hern). among them, people under tribes are living in isolated parts of the amazon (baumgartner et al.). the population numbers are varied and dynamic. in this study, the indigenous populations varied, ranging from to . the proposed sir model in this study simulates the spread of covid throughout the indigenous tribe populations living in the remote lagartococha and yasuni rainforests. the study gains insights from a current data-driven approach to epidemics in infected populations. further, this study recommends measures to control the epidemic and to derive solutions for planning and managing confirmed cases. figure shows the sir model applied to forecast the covid cases in the lagartococha and yasuni rainforest enclaves based on the data available. from figure , it is clear that the covid cases will grow rapidly over time, resulting in a vastly infected population, and the situation may even get out of control if no necessary steps are taken. the covid cases in this study are comparable with other findings. it is estimated that the proportion of covid cases among indigenous populations is approximately . % (ortiz-prado et al.) and even .
% (baumgartner et al.). therefore, it is high time to take precautions to reduce covid and flatten the epidemic curve. the sir model shows that the peak of the epidemic curve in the lagartococha tribe could possibly occur within days (figure ), while the peak for the yasuni population will occur after days. nonetheless, the result and the pandemic curve possibly contain essential uncertainty due to possible changes in the social and climatic situations. the peak of the epidemic curve can be flattened by improving policies like travel restriction, sanitization, social distancing, and more testing measures. flattening the curve refers to policy and behavior that control the daily covid cases at thresholds manageable for health services. a flat curve assumes the same number of covid cases, but with the cases occurring more slowly over a longer period of time. a slower infection rate will reduce the stress of health services and hospital visits on any given day. a sir model developed by wagh et al. ( ) has shown how authority intervention can flatten the curve of covid in india. in the intact condition, the covid cases were infecting % of the population within days. nonetheless, the sir model with the authority intervention shows the covid cases were reduced to %, and this happens within days. likewise in bangladesh, islam et al. ( ) have shown the case reductions due to the interventions. the lockdown interventions are estimated to reduce the cases from % to %. this reduces the infected population drastically, which means social distancing is a major key to facing this epidemic in this scenario. a similar flattened curve pattern was also observed in cameroon, egypt, and malaysia (arifin et al., mahmud and lim).
reducing contacts with infected individuals by more than half is simulated and shown to reduce the proportion of cases drastically from % to % in cameroon (nguemdjo et al.). in egypt (hasab), the interventions have reduced the cases from % to %. there are several public health interventions that can be adopted in the model and used to flatten the covid curve. the first simulated intervention is increasing social distancing through reducing the number of daily exposures to potentially infected individuals. the exposure numbers can be articulated as high exposure ( % with low interventions) and low exposure ( % with high interventions). moreover, the second simulated intervention is increasing hygiene measures through increasing the frequency of hand washing, such as washing frequently ( % interventions) or never washing hands ( % interventions) (tartari et al.). a confirmed covid case, especially in an indigenous population (figure ), should raise concern considering the population's isolation, size, and genetic diversity. a study by cardoso et al. ( ) confirmed the extremely low genetic diversity and low population size observed in the tribe population living in the yasuni forest. this condition is related to genetic drift events promoted by founder effects and anthropogenic factors, including the tribe's warlike customs. the same condition is also observed in the callarú population. there was a decrease in the internal intervillage genetic heterogeneity in the callarú, as reported by neel et al. this condition was caused by migrations from the interior forest enclaves to the river bank and by individual mobility (salzano et al.). under the threat of the covid pandemic, a population with low genetic diversity may be more susceptible. mulligan et al. ( ) found that low gene pools in native american tribes were determinant factors for common complex diseases, diabetes, and rare mendelian disorders as well.
covid is a respiratory and systemic illness (lippi and henry), and there is a history of emerging respiratory tract infections (butler et al.) and introduced diseases, including tuberculosis (hern), among indigenous populations. in fact, influenza and cardiorespiratory diseases were common among amazon tribes. as reported by walker et al. ( ), influenza accounted for % of observed disease in the indigenous population, while herndon et al. ( ) found that cardiorespiratory-related diseases accounted for . %. a historical record of respiratory illness among indigenous tribes is related to the presence of ace receptors. this is a concerning issue since ace is the receptor for covid (zhang et al.; hakim and de soto). this present study is the first to deliver an analysis of covid dynamics focusing on tribe populations using the sir model. here, the model has been presented using several sir input parameters. the model shows an intact exponential increase in the total number of infected cases. the covid cases will peak within days for the lagartococha and days for the yasuni populations. this gives a first idea of how rapidly the epidemic is spreading across the indigenous populations. by applying intervention scenarios from the lowest to the highest intervention levels in the sir model, the epidemic peak can be reduced and hence the curve flattens. this will indeed flatten the curve but will prolong the duration of the epidemic. with the simulated interventions, the covid peak can be reduced by more than half of the number of cases without interventions.
these findings underline the importance of appropriate interventions for limiting and even stopping the spread of covid, especially among vulnerable and isolated indigenous populations. this study provides room for improvement in estimating the effectiveness of health interventions. administration combined with controlling authority intervention will help to flatten the covid epidemic curve. depending on the situation, interventions by populations supported by government policy, for instance social distancing, can be used with the sir model to predict the reduction of the final epidemic size. a susceptible-infected-removed (sir) model of covid- epidemic trend in malaysia under movement control order (mco) using a data fitting approach social distancing and movement constraint as the most likely factors for covid- outbreak control in brazil covid- peak estimation and effect of nationwide lockdown in india emerging infectious diseases among indigenous peoples genetic uniqueness of the waorani tribe from the ecuadorian amazon virgin soil epidemics as a factor in the aboriginal depopulation in america ecuador's yasuní biosphere reserve: a brief modern history and conservation challenges medical basis for increased susceptibility of covid- among the navajo and other indigenous tribes flattening covid- curve in egypt: an epidemiological modelling health and demography of native amazonians: historical perspective and current status disease concepts and treatment by tribal healers of an amazonian forest culture covid- epidemic compartments model and bangladesh
analysis of covid- spread in south korea using the sir model with time-dependent parameters and deep learning new world depopulation and the case of disease does genetic diversity limit disease spread in natural host populations? indigenous health and environmental risk factors: an australian problem with global analogues indigenous representations of illness and aids in sub-saharan africa chronic obstructive pulmonary disease is associated with severe coronavirus disease (covid- ) applying the seir model in forecasting the covid- trend in malaysia: a preliminary study population genetics, history, and health patterns in native americans simulating the progression of the covid- disease in cameroon using sir models genetic studies on the ticuna, an enigmatic tribe of central amazonas health of indigenous people in africa epidemiological, socio-demographic and clinical features of the early phase of the covid- epidemic in ecuador genetic demography of the amazonian ticuna indians the sir model used in this study can be easily applied to model covid in other indigenous tribe populations, for example on the african, australian, and asian continents. tartari e, fankhauser c, peters a, sithole b, timurkaynak f, masson-roy s, allegranzi b, pires d, pittet d. scenario-based simulation training for the who hand hygiene self-assessment framework. antimicrobial resistance & infection control. ( ).
key: cord- -ak gc nz authors: clum, charles; mixon, dustin g. title: parameter estimation in the sir model from early infections date: - - journal: nan doi: nan sha: doc_id: cord_uid: ak gc nz a standard model for epidemics is the sir model on a graph. we introduce a simple algorithm that uses the early infection times from a sample path of the sir model to estimate the parameters of this model, and we provide a performance guarantee in the setting of locally tree-like graphs. during an epidemic, government leaders are expected to help maintain public health while simultaneously preventing an economic meltdown. in the absence of a vaccine, decision makers must choose between various non-pharmaceutical interventions. this decision requires an informative forecast of the epidemic at a very early time. to obtain such a forecast, it is helpful to have a parametrized model for epidemics. what follows is a particularly popular compartmental model that originates from the classic work of kermack and mckendrick [ ]. definition (sir model). fix a simple, connected graph g and parameters λ, µ ≥ . consider a continuous-time markov chain in which the state is a partition (s, i, r) of v(g). for the initial state, draw v ∼ unif(v(g)) and put i = {v}, s = v(g) \ {v}, r = ∅. in fact, (s(t), i(t), r(t)) is also a continuous-time markov chain in this case. the initial conditions are s( ) = n − , i( ) = , r( ) = , and the process transitions (s, i, r) → (s − , i + , r) with rate λis and (s, i, r) → (s, i − , r + ) with rate µi. assuming n is large and λn =: β, then putting σ := s/n, ι := i/n, and ρ := r/n, we may pass to the mean-field approximation σ̇ = −βσι, ι̇ = βσι − µι, ρ̇ = µι. this approximation is popular because it is much easier to interact with. the approximation is good once the number of infected vertices becomes a fraction of n, and the approximation is better when this fraction is larger [ ]. this suggests an initial condition of the form σ(t ) = − δ − γ, ι(t ) = δ, ρ(t ) = γ for some small t , δ, γ > .
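the continuous-time markov chain in the definition above can be simulated directly with a gillespie-style event loop: each susceptible-infected edge fires an infection at rate λ, and each infected vertex recovers at rate µ. the small ring graph, rates, and seed below are illustrative assumptions.

```python
import random

def sir_on_graph(adj, lam, mu, seed=None):
    """Continuous-time SIR: each infected node infects each susceptible
    neighbor at rate lam and recovers at rate mu. Returns (end_time, t_inf),
    where t_inf maps each ever-infected node to its infection time."""
    rng = random.Random(seed)
    v0 = rng.randrange(len(adj))          # initial infected vertex, uniform
    S = set(range(len(adj))) - {v0}
    I, t = {v0}, 0.0
    t_inf = {v0: 0.0}
    while I:
        # total event rate: infections across S-I edges plus recoveries
        edges = [(u, v) for u in I for v in adj[u] if v in S]
        rate = lam * len(edges) + mu * len(I)
        t += rng.expovariate(rate)
        if rng.random() < lam * len(edges) / rate:
            _, v = rng.choice(edges)          # infection event
            S.discard(v)
            I.add(v)
            t_inf[v] = t
        else:
            I.discard(rng.choice(sorted(I)))  # recovery event
    return t, t_inf

ring = [[(i - 1) % 10, (i + 1) % 10] for i in range(10)]  # 10-cycle graph
end_time, times = sir_on_graph(ring, lam=2.0, mu=1.0, seed=1)
```

the dictionary of infection times is exactly the kind of early-infection data the paper's estimation problem starts from.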
for simplicity, we translate time so that t = . we argue there is no hope of determining (β, µ) from data of the form {ι(t) + ρ(t)} t∈[ , ] for small > . (while the following argument is not rigorous, it conveys the main idea.) notice that for t ∈ [ , ], it holds that σ(t) ≈ , and so ι(t) ≈ δe^{(β−µ)t}, and then our data takes the form a + be^{ct} with c = β − µ. we can expect to determine a, b, and c by curve fitting. however, we don't know δ or γ, but rather their sum. as such, we claim that (a, b, c) only determines β − µ. indeed, for every choice of (β, µ) such that β − µ = c, there is a choice of initial conditions (δ, γ) that would be consistent with the data (a, b, c). of course, additional information about (β, µ) could conceivably be extracted from higher-order terms, since a + be^{ct} is merely an approximation of ι(t) + ρ(t). however, we expect any such signal to be dwarfed by noise in the data. while the short-term behavior of ι is exponential with rate β − µ, the long-term behavior is instead governed by the quotient r := β/µ, known as the basic reproductive number. this can be seen by dilating time by substituting s = µt. in this variable, the mean-field approximation instead takes the form σ′ = −rσι, ι′ = rσι − ι, ρ′ = ι. that is, r (together with initial conditions) determines ι modulo time dilation, and notably, whether the curve ever exceeds the capacity of the medical care system. however, r = β/µ cannot be determined from β − µ. overall, the complete graph is not amenable to determining (λ, µ) from {u(t)} t∈[ , ]. however, real-world social networks are far from complete. like social networks, expander graphs have low degree, but considering their spectral properties, one might presume that they are just as opaque as the complete graph. surprisingly, this is not correct! in this paper, we show how certain graphs (including certain expander graphs) are provably amenable to determining (λ, µ) from {u(t)} t∈[ , ]. in the following section, we introduce our approach.
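the identifiability argument above is easy to check numerically: two parameter pairs with the same difference c = β − µ, with initial conditions (δ, γ) chosen to match the curve coefficients a and b, produce nearly indistinguishable early data ι(t) + ρ(t). the parameter values and the matching initial conditions below are illustrative assumptions.

```python
# Numerical check that only beta - mu (not beta and mu separately) governs
# the early data iota(t) + rho(t). Both runs share c = beta - mu = 0.5 and
# have initial conditions tuned so the leading coefficients a, b agree.
# All numbers are illustrative.

def mean_field(beta, mu, delta, gamma, T=1.0, dt=1e-3):
    s, i, r = 1.0 - delta - gamma, delta, gamma
    curve = []
    for _ in range(int(T / dt)):
        s, i, r = s - beta*s*i*dt, i + (beta*s*i - mu*i)*dt, r + mu*i*dt
        curve.append(i + r)   # the observed data u(t) = iota(t) + rho(t)
    return curve

u1 = mean_field(beta=1.0, mu=0.5, delta=0.005,  gamma=0.005)
u2 = mean_field(beta=2.0, mu=1.5, delta=0.0025, gamma=0.0075)
gap = max(abs(a - b) for a, b in zip(u1, u2))
```

the two curves coincide up to higher-order corrections in δ, so curve fitting on a short window cannot separate β from µ.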
specifically, we isolate infections that pass across bridges in a local subgraph of the social network, and then we estimate λ and µ from these infection statistics. section gives the proof of our main result: that our approach provides decent estimates of λ and µ in the setting of locally tree-like graphs. our proof makes use of the vast literature on sir dynamics on infinite trees. we conclude in section with a discussion of opportunities for future work. we start with the simple example in which g = k . according to the sir process, one of the two vertices is infected, and then it either infects the other vertex or it recovers before doing so. let z denote the random amount of time it takes for the second vertex to become infected. notice that z = ∞ with probability µ/(λ + µ). on the other hand, if we condition on the event z < ∞, then the distribution of z is exponential with rate λ + µ. (this is a consequence of the fact that the minimum and minimizer of independent exponential random variables are independent.) notice that if we could estimate these two quantities, then we could recover (λ, µ), as desired. for example, if we had access to multiple independent draws of the sir model on k , then we could obtain such estimates. for certain types of graphs, we can actually simulate this setup, and this is the main idea of our approach. in practice, we will not have the time to determine whether z < ∞, and so we instead truncate z ← min{z, τ} for some threshold τ > . in particular, we write z ∼ ci(λ, µ, τ) to denote a random variable with the corresponding censored distribution. we seek to estimate λ and µ given τ and estimates of the following quantities. first, we show that good estimates of p and q yield good estimates of λ and µ: lemma . suppose m := (λ + µ)τ ≥ , and take p and q such that, for some > , the stated error bounds hold. proof. first, observe that since m ≥ , we have (m + )e^{−m} ≤ /, and so thus, and similarly for ( − p)/q. next, we produce estimates p and q given independent realizations of z: proof.
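a simulation sketch of the two-vertex race described above: z is infinite with probability µ/(λ + µ), and conditioned on z < ∞ it is exponential with rate λ + µ, so a simple moment-based recovery of (λ, µ) is possible. the sketch below ignores the truncation correction that the paper's lemmas handle carefully; its rates, threshold τ, and sample size are illustrative assumptions.

```python
import random

# Bridge race on two vertices: an Exp(lam) infection clock competes with an
# Exp(mu) recovery clock. With prob mu/(lam+mu) the neighbor is never
# infected (z = infinity); conditioned on z < infinity, z ~ Exp(lam + mu).

def sample_z(lam, mu, rng):
    t_inf, t_rec = rng.expovariate(lam), rng.expovariate(mu)
    return t_inf if t_inf < t_rec else float("inf")

def estimate(lam, mu, tau=10.0, n=200_000, seed=0):
    """Crude moment-based recovery of (lam, mu) from truncated observations,
    ignoring the (exponentially small) truncation bias when (lam+mu)*tau
    is large."""
    rng = random.Random(seed)
    zs = [sample_z(lam, mu, rng) for _ in range(n)]
    finite = [z for z in zs if z < tau]
    p_hat = len(finite) / n                        # estimates lam/(lam+mu)
    rate_hat = 1.0 / (sum(finite) / len(finite))   # estimates lam + mu
    return p_hat * rate_hat, (1 - p_hat) * rate_hat

lam_hat, mu_hat = estimate(lam=2.0, mu=1.0)
```

with (λ + µ)τ large, the truncation bias is negligible and both rates are recovered to within a few percent.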
for convenience, we put k := |a| and identify a = [k]. we have e[l i ] = p, var[l i ] = p( − p), and l i ∈ [ , ] almost surely. as such, we may apply bernstein's inequality for bounded random variables (see theorem . . in [ ]): put δ = − e − . then both of the following hold with probability − e^{−c kp( −p)}: note that this implies the desired bound on p. next, we estimate q. conditioned on a , the random variables {z a } a∈a are all distributed like a τ-truncated version y of a random variable x ∼ exp(λ + µ), and there exist absolute constants c , c > that control its subexponential norm. as such, we may apply bernstein's inequality for subexponential random variables (see theorem . . in [ ]): the result follows from the union bound. if we had access to the infected vertices at time t , we could use the formulas in lemma to obtain estimators of the sir parameters that provide a good approximation to the true parameters: lemma . consider the sir model on a graph g with parameters λ and µ. select r, t , t > and put τ := t − t . let b(r) denote the subgraph of g induced by vertices of distance at most r from u( ). the set of bridges in b(r) with one vertex in i(t ) and another vertex outside of i(t ) is used below. for any fixed k, > , define the corresponding events. proof. let s denote the vertices in g of distance exactly r from u( ). notice that for every vertex v ∈ v(b(r)) \ s, the edges incident to v in b(r) are precisely the edges incident to v in g. as such, the sir processes on g and on b(r) are identical until the corresponding stopping time. let b̃(r) denote the subgraph of g induced by vertices of distance at most r from the minimizer of t . for each quantity t(v), z a , a i , a i , b a , b(a), p, q, λ i , µ i defined in the statement of the lemma, there is a corresponding quantity defined by replacing the sir process on g with the sir process on b̃(r), and we denote these variables by t̃(v), z̃ a , ã, ã , b̃ a , b̃(a), p̃, q̃, λ̃, µ̃. each of these variables equals its counterpart over the corresponding event. it remains to bound p(f|e ).
conditioned on ã and {b̃(a)} a∈ã , then for each a ∈ ã, the markov property implies that z̃ a has distribution ci(λ, µ, τ). also, the vertices {b̃(a)} a∈ã are pairwise distinct almost surely. indeed, if b̃(a) = b̃(a ), then if we delete the edge {a, b̃(a)}, we can still traverse a walk from a to u( ) to a to b̃(a ) = b̃(a), implying {a, b̃(a)} was not a bridge. as such, conditioned on ã and {b̃(a)} a∈ã , the variables {z̃ a } a∈ã are jointly independent. then lemmas and together imply the claimed bounds. of course, in our setup, we do not have access to i(t ), but rather u(t ) = i(t ) ∪ r(t ), and so we cannot apply lemma directly. instead, we assume that λ is sufficiently large compared to µ that u(t ) is a decent approximation of i(t ). this approach is summarized in algorithm . as we will see, the approximation i(t ) ≈ u(t ) introduces some error in our estimators. to analyze the performance of algorithm , it is convenient to focus on a certain (large) family of graphs. we say a graph g is (r, η)-locally tree-like if for a fraction − η of the vertices v ∈ v(g), it holds that the subgraph induced by the vertices of distance at most r from v is a tree. for example, it is known that for every fixed choice of d ∈ n with d > and c ∈ ( , ), there exists γ > such that a random d-regular graph on n vertices is (c log d− n, n^{−γ})-locally tree-like with probability approaching as n → ∞; see proposition . in [ ]. by focusing on this class of graphs, we may apply the vast literature on sir dynamics on infinite trees to help analyze the performance of algorithm . theorem (main result). fix d ∈ n with d ≥ and c, γ > , and consider any sequence of d-regular, (c log d− n, n^{−γ})-locally tree-like graphs on n vertices with n → ∞. select any α and β such that < α < β < c/(e(d − )λ), and put for each n. let e denote the event that the subgraph b(r) induced by vertices of distance at most r from u( ) is a tree, and let e ∞ denote the event that u(∞) contains a vertex of distance greater than r from u( ).
suppose λ ≥ µ > . then one of the following holds: (a) there is a subsequence of n for which, with probability at least − o( ), no vertex outside of b(r) will ever be infected, in which case the infection does not spread to even a constant fraction of the graph. (b) for each n, the bound ( ) holds with probability tending to as n → ∞. the proof is given in the next section, and figure illustrates the actual behavior of algorithm for comparison. the factors of in ( ) are due to the approximation i(t ) ≈ u(t ), and they have not been optimized. we suspect that these factors can be replaced by terms that approach as d → ∞, but this requires a different technique. we also suspect the hypothesis λ ≥ µ is an artifact of our proof. as the following lemma indicates, the threshold (d − )λ > µ would be more natural; this threshold arises from standard results on galton-watson processes (see the lecture notes [ ], for example). lemma . consider any sequence {g n } of d-regular graphs on n vertices with n → ∞ such that g n is (r n , η n )-locally tree-like with r n → ∞ and η n → . for each n, consider the sir model on g n with parameters λ and µ, and let e ∞ denote the event that u(∞) contains a vertex of distance greater than r n from u( ). proof. restrict to the event e , and put {v} = u( ). deleting v from b(r n ) produces d connected components, each of which can be viewed as a subgraph of the infinite (d − )-ary tree t that is rooted at the corresponding neighbor w of v. the sir evolution on t determines a galton-watson process that gives the eventual number z m of unsusceptible vertices at distance m ≥ from w, via the recursion z m+1 = Σ k x m,k with the sum over k = 1, . . . , z m , where x m,k denotes the number of vertices infected by the kth infected vertex in t that has distance m from w. the random variables {x m,k } m,k≥ are independent with distribution matching a random variable that we denote x. we can describe the distribution of x as follows. draw random variables t ∼ exp(µ) and c , . . . , c d− ∼ exp(λ).
if n denotes the number of k ∈ [d − ] for which c k < t , then n is distributed like x. conditioned on t , this number is binomial with parameters d − and − e^{−λt}. hence the mean of x is at most one in this regime; since in addition it holds that p{x > } > , theorem . in [ ] gives that the sum of the z m over m is finite almost surely. put m := sup{m : z m > }. then m < ∞ almost surely. in particular, a union bound over the d different neighbors of v gives the claimed bound, where the last step uses the fact that the survivor function of m vanishes at infinity. restrict to the event e , and put {v} = u( ). as before, we identify a galton-watson process to analyze, but this one is slightly different: delete one of the neighbors of v from b(r n ) and identify the connected component c containing v with a subgraph of a (d − )-ary tree t with root v. the sir evolution on t determines a galton-watson process that gives the eventual number z m of unsusceptible vertices at distance m ≥ from v, via the recursion z m+1 = Σ k x m,k with the sum over k = 1, . . . , z m , where x m,k denotes the number of vertices infected by the kth infected vertex in t that has distance m from v. the random variables {x m,k } m,k≥ are independent with distribution matching a random variable that we denote x. we see that p(e c ∞ |e ) is at most the extinction probability q of this process. by theorem . , it follows that q is bounded in terms of p and p . before estimating p and p , it is helpful to introduce some notation. an infected parent with d − children infects x of these children. the parent recovers exponentially with rate µ, and we let r denote the recovery time. simultaneously, each child is infected exponentially with rate λ, and so we denote independent random variables i i ∼ exp(λ) such that i i gives the time of infection for child i. then the order statistic i ( ) has the same distribution as the corresponding exponential minimum. this identity, combined with the fact that x → µx/(µ + λx) is an increasing function, gives the desired bound. proof of theorem . the last statement follows from lemma .
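the offspring distribution of x described above is straightforward to check by monte carlo: marginally, each of the d − 1 children's Exp(λ) clocks beats the parent's Exp(µ) recovery clock with probability λ/(λ + µ), so e[x] = (d − 1)λ/(λ + µ). the degree and rates below are illustrative assumptions.

```python
import random

# Monte Carlo for the offspring variable X: the parent recovers at time
# T ~ Exp(mu), each of its d-1 children is infected at an independent
# Exp(lam) time, and X counts children infected before T. Marginally each
# child beats T with probability lam/(lam + mu), so
# E[X] = (d - 1) * lam / (lam + mu). Parameters are illustrative.

def offspring(d, lam, mu, rng):
    t = rng.expovariate(mu)
    return sum(rng.expovariate(lam) < t for _ in range(d - 1))

rng = random.Random(0)
d, lam, mu = 4, 1.0, 2.0
n = 100_000
mean_x = sum(offspring(d, lam, mu, rng) for _ in range(n)) / n
expected = (d - 1) * lam / (lam + mu)
```

whether this mean exceeds one is exactly what separates almost-sure extinction from possible survival in the galton-watson comparison.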
since (a) does not hold, there exists some κ > such that p(e ∩ e ∞ ) > κ for all sufficiently large n. select n sufficiently large, and let f denote the failure event that ( ) does not hold for some o( ) function to be identified later. we wish to show that p(f|e ∩ e ∞ ) = o( ). let e denote the event that u(t ) ∪ ∂u(t ) is contained within distance r of u( ). in particular, on the event e ∩ e , all of the infected and recovered vertices at time t reside in a tree with root u( ). also, selecting k := α(µ + λ) · c log d− n, let e denote the event that |u(t )| ≥ k. we will make use of two simple inequalities involving arbitrary events a, b, and c. the first is ( ); furthermore, if p(c c |b) = , then two applications of ( ) give the second. to bound the third term in ( ), we let p λ,µ denote the probability measure corresponding to the sir model with parameters λ and µ. then, for sufficiently large n, the desired comparison holds, and so it suffices to show that p λ, (e ∩ e c ) = o( ). we accomplish this by analyzing the corresponding branching process: lemma . let g denote the infinite d-ary tree. consider the process {h t } t≥ of induced subgraphs of g in which h is induced by the root vertex, and then for each v ∈ v(g) \ v(h t ) that is a g-child of some vertex in h t , it holds that v is added to h t at unit rate. let b m denote the first time at which a vertex of distance m from the root vertex of g is added to h t . then b m ≥ m/( ed) with probability at least − e^{−m/ }. proof. let b m,r denote the rth time at which a vertex of distance m from the root vertex of g is added to h t , and note that b m = b m, . let c, θ > be given (to be selected later). then markov's inequality gives the desired tail bound, where the last identity, which appears in theorem in [ ], follows from analyzing a certain martingale. in our case, x := b , ∼ exp(d) and x r := b ,r+ − b ,r ∼ exp(d − r) for r ∈ { , . . . , d − } are independent. overall, selecting θ = ed and c = /( ed) gives the result. lemma .
suppose g is (r, η)-locally tree-like, consider the sir process on g with µ = 0, and take any t ≤ r− e(d− )λ . then p(e c |e ) ≤ e −(r− )/ . proof. by time dilation, we may take λ = 1 without loss of generality. condition on e . after time t ∼ exp(d), there are two infected vertices u and v. removing the edge {u, v} from h produces two (d − 1)-ary trees, with root vertices u and v. extend these trees to infinite (d − 1)-ary trees h_u and h_v. let b_u denote the first time at which a vertex of distance r − 1 from the root vertex is infected in h_u, and similarly for b_v. then by assumption on t , we have finally, we apply the union bound and lemma to get overall, ( ) and lemma together give p(e c |e ∩ e_∞) = o(1). next, we may combine this bound with ( ) to bound the second term in ( ): ( ) . to this end, it is convenient to consider the stopping time t := inf{t : t and c < , it follows that next, our assumptions that α < c e(d− )λ , (d − 1)λ > µ, and c < together imply from which it follows that k < r. on the event e_∞, it holds that u(∞) induces a tree that contains a path of length greater than r, from which it follows that |u(∞)| > r. as such, this allows us to continue ( ): to continue, we show that p( proof. the random number n of transitions that occur over the interval (0, t ] is given by conditioned on the sequence of states of the discrete-time markov chain {m_n}_{n ≥ 0}, the transition times {x_n}_{n ≥ 1} are independent and exponentially distributed with (deterministic) parameters λe(i_n, s_n) + µ|i_n|. for any sequence of states in the event {t < ∞}, the first n of these parameters are all at least µ + λ. put e := {t < ∞} and denote p_e(a) := p(a|e). then put t := k/(µ + λ). conditioning on n , we may apply bernstein's inequality for subexponential random variables (see theorem . . in [ ] ). in particular, we let c > 0 denote a universal constant such that any random variable of the form x ∼ exp(λ) satisfies ‖x − e x‖_{ψ₁} ≤ c/λ.
then, where the last step applies the bound n ≤ k and c := c max(c , c). overall, ( ), ( ) and lemma together give that the second term in ( ) is o(1). it remains to show that the first term in ( ) is o(1). to this end, we first show that p( where the last step applies ( ) and lemma . combining this with ( ) and lemma after a union bound then gives the claim. as such, it suffices to show that . for this, it is convenient to consider the event a_t = {e(i(t), s(t)) > 0} that the infection is still spreading at time t. since e is the event that the infection stays within b(r) up to time t (i.e., after t ) and e_∞ is the event that the infection eventually escapes b(r), it follows that since λ ≥ µ and d ≥ , we may select any threshold in the interval (µ/(µ + λ), 1 − 1/d); we will refine our choice later. defining b_t := {|r(t)| < |u(t)|}, we then have we bound the first term of ( ) by analyzing the underlying discrete-time markov chain: almost surely, the end state of this process takes the form (s, ∅, v(g) \ s). importantly, {m_n}_{n ≥ 0} is a discrete-time markov chain in which, conditioned on m_n, it holds that r_{n+1} strictly contains r_n with probability µ|i_n| / (λe(i_n, s_n) + µ|i_n|). we will consider this process until the stopping time N := inf{n : e(i_n, s_n) = 0}. specifically, for each n ∈ {1, . . . , N}, let x_n indicate whether the nth transition recovers a vertex, i.e., whether r_n strictly contains r_{n−1}. for each n > N, let x_n be bernoulli with success probability µ/(µ + λ), all of which are independent of each other and of m_n. put m_n := m_{min(n,N)}. let p_n denote the (random) probability measure conditioned on the state history {m_j}_{j ≤ n}. notice that in the event {n < N}, we have the bound p_n{x_{n+1} = 1} = p{x_{n+1} = 1 | m_n} = µ|i_n| / (λe(i_n, s_n) + µ|i_n|) ≤ µ/(µ + λ) almost surely. meanwhile, in the complementary event {n ≥ N}, x_{n+1} is bernoulli with success probability µ/(µ + λ) and independent of m_n, and so p_n{x_{n+1} = 1} = p{x_{n+1} = 1 | m_n} = p{x_{n+1} = 1} = µ/(µ + λ) almost surely. overall, p_n{x_{n+1} = 1} ≤ µ/(µ + λ) almost surely. next, let j denote a set of positive integers j_1 < · · · < j_m. then the law of total probability gives next, p_{j_m − 1}{x_j = 1 ∀ j ∈ j \ {j_m}} ∈ {0, 1}, and so take expectations of both sides and apply induction to get next, let n denote the number of transitions that have occurred by time t. in the event a_t, it holds that n < N, and so |r(t)| = ∑_{j ≤ n} x_j. also, we have n ≤ |u(t)|. as such, |r(t)| is stochastically dominated by a binomial random variable with |u(t)| trials and success probability µ/(µ + λ). this then gives overall, lemma gives that the first term in ( ) is o(1).

it remains to bound the second term in ( ). to do so, we restrict to the event e ∩ e ∩ e ∩ b_t and argue that f occurs with probability o(1). we adopt the notation from lemma and algorithm . since b(r) is a tree, a consists of the vertices in u(t) that have a neighbor in s(t), while a_i = a ∩ i(t). for every a ∈ a \ a_i ⊆ r(t), since a cannot infect b(a), it holds that z_a = τ. it follows that q̂ = q̂_i and p̂ = (|a_i|/|a|) · p̂_i, and so λ̂/λ̂_i = p̂/p̂_i = |a_i|/|a|, and µ̂/µ̂_i = (1 − p̂)/(1 − p̂_i) = (1 − (|a_i|/|a|) p̂_i)/(1 − p̂_i) = 1 + (1 − |a_i|/|a|) · p̂_i/(1 − p̂_i). as such, λ̂/λ̂_i ≤ 1 and µ̂/µ̂_i ≥ 1. to obtain bounds in the other directions, we require a lemma: in this paper, we introduced a simple algorithm to estimate sir parameters from early infections. there are many interesting directions for future work. first, we do not believe that theorem captures the true performance of algorithm , especially in light of figure , and this warrants further investigation. next, it would be interesting to consider other types of estimators. notice that since algorithm explicitly makes use of certain properties of the underlying graph, it is clear why it fails for the complete graph. does the behavior of the maximum likelihood estimator have a similarly transparent dependence on the underlying graph?
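the embedded discrete-time chain used in the argument above can be simulated directly: at each step, with probability λe/(λe + µ|i|) a uniformly random susceptible-infected edge transmits, and otherwise a uniformly random infected vertex recovers. a short sketch on an arbitrary adjacency list (the function and the example graph are illustrative, not the paper's algorithm):

```python
import random

def sir_chain(adj, source, lam, mu, rng):
    """Run the embedded discrete-time SIR chain until no vertex is infected.

    adj: dict mapping each vertex to an iterable of its neighbors.
    Each transition is an infection with probability
    lam*E / (lam*E + mu*|I|), where E counts susceptible-infected
    edges, and a recovery of a uniformly random infected vertex otherwise.
    """
    S = set(adj) - {source}
    I, R = {source}, set()
    while I:
        si_edges = [(u, v) for u in sorted(I) for v in adj[u] if v in S]
        E = len(si_edges)
        if E > 0 and rng.random() < lam * E / (lam * E + mu * len(I)):
            _, v = rng.choice(si_edges)       # infection event
            S.remove(v)
            I.add(v)
        else:
            u = rng.choice(sorted(I))         # recovery event
            I.remove(u)
            R.add(u)
    return S, R
```

each vertex is infected at most once and recovered at most once, so the chain always terminates after at most 2|v(g)| transitions, in the end state (s, ∅, v(g) \ s) noted above.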
we focused on locally tree-like graphs in part because there is a rich literature on sir dynamics over infinite trees, but it would be interesting to analyze the performance of algorithm on other graph families. also, there is a multitude of compartmental models for epidemics with various choices of probability distributions for transition times between compartments. finally, one might consider alternative models for what data is available. for example, to model asymptomatic infections, one might assume that a random fraction of infected vertices are not known to be infected.

references:
- the simple galton-watson process: classical approach
- local kesten-mckay law for random regular graphs
- the first birth problem for an age-dependent branching process
- a contribution to the mathematical theory of epidemics
- limit theorems for sequences of jump markov processes
- early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
- high-dimensional probability: an introduction with applications in data science

the authors thank boris alexeev for interesting discussions that inspired this work. dgm was partially supported by afosr fa - - - and nsf dms .

since u(t) induces a subtree of g, lemma gives that |a| ≥ (1 − 1/d)|u(t)|, and so, where the last two steps use the fact that d ≥ and the choice of cutoff (1 + ε)µ/(µ + λ) for some small ε > 0. it follows that, where the last two steps use the facts that λ ≥ µ and ε is small. next, consider the following: since k and τ both increase with factors of log n, lemma implies that (λ̂_i, µ̂_i) converges to (λ, µ) in probability. as such, where the last step holds when n is large. the result then follows from the fact that (λ̂_i, µ̂_i) converges to (λ, µ) in probability.

key: cord- - b o ccp authors: saakian, david b.
title: a simple statistical physics model for the epidemic with incubation period date: - - journal: nan doi: nan sha: doc_id: cord_uid: b o ccp based on the classical sir model, we derive a simple modification for the dynamics of epidemics with a known incubation period of infection. the model is described by a system of integro-differential equations. parameters of our model directly related to epidemiological data. we derive some analytical results, as well as perform numerical simulations. we use the proposed model to analyze covid- epidemic data in armenia. we propose a strategy: organize a quarantine, and then conduct extensive testing of risk groups during the quarantine, evaluating the percentage of the population among risk groups and people with symptoms. mathematical modeling for epidemiology has a rather long history, dating back to the studies by d. bernoulli [ ] . later, kermack and mckendrick [ ] proposed their prominent theory for infectious disease dynamics, which influenced the following sir and related models. by the end of the last century, significant progress in the field was made (a systematic literature review for this period is presented in anderson and may's book [ ] ). the covid- pandemic has drawn the attention of researchers from all over the world and different areas to epidemic modeling. one of the simplest sir models for the virus spread in northern italy was introduced in [ ] . another research group used the logistic equation to analyze empirical data on the epidemic in different states [ ] . here, we mainly focus on mean-field models that discard the spatial dependence of the epidemic process. therefore, we avoid network models of epidemics. moreover, it is crucial to consider the final incubation period of the disease to construct a correct model for the covid- case. 
taking into account this distinctive feature, we consider the dynamics of the age-structured population, which is a well-known problem in evolutionary research [ ] - [ ] . generally, epidemic models have a higher order of non-linearity than evolutionary models, although there are some similarities between these two classes. in this study, we derive a system of integro-differential equations based on the rigorous master equation that adequately describes infection dynamics with an incubation period, e.g., covid- . first, we discuss the sir model. then, we move on to its modification and apply it to the data on the covid- epidemic in armenia. consider the sir model where the parameter s stands for the number of susceptible people, i for the number of infected people, and r for the number of people who have recovered and developed immunity to the infection. we assume that s, i, r satisfy the constraint s + i + r = n . where 1/b is the period when the infected people are contagious. the parameter a can be obtained from the empirical data on the infection rate: thus, we assume that a healthy person is infected with a probability proportional to the fraction of infection in the population. the probability is also proportional to the population density. one of the most widely discussed and crucial parameters in epidemiological data is the basic reproduction number of the infection: for the covid- , it has been estimated as [ ] : in fact, the real data allow us to measure three main parameters: the exponential growth coefficient at the beginning of the epidemic; the minimum period of time in which an infected person can transmit the infection; and the maximum period, when an infected person ceases to transmit the infection. the most important objectives of the investigation are the maximal possible proportion of the infected population, and then the period before the peak of the epidemic. consider the spread of infectious diseases with a recovery period up to t days.
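before moving to the age-structured variant, the basic sir model just recalled can be integrated numerically. a minimal forward-euler sketch, with a the infection rate and b the recovery rate as above (the function name and parameter values are illustrative):

```python
def simulate_sir(a, b, i0=1e-3, dt=1e-3, t_max=200.0):
    """Integrate dS/dt = -a*S*I, dI/dt = a*S*I - b*I, dR/dt = b*I
    for a population normalized to S + I + R = 1."""
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(int(t_max / dt)):
        new_inf = a * s * i * dt       # new infections in this step
        rec = b * i * dt               # recoveries in this step
        s -= new_inf
        i += new_inf - rec
        r += rec
        peak = max(peak, i)
    return s, i, r, peak
```

for a/b = r_0 = 3 the run reproduces the classical peak prevalence 1 − (1 + ln r_0)/r_0 ≈ 0.30 and a final susceptible fraction below 10%.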
at the t-th moment of time, we have s(t) for the size of the susceptible population and r(t) for the recovered population. we divide the infected population according to the age of infection, looking at time intervals δ and defining i_l(t) as the number of infected people with the age of infection in (lδ, (l+1)δ). we assume that the incubation period for a random infected person is l and the infection spreads from l to t days. below, we take δ → 0 for the continuous-time limit. assuming that the spread of infection has a rate a, we obtain the following system of equations: where the coefficient a is expressed via the infection rate we suggested that after t days a person recovers, and the patients are not isolated from the rest of the population between days l and t . eq. ( ) describes the dynamics of the population over discrete time, which is the right choice for numerical simulation. consider now the continuous-time version of the model. in the limit of small δ, we introduce the continuous function i_l(t) = i(x, t), where δ·i(x, t) is the size of the infected population with age in [x, x + δ). the continuous-time versions of the first three equations are: the solution of the second equation in eq. ( ) is where we denote j(t) = i(0, t). let us look at the difference; using the latter expressions, we get the following full system of equations: at the start, substituting the ansatz j(t) = e^{kt}, we get: in the limit k → 0, we get: for increasing α, we get an increasing value of k as well. in the sir model the epidemic threshold is at r_0 = 1 or a = b, so our model is similar to sir with b = 1/(t − l). in fig. , we analyze the epidemiological data for covid- in armenia using our model. we examine the dynamics of the infected population in armenia from march , when quarantine was introduced in the country by the government, until april .
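under the model as stated (constant infectivity a between infection ages l and t), the exponential-growth ansatz above leads to the characteristic equation 1 = a·(e^{−kl} − e^{−kt})/k, whose k → 0 limit reproduces the threshold a = 1/(t − l) noted above. a sketch solving it by bisection (our own helper, offered as an assumption-labeled reconstruction rather than the paper's displayed formula):

```python
from math import exp

def growth_rate(a, L, T, tol=1e-10):
    """Solve 1 = a * (exp(-k*L) - exp(-k*T)) / k for the epidemic
    growth exponent k, assuming a > 1/(T - L) (supercritical case)."""
    def phi(k):
        if abs(k) < 1e-12:                  # limit of the integral as k -> 0
            return a * (T - L)
        return a * (exp(-k * L) - exp(-k * T)) / k

    lo, hi = 0.0, 1.0
    while phi(hi) > 1.0:                    # bracket the root from above
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) > 1.0:                  # phi is strictly decreasing in k
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

with illustrative values l = 5 and t = 15 days, the threshold is a = 1/(t − l) = 0.1, and k grows monotonically with the contact parameter a above it.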
let us consider the case when the infectivity (the ability to transfer the infection to susceptible individuals) of infected individuals depends on the age of infection (via a kernel f(x)), and the population with age x is diluted at the rate g(x). the latter seems to be a reasonable assumption, since an infected individual with a large age reveals some symptoms of infection and therefore has chances to be isolated. now eq. ( ) is modified: the continuous-time limit gives the following system of equations: consider now the asymptotic solution: then, we get the following equations: and thus, we derive for the epidemic threshold: a. the specific functions g(x). let us analyze our eq. ( ). if we are trying to reduce the growth rate k, it can be done in two ways: reducing the number of contacts a, or strengthening the dilution g(x). let us introduce a non-zero reduction just after the incubation period, g(x) = g. hence, we get the following result: we should estimate the value of g that stops the epidemics. we now apply the generalized version of the model to the epidemics in armenia. we look at two periods of the epidemics: the first period from march to april , and the later one (second period), when quarantine starts to work efficiently. a. the choice g = 0 in the model. let us first take g = 0 for the first period, see table . the parameter a in our model is proportional to the number of human contacts during the day. in the first time period, we have had a = . , k = . . after the quarantine in armenia, k decreased from the value . to the value . , with a = . . the critical value of a to eliminate the epidemics is a = . . we reduced % of human contacts. by further reducing % of the remaining contacts, we can eliminate the epidemics. let us evaluate what degree of g we need to eliminate the epidemic at given values of a, if we attribute the current situation to g = 0. for the case of quarantine, we take the current value k = . , then we see that g = . eliminates the epidemics.
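under one plausible reading of the constant-dilution case above (infected individuals past the incubation age are removed at rate g, so a fraction e^{−g·(age past incubation)} is still circulating), the epidemic threshold becomes a·(1 − e^{−g(t−l)})/g = 1, which reduces to a(t − l) = 1 as g → 0. a sketch computing the critical g for given a (an assumption-labeled extension of the model, not the paper's exact formula):

```python
from math import exp

def critical_g(a, L, T, tol=1e-10):
    """Smallest removal rate g (acting after infection age L) that brings
    the epidemic to threshold, i.e. solves a*(1 - exp(-g*(T-L)))/g = 1.
    Requires a*(T - L) > 1 (supercritical without intervention)."""
    def psi(g):
        if abs(g) < 1e-12:
            return a * (T - L)              # limit g -> 0
        return a * (1.0 - exp(-g * (T - L))) / g

    lo, hi = 0.0, 1.0
    while psi(hi) > 1.0:                    # bracket the root from above
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) > 1.0:                  # psi is strictly decreasing in g
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

larger contact rates a require a larger critical g, consistent with the qualitative conclusion drawn in this section.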
for the case without quarantine, k = . , we need g = . to eliminate the epidemics, requiring much more effort compared to the previously considered case. 2. let us take g = . for the first period (we identify the % of infected individuals during a week), see table ii. then, we apply a = . for the first period and a = . during the second period, and we need a = . to eliminate the epidemic. due to the quarantine, we reduced % of contacts; we now need to reduce % of the existing contacts. holding the current level of contacts, we instead need to raise the value of g from . to . . we verified that taking g = . before the quarantine does not give adequate results. how can we increase g in practice? by testing % per week in high-risk groups of the population, we can eliminate the epidemics. in this paper, we introduced a version of the sir model for infection spreading with a known incubation period. this model was applied to analyze the covid- epidemic data in armenia. we constructed the simplest version of the population dynamics of an age-structured population. closely related work has been done in [ ] , which is based on the sir model. in [ ] , a temporal kernel f(t) has been introduced that modulates the infectivity of each infected individual. compared to such a model, we describe the infected population at a given moment of time via a distribution over the age of infection, instead of tracking the full infection history of the focal populations. in other works related to the population dynamics of age-structured populations, differential equations with time delay have usually been considered.
instead, we use an integro-differential system of equations, which seems to be an adequate approach to the current situation with the covid- epidemic. from our perspective, the proposed approach significantly changes the epidemiological picture (compared to classical sir models), since the virus is active for about two weeks. next, we introduced two functions: f(x), which describes the distribution of infectivity by age, and g(x), which describes the containment measures. in the normal sir model, we have two parameters for the rate of infection and the removal of infections. in our integro-differential model, mapping to elementary processes is straightforward: we just need the velocity parameter a and two periods, the incubation period l and the recovery period t, after which the carrier with symptoms is separated from society. we derive an analytical result for the exponential growth in the early stages of the epidemics, as well as for the epidemic threshold. it will be very interesting to investigate the transitional situation near the threshold. we suggest simply running the numerics and choosing the parameter values to match the observed exponential growth. we applied our model to understand the situation with the epidemics in armenia. what is advantageous in our model is that we can clearly separate two aspects of the epidemic: contact strength (through the coefficient a) and deterrence measures (through the parameter g). we check that in fact we need to make minimal efforts to stop the epidemics, and testing is much cheaper during quarantine. currently, if we detect only . % of the infected population per day, strictly monitoring the symptoms, we can stop the epidemic. this is much more complicated without quarantine. i thank armen allahverdyan, pavel krapivsky, ruben poghosyan, didier sornette, and tatiana yakushkina for useful discussions.
the work is supported by the russian science foundation under grant - - from the russian university of transport.

references:
- infectious diseases of humans
- data analysis for the covid- early dynamics in northern italy
- evolution in age-structured populations
- does good mutation help you live longer?
- different fitnesses for in vivo and in vitro evolutions due to the finite generation-time effect (d. b. saakian, a. s. martirosyan)
- generalized logistic growth modeling of the covid- outbreak in provinces in china and in the rest of the world (k. wu, d. darcet)
- epidemic processes in complex networks
- epidemics with containment measures

key: cord- -nrkvbdu authors: steinmann, paul title: analytical mechanics allows novel vistas on mathematical epidemic dynamics modelling date: - - journal: nan doi: nan sha: doc_id: cord_uid: nrkvbdu

this contribution aims to shed light on mathematical epidemic dynamics modelling from the viewpoint of analytical mechanics. to set the stage, it recasts the basic sir model of mathematical epidemic dynamics in an analytical mechanics setting. thereby, it considers two possible re-parameterizations of the basic sir model. on the one hand, it is proposed to re-scale time, while on the other hand, to transform the coordinates, i.e. the independent variables. in both cases, hamilton's equations in terms of a suited hamiltonian as well as hamilton's principle in terms of a suited lagrangian are considered in minimal and extended phase and state space coordinates, respectively. the corresponding legendre transformations relating the various options for the hamiltonians and lagrangians are detailed. ultimately, this contribution expands on a multitude of novel vistas on mathematical epidemic dynamics modelling that emerge from the analytical mechanics viewpoint. as a result, it is believed that interesting and relevant new research avenues open up when exploiting in depth the analogies between analytical mechanics and mathematical epidemic dynamics modelling.
the global covid- pandemic, with alleged outbreak by the end of in wuhan, china [ ] , despite its devastating implications for health, economy, and society, has in particular challenged modelling and simulation of mathematical epidemic dynamics. political decision makers around the globe seek (or should seek) advice from scientists such as virologists, biologists, clinicians, economists, and sociologists, as well as modelers from different fields. especially the latter are in the position to virtually simulate various scenarios based on well-founded assumptions in order to provide support and guidance for the difficult and momentous decisions of politicians, e.g. on lockdown measures and step-wise exit strategies thereof. thus, the critical importance of modelling is clearly appreciated, and indeed, mathematical epidemic dynamics modelling is a well-established and mature field: traditional mathematical modelling of epidemic dynamics roots in the concept of susceptible, infected, and recovered (sir) compartments as originally proposed in [ ] . various modifications extend the classical sir model to account for further compartments such as, e.g., deceased (sird model), exposed (seir model), and quarantined (siqrd model), among many other sophisticated options, see [ , ] . classical sir-type compartment-based models are coupled ordinary differential equations (odes). extending the ode-based sir-type modelling approach to integro-differential equations (ides) allows one to also consider the detailed course of the disease, e.g., delay due to incubation time and the infectious period, see [ ] . sir-type models describe the temporal spread of infectious diseases for integral populations, thereby, however, neglecting the interconnectedness of spatially distributed geographic areas. recently proposed multiple compartment models, e.g.
the mcsir model in [ ] , also allow consideration of the geographical spread of potentially multiple infectious virus strains within the population and its potentially multiple subgroups (e.g. age groups). spatial network models, e.g. [ , , ] , for example based on the seir model at each network node can qualitatively simulate the outbreak dynamics of infectious diseases and the impact of travel restrictions in geographical areas at the global (macro) scale such as china and the usa [ ] or europe [ ] . however, due to its stochastic nature and strong impact of socio-economic factors, modelling epidemic dynamics within geographical areas at the local (micro) scale requires the use of another modelling paradigm, i.e. rule-driven, agent-based models [ ] . agent-based models allow for example studying the effect of various lockdown exit strategies on local geographical entities with only a comparatively small number of individuals (agents), see e.g. [ ] . regardless of the modelling approach taken, quantitative predictions of epidemic dynamics remain challenging and critically require careful identification of model parameters from reliable databases, see e.g. [ ] . given this mature state of affairs, why is a novel, alternative view on mathematical epidemic dynamics modelling justified at all? the answer is as follows: to date, modelers of complex mechanical systems and behavior have developed a versatile and extremely successful tool-set, including sophisticated analytical and in particular efficient and accurate computational methods. examples are techniques to master severe non-linearities and couplings with non-mechanical fields, a multitude of multi-scale and homogenization modelling approaches as well as incorporating uncertainty quantification into modelling and simulation. mathematical epidemic dynamics modelling can undoubtedly benefit largely from this accumulated expertise! 
in summary, it is therefore believed that first translating epidemic dynamics models into an analytical mechanics setting (related steps towards this aim may be found, e.g., in [ , , ] ) and then, secondly, exploiting the analogy between the two approaches while utilizing the full tool-set of mechanical modelling, can provide novel vistas and unprecedented opportunities. the present contribution aims to sketch out a few of these perspectives and to encourage the mechanics community to offer its strong modelling expertise to possibly and hopefully help further improve epidemic dynamics modelling. classical modelling of epidemics dynamics roots in the concept of susceptible, infected and recovered (sir) compartments as originally proposed in [ ] . the basic compartment-based sir model is the following set of two coupled ordinary differential equations (odes) here, i and s denote the stock of individuals in the infected and the susceptible compartments, respectively, normalized by the size of the entire population. the notation for the derivative of a quantity with respect to ordinary time t is the parameters β and γ are the infection and the recovery rate, respectively, with their ratio defining the basic reproduction number r := β/γ. note finally that the stock of individuals in the recovered compartment follows from the constraint s + i + r = , thus the evolution equation is tacitly suppressed in our presentation. in order to recast the basic sir model into a format more amenable to the analytical mechanics setting, it is proposed, as a first option, to re-scale time as the notation for the derivative of a quantity with respect to re-scaled time τ is consequently, the derivatives with respect to re-scaled and ordinary time are related via as a result, the basic sir model is eventually re-parameterized in terms of re-scaled time as in the time re-parameterized sir model the right-hand-side is abbreviated as the forcing term f , i.e. 
as the column matrix consisting of the time re-parameterized rate of infection v and force of infection f (a common terminology from mathematical epidemiology, already establishing a semantic analogy to mechanics), exclusively in the stock of individuals in the infected compartment and with the basic reproduction number and the infection rate as parameters. the minimal phase space coordinates collectively assembled in the column matrix z ∈ r , i.e. the generalized coordinate q defined as the stock of individuals in the infected compartment and the generalized momentum p defined as the stock of individuals in the susceptible compartment, span the two-dimensional phase space È, thus È := { z := (q, p)ᵀ := (i, s)ᵀ }. the hamiltonian h(z) in minimal phase space coordinates, which eventually results in the time re-parameterized sir model from eq. , is then identified as indeed, the corresponding hamilton equations deliver a re-formulation of the result in eq. , i.e. symbolic notation clearly reveals the hamiltonian structure of the time re-parameterized sir model. thereby, the skew-symmetric j ∈ r × r denotes the so-called symplectic matrix in È with jᵀ = −j and j² = −i, whereby i ∈ r × r is the common unit matrix in È. the hamiltonian structure in terms of the skew-symmetric j clearly identifies the (autonomous) hamiltonian h(z) in minimal phase space coordinates as first integral, i.e.
as a conserved quantity under the flow implied by the hamilton equations, since the gradient of the hamiltonian h(z) with respect to the minimal phase space coordinates, abbreviated in the sequel as g(z) ∈ r , computes as taken together, the time re-parameterized sir model obeys hamiltonian structure and identifies the relation between the gradient g(z) ∈ r of the hamiltonian h(z) in minimal phase space coordinates and the forcing term f (z) ∈ r as exchanging the time derivative to the one with respect to ordinary time t destroys the clean hamiltonian structure in terms of a constant symplectic matrix, i.e. one may, of course, re-interpret this result as a hamiltonian structure with non-constant, coordinatedependent symplectic matrix [si] j on a non-flat manifold. the minimal state space coordinate, i.e. the generalized coordinate q defined as the stock of individuals in the infected compartment, spans the one-dimensional state space Ë, thus Ë := {q := i} . the supremum condition identifies i • with the derivative ∂ s h(z) of the hamiltonian in minimal phase space coordinates from eq. with respect to the generalized momentum, and renders re-solving the above supremum condition for s in terms of i • delivers based on the lagrangian in minimal state space coordinates, hamilton's principle results in the stationarity condition clearly, the euler-lagrange equation in minimal state space coordinates coincides with the single, non-linear ode formulation of the time re-parameterized sir model in eq. . alternatively, extended state space coordinates collectively assembled in the column matrix q ∈ r , i.e. the generalized coordinates jointly defined as the stock of individuals in the infected and susceptible compartments, span the two-dimensional state space Ë, thus then the lagrangian l(q, q • ) in extended state space coordinates, which eventually results in the time re-parameterized sir model from eq. 
, is determined as here, the legendre transformation term q · j t · q • = q • · j · q expands as the skew-symmetric form whereas the hamiltonian h(q), corresponding to eq. , is now parameterized in terms of the extended state space coordinates q with as a result, the lagrangian in extended state space coordinates then renders the stationarity conditions of the corresponding hamilton's principle as , the euler-lagrange equations thus follow as unfolding the compact symbolic notation, these expand concretely into the euler-lagrange equations in eq. are next re-formulated by recalling that due to the skewsymmetry of the symplectic matrix, finally, with j = −i, the re-formulated euler-lagrange equations in eq. are modified into obviously, this format recovers the relation between the gradient g(q) ∈ r of the hamiltonian h(q) (in extended state space coordinates) and the forcing term f (q) ∈ r already established previously in eq. . likewise, exchanging the time derivative to the one with respect to ordinary time t results again in a formulation with non-constant, coordinate dependent symplectic matrix [si] j on a non-flat manifold. for academic curiosity it is also interesting to consider extended phase space coordinates q, i.e. the generalized coordinates jointly defined as the stock of individuals in the infected and susceptible compartments, and p , i.e. heretofore undefined generalized momenta, collectively assembled in the column matrix z ∈ r , to span the four-dimensional state space È, thus a legendre transformation of the lagrangian in extended state space coordinates from eq. defines the associated hamiltonian the corresponding supremum condition identifies p with the derivative ∂ q • l(q, q • ) of the lagrangian l(q, q • ) in extended state space coordinates from eq. with respect to the time reparameterized rate of the generalized coordinates q • , and renders as a result, p does not depend on q • , thus identifying the lagrangian l(q, q • ) in eq. 
as degenerate in the sense of the dirac's generalized hamiltonian dynamics [ ] . consequently, with q · j t = j · q and j = −i, an additional constraint for the extended phase space coordinates emerges the hamiltonian h(q, p ) in extended phase space coordinates thus follows from legendre transformation by incorporating the constraint via the lagrange multiplier Λ, i.e. taking into account the explicit form of the lagrangian l(q, q • ) in extended state space coordinates from eq. then renders the explicit representation of the hamiltonian h(q, p ) in extended phase space coordinates invoking ∂ q c(q, p ) = i and ∂ p c(q, p ) = j, hamilton's equations based on the hamiltonian h(q, p ) in extended phase space coordinates from eq. result in the unknown lagrange multiplier Λ is determined from the consistency condition for the constraint which, upon introducing hamilton's equations the solution of the consistency condition then renders the explicit representation for the lagrange multiplier using again hamilton's equation q • = Λ · j and exploiting the skew-symmetry of the symplectic matrix, i.e. Λ · j = −j · Λ, recovers once more the already previously established relation between the gradient g(q) ∈ r of the hamiltonian h(q) (in extended state space coordinates) and the forcing term finally, using hamilton's equation p • = −[g(q)+Λ] identifies eventually the time re-parameterized rate of the heretofore unknown generalized momenta as thereby, the last equality follows from j = −i and the previous relation j · g(q) = f (q). alternatively, and again aiming to recast the basic sir model into an analytical mechanics format, it is proposed, as a second option, to logarithmically transform the coordinates (or rather the independent variables) as i → i := ln i and s → s := ln s. 
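the logarithmic transformation just introduced can be probed numerically before developing the formalism. the classical sir first integral β(i + s) − γ ln s, which in the coordinates q := ln i, p := ln s reads β(e^q + e^p) − γp, is conserved in ordinary time along sir trajectories. a short verification sketch (rk4 integration; the helper names and parameter values are ours, and the invariant is the well-known sir conserved quantity, offered as a candidate consistent with the construction described):

```python
from math import log

def sir_rhs(s, i, beta, gamma):
    # dS/dt and dI/dt of the normalized SIR model
    return -beta * s * i, beta * s * i - gamma * i

def max_drift_of_invariant(beta, gamma, s0=0.999, i0=0.001, dt=0.01, t_max=100.0):
    """RK4-integrate the SIR equations and report the maximum drift of
    H = beta*(I + S) - gamma*log(S) from its initial value."""
    s, i = s0, i0
    H = lambda s, i: beta * (i + s) - gamma * log(s)
    h0, drift = H(s, i), 0.0
    for _ in range(int(t_max / dt)):
        k1 = sir_rhs(s, i, beta, gamma)
        k2 = sir_rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1], beta, gamma)
        k3 = sir_rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1], beta, gamma)
        k4 = sir_rhs(s + dt * k3[0], i + dt * k3[1], beta, gamma)
        s += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        i += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        drift = max(drift, abs(H(s, i) - h0))
    return drift
```

the drift is governed purely by the truncation error of the integrator, shrinking with the step size, as expected for an exactly conserved quantity.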
as a result, the basic sir model, re-parameterized in terms of logarithmically transformed coordinates, however still in terms of derivatives with respect to ordinary time t, now reads exclusively in the logarithmic stock of individuals in the infected compartment. the minimal (logarithmic) phase space coordinates collectively assembled in the column matrix z ∈ r , i.e. the generalized coordinate q defined as the logarithmic stock of individuals in the infected compartment and the generalized momentum p defined as the logarithmic stock of individuals in the susceptible compartment, span the two-dimensional phase space Ô, thus Ô := z := q p := i s . next, the hamiltonian h(z) in minimal (logarithmic) phase space coordinates, which eventually results in the coordinate re-parameterized sir model from eq. , is identified as as a result, the corresponding hamilton equations deliver a re-formulation of eq. , i.e. symbolic notation showcases clearly the hamiltonian structure of the (logarithmic) coordinate reparameterized sir model thereby, the skew-symmetric j ∈ r × r denotes the appropriate symplectic matrix in Ô j := − ( ) with j t = −j and j = −i, whereby i ∈ r × r is the common unit matrix in Ô. 
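the equivalence claimed above between the original sir equations and their logarithmic coordinate re-parameterization q := ln i, p := ln s can be checked numerically. the following python sketch is illustrative only: the parameter values (beta = 0.5, gamma = 0.2), the initial conditions and the classical runge-kutta step are hypothetical choices, not part of the original derivation.

```python
import math

beta, gamma = 0.5, 0.2  # hypothetical infection and recovery rates

def rk4(f, y, h):
    # one classical 4th-order runge-kutta step for y' = f(y)
    k1 = f(y)
    k2 = f([y[j] + 0.5 * h * k1[j] for j in range(len(y))])
    k3 = f([y[j] + 0.5 * h * k2[j] for j in range(len(y))])
    k4 = f([y[j] + h * k3[j] for j in range(len(y))])
    return [y[j] + h / 6.0 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j])
            for j in range(len(y))]

def sir(y):
    # original coordinates (S, I): S' = -beta*S*I, I' = beta*S*I - gamma*I
    s, i = y
    return [-beta * s * i, beta * s * i - gamma * i]

def sir_log(z):
    # logarithmic coordinates (q, p) = (ln I, ln S):
    # q' = beta*e^p - gamma, p' = -beta*e^q
    q, p = z
    return [beta * math.exp(p) - gamma, -beta * math.exp(q)]

y = [0.99, 0.01]                      # (S0, I0)
z = [math.log(0.01), math.log(0.99)]  # (q0, p0) = (ln I0, ln S0)
h = 0.01
for _ in range(2000):                 # integrate to t = 20
    y = rk4(sir, y, h)
    z = rk4(sir_log, z, h)

# mapping back via S = e^p, I = e^q, both trajectories should agree
err = max(abs(y[0] - math.exp(z[1])), abs(y[1] - math.exp(z[0])))
```

within numerical tolerance err is negligible, confirming that the logarithmic re-parameterization reproduces the same trajectory in ordinary time t.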
the hamiltonian structure in terms of the skew-symmetric j clearly identifies the (autonomous) hamiltonian h(z) in minimal (logarithmic) phase space coordinates as first integral the gradient of the hamiltonian h(z) with respect to the minimal (logarithmic) phase space coordinates, abbreviated in the sequel as g(z) ∈ r , computes as summarizing, the coordinate re-parameterized sir model obeys hamiltonian structure and identifies the relation between the gradient g(z) ∈ r of the hamiltonian h(z) in minimal (logarithmic) phase space coordinates and the forcing term f (z) ∈ r as it is emphasized that despite the logarithmic nature of the re-parameterized coordinates, here, as a benefit, the hamiltonian structure involves the constant, coordinate-independent symplectic matrix j of a flat manifold as well as derivatives with respect to ordinary time t. the minimal (logarithmic) state space coordinate, i.e. the generalized coordinate q defined as the logarithmic stock of individuals in the infected compartment, span the one-dimensional state space ×, thus × := {q := i} . legendre transformation of the hamiltonian in minimal (logarithmic) phase space coordinates defines the corresponding lagrangian then the supremum condition identifies i • with the derivative ∂ s h(z) of the hamiltonian in minimal (logarithmic) phase space coordinates from eq. with respect to the generalized momentum, and renders re-solving the above supremum condition for s in terms of i • delivers based on the lagrangian in minimal (logarithmic) state space coordinates, hamilton's principle results in the stationarity condition alternatively, extended (logarithmic) state space coordinates collectively assembled in the column matrix q ∈ r , i.e. the generalized coordinates jointly defined as the logarithmic stock of individuals in the infected and susceptible compartments, span the two-dimensional state space ×, thus × := q := i s . 
the lagrangian l(q, q • ) in extended (logarithmic) state space coordinates, which eventually results in the coordinate re-parameterized sir model from eq. , reads here, the legendre transformation term q · j t · q • = q • · j · q expands as the skew-symmetric form the hamiltonian h(q), corresponding to eq. , is now parameterized in terms of the extended (logarithmic) state space coordinates q as h(q) := β[i(i) + s(s)] − γs. in extended (logarithmic) state space coordinates the lagrangian then renders the stationarity conditions of the corresponding hamilton's principle with ∂ q • l = q · j t / and ∂ q l = q • · j/ − g(q), the euler-lagrange equations thus follow as and concretely unfold into subsequently, the euler-lagrange equations in eq. are re-formulated by noting that q • ·j = −q • ·j t and q • · j t = j · q • , thus j · q • = −g(q). finally, with j = −i, the euler-lagrange equations from eq. re-formulate as this format recovers the relation between the gradient g(q) ∈ r of the hamiltonian h(q) (in extended (logarithmic) state space coordinates) and the forcing term f (q) ∈ r already established previously in eq. . for completeness we also consider extended (logarithmic) phase space coordinates q, i.e. the generalized coordinates jointly defined as the logarithmic stock of individuals in the infected and susceptible compartments, and p, i.e. heretofore undefined generalized momenta, collectively assembled in the column matrix z ∈ r , to span the four-dimensional state space Ô, thus Ô := z := q p with q := i s and p := υ σ . ( ) a legendre transformation of the lagrangian in extended (logarithmic) state space coordinates from eq. defines the associated hamiltonian the legendre transformation identifies p with the derivative ∂ q • l(q, q • ) of the lagrangian l(q, q • ) in extended (logarithmic) state space coordinates from eq. with respect to the rate of the generalized coordinates q • , and renders ∂l ∂q • = q · j t =: p. 
thus p does not depend on q • , consequently identifying the lagrangian l(q, q • ) in eq. as degenerate [ ]. with q · j t = j · q and j² = −i, an additional constraint for the extended (logarithmic) phase space coordinates results. the hamiltonian h(q, p) in extended (logarithmic) phase space coordinates thus follows from legendre transformation by incorporating the constraint via the lagrange multiplier λ, i.e. taking into account the explicit form of the lagrangian l(q, q • ) in extended (logarithmic) state space coordinates from eq. results in the explicit representation of the hamiltonian h(q, p) in extended (logarithmic) phase space coordinates h(q, p) = h(q) + λ · c(q, p). invoking ∂ q c(q, p) = i and ∂ p c(q, p) = j, hamilton's equations based on the hamiltonian h(q, p) in extended (logarithmic) phase space coordinates from eq. result in. the unknown lagrange multiplier λ is determined from the consistency condition for the constraint which, with q • = λ · j and p • = −[g(q) + λ], results in the solution of the consistency condition. this then renders the explicit representation for the lagrange multiplier. using again hamilton's equation q • = λ · j and exploiting λ · j = −j · λ, recovers once more the already previously established relation between the gradient g(q) ∈ r of the hamiltonian h(q) (in extended (logarithmic) state space coordinates) and the forcing term f (q). moreover, using hamilton's equation p • = −[g(q) + λ] finally identifies the rate of the heretofore unknown generalized momenta. the last equality follows from j² = −i and the previous relation j · g(q) = f (q).
in order to systematically explore lessons that can be learned from an analytical mechanics viewpoint on mathematical epidemic dynamics modelling, the coordinate re-parameterized version of the sir model, based on the hamiltonian in phase space, is taken as the point of departure: building on this compact representation, several analytical-mechanics-inspired novel vistas on mathematical epidemic dynamics that promise fruitful research avenues for its modelling are identified in the sequel. vista : allowing for non-autonomous, i.e. time-dependent generalized hamiltonians results in the generalized representation z • = j · g(z, t). possible options justifying a non-autonomous hamiltonian are for example: option a) various lockdown measures (cancellation of large events, school closing, contact limitations, etc.) as well as their reversal (exit strategies) at discrete points in time are modelled by time-dependent parameters such as, for example, the infection and the recovery rates β = β(t) and γ = γ(t), thus making the hamiltonian time-dependent. option b) various modifications extend the classical sir model to account for further compartments such as, e.g., deceased (sird model), exposed (seir model), quarantined (siqrd model), among many other, more sophisticated options, see [ , ]. sir+ models of these types are then captured by appropriately extending the phase space variables and their contribution to the hamiltonian and its gradient. option c) classical sir-type compartment based models are coupled ordinary differential equations (odes). extending the ode-based sir-type modelling approach to integro-differential equations also allows considering, e.g., delay due to incubation time and infectious period, see [ ]. for p representing relevant parameters, e.g. continuously distributed risk groups and/or past time, from space p , the right-hand-side of these reads as g(z, t) = ∫_p j · γ(z, p, t) dp with γ(z, p, t) the appropriate p-density of g(z, t).
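option a) above can be made concrete with a small numerical sketch. the intervention dates and rate values below are hypothetical, chosen only to illustrate how a time-dependent β(t) (lockdown and partial reopening) changes the epidemic peak; a simple forward-euler discretisation of the sir equations stands in for any proper integrator.

```python
def beta_t(t):
    # hypothetical intervention schedule: lockdown at t = 5, partial reopening at t = 60
    if t < 5.0:
        return 0.5     # pre-lockdown contact rate
    elif t < 60.0:
        return 0.1     # lockdown
    return 0.3         # partial reopening

def peak_infected(beta_fn, gamma=0.2, h=0.01, t_end=100.0):
    # forward-euler integration of the SIR equations with time-dependent beta(t)
    s, i = 0.99, 0.01
    t, peak = 0.0, i
    while t < t_end:
        ds = -beta_fn(t) * s * i
        di = beta_fn(t) * s * i - gamma * i
        s, i = s + h * ds, i + h * di
        peak = max(peak, i)
        t += h
    return peak

peak_lockdown = peak_infected(beta_t)
peak_baseline = peak_infected(lambda t: 0.5)   # constant-rate reference
```

the time-dependent schedule caps the peak far below the constant-rate reference, the "flattening of the curve" that motivates the non-autonomous hamiltonian of option a).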
allowing for an infinite dimensional phase space with its coordinates z = z(x, t) ∈ r +··· defined as fields in four-dimensional space-time, results in the generalized representation. here, the right-hand-side is a functional of the phase space coordinates rather than a function. possible options for an infinite dimensional phase space are for example: option a) gradient-type models, whereby the hamiltonian depends on the phase space coordinates z = z(x, t) and their higher spatial gradients. consequently, the right-hand-side follows from the variational derivative of the hamiltonian, rather than from its gradient. partial differential equations of reaction-convection-diffusion-type describing the spatio-temporal spread of infectious diseases are thus a modelling option [ ]. option b) integral-type models, whereby, similar to peridynamics formulations [ ], the right-hand-side follows from a spatial integration g(z(x), t) = ∫_x γ(z(x), x, t) dx over a cut-off domain x (horizon) that covers spatial interaction. vista : allowing for a finite dimensional phase space with its coordinates z ∈ r [ +··· ]n defined as column matrices, results in the generalized representation. possible options for a finite dimensional phase space are for example: option a) partition of the entire population into sub-populations, thereby separately considering different age/gender/risk groups, see e.g. [ ] z = [z pop , · · · , z pop max ]. option b) partition into various geographical locations in general network models accounting for the spatio-temporal spread of infectious diseases, see e.g. [ ] z = [z loc , · · · , z locmax ]. option c) partition into multiple virus strains providing for generic infectious diseases, see e.g. [ ] z = [z vir , · · · , z virmax ]. capturing the interactions among the various partitions in a network is then reflected by the off-diagonal terms in the hessian h := ∂ zz h of the hamiltonian.
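option a) of the last vista (partition into sub-populations) admits an equally small sketch: a two-group sir system with a hypothetical contact matrix C whose off-diagonal entries couple the groups, mirroring the off-diagonal hessian terms mentioned above. all values below are illustrative assumptions.

```python
# hypothetical 2x2 contact matrix: diagonal = within-group transmission,
# off-diagonal = coupling between the groups
C = [[0.5, 0.1],
     [0.1, 0.3]]
gamma, h = 0.2, 0.01

s = [0.99, 1.0]   # group 1 seeded with cases, group 2 initially untouched
i = [0.01, 0.0]

for _ in range(10000):  # integrate to t = 100 by forward euler
    # force of infection on each group, summing contributions over all groups
    foi = [sum(C[g][k] * i[k] for k in range(2)) for g in range(2)]
    ds = [-s[g] * foi[g] for g in range(2)]
    di = [s[g] * foi[g] - gamma * i[g] for g in range(2)]
    for g in range(2):
        s[g] += h * ds[g]
        i[g] += h * di[g]
```

although group 2 starts without any cases, the off-diagonal coupling seeds an outbreak there; setting the off-diagonal entries to zero decouples the groups entirely.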
vista : for a pandemic such as covid- , spatial (geographical) resolution, i.e. resolution of a network, is required at multiple scales: at the global (macro) scale, i.e. for the entire globe; at the medium (meso) scale, i.e. for individual countries; and at the local (micro) scale, i.e. for individual cities/communities. a fully detailed spatial resolution at the local (micro) scale for the entire globe is computationally prohibitive; moreover, most often such an overkill degree of detail is not needed and/or not possible due to the lack of data. however, the spatial resolution shall be adaptive to the quantity of interest, e.g. to study the dynamics of infectious disease spread in a particular city/community, only the integral results of more remote locations on the globe matter. these can be captured by a reduced resolution of the network in those geographically remote locations. this asks for a true multi-scale approach that adaptively zooms in only where needed. possible options for multi-scaling are for example: option a) vertical coupling of scales relies on the assumption that the two scales considered are sufficiently separated, see e.g. [ ]. then the 'force' term on the right-hand-side can be up-scaled from a sub-scale model by averaging in the sense of computational homogenisation z̄ • = ⟨ j · g(z, t; z̄) ⟩. here, the sup-scale (indicated by an over-bar) at the left-hand-side behaves like a sir-type model whereas the sub-scale model at the right-hand-side lives either on a finite dimensional phase space or is represented by a rule-driven, so-called agent-based model. agent-based models are an alternative modelling paradigm considering only a comparatively small number of individuals (agents). they are capable of capturing the stochastic nature and strong impact of socio-economic factors present at small scales, see, e.g., [ , ].
the sub-scale model is driven by the sup-scale phase space coordinates, whereby a proper scale-transition condition defines suited boundary/initial conditions at the sub-scale. option b) horizontal coupling of scales, analogously to the quasi-continuum method [ ], requires adaptive resolution of the network spacing, here indicated by the sup-script h. adaptivity requires suited network densification indicators that may follow from a proper error analysis, a topic that is still largely under-investigated for epidemic dynamics models. vista : the availability and reliability of recorded data, e.g. regarding the cumulative or daily infection cases, during an epidemic is typically characterized by a large degree of uncertainty, e.g. regarding the infection rates, the degree of immunity and/or their dark figures. uncertainty quantification is based on simulations with uncertain data. thereby, uncertain data is here parameterized in terms of elementary events ω from which one may repeatedly draw samples to investigate uncertainty propagation throughout our model, see e.g. [ , ]. possible options for the description of uncertainties are for example: option a) aleatoric uncertainties require the use of random variables with probability density function (pdf) as a measure of likelihood (e.g. gaussian pdf in terms of the mean value and standard deviation). aleatoric uncertainties are stochastic by nature and may not be neglected when the standard deviation is large. option b) epistemic uncertainties may be captured by fuzzy variables with possibility density function as a measure of degree-of-membership (e.g. symmetric triangular membership function in terms of its modal value and support). epistemic uncertainties reflect a lack of knowledge and, in the case of epidemic dynamics modelling, can be reduced by increased testing for either infections and/or anti-bodies.
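the aleatoric option a) above can be illustrated by naive monte carlo propagation: draw β from a gaussian pdf (the mean and standard deviation below are hypothetical), push each sample through a crude euler-discretised sir model, and summarise the induced spread of the attack rate. this is a sketch of uncertainty propagation, not any particular published scheme.

```python
import random

random.seed(0)

def attack_rate(beta, gamma=0.2, h=0.05, steps=4000):
    # euler-discretised SIR; returns 1 - S(T), the final fraction ever infected
    s, i = 0.999, 0.001
    for _ in range(steps):
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        s, i = s + h * ds, i + h * di
    return 1.0 - s

# aleatoric uncertainty in the infection rate: beta ~ N(0.4, 0.05^2), truncated at 0
samples = sorted(attack_rate(max(1e-6, random.gauss(0.4, 0.05)))
                 for _ in range(500))
lo, hi = samples[12], samples[487]   # crude 95% uncertainty interval
```

even this toy propagation shows how a modest standard deviation in β translates into a wide interval for the final attack rate.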
the discrete trajectory in time of the phase space variables is algorithmically traced by an integrator of the generic format z n+1 = z n + ∆t j · g(z n+α (ω), t n+α ). here, sub-scripts n+1, n, and n+α refer to, respectively, the end point, the start point, and an intermediate point of/within a time step of length ∆t. possible options for time integrators that display different accuracy, stability, and robustness, in particular when integrating non-linear right-hand-sides, are for example: option a) runge-kutta integrators are off-the-shelf algorithms that come in a variety of different flavours (following from the corresponding butcher tableau) like, e.g., single- and multi-stage integrators of varying algorithmic accuracy. however, they do not necessarily respect first integrals such as the conservation of the hamiltonian for autonomous cases and may thus suffer from long-term deterioration of algorithmic stability and robustness. option b) variational integrators are based on a discrete form of the action integral, whereby the integrand of the action integral is given by a discrete lagrangian. the resulting discrete hamiltonian principle then renders a variational integrator that follows from the discrete action integral being stationary. variational integrators preserve symmetries (momentum maps) and structure (symplecticity) and are thus characterized by long-term algorithmic accuracy, stability, and robustness, see e.g. [ ]. option c) time-finite-element integrators follow from discretizing the galerkin (weak) form of the hamilton equations. choosing appropriate ansatz spaces for the test and trial functions, and suited quadrature rules for approximating the time integrals, renders integrators of arbitrary algorithmic accuracy that are also characterized by long-term algorithmic stability and robustness, see, e.g. [ ]. the underlying equations governing epidemic dynamics are oftentimes unknown.
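the caveat under option a), that off-the-shelf runge-kutta schemes need not respect first integrals, can be demonstrated numerically. since h = β(s + i) − γ ln s satisfies dh/dt = 0 along exact sir solutions, the drift of h along a discrete trajectory measures how badly an integrator violates the conservation law. the sketch below uses hypothetical parameter values and contrasts forward euler with a classical runge-kutta step; no structure-preserving (variational) scheme is attempted here.

```python
import math

beta, gamma = 0.5, 0.2
h, steps = 0.1, 500          # integrate to t = 50 with a deliberately coarse step

def rhs(s, i):
    return -beta * s * i, beta * s * i - gamma * i

def H(s, i):
    # first integral of the SIR flow: dH/dt = 0 along exact solutions
    return beta * (s + i) - gamma * math.log(s)

def drift(step):
    s, i = 0.99, 0.01
    h0 = H(s, i)
    for _ in range(steps):
        s, i = step(s, i)
    return abs(H(s, i) - h0)

def euler_step(s, i):
    ds, di = rhs(s, i)
    return s + h * ds, i + h * di

def rk4_step(s, i):
    k1 = rhs(s, i)
    k2 = rhs(s + 0.5 * h * k1[0], i + 0.5 * h * k1[1])
    k3 = rhs(s + 0.5 * h * k2[0], i + 0.5 * h * k2[1])
    k4 = rhs(s + h * k3[0], i + h * k3[1])
    return (s + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            i + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

drift_euler = drift(euler_step)
drift_rk4 = drift(rk4_step)
```

neither scheme conserves h exactly, but at the same step size the higher-order scheme drifts far less; a variational integrator would control this drift structurally rather than by accuracy alone.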
however, they may be discovered from a data-driven approach [ ] if sufficiently many data are available, a scenario that is typically met for the spatio-temporal spread of infectious diseases. the key idea is then to connect the matrix arrangement of available discrete data points for the rate of the phase space coordinates by a matrix of ansatz functions, e.g. monomials of the matrix arrangement of available discrete data points for the phase space coordinates, with the matrix arrangement of discrete ansatz parameters: [z •_dat , · · · , z •_datmax ]^t = a(z_dat , · · · , z_datmax ) [a_par , · · · , a_parmax ]^t . only a few relevant entries in the matrix arrangement of the discrete ansatz parameters are then determined from sparse regression; consequently the resulting models are denoted as parsimonious and compromise between accuracy and complexity, while avoiding overfitting. vista : many more modelling approaches inspired by analytical mechanics are conceivable and it is left to the mechanics community to harness those to further improve mathematical epidemic dynamics modelling. first, this contribution explored options of how to recast the classical sir model of mathematical epidemic dynamics modelling in the variational setting of analytical mechanics. in particular, it demonstrated that two conceptually entirely different re-parameterizations of the basic sir model, i.e. either by re-scaling time or by transforming coordinates (independent variables), greatly ease identification of corresponding hamiltonians and lagrangians for use within hamilton's equations and hamilton's principle. in each case, formulations in either minimal or extended phase and state space coordinates are possible, providing in total eight different modelling options. interestingly, in minimal phase space coordinates, the stock of individuals in the infected and the susceptible compartments represent the generalized coordinate and the generalized momentum, respectively.
in contrast, for extended phase space coordinates, they jointly represent the generalized coordinates, whereas the associated generalized momenta are initially unknown and only follow from exploiting a constraint on the extended phase space coordinates. however, regardless of the particular formulation chosen, from either hamilton's equations or hamilton's principle one eventually recovers the original set of coupled odes of the sir model. as a recommendation, logarithmically transforming the coordinates appears more attractive, since derivatives with respect to ordinary time are retained for the evolution of the phase space coordinates. as an important perspective, recasting the classical sir model in one of the eight different modelling options enables the analytical mechanician to employ the full mechanical modelling tool-set for a plethora of important extensions. the striking analogy between analytical mechanics and mathematical epidemic dynamics modelling opens up a multitude of fascinating and relevant new research avenues for the progression of the latter. it is thus believed that future exploitation of the hamiltonian and/or lagrangian structure of mathematical epidemic dynamics modelling leads to unprecedented insights and options for novel formulations.

references:
- multiscale mobility networks and the spatial spreading of infectious diseases
- modeling the spatial spread of infectious disease: the global epidemic and mobility computational model
- conservation properties of a time fe method. part i: time-stepping schemes for n-body problems
- discovering governing equations from data by sparse identification of nonlinear dynamical systems
- mathematical tools for understanding infectious disease dynamics
- generalized hamiltonian dynamics. canad
- modelling exit strategies from covid- lockdown with a focus on antibody tests
- the mathematics of infectious diseases
- geometrical methods and numerical computations for prey-predator systems
- continuum-kinematics-inspired peridynamics. mechanical problems
- modeling infectious diseases using integro-differential equations: optimal control strategies for policy decisions and applications
- on meso-scale modeling of covid- spatio-temporal outbreak dynamics in germany
- analysis of experimental epidemics of the virus disease mouse ectromelia
- modeling influenza-like illnesses through composite compartmental models
- variational time integrators
- outbreak dynamics of covid- in europe and the effect of travel restrictions
- outbreak of pneumonia of unknown etiology in wuhan, china: the mystery and the miracle
- symmetries and conservation laws for biodynamical systems
- the quasicontinuum method: overview, applications and current directions
- epidemic processes in complex networks
- outbreak dynamics of covid- in china and the united states
- fuzzy-stochastic fem-based homogenization framework for materials with polymorphic uncertainties in the microstructure
- on spectral fuzzy-stochastic fem for problems involving polymorphic geometrical uncertainties
- comparing agent-based and differential equation models
- aspects of computational homogenization at finite deformations: a unifying review from reuss' to voigt's bound
- on the use of multiple compartment epidemiological models to describe the dynamics of influenza in europe
- global stability and uniform persistence of the reaction-convection-diffusion cholera epidemic model

key: cord- -qjfkvu n
authors: tang, lu; zhou, yiwang; wang, lili; purkayastha, soumik; zhang, leyao; he, jie; wang, fei; song, peter x.‐k.
title: a review of multi‐compartment infectious disease models
date: - -
journal: int stat rev
doi: . /insr.
sha:
doc_id: cord_uid: qjfkvu n

multi‐compartment models have been playing a central role in modelling infectious disease dynamics since the early th century. they are a class of mathematical models widely used for describing the mechanism of an evolving epidemic.
integrated with certain sampling schemes, such mechanistic models can be applied to analyse public health surveillance data, such as assessing the effectiveness of preventive measures (e.g. social distancing and quarantine) and forecasting disease spread patterns. this review begins with a nationwide macromechanistic model and related statistical analyses, including model specification, estimation, inference and prediction. then, it presents a community‐level micromodel that enables high‐resolution analyses of regional surveillance data to provide current and future risk information useful for local government and residents to make decisions on reopenings of local business and personal travels. r software and scripts are provided whenever appropriate to illustrate the numerical detail of algorithms and calculations. the coronavirus disease pandemic surveillance data from the state of michigan are used for the illustration throughout this paper. coronavirus disease , an infectious disease caused by severe acute respiratory syndrome coronavirus (sars-cov- ) (world health organization, ), has become a global pandemic that has spread swiftly across the world since its original outbreak in hubei, china, in december . as of june , this pandemic has caused a total of confirmed cases and fatalities in more than countries. being one of the most lethal communicable infectious diseases in human history, it is expected that the covid- pandemic will continue spreading in the world population, causing even higher numbers of infections and deaths in the future. with no effective medical treatments or vaccines currently available, public health interventions such as social distancing have been implemented in most of the countries to mitigate the spread of the pandemic. 
one of the central tasks of statistical modelling is to provide a suitable risk prediction model that enables both government and public health workers to evaluate the effectiveness of public health policies and predict risk of covid- infection at the national and regional levels. such information is valuable for governments to assess the preparedness of medical resources (personal protective equipments and intensive care unit beds), to adjust various intervention policies and to enforce the operation of social distancing. modelling for infectious diseases has a profound role in informing public health policy across the world (heesterbeek et al., ; siettos & russo, ) . the outbreak of the covid- pandemic in december has led to a surge of interest in disease projection that ubiquitously relies on mathematical and statistical models. a crucial step in modelling disease evolution is to capture key dynamics of the underlying disease transmission mechanisms from available public health surveillance data, which enables reliable projection of disease infection into the future. a prediction model may help us foresee some possible future epidemic/pandemic scenarios and learn consequent impacts of current economic and personal sacrifices due to various control measures. because of both data quality and data limitations from public surveillance data systems, a statistical model should take the following features into account in its design and development. first, a statistical model should be able to make predictions and, more importantly, to quantify prediction uncertainties. forecasting is known to be a notoriously hard task, which depends heavily on the quality of data at hand and a certain model chosen to summarise the information from observed data and then to reproduce information beyond the observational time period. the chosen model is of critical importance to deliver prediction. 
this paper concerns a review of the family of classical compartment-based infectious disease models, which have been the most widely used mechanistic models to capture key features of infection dynamics. we begin with the most basic susceptible-infectious-removed (sir) model to build up the framework (section ), and this three-compartment model is then generalised to have more compartments to embrace additional features of infection dynamics (section ), such as the well-known four-compartment model, susceptible-exposed-infectious-removed (seir) model, which takes the incubation period of contagion into account. given many types of factors potentially influencing the evolution of an epidemic, a single prediction value is insufficient to be trustworthy unless prediction uncertainty is reported as part of forecast analysis. quantification of prediction uncertainty is of critical importance, especially when a forecast is made at an early phase of an epidemic with limited data. building sampling variations in infectious disease models makes a statistical modelling approach different from a mathematical modelling approach. a clear advantage of a statistical model is that the model parameters, including those in the mechanistic model, can be estimated, rather than being specified by certain subjectively chosen prior information. second, the consideration of building sampling uncertainties in the modelling of infectious disease is a fundamental difference of a statistical modelling approach from a mechanistic modelling approach known in the mathematical literature of dynamic systems. a mechanistic model is typically governed by a system of ordinary differential equations, such as the existing three-compartment sir model consisting of three differential equations, which explicitly specifies the underlying mechanisms of an epidemic. this model is assumed to govern an operational system of disease contagion and recovery or death, which, in reality, cannot be directly observed.
most of the time, public surveillance data are accessible, which represent only a few snapshots of the underlying latent mechanistic system of an epidemic. such gaps may be addressed by a statistical model that incorporates sampling schemes to explain how observed data are collected from the underlying infection dynamics. in turn, prediction uncertainty will reflect forms and procedures of the chosen sampling schemes specified in the statistical model. in this paper (section . ), we will introduce the state-space model as a natural and effective modelling framework to integrate the mechanistic model and sampling schemes seamlessly. third, given the scarcity of the available data in public health surveillance systems, the complexity of a model used for prediction should be aligned with the issue of parameter identifiability. for example, at the beginning of an outbreak, one should consider a simple model, which may be expanded over the course of an epidemic's evolution with increased data availability. to make the specified model useful to answer a certain question of practical importance, a relevant feature should be included in the model building. for example, in the study of control measures to mitigate the covid- spread, the model specification should incorporate a structure that is sensitive to the influence of a preventive policy. in section . , we will introduce an expansion of the basic sir model in that time-varying control measures are allowed to enter. the flexibility of permitting certain modifications is an important property of a model to be considered in an infectious disease model. in this field, all models need to be tailored with increased data and more knowledge from the literature as a disease evolves over time. 
from this point of view, compartment-based models are superior to other models because, for example, it is easy to add other compartments, such as an exposure compartment, a quarantine compartment or a self-immunisation compartment, to improve the mechanistic model, to answer specific question of practical importance and to capture distinctive data features for better prediction. fourth, as the epidemic evolves further, surveillance data become abundant and have higher resolution. for example, in the usa, the numbers of confirmed symptomatic covid- cases and case fatalities are recorded for each county. the average county population size in the usa is approximately , so a microinfectious model may be built upon county-level surveillance data to make high-resolution prediction and to assess the effectiveness of control measures at a community level. this paper (section ) will discuss this important extension of the classical sir model, essentially a temporal model, to a spatio-temporal model that enables borrowing of information from different spatially correlated counties in the improvement of risk prediction. this exemplary model generalisation sets up an illustration from a nation-level macromodel to a county-level micromodel. the latter is more relevant and useful for local governments to make decisions of business reopenings and for residents to be aware of local infection risk. last, to make research findings transparent and to place resulting toolboxes into the hands of practitioners, an open-source software package must be a deliverable. this is indeed a rather demanding task, as the ease of implementation and numerical stability impact the choice of statistical models and statistical methods for estimation and prediction. note that not every statistical model permits delivery of a user-friendly computing package that is general and flexible enough to handle various types of data. 
in this paper, we focus on the discussion of markov chain monte carlo (mcmc) methods that have been developed in the literature to perform estimation and prediction for state-space models (section . ). in this paper, we invite the readers on a journey of surveillance data, modelling, estimation and prediction, implementation and software development. after reading this paper, one should be able to use existing compartment-based models or to expand them in a study of an infectious disease epidemic, to improve estimation and/or prediction methods, or create one's own software. it is our hope that this paper may pave the path to learning, practising or developing new methodologies that are useful for a broader range of infectious disease modelling problems. multi-compartment models have been the workhorse for modelling infectious diseases since the early th century. they are a class of mathematical models used for describing the evolution of masses (in units of proportions or counts) among the compartments of a varying system, with broad use cases in epidemiology, physics, engineering and information science. this is a dynamic system that is typically represented by a system of ordinary differential equations (odes) with respect to time, and, given a starting condition, the mass in each of the components is regulated by a function over time. an ode is a simple mathematical model to depict a trajectory of a functional trend. one such example used extensively in epidemiology is an exponential growth function, f(t) = e^(λt), which may be viewed as a solution to an ode of the form df(t)/dt = λf(t), or dy/dt = λy, where y is a function of time t; obviously y = f(t) = e^(λt), with an initial condition f(0) = 1. it is worth pointing out that this simple ode explicitly characterises the rate of change (speed or velocity) for the function y = f(t), rather than directly specifying a form for the function f(t) itself.
Such rate-based characterisation is termed 'dynamics' in the mathematical literature. Clearly, this ODE is not a statistical model, as it does not provide a law of data generation; in other words, there is no randomness in this ODE to reflect sampling uncertainty. A typical multi-compartment model consists of several ODEs for a vector of rates that are linked to each other. This is referred to as a dynamic system. The forms of the ODEs are specified according to relevant scientific knowledge about the underlying dynamic mechanism of an infectious disease. In the context of infectious disease modelling, the SIR model is the most basic three-compartment dynamic system describing the epidemiological mechanism of disease evolution over time (see Figure ). In brief, the model describes the flow of infection states or conditions by (i) moving susceptible individuals to the infectious compartment through a transmission process (the first arrow) and (ii) moving infectious individuals to the removed compartment (either dead or recovered) through a removal process (the second arrow). At a given time, the total population N under study is partitioned into the three compartments, denoted by S, I and R, with sizes satisfying S + I + R = N. With a slight abuse of notation, each symbol denotes either the type of compartment or the size of the compartment, whichever is applicable in a given context. In other words, S, I and R are used to denote the sizes of the mutually exclusive subpopulations of susceptible, infectious and removed individuals, respectively.
This compositional constraint, that is, S + I + R = N, may be interpreted in terms of probability (or proportion) as follows: at a given time, an individual in the population is either at risk (susceptible), under infection by a virus (infectious), or removed from the infectious system due to recovery or death; that is, θ_S + θ_I + θ_R = 1, where θ_S, θ_I and θ_R are, respectively, the probabilities of being susceptible, infectious and removed. This presents the primary constraint for a multi-compartment infectious disease model. More details of the SIR model will be described in Section . Oftentimes, the interest in such a system lies in the function values over time, but closed-form analytical solutions for these functions may not exist. For example, answering the question of how many individuals will be infected with COVID-19 by the end of the year (or any future time) requires a calculator that computes the cumulative numbers of susceptible, infected and removed cases over time, from the past into the future. Unfortunately, in reality, the functions relevant to this calculator are usually non-linear, and their exact forms are difficult to specify directly. In contrast, a set of ODEs helps us better understand the disease transmission dynamics (i.e. the traits of an infectious disease) and more conveniently captures their key features, where each ODE may correspond to one mode of disease evolution. Such ODEs for disease spread may be regarded as a model for the expected dynamic mechanism, serving as the systematic component in a statistical model. Numerical methods such as the Euler discretisation method or the Runge-Kutta approximation method (Stoer & Bulirsch; Butcher) can be used to obtain approximate solutions of such ODEs with given boundary conditions. Regardless of the method used, solutions to a dynamic system are deterministic functions. We illustrate a basic mechanistic model of disease spread in the succeeding text.
Additional review of multi-compartment models from deterministic and mathematical perspectives is given by Anderson et al. and Hethcote. Example: consider the SIR model for a hypothetical population with a constant population size N and an initial condition specifying the numbers of susceptible, infectious and removed (either died or recovered) individuals. These counts may also be expressed as percentages if the unit of proportion is used in the interpretation. The transitions between compartments, written as ODEs, represent population movement from one compartment to another (see Figure ). We consider an example with transmission rate β (the rate of moving from S to I) and removal rate γ (the rate of moving from I to R), leading to R₀ = β/γ. Here R₀ is the so-called basic reproduction number, which quantifies the average number of susceptible individuals contracting the virus from one contagious person in an environment with no preventive measures. This is a quite infectious scenario, as we will see later. An R script (not shown) obtains the solution to the system of ODEs via standard ODE solvers (R package deSolve), using the first-order Euler method or the Runge-Kutta fourth-order (RK4) approximation method (Figure ). Details about the RK4 method can be found in Appendix A. As Figure shows, on each of these days the sum of the three values from the three curves always equals the population size, presenting a time-varying redistribution of the individuals. With no control measures in this hypothetical infection dynamic, the susceptible compartment quickly drops and reaches an equilibrium state some days after the outbreak begins; during that initial period, the infectious compartment increases to a peak and then decreases to zero (no contagious individuals left in the population) as all currently infected individuals move to the removed compartment, which is the exit of the system.
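The paper's R/deSolve script is not reproduced in this excerpt; the following Python sketch implements the same classical RK4 integration of the SIR ODEs under hypothetical parameters (β = 0.5, γ = 0.1, N = 1000, one initial infectious case), chosen only for illustration:

```python
def sir_rk4(beta, gamma, s0, i0, r0, n, days, h=0.1):
    """Integrate the SIR ODEs dS/dt=-bSI/N, dI/dt=bSI/N-gI, dR/dt=gI
    with the classical 4th-order Runge-Kutta scheme."""
    def f(s, i, r):
        new_inf = beta * s * i / n
        return (-new_inf, new_inf - gamma * i, gamma * i)

    def shift(y, k, c):
        # evaluate state y displaced by c * slope k
        return tuple(yj + c * kj for yj, kj in zip(y, k))

    y = (float(s0), float(i0), float(r0))
    traj = [y]
    for _ in range(int(days / h)):
        k1 = f(*y)
        k2 = f(*shift(y, k1, h / 2))
        k3 = f(*shift(y, k2, h / 2))
        k4 = f(*shift(y, k3, h))
        y = tuple(yj + h / 6 * (a + 2 * b + 2 * c + d)
                  for yj, a, b, c, d in zip(y, k1, k2, k3, k4))
        traj.append(y)
    return traj

# Hypothetical parameter values (the paper's values are not recoverable here).
traj = sir_rk4(beta=0.5, gamma=0.1, s0=999, i0=1, r0=0, n=1000, days=120)
```

With these assumed values, R₀ = β/γ = 5: the infectious curve rises to a pronounced peak and then decays towards zero, while S(t) + I(t) + R(t) stays fixed at N throughout.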
Despite relying on a valid infectious disease mechanism, deterministic approaches have several drawbacks: (i) the actual population in each compartment at a given time is never accurately measured, because we only obtain observations scattered around the mean; (ii) the nature of disease transmission and recovery is stochastic at the individual level and thus never certain; and (iii) without a random component in the model, it is impossible either to learn model parameters (e.g. R₀) from available data or to assess prediction uncertainty. The latter is of critical importance given the many unobserved and uncontrolled factors in surveillance data collection. In the early stage of the current COVID-19 pandemic, the daily infection and death counts reported by health agencies were highly influenced by the availability of testing kits, reporting delays, reporting and attribution schemes, and under-ascertainment of mild cases in public health surveillance databases (see discussions in Angelopoulos et al.; Banerjee et al.); both the disease transmission rate and the time to recovery or death are also highly uncertain and vary by population density, demographic composition, regional contact network structure and non-uniform mitigation schemes (Ray et al.). Hence, statistical extensions are necessary to incorporate sampling uncertainty into estimation and inference for infectious disease models. The main focus of this paper is a statistical modelling framework based on a class of state-space models, in which the systematic component is specified by a multi-compartment infectious disease model while the random component is governed by a sampling distribution for the surveillance data. Note that multi-compartment infectious disease models constitute a class of classical mechanistic models widely used in practice, and that incorporating suitable sampling distributions allows one to make statistical estimation, inference and prediction with quantification of uncertainty.
We organise the paper as follows. In the first part, we introduce a class of macromodels. We begin with the most basic SIR mechanistic model in detail, followed by some important extensions used to address representative scenarios of disease spread and infection evolution. Examples include the SEIR model, with an additional compartment of exposure accounting for a potential incubation period of infection, and the susceptible-antibody-infectious-removed (SAIR) model, with an additional compartment of antibody accounting for potential self-immunisation after infection. Then, we formally introduce the framework of state-space models, a powerful statistical modelling approach that aims to model available surveillance data from public health databases with the utility of an underlying latent mechanistic model. In the second part, we introduce a class of micromodels. When an epidemic continues, data become abundant and of high resolution at the community level. For example, the surveillance data of the COVID-19 pandemic in the USA are collected from individual counties. This allows building county-level microinfectious models in addition to country-level or state-level macromodels. Being a certain form of subgroup analysis, such micromodelling is appealing for addressing spatial heterogeneity across the counties in the USA and consequently improves prediction accuracy. As far as the spatial modelling of infection dynamics is concerned, we review the classical cellular automaton (CA), which is extensively used to describe person-to-person interaction rules associated with epidemic spreading patterns in a population via relevant inter-location connectivity functions. Such a CA may vary spatially and temporally, which presents a principled way to extend a state-level macroinfectious disease model to a stratified microinfectious model.
In addition to geographical subgroups, other types of subgroups, for example, by age, race, income, political party and economy, are also of interest. The main objective of this paper is to introduce readers to the basics of infectious disease models, their underlying modelling assumptions, statistical analyses and possible extensions. Examples will be provided for demonstration purposes. This review targets readers who have had some statistical training but no prior experience in infectious disease modelling. The first infectious disease model (McKendrick; Kermack & McKendrick) is widely known as the susceptible-infectious-removed model, or in short the SIR model (see Figure ). It is a three-compartment model for studying how infectious diseases evolve over time at the population level. It defines a mechanism of disease transmission and recovery for a population at risk through a dynamic system of three disjoint states: susceptible, infectious and removed. We note an important distinction between infectious and infected individuals. Infectious individuals are those who are currently infected and not yet recovered or dead (currently infected individuals become infectious immediately in the SIR model, although this may not be true in reality; see the SEIR model in Section , where currently infected individuals become infectious after a delay), whereas 'infected individuals' could mean either only currently infected or both currently and previously infected. For clarity, we will refer to the currently infected as infectious, so that the three states in the SIR model are mutually exclusive. Individuals in the susceptible state are not immunised and can become infected by coming into contact with infectious cases, so they are at risk at a given time. Individuals in the infectious state contribute to the transmission of the disease until they ultimately recover or die, so they are contagious.
Individuals in the removed state include those who either recover or die (without distinction). This state is an exit from the infection system, meaning that once an individual leaves the system (recovers or dies), he or she never returns to it. This is true for people who die from the virus but may not be the case for recovered individuals. Thus, the SIR model carries a technical assumption that a recovered individual becomes self-immunised to the virus and no longer affects the disease transmission. A possible way to relax this assumption is to create two separate compartments corresponding to recovery and death states, respectively, leading to a four-compartment infectious disease model. To keep our presentation focused on the basic three-compartment model, we make this self-immunisation assumption in this section. Given the above, the current version of the SIR model is only applicable to diseases for which long-term immunity can be developed, and does not apply to recurring infectious diseases, such as the common cold. This is because the disease transmission rate is set as a constant in the SIR model. In this section, we introduce the SIR model in its basic deterministic form, define reproduction numbers, elaborate its assumptions and properties, and present some technical extensions to the basic SIR model. Mechanistic extensions, such as modifications of the three-compartment SIR model to account for additional components or disease mechanisms, are discussed in Section . We use S(t), I(t) and R(t) to denote the time-course subpopulation sizes (i.e. the numbers of individuals) distributed into each of the three compartments at a given time t, where t is continuous. Clearly, S(t) + I(t) + R(t) = N, where N is the total population size, a fixed constant. The starting time is denoted as t = 0.
The rates of change among these subpopulations are represented by a system of ODEs: dS(t)/dt = −βS(t)I(t)/N, dI(t)/dt = βS(t)I(t)/N − γI(t) and dR(t)/dt = γI(t). These three ODEs define a dynamic system of three deterministic functional trajectories over time: the susceptible trajectory S(t), the infectious trajectory I(t) and the removed trajectory R(t) for t ≥ 0. This SIR dynamic system is well posed in the sense that non-negative initial conditions lead to non-negative solutions of the three functional trajectories. These trajectories collectively demonstrate the evolutionary mechanism of an infectious disease. The SIR dynamic system may be interpreted as follows. Consider events occurring instantaneously at time t. In the first ODE, the ratio I(t)/N represents the proportion of contagious individuals in the population, which may be thought of as the chance that a person in the at-risk population runs into a virus carrier. If each individual at risk has an independent chance of meeting a contagious person, then, according to the binomial distribution, the expected number of susceptible individuals contracting the virus is S(t)I(t)/N. In reality, a person at risk may run into β contagious individuals on average, leading to a modified chance βI(t)/N. Thus, instantaneously at time t, the system gains an additional number of infected cases equal to βS(t)I(t)/N, and these cases leave the susceptible compartment to enter the infectious compartment. Such loss to S(t) is captured by the negative sign in the first equation. In the second ODE, the first term is the number of newly arriving contagious individuals, and the second term is the loss of contagious individuals, at rate γ, who either recover or die and then enter the removed compartment. The third ODE describes an absorbing compartment that only accumulates new arrivals and has no departures. In the literature, the transition rate γ represents the fraction of the infectious population that exits the infectious system per unit time.
For example, γ = 0.1 would mean that the infectious compartment decays (i.e. infectious individuals recover or die) at an average rate of 10% per unit time. In other words, 1/γ describes the expected duration (10 days for γ = 0.1) over which an individual stays infectious, under an exponential distribution for his or her sojourn time. Variations of the form of this system are often seen in the literature. Among them, the most important SIR specification is given as follows. Because the total population N remains constant over the duration of infection, dividing both sides of the ordinary differential equations by N yields the rates of change in terms of population proportions, without changing the interpretation of β and γ: dθ_S(t)/dt = −βθ_S(t)θ_I(t), dθ_I(t)/dt = βθ_S(t)θ_I(t) − γθ_I(t) and dθ_R(t)/dt = γθ_I(t), where θ_S(t), θ_I(t) and θ_R(t) are the probabilities (or proportions) of being susceptible, infectious and removed at time t, respectively. Here the probability of being infectious, θ_I(t), is also known as the prevalence of the disease in the epidemiology literature (see, e.g. Osthus et al.; Wang et al.). A clear advantage of this alternative form of the SIR model is that the population size N is implicitly absorbed into the disease transmission rate β, which may then be interpreted as a per capita effective contact rate in proportion to the population (see, e.g. Johnson & McQuarrie). Despite the differences in notation and presentation, the two forms convey the same infection mechanism, but interpretations need to be given accordingly. Although we use these model specifications interchangeably in this paper, the proportion-based form is recommended for practical studies. Based on the two parameters β and γ of an SIR model, the ratio R₀ = β/γ is termed the basic reproduction number, which captures the expected number of new individuals who directly contract the virus from one contagious individual in an environment with no preventive measures. Intuitively, it is the product of the transmission rate β and the infectious duration 1/γ.
The basic reproduction number R₀ does not depend on the distribution of people over the three compartments and presents a key disease characteristic for describing and comparing infectious diseases (see, e.g. Chowell et al.; Ferguson et al.; Khan et al.; Liu et al.). An epidemic is expected to occur when R₀ > 1, and to die out when R₀ < 1. This is because in the SIR model, under the condition S(t)/N ≈ 1, the former is equivalent to β > γ, leading to dI(t)/dt ≈ (β − γ)I(t) > 0, while the latter implies dI(t)/dt < 0. The earlier interpretation of R₀ relies on an implicit assumption that all contacts of a contagious individual are susceptible, which contrasts with the effective reproduction number. The effective reproduction number is defined as R_e(t) = R₀S(t)/N. It represents the expected number of newly infected individuals who contract the virus directly from a contagious individual at time t, given that each susceptible individual has a chance S(t)/N of meeting this contagious individual. It is not to be confused with the notation R(t), the removed population. In the early outbreak of an infectious disease in a large population, R_e(t) ≈ R₀ because S(t)/N ≈ 1. In contrast to R₀, which describes only the disease itself (or the progression of the disease near time 0), R_e(t) reflects the progression of the infectious disease in a population at any given time, because it determines the sign of dI(t)/dt and hence the acceleration or deceleration of the infection dynamics. This may be seen from the second-order derivative d²I(t)/dt²; a time, say t*, at which d²I(t*)/dt² = 0, that is, at which the rate dI(t)/dt reaches a peak, is referred to as a turning point (see the peak in the middle panel of Figure ). Hence, R₀ is of most interest during the early phase of an epidemic, whereas R_e(t) is of most interest later, during the controlling phases of an epidemic. For example, the so-called herd immunity is the natural immunity developed when an epidemic reaches R_e(t) < 1.
In other words, without interventions, containing the spread requires the proportion of susceptible individuals to be no more than 1/R₀, or equivalently the combined proportion of infectious and recovered individuals to be at least 1 − 1/R₀. As another example, if an effective vaccine becomes available at some time t̃ > 0, knowing R_e(t̃) allows us to estimate the remaining proportion of the population that needs to be vaccinated in order to control the epidemic (i.e. to achieve R_e(t) < 1). Figure shows that the effective reproduction number R_e(t) in the example decreases as the group of susceptible individuals, S(t), shrinks over time, eventually falling below the threshold of 1; at time 0, R_e(0) = R₀. The time at which R_e(t) crosses this threshold also marks a special time of interest: it is when the number of active contagious individuals starts decreasing after reaching its maximum, as shown in the middle panel of Figure . Like every mathematical model, the SIR model must satisfy certain assumptions and constraints, such as boundary conditions. These restrictions define the circumstances under which the SIR model may be appropriate to use in practice. Although some of them have been mentioned earlier, for the sake of a self-contained summary, we list all key assumptions as follows. Assumption 1: the population involved in the infection is closed, with no additions or leakage of individuals, and the size of the population is fixed, say, N. This assumption may be satisfied by an epidemic that is rapid and short lived, during which disease evolution is not affected, or is only minimally affected, by vital changes (e.g. natural births or deaths) and migration (i.e. immigration and emigration).
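The relationship between R_e(t), the infection peak and the herd immunity threshold can be checked numerically. The sketch below uses a forward-Euler SIR trajectory with hypothetical parameters (β = 0.5, γ = 0.1, N = 1000), not values taken from the paper:

```python
def sir_euler(beta, gamma, s, i, r, n, days, h=0.01):
    """Forward-Euler integration of the SIR ODEs (counts, not proportions)."""
    traj = [(s, i, r)]
    for _ in range(int(days / h)):
        new_inf = beta * s * i / n
        s, i, r = (s - h * new_inf,
                   i + h * (new_inf - gamma * i),
                   r + h * gamma * i)
        traj.append((s, i, r))
    return traj

beta, gamma, n = 0.5, 0.1, 1000
r0 = beta / gamma                      # basic reproduction number R0 = beta/gamma
traj = sir_euler(beta, gamma, 999.0, 1.0, 0.0, n, days=120)
re = [r0 * s / n for s, _, _ in traj]  # effective reproduction number R_e(t)

# I(t) peaks exactly when R_e(t) crosses 1, since dI/dt = gamma*(R_e(t)-1)*I(t).
peak_idx = max(range(len(traj)), key=lambda k: traj[k][1])
herd_threshold = 1 - 1 / r0            # minimum non-susceptible proportion
```

At the index where I(t) peaks, dI/dt = 0, so βS(t)/N = γ and R_e(t) = 1 up to discretisation error; with the assumed R₀ = 5, the herd immunity threshold 1 − 1/R₀ equals 0.8.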
Technically speaking, the three compartments satisfy the condition S(t) + I(t) + R(t) = N for all t ≥ 0. Assumption 2: individuals in the population meet each other randomly, in the sense that both the probability and the degree of interaction with one another remain constant over time, regardless of geographical and demographic factors. This is a strong homogeneity assumption for the SIR dynamic system, which is governed by the same transmission and recovery parameters β and γ for everyone. In practice, such a homogeneity assumption may be easily violated; thus, modelling heterogeneous infection dynamics is an important and active research area in the infectious disease literature. Assumption 3: a susceptible individual can only develop immunity (or self-immunisation with antibody against the virus) through infection (i.e. no vaccination). In other words, as shown in Figure , the infectious compartment is the only exit from the susceptible compartment, and there is no other state to which an at-risk individual can move next. Once recovered from infection, one becomes immune to the virus for the remainder of the study period and does not return to being susceptible. In effect, this is the rigorous definition of a recovered case in the SIR model. In the graphical representation of Figure , this implies that there is no connection from the removed compartment to the susceptible compartment; in other words, the removed compartment is the terminal state of the infection dynamics. It is worth pointing out that to date the validity of this assumption for the COVID-19 pandemic remains unknown. In the literature, this condition is assumed for a certain period of time over which risk prediction is considered. Assumption 4: the infection has zero latent period, in that one becomes infectious once exposed. This is a key distinction between the SIR model and the SEIR model.
Like many infectious diseases, COVID-19 has a reported average incubation period of several days (Li et al.; Pan et al.), which adds complexity to the modelling of infectious disease dynamics. As a matter of fact, this latency of contagion concerns the timing of becoming contagious, not that of becoming symptomatic. Some studies have found that COVID-19 carriers are most contagious in the early phase of illness, prior to the occurrence of noticeable clinical symptoms (Ip et al.; He et al.). Given these findings, it is tricky to see how a compartment of exposure for incubation should be added to extend the SIR model for the COVID-19 pandemic. Assumption 5: because the SIR model has constant transmission and recovery parameters β and γ, which are not time-varying, the underlying infection is assumed to evolve in a fully neutral environment with no mitigation efforts via external interventions, such as a public health policy of social distancing, effective medication or fast testing kits for diagnosis. As far as the COVID-19 pandemic is concerned, this is the biggest restriction of the SIR model, and it is not reflective of reality: almost all countries with reported COVID-19 cases have issued various non-pharmacological control measures. Many researchers have proposed solutions to overcome this unrealistic assumption of the SIR model in the analysis of COVID-19 data (see, e.g. Wang et al.). Assumption 6: the population size N is large enough to yield a sufficient number of incidences, including the number of infections, the number of deaths and the number of recovered cases, so that the SIR model parameters can be stably estimated with high precision. Technically speaking, this is not a model assumption but a sample size condition for statistical power.
Because this mechanistic model will ultimately be used for risk projection, a well-trained model with reliable data is necessary not only to produce an accurate prediction but also to adequately assess the prediction uncertainty. Although these six assumptions specifically concern the SIR model, most of the associated discussion and insights are useful for understanding the restrictions of the SIR model extensions presented in the remaining sections. Knowing the possible violations of a given restriction of a multi-compartment model in data analyses gives rise to potential new research problems for further investigation. To further understand the mechanism of infection governed by the SIR model, we now give a brief summary of its analytic properties, which provide useful guidelines for building statistical models and methods to learn the SIR model from available surveillance data in public health databases. Property 1: although the dynamic system defined by the SIR model is continuous over time, available surveillance data are reported as discretised measurements at discrete time points. For example, most COVID-19 public databases update data on a daily basis, in which 'a day' is the unit of time for measurement. Recognising this discrepancy between the continuous-time underlying mechanistic model and the discrete-time sampling frequency of the available data is essential for creating a statistical framework that links the SIR model with the data at hand. Property 2: the SIR model is deterministic and does not contain any probabilistic components. It is noteworthy that dynamics and stochasticity are two different mathematical properties; a dynamic system (e.g. the SIR model) is not necessarily stochastic, while a stochastic system is not necessarily dynamic.
As shown in Figure , the compartment sizes S(t), I(t) and R(t) are time-varying functions with no random fluctuations; they are completely determined by the model parameters and the initial conditions of the SIR model. Obviously, this is a limitation of the SIR model when it is applied to data analysis, where data collection is subject to profuse uncertainties and random errors. Property 3: it is easy to show that the number of individuals at risk (at the entry of the system), S(t), is monotonically non-increasing, and that the number of removed cases (at the exit of the system), R(t), is monotonically non-decreasing (see Figure ). Hence, the total number of individuals who have been exposed to the virus, N − S(t) = I(t) + R(t), is monotonically non-decreasing. I(t), the number of active contagious cases, being the difference between the exposed cases and the removed cases, can be either increasing or decreasing. The middle panel of Figure nicely conveys such directionality of movement, in which the time at which I(t) reaches its peak and the time at which I(t) drops to zero are two important turning points of interest in epidemiology. The former indicates the turning point of disease mitigation, and the latter corresponds to the turning point of disease containment. Property 4: it can be shown that I(∞) = 0 (or equivalently, θ_I(∞) = 0), meaning that the disease will eventually die out. This is because as t → ∞, the growth rate of the prevalence θ_I(t), proportional to (βθ_S(t) − γ), becomes negative at a certain time and then more and more negative, with θ_I(t) converging to zero, because θ_S(t) is a decreasing function and θ_I(t) is bounded below by zero. However, this property of decaying to zero is conditional on the assumptions listed earlier. Violations of the closed-population and permanent-immunity assumptions are most likely to cause a disease to persist, because the monotonicity of S(t) used in the earlier argument is no longer valid.
An example of such a disease is seasonal influenza, for which immunity does not last long. Property 5: the SIR model has a recursive property in that, at any given time, disease progression (i.e. the shapes of the three functions) depends only on the current values and not on other information from the past. This recursion property should not be confused with the Markov property, which is used exclusively in the stochastic processes literature under a conditional probability law; here there is no probability law involved in the recursive operation, which is a fully deterministic recursion. Such a conceptual distinction may help in understanding the differences between dynamics and stochasticity. During an epidemic, various control measures are typically issued by governments to mitigate or contain the spread of the disease. A direct impact of these external interventions is that both the transmission and recovery rates are no longer constant over time. Thus, an important generalisation of the SIR model is to accommodate different degrees of mitigation policy, including social distancing, limiting transportation, mandatory mask wearing and city lockdown. As observed in the ongoing COVID-19 pandemic, mitigation strategies change over time. Limiting the mobility of susceptible individuals and medically isolating contagious individuals reduce the rate of contracting the virus, leading to a decreasing disease transmission rate β(t). At the same time, gaining better knowledge of both treatment and self-management of symptoms and improving medical resources may increase the recovery rate γ(t) over the course of an epidemic. Incorporating time-varying parameters into the SIR model leads to an important extension of the basic model. The form of β(t) can be specified in two main ways. One is to let β(t) be either a parametric function (e.g.
an exponentially decaying function) or a non-parametric function (Smirnova et al.; Sun et al.), both of which may be estimated from available data. One useful feature of a parametric β(t) is the ability to incorporate seasonality in the transmission rate. It is well known that many infectious diseases spread most quickly in some of the winter months. In particular, respiratory infectious diseases caused by some coronaviruses exhibit seasonal behaviour consistent with trends in temperature and humidity (Barreca & Shimshack; Sajadi et al.). Accounting for such seasonal periodicity in the model would produce better long-term predictions of an epidemic. As public attention for COVID-19 pandemic projection gradually shifts from the short term to the long term, it becomes increasingly important to take seasonality into account. Following Dietz, a simple way to introduce seasonality is to assume that the transmission rate β fluctuates over the period of a year, β(t) = β₀{1 + α cos(2π(t − φ)/365)}, t = 1, …, 365, where β₀ is the average contact rate, α ∈ [0, 1] is the degree of seasonality, with α = 0 reducing the model to the basic SIR model, and φ ∈ [0, 365) is the offset in the time horizon, so that peak transmission occurs at t = φ. Other periodic functions or their combinations can also be used to model seasonality. As an alternative to a fully non-parametric function, Wang et al. assume the form β(t) = β₀π(t), 0 < π(t) ≤ 1, where π(t) is a known function specified according to given control measures. This specification allows one to assess the effectiveness of a target preventive measure, as well as to compare different preventive strategies. Clearly, the model with π(t) ≡ 1 represents disease progression in the absence of any mitigation effort, which sets up the baseline situation for policy assessment and comparison.
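A minimal sketch of these two time-varying transmission-rate specifications, with all numeric values (β₀ = 0.3, α = 0.4, a hypothetical 60-day lockdown window for π(t)) chosen purely for illustration:

```python
import math

def seasonal_beta(t, beta0=0.3, alpha=0.4, phi=0.0, period=365.0):
    """Dietz-style seasonal forcing: beta0 * {1 + alpha*cos(2*pi*(t-phi)/period)}.
    alpha = 0 recovers the constant-rate basic SIR model; peak is at t = phi."""
    return beta0 * (1.0 + alpha * math.cos(2.0 * math.pi * (t - phi) / period))

def pi_lockdown(t, start=30.0, end=90.0, level=0.4):
    """Hypothetical control-measure modifier pi(t) in (0, 1]:
    full transmission outside the lockdown window, reduced inside it."""
    return level if start <= t < end else 1.0

def modified_beta(t, beta0=0.3):
    """beta(t) = beta0 * pi(t); pi(t) == 1 gives the no-mitigation baseline."""
    return beta0 * pi_lockdown(t)
```

Either function can be dropped into an ODE solver in place of the constant β, leaving the rest of the SIR system unchanged.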
The flexibility in specifying π(t) allows easy incorporation of future business reopening events; for example, in the COVID-19 pandemic, this function may be specified as a U-shaped curve in which control measures (e.g. social distancing) gradually relax after a certain time point (see more details in Wang et al., together with some numerical results from the COVID-19 data analysis). More discussion of the time-varying transmission rate is given in Section . The assumption of a fixed population size is restrictive, especially when an epidemic lasts for a long period before it is contained. In this setting, the inclusion of natural birth and death dynamics is needed to adequately characterise the time-varying size of each compartment in the SIR model. Let λ be the natural birth rate and μ the natural death rate, so that the population size changes according to the ODE dN(t)/dt = λN(t) − μN(t). In this extension of the basic SIR model there are three exits for natural deaths, one at each compartment, while births enter through the susceptible compartment. Note that when the proportion-based form of the model is used, N(t) is automatically absorbed into the proportions and thus no longer appears in the model formulation. In this section, we review several four-compartment mechanistic models as extensions of the basic SIR model introduced earlier. Being a simple mechanistic model with three compartments, the SIR model has limitations in real-world applications; extensions of this basic type that account for different disease mechanisms and assumptions have therefore been widely considered in the literature. The commonly studied SEIR model takes an incubation period into account by adding an exposed compartment between the susceptible and infectious compartments (see Figure ). The underlying assumption here is that individuals in this exposed subpopulation have contracted the virus but are not yet contagious, and are bound to become contagious.
In the current literature, most infectious diseases that fit the SIR model are believed also to fit the SEIR model. The exposed compartment may be regarded as a waiting room for virus carriers who are about to spread the virus in the population. Let δ be the rate at which an exposed individual becomes contagious. Then the basic SIR model extends to a four-compartment model consisting of the four ODEs dS(t)/dt = −βS(t)I(t)/N; dE(t)/dt = βS(t)I(t)/N − δE(t); dI(t)/dt = δE(t) − γI(t); dR(t)/dt = γI(t), where E(t) is the size of the exposed compartment at time t. In this case, the compositional constraint becomes S(t) + E(t) + I(t) + R(t) = N, which is clearly satisfied by the SEIR dynamic system above. Let θ_E(t) be the probability (or proportion) of being exposed to the virus; the rates-based SIR model can be extended similarly. Technically, the SEIR model often suffers from the issue of parameter identifiability, because determining a correct incubation period of an infectious disease, and thus the parameter δ, is a rather difficult task in practice. First, the incubation period varies from one person to another; in the case of COVID-19, the reported incubation period spans a range of days (Lauer et al.), and another study of COVID-19 patients in China (Guan et al.) reports a comparable estimated range and median. It is clear that this quantity is very person dependent. Second, ascertainment of contagion may be substantially delayed by shortages of virus-testing resources; this length-biased sampling problem is notoriously challenging for the estimation of the incubation period (Qin et al.). Third, researchers have found (e.g. He et al.) that COVID-19 carriers tend to be more contagious right after contracting the coronavirus than a week later, because they are not self-quarantined in the absence of clinical symptoms.
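The four SEIR ODEs above can be solved numerically; the review's examples use R with Runge-Kutta solvers, and the Python sketch below substitutes a simple explicit Euler scheme for illustration. All parameter values are hypothetical.

```python
def seir_step(state, beta, delta, gamma, N, dt=0.1):
    # One Euler step of the basic SEIR system (no births/deaths):
    #   S' = -beta*S*I/N,  E' = beta*S*I/N - delta*E,
    #   I' = delta*E - gamma*I,  R' = gamma*I
    S, E, I, R = state
    new_exposed = beta * S * I / N
    return (S - dt * new_exposed,
            E + dt * (new_exposed - delta * E),
            I + dt * (delta * E - gamma * I),
            R + dt * gamma * I)

def simulate_seir(S0, E0, I0, R0, beta, delta, gamma, days, dt=0.1):
    # Returns the trajectory of (S, E, I, R); the four components sum
    # to N at every step, reflecting the compositional constraint.
    N = S0 + E0 + I0 + R0
    state = (float(S0), float(E0), float(I0), float(R0))
    traj = [state]
    for _ in range(int(days / dt)):
        state = seir_step(state, beta, delta, gamma, N, dt)
        traj.append(state)
    return traj

# Hypothetical parameters: beta = 0.5, delta = 0.2, gamma = 0.1
trajectory = simulate_seir(990, 0, 10, 0, 0.5, 0.2, 0.1, days=30)
```

A finer dt (or a proper Runge-Kutta step) improves accuracy; the conservation check S + E + I + R = N is a useful sanity test for any implementation.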
In other words, in the case of COVID-19, the incubation period (the sojourn in the exposed state) is too short to play a substantial role in the modelling of the pandemic. Not all infectious diseases develop long-term immunity. Individuals may develop immunity after recovery only for some time and could lose immunity such that they become susceptible again; recovered individuals then rejoin the susceptible compartment after a certain duration of immunity. This disease evolution is intuitively called the susceptible-exposed-infectious-removed-susceptible (SEIRS) model. We assume no death in the removed compartment (see the figure, where the recovered branch of the removed compartment is connected back to the susceptible compartment). An example of a disease studied using this model is the common cold. The SEIRS model is defined by adding to the SEIR system a return flow from the removed compartment to the susceptible compartment, occurring at the rate of losing immunity and becoming susceptible again after recovery. Different from the SEIRS model, there are some infectious diseases in which long-term immunity is acquired by individuals who survive their infection. To build this self-immunisation into the infection dynamics, introduce an antibody (A) compartment into the SIR paradigm, shown in the bottom thread of the figure. Because individuals who enter the antibody compartment will no longer be at risk of infection for a certain period of time, this compartment is indeed an exit compartment, at least over the time window within which immunity is active, in addition to the removed compartment. In some infectious diseases such as COVID-19, the subpopulation of self-immunised individuals is not directly observed or clinically confirmed by viral RT-PCR diagnostic tests because of mild or absent clinical symptoms: they are self-cured at home with no clinical visits. Adding this compartment to the model can help to greatly mitigate the issue of under-reporting of the actual number of infected cases in the population.
This dynamic system consists of four compartments, namely susceptible, self-immunised, infectious and removed, with corresponding ODEs in which α is the rate of self-immunisation; α is not identifiable from surveillance data alone because of the lack of observed data on this compartment. One approach to estimating the rate parameter α is to collect data from antibody serological surveys of the population; see the references for more discussion. This section mainly focuses on an introduction of statistical models for analysing surveillance data of an epidemic. Each statistical model consists of two components: a systematic component and a random component. In the context of infectious disease data analysis, the former may be specified by a dynamic infectious disease model of the kind reviewed above; the latter is built upon a random sampling scheme that enables a stochastic extension of the mechanistic model (e.g. the SIR model) given in the systematic component. Essentially, the notions of disease transmission, recovery and other characteristics are used to define key population attributes, or parameters, of the infection dynamic system of interest, which are then estimated from available data via a statistical modelling framework in which covariates may be incorporated to learn subgroup-specific risk profiles. A clear advantage of statistical and stochastic extensions is the ability to quantify uncertainty in both estimation and prediction, in connection with sampling variability. This added uncertainty is crucial to policymaking, as models not only generate an average estimate or prediction but also present the best and worst possible scenarios for more robust and confident handling of epidemics, given that surveillance data are subject to various issues in data collection. An example presented in Britton vividly shows the uncertainty in the progression of an infectious disease. Consider patient zero, who will go on to infect on average R₀ other individuals, as determined by a certain disease mechanism.
The number of individuals who contract the virus from this patient is in fact stochastic, varying around the expected number of infections R₀; it can be described by a distribution (e.g. Poisson or negative binomial) with mean R₀ on the support of the non-negative integers. With a non-zero probability of taking the value zero, due to the variability in human activities, there is a non-negligible chance that an epidemic is completely averted. The opposite is an outbreak, occurring with non-zero probability, that infects tens of thousands of people. Without modelling such uncertainty, we cannot see all these possibilities and the associated likelihoods of their occurrence during the course of an epidemic (Roberts et al.). Infectious disease systems governed by the class of multi-compartment models, though describing the population average, are useful for describing individual-based stochastic processes once certain random components are introduced into the modelling framework; the resulting statistical models present more natural approaches to the analysis of surveillance infectious disease data. Before introducing the statistical methodologies commonly used for parameter estimation, we distinguish two categories of model parameters: those that can be determined a priori, with no need for estimation, which we term hyperparameters; and those that cannot be fully determined and need to be estimated using the data at hand, which we term target parameters. The choice of which parameters are targets versus hyperparameters varies widely across methods. Intuitively, the more we know about the biological characteristics of a disease, the more parameters can be held fixed a priori in the analysis. It is, however, very difficult to determine most of the model parameters early in an outbreak because of the limited amount of knowledge and data about the disease.
Indeed, many model parameters are not identifiable because the relevant data are unavailable; one such example is the self-immunisation rate parameter α in the SAIR model. As relevant knowledge accumulates, the literature reveals increasingly precise characterisations of the disease, such as its latency period, recovery rate, death rate, immunity duration and antibody acquisition. Such information is typically obtained from surveys of high-quality individual-level data, which may provide much better quantification of these hyperparameters than re-estimation from epidemic models, which are largely based on much coarser surveillance data. In the case of the COVID-19 pandemic, this survey-based approach may be too costly to carry out in countries with large and heterogeneous populations. In general, target parameters are mostly those that are location specific, for example, the transmission rate and the fatality rate. These vary greatly across regions because of non-uniform mitigation efforts and hospital resources; hence, data-driven estimation is preferred. Later, we introduce an areal spatial modelling approach to account for spatial heterogeneity in the analysis of infectious disease data. Because of the issue of parameter identifiability in some mechanistic models, specifying hyperparameters in the model fitting is inevitable. However, holding hyperparameters fixed at certain values taken from external data sources is indeed controversial, and the validity of the consequent analyses depends strongly on the appropriateness of those prior values. To relax this technical weakness, we later introduce a Bayesian framework in which such prior information (e.g. hyperparameters) enters the statistical model via prior distributions rather than fixed values, so that the uncertainty in those hyperparameters is adaptively compensated by the amount and quality of the observed surveillance data.
Such flexibility has a great advantage in synthesising prior evidence and observed data. To present this section at a reasonable technical level, most of the discussion below is given in the setting of the basic SIR model; generalisation to other compartment models follows with slight modification. In closing, it is noteworthy that the frequentist statistical methods discussed below rest on a fundamental assumption about data collection: that the population-level compartment data S(t), I(t) and R(t), and others if relevant, can be collected directly from the study population. In other words, at any given time, every individual in the population can be observed directly for his or her current status of being susceptible, infectious, recovered or dead. This is practically impossible, so the interpretation of the estimation results should be carried out with caution. In the SIR model, the transmission rate β and the recovery rate γ are the two target parameters of interest. Estimation of β and γ can be carried out through optimisation, searching for the model that best fits the data; a commonly used minimisation criterion is the least squares loss. Given β and γ, numerical approximations (e.g. Runge-Kutta methods) can be used to solve for the trajectories S(t), I(t) and R(t). These expected trajectories are then compared with the observed trajectories to compute a discrepancy score, such as the sum over time of squared errors, represented as a loss function of the target parameters. It then remains to find the estimates of these parameters that give rise to the curve best fitting the data, via standard optimisation tools. In this case, the optimisation is a two-dimensional search, which is computationally straightforward; even a greedy grid search is computationally cheap. We illustrate using both simulated data and real data in the two examples below. Example (simulated data).
We first generate an observed sequence of cumulative infectious counts from the SIR model at given true parameter values β and γ; for simplicity, γ is fixed at its true value in this example. We then evaluate the sum of squared errors (SSE) loss between the expected cumulative infectious count I(t) and its sample counterpart I_obs(t); the value that minimises this loss gives an estimate of β. The figure plots the SSE loss versus β using the simulated data I_obs(t), t = 1, …, T, for several values of T. The SSE loss is minimised at the true β, as expected. The longer the observed sequence, the more curved the SSE appears around the minimum, and the better we can identify the minimum of the SSE curve. The accompanying R script shows the computation for one value of T. Note that the sequence used to define the fit is I(t), but S(t) and R(t) can also be used in the estimation. Similarly, a two-dimensional grid search can be used to estimate β and γ jointly when γ is not fixed, in which case the data on R(t) must be used in the estimation. Here we present only one replicate for illustration. Example (COVID-19 data). We apply the same approach to the daily time series of COVID-19 cumulative infectious counts in Michigan from March to May; details of the data, including the I(t) sequence, are described in the appendix. The previously defined SIR function is used as the dynamic model, and the previously defined SSE function as the loss function. Fixing γ (i.e. assuming a given average contagious period), the code computes the solution β̂ using the first block of observations (in March). We then increase the number of observations used in the estimation; as shown in the figure, the value of β̂ decreases as more data are used. This is noticeably different from the simulated example, where β̂ remained constant regardless of the number of observations used.
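The review's scripts for this example are in R; the Python sketch below reproduces the same grid-search idea under assumed parameter values (β = 0.3, γ = 0.1, a population of 10,000), so the numbers are illustrative rather than taken from the paper.

```python
def simulate_sir(beta, gamma, S0, I0, R0, days):
    # Deterministic SIR trajectory of I(t) with daily Euler steps (dt = 1)
    N = S0 + I0 + R0
    S, I, R = float(S0), float(I0), float(R0)
    infected = [I]
    for _ in range(days):
        new_inf = beta * S * I / N
        rec = gamma * I
        S, I, R = S - new_inf, I + new_inf - rec, R + rec
        infected.append(I)
    return infected

def sse_loss(beta, gamma, observed, S0, I0, R0):
    # Sum over time of squared errors between fitted and observed I(t)
    fitted = simulate_sir(beta, gamma, S0, I0, R0, len(observed) - 1)
    return sum((f - o) ** 2 for f, o in zip(fitted, observed))

# "Observed" series generated at the true beta = 0.3, with gamma fixed at 0.1
observed = simulate_sir(0.3, 0.1, 9990, 10, 0, 60)
grid = [b / 100 for b in range(10, 61)]  # candidate betas 0.10 .. 0.60
best = min(grid, key=lambda b: sse_loss(b, 0.1, observed, 9990, 10, 0))
```

Because the "observed" data come from the same simulator, the SSE is exactly zero at the true β, and the grid search recovers it; with noisy real data the minimum is flatter, which mirrors the paper's remark that longer observed sequences sharpen the SSE curve.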
The gradual decrease in the estimate of β indicates a potential reduction of the transmission rate over time in Michigan due to the enforcement of statewide social distancing. In other words, the assumption of a constant transmission rate β is inappropriate for the Michigan data; this result suggests the need for a more appropriate modelling technique, which is demonstrated later. Although often used as a classic textbook example, this least squares approach is equivalent to maximum likelihood estimation (MLE) under the assumption that measurement errors are independent and normally distributed with homogeneous variance. In general, the approach gives consistent estimation and does not require a distributional assumption for the data generation, and thus is applicable to non-normal data. However, the ordinary least squares loss used in the earlier example assumes that the data are independently sampled over time, which is not true: the observations form a time series and are thus temporally correlated, so the least squares estimation is not efficient. Cintrón-Arias et al. have discussed the use of a generalised least squares approach to account for more complex error structure, including temporal autocorrelation. It is not always best practice to use the data on I(t) and R(t) directly in the estimation of the model parameters. The COVID-19 projection by Gu (https://covid19-projections.com/) adopts a loss optimisation approach based on the SEIR model using only death counts, owing to quality concerns with infection counts (e.g. the under-reporting issue). That model uses a discrete state machine with probabilistic transitions to minimise a mixture of loss functions, such as mean squared error, absolute error and ratio error. The literature contains many other estimation procedures (e.g. Wallinga & Teunis; Cori et al.; Thompson et al.).
Some of these alternatives do not estimate β and γ but more directly target the effective reproduction number R_e(t) in estimation and inference. Here we present the method of moments, another routine estimation approach in the statistical literature for the parameters of the SIR model. During the early phase of an epidemic, one may assume S(t) ≈ N and set dt = 1 (e.g. a time unit of one day for discretisation), so that the second ODE of the SIR model leads to the approximate exponential solution I(t) ≈ I(0) exp{(β − γ)t} at the discrete times at which data are actually recorded. Once an estimate of the growth rate β − γ is obtained, β̂ follows immediately for a given γ. However, the estimation is accurate only during the early phase of the outbreak, because it relies on the approximation S(t) ≈ N. In the literature, other types of moments are also used to derive parameter estimates. For instance, using the discrete-time approximation of the first ODE of the SIR model, one easily obtains an expression whose right-hand side involves only observed counts; an estimate of β may be obtained by averaging these quantities over time. When β(t) varies over time because of changes in mitigation measures, this method of moments estimator may still be applied locally, with the possible use of a kernel weighting function such as the Nadaraya-Watson estimator (Nadaraya; Watson). A very similar approach leads to an approximation that gives rise to a non-parametric estimator of the effective reproduction number. Although R_e(t) can be identified at each time point using data solely from time t, for numerical stability the same kernel-weighting idea (e.g. a running-bin smoother) is applied to estimate R_e(t) at each t (see, e.g. Wallinga & Teunis).
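The early-phase moment estimator above amounts to a log-linear fit: since I(t) ≈ I(0) exp{(β − γ)t}, the slope of log I(t) against t estimates ρ = β − γ. This is a Python sketch under a synthetic noiseless series (the paper does not supply data here; the values β = 0.4, γ = 0.1 are assumptions for illustration).

```python
import math

def growth_rate_moment(i_series):
    # Early-phase approximation: I(t) ~ I(0) * exp((beta - gamma) * t),
    # so log I(t) is linear in t; fit the slope by ordinary least squares.
    logs = [math.log(i) for i in i_series]
    t = list(range(len(logs)))
    tbar = sum(t) / len(t)
    lbar = sum(logs) / len(logs)
    num = sum((ti - tbar) * (li - lbar) for ti, li in zip(t, logs))
    den = sum((ti - tbar) ** 2 for ti in t)
    return num / den  # estimate of rho = beta - gamma

# Synthetic early-phase counts generated at beta = 0.4, gamma = 0.1
series = [5.0 * math.exp(0.3 * t) for t in range(10)]
rho_hat = growth_rate_moment(series)
beta_hat = rho_hat + 0.1  # plug in a known/assumed recovery rate gamma
```

With real counts the slope would be fit only over the early window where S(t) ≈ N holds, and a kernel-weighted local fit (as the text notes) would replace the single global slope when β varies over time.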
Linear approximations are easy to implement; however, the variances produced by such linear fits are typically inadequate for describing the true randomness of an infectious disease, and so do not allow valid inference and prediction. Alternatively, it is promising to investigate the local linear fitting method (Cleveland & Devlin), which produces non-parametric estimates of time-varying model parameters that better reflect the temporal dynamics of the infection. In both the least squares estimation and the method of moments estimation, there are no explicit assumptions about the probability law of the data sampling. Implicitly, both methods are based on a sampling scheme covering the entire population; that is, the current status of every individual in the study population is recorded. This is certainly not true in practice. To overcome this, estimation methods have been proposed that account for sampling variability under certain parametric distributions. Distributional assumptions can be made for many quantities in an infectious disease model. Some are fully specified from existing knowledge; for example, the distribution of the incubation period of a disease can be represented as a probability mass function over days (Lauer et al.). Others are specified only up to a family of shapes, with the exact form to be estimated. We illustrate the latter using a stochastic SIR model. Stochastic SIR models typically require the same assumptions as the deterministic SIR model. To reflect the stochastic nature of disease transmission and recovery, stochastic processes such as Poisson processes are used to model the accumulation of cases. Following the earlier definitions of β and γ, the number of effective contacts in the population is a Poisson process with rate βN; of these contacts, only those between contagious and susceptible individuals lead to new infections. Hence, the counting process defined by the cumulative number of ever-exposed individuals (i.e.
I(t) + R(t), or equivalently N − S(t)) follows a Poisson process with rate βS(t)I(t)/N. Hence, the number of newly exposed individuals in an instantaneous duration dt follows a Poisson distribution with mean βS(t)I(t)dt/N. On the other hand, the durations for which individuals stay infectious are assumed to be independent and identically distributed according to an exponential distribution with rate γ, so the mean infectious duration is 1/γ. When we jointly consider all I(t) infectious subjects at time t, exit events occur independently at rate γI(t), and the gap times between two adjacent exits are exponentially distributed with mean 1/{γI(t)}. In summary, the number of removed individuals is a counting process following a Poisson process with rate γI(t). Such stochastic formulations are commonly used, for example, in Bailey and in Andersson and Britton. Through these definitions, S(t), I(t) and R(t) are now random variables that can be sampled directly. In fact, it suffices to assume only two of the three counting processes in order to define a stochastic SIR model, because of the constant-sample-size constraint. For demonstration, at time t, in an instantaneous time interval [t, t + dt), we may specify a stochastic SIR model by the increments ΔS(t) ~ Poisson(βS(t)I(t)dt/N) and ΔR(t) ~ Poisson(γI(t)dt), with I(t) = N − S(t) − R(t). As a result of this probabilistic formulation, the effective reproduction number is now defined as an expectation over the random trajectories, R_e(t) = E{βS(t)/(γN)}. The stochastic SIR model is specified in continuous time, where one would hope that dt is very small. In practice, an approximation is used by letting dt = 1, a unit of one day, which is typically the smallest time unit in public surveillance data. As a result, S(t) and R(t) at time t are used to approximate their averages over the entire interval [t, t + 1). This approximation turns a continuous-time stochastic model into a discrete-time stochastic model for statistical analysis.
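The discrete-time (dt = 1 day) approximation just described can be simulated directly: daily new infections are Poisson with mean βS(t)I(t)/N and daily removals are Poisson with mean γI(t). A Python sketch follows, with hypothetical parameter values; the Poisson sampler is Knuth's textbook algorithm, adequate for the modest rates used here.

```python
import math
import random

def stochastic_sir(beta, gamma, S0, I0, R0, days, seed=42):
    # Discrete-time approximation of the stochastic SIR model; draws are
    # truncated so that compartments stay non-negative and S+I+R = N.
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplicative algorithm for Poisson(lam)
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    N = S0 + I0 + R0
    S, I, R = S0, I0, R0
    path = [(S, I, R)]
    for _ in range(days):
        new_inf = min(S, poisson(beta * S * I / N))
        new_rem = min(I + new_inf, poisson(gamma * I))
        S, I, R = S - new_inf, I + new_inf - new_rem, R + new_rem
        path.append((S, I, R))
    return path

# Hypothetical run: beta = 0.3, gamma = 0.1, N = 1000
path = stochastic_sir(0.3, 0.1, 990, 10, 0, days=60)
```

Re-running with different seeds shows exactly the variability the text emphasises: some seeds produce early extinction, others a large outbreak, all from the same parameter values.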
Other distributions, such as the negative binomial or a general dispersion family (Song), may be considered to handle the issue of overdispersion in the counting processes. With distributions in place, we turn to estimation and inference by the maximum likelihood approach, which is often preferred in a parametric model whose underlying probability distribution is properly specified. For convenience, we take one day as the unit of time. Discretising time according to the observed sequences, t = 1, …, T, the observed daily increments ΔS(t) = S(t) − S(t + 1) in the susceptible compartment and ΔR(t) = R(t + 1) − R(t) in the removed compartment are conditionally independent given the historical accumulated counts S(t) and I(t), according to the model definition. The removal process involves only the removal parameter γ, so the log-likelihood of γ with respect to the daily increments ΔR(t) and the daily cumulative infection counts I(t) can be written, up to an additive constant, as ℓ(γ) = Σ_t {ΔR(t) log(γI(t)) − γI(t)}, with the initial conditions S(0) and I(0) given. However, one caveat of this simplistic likelihood formulation is that the cumulative time series S(t) and I(t) are assumed to be measured directly and without error. In other words, the likelihood accounts only for the sampling uncertainty in the increments, not in the cumulative counts, so the resulting statistical inference may suffer from underestimated standard errors. Two types of statistical inference theory are considered in this context, namely infill asymptotic theory and outreach asymptotic theory. The former pertains to the situation where the sampling points increase within a fixed time window (i.e. fixed T), while the latter, of more practical relevance, is the situation where the time window of data collection tends to infinity (i.e. T → ∞). Britton et al.
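Setting the derivative of the log-likelihood ℓ(γ) above to zero gives a closed-form MLE, γ̂ = Σ_t ΔR(t) / Σ_t I(t), i.e. total removals divided by total infectious person-days. A minimal Python sketch (the paper's computations are in R; the toy data below are invented for illustration):

```python
def gamma_mle(i_series, r_series):
    # Under Delta R(t) ~ Poisson(gamma * I(t)), the score equation
    # sum_t {Delta R(t)/gamma - I(t)} = 0 yields
    # gamma_hat = sum of removals / sum of infectious counts.
    increments = [r2 - r1 for r1, r2 in zip(r_series, r_series[1:])]
    exposure = i_series[:-1]  # I(t) paired with Delta R(t) = R(t+1) - R(t)
    return sum(increments) / sum(exposure)

# Toy series whose removal increments are exactly 10% of I(t)
gamma_hat = gamma_mle(i_series=[10, 20, 40], r_series=[0, 1, 3])
```

As the text cautions, this estimator inherits the assumption that I(t) and R(t) are observed without error; with noisy surveillance counts its standard errors are understated.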
discuss the infill large-sample properties under the assumption that the complete epidemic data, that is, the continuously observed counting processes (S(t), I(t)) for t ∈ [0, T], are available. Under this setting, the asymptotic distribution of the MLE based on continuously observed trajectories is established. Obviously, it is rare in practice to collect infectious disease data via such infill sampling schemes; nevertheless, for the sake of theoretical interest, we refer readers to Britton et al. and references therein. The outreach large-sample theory for the MLE with discrete time series data provides statistical inference relevant to most infectious disease applications: as an epidemic evolves, the number of equally spaced (say, daily) time points for data collection increases. When sampling errors in both I(t) and S(t) are allowed, the likelihood above is in fact a kind of conditional composite likelihood (Varin et al.). Thus, the standard theory of composite likelihood estimation implies that the asymptotic covariance of the estimator is given by the inverse Godambe information matrix (a sandwich estimator). The sensitivity matrix in the Godambe information is hard to obtain analytically because of the serial dependence in the time series; instead, one may take a non-parametric bootstrap approach, similar to that considered by Gao and Song, to evaluate the standard errors and conduct valid statistical inference. Conditional independence is a strong assumption made for mathematical convenience in the MLE, and relaxing it has drawn some attention in the literature. For example, Lekone and Finkenstädt and Allen construct likelihood-based approaches using discrete-time Markov chain SEIR models; Becker and Becker and Britton consider the MLE in the SIR model using martingale methods when all transition events for each individual are observed.
It is, however, unlikely that such individual-level details are observed in most surveillance data used for modelling infectious disease mechanisms; estimators using less detailed data have been proposed (e.g. Becker; Rida). As part of the effort to further relax the strong conditions of the earlier stochastic SIR model, we review below a state-space modelling approach that generalises the current likelihood model and estimation framework: S(t), I(t) and R(t) are no longer directly measured but are treated as Markov latent processes, hyperparameters are included via their prior distributions instead of fixed values, and a Bayesian estimation analogous to the MLE is established through the MCMC approach. This class of state-space models is so far one of the most flexible statistical modelling frameworks for analysing infectious disease data. We highlight several software packages that are publicly available for estimating parameters in multi-compartment models; overall, additional effort in this computational domain is needed. Several packages focus on estimation and inference for R₀ and R_e(t). For example, Obadia et al., in their R package R0, implement multiple methods, including a method-of-moments-type approach (Dietz), a Bayesian method (Bettencourt & Ribeiro) and likelihood-based estimation procedures (Forsberg White & Pagano; Wallinga & Teunis; Wallinga & Lipsitch). Along this line, Cori et al. and Thompson et al. develop Bayesian methods to estimate the effective reproduction number, made available through the R package EpiEstim and a Microsoft Excel tool (https://tools.epidemiology.net/epiestim.xls). Their methods use a moving-window approach, assuming that the reproduction number within the current window is constant; a gamma prior distribution is used to derive the posterior distribution of this windowed reproduction number given the new infection counts.
State-space models refer to a class of linear or non-linear hierarchical stochastic models with parametric error distributions. The conventional state-space model is not formulated as a Bayesian model, but its Bayesian formulation has since gained great popularity because of the availability of MCMC methods for estimating the model parameters (Carlin et al.). This class of models primarily attempts to explain the dynamic features of an observed process driven by an underlying latent process. The state-space framework is advantageous over the stochastic compartment models introduced earlier in the following respects: (i) the state-space model does not assume that the compartment processes S(t), I(t) and R(t) are directly observed; they are treated as latent processes to be estimated from the observed data. (ii) The state-space model allows an explicit sampling scheme to be part of the model specification, which enables the quantification of both estimation and prediction uncertainty in the statistical analysis. (iii) The state-space model is built upon compartment probabilities (rates or proportions) that automatically adjust for a potentially varying population size; this conveniently relaxes the constant-population-size condition of the basic SIR model. (iv) The state-space model provides a flexible statistical modelling framework that embraces time-varying model parameters and integrates prior knowledge of the disease mechanism (e.g. an R₀ value from other studies) via prior distributions of the model parameters. (v) The implementation of MCMC methods in state-space modelling provides a powerful approach to parameter estimation and prediction using conditional distributions given the history; this differs from all the estimation methods above, which are always formulated via marginal distributions under strong assumptions about the sampling rules.
A state-space model consists of two stochastic processes: a d-dimensional observation process {Y_t} and a q-dimensional state process {θ_t}, specified as follows. The state process θ₀, θ₁, … is a Markov chain with initial condition θ₀ ~ p₀(θ) and transition (conditional) distribution θ_t | θ_{t−1} ~ g_t(θ | θ_{t−1}). The observation process {Y_t} is conditionally independent given the state process {θ_t}; each Y_t is conditionally independent of θ_s, s ≠ t, given θ_t, with conditional distribution Y_t | θ_t ~ f_t(y | θ_t). This model can be presented graphically by the comb structure shown in the figure. According to Cox et al., the state-space model is a parameter-driven model, in that the processes of compartment proportions are unknown population parameters to be estimated, while a stochastic multi-compartment model such as the stochastic SIR model above is a data-driven model in which the compartment proportions are directly observed. As pointed out earlier, the validity of the latter is questionable in practice, especially in the analysis of COVID-19 pandemic data. Let y_s be the collection of all observations up to time s, namely y_s = (y₁, …, y_s), and let ρ be a generic notation for the set of model parameters. Denote the conditional density of θ_t given Y_s = y_s by f_{t|s}(θ | y_s; ρ). The prediction, filter or smoother density is then defined according to whether t > s, t = s or t < s, respectively. This conditional density f_{t|s}(θ | y_s; ρ) is the key component of statistical inference in state-space models. To develop maximum likelihood inference for the model parameters, the one-step prediction densities f_{t|t−1} are the key ingredients for computing the likelihood function (see Song). Given time series data {y_t, t = 1, …, n}, the likelihood of y_n factorises into a product of conditional densities f(y_t | y_{t−1}), each obtained by integrating the observation density f_t(y_t | θ) against the one-step prediction density f_{t|t−1}(θ | y_{t−1}); by convention, the recursion is initialised at the prior p₀(θ), conditional on an initial observation y₀ at time 0.
In the likelihood evaluation above, the one-step prediction densities f_{t|t−1} and the filter densities f_{t|t} are given respectively by the recursions f_{t|t−1}(θ | y_{t−1}) = ∫ g_t(θ | θ′) f_{t−1|t−1}(θ′ | y_{t−1}) dθ′ and f_{t|t}(θ | y_t) ∝ f_t(y_t | θ) f_{t|t−1}(θ | y_{t−1}), with the recursion starting from f_{0|0} = p₀(θ). In general, exact evaluation of these integrals is analytically unavailable, except in simple situations such as both processes being linear and normally distributed. For the linear Gaussian state-space model, all f_{t|s} are Gaussian, so their first two moments can be derived from the conventional Kalman filtering procedure, as discussed in Song. More generally, at some computational cost, all integrals in the likelihood and the filter can be evaluated numerically by MCMC methods. Recently, Wang et al. have developed an extended SIR (eSIR) model built upon a state-space model with two (d = 2) observed time series of daily proportions of infectious and removed cases, denoted Y^I_t and Y^R_t, which are generated from the q-dimensional underlying infection dynamics {θ_t} governed by a mechanistic SIR model; in the case of the SIR model, q = 3. As shown in the figure, the latent process is a time series of the three-dimensional vector of population probabilities θ_t = (θ^S_t, θ^I_t, θ^R_t)ᵀ, which satisfies a three-dimensional Markov process of Dirichlet form, θ_t | θ_{t−1} ~ Dirichlet(κ f(θ_{t−1})), where the parameter κ scales the variance and the three-dimensional function f(·) is the solution of the SIR system, obtained via the Runge-Kutta approximation, which determines the mean of the Dirichlet distribution. In comparison with the stochastic SIR model above, here the compartment proportions θ_t are unobserved and are explicitly modelled by a Markov process to account for temporal correlation, so the parameter estimation can be carried out with multivariate likelihood functions. Because the serial dependence is accounted for in the state-space model, the resulting estimation and prediction are more powerful than those discussed earlier.
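When the prediction-filter integrals cannot be evaluated in closed form, a standard Monte Carlo alternative to MCMC is the bootstrap particle filter, which approximates the filter densities by propagating, weighting and resampling particles. This Python sketch is generic and not the paper's method; the random-walk transition and Gaussian observation density in the usage example are purely illustrative assumptions.

```python
import math
import random

def bootstrap_filter(y, transition, loglik, particles0, seed=7):
    # Minimal bootstrap particle filter: at each time t,
    # (1) propagate particles through the state transition g_t,
    # (2) weight each particle by the observation likelihood f_t(y_t | theta),
    # (3) multinomially resample to restore equal weights.
    rng = random.Random(seed)
    parts = list(particles0)
    means = []
    for yt in y:
        parts = [transition(p, rng) for p in parts]
        weights = [math.exp(loglik(yt, p)) for p in parts]
        parts = rng.choices(parts, weights=weights, k=len(parts))
        means.append(sum(parts) / len(parts))  # filtered posterior mean
    return means

# Illustrative model: random-walk state, Gaussian observation noise,
# observations held at 1.0 while particles start at 0.0
obs = [1.0] * 30
means = bootstrap_filter(
    obs,
    transition=lambda p, rng: p + rng.gauss(0.0, 0.1),
    loglik=lambda yt, p: -(yt - p) ** 2 / (2 * 0.25),
    particles0=[0.0] * 500,
)
```

The filtered means drift from the prior toward the observations, illustrating how the recursion sequentially updates f_{t|t} as data arrive; for the eSIR model the transition would instead draw from the Dirichlet kernel around the Runge-Kutta solution.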
The two observed time series (y_t^I, y_t^R)^T emitted from the underlying latent infection dynamics θ_t are assumed to follow beta distributions at time t, with respective means θ_t^I and θ_t^R, where θ_t^I and θ_t^R are the probabilities of being infectious and removed at time t, and λ_I and λ_R are the parameters controlling the respective variances of the observed proportions. It is easy to see that y_t^I and y_t^R are conditionally independent given θ_t, with E(y_t^I | θ_t) = θ_t^I and E(y_t^R | θ_t) = θ_t^R, and the parameter set is ϑ = (λ_I, λ_R, κ, β, γ). Because y_t^I and y_t^R share a common latent variable θ_t, their marginal correlation is modelled. In effect, these two beta distributions define a sampling scheme for the observed data, the daily empirical proportions of infectious and removed cases, which are a collection of daily signals from the underlying latent SIR infection dynamics. The state-space model above is useful for assessing the effectiveness of control measures (e.g. social distancing) via the projected epidemic evolution over future time. To do so, one can replace the constant transmission rate β by a time-varying transmission rate β π(t), where π(t) is a given transmission rate modifier. It is specified as a function of time to reflect different forms and strengths of control measures, resulting in the eSIR model proposed by Wang et al., with 0 ≤ π(t) ≤ 1. Obviously, the basic SIR model is the special case with no intervention in place, π(t) ≡ 1. In general, π(t) may be specified by a practitioner to reflect a particular control measure. For the example of COVID-19 in Hubei province, China, one possible choice of π(t) is a step function reflecting government-initiated macro-isolation measures; varying the step values and change dates, as in figure (a)-(c), yields different types of transmission rate modifiers. Alternatively, π(t) can be a continuous function, say π(t) = exp(−λ_0 t) or π(t) = exp{−(λ_0 t)^ν}, λ_0 > 0, ν
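The two families of transmission rate modifiers just described, a step function with change dates versus a smooth exponential decay, can be sketched as follows; the change points and rate values are placeholders, not the Hubei specification.

```python
import math

def pi_step(t, change_points, values):
    """Step-function transmission modifier: values[k] applies once t >= change_points[k].
    Before the first change point the modifier is 1.0 (no intervention)."""
    pi = 1.0
    for cp, v in zip(change_points, values):
        if t >= cp:
            pi = v
    return pi

def pi_exp(t, lam0):
    """Smooth exponential modifier pi(t) = exp(-lam0 * t), lam0 > 0."""
    return math.exp(-lam0 * t)
```

Either function multiplies the baseline transmission rate, giving the effective rate β π(t) at time t; both stay within [0, 1] so the modified rate never exceeds β.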
> 0, that reflects steadily increasing community-level surveillance and personal protection (wearing face masks and washing hands), as shown in figure (d)-(f). Note that this modifier function does not have to be monotonically decreasing and may take a U-shape to capture the relaxation of control measures. With such a modelling framework, one can compare different preventive protocols via the resulting projected infection risk θ^I(t) or other epidemic features, such as the time at which the effective reproduction number falls below one, R_e(t) < 1, and the time of a disease recurrence associated with relaxed control measures. A clear advantage of the state-space model is that it enjoys the resilience of MCMC as the primary method for statistical estimation and prediction: the statistical analysis methods can be easily modified to accommodate changes made in the latent multi-compartment model and/or in the observed time series model. One example in the COVID-19 pandemic modelling of Wang et al. extends the three-compartment eSIR model to a four-compartment model by incorporating the stringent quarantine measures issued by the Hubei government via a newly added in-home quarantine compartment. This new model is termed the susceptible-quarantined-infectious-removed (SQIR) model. The quarantine compartment collects in-home isolated individuals who have no chance of meeting any infectious individuals in the infection system; it is thus another exit from the dynamic system, in addition to the removed compartment. Let φ(t) be the chance of a susceptible person being willing to take in-home isolation at time t. The basic SIR model is then extended to a four-dimensional latent process with compartment proportions θ_t^S, θ_t^Q, θ_t^I, θ_t^R, where θ_t^S + θ_t^Q + θ_t^I + θ_t^R = 1. The quarantine rate φ(t) may be specified as a Dirac delta function with jumps at the times when major quarantine policies are issued by the government.
For example, one may specify the time-dependent quarantine rate function φ(t) for Hubei province as a jump function: at each jump, the respective proportion of individuals leaves the susceptible compartment and enters the quarantine compartment. Figure (g)-(i) shows three different types of in-home quarantine rates during the period of the COVID-19 pandemic in Hubei province. In a similar spirit to the SQIR example of application II earlier, consider an interesting extension of the basic SIR model in the analysis of the US COVID-19 data that includes an antibody compartment to handle the subpopulation of self-immunised individuals. This four-compartment model is termed the SAIR model, discussed in detail earlier. Because the antibody compartment, like the quarantine compartment, is a second exit from the infection system, one can turn the SAIR model into a form similar to the SQIR model, with φ(t) replaced by α(t), the rate of self-immunisation. It is known that the population immunity rate cannot be estimated from observed surveillance data; it needs to be obtained from large-scale serological surveys of the population. Thus, α(t) may be specified as a Dirac delta function (e.g. figure (g)-(i)) with jumps at the times when the surveys are conducted and function values based on the survey results. It is worth pointing out that although the SQIR and SAIR models have very similar structures, their interpretations are very different: the former is applicable to the case of very stringent self-isolation control measures in Hubei, while the latter reflects the situation of self-immunisation under mild control measures in the USA, where a substantial proportion of individuals contracted the virus, recovered and became immunised. Markov chain Monte Carlo has been used extensively for estimation and prediction in state-space models (see, e.g.
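A Dirac-type quarantine (or self-immunisation) rate with jumps can be mimicked in discrete time by moving a fraction of the susceptible compartment at each jump date. A minimal sketch, with hypothetical jump times and fractions:

```python
def apply_quarantine_jumps(theta, t, jumps):
    """Move a proportion of the susceptible compartment into quarantine at jump times.
    theta = (S, Q, I, R); jumps maps a jump time -> fraction of current S entering Q."""
    s, q, i, r = theta
    if t in jumps:
        moved = jumps[t] * s
        s, q = s - moved, q + moved  # mass transfer keeps the proportions summing to 1
    return (s, q, i, r)
```

The same mechanism serves the SAIR model by reading Q as the antibody compartment A and the jump fractions as survey-based self-immunisation estimates.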
Carlin et al.; Chan & Ledolter; Czado & Song; de Jong & Shephard; Zhu et al., for a vast literature on this topic). Such popularity of MCMC in state-space modelling is rooted in its power to handle the evaluation of the high-dimensional integrals involved in the likelihood function. The essential strategy for calculating each high-dimensional integral is to approximate it by a sample mean of the involved integrand, obtained from many MCMC draws from the posterior distributions of the model parameters, including the time series of the latent probability vectors θ_t. Let t_0 be the current time up to which we have observed data. At the m-th iteration, one draws θ_t^(m) from the posterior h(θ_t | θ_{t-1}^(m), ϑ^(m)) of the q-dimensional latent process and updates ϑ^(m) according to the observed process, at t = 1, ..., t_0, t_0 + 1, ..., T, respectively. Prior distributions are specified for some of the hyperparameters; for example, θ_0 follows a Dirichlet distribution centred at the initial observed proportions (1 − y_0^I − y_0^R, y_0^I, y_0^R), R_0 = β/γ and γ follow log-normal distributions, and λ_I, λ_R and κ follow gamma or inverse gamma distributions, respectively. Convergence diagnostics for the MCMC algorithm may use standard tools, such as the Gelman-Rubin statistic based on multiple chains with different initial values, monitoring trace plots of the model parameters and so forth; the R package coda provides a comprehensive toolbox of convergence diagnostics (Brooks & Gelman). Using the MCMC draws collected after the burn-in, various summary statistics may be obtained to estimate model parameters, conduct inference and make predictions. The summary statistics (e.g. posterior mean and posterior mode) from the in-sample draws of the model parameters provide point estimates and 95% credible intervals, with the left and right limits set respectively at the 2.5th percentile and the 97.5
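As one concrete diagnostic, the Gelman-Rubin potential scale reduction factor compares between-chain and within-chain variability of a scalar parameter; values close to 1 suggest convergence. A minimal sketch of the basic (non-rank-normalised) statistic, not the coda implementation:

```python
def gelman_rubin(chains):
    """Potential scale reduction factor R-hat from multiple MCMC chains
    (equal-length lists of draws of one scalar parameter)."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    # pooled posterior variance estimate and the scale reduction factor
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5
```

Chains started from dispersed initial values that have not yet mixed produce R-hat well above 1, flagging the need for longer runs.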
th percentile; those of the observed processes may be used to check the goodness of fit of a proposed model and to perform model selection via the deviance information criterion (Spiegelhalter et al.; Gelman et al.). More importantly, the summary statistics from the out-of-sample draws of the latent process θ_t, t > t_0, provide point predictions and their 95% credible prediction intervals. It is interesting to note that the MCMC implementation described earlier does not depend much on the form of the Runge-Kutta solution f(θ_{t-1}, β, γ) in the latent process. As long as a mechanistic infectious disease model has an approximate analytic solution f(·), Bayesian estimation and inference can be carried out using MCMC. Such flexibility is appealing for developing software applicable to a broad range of practical studies. MCMC procedures are well suited for estimation and inference in the setting of state-space models because of their fast and reliable numerical performance. For the Michigan data analysis example discussed later, using an average personal computer, all MCMC calculations, with thinning applied after the burn-in and convergence judged from four separate chains, complete within hours. This computing speed can be improved by using high-performance computing facilities and/or some recent posterior sampling methods. As suggested by Zhou and Ji for a state-space SIR model, one may construct a more efficient sampler over highly correlated posterior spaces by a parallel-tempering MCMC algorithm (Geyer), which provides rapid mixing of the MCMC chains. Also, along the line of online learning, sequential Monte Carlo methods for posterior sampling (Doucet et al.; Dukic et al.) are promising, as they permit efficient updating of existing posteriors with sequentially arriving data, in the hope of avoiding refitting the model by running MCMC from scratch on the updated complete data. Wang et al.
and collaborators have developed a series of extended SIR models by introducing a time-varying transmission rate, a quarantine process and an asymptomatic immunisation process (details given earlier). The proposed methods have been implemented in an open-source R package, eSIR, available on GitHub (https://github.com/lilywang /esir). This package calls rjags to generate MCMC chains and retains a few MCMC controllers from rjags. The package is also updated weekly with newly summarised US state-level count data for the COVID-19 pandemic. Several robust methods developed specifically for COVID-19 prediction are cited by the Centers for Disease Control and Prevention (https://www.cdc.gov/coronavirus/ -ncov/covid-data/forecasting-us.html). To name a few, the Bayesian approach (Verity et al.) developed by researchers at Imperial College London (featured in Adam) and the hybrid modelling approach (IHME COVID-19 Health Service Utilization Forecasting Team & Murray) adopted by the University of Washington Institute for Health Metrics and Evaluation (IHME) (discussed by Jewell et al.) have attracted great public and government attention; we refer to the original work for modelling details. It is difficult to appreciate the original work and the comments that followed without running real COVID-19 data through the authors' software, which is lacking for the IHME models, among some others. To increase research transparency, releasing the software or computing code used in statistical methods to the public is strongly encouraged. We now illustrate the use of the R package eSIR to analyse COVID-19 surveillance data from March to June from Michigan state, USA. The Michigan data used in this analysis are listed in the appendix, including both I(t) and R(t). In the data analysis, we demonstrate the use of both the state-space model described in application I and the MCMC method, where the transmission rate modifier π(t) is set as an exponential function.
From the package eSIR, we can extract many useful statistics related to estimation and forecasting. For example, we can obtain both mean and median projections of the prevalence curve θ^I(t), t > t_0, as well as their 95% credible prediction intervals. In addition, this package provides the estimated first and second turning points of an epidemic: the former is the time when the daily number of new infectious cases stops increasing, while the latter is the date when the daily number of new infections becomes zero. Mathematically, the first corresponds to the time t at which the gradient of the prevalence rate θ̇_t^I is zero, and the second to the time t at which the rate of prevalence itself is zero, θ̇_t^I = 0. The data analysis is performed with an R script using the following settings. We consider a time-dependent declining transmission rate with modifier π(t) = exp(−λ_0 t), where the parameter λ_0 is chosen so that the modifier reaches a target value in May; this value is determined based on the social distancing scoreboard posted by Unacast, Inc. (https://www.unacast.com/covid /social-distancing-scoreboard). One needs to set exponential = TRUE to activate this setting. Alternatively, as shown in figure (a)-(c), one may use a step function by providing a vector of π values in pi0 and the corresponding vector of change dates in change_time. In the main function, we let the starting date be in March and conduct estimation and projection some days ahead (controlled by T_fin) from June onwards. We run four separate MCMC chains with different initial values, each retaining draws at a fixed thinning interval (thn, a thinning operation to reduce autocorrelation) after the burn-in draws are dropped. With this relatively generous setting, we expect good convergence and reliable quantification of prediction uncertainty using sample quantiles. There are two different prior settings for the sensitivity analysis.
One follows the example settings earlier, with the prior means for the log-normal distributions of the basic reproduction number and the removal rate, and hence the implied mean transmission rate, set at one set of values; the other uses a different set of values. The two distinct settings give similar estimation and forecast results, as can be seen in the figure, and their estimated reproduction numbers and 95% credible intervals are similar, which is reassuring considering that the prior settings are quite different. The Gelman-Rubin statistics developed by Gelman and Rubin are close to 1 (data not shown). Both pieces of evidence, as well as stationary trace plots, support the convergence of the MCMC chains. The Michigan COVID-19 data have been preprocessed to smooth away some unnatural gaps caused by the clustered reporting issue discussed in the appendix. The figure shows an adequate model fit, where all observed numbers of confirmed infections fall within the 95% in-sample credible intervals of the prevalence θ_t^I up to June. In contrast, the 95% out-of-sample credible intervals of the projected proportion y_t^I are much wider, reflecting the significant amount of uncertainty in the prediction; this uncertainty grows as time moves further away from the present. Despite the large uncertainty, the projected mean and median prevalence curves show a decreasing trend over time, which suggests that social distancing works to mitigate the epidemic in Michigan, although the rate of improvement is moderate. Also, the fact that the two estimated turning points occurred before June is another piece of evidence for the positive effect of the series of social distancing orders issued by the state governor since March. Model diagnosis is an important part of a statistical analysis, and it is typically conducted using various residual plots.
As an illustration, in this Michigan data analysis, let θ̂_t be the posterior means over the period March to June, and consider residuals of the two observed processes defined relative to these posterior means. The residual autocorrelation plots show a dominant lag-1 autocorrelation and no additional significant autocorrelations beyond the lag-1 dependence, which is consistent with the assumption that the three latent processes are first-order Markov processes. All mechanistic models discussed in the previous sections are useful for analysing infection dynamics in a large population, such as a country or a state, in which most model parameters may be assumed homogeneous and representative of the entire population. This type of macro-modelling approach is particularly valuable in the early phase of a disease outbreak, when the national public health administration aims to come up with nationwide macro-intervention protocols from very limited amounts of relevant data. Once an epidemic evolves into its middle phase, with more and more surveillance data collected from local communities, a macro-model is no longer suitable for an in-depth analysis of micro-infection dynamics, owing to the existence of substantial heterogeneity across local communities. This section reviews significant extensions of infectious disease models that incorporate spatial heterogeneity across geographical locations into modelling and analysis. The focus is on the recent development of integrating the classical spatial cellular automata (CA) (von Neumann & Burks) with the previously discussed temporal multi-compartment models, leading to an important class of spatio-temporal multi-compartment models that is useful for predicting local infection risk. Technically speaking, the majority of existing macro-mechanistic models for the spread of infectious disease are based on the assumption that the system is homogeneous in space.
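The residual diagnosis above rests on sample autocorrelations. A minimal sketch of the lag-k sample autocorrelation used in such checks (a textbook formula, not the package's diagnostic code):

```python
def acf(x, lag):
    """Sample autocorrelation of the series x at the given lag:
    normalised sum of cross-products of mean-centred values."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom
```

Applied to the fitted residuals, a large value at lag 1 with negligible values at higher lags supports a first-order Markov structure for the latent processes.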
This means that spatial characteristics, which could potentially play a non-trivial role in the development and outcome of disease infection, are not taken into consideration. This is a valid assumption if the population vulnerable to the infectious disease is well mixed and the human interventions (e.g. vaccination strategies) are homogeneous across spatial locations. In reality, however, there exists substantial heterogeneity in urbanisation, ethnic distribution, political views, governance and economic composition across subgroups of individuals distributed over geographical locations, all of which influence the spread of infectious disease and make the earlier macro-mechanistic models inappropriate for addressing the dynamics spatially. One possible extension is to utilise partial differential equations (PDEs) (Murray et al.), in which the assumption of spatial homogeneity is relaxed to allow area-specific spread patterns of epidemics. As noted in the literature, one limitation of PDEs is that this approach ignores the fact that infectious disease spreads through person-to-person interactions rather than through a continuous population; thus, PDEs may lead to impractical results about the dynamics of an epidemic (Mollison). A natural strategy is to embrace a micro-model mimicking an interacting particle system, and CA is one of the well-studied systems with the strength of modelling spatially varying infection dynamics. Originating in the works of von Neumann and Burks and of Ulam, the CA paradigm has been used in many applied fields, including the modelling of infectious diseases. When applied to model spatial variations of epidemic spread, CA has three distinctive features: (i) it treats individuals as discrete entities in order to study person-level movements in the infection dynamics.
This high-resolution paradigm necessitates the incorporation of individual heterogeneity, such as residential address, age, race, pre-existing medical conditions and others, into the modelling. In surveillance data, geographical information is publicly available (e.g. the county in which an individual lives), so it is feasible to utilise this variable in the extension of the macro-mechanistic model. (ii) CA allows the introduction of local stochasticity; for example, the CA paradigm may be built upon a person-to-person infection mechanism if individual-level information is available, or otherwise upon a group-to-group infection process. (iii) CA is formulated on a network of particles (e.g. individuals, groups, villages and counties) with certain rules of connectivity and stochastic laws of disease transmission; this network topology is well suited for computation and simulation. Because of these unique advantages, the CA paradigm has been employed by researchers as an efficient method to study spread patterns of epidemics (Beauchemin et al.; Ahmed & Agiza; Boccara et al.; Quan-Xing & Zhen; Fuks & Lawniczak; Willox et al.; Rousseau et al.; Sirakoulis et al.; Fuentes & Kuperman; Liu et al.; Yakowitz et al.; Sun et al.). In the modelling of infectious diseases, the basic CA formulation involves three primary components: (i) a two-dimensional array of cells (e.g. an age group or a county) that contain the groups of individuals under study, with each individual belonging to one cell; (ii) a set of discrete states (e.g.
susceptible, self-immunised, contagious, recovered and dead) that describe the different conditions of individuals during an epidemic; and (iii) specific rules or updating functions that determine, spatially, how local interactions of a target cell with its neighbouring cells can influence and change the states of individuals in the target cell; through these rules, all cells in a CA system achieve a global propagation of infection status updates. In applying CA, determining the neighbouring cells is tricky, and different types of neighbourhood topology have been proposed in the literature, including the von Neumann neighbourhood, the Moore neighbourhood, the MvonN neighbourhood and the extended neighbourhood (Hasani & Tavakkoli) (see the figure for an example of these four neighbourhood types). In the modelling of influenza A viral infections, Beauchemin et al. use a simple two-dimensional CA model to investigate the influence of spatial heterogeneity on viral kinetics. Their study population consists of two cell species: the epithelial cells, which are the target of viral infection, and the immune cells, which fight the infection. The CA model is built upon a two-dimensional square lattice with the Moore neighbourhood (see figure (b)), where the condition of a given cell is influenced only by the eight closest cells around it. The set of states for an epithelial cell includes healthy, infected, expressing, infectious or dead, while an immune cell can be in either of two states, virgin or mature. The decision rules for updating the CA system are governed by parameters such as infect_rate, which models the probability of a healthy epithelial cell becoming infected by contact with each infectious nearest neighbour. Detailed updating functions are discussed in Beauchemin et al.
Simulations show that the proposed CA model is sophisticated enough to reproduce the basic dynamic features of cell-to-cell infection. Different from the modelling of influenza A viral infection above, Fuks and Lawniczak propose a lattice-gas CA that is closely connected to an SIR framework of an epidemic, in which the interaction patterns of individuals are modelled. It is assumed that the status of an individual moves between three types, susceptible, infectious and recovered, denoted {S, I, R}. The space in which the epidemic takes place is a group of regular hexagonal cells; individuals are located at the centre of each cell and can move through channels created by connecting the centres of adjacent cells. The evolution of the CA occurs at discrete time steps under the operation of three basic functions: contact C, randomisation R and propagation P. Under the contact function C, a susceptible individual becomes infected with probability 1 − (1 − β)^{n_I}, where β is the transmission rate and n_I is the number of infectious individuals within the same cell; meanwhile, an infectious individual recovers with probability γ, where γ is the recovery rate. The function R randomly assigns individuals in each cell to move through the channels, which models the mixing process of individuals. In the final propagation step, individuals simultaneously move to the cells to which they were randomly assigned by R. In addition to the basic epidemic dynamics modelled by the proposed lattice-gas CA, Fuks and Lawniczak also study the effect of a heterogeneous spatial distribution of individuals in states S, I and R, and the influence of different types of barriers in controlling the spread of an epidemic. In the CA-SIR model considered next, β is the population macro-transmission rate and γ is the population macro-recovery rate.
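The contact function C can be sketched directly from the stated probabilities: a susceptible individual is infected with probability 1 − (1 − β)^{n_I}, and an infectious one recovers with probability γ. A minimal single-cell sketch (the state labels and parameter values are illustrative, not the original lattice-gas implementation):

```python
import random

def contact_step(cell_states, beta, gamma, rng=None):
    """One contact step C for a single cell of a lattice-gas CA: each 'S' becomes
    'I' with probability 1 - (1 - beta)**n_I, where n_I counts 'I' individuals in
    the same cell; each 'I' becomes 'R' with probability gamma."""
    rng = rng or random.Random(0)
    n_i = cell_states.count('I')
    p_inf = 1.0 - (1.0 - beta) ** n_i
    out = []
    for s in cell_states:
        if s == 'S' and rng.random() < p_inf:
            out.append('I')
        elif s == 'I' and rng.random() < gamma:
            out.append('R')
        else:
            out.append(s)
    return out
```

Note that with n_I = 0 the infection probability is zero, so a cell with no infectious individuals can only change through the randomisation and propagation steps that mix individuals across cells.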
First, when the neighbourhood set V = ∅, that is, an empty set, the CA-SIR model for cell (i, j) reduces to a cell-level SIR model similar to the basic one. Second, the numerator N_{i+p,j+q} θ^I_{i+p,j+q}(t − 1) is the expected number of infectious cases yesterday (time t − 1) in a neighbouring cell (p, q) ∈ V whose cell population is N_{i+p,j+q}; the corresponding ratio is thus an empirical probability that a person in cell (i, j) randomly runs into a contagious person from its neighbouring cell (p, q). Third, this random chance is weighted by a factor of inter-cell connectivity, denoted ω^{(i,j)}_{pq}: the stronger the tie of cell (i, j) with cell (p, q), the higher the likelihood that a person from cell (i, j) runs into contagious individuals in cell (p, q). Fourth, summing all such likelihoods gives the total likelihood that an individual from cell (i, j) runs into virus carriers from any of the neighbouring cells. A typical form of the inter-cell connectivity coefficient is ω^{(i,j)}_{pq} = c^{(i,j)}_{pq} m^{(i,j)}_{pq}, where c^{(i,j)}_{pq} and m^{(i,j)}_{pq} are broadly defined as a connection factor and a movement factor, respectively; they characterise the inter-cell mobility, or how easily individuals can move between the centre cell and its neighbouring cells. This CA-SIR system, which is integrated with the SIR model, can serve as a basis for the development of useful algorithms to emulate real-world epidemic infection spatially. Assume that there is a square array of cells holding the population under study for a certain epidemic, with the target cell at the centre (see the figure). We illustrate the predicted risk of infection with COVID-19 for all counties in Michigan state using the state-space model with the mechanistic CA-eSAIR latent process.
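The neighbour-coupling term described above, a connectivity-weighted sum of the empirical probabilities of meeting an infectious person from each neighbouring cell, can be sketched as follows; the cell labels, populations and weights are hypothetical.

```python
def neighbour_infection_prob(pop, theta_i_prev, weights, centre, neighbours):
    """Total likelihood that a person in the centre cell meets contagious
    individuals from its neighbouring cells: sum over cells in V of
    omega * (N * theta_I(t-1)) / N, i.e. the connectivity-weighted empirical
    probability of encountering an infectious person in each neighbour."""
    total = 0.0
    for cell in neighbours:
        # expected number of infectious individuals yesterday, over cell population
        expected_infectious = pop[cell] * theta_i_prev[cell]
        total += weights[(centre, cell)] * expected_infectious / pop[cell]
    return total
```

With V empty the sum is zero and the update collapses to the within-cell SIR dynamics, matching the first remark above.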
In the first step, we apply the MCMC method to estimate the model parameters (β and γ) and the vector of four probabilities θ_t of being susceptible, self-immunised, infectious and removed, by fitting the eSAIR model to the state-level surveillance data since March. This can be done easily using the R package eSIR, illustrated earlier. Both the antibody rate function α(t) and the transmission rate modifier π(t) are pre-specified using other data sources, with details given in the succeeding text. After obtaining the estimates of the model parameters, we use them as initial values to make county-level risk predictions with the CA-eSAIR model. In this example, we consider only a one-day-ahead infection rate prediction in May for all the counties c in Michigan, namely θ^I_c(t + 1). Given how fast the COVID-19 pandemic evolved in the state of Michigan in early May, this kind of short-term forecast or nowcast is of great interest to the Michigan government for timely decision making on either extending an existing governor's 'stay-at-home' order or relaxing this executive order. To perform the prediction, one important task is to specify the inter-county connectivity coefficient ω_{cc'}(t). As discussed earlier, it is challenging to define ω_{cc'}(t) objectively, as it involves many variables. In this illustration, we specify this coefficient as ω_{cc'}(t) = m_{cc'} exp{−η r(c, c')}, where η is a tuning parameter to be determined. Briefly speaking, the first factor m_{cc'} is the inter-county mobility factor characterising the decrease in human encounters in terms of potential movements between counties, which has been published online (https://www.unacast.com/covid /social-distancing-scoreboard). The second factor r(c, c') is a travel distance between two counties c and c' in terms of both geodesic distance (Karney) and 'air distance' based on the accessibility of nearby airports.
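The connectivity coefficient ω_{cc'}(t) = m_{cc'} exp{−η r(c, c')} and the tuning of η against a prediction-error criterion can be sketched as follows; the candidate grid and the error function are placeholders for the county-level weighted absolute prediction error described in the text.

```python
import math

def connectivity(m_cc, r_cc, eta):
    """Inter-county connectivity omega = m_{cc'} * exp(-eta * r(c, c')):
    mobility factor damped exponentially in travel distance."""
    return m_cc * math.exp(-eta * r_cc)

def tune_eta(candidates, prediction_error):
    """Grid search: pick the eta minimising a one-step-ahead prediction-error
    criterion, supplied here as a callable eta -> error."""
    return min(candidates, key=prediction_error)
```

Larger η shrinks the influence of distant counties faster, so the tuning step effectively learns how far infection pressure travels between counties.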
In addition, the tuning parameter η adjusts the scale of the travel distance by minimising the sum of (county-level) weighted absolute prediction errors for the one-step-ahead risk prediction of the infection rate. Besides the specification of the connectivity coefficient ω_{cc'}(t), the self-immunisation rate α_c(t) is calculated based on the results of the New York statewide antibody test surveys released by the New York governor Andrew Cuomo in April (New York State report), and the transmission modifier function π_c(t) is specified by the effectiveness score of state-specific social distancing derived from cell phone data in the USA by the Transportation Institute at the University of Maryland (https://data.covid.umd.edu/). Additional details on the determination of m_{cc'}, r(c, c'), α_c(t) and π_c(t) and on the tuning of η can be found in the original work. Figure (a) shows the one-day-ahead projected infection rates for the counties in Michigan in May, and figure (b) plots the corresponding county-level weighted prediction errors (WPE) for the counties. The R package CA-eSAIR is available on GitHub (https://github.com/leyaozh/ca-esair). In this paper, we have presented the basics of multi-compartment infectious disease models from both deterministic and stochastic perspectives. We emphasise the probabilistic extension of mechanistic models, which opens the door to a suite of statistical modelling techniques while still preserving the infectious disease dynamics of multi-compartment models. Within the stochastic modelling framework, both the frequentist and the Bayesian schools of modelling considerations and statistical methods are visited, along with a high-level review and illustrative examples.
Epidemic models have played a key role over the past century in providing understanding of past and ongoing infectious diseases, and it is our belief that they will continue to be valued and improved to help us better understand the current COVID-19 pandemic as well as future infectious diseases. We conclude with several remarks on future directions of stochastic infectious disease modelling. Although publicly available surveillance data are useful for building preliminary models to understand the spreading patterns of infectious diseases, their data quality, in terms of measurement biases and under-reporting, is known to be an outstanding issue that significantly impacts the validity of statistical analysis results (Angelopoulos et al.). This is indeed an open problem to date with no appropriate solution yet. With no assurance of reliable data, statistical methods, whether macro-models or micro-models, would fail to produce meaningful results. One potentially promising solution to this fundamental concern is to build reliable and well-validated open-source benchmark databases that include not only traditional surveillance data but also personal clinical data from various sources, such as hospital electronic health records, drug trials and vaccine trials. In addition, data from serological surveys and data from mobile devices and the like are also useful for increasing information resolution and reliability, removing major measurement biases and calibrating data analytics. This task also requires efforts in data integration and international collaboration. Research on the COVID-19 pandemic certainly gives rise to a new opportunity for developing data integration methods that not only address the challenges of data multi-modality but also overcome many data-sharing barriers and data confidentiality concerns.
The population of self-immunised individuals is a significant source of bias in COVID-19 surveillance data; such individuals are never captured by public health monitoring systems. According to survey results (New York State report), a substantial percentage of individuals in the city of New York have tested antibody positive for the coronavirus. This simply means that a nationwide serological survey is a must in order to come up with an appropriate assessment of the underlying epidemiological features of the COVID-19 pandemic in the USA. The design of such a nationwide serological survey is a challenging statistical problem, and solving it requires innovative ideas and methods; for example, a cost-effective design that pools several serum samples into one pooled test (e.g. Gollier & Gossner), or an efficient design of hierarchical stratified survey sampling schemes. The SAIR model introduced earlier presents a basic framework for statistical models incorporating antibody serological surveys into the multi-compartment dynamics of infectious diseases. Large-scale tracking data have played an important role in evaluating the effectiveness of social distancing in communities. The precision of intervention efficacy estimates helps improve both estimation and prediction, which directly impact government decisions on tightening, extending or lifting control measures. One emerging data source pertains to the information in real-time cell phone locations, which allows better contact tracing so that individual data sequences can be recovered and used for modelling personal risk and regional hotspots.
A research group at the University of Maryland (https://data.covid.umd.edu/) proposes several algorithms to process US cell phone data and extract key features of personal mobility, including location identification, trip identification, imputation of missing trip information, a multilevel data weighting scheme, comprehensive trip data validation, and data integration and aggregation (Ghader et al.). However, these types of data are proprietary and subject to issues of personal privacy (Ienca & Vayena). Integrating such data, or summary statistics derived from them, into infectious disease models should be encouraged, but in a cautious and responsible manner. In this field, statistical learning methods with differential privacy (Dwork) are of great interest.

Statistical methodologies have been greatly challenged by the modelling and analysis of infectious diseases; almost every troubling methodological issue known in the statistical literature surfaces, which presents new opportunities for statisticians and data scientists to develop innovative solutions. Among many challenges, we emphasise a few of critical importance that may easily be ignored in new methodological development. We strongly advocate the urgent need to build models that are transparent and reproducible (Peng). As most methods and models for the COVID-19 pandemic are fairly recent and many have not yet been carefully peer reviewed, researchers should document the sources of the data used, data preprocessing protocols, source computing code and sufficient modelling details to allow external validation by the public. Such details are also necessary to allow others, who may have better quality data but insufficient statistical expertise, to easily adopt new methodologies and obtain high-quality results.
As mentioned in an original post by Dr Nilanjan Chatterjee (https://link.medium.com/hquqilead ), transparency, reproducibility and validity are three criteria by which to assess and assure the quality of prediction models. His essay also mentions the difficulty of reproducing the IHME work so as to obtain accurate predictions and appropriate confidence intervals. Similar to the IHME method, which has no software available, Gu's method for COVID-19 prediction (https://covid19-projections.com/), which has recently received much attention, unfortunately does not provide software either. Without clear guidance and full reproducibility, even models that currently do well may fail in the future, because predictions rely on certain extrapolation assumptions that need to be unveiled to the scientific community with full transparency for validation and comparison. Given that model projections for the COVID-19 pandemic have changed dramatically from day to day, primarily because the underlying models are changing, the primary aim may be set at optimising prediction models for nowcasting or short-term projections, while remaining aware of probable worst-case scenarios for longer-term trends. As shown in the data example in Section , the optimal tuning parameter is determined by the minimal short-term, day-ahead prediction error. As pointed out by Huppert and Katriel, transmission models with different underlying mechanisms may lead to similar outcomes in one context (e.g. the short term) but fail to do so in another (e.g. the long term). The further we project, the more uncertain we are about the validity of the model assumptions. Hence, extra caution is needed when reporting and interpreting long-term projection results.
With the available surveillance data, making a nowcast of infection risk in the next few hours is difficult; but it may become feasible when certain sources of local information are accessible, such as electronic health records from local hospitals, viral testing results from local testing centres and mobile tracking data from individual cell phones. This requires a finer-resolution prediction machinery, which might be established by generalising the CA to certain spatial point processes. Despite being challenging, such a prediction paradigm would be very useful and is worth serious exploration.

Because of potential bias in surveillance data, whether delayed reporting of infected cases or inaccurate ascertainment of deaths caused by the virus, the data contain many measurement errors. This calls for statistical methods that can directly handle various data collection biases or are robust to such biases. Little work has been done in this important area of statistical modelling and analysis. In the current literature, model diagnostics for infectious disease models are largely lacking. Given that most existing mechanistic models are based on particular parametric distributions (e.g. Poisson processes), checking model assumptions is required; for the proposed Poisson process, for example, the assumptions of incremental independence and overdispersion should be checked. In addition, procedures for validating prediction accuracy are also important, in which the choice of test data is tricky and needs to be guided by objective criteria. A major weakness of the existing mechanistic models is their inflexibility in adding individual or subgroup covariates (e.g. age and race). The current strategy for handling these extra variables is stratification, which ends up with strata of small sample sizes, so that subsequent statistical analyses lose power in both estimation and prediction of infection dynamics.
An extension of the CA seems promising, as the CA presents a system of particles distributed in different cells (or strata), where individual characterisations of particles may be added via covariates. The resulting model could assess and predict personal risk, as well as identify hotspots of new infection. This is worth serious exploration in the future when appropriate data become available (e.g. electronic health records from hospitals).

For a global pandemic such as COVID-19, which affects countries all over the world, an integrative analysis is appealing for understanding common features of the pandemic and for comparing different control measures. Given that a pandemic typically evolves with a certain time lag across regions, experiences from countries with earlier outbreaks may be shared with countries with later outbreaks, and statistical methods may borrow such relevant information to set up prior distributions in model fitting. For example, the reproduction number estimated from European COVID-19 data may serve as a hyperparameter in the statistical analysis of US COVID-19 data. There is a clear need for more comprehensive meta-analysis methods that integrate data from different countries more fully than merely using the data to create hyperparameters. Along this line, one of the earliest attempts is to combine COVID-19 forecasts from various research teams using ensemble learning (see, e.g. https://github.com/reichlab/covid19-forecast-hub). Most investigative efforts by quantitative researchers have been relatively independent and academic, and it is high time that policymakers and stakeholders were involved and played an active role in such modelling efforts. Long-term projection of COVID-19 is most sensitive to, and highly dependent on, public health policy. A major source of uncertainty is the conflicting demands of public health (disease mitigation) and the need to sustain economic growth (livelihood), and the balance between the two is a moving target.
One way to account for this modelling uncertainty is to factor in economic planning as a time-varying modifier of projection models. Although some efforts have been made to incorporate economic data, most are retrospectively oriented, and we believe more effort should be spent on prospectively incorporating expert input and economic forecasts. This is a research area of great importance worth serious exploration.

We would like to close this review by posing a few open questions of great interest to the public (at least to ourselves) that statisticians may help answer with existing data or with new data collected through innovative study designs. We also hope that these questions motivate new methodological developments.

Question 1: How should researchers assess both the timing and the strength of a second wave of the COVID-19 pandemic? Would a second wave be worse than the first? Answers to these questions need a relatively accurate long-term prediction of the infection dynamics. Among the many statistical models able to predict future spreading patterns, we need to identify the few, or combinations of them, that are particularly useful for long-term prediction.

Question 2: As many countries and regions start to reopen business, how should governments monitor the likelihood of a recurring surge of COVID-19 caused by business reopenings? Does social distancing help reduce a potentially rising risk? Answers to these questions require adequate data that may not be easily collected by routine approaches. Statisticians may work with practitioners to develop good sampling instruments and schemes for community risk surveillance.

Question 3: Is face mask wearing protective? If so, how can compliance with face mask wearing be assessed? Questions about the causal effect of face mask wearing on disease progression are very challenging, because there is no randomisation in the intervention allocation and many confounding factors are unobserved.
Question 4: Is there evidence that the contagion of the coronavirus decays over time because of an increasing recovery rate of virus carriers and a decreasing rate of case fatality? Statisticians ought to work out thoughtful and convincing answers for the public.

In such surveillance data, there are data reporting gaps, shown in Figure a, that are possibly caused by so-called clustered reporting; that is, recovered cases have not been released on a daily basis. To mitigate this reporting artefact, we invoked a simple local polynomial regression procedure (loess) to smooth the unnatural jumps, resulting in the smooth fitted curve shown in Figure a. The calibrated cumulative numbers of removed cases from the fitted curve (rounded to the corresponding integers) are available from the corresponding author upon request. The total population in Michigan is set as . million. The summarised US state-level count data, which are updated weekly, can also be found directly from the eSIR package introduced in Section .

References (entry titles as recovered from extraction; years and author details were lost):
- Special report: the simulations driving the world's response to COVID-19
- On modeling epidemics including latency, incubation and variable susceptibility
- An introduction to stochastic epidemic models
- Infectious diseases of humans: dynamics and control
- Stochastic epidemic models and their statistical analysis
- On identifying and mitigating bias in the estimation of the COVID-19 case fatality rate
- The mathematical theory of infectious diseases and its applications. Charles Griffin & Company Ltd: a Crendon Street
- Estimating excess -year mortality associated with the COVID-19 pandemic according to underlying conditions and age: a population-based cohort study
- Absolute humidity, temperature, and influenza mortality: years of county-level evidence from the United States
- A simple cellular automaton model for influenza A viral infections
- On a general stochastic epidemic model
- An estimation procedure for household disease data
- Statistical studies of infectious disease incidence
- Real time Bayesian estimation of the epidemic potential of emerging infectious diseases
- A probabilistic automata network epidemic model with births and deaths exhibiting cyclic behaviour
- Stochastic epidemic models: a survey
- Stochastic epidemic models with inference
- General methods for monitoring convergence of iterative simulations
- Numerical methods for ordinary differential equations
- A Monte Carlo approach to nonnormal and nonlinear state-space modeling
- Monte Carlo EM estimation for time series models involving counts
- Model parameters and outbreak control for SARS
- The estimation of the effective reproductive number from disease outbreak data
- Locally weighted regression: an approach to regression analysis by local fitting
- A new framework and software to estimate time-varying reproduction numbers during epidemics
- Statistical analysis of time series: some recent developments
- State space mixed models for longitudinal observations with binary and binomial responses
- The simulation smoother for time series models
- The incidence of infectious diseases under the influence of seasonal fluctuations
- The estimation of the basic reproduction number for infectious diseases
- Sequential Monte Carlo methods in practice
- Tracking epidemics with Google Flu Trends data and a state-space SEIR model
- Differential privacy: a survey of results
- Strategies for mitigating an influenza pandemic
- A likelihood-based method for real-time estimation of the serial interval and reproductive number of an epidemic
- Cellular automata and epidemiological models with spatial dependence
- Individual-based lattice model for spatial spread of epidemics
- Composite likelihood EM algorithm with applications to multivariate hidden Markov model
- Bayesian data analysis
- Inference from iterative simulation using multiple sequences
- Markov chain Monte Carlo maximum likelihood. Interface Foundation of North America
- Observed mobility behavior data reveal social distancing inertia
- Group testing against COVID-19
- Clinical characteristics of novel coronavirus infection in China. medRxiv
- A multi-objective structural optimization using optimality criteria and cellular automata
- Temporal dynamics in viral shedding and transmissibility of COVID-19
- Modeling infectious disease dynamics in the complex landscape of global health
- The mathematics of infectious diseases
- Mathematical modelling and prediction in infectious disease epidemiology
- Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next months
- On the responsible use of digital data to tackle the COVID-19 pandemic
- Viral shedding and transmission potential of asymptomatic and paucisymptomatic influenza virus infections in the community
- Caution warranted: using the Institute for Health Metrics and Evaluation model for predicting the course of the COVID-19 pandemic
- Mathematical modeling of diseases: susceptible-infected-recovered (SIR) model
- Algorithms for geodesics
- A contribution to the mathematical theory of epidemics
- Estimating the basic reproductive ratio for the Ebola outbreak in Liberia and Sierra Leone
- The incubation period of coronavirus disease (COVID-19) from publicly reported confirmed cases: estimation and application
- Statistical inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study
- Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
- The reproductive number of COVID-19 is higher compared to SARS coronavirus
- Spatial organization and evolution period of the epidemic model using cellular automata
- Applications of mathematics to medical problems
- Dependence of epidemic and population velocities on basic parameters
- On the spatial spread of rabies among foxes
- On estimating regression
- Amid ongoing COVID-19 pandemic, Governor Cuomo announces results of completed antibody testing study of , people showing . percent of population has COVID-19 antibodies
- The R0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks
- Forecasting seasonal influenza with a state-space SIR model
- Association of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan
- Reproducible research in computational science
- Predictions, role of interventions and effects of a historic national lockdown in India's response to the COVID-19 pandemic: data science call to arms
- Asymptotic properties of some estimators for the infection rate in the general stochastic epidemic model
- Nine challenges for deterministic epidemic models
- Dynamical phases in a cellular automaton model for epidemic propagation
- Temperature and latitude analysis to predict potential spread and seasonality for COVID-19
- Mathematical modeling of infectious disease dynamics
- A cellular automaton model for the effects of population movement and vaccination on epidemic propagation
- Forecasting epidemics through nonparametric estimation of time-dependent transmission rates using the SEIR model
- Correlated data analysis: modeling, analytics, and applications
- Bayesian measures of model complexity and fit
- Phase transition in spatial epidemics using cellular automata with noise
- Tracking reproductivity of COVID-19 epidemic in China with varying coefficient SIR model
- Improved inference of time-varying reproduction numbers during infectious disease outbreaks
- On some mathematical problems connected with patterns of growth of figures
- An overview of composite likelihood methods
- Estimates of the severity of coronavirus disease : a model-based analysis
- Theory of self-reproducing automata
- How generation intervals shape the relationship between growth rates and reproductive numbers
- Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures
- An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China (with discussion)
- Smooth regression analysis
- Modeling epidemics using cellular automata
- Epidemic dynamics: discrete-time and cellular automaton models
- Naming the coronavirus disease (COVID-19) and the virus that causes it
- Cellular automaton modeling of epidemics
- An interactive COVID-19 mobility impact and social distancing analysis platform
- Semiparametric Bayesian inference for the transmission dynamics of COVID-19 with a state-space model
- A spatiotemporal epidemiological prediction model to inform county-level COVID-19 risk in the USA
- Semiparametric stochastic modeling of the rate function in longitudinal studies

The authors are very grateful to the co-editors for the invitation to contribute a review paper on statistical modelling and analysis of infectious diseases and for their helpful feedback towards improving the manuscript. This research is partially supported by National Science Foundation grant DMS .

Although the two applications discussed earlier in Section give a framework for how the CA models the dynamics of epidemic spread, White et al. provide a more direct incorporation of a spatial CA with the temporal SIR compartments at the population level, where each cell stands for a small population (e.g. a county) with different proportions of susceptible, infectious or recovered individuals. The resulting CA-SIR model given in White et al. is formulated by four parts (C, Q, V and f). First, C = {(i, j) : 1 <= i <= r, 1 <= j <= c} defines the cellular space, a collection of r x c cells on a two-way array, where r x c is referred to as the dimension of the cells.
Second, Q represents a finite set that contains all the possible states of a cell. In the case of the SIR model, Q = {S, I, R}, corresponding to the susceptible, infectious and removed states. Third, V = {(p_k, q_k) : 1 <= k <= n} is the finite set of index offsets defining the neighbourhood of each cell; consequently, V_ij = {(i + p_1, j + q_1), ..., (i + p_n, j + q_n)} denotes the set of neighbouring cells for the central cell (i, j). Specifically, V* = V \ {(0, 0)} represents all the neighbouring cells without the cell at the centre of consideration. Fourth, the function f stands for the updating rules that govern the dynamics of interactions between cells in the CA-SIR system. For each cell at a discrete time t (say, today), its current status is described by three cell-specific compartments {pi^S_ij(t), pi^I_ij(t), pi^R_ij(t)}, where pi^S_ij(t), pi^I_ij(t), pi^R_ij(t) in [0, 1] represent the cell-specific probabilities of being susceptible, infectious and recovered, respectively. Clearly, pi^S_ij(t) + pi^I_ij(t) + pi^R_ij(t) = 1, forming a microcell-level SIR model. The CA-SIR model is updated for each cell (i, j) in C according to the transition functions f.

Based on the basic CA-SIR model proposed in White et al., extensions can easily be made to better model the dynamics of infectious diseases using real data. One such extension is a spatio-temporal epidemiological forecast model that combines the CA with an extended SAIR (eSAIR) model to project county-level COVID-19 prevalence over the counties of the continental United States. This model is termed the CA-eSAIR model, in which a county is treated as a cell. To carry out cell-level infection prevalence updates, the relevant macroparameters need to be estimated from the eSAIR macromodel. In comparison with the eSIR model discussed in Section , a new antibody compartment (A) is included in the eSAIR model to account for individuals who are self-immunised and have developed antibodies to the coronavirus.
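The cell-level bookkeeping of a CA with SIR compartment probabilities can be sketched in a few lines. The neighbourhood-coupled infection rule below is an illustrative assumption for the updating function f, not the exact transition functions of White et al., and all parameter values are made up for the demonstration:

```python
import numpy as np

def step(pi_s, pi_i, pi_r, beta=0.3, gamma=0.1, coupling=0.25):
    # One synchronous update of cell-level SIR probabilities on an r x c lattice.
    # Infection pressure from the 4 nearest neighbours (periodic boundary):
    nbr = (np.roll(pi_i, 1, 0) + np.roll(pi_i, -1, 0) +
           np.roll(pi_i, 1, 1) + np.roll(pi_i, -1, 1)) / 4.0
    force = np.clip(beta * (pi_i + coupling * nbr), 0.0, 1.0)
    new_inf = pi_s * force          # S -> I within each cell
    rec = gamma * pi_i              # I -> R within each cell
    return pi_s - new_inf, pi_i + new_inf - rec, pi_r + rec

r, c = 20, 20                       # cellular space C of dimension r x c
pi_s = np.full((r, c), 1.0)
pi_i = np.zeros((r, c))
pi_r = np.zeros((r, c))
pi_s[10, 10], pi_i[10, 10] = 0.99, 0.01   # seed a single cell
for _ in range(100):
    pi_s, pi_i, pi_r = step(pi_s, pi_i, pi_r)
# pi_s + pi_i + pi_r remains exactly 1 in every cell, as the model requires
```

The update conserves the per-cell probability mass by construction, and infection spreads outward from the seeded cell through the neighbourhood term.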
The inclusion of the antibody compartment addresses the under-reporting issue known for available public databases and builds self-immunisation into the infection dynamics; in this way, better estimates of the macromodel parameters can be obtained. The eSAIR model can be described using the following ODEs, which govern the law of interactive movements among the four compartments, or states, of susceptible (S), self-immunised (A), infectious (I) and removed (R):

dS(t)/dt = -beta pi(t) S(t) I(t) - alpha(t) S(t),
dA(t)/dt = alpha(t) S(t),
dI(t)/dt = beta pi(t) S(t) I(t) - gamma I(t),
dR(t)/dt = gamma I(t),

where alpha(t) is the self-immunisation rate, pi(t) is a time-varying transmission rate modifier, beta is the basic disease transmission rate and gamma is the rate of removal from the system (either death or recovery). The eSAIR model above admits an alternative expression based on the compartment probabilities.

In order to apply the CA-eSAIR system to model epidemic spread in the USA, one relaxes the classical CA-eSAIR from spatial lattices (or cells) to areal locations of counties. Let C be the collection of counties. Here we consider the extended neighbourhood type (all counties are neighbours of one another, given the high mobility of the US population). For a county c in C, n_c denotes the county population size, and C_{-c} denotes the set of all other counties except county c. For county c at time t, the county-specific probability vector is denoted by theta_c(t) = (theta^S_c(t), theta^A_c(t), theta^I_c(t), theta^R_c(t))^T. The CA-eSAIR model at discrete times is expressed in an analogous form, where alpha_c(t) is the county-specific self-immunisation rate and pi_c(t) is the county-specific transmission modifier. As with the parameters in the CA-SIR model above, omega_{cc'}(t) is a connectivity coefficient that quantifies inter-county movements between counties c and c'. By applying the proposed CA-eSAIR model, one can produce a t-day-ahead risk forecast of COVID-19 as well as a personal risk related to a travel route.
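Compartment systems such as the eSAIR ODEs are integrated numerically in practice. A minimal sketch of classical fourth-order Runge-Kutta stepping, applied here to the basic SIR system with illustrative parameter values (not values from this paper):

```python
def rk4_step(f, t, y, h):
    # classical fourth-order Runge-Kutta update:
    # y_{n+1} = y_n + h/6 (k1 + 2 k2 + 2 k3 + k4)
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def sir(t, y, beta=0.5, gamma=0.2):
    # right-hand side of the basic SIR system (illustrative parameters, R0 = 2.5)
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

y, h = [0.999, 0.001, 0.0], 0.1
for n in range(3000):               # integrate to t = 300
    y = rk4_step(sir, n * h, y, h)
# by t = 300 the epidemic has burnt out: i is near 0, and s + i + r still equals 1
```

The same stepping routine applies unchanged to the four-compartment eSAIR right-hand side once alpha(t) and pi(t) are specified.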
The Runge-Kutta method is an efficient and widely used approach to solving ordinary differential equations when analytic closed-form solutions are unavailable. It is typically applied to derive a numerical scheme of high-order accuracy without needing high-order derivatives of the functions involved. The most well-known Runge-Kutta approximation is the fourth-order Runge-Kutta (RK4) method. Consider, for example, the mechanistic SIR model written as dy/dt = f(t, y), where y is an unknown function of time t, which can be either a scalar or a vector. Then, for a preselected (small) step size h > 0, a fourth-order approximate solution y_n of y at a sequence of equally spaced grid points t_n, with t_{n+1} - t_n = h, satisfies

y_{n+1} = y_n + (h/6)(k_1 + 2 k_2 + 2 k_3 + k_4), n = 0, 1, ...,

where

k_1 = f(t_n, y_n),
k_2 = f(t_n + h/2, y_n + (h/2) k_1),
k_3 = f(t_n + h/2, y_n + (h/2) k_2),
k_4 = f(t_n + h, y_n + h k_3).

Because the four terms k_1, k_2, k_3 and k_4 are used in the approximation, this method is termed an RK4 solution of the ODE for the function y. For general Runge-Kutta approximations, refer to Stoer and Bulirsch.

In the succeeding text, we list Michigan data from March to June . The numbers of daily confirmed cases and deaths are obtained from the GitHub repository of JHU CSSE (https://github.com/CSSEGISandData/COVID-19), and the daily recovery data are collected from 1Point3Acres (https://coronavirus.1point3acres.com). The daily cumulative numbers of deaths and recovered cases are then summed as the cumulative number of removed cases.

key: cord- - gd hjc
authors: Ma, Junling; Earn, David J. D.
title: Generality of the final size formula for an epidemic of a newly invading infectious disease
journal: Bull Math Biol
doi: . /s - - -
cord_uid: gd hjc

The well-known formula for the final size of an epidemic was published by Kermack and McKendrick in 1927. Their analysis was based on a simple susceptible-infected-recovered (SIR) model that assumes exponentially distributed infectious periods.
More recent analyses have established that the standard final size formula is valid regardless of the distribution of infectious periods, but that it fails to be correct in the presence of certain kinds of heterogeneous mixing (e.g., if there is a core group, as for sexually transmitted diseases). We review previous work and establish more general conditions under which Kermack and McKendrick's formula is valid. We show that the final size formula is unchanged if there is a latent stage, any number of distinct infectious stages and/or a stage during which infectives are isolated (the duration of each stage can be drawn from any integrable distribution). We also consider the possibility that the transmission rates of infectious individuals are arbitrarily distributed (allowing, in particular, for the existence of super-spreaders) and prove that this potential complexity has no impact on the final size formula. Finally, we show that the final size formula is unchanged even for a general class of spatial contact structures. We conclude that whenever a new respiratory pathogen emerges, an estimate of the expected magnitude of the epidemic can be made as soon as the basic reproduction number R_0 can be approximated, and this estimate is likely to be improved only by more accurate estimates of R_0, not by knowledge of any other epidemiological details.

Whenever a serious infectious disease emerges or re-emerges in a human population, a matter of immediate interest is the likely magnitude of the outbreak. This is often called the expected final size of the epidemic (Bailey; Anderson and Watson; Anderson and May; Andersson and Britton; Diekmann and Heesterbeek), which we denote Z. Of course, if the emergent pathogen becomes endemic in the population then there will never really be a "final" size, but in that case the matter at issue is the likely magnitude of the first major wave of cases.
To our knowledge, a formal mathematical argument leading to an estimate of an epidemic's expected final size Z was first published in the landmark paper by Kermack and McKendrick (1927). The formula for Z that Kermack and McKendrick obtained [Eq. ( ) below] depends only on the basic reproduction number R_0 (the expected number of secondary cases caused by a typical primary case in a fully susceptible population). However, without further analysis, it is unclear to what extent this formula depends on the particular assumptions that Kermack and McKendrick made in constructing their model, and hence whether it bears a strong relationship to the final size in realistic situations.

One implicit assumption of the Kermack and McKendrick (1927) analysis is that infectious periods are exponentially distributed. Anderson and Watson showed that the final size formula remains valid if infectious periods follow a gamma distribution. In Section of their book, Diekmann and Heesterbeek generalized this statement to cover an arbitrary distribution of infectious periods. Another key assumption of the Kermack and McKendrick (1927) analysis is that the host population is homogeneously mixed. It is well known that if this assumption is dropped then the final size will not necessarily be given by the standard formula. In particular, the existence of a core group, as for sexually transmitted diseases, gives rise to a different formula for the final size (Anderson and May; Diekmann and Heesterbeek).

In this paper, we explore the generality of the standard Kermack-McKendrick final size formula ( ). We begin with a pedagogical review of the main results that have been obtained previously, adding model structure in steps.
We then proceed to generalize these results in three new directions, showing that the standard formula remains valid (i) regardless of the number of distinct infectious stages, (ii) if the mean contact rate is itself arbitrarily distributed and (iii) for a large class of spatially heterogeneous contact structures. We conclude that the Kermack-McKendrick formula ( ) applies in great generality, making it an extremely useful relationship in practice.

The standard SIR model (Anderson and May; Diekmann and Heesterbeek; Brauer and Castillo-Chavez) is represented by a system of three ordinary differential equations,

ds/dt = -beta s i,
di/dt = beta s i - gamma i,
dr/dt = gamma i.

Here, s, i and r denote the proportions of the population that are susceptible, infectious and recovered, respectively. Recovered individuals are assumed to be immune to reinfection. Since s + i + r = 1, one of the three equations is redundant. The two parameters in the model are the transmission rate (beta) and the recovery rate (gamma). The mean infectious period is T = 1/gamma. The basic reproduction number is R_0 = beta T. The proportion of the population that is infected can increase, and hence an epidemic can occur, if and only if R_0 > 1.

In Eqs. ( ), recruitment of new susceptibles through birth or immigration is ignored, as is loss of individuals through mortality or emigration. This approximation is reasonable provided the timescale of the epidemic is much shorter than the timescale over which demographic turnover occurs. No recruitment ensures that the disease will eventually burn out, i.e., i(infinity) = 0. To see formally that we must have i(infinity) = 0, note that the positive orthant is invariant, so all solutions of Eqs. ( ) lie in the non-negative, bounded set defined by s, i, r >= 0 and s + i + r = 1. Observing that

d(s + i)/dt = -gamma i,

we see that s + i is decreasing whenever i > 0. However, s + i is bounded below by 0; hence it has a limit. Moreover, Eq. ( ) implies that d(s + i)/dt is bounded because i is bounded. Hence lim_{t -> infinity} d(s + i)/dt = 0, so Eq. ( ) yields i(infinity) = 0.
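The burn-out property i(infinity) = 0 is easy to check numerically. The sketch below uses illustrative parameter values and simple Euler stepping (an assumption for the demonstration, not part of the paper's analysis); it also tracks the quantity i + s - (1/R_0) ln s, which is constant along SIR trajectories and underlies the phase-portrait argument:

```python
import math

beta, gamma = 0.5, 0.2              # illustrative values; R0 = beta / gamma = 2.5
R0 = beta / gamma
s, i, h = 0.999, 0.001, 0.001
V0 = i + s - math.log(s) / R0       # constant of motion obtained by integrating di/ds
for _ in range(500_000):            # Euler steps up to t = 500
    ds = -beta * s * i
    di = beta * s * i - gamma * i
    s, i = s + h * ds, i + h * di
# i(t) -> 0 (the epidemic burns out), a positive fraction s remains susceptible,
# and i + s - (1/R0) ln s stays essentially constant along the trajectory
```

The small residual drift in the tracked quantity is the Euler discretisation error; it shrinks with the step size h.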
The model ( ) is sufficiently simple that an exact solution can be obtained for the phase portrait. Forming di/ds and integrating yields

i + s - (1/R_0) ln s = i(0) + s(0) - (1/R_0) ln s(0).

An epidemic ends when no infectives remain. Consequently, we can find the final size Z by setting i = 0 in Eq. ( ) and solving for Z = s(0) - s(infinity). This yields the implicit relation

ln [ s(0) / (s(0) - Z) ] = R_0 [ Z + i(0) ]

(an explicit form is given in Appendix A). The general formula ( ) applies regardless of the proportions of the population that are susceptible and infective initially. A less than fully susceptible population must be taken into account if some individuals have been vaccinated or retain immunity from previous exposures. However, in the important special case where a new pathogen enters a fully susceptible population, we have both i(0) near 0 and s(0) near 1. In the limit i(0) -> 0, s(0) -> 1, this reduces to

Z = 1 - e^{-R_0 Z},

which is the usual final size formula.

The practical utility of the final size formula would appear to be limited by the fact that it is derived from the overly simplistic standard SIR model. In the remainder of this paper we show that, in fact, the formula is valid under very general circumstances. We generalize the model in several steps, in order to explain the ideas clearly and to avoid exposing the reader to a notational burden at the outset.

The standard SIR model ignores any delay between the time an individual is infected and the onset of infectiousness. This latent period is often substantial compared with the infectious period (Anderson and May). In addition, some pathogens (notably HIV) give rise to a sequence of distinct stages of infection, each yielding a different transmission rate (Redfield et al.; Seligmann et al.). It is common to incorporate latency by including an "exposed" class (E), yielding an SEIR model (Schwartz and Smith; Earn et al.).
However, noting that the exposed stage can be regarded as an infectious stage during which the transmission rate happens to be zero, we lose no generality by restricting attention to multiple-infectious-stage models, which we refer to as SI^nR models if there are n infectious stages. Generalizing the SIR model ( ) to include multiple infectious stages, the equations for an SI^nR model are

dS/dt = -S * sum_{i=1}^n beta_i I_i,
dI_1/dt = S * sum_{i=1}^n beta_i I_i - gamma_1 I_1,
dI_i/dt = gamma_{i-1} I_{i-1} - gamma_i I_i  (i = 2, ..., n),
dR/dt = gamma_n I_n,

where I_i is the density of individuals in the ith infectious stage, beta_i is the transmission rate for contact with individuals in this stage, and 1/gamma_i is the mean duration of this stage. The sum of Eqs. ( ) is zero, implying that S + sum_i I_i + R = 1 is invariant (and hence that Eq. ( d) is superfluous, as in the basic SIR model).

To see that the final size is well-defined in this model, we generalize the argument given for the simple SIR model in Section . Let J_k = S + sum_{i=1}^k I_i. Then dJ_k/dt = -gamma_k I_k, so I_k > 0 implies J_k is decreasing, and lim_{t->inf} dJ_k/dt = 0. Hence I_k(inf) = 0 for all k.

Before a new disease is introduced into a population, everyone is susceptible, i.e., S = 1, I_1 = ... = I_n = 0 and R = 0, which is known as the disease-free equilibrium (DFE). The stability of the DFE determines whether the new disease can cause an epidemic. Hyman et al. proved that the quantity R0 = sum_{i=1}^n beta_i/gamma_i determines the stability of the DFE, i.e., the DFE is locally stable if R0 < 1 and unstable if R0 > 1. Note that R0 as defined by ( ) is the sum of the products of the transmission rate and mean duration in each infectious stage, i.e., it is the basic reproduction number for this model.

We now proceed to show that Eqs. ( ) yield the same final size formula as ( ). Anderson and Watson established this for the special case in which each stage has the same transmission rate and duration (beta_i = beta and gamma_i = gamma for all i). Their method can easily be generalized to arbitrary beta_i and gamma_i as follows: let G_k = sum_{i=k+1}^n I_i + R, k = 1, ..., n - 1, and G_n = R. Then Eqs. ( c) and ( d) imply dG_k/dt = gamma_k I_k for each k = 1, ...
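The claim that the SI^nR model yields the same final size, with R0 = sum_i beta_i/gamma_i, can be checked numerically. In the sketch below the parameter values are illustrative choices of ours, and the first stage is given beta = 0 so that it plays the role of a latent stage; the simulated final size is compared with the fixed-point solution of Z = 1 - e^{-R0 Z}:

```python
import math

def si_n_r_final_size(betas, gammas, i0=1e-6, dt=0.002, t_max=1000.0):
    """Forward-Euler integration of an SI^nR model (n sequential infectious stages)."""
    n = len(betas)
    s = 1.0 - i0
    i = [i0] + [0.0] * (n - 1)
    t = 0.0
    while t < t_max and sum(i) > 1e-12:
        force = sum(b * x for b, x in zip(betas, i))   # force of infection
        di = [force * s - gammas[0] * i[0]]
        for k in range(1, n):
            di.append(gammas[k - 1] * i[k - 1] - gammas[k] * i[k])
        s -= dt * force * s
        i = [x + dt * dx for x, dx in zip(i, di)]
        t += dt
    return 1.0 - s

betas = [0.0, 1.5, 0.5]     # stage 1 is a latent stage (transmission rate zero)
gammas = [2.0, 1.0, 1.0]    # mean stage durations 0.5, 1, 1
R0 = sum(b / g for b, g in zip(betas, gammas))   # sum of beta_i times mean stage duration
z = 0.99
for _ in range(500):        # scalar final size for the same R0
    z = 1.0 - math.exp(-R0 * z)
print(R0, si_n_r_final_size(betas, gammas), z)
```

Despite the latent stage and the unequal stage-specific transmission rates, the simulated final size agrees with the one-parameter formula evaluated at R0 = 2.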
, n. Consequently, Eq. ( a) implies that the function S exp(sum_{i=1}^n (beta_i/gamma_i) G_i) is a constant of the motion. For a newly invading disease, S(0) -> 1 and I_k(0) -> 0; recalling that Z = R(inf) = 1 - S(inf), and Eq. ( ), the final size formula Z = 1 - e^{-R0 Z} immediately follows.

Consideration of a special case of the model in the previous section allows us to infer that the distribution of stage durations in any infectious stage need not be exponential. The final size formula ( ) holds true if the stage durations follow a gamma distribution, which has probability density g_{k,phi}(t) = phi^k t^{k-1} e^{-phi t}/(k - 1)!. Here, the shape parameter k is a positive integer and the scale parameter phi > 0. The mean is k/phi and the variance is k/phi^2. This family of distributions includes the exponential distribution (k = 1), nearly normal distributions (k large), and the delta distribution, which yields a fixed duration (k -> inf).

The term -gamma*I in Eq. ( b) for the simple SIR model implies that the durations of infectious periods are exponentially distributed with mean 1/gamma (Brauer and Castillo-Chavez). Since the sum of exponentially distributed random variables is gamma distributed, a standard trick (Anderson and Watson; Lloyd) enables us to replace an exponential distribution with mean 1/gamma by a gamma distribution g_{k,k*gamma} (which also has mean 1/gamma). The trick is to replace a single infectious stage with k identical exponentially distributed substages of mean duration 1/(k*gamma). The model then has precisely the form of the SI^nR model ( ), but with the sequence of infectious stages being mathematical artefacts rather than biologically meaningful. Since this substage trick can be applied equally well to any infectious stage, Anderson and Watson's conclusion that the final size in an SIR model with gamma-distributed infectious periods is given by the usual formula ( ) now generalizes to an arbitrary number of stages, each with gamma-distributed durations.
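The substage trick is easy to verify by simulation: the sum of k independent exponential substage durations of mean 1/(k*gamma) should reproduce the mean 1/gamma and variance 1/(k*gamma^2) of the gamma distribution g_{k,k*gamma}. A minimal Monte Carlo check (the parameter values are ours):

```python
import random

def gamma_stage_duration(k, gamma, rng):
    """Substage trick: a g_{k, k*gamma} duration is the sum of k exponential
    substage durations, each with mean 1/(k*gamma)."""
    return sum(rng.expovariate(k * gamma) for _ in range(k))

rng = random.Random(1)
k, gamma = 4, 0.5            # overall mean duration 1/gamma = 2
samples = [gamma_stage_duration(k, gamma, rng) for _ in range(200000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)             # mean should be near 1/gamma = 2, variance near 1/(k*gamma^2) = 1
```

Increasing k shrinks the variance toward zero, which is the delta-distribution (fixed duration) limit mentioned above.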
Although expressed differently, in their original paper Kermack and McKendrick studied a particular limit of the gamma-distributed multiple-stage model described in the previous section. Suppose we have a sequence of n infectious stages, each with the same fixed stage duration tau; this is the limit k -> inf in the gamma distribution ( ), with phi = k/tau. During stage i, the transmission rate is beta_i (i = 1, ..., n). If we now let the number of infectious stages increase (n -> inf) and the length of each stage decrease (tau -> 0), keeping the total infectious period (T = n*tau) constant, then we arrive at a model in which the transmission rate (beta) varies continuously through the infectious period. In such a model, an individual's infectivity depends on his/her stage-age, i.e., the amount of time since initial infection. Using a stage-age SIR formulation, Kermack and McKendrick derived the final size formula ( ) and showed that its form is independent of how the transmission rate depends on stage-age.

We have seen that the final size formula ( ) is the same if stage durations are distributed according to any member of the family of gamma distributions ( ). This suggests that any distribution of stage durations will yield the same final size. Unlike the situation for gamma-distributed infectious periods, with an arbitrary distribution the time-evolution of the infectious class can no longer be expressed using ordinary differential equations (ODEs). Instead, we must construct a system of integro-differential equations (Feng and Thieme). Let u(t) be the rate at which individuals become infectious (enter class I) at time t >= 0, i.e., u(t) = beta*S(t)*I(t); the equation for the susceptible class is the same as in the standard SIR model. We also define u(t) for t < 0, since this will allow us to specify the initial conditions. For t < 0, u(t) is the rate at which individuals who were still infectious at the initial time (time 0) became infectious at time t.
Hence, the number of infectious individuals at time t = 0 is I(0) = int_{-inf}^0 u(tau) dtau. Let f(t) be the probability density of the infectious period (so, for the special case of a gamma distribution, f(t) is given by g_{k,phi}(t) in Eq. ( )). For an individual who becomes infectious at time tau >= 0, the probability density that s/he recovers (leaves the I class) at time t > tau is f(t - tau). However, for individuals who became infectious at time tau < 0, the probability density to leave class I at time t must be conditioned on still being infectious at time 0; hence l(tau, t) = f(t - tau) / int_{-tau}^inf f(s) ds. The rate of change of the number of infectious individuals at time t is then dI/dt = u(t) - int_0^t u(tau) f(t - tau) dtau - int_{-inf}^0 u(tau) l(tau, t) dtau. Those who leave class I enter class R. In Appendix B, we show that if the infectious period is exponentially distributed then Eqs. ( ) reduce to the standard SIR model ( ).

As in the standard SIR model ( ), the population size is invariant (S + I + R = 1), S >= 0, and S(inf) exists. Establishing that I(inf) = 0 (so the disease will burn out regardless of the infectious period distribution) requires a little more work. Since f(t) and l(tau, t) are probability densities, int_0^inf f(t) dt = int_0^inf l(tau, t) dt = 1; consequently, switching the order of integrations and then integrating from 0 to inf on both sides of Eq. ( f), we have I(inf) = 0.

To relate the final size in the present model to the usual formula ( ), we need an expression for the basic reproduction number R0; Feng and Thieme have shown that R0 = beta*T here. We now divide Eq. ( b) by S and integrate, yielding ln[S(0)/S(inf)] = beta * int_0^inf I(t) dt. Therefore, the usual final size formula ( ) will follow if int_0^inf I(t) dt = T*Z. This is plausible, intuitively, since the integral int_0^inf I(t) dt sums all infectious individuals, weighted by the time they are infectious; this must also equal the mean infectious period T times the number of individuals infected over the course of the entire epidemic (the final size Z). More rigorously, in Appendix B we prove Lemma . : for the model specified by Eqs. ( ), int_0^inf I(t) dt = T*Z in the limit ( ).
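The identity int_0^inf I(t) dt = T*Z can be checked numerically for the standard SIR model, where it follows from dR/dt = gamma*I; under a forward-Euler discretization the identity in fact holds to machine precision by construction. A short sketch (illustrative parameters of our choosing):

```python
def sir_area_check(beta, gamma, i0=1e-6, dt=0.002, t_max=500.0):
    """Integrate the standard SIR model and accumulate the integral of I over time."""
    s, i, t, area = 1.0 - i0, i0, 0.0, 0.0
    while t < t_max and i > 1e-12:
        area += i * dt                      # left Riemann sum for the integral of I
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        s, i, t = s + dt * ds, i + dt * di, t + dt
    z = 1.0 - s                             # final size (the residual i is ~1e-12)
    return area, z / gamma                  # compare integral of I against T*Z, T = 1/gamma

area, Tz = sir_area_check(2.0, 1.0)
print(area, Tz)
```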
If the initial number of infectious individuals is small (I(0) -> 0), then the final size of the epidemic (Z) is given by the classical formula ( ). In this formula, the basic reproduction number is R0 = beta*T, where T is the mean infectious period.

With modest additional effort, we can generalize Theorem . to the multiple-stage SI^nR model, i.e., the usual final size formula ( ) is still valid if there are multiple infectious stages and the durations of each stage are arbitrarily distributed. Recall that latent stages can be considered infectious stages with zero transmission rate, so we do not make any explicit reference to latency. As in all the situations considered above, we assume that vital dynamics (births and deaths) can be ignored and that recovery entails lifelong immunity. Suppose the ith infectious stage has a duration distribution with probability density f_i(t) and mean T_i; thus, in the special case that durations in each stage follow gamma distributions ( ), we have f_i(t) = g_{k_i,phi_i}(t). The rate of change of the susceptible class is given by Eq. ( a), as when all stage durations are exponentially distributed. Similar to Section , the equation for each infectious stage can be written with inflow u_i(t) and outflow terms analogous to those above, where u_i(t) is the rate at which individuals enter class I_i at time t, and the initial conditions are specified by the u_i(t) for t < 0. Individuals who leave the last infectious stage recover (into class R). When the infectious periods and the latent period are exponentially distributed, i.e., f_i(t) = gamma_i e^{-gamma_i t}, the technique introduced in Appendix B can be applied to show that the model ( ) is indeed the standard multiple-stage SIR model. In their analysis of the SI^nR model with arbitrarily distributed stage durations ( ), Feng and Thieme have shown that the basic reproduction number is R0 = sum_{i=1}^n beta_i T_i and that the disease-free equilibrium is unstable if and only if R0 > 1. Similar to previous sections, we find that S + sum_i I_i + R = 1 is invariant, S >= 0, and S(inf) exists. Integrating Eq.
( a) from 0 to t and switching the order of integrations, we have I(t) >= 0; similarly, integrating from 0 to inf, we have I(inf) = 0. Thus, the disease will also eventually burn out.

Theorem . . Consider the multiple-stage SI^nR epidemic model with arbitrarily distributed stage durations, specified by Eqs. ( ). If the initial number of infectious individuals is sufficiently small, i.e., in the limit sum_i I_i(0) -> 0, the final size Z is given by the unique solution of Eq. ( ), or explicitly by Eq. (A. ), in which R0 is given by Eq. ( ).

Proof. We divide by S on both sides of ( a) and integrate with respect to time; using Lemma . , Eq. ( ) becomes an expression involving the integrals int_0^inf u_i(t) dt. Switching the order of integration, and integrating Eq. ( d) and again switching the order of integrations, we find that int_0^inf u_{i+1}(t) dt = int_0^inf u_i(t) dt, i.e., summed over all time, the number of individuals who enter the (i + 1)th stage is the same as the number who enter the ith stage. We then integrate Eq. ( a); from the definition of u_1(t) in Eq. ( c), this common value is determined for all i = 1, ..., n. Noticing that the final size is Z = S(0) - S(inf), Eq. ( ) becomes the usual final size formula.

During the epidemic of severe acute respiratory syndrome (SARS) in 2003 (Poutanen et al.; Low and McGeer), one aspect of observed transmission that received a great deal of attention was the existence of super-spreaders (WHO SARS update), i.e., individuals who infect many more people than the average (e.g., at least an order of magnitude more than R0). In this section, we prove that the existence of the type of super-spreaders that occurred during the 2003 SARS outbreak has no effect on the final size formula. One mechanism for the generation of super-spreaders is that the infectious period distribution could be bimodal, such that a small proportion of individuals are infectious much longer than average. This situation is already covered by Theorem . , which deals with arbitrary distributions of stage durations.
In this section, we consider a different type of generalization of the SIR model. In all of the models we have considered thus far, it has been assumed implicitly that all infectious individuals have the same transmission rate. If some individuals actually have a much higher than average transmission rate, then they will be super-spreaders. This could arise, for example, because of individual variation in viral load (reflecting variation in immune response) or as the result of chance occurrence of environmental circumstances that promote disease spread (as apparently occurred during the SARS epidemic (WHO SARS update)). Whatever the origin, we focus here on variation in transmission rate that is manifested only after an individual is infected. In the following sections we consider the effects of intrinsic variation in contact rates among individuals, which results from inhomogeneous contact network structure in the host population.

We now proceed to develop and analyze an SIR model with a distribution of transmission rates and show that the usual final size formula ( ) applies. Extension of our results to a multiple-stage SI^nR model is straightforward, and similar to the way we generalized our results for the SIR model with arbitrary infectious period distribution to an SI^nR model with arbitrary stage duration distributions. Let i(beta, t) be the distribution (at time t) of infectious individuals that have transmission rate beta. Note that we assume implicitly that a given individual retains the same transmission rate throughout his/her infectious period, and hence that this transmission rate is determined as soon as infection occurs. Let q(beta) be the probability that a newly infectious individual has transmission rate beta.
Thus, the mean transmission rate is beta_bar = int_0^inf beta q(beta) dbeta. The rate of change of the proportion of the population that is susceptible is dS/dt = -S int_0^inf beta i(beta, t) dbeta. Upon infection, individuals leave the susceptible class S and enter the infectious class, a proportion q(beta) of them having transmission rate beta; upon recovery, they leave the infectious class at rate gamma. Thus di(beta, t)/dt = q(beta) S(t) int_0^inf beta' i(beta', t) dbeta' - gamma i(beta, t).

In order to ensure that the model is well posed, we must establish that if i(beta, 0) > 0 then i(beta, t) > 0 for all time. Suppose this is not true. Then there exist beta* and t* such that int_0^inf i(beta, t*) dbeta > 0, i(beta*, t) > 0 for t < t*, but i(beta*, t*) = 0. Substituting these conditions into Eq. ( b), we obtain di(beta*, t*)/dt > 0. But this contradicts the hypothesis that i(beta*, t) decreases from i(beta*, 0) > 0 to i(beta*, t*) = 0. Hence i(beta, t) > 0 for all t >= 0.

We now check that the disease will always burn out in this model, as expected. Writing I_tot(t) = int_0^inf i(beta, t) dbeta, we find d/dt (S + I_tot) = -gamma I_tot, which is strictly negative whenever I_tot > 0. Hence S + I_tot is decreasing and bounded below by zero, so lim_{t->inf} d/dt (S + I_tot) = 0, i.e., I_tot(inf) = 0.

For the model specified by Eqs. ( ), the basic reproduction number is R0 = beta_bar/gamma. In Appendix B, we prove that the disease-free equilibrium (DFE) is unstable if and only if R0 > 1. To find the final size, we divide by S on both sides of ( a) and integrate. Substituting Eq. ( a) in Eq. ( b), integrating, and using the fact that i(beta, inf) = 0, we obtain, since the final size is Z = S(0) - S(inf), the usual formula Z = 1 - e^{-R0 Z}.

It is possible to generalize this theorem to an SI^nR model with arbitrarily distributed stage durations and arbitrarily distributed transmission rates in each stage. We do not go through the details because no new ideas are required: with Theorems . and . in hand, the extra effort required for the generalization is merely one of keeping track of notation. In all of the models that we have discussed so far, we have assumed that the population is homogeneously mixed.
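The invariance of the final size to the distribution of transmission rates can be illustrated with a two-point distribution q(beta): a small fraction of super-spreaders with a high beta and a majority with a low beta, chosen so that the mean beta_bar (and hence R0) matches a homogeneous comparison model. A forward-Euler sketch (all parameter values are ours):

```python
def sir_two_types(beta_lo, beta_hi, q_hi, gamma=1.0, i0=1e-6, dt=0.002, t_max=500.0):
    """SIR in which a fraction q_hi of newly infected individuals transmit at rate
    beta_hi (super-spreaders) and the remainder at rate beta_lo."""
    s = 1.0 - i0
    i_lo, i_hi = i0 * (1.0 - q_hi), i0 * q_hi
    t = 0.0
    while t < t_max and i_lo + i_hi > 1e-12:
        force = beta_lo * i_lo + beta_hi * i_hi
        new = force * s
        s -= dt * new
        i_lo += dt * (new * (1.0 - q_hi) - gamma * i_lo)
        i_hi += dt * (new * q_hi - gamma * i_hi)
        t += dt
    return 1.0 - s

# 10% super-spreaders with beta = 11, 90% with beta = 1: mean beta = 2, so R0 = 2
z_het = sir_two_types(beta_lo=1.0, beta_hi=11.0, q_hi=0.1)
# homogeneous comparison with the same mean transmission rate
z_hom = sir_two_types(beta_lo=2.0, beta_hi=2.0, q_hi=0.5)
print(z_het, z_hom)
```

Both runs yield essentially the same final size: only the mean transmission rate matters, as the theorem asserts.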
In the remainder of this paper, we explore the significance of heterogeneous mixing for the final size formula. In the present section we identify an important class of spatially structured models for which the standard formula ( ) remains valid. A simple but important example of heterogeneous mixing occurs if the population is divided into a number of spatially isolated patches (e.g., cities), often called a metapopulation (Hanski and Gilpin). In ecological models, coupling among patches in a metapopulation usually occurs as a result of migration (Earn et al.). In the present context, inter-patch coupling occurs because individuals travel temporarily from their home patch to other patches. On such journeys, susceptible individuals might contact infectious individuals and become infected, and infectious individuals might contact susceptibles and infect them. If we restrict attention to a single infectious stage with exponentially distributed infectious period, then the model is specified as follows:

dS_i/dt = -S_i sum_j beta_ij I_j,   dI_i/dt = S_i sum_j beta_ij I_j - gamma I_i,   dR_i/dt = gamma I_i,   i = 1, ..., n.

Here, S_i, I_i, and R_i (i = 1, ..., n) are the proportions of the population in patch i that are susceptible, infectious, and recovered, respectively. Summing Eqs. ( ), we see that S_i(t) + I_i(t) + R_i(t) = 1 is invariant in all patches. We denote by N_i the number of individuals in patch i, so the total population size is N = sum_i N_i. It is also convenient to use B = (beta_ij) to denote the n x n transmission matrix. Similar to our analysis of other models in previous sections, it is straightforward to prove that S_i(inf), R_i(inf), and I_i(inf) exist, that S_i(inf) + R_i(inf) = 1, and that I_i(inf) = 0. Since beta_ij >= 0, the dominant eigenvalue lambda of B is real and positive (Horn and Johnson). If we linearize the model ( ) at the disease-free equilibrium (DFE) (S_i = 1, I_i = 0, R_i = 0), we see that the stability of the DFE is determined by lambda: if lambda/gamma > 1, then the DFE is unstable; otherwise it is stable.
Thus, the threshold for exponential growth of cases (an epidemic) is determined by the value of R0 = lambda/gamma, which can be interpreted as the basic reproduction number (van den Driessche and Watmough). The final size in patch i is Z_i = S_i(0) - S_i(inf). We can obtain a system of n coupled algebraic equations for the Z_i if we divide by S_i on both sides of Eq. ( a) and integrate from 0 to inf. These algebraic equations contain int_0^inf I_i dt, which can be eliminated by integrating Eq. ( c) and noting that Z_i = R_i(inf) - R_i(0). In the limit I_i(0) -> 0, S_i(0) -> 1, we obtain

Z_i = 1 - exp(-(1/gamma) sum_j beta_ij Z_j),   i = 1, ..., n.

These equations form a special case of a more general final size formula discussed by Diekmann and Heesterbeek (Exercises . and . ). In their solution to Exercise . , Diekmann and Heesterbeek sketch an argument implying that if R0 > 1 then there is a unique non-trivial (i.e., non-zero) solution to Eqs. ( ). The final size for the whole metapopulation, Z, is not in general the simple average of the Z_i in Eq. ( ). Rather, each Z_i must be weighted by the population size N_i of patch i, i.e., Z = sum_i (N_i/N) Z_i. Consequently, there will always be particular values of the N_i that make Z in Eq. ( ) equal to the solution of the standard formula ( ). However, since population sizes are given a priori, the problem of interest is to find necessary and sufficient conditions on the transmission matrix (B) that ensure that the sum in Eq. ( ) is equal to the final size obtained from the standard formula ( ), regardless of the values of the N_i. In Eq. ( ), Z will have no dependence on the patch population sizes if and only if the Z_i are all equal. Given this, Eq. ( ) implies that the row sums sum_j beta_ij are the same for every i, i.e., the transmission rate is the same in all patches. If, on the other hand, we start with Eq. ( ) as an assumption, then inserting Eq. ( ) in Eq. ( ) recovers the standard formula. Restricting attention to situations in which the transmission rate is the same in every patch may seem to be an extremely stringent constraint. In fact, this is often the situation of interest.
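The coupled final size equations Z_i = 1 - exp(-(1/gamma) sum_j beta_ij Z_j) can be solved by fixed-point iteration. The sketch below uses an illustrative two-patch stochastic mixing matrix of our choosing with equal per-patch transmission rate beta, in which case every Z_i should coincide with the scalar solution for R0 = beta/gamma:

```python
import math

def patch_final_sizes(B, gamma, iters=500):
    """Fixed-point iteration for Z_i = 1 - exp(-(1/gamma) * sum_j B[i][j] * Z_j)."""
    n = len(B)
    z = [0.99] * n
    for _ in range(iters):
        z = [1.0 - math.exp(-sum(B[i][j] * z[j] for j in range(n)) / gamma)
             for i in range(n)]
    return z

gamma, beta = 1.0, 2.0
P = [[0.8, 0.2], [0.3, 0.7]]                 # stochastic mixing matrix (rows sum to 1)
B = [[beta * p for p in row] for row in P]   # equal per-patch transmission rate
z = patch_final_sizes(B, gamma)

zs = 0.99                                    # scalar final size for R0 = beta/gamma
for _ in range(500):
    zs = 1.0 - math.exp(-(beta / gamma) * zs)
print(z, zs)
```

Both patch final sizes collapse onto the single-population value, so weighting by patch populations changes nothing.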
For example, individuals who normally reside in Boston, Philadelphia, or New York can expect to contact similar numbers of people each day, even though the New York metropolitan area has a population that is several times larger than the other two. More generally, if patches represent large cities in a given country, then it is reasonable to expect that the number of contacts an individual has will not depend strongly on the size of the city. Given Eq. ( ), we can write beta_ij = beta p_ij, where p_ij is the probability, for an individual in patch i, that a given contact is with an individual in patch j, and beta is the transmission rate (the product of the number of contacts per unit time and the probability that a contact between a susceptible and an infectious individual leads to transmission). Note that since the rows of the matrix P are probability distributions, it follows that sum_{j=1}^n p_ij = 1, i = 1, ..., n, i.e., P is a stochastic matrix (Horn and Johnson). For convenience, we make the mild assumption that p_ij > 0 for all i and j, i.e., for any pair of patches, there is a non-zero probability that residents of the two patches will come into contact. It then follows that if the final size Z is positive then the final size Z_i is strictly positive in every patch. To see this, note that if Z_i = 0 for some i then Eq. ( ) implies sum_j beta_ij Z_j = 0; since p_ij > 0 implies beta_ij > 0, Eq. ( ) implies Z_j = 0 for all j. Since P is a positive, stochastic matrix, its largest eigenvalue is 1 (Horn and Johnson). Consequently, the largest eigenvalue of B = beta P is beta and the basic reproduction number is R0 = beta/gamma. Therefore, we have Theorem . . Consider the multi-patch SIR epidemic model specified by Eqs. ( ). Suppose the transmission rate in each patch is the same, i.e., sum_{j=1}^n beta_ij = beta for each i. Then, in the limit that the initial proportion of individuals that are infectious is small, i.e., I_i(0) -> 0 for each i, the final size Z of an epidemic is given implicitly by Eq. ( ), or explicitly by Eq. (A. ), where R0 = beta/gamma.
At this point, it will come as no surprise that, with some effort in keeping track of notation, Theorem . can be generalized to a multi-patch SI^nR model with arbitrarily distributed stage durations and arbitrarily distributed transmission rates. Equations ( ), which specify the spatially heterogeneous model we analyzed in the previous section, can be interpreted as representing other types of social heterogeneities, which are known to yield different final size formulae (Gart; Dwyer et al.; Diekmann and Heesterbeek; Andreasen). In this section, we briefly discuss two other interpretations, which lead to a transmission matrix that is not proportional to a stochastic matrix and will not yield the usual final size formula ( ). Rather than specifying heterogeneities of transmission resulting from spatial structure, as in the previous section, suppose that heterogeneities arise from age-structured mixing patterns. One possible formulation is to categorize the population by discrete age cohorts, which is typically motivated by mixing patterns of children in schools. If we consider a time-scale short enough that transfer from one cohort to the next can be ignored, then the standard "realistic age-structured model" (Schenzle) reduces to Eqs. ( ), where the transmission matrix B now refers to contact patterns among different age cohorts. As in the case of the spatial patch interpretation, it might be reasonable to assume that there is a stochastic matrix P such that beta_ij = beta_i p_ij, where the interpretation is that individuals in cohort i have a proportion p_ij of their contacts with individuals in cohort j, and beta_i is the transmission rate for individuals in cohort i. Unlike the spatial interpretation, however, it would be hard to justify taking beta_i to be the same for each i. In particular, young children tend to come into much closer contact with their classmates than teenagers or adults do with members of their cohorts. As a result, Theorem .
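The failure of the standard formula under cohort-dependent beta_i can be demonstrated numerically: with B having rows beta_i p_ij and strongly unequal beta_i, the population-weighted final size differs markedly from the scalar formula evaluated at R0 = lambda/gamma (lambda obtained by power iteration). All parameter values below are illustrative:

```python
import math

def patch_final_sizes(B, gamma, iters=2000):
    """Fixed-point iteration for Z_i = 1 - exp(-(1/gamma) * sum_j B[i][j] * Z_j)."""
    n = len(B)
    z = [0.99] * n
    for _ in range(iters):
        z = [1.0 - math.exp(-sum(B[i][j] * z[j] for j in range(n)) / gamma)
             for i in range(n)]
    return z

def dominant_eigenvalue(B, iters=500):
    """Power iteration for the dominant eigenvalue of a positive matrix."""
    n = len(B)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam

gamma = 1.0
P = [[0.8, 0.2], [0.3, 0.7]]          # stochastic contact matrix
betas = [4.0, 0.5]                    # cohort 1 (children) transmits far more than cohort 2
B = [[betas[i] * P[i][j] for j in range(2)] for i in range(2)]

R0 = dominant_eigenvalue(B) / gamma
z = patch_final_sizes(B, gamma)
z_agg = 0.5 * z[0] + 0.5 * z[1]       # equal cohort sizes

zs = 0.99                             # scalar formula evaluated at the same R0
for _ in range(2000):
    zs = 1.0 - math.exp(-R0 * zs)
print(R0, z_agg, zs)
```

Here the aggregate final size falls well below the homogeneous-mixing prediction: the high-transmission cohort is nearly exhausted while the low-transmission cohort is barely touched.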
will not apply and the final size will not be given by the usual formula ( ). In the section where we discussed super-spreaders, we specifically excluded the types of super-spreaders that occur for sexually transmitted diseases, namely individuals who always have a higher rate of contact with others, regardless of whether they happen to be infected. When such "core groups" (Yorke and Hethcote; Anderson et al.) exist, Eqs. ( ) can still be used, with the interpretation that the stratification of the population is by social group rather than spatial region or age cohort. As in the age-structured situation, we can safely assume that Eq. ( ) holds, but not that beta_i is the same for every i as in Eq. ( ). Indeed, a large difference in beta_i is precisely what defines the core groups. Again, we have a situation where the usual final size formula will not apply. Core groups are well known to have a critical role in the transmission dynamics of sexually transmitted diseases, so it is not surprising that they will affect the final size of epidemics.

The well-known formula ( ) for the expected final size of an epidemic is valid in remarkably general circumstances. Previous work (e.g., Kermack and McKendrick; Anderson and Watson; Diekmann and Heesterbeek) established that the formula is valid in an SIR model with an arbitrarily distributed infectious period (Theorem . ). Here we have shown, in addition, that the standard formula ( ) is invariant to the number of latent and infectious stages of disease (Theorem . ), the distributions of transmission rates within stages (Theorem . ), and even to common spatial contact heterogeneities (Theorem . ). The invariance of the final size formula has important practical implications. Typically, the time at which one wishes to estimate the expected final size of an epidemic is long before enough information has been gathered to estimate the distributions of latent and infectious periods or other epidemiological details.
Our theorems provide rigorous support for estimates of the expected magnitude of epidemics based solely on estimates of the basic reproduction number R0, which is the only parameter that appears in the final size formula ( ). It should be noted that the final size formula refers only to the ensemble-average size of an epidemic for a disease with a given R0. Different stochastic realizations of the same process will lead to different final sizes, and our analysis says nothing about the variance or higher moments of the final size distribution. Kurtz proved that, in the limit that the population size goes to infinity, the ensemble mean of the stochastic SIR model with arbitrarily distributed infectious period converges to the solutions of the integro-differential equation model specified in our Eqs. ( ). Hence, the standard final size formula gives the mean final size of the stochastic SIR model. However, it is difficult to deduce the distribution of the final size, and this remains an open problem for stochastic SIR models with arbitrarily distributed infectious periods. For stochastic SEIR models with gamma-distributed latent and infectious periods, Anderson and Watson derived a normal approximation to the final size distribution that is valid in the limit of large population size. If there is little variation in the length of the latent period, and the infectious period is short, the stochastic SEIR model can be approximated by a chain binomial model (Bailey), which is a discrete-time Markov chain. The time step for this chain is equal to the fixed-length latent period, since the infectious period is presumed infinitesimal (all contacts occur instantaneously at the end of the latent period). For such models, the only parameter is the probability q that a contact between an infectious individual and a susceptible individual leads to infection.
With the assumption that q = 1 - e^{-R0}, von Bahr and Martin-Lof and Scalia-Tomba showed that the final size distribution for the traditional chain binomial model is asymptotically normal in the limit of large population size. Scalia-Tomba and Andersson studied more general chain binomial models (with heterogeneous contact structures equivalent to those we discussed in Sections and ) and found asymptotic final size distributions. Their results imply that, under the same assumption on q, the ensemble mean final size in these chain binomial models is the solution of our Eqs. ( ).

We emphasized two circumstances under which the usual final size formula will not apply. One was the case of age-structured heterogeneity of transmission; while this would influence the final size, it is unlikely that we would be able to parameterize the age-structured contact patterns sufficiently well to improve on the final size estimate generated under the assumption of homogeneous mixing. The other case we discussed was the existence of social core groups, which are extremely important in the epidemiological dynamics of sexually transmitted diseases; in this case, some estimate of the difference in transmission rates in different social groups would likely be needed to obtain a useful estimate of the final size.

In computing the final size, we have always assumed that there are no temporal changes in the transmission rate beta. Temporal variations in beta could result from intrinsic seasonality of contact rates, imposition of control measures, or behavioral change in response to epidemic alerts. Small seasonal variations in beta cause only small perturbations to the final size formula, and substantial seasonality is generally associated with school-age children (London and Yorke), who form a small fraction of the susceptible pool when a new infection enters a population.
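The chain binomial (Reed-Frost) picture can be simulated directly. In the sketch below we use a per-pair transmission probability p = 1 - e^{-R0/N}; this particular scaling is our own illustrative choice (so that each infective makes on average about R0 effective contacts in a large population), and we check that the mean final size over many realizations is close to the solution of Z = 1 - e^{-R0 Z}:

```python
import math
import random

def reed_frost_final_fraction(N, R0, i0, rng):
    """One realization of the Reed-Frost chain binomial epidemic; returns the
    fraction of the initially susceptible population that is ever infected."""
    p = 1.0 - math.exp(-R0 / N)       # per-pair transmission probability per generation
    S, I = N - i0, i0
    while I > 0:
        escape = (1.0 - p) ** I       # probability a susceptible escapes all I infectives
        new = sum(1 for _ in range(S) if rng.random() > escape)
        S -= new
        I = new
    return (N - i0 - S) / (N - i0)

rng = random.Random(7)
N, R0, i0, runs = 1000, 2.0, 8, 100
mean_z = sum(reed_frost_final_fraction(N, R0, i0, rng) for _ in range(runs)) / runs

z = 0.99
for _ in range(500):                  # deterministic final size for comparison
    z = 1.0 - math.exp(-R0 * z)
print(mean_z, z)
```

Individual realizations fluctuate (and can go extinct early), but the ensemble mean tracks the deterministic formula, consistent with the limit theorems cited above.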
Infection control measures, such as those adopted during the SARS epidemic (Lipsitch et al.; Wallinga and Teunis), could have a dramatic effect on the final size; in this case, rather than R0, the reproduction number that is relevant for the final size formula must be calculated after sustainable precautionary measures have been put into place. Individual behavioral change could alter the transmission rate through time-dependence (beta = beta(t), as in the case of imposed control measures) or through density-dependence (beta = beta(I), with beta decreasing as I increases, which would arise if people tend to be more careful when they are aware of more cases). In either case, the standard final size formula might yield a poor approximation of the true final size. All of our analysis is predicated on the assumption that vital dynamics (births and deaths), and more generally any source of new susceptible individuals, can be ignored. This approximation is not valid in circumstances where infectious periods are substantial compared with life expectancy or where immunity decays rapidly. It will, however, typically be relevant whenever a new respiratory pathogen emerges, as in the case of the 2003 SARS outbreak (Donnelly et al.) or the emergence of a new subtype of influenza (Earn et al.).

We are interested in growth or decay of i(beta, t), which is determined only by Eq. (D. b). Defining x(t) = int_0^inf beta i(beta, t) dbeta, multiplying both sides of Eq. (D. b) by beta, and integrating with respect to beta, we obtain an equation for x(t). On the other hand, recall that I(t) = int_0^inf i(beta, t) dbeta (the total number of infectious individuals at time t). Therefore, integrating Eq. (B. b) with respect to beta, and noting that q(beta) is a probability density function, we obtain an equation for I(t). The origin of this system of two ordinary differential equations is locally stable if and only if int_0^inf beta q(beta) dbeta - gamma < 0 or, equivalently, R0 < 1. Thus, the DFE is stable if R0 < 1 and unstable if R0 > 1. We can expand model ( ) to include multiple stages.
Using the same techniques, we find that the conclusion that the epidemic threshold is given by R0 = 1 remains valid.

References:
- On the spread of a disease with gamma distributed latent and infectious periods
- Infectious diseases of humans: dynamics and control
- A preliminary study of the transmission dynamics of the human immunodeficiency virus (HIV), the causative agent of AIDS
- Stochastic epidemic models and their statistical analysis
- The asymptotic final size distribution of multitype chain-binomial epidemic processes
- Dynamics of annual influenza A epidemics with immuno-selection
- The mathematical theory of infectious diseases and its application
- Mathematical models in population biology and epidemiology
- Mathematical epidemiology of infectious diseases: model building, analysis and interpretation
- Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong
- Pathogen-driven outbreaks in forest defoliators revisited: building models from experimental data
- Ecology and evolution of the flu
- Coherence and conservation
- A simple model for complex dynamical transitions in epidemics
- Endemic models with arbitrarily distributed periods of infection I: fundamental properties of the model
- The mathematical analysis of an epidemic with two kinds of susceptibles
- Metapopulation biology: ecology, genetics, and evolution
- Matrix analysis
- The differential infectivity and staged progression models for the transmission of HIV
- A contribution to the mathematical theory of epidemics
- Relationships between stochastic and deterministic population models
- Transmission dynamics and control of severe acute respiratory syndrome
- Destabilization of epidemic models with the inclusion of realistic distributions of infectious periods
- Realistic distributions of infectious periods in epidemic models: changing patterns of persistence and dynamics. Theor
- Recurrent outbreaks of measles, chickenpox and mumps.
i seasonal variation in contact rates sars-one year later identification of severe acute respiratory syndrome in canada the walter reed staging classification for htlv-iii/lav infection asymptotic final size distribution for some chain-binomial processes asymptotic final size distribution of the multitype reed-frost process an age-structured model of pre-and post-vaccination measles transmission infinite subharmonic bifurcation in an seir model reproduction numbers and subthreadold endemic equilibria for compartmental models fo disease transmission threshold limit theorems for some epidemic processes different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures from mathworld-a wolfram web resource. http:\math world.wolfram.com/lambertw-function.html severe acute respiratory syndrome (sars), multi-country outbreak severe acute respiratory syndrome (sars), multi-country outbreak gonorrhea: transmission dynamics and control an explicit solution for z expressed in terms of elementary functions is not possible, but eq. ( ) can be solved explicitly for z in terms of the lambert w function (weisstein) , when the infectious period is exponentially distributed, i.e., f (t) = γ e −γ t , we have l(τ, t) = f (t). thus eq. ( f) can be writtenlet key: cord- -wqkphg e authors: hazem, y.; natarajan, s.; berikaa, e. title: hasty reduction of covid- lockdown measures leads to the second wave of infection date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: wqkphg e the outbreak of covid- has an undeniable global impact, both socially and economically. march th, , covid- was declared as a pandemic worldwide. many governments, worldwide, have imposed strict lockdown measures to minimize the spread of covid- . however, these measures cannot last forever; therefore, many countries are already considering relaxing the lockdown measures. 
this study quantitatively investigates the impact of this relaxation in the united states, germany, the united kingdom, italy, spain, and canada. a modified version of the sir model is used to model the reduction in lockdown based on the already available data. the results show an inevitable second wave of covid-19 infection following a loosening of the current measures. additionally, the predicted number of infected cases for different reopening dates is reported. the covid-19 (sars-cov-2 virus) pandemic is a global challenge that requires accurate forecasting and prediction methods to address its socio-economic implications [ ]. globally, governments have imposed firm lockdown conditions and mandatory quarantine measures to reduce the spreading of the virus [ ; ]. these measures have successfully reduced the infection rate in many countries; however, they have significant negative socio-economic impacts [ ]. hence, it is important to understand the consequences of easing lockdown measures and refreshing the economy. currently (up to th of may), countries can be classified into three categories: (a) countries that have successfully conquered the covid-19 pandemic, such as china, south korea, and australia; (b) countries on their way to beating covid-19 that have already reached their peak infection period, such as the united states, italy, and spain; and (c) countries that have not yet reached their peak infection period, such as russia, india, and brazil [ ]. countries from both categories (a) and (b) have started outlining their plans to reopen and save their economies [ ]. this paper aims to study the implications of loosening the lockdown conditions in the united states, germany, the united kingdom, italy, spain, and canada.
all of the chosen countries have already passed their peak infection period, and their governments have announced their intentions of reducing the lockdown measures and reopening some nonessential services within the next months. a modified version of the susceptible, infectious, and recovered (sir) model is used to accurately simulate the pandemic [ ]. for the study at hand, this model is used to forecast the infection rate if the lockdown measures are reduced by % on the 1st of june or the 1st of july; hence, the impact of delaying this step is also investigated. this study gives a quantitative point of view on the risks of hasty reduction of lockdown measures and predicts the evolution of the number of infected cases until the end of 2020 under the assumed conditions. (this article is a medrxiv preprint, made available under a cc-by-nc international license; the author/funder has granted medrxiv a license to display the preprint in perpetuity; it was not certified by peer review.) several mathematical and physical models describe the evolution of an epidemic; these models differ in capabilities and complexity [ ; ; ; ; ]. in this work, the sir model is used due to its flexibility and simplicity, which make it the most commonly adopted model in the literature [ ]. however, the basic sir formulation is not capable of modeling variations in the lockdown measures and their impact on the infection rate. recent studies confirmed accurate modeling of the covid-19 epidemic with modified versions of the sir model in all infection hot spots [ ; ; ].
the basic formalism of the sir model is given by the following coupled set of differential equations:

ds/dt = −β s i / n,
di/dt = β s i / n − γ i,
dr/dt = γ i,

where s indicates the time-dependent susceptible population, i is the time-dependent number of infected cases, r is the time-dependent number of recovered cases, n is the total population size, β represents the probability of infection transmission from an infected case to a susceptible case, and γ represents the rate at which infected cases recover. the initial conditions of the model are the whole population being susceptible to infection (s(t₀) ≈ n), the number of reported infected cases on the starting date of the outbreak (i(t₀)), and zero recovered cases (r(t₀) = 0). based on the johns hopkins university center for systems science and engineering (csse) data repository [ ; ], the data used in this study spans the interval between th of january until th of may. the proposed algorithm to investigate the impact of reducing lockdown measures on the infection rate is shown in fig. . the acquired data is analyzed to extract the infection rate per country. the outbreak date and turning point of the infection rate are extracted in order to divide the data into two intervals: the first interval is characterized by a high infection rate and corresponds to weak lockdown measures, and the second interval is characterized by a lower infection rate and reflects the impact of the governmental lockdown measures. the high infection rate in the first interval can be attributed to the lack of testing kits, social awareness, and governmental quarantine measures, which leads to a high β value. additionally, the healthcare system during this period operates at full power, which is reflected in the slightly higher γ value.
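the sir system described above can be integrated numerically; the sketch below uses a simple forward-euler scheme with hypothetical parameter values (β, γ, n, and the initial case count are placeholders, not the fitted values from the study, which are elided in the source).

```python
def simulate_sir(beta, gamma, n, i0, days, dt=0.1):
    """Forward-Euler integration of the SIR equations:
    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I."""
    s, i, r = n - i0, i0, 0.0
    path = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_inf = beta * s * i / n * dt
        new_rec = gamma * i * dt
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        path.append((s, i, r))
    return path

# hypothetical parameters: R0 = beta/gamma = 2.5 > 1, so the epidemic grows
path = simulate_sir(beta=0.25, gamma=0.1, n=1e6, i0=100, days=200)
peak_i = max(i for _, i, _ in path)
```

because the three equations only move individuals between compartments, s + i + r stays equal to n throughout, which is a convenient sanity check on any implementation.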
after the turning point and the peak infection rate, the quarantine measures are strictly applied, and the testing kits have been developed with higher reliability and shorter waiting times [ ]; thus, β decreases. during the peak of the pandemic, the overwhelming load on the healthcare system and medical staff leads to the crash of some hospitals, either due to the spread of the virus and the infection of doctors and nurses, or due to the lack of intensive care units for everyone. this load reduces the overall healthcare system capability and explains the slight decrease in the γ value [ ]. the concept of time-dependent β and γ has been proposed previously to overcome the limitations of the sir model and simulate the impact of the quarantine measures [ ; ]. table shows the parameters extracted from the sir model after fitting it to the acquired covid-19 data for the countries considered in the study. quantifying the governmental measures to combat the covid-19 pandemic is difficult; we assumed that the first interval and the second interval correspond to % and % measures, respectively. these are just estimates; however, they can be used to predict the impact of loosening the lockdown measures on β and γ, which in turn affects the infection rate. in this study, we considered loosening the lockdown measures by % compared to the % measures during the second interval.
the infection rate has an exponential dependence on β and γ [ ]; thus, an exponential dependence on the lockdown measures is assumed, and the parameters used for prediction are obtained accordingly. based on the described modeling scheme, the increase in the infection rate due to loosening of the lockdown measures is calculated until the end of 2020, as shown in fig. . countries react differently to the proposed scheme, as revealed by fig. : countries that have not fully conquered covid-19 yet are more affected by reopening; hence, the hasty reduction of quarantine measures might lead to even higher infection rates, as happened before during the spanish flu [ ; ]. thus, there is a correlation between the turning-point date, the reopening date, and the second-wave peak infection rate. quantitatively, this study reveals how severe the second wave might be. based on the results presented in table , it is recommended to consider loosening the measures only after at least months from the turning-point date. additionally, delaying the reopening from june 1st to july 1st can reduce the overall number of newly infected cases by % as in the united states and by up to % as in canada, which ensures that the capacity of the recovering healthcare system can meet the number of newly infected cases. according to the washington university covid-19 model, the number of icu beds and invasive ventilators needed is projected to continue to decrease until the end of august [ ]. this decrease suggests that the healthcare system will be able to partially recover before the second peak of covid-19 hits these countries. despite our best effort in analyzing and optimizing the collected data, there are a few limitations to be considered.
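the interpolation formula itself did not survive extraction; a log-linear (exponential) interpolation between the parameter values fitted on the two intervals is one form consistent with the stated "exponential dependence on the lockdown measures". the function below is a hypothetical sketch under that assumption, and all parameter values are placeholders.

```python
def interp_param(p_first, p_second, level, level_first=0.0, level_second=0.9):
    """Exponentially interpolate a rate parameter (beta or gamma) between the
    value fitted in the first interval (weak measures) and in the second
    interval (strict measures), for a lockdown level in [0, 1].
    Assumed form: log p is linear in the lockdown level."""
    w = (level - level_first) / (level_second - level_first)
    return p_first * (p_second / p_first) ** w

# hypothetical fitted values: beta drops as lockdown measures tighten
beta_half = interp_param(p_first=0.30, p_second=0.08, level=0.5)
```

by construction the sketch reproduces the fitted values at the two calibration levels and interpolates multiplicatively in between, which is what an exponential dependence implies.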
our model assumes that the partially recovered healthcare system is not deteriorated by the second wave; hence γ is fixed. in addition, the sir model assumes that recovered cases gain immunity against the disease, which is not the case for the covid-19 pandemic [ ]. moreover, the development of testing kits, treatment techniques, and a reliable vaccine could have a considerable positive impact on the infection rates, which is not considered in our analysis. in conclusion, this study offers a quantifiable prediction of how reducing the lockdown measures shall lead to a second wave of covid-19 in the united states, germany, the united kingdom, italy, spain, and canada. further, delaying the reduction of lockdown measures shows a reasonable reduction in the number of predicted infected cases. eventually, this study highlights the risks of hasty reduction of lockdown measures in countries in the middle of their battle with covid-19. this study aims mainly to ring alarm bells about the risks accompanying a rash loosening of the quarantine measures. moreover, this paper introduces a comprehensive worst-case view of shifting into the herd immunity paradigm. our calculations ignored any external measures to combat the covid-19 second wave. for further studies, it is recommended to include the possibility of governmental interventions, which might lead to a different scenario for the peak predictions. in addition, it is highly recommended to start evaluating the pandemic's second wave applying time-varying β and γ factors.
[fig. (algorithm), final step: forecast the covid-19 second wave based on the new β and γ.]

references:
- the socio-economic implications of the coronavirus and covid-19 pandemic: a review
- a simple planning problem for covid-19 lockdown
- the positive impact of lockdown in wuhan on containing the covid-19 outbreak in china
- economic effects of coronavirus outbreak (covid-19) on the world economy
- covid-19 coronavirus pandemic
- expected impact of reopening schools after lockdown on covid-19 epidemic in Île-de-france
- asymptotic behavior in a deterministic epidemic model
- mathematical theories of populations: demographics, genetics, and epidemics
- infinite subharmonic bifurcation in an seir epidemic model
- periodicity and stability in epidemic models: a survey, in: differential equations and applications in ecology, epidemics, and population problems
- complete global stability for an sir epidemic model with delay, distributed or discrete
- a modified sir model for the covid-19 contagion in italy
- extended sir prediction of the epidemics trend of covid-19 in italy and compared with hunan
- modeling and forecasting the covid-19 pandemic in brazil
- coronavirus covid-19 global cases by
- covid-19 sir model, matlab central file exchange
- in vitro diagnostic assays for covid-19: recent advances and emerging trends
- protocol for assessment of potential risk factors for coronavirus disease (covid-19) among health workers in a health care setting
- prospects and limits of sir-type mathematical models to capture the covid-19 pandemic
- estimation of the reproductive number of the spanish flu epidemic
- the spanish flu in denmark
- forecasting covid-19 impact on hospital bed-days, icu-days, ventilator-days and deaths by us state in the next months
- cause analysis and treatment strategies of "recurrence" in novel coronavirus pneumonia (covid-19) patients after discharge from hospital. zhonghua jie he he hu xi za zhi = chinese journal of tuberculosis and respiratory diseases

key: cord- -nc rtwtd authors: smeets, bart; watte, rodrigo; ramon, herman title: scaling analysis of covid-19 spreading based on belgian hospitalization data date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: nc rtwtd

we analyze the temporal evolution of accumulated hospitalization cases due to covid-19 in belgium. the increase of hospitalization cases is consistent with an initial exponential phase and a subsequent power-law growth. for the latter, we estimate a power-law exponent of ≈ . , which is consistent with growth kinetics of covid-19 in china and indicative of the underlying small-world network structure of the epidemic. finally, we fit an sir-x model to the experimental data and estimate the effect of containment policies in comparison to their effect in china. this model suggests that the base reproduction rate has been significantly reduced, but that the number of susceptible individuals that is isolated from infection is very small. based on the sir-x model fit, we analyze the covid-19 mortality and the number of patients requiring icu treatment over time. as of march , the epidemic of the new coronavirus disease (covid-19) is rapidly spreading throughout european countries.
some countries, such as italy and spain, have witnessed an explosion in cases, quickly saturating the treatment capacity of hospitals. to help steer governmental policies for containing the epidemic and to aid in the preparation planning of health services, understanding the spreading behavior of covid-19 through the population is critical. studies on the outbreak of covid-19 in the hubei province and the rest of mainland china show that the temporal evolution of confirmed cases can be classified into three distinct regimes: 1) an initial exponential growth phase, 2) an extended phase of power-law growth kinetics indicative of a small-world network structure, with a universal growth exponent of µ ≈ . , and 3) a slow inflection to a plateau phase, following a parabolic profile in double-logarithmic scale [ ]. the roughly quadratic growth can be explained by considering the population as a two-dimensional planar network where the infected population only grows in the periphery of isolated 'patches' of infection [ ]. the observed final inflection is not to be confused with the saturation of a logistic growth curve, which arises due to negative feedback as the number of susceptible people decreases with the spreading of the infection. this effect is unlikely to contribute in the chinese case, since even pessimistic estimates of the total number of covid-19 cases remain very small compared to the total population. more likely, this effect can be attributed to the extreme containment measures enacted in china. these measures disconnect the social network structure, producing caging effects that sufficiently slow down the spreading, below a reproduction number of 1. a popular epidemiological model is the sir model, which is based on the formulation of ordinary differential equations for the numbers of susceptible (s), infectious (i), and recovered, or removed (r), individuals [ ].
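in the power-law regime the accumulated cases follow h ∝ t^µ, so the exponent µ appears as the slope of a straight line in double-logarithmic scale. a minimal sketch of that slope estimate on synthetic data, with a hypothetical exponent:

```python
import math

def loglog_slope(ts, ys):
    """Least-squares slope of log(y) against log(t):
    the power-law exponent mu in y ~ t**mu."""
    xs = [math.log(t) for t in ts]
    ls = [math.log(y) for y in ys]
    mx = sum(xs) / len(xs)
    my = sum(ls) / len(ls)
    num = sum((x - mx) * (l - my) for x, l in zip(xs, ls))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# synthetic accumulated-case curve with a hypothetical exponent mu = 2.2
ts = list(range(5, 30))
ys = [3.0 * t ** 2.2 for t in ts]
mu = loglog_slope(ts, ys)  # recovers ~2.2
```

an exponential phase would instead appear as a straight line in semi-logarithmic scale (log y against t), which is how the two regimes can be told apart in practice.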
this model was recently extended to include symptomatic quarantined individuals (x), resulting in the 'sir-x' model, which was successfully applied to predict the spreading kinetics and assess containment policies for covid- in china [ ] , and is currently being used to monitor the number of confirmed covid- cases in various countries [ ] . in belgium, policies to contain the spreading of covid- have proceeded in multiple phases. initially, in phase i, strong quarantine measures were imposed on detected and suspected individuals who traveled from at-risk regions. in case of confirmed infections, their recent history of contacts was traced back and these individuals were quarantined as well as tested for covid- . phase ii included fast testing and quarantine of all individuals that exhibit symptoms. in phase iii, drastic societal containment strategies are enacted. regardless of symptoms, individuals are to minimize any social interactions. due to restricted testing capacity, tests are only performed on individuals that exhibit severe symptoms. an important consequence of this strategy is that the number of confirmed cases can be heavily biased by shifting testing capacity and testing priorities. as an alternative, we propose that the accumulated number of hospitalized individuals is a good indicator for the number of actual covid- , albeit with a shift in time. this temporal shift is roughly equal to the combination of the mean incubation time (≈ days) and the average time from the onset of symptoms to hospitalization (≈ days) [ ] . data is obtained from publicly available numbers on current hospitalization (h), current number of icu patients (icu ), accumulated number of deaths (d) and number of individuals released from the hospital (r). these statistics are made public on a daily basis starting from march th , based on data from more than % of belgian hospitals [ ] . for each day, the accumulated number of hospitalizations was computed as h a = h + r + d. 
here, we include data up to the th of may (release on the th). throughout the analysis, dates shown indicate the date of publication of new data, and the 'day' scale counts the number of days starting from march th. (this article is a medrxiv preprint, not certified by peer review, posted march 2020.) we use the sir-x model introduced by maier and brockmann ( ) to simulate the hospitalization data. this model is based on the following odes [ ]:

ds/dt = −α s i − κ₀ s,
di/dt = α s i − β i − κ₀ i − κ i,
dx/dt = (κ + κ₀) i,
dr/dt = β i + κ₀ s.

here, α is the infection rate, β the recovery rate, κ the removal rate of symptomatic infected individuals, and κ₀ the containment rate of both the s and i populations. originally, it is assumed that the fraction x is proportional to the number of confirmed infected cases. we will assume that x is proportional to the number of hospitalized cases, estimating that around % of infected (self-isolating) cases will be hospitalized, and that this occurs with a time delay of days (the average time between the onset of symptoms and hospitalization). the precise proportionality of this scaling does not affect the further outcome of our analysis. the sir-x model measures the effectiveness of isolation strategies through the 'leverage factor' p = κ₀/(κ₀ + κ) and the 'quarantine probability' q = (κ₀ + κ)/(β + κ₀ + κ). p is a measure of how strongly isolation policies affect the general public in comparison to quarantine measures on infected individuals. q is the probability that an infected individual is (self-)quarantined. moreover, the model allows for the formulation of an effective reproduction number R₀,eff = α/(β + κ + κ₀), which is always smaller than the basic reproduction number in free unconstrained growth, R₀,free = α/β. parameters α and β represent intrinsic properties of infectiousness and are not varied, but fixed at α = . and β = . , corresponding to a recovery time of days and a free reproduction number of R₀,free = . , as was assumed by the original authors [ ].
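a minimal numerical sketch of the sir-x dynamics as formulated by maier and brockmann (2020), with placeholder parameter values (the fitted values in the source are elided); compartments are expressed as population fractions, so s + i + r + x is conserved.

```python
def simulate_sirx(alpha, beta, kappa, kappa0, i0, days, dt=0.01):
    """Forward-Euler integration of the SIR-X model:
    dS/dt = -alpha*S*I - kappa0*S
    dI/dt =  alpha*S*I - (beta + kappa0 + kappa)*I
    dX/dt = (kappa + kappa0)*I
    dR/dt =  beta*I + kappa0*S
    All compartments are population fractions."""
    s, i, r, x = 1.0 - i0, i0, 0.0, 0.0
    for _ in range(int(days / dt)):
        ds = (-alpha * s * i - kappa0 * s) * dt
        di = (alpha * s * i - (beta + kappa0 + kappa) * i) * dt
        dx = (kappa + kappa0) * i * dt
        dr = (beta * i + kappa0 * s) * dt
        s, i, r, x = s + ds, i + di, r + dr, x + dx
    return s, i, r, x

# placeholder parameters; R0_eff = alpha/(beta + kappa + kappa0) > 1 here,
# so an outbreak occurs and X (removed/hospitalized) accumulates
s, i, r, x = simulate_sirx(alpha=0.55, beta=0.125, kappa=0.05,
                           kappa0=0.001, i0=1e-4, days=120)
```

note that x only grows (its time derivative is nonnegative), matching its interpretation as an accumulated count of removed symptomatic cases.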
the free parameters during the fitting procedure are κ, κ₀, and i₀/x₀, the initial fraction of infectious individuals. fits are performed using the levenberg-marquardt least-squares method. during this procedure, eqs. ( - ) are integrated using the dormand-prince method, an embedded fourth/fifth-order runge-kutta scheme. the implementation of the fitting routine for the sir-x model was kindly provided by the original authors [ ]. the fitting of power-law models was performed in double-logarithmic space and discarded the first five data points (march - ), which account for the exponential behavior. fig. shows h_a, d, and r as a function of time. the accumulated hospitalization h_a showcases two distinct regimes: an initial exponential growth phase and a power-law growth phase where h_a ∝ t^µ. for the latter, we estimate a fractal exponent µ = . . the number of deaths d follows a power-law growth d ∝ t^µ_d with µ_d = . . as of march th, no significant deviations of either h_a or d from the power-law growth can be observed. furthermore, fig. shows the predicted accumulated hospitalization x and infectious population i from a fit using the sir-x model. the parameters that fit the observed growth of h_a are listed in table . the estimated value of κ₀, the containment rate of the whole population, is very close to zero. consequently, the public containment leverage p is low as well. the quarantine probability is estimated at a value of q = . . furthermore, there is a strong reduction of the reproduction number, with an effective reproduction number of R₀,eff = . , much smaller than the unrestrained reproduction number R₀,free = . . finally, the sir-x model predicts that the maximal number of infectious individuals occurs around april . fig.
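the levenberg-marquardt routine used in the paper (provided by maier and brockmann) is not reproduced here; as an illustration of the calibration idea, the sketch below fits the removal rate κ by brute-force grid search, minimizing the squared error between a model x-trajectory and synthetic "observed" data. all parameter values are placeholders.

```python
def sirx_x_path(alpha, beta, kappa, kappa0, i0, days, dt=0.05):
    """Daily X (removed/hospitalized fraction) path under SIR-X, forward Euler.
    R is not tracked because it does not feed back into s, i, or x."""
    s, i, x = 1.0 - i0, i0, 0.0
    path = []
    steps_per_day = int(round(1.0 / dt))
    for _day in range(days):
        for _ in range(steps_per_day):
            ds = (-alpha * s * i - kappa0 * s) * dt
            di = (alpha * s * i - (beta + kappa0 + kappa) * i) * dt
            dx = (kappa + kappa0) * i * dt
            s, i, x = s + ds, i + di, x + dx
        path.append(x)
    return path

def fit_kappa(observed, grid, alpha, beta, kappa0, i0):
    """Pick the kappa on the grid minimizing the sum of squared errors."""
    def sse(k):
        model = sirx_x_path(alpha, beta, k, kappa0, i0, len(observed))
        return sum((m - o) ** 2 for m, o in zip(model, observed))
    return min(grid, key=sse)

# synthetic "data" generated with a known kappa, then recovered by the search
truth = sirx_x_path(alpha=0.55, beta=0.125, kappa=0.05, kappa0=0.0,
                    i0=1e-4, days=40)
best = fit_kappa(truth, grid=[0.02, 0.035, 0.05, 0.065],
                 alpha=0.55, beta=0.125, kappa0=0.0, i0=1e-4)
```

a gradient-based least-squares solver would replace the grid search in a real calibration, but the objective being minimized is the same.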
(a) shows the number of accumulated hospitalizations as well as deaths due to covid-19 in comparison to the fitted sir-x model. setting an average mortality of % for all hospitalized cases [ ], we find that the sir-x model coincides with the number of deaths when including a temporal delay of only ≈ days. assuming an average hospitalization time of days, and that between % and % of currently hospitalized patients require intensive care treatment (icu), we predict, based on the sir-x model, the temporal evolution of the current number of patients in icu, fig. (b-c). these assumptions align with the observed current number of icu patients. for the estimated sir-x model parameters, the number of icu patients will peak around april th. the peak count of icu patients varies greatly with the average icu retention time, but will peak at significantly higher values than the current icu capacity of beds in belgian hospitals, fig. (d). this approximately quadratic growth indicates a small-world network structure with mostly local interactions through which the spreading of the infection occurs. this is consistent with observations based on data from belgian telecom operators, which show that more than % of belgians have stayed within their own commune (postal code) for the last two weeks, and that individual displacements of over km have been reduced by %. at the time of this writing, no significant deviation from this power-law scaling has been observed for the accumulated belgian hospitalization data. this indicates that the current social network is still sufficiently well connected to continue local spreading of the disease. in part, these local network links could be attributed to the infection of direct family members.
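the mortality and icu projections are simple delayed, scaled transforms of the fitted hospitalization curve; the sketch below illustrates that bookkeeping with placeholder values for the mortality fraction, delay, and icu fraction (the actual figures are elided in the source).

```python
def project_deaths(h_acc, mortality, delay_days):
    """Deaths(t) = mortality * H_a(t - delay); zero before the delay elapses."""
    return [mortality * (h_acc[t - delay_days] if t >= delay_days else 0.0)
            for t in range(len(h_acc))]

def project_icu(h_current, icu_fraction):
    """Current ICU load as a fixed fraction of currently hospitalized patients."""
    return [icu_fraction * h for h in h_current]

# placeholder inputs: a toy accumulated-hospitalization series
h_acc = [0, 10, 40, 90, 160, 250, 360, 490]
deaths = project_deaths(h_acc, mortality=0.1, delay_days=2)
icu = project_icu([5, 20, 45, 70], icu_fraction=0.25)
```

under this bookkeeping the death curve is simply the hospitalization curve shifted right and scaled down, which is why a single mortality fraction plus a single delay can align the two series.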
without extensive testing, many infected people will be locked into their homes and pose a contamination risk to their families. in a severe lockdown, this effect should be controlled within a few days. hence, other factors may contribute to the continuation of the power-law scaling. another spreading mechanism could be local supermarkets, where, in spite of extensive safety measures, significant spreading of the highly infectious coronavirus could occur. a solution could be to enforce a more rigid approach, in which supermarkets are viewed as local distribution centres. most supermarkets already provide online shopping services, where people fill in their online shopping carts. this would allow for an optimal spreading of customers, and/or would make it possible to deliver groceries in a drive-through system. the result would be a large reduction of the small-world connectivity, resulting in a lowering of the exponents and thereby further flattening the curve. since the current scaling behavior of h_a still closely follows the algebraic growth regime, it is very difficult to make accurate predictions on when the inflection point away from this regime will occur. as a rule of thumb, reliable prediction capacity does not extend beyond a period of about days. the sir-x model predicts that h_a will start to plateau around days after the initial day. the parameters from the sir-x model fit suggest a very low value for the containment rate κ₀ of both infected and susceptible individuals. in other words, containment measures have only a weak effect on removing healthy individuals from the susceptible population. on the other hand, the removal
[fig. caption: the current number of icu patients, compared to predictions from the sir-x model, assuming that % or % of hospitalized patients require intensive care. (d) longer-time extrapolation assuming . % icu shows a peak of icu patients around april th. different curves show varying icu retention times: the shorter the icu retention time, the lower and earlier the icu peak will be.]

rate of symptomatic individuals is much higher, leading to a strongly decreased effective reproduction number and a moderately high quarantine probability. these sir-x model parameters are somewhat similar to the values estimated for the beijing region of china [ ]. when extrapolating the number of deaths using the sir-x model, the predicted death toll due to covid-19 will exceed by april rd. the sir-x model for accumulated hospitalizations is compatible with the current number of patients in intensive care when assuming that between % and % of hospitalized patients need icu treatment, and that the average retention time in icu is around days. extrapolation with these parameters predicts a peak in the number of icu patients around april th, with the number of icu patients exceeding the capacity of the belgian healthcare system of beds. based on fig. (a,c-d), one can conclude that the model matches very well with the total number of hospitalisations over time. this experimentally determined parameter is a numerical integration of the number of new cases each day. this integration has the important advantage of averaging out the noisiness in the day-to-day reporting, figure (b). belgian media have reported that some hospitals have published their numbers with a time delay of one day, which has a large impact on the visualisation of the results on a linear scale. it should, however, be emphasized that these predictions are highly sensitive to the estimated parameters of the sir-x model [ ]. furthermore, the model assumes that these constants will not change further in time. in reality, the effect of containment and isolation measures may occur gradually and with a significant time delay.

references:
- scaling features in the spreading of covid-19
- quadratic growth during the novel coronavirus epidemic
- modeling infectious diseases in humans and animals
- effective containment explains subexponential growth in confirmed cases of recent covid-19 outbreak in mainland china
- event horizon covid-19, forecast by country
- the incubation period of coronavirus disease (covid-19) from publicly reported confirmed cases: estimation and application
- fractal kinetics of covid-19 pandemic. medrxiv
- clinical features of patients infected with novel coronavirus in wuhan

key: cord- - l bo authors: liu, laura; moon, hyungsik roger; schorfheide, frank title: panel forecasts of country-level covid-19 infections date: - - journal: j econom doi: . /j.jeconom. . . sha: doc_id: cord_uid: l bo

we use a dynamic panel data model to generate density forecasts for daily active covid-19 infections for a panel of countries/regions. our specification assumes that the growth rate of active infections can be represented by autoregressive fluctuations around a downward-sloping deterministic trend function with a break. our fully bayesian approach allows us to flexibly estimate the cross-sectional distribution of slopes and then implicitly use this distribution as a prior to construct bayes forecasts for the individual time series. we find some evidence that information from locations with an early outbreak can sharpen forecast accuracy for late locations. there is generally a lot of uncertainty about the evolution of active infections, due to parameter and shock uncertainty, in particular before and around the peak of the infection path.
over a one-week horizon, the empirical coverage frequency of our interval forecasts is close to the nominal credible level. weekly forecasts from our model are published at https://laurayuliu.com/covid -panel-forecast/. this paper contributes to the rapidly growing literature on generating forecasts related to the current covid-19 pandemic. we are adapting forecasting techniques for panel data that we have recently developed for economic applications such as the prediction of bank profits, charge-off rates, and the growth (in terms of employment) of young firms; see liu ( ) , liu, moon, and schorfheide ( ) , and liu, moon, and schorfheide ( ) . we focus on the prediction of the smoothed daily number of active covid-19 infections for a cross-section of approximately one hundred countries/regions, henceforth locations. the data are obtained from the center for systems science and engineering (csse) at johns hopkins university. while we are currently focusing on country-level aggregates, our model could be easily modified to accommodate, say, state- or county-level data. in economics, researchers distinguish, broadly speaking, between reduced-form and structural models. a reduced-form model summarizes spatial and temporal correlation structures among economic variables and can be used for predictive purposes assuming that the behavior of economic agents and policy makers over the prediction period is similar to the behavior during the estimation period. a structural model, on the other hand, attempts to identify causal relationships or parameters that characterize policy-invariant preferences of economic agents and production technologies. structural economic models can be used to assess the effects of counterfactual policies during the estimation period or over the out-of-sample forecasting horizon. the panel data model developed in this paper to generate forecasts of covid-19 infections is a reduced-form model.
it processes cross-sectional and time-series information about past infection levels and maps them into predictions of future infections. while the model specification is motivated by the time-path of infections generated by the workhorse compartmental model in the epidemiology literature, the so-called susceptible-infected-recovered (sir) model, it is not designed to answer quantitative policy questions, e.g., about the impact of social-distancing measures on the path of future infection rates. building on a long tradition of econometric modeling dating back to haavelmo ( ) , our model is probabilistic. the growth rates of the infections are decomposed into a deterministic and a stochastic component, and the resulting forecasts reflect uncertainty about model parameters and uncertainty about future shocks. we model the growth rate of active infections as autoregressive fluctuations around a deterministic trend function that is piecewise linear. the coefficients of this deterministic trend function are allowed to be heterogeneous across locations. the goal is not curve fitting -our model is distinctly less flexible in-sample than some other models -but rather out-of-sample forecasts, which is why we prefer to project growth rates based on autoregressive fluctuations around a parsimonious linear time trend with a single break. a key feature of the covid-19 pandemic is that the outbreaks did not take place simultaneously in all locations. thus, we can potentially learn from the speed of the spread of the disease and subsequent containment in country a, to make forecasts of what is likely to happen in country b, while simultaneously allowing for some heterogeneity across locations. in a panel data setting, one captures cross-sectional heterogeneity in the data with unit-specific parameters. the more precisely these heterogeneous coefficients are estimated, the more accurate are the forecasts. a natural way of disciplining the model is to assume that the heterogeneous coefficients are "drawn" from a common probability distribution.
if this distribution has a large variance, then there is a lot of country-level heterogeneity in the evolution of covid-19 infections. if, instead, the distribution has a small variance, then the path of infections will be very similar across samples, and we can learn a lot from, say, china, that is relevant for predicting the path of the disease in south korea or germany. formally, the cross-sectional distribution of coefficients can be used as a so-called a priori distribution (prior) when making inference about country-specific coefficients. using bayesian inference, we combine the prior distribution with the unit-specific likelihood functions to compute a posteriori (posterior) distributions. this posterior distribution can then be used to generate density forecasts of future infections. unfortunately, the cross-sectional distribution of heterogeneous coefficients is unknown. the key insight in the literature on bayesian estimation of panel data models is that this distribution, which is called the random effects (re) distribution in the panel data model literature, can be extracted through simultaneous estimation from the cross-sectional dimension of the panel data set. there are several ways of implementing this basic idea. our density forecasts reflect parameter uncertainty as well as uncertainty about shocks that capture deviations from the deterministic component of our forecasting model. our empirical analysis makes the following contributions. first, we present estimates of the re distribution as well as the distribution of location-specific coefficient estimates. second, we document how density forecasts from our model have evolved over time, focusing on the forecasts for three countries in which the level of infections peaked at different points in time: south korea, germany, and the u.s.
due to the exponential transformation from growth rates to levels, density forecasts can feature substantial tail risk by assigning nontrivial probability to very high infection levels, which materialized in the u.s. but not in germany and south korea. third, we evaluate one-week- and four-week-ahead density forecasts based on the continuous ranked probability score and interval forecasts based on cross-sectional coverage frequency and average length. in addition to forecasts from our panel data model, we also consider forecasts based on location-level time series estimates of our trend-break model and a simple sir model. once we decompose the set of locations into those that experienced the covid-19 outbreak early (prior to - - ) and those that experienced the outbreak later on, then we find some evidence that for the late group the panel density forecasts are more accurate than the time-series forecasts. however, because of the substantial heterogeneity in our panel and the poor data quality for some countries, the empirical evidence in favor of the panel approach is not as tidy as the simulation evidence provided in the monte carlo section of this paper. over time, in particular after the infection level has peaked and started to fall, forecast accuracy increases. the timing of the peak appears to be very difficult to forecast. prior to the middle of may the panel and time-series forecasts from our trend-break model are substantially more accurate than the forecasts from a simple time-varying coefficient sir model. for subsequent forecast origins, the accuracy across the three forecasting procedures becomes much more similar. weekly real-time forecasts are published on the companion website https://laurayuliu.com/covid -panel-forecast/. in terms of interval forecasts we find that over a one-week horizon the empirical coverage frequency of the trend-break model forecasts is close to the nominal coverage level based on which the forecasts were constructed.
moreover, in april and may, the average interval lengths of the panel model forecasts are slightly smaller than the time-series intervals. at the four-week horizon the coverage frequency is considerably smaller than the nominal level and it deteriorates further for longer horizons. this paper is connected to several strands of the literature. the panel data forecasting approach is closely related to work by gu and koenker ( a,b) and our own work in liu ( ), liu, moon, and schorfheide ( ) , liu, moon, and schorfheide ( ) . all five papers focus on the estimation of the heterogeneous coefficients in panel data models. the forecasting model for the covid-19 infections is based on the alternative parametric model considered in liu ( ) and tailored to the specifics of the covid-19 pandemic. the approach has several desirable theoretical properties. for instance, liu, moon, and schorfheide ( ), building on brown and greenshtein ( ) , show that an empirical bayes implementation of the forecasting approach based on tweedie's formula can asymptotically (as the cross-sectional dimension tends to infinity) lead to forecasts that are as accurate as the so-called oracle forecasts. here the oracle forecast is an infeasible benchmark that assumes that the distribution of the heterogeneous coefficients is known to the forecaster. liu ( ) shows that the density forecast obtained from the full bayesian analysis converges strongly to the oracle's density forecast as the cross-section gets large. the piecewise linear conditional mean function for the infection growth rate resembles a spline; see de boor ( ) for an introduction to spline approximation. unlike a typical spline approximation in which the knot locations are free parameters and some continuity or smoothness restrictions are imposed, the knot placement in our setting is closely tied to the first component of the spline, and we do not impose continuity.
our model could be generalized by adding additional knots in the deterministic trend component of infection growth rates, but the extension is not pursued in this paper. other authors have explored alternative forms of nonlinearity which are often tied to the object that is being modeled, e.g., active infections, cumulative infections, new infections, deaths. for instance, li and linton ( ) model the logarithm of country-level new infections and new deaths via a quadratic trend, using rolling samples. ho, lubik, and matthes ( ) model the cumulative number of infections using a very flexible nonlinear parametric function. an important aspect of our modeling framework is that the panel model is specified in event time, i.e., time since the level of infections in a particular location exceeds . the forecasts, however, are generated based on calendar time. this allows us to sharpen forecasts for countries/regions that experienced an outbreak at a late stage (in terms of calendar time), based on information from locations with an early outbreak. this idea is also utilized by larson and sinclair ( ) who use state-level panel data to nowcast unemployment insurance claims during covid-19. a growing number of researchers with backgrounds in epidemiology, biostatistics, machine learning, economics, and econometrics are engaged in forecasting aspects of the covid-19 pandemic. because this is a rapidly expanding and diverse field, we do not attempt to provide a meaningful survey at this moment. instead, we simply provide a few pointers. forecasts are reported in the abovementioned papers by li and linton ( ) and ho, lubik, and matthes ( ) . the paper by avery, bossert, clark, ellison, and fisher ellison ( ) provides an introduction aimed at economists. the remainder of this paper is organized as follows. section provides a brief survey of epidemiological models with a particular emphasis on the sir model. the specification of our panel data model is presented in section .
section contains a small-scale monte carlo study and the empirical analysis is conducted in section . finally, section concludes. there is a long history of modeling epidemics. a recent survey of modeling approaches is provided by bertozzi, franco, mohler, short, and sledge ( ) . the authors distinguish three types of macroscopic models: (i) the exponential growth model; (ii) self-exciting point processes / branching processes; (iii) compartmental models, most notably the sir model that divides a population into susceptible (s_t), infected (i_t), and resistant (r_t) individuals. our subsequent discussion will focus on the exponential growth model and the sir model. while epidemiological models are often specified in continuous time, we will consider a discrete-time specification in this paper because it is more convenient for econometric inference. the exponential model takes the form i_t = i_0 exp(γ_0 t). the number of infected individuals will grow exponentially at the constant rate γ_0. this is a reasonable assumption to describe the outbreak of a disease, but not the subsequent dynamics because the growth rate will typically fall over time and eventually turn negative as more and more people become resistant to the disease. the sir model dates back to kermack and mckendrick ( ) . in its most elementary version it can be written in discrete-time as follows: s_{t+1} = s_t - β s_t i_t / n, i_{t+1} = i_t + β s_t i_t / n - γ i_t, r_{t+1} = r_t + γ i_t, where n is the (fixed) size of the population, β is the average number of contacts per person per time, and γ is the rate of recovery or mortality. the model could be made stochastic by assuming that β and γ vary over time. in response to the recent covid-19 pandemic, several introductory treatments of sir models have been written for economists, e.g., avery, bossert, clark, ellison, and fisher ellison ( ) and stock ( ) . moreover, there is a growing literature that combines compartmental models with economic components.
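the discrete-time sir recursion above can be sketched in a few lines of python; the parameter values below are illustrative, not the paper's calibration:

```python
def simulate_sir(n, i0, beta, gamma, horizon):
    """Iterate the discrete-time SIR model forward and return the
    path of infected counts I_0, ..., I_horizon.

    Recursions:
      S_{t+1} = S_t - beta * S_t * I_t / N
      I_{t+1} = I_t + beta * S_t * I_t / N - gamma * I_t
    """
    s, i = n - i0, float(i0)
    path = [i]
    for _ in range(horizon):
        new_infections = beta * s * i / n
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        path.append(i)
    return path

# Illustrative parameters: infections rise, peak, then decline as
# susceptibles are depleted.
path = simulate_sir(n=1_000_000, i0=100, beta=0.4, gamma=0.1, horizon=200)
peak_day = max(range(len(path)), key=lambda t: path[t])
```

with β = 0.4 and γ = 0.1 the epidemic peaks when the susceptible share falls to γ/β of the population, producing the hump-shaped infection path the text describes.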
in these models, economic agents account for the possibility of contracting a disease when making their decisions about market participation. this creates a link between infection rates and economic activity through the frequency of interactions. examples of this work in macroeconomics include eichenbaum, rebelo, and trabandt ( ) , glover, heathcote, krueger, and rios-rull ( ) , and krueger, uhlig, and xie ( ). the advantage of models that link health status to economic activity is that they can be used to assess the economic impact of, say, social distancing measures. we now simulate the constant-coefficient sir model in ( ). under the first parameterization, the growth rate of infections is a monotonically decreasing function of time that we approximate by fitting a piecewise linear least-squares regression line with a break point at t*, which is the point in time when the infections peak and the growth rate transitions from being positive to being negative. under the second parameterization the transmission rate β = . is much lower and the recovery rate is slightly faster. this leads to an almost bell-curve shaped path of infections. while the resulting growth rate of the infections is not exactly a linear function of time t, the break at t* is much less pronounced. while the piecewise-linear regression functions do not fit perfectly, they capture the general time-dependence of the growth-rate path implied by the sir model. in particular, they allow for a potentially much slower change in the growth rate of infections after the peak. we use these simulations as a motivation for the subsequent specification of our empirical model. this model assumes that the growth rate of infections is a decreasing piecewise-linear function of time with a break when the growth rates cross zero and the infections peak. this deterministic component is augmented by a stochastic component that follows a first-order autoregressive, ar(1), process. we refer to the model as the trend-break model.
we will revisit a stochastic version of the sir model that comprises ( ) and ( ) in section , where we compare its forecasts to the proposed trend-break model. we now describe our empirical model in more detail. we begin with the specification of a regression model for the growth rate of infections in section . . our model features location-specific regression coefficients and heteroskedasticity. the prior distribution for the bayesian analysis is summarized in section . . posterior inference is implemented through a gibbs sampler that is outlined in section . . further computational details are provided through replication files on the companion webpage. the algorithm to obtain simulated infection paths from the posterior predictive distribution is outlined in section . . we specify a panel data model for infection growth rates y_it = Δ ln i_it, i = 1, . . . , n and t = 1, . . . , t. we assume that y_it = x_t'γ_i + x_t'δ_i 1{t ≥ t*_i} + u_it, where x_t = [1, t]' and δ_i = [δ_1i, δ_2i]' captures the size of the break in the regression coefficients at t = t*_i. the deterministic part of y_it corresponds to the piecewise-linear regression functions fitted to the infection growth paths simulated from the sir model in figure . the serially-correlated process u_it generates stochastic deviations from the deterministic path x_t'γ_i of the infection growth rate. the u_it shocks may capture time variation in the (β, γ) parameters of the sir model or, alternatively, model misspecification. in section , the break point t*_i was given by the peak of the infection path. abstracting from a potential discontinuity at the kink, we define t*_i as the period in which the pre-break deterministic component x_t'γ_i crosses zero, which implies that e[y_it | t = t*_i] = 0. because of the ar(1) process u_it, t*_i is not the peak of the observed sample path, nor is it an unbiased or consistent estimate of the period in which the infections peak.
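the trend-break growth process just described can be simulated with a short sketch; the coefficients below are hypothetical values chosen only to satisfy the paper's sign restrictions, not estimates from the data:

```python
import random

def trend_break_growth(a, b, delta0, delta1, rho, sigma, horizon, seed=0):
    """Simulate growth rates y_t = trend(t) + u_t with AR(1) errors.

    The deterministic trend is a + b*t before the break and
    (a + delta0) + (b + delta1)*t after it; the break t_star is where
    the pre-break trend crosses zero. Sign restrictions: b < 0,
    delta1 > 0, b + delta1 < 0, delta0 < 0.
    """
    rng = random.Random(seed)
    t_star = -a / b  # deterministic trend hits zero here
    u, path = 0.0, []
    for t in range(1, horizon + 1):
        if t <= t_star:
            trend = a + b * t
        else:
            trend = (a + delta0) + (b + delta1) * t
        u = rho * u + rng.gauss(0.0, sigma)  # AR(1) deviation
        path.append(trend + u)
    return path, t_star

# With sigma = 0 the path is purely deterministic: positive growth for
# roughly 20 days, then a slower, persistently negative decline.
y, t_star = trend_break_growth(a=0.20, b=-0.01, delta0=-0.17, delta1=0.008,
                               rho=0.5, sigma=0.0, horizon=60)
```

note how the post-break slope b + delta1 = -0.002 is much flatter than the pre-break slope, so the fall back toward low infection levels takes longer than the rise, as in the sir simulations.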
for δ_i = 0, the model reduces to y_it = x_t'γ_i + u_it. note that the break date t*_i is identified in this model even if δ_i = 0, because we assume the break occurs when the deterministic component of the growth rate falls below zero. to construct a likelihood function we define the quasi-difference operator Δ_ρ = 1 − ρl such that Δ_ρ u_it = ε_it. thus, we can rewrite ( ) in quasi-differenced form. now let λ_i = [γ_i', δ_i']' and n_λ be the dimension of λ_i. the parameters of the panel data model are (ρ, λ_{1:n}, σ_{1:n}). here, we use the notation z_{1:l} to denote the sequence z_1, . . . , z_l. using this notation, we denote the panel observations by y_{1:n,1:t}. we will subsequently condition on y_{1:n,0} to initialize the conditional likelihood function. finally, from the growth rates y_it we can easily recover the level of active infections as i_it = i_i0 exp( Σ_{s=1}^{t} y_is ). to conduct bayesian inference, we need to specify a prior distribution for (ρ, λ_{1:n}, σ_{1:n}). we do so conditional on a vector of hyperparameters ξ that do not enter the likelihood function. our prior distribution has the following factorization, where ∝ denotes proportionality and f(•) is an indicator function that we will use to impose the following sign restrictions on the elements of λ_i: the restriction γ_2i < 0 ensures that the growth rates are falling over time. after the break point the rate of decline decreases (δ_2i > 0), but the slope stays negative (γ_2i + δ_2i < 0). in addition we assume that the decrease in the rate of decline is associated with a downward shift of the intercept, i.e., δ_1i < 0, as shown in the sir simulation. because of the presence of the indicator function f(•) the right-hand side of ( ) is not a properly normalized density.
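the mapping from growth rates back to levels, i_it = i_i0 exp(Σ y_is), is just a cumulated exponential and can be sketched directly:

```python
import math

def levels_from_growth(i0, growth_rates):
    """Recover the level of active infections from log growth rates.

    Since y_t = ln(I_t) - ln(I_{t-1}), the level satisfies
    I_t = I_0 * exp(y_1 + ... + y_t).
    """
    levels, log_i = [], math.log(i0)
    for y in growth_rates:
        log_i += y
        levels.append(math.exp(log_i))
    return levels

# A constant log growth rate of 0.10 for 7 days scales the initial
# level by exp(0.7), roughly doubling it.
path = levels_from_growth(100.0, [0.10] * 7)
```

this is also why the paper's level forecasts are highly skewed: small persistent errors in the growth rate compound exponentially in the level.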
in view of the indicator function f(•) we define the re distribution of λ_i given ξ as π(λ_i|ξ) ∝ p(λ_i|ξ) f(λ_i), with normalization constant c(ξ). in turn, the marginal prior distribution of the hyperparameters is given by p(ξ). building on liu ( ), we use the following densities p(•) in ( ) for ρ, λ_i, and σ_i. thus, the vector of hyperparameters is ξ = (μ, Σ, a, b). we decompose p(ξ) = p(μ, Σ)p(a, b). the density p(μ, Σ) is constructed as follows: the degrees of freedom for the inverse wishart distribution is set to . the matrix w is constructed to align the scale of the variance of μ with the cross-sectional variance of the data, adjusting for the average magnitudes of the regressors that multiply the λ_i elements. to obtain the density p(a, b), we follow llera and beckmann ( ). the parameters (α_a, β_a, γ_a, α_b, β_b) need to be chosen by the researcher. we use α_a = , β_a = γ_a = α_b = β_b = . , which specifies relatively uninformative priors for hyperparameters a and b. posterior inference is based on an application of bayes theorem. let p(y_{1:n,1:t}|λ_{1:n}, σ_{1:n}, ρ) denote the likelihood function (for notational convenience we dropped y_{1:n,0} from the conditioning set). then the posterior density is proportional to p(ρ, λ_{1:n}, σ_{1:n}, ξ|y_{1:n,1:t}) ∝ p(y_{1:n,1:t}|λ_{1:n}, σ_{1:n}, ρ) p(ρ) p(λ_{1:n}, σ_{1:n}, ξ), where the prior was given in ( ). to generate draws from the posterior distribution we use a gibbs sampler that iterates over the conditional posterior distributions in ( ). the gibbs sampler generates a sequence of draws ρ^s, λ^s_{1:n}, (σ_{1:n})^s, ξ^s, s = 1, . . . , n_sim, from the posterior distribution. the implementation of the gibbs sampler closely follows liu ( ) . for the gibbs sampler to be efficient, it is desirable to have a model specification in which it is possible to directly sample from the conditional posterior distributions in ( ).
unfortunately, the exact likelihood function leads to a non-standard conditional posterior distribution for λ_{1:n}|(y_{1:n,1:t}, ρ, σ_{1:n}, ξ) because γ_i enters the indicator function in ( ) through the definition of t*_i. thus, rather than using the exact likelihood function, we will use a limited-information likelihood function. the densities p^l(y_{i,1:t}|λ_i, σ_i) that enter it are constructed as follows. let Δ be some positive number, e.g., three or five time periods. given a sample (y_{i,1:t}, ln i_{i,1:t}), we define t_{i,max} as the period in which ln i_it attains its sample maximum. if t_{i,max} = t, the infection path may not yet have peaked in sample; on the other hand, if t_{i,max} < t, then it is likely that t*_i = t_{i,max}. thus, we distinguish two cases: in the first case, δ_i does not enter the likelihood function and its posterior equals its prior, p(δ_i|ξ). in the second case, δ_i does enter the likelihood function and its prior gets updated in view of the data. bayesian forecasts reflect parameter and shock uncertainty. we simulate trajectories of infection growth rates from the posterior predictive distribution using algorithm . the simulated growth rates can be converted into simulated trajectories for active infections using ( ). algorithm (simulating from the posterior predictive distribution). based on the simulated paths i^s_{1:n,t+1:t+h}, s = 1, . . . , n_sim, compute point, interval, and density forecasts for each period t = t + 1, . . . , t + h. we now conduct a small monte carlo experiment that compares the forecasts derived from the panel data model to time-series forecasts generated for each location separately. the experiment shows that in our environment forecasts for locations that experience an outbreak at a later point in time are more accurate than forecasts for locations that have an early outbreak because the early outbreaks facilitate learning about the re distribution that benefits the forecasts for the remaining locations. the data generating process (dgp) is described in section . and the results are summarized in section . .
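the forward-simulation step in the algorithm can be sketched as follows; the dictionary of "posterior draws" and the linear trend function are illustrative stand-ins for actual gibbs output, not the paper's implementation:

```python
import math
import random

def simulate_predictive_paths(draws, u_T, i_T, horizon, seed=0):
    """Simulate level trajectories from a posterior predictive distribution.

    Each element of `draws` holds one (hypothetical) posterior draw of
    (rho, sigma, trend), where trend(h) is the deterministic growth
    component h periods ahead. Parameter uncertainty enters by varying
    draws; shock uncertainty by the simulated AR(1) innovations.
    """
    rng = random.Random(seed)
    paths = []
    for d in draws:
        u, log_i = u_T, math.log(i_T)
        level_path = []
        for h in range(1, horizon + 1):
            u = d["rho"] * u + rng.gauss(0.0, d["sigma"])
            log_i += d["trend"](h) + u
            level_path.append(math.exp(log_i))
        paths.append(level_path)
    return paths

# 200 identical illustrative draws; in practice each would differ.
draws = [{"rho": 0.5, "sigma": 0.02, "trend": lambda h: 0.05 - 0.01 * h}
         for _ in range(200)]
paths = simulate_predictive_paths(draws, u_T=0.0, i_T=1000.0, horizon=14)
```

quantiles across `paths` at each horizon would then give the point, interval, and density forecasts that the algorithm's final step refers to.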
the data generating process (dgp) is given by the trend-break model ( ) for the growth rates of infections. for the simulation experiment we assume that the innovations ε_it are homoskedastic, i.e., σ_i = σ for all i. the dgp matches certain aspects of the empirical application in section , but it is more stylized in other dimensions. the time period t is a day. the number of locations in our simulation is n = . we split the locations into two groups: n_1 = locations experience an early outbreak, starting at t = , and n_2 = locations experience a late outbreak, starting at t = . we refer to these groups as "early" and "late." for the early group calendar time and event time are identical. for the late group, the event time is calendar time minus t_δ = ( weeks). the parameters of the dgp are summarized in table . the persistence of the growth rates is set to ρ = . . the dispersion of the parameters λ_i is controlled by a vector of means λ and a covariance matrix Σ. both are calibrated to match some stylized facts about the cross-sectional distribution of the country-level data used in section . we then draw the λ_i's independently from the n(λ, Σ) distribution. the innovation variance σ corresponds to a high-density value of the estimated density σ_i ∼ ig(a, b). we assume that the outbreak starts in each geographical location i with i_i0 = . there is also heterogeneity in the timing of the peak, which is illustrated in figure . the figure shows the percentage of locations that have peaked in or prior to period t. by construction, infections in the early-group locations tend to peak sooner than in the late-group locations. however, the peak dates in each group are quite dispersed: only % of early locations have peaked after days. it takes more than days for the remaining early locations to peak. forecasting models.
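the sign restrictions that the prior imposes on the slope coefficients can be enforced on simulated draws by simple rejection sampling; the means and standard deviations below are illustrative, not the paper's calibrated values:

```python
import random

def draw_lambda(mean, sd, rng):
    """Draw (gamma2, delta2) from independent normals, rejecting draws
    that violate the sign restrictions gamma2 < 0, delta2 > 0,
    gamma2 + delta2 < 0. Hypothetical means/sds only.
    """
    while True:
        g = rng.gauss(mean[0], sd[0])
        d = rng.gauss(mean[1], sd[1])
        if g < 0 < d and g + d < 0:
            return g, d

rng = random.Random(1)
draws = [draw_lambda((-0.01, 0.004), (0.003, 0.002), rng)
         for _ in range(500)]
```

every accepted draw implies a growth rate that declines before the peak and keeps declining, more slowly, after it.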
we report results for two forecasting models: (i) the panel data model and (ii) the trend-break model estimated separately for each location with time-series methods. forecast evaluation. because of the exponential transformation in ( ) from growth rates to levels, there is a large degree of cross-sectional heterogeneity among the levels of infection. locations with larger numbers of infections tend to be associated with larger forecast errors. if we simply average forecast errors or forecast interval lengths across locations, the results will be driven by a few locations with a high level of infections. therefore, we are standardizing all level-forecast evaluation statistics by the level of infections at the forecast origin, i_it, i.e., we are reporting results for the forecast of i_{it+h}/i_it. we will report measures of density and interval forecasting performance below. we do not consider point forecasts because we strongly believe that due to the highly uncertain path of infections during a pandemic it is essential for forecasters to report forecasts that convey the degree of uncertainty in the predictive distribution. the density forecast performance is evaluated based on continuous ranked probability scores (crps). the crps measures the l2 distance between the cumulative density function f̂_{it+h|t}(x) associated with a predictive distribution for location i at forecast origin t and a perfect probability forecast that assigns probability one to the realized x_{it+h}: crps_{it+h|t} = ∫ ( f̂_{it+h|t}(x) − 1{x_{it+h} ≤ x} )² dx. the crps is a proper scoring rule, meaning that it is optimal for the forecaster to truthfully reveal her predictive density. here x_{it+h} could either be a growth rate y_{it+h} or a relative level i_{it+h}/i_it. for interval forecasts we will report the cross-sectional coverage frequency and the average length separately. a more detailed discussion can be found in askanazi, diebold, schorfheide, and shin ( ). we begin with the top left panel of figure . in most early-group locations, the infections tend to peak between the forecast origin t = and the four-week-ahead forecast target t + h = .
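when the predictive distribution is represented by posterior draws rather than a closed-form cdf, the crps can be computed with the standard sample-based (energy-form) identity; a minimal sketch:

```python
def crps_from_draws(draws, realized):
    """Sample-based CRPS via the identity
    CRPS = E|X - x| - 0.5 * E|X - X'|,
    where X, X' are independent draws from the predictive distribution
    and x is the realization. Lower is better; a point-mass forecast
    at the realization scores zero. O(m^2) in the number of draws.
    """
    m = len(draws)
    term1 = sum(abs(d - realized) for d in draws) / m
    term2 = sum(abs(a - b) for a in draws for b in draws) / (2 * m * m)
    return term1 - term2

score = crps_from_draws([0.0, 1.0], 0.0)  # equals 0.25
```

because the crps is proper, averaging such scores across locations rewards forecasters who report their true predictive density, which is why the paper relies on it instead of point-forecast losses.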
in the late-group locations, the peak occurs after the forecast target date. for t = , three important findings emerge. first, the panel forecasts clearly dominate the time-series forecasts. the discrepancy is particularly large for locations in the late group. second, while for the early group the crps based on the panel forecasts seems to be unrelated to the peak date, the accuracy of the time-series forecasts is substantially worse for early-group locations that peak between periods and than it is for locations that peak prior to period . third, the four-week-ahead panel forecasts for the late group are much more accurate than the panel forecasts for the early group. these findings can be explained as follows. first, in a panel setting, the experience of the early locations allows for relatively precise inference about the re distribution, which then sharpens the posterior inference for the late locations because the uncertainty about the prior distribution is reduced. note that the time series dimension for the late group is only . second, due to the structural break in the growth rate at the peak infection level, it is very difficult to predict how quickly the infections will die out after they have peaked. this makes it easier to predict infections for the late group, which includes the locations that are still far away from the peak, than for the early group in which infection levels are relatively close to the peak. the top right panel of figure indicates that after weeks (t = ) the benefit of the panel approach is a lot smaller, both for the early group and the late group. because more time series information is available to estimate the location-specific parameters, the benefit from using prior information is significantly diminished. the bottom panel of the figure shows crps for levels rather than growth rates.
the key message remains the same: early on in the pandemic, the panel approach substantially improves forecasts for locations that experience a delayed outbreak, because there is some learning from the locations in which the outbreak occurred early on. in figure we plot the group-specific average crps as a function of the forecast origin. the messages are similar to those from figure , but now the results span a broad range of forecast origins. first, the panel forecasts are (at least weakly) more accurate than the time-series forecasts. however, the accuracy differential vanishes as the time-series dimension of the estimation sample increases over time. second, the benefit from using a panel approach is more pronounced for the locations that experience a late outbreak than those that experience an early outbreak. interval forecast accuracy. finally, we report results on the interval forecast performance for infection growth rates and levels in figure . we apply the panel forecasting techniques to country/region-level data on active covid-19 infections. the data set used in the empirical analysis is described in section . . we discuss the posterior estimates for the - - forecast origin in section . . the data set is obtained from csse at johns hopkins university. we define the total number of active infections in location i and period t as the number of confirmed cases minus the number of recovered cases and deaths. throughout our study we use country-level aggregates. the time period t corresponds to a day and we fit our model to one-sided three-day rolling averages to smooth out noise generated by the timing of the reporting. in a slight abuse of notation, the time subscript t in ( ) refers to event time, i.e., time since the outbreak in location i. before discussing the forecasts, we will examine the parameter estimates for one of the early samples, namely - - . heterogeneous slope coefficients. our gibbs sampler generates draws from the joint posterior of (ρ, λ_{1:n}, σ_{1:n}, ξ)|y_{1:n,1:t}.
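the two data-preparation steps just described, computing active infections and applying a one-sided three-day rolling average, can be sketched directly (the input numbers are made up for illustration):

```python
def active_infections(confirmed, recovered, deaths):
    """Active infections: cumulative confirmed minus recovered and deaths."""
    return [c - r - d for c, r, d in zip(confirmed, recovered, deaths)]

def one_sided_rolling_mean(series, window=3):
    """One-sided rolling average: each value averages the current day and
    up to (window - 1) preceding days, smoothing reporting-timing noise."""
    out = []
    for t in range(len(series)):
        lo = max(0, t - window + 1)
        out.append(sum(series[lo:t + 1]) / (t + 1 - lo))
    return out

active = active_infections([100, 150, 210], [10, 20, 40], [1, 2, 3])
smoothed = one_sided_rolling_mean(active, window=3)
```

the one-sided window matters: it uses only past and current observations, so the smoothed series is available in real time at each forecast origin.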
we begin with a discussion of the estimates of γ_i and δ_i, which affect the speed at which the growth rates are expected to change on a daily basis. γ_i measures the average daily decline in the growth rate of active infections. for instance, suppose that at the beginning of the outbreak, in event time t = , the growth rate ln(i_t/i_{t−1}) = . , i.e., approximately %. a value of γ_i = − . implies that, on average, the growth rate declines by . , meaning that after days it is expected to reach zero and turn negative subsequently. a positive value of δ_i = . implies that after the growth rate becomes negative, its decline is reduced (in absolute value) to γ_i + δ_i = − . . as a consequence, the decline in infections after they have peaked will take considerably longer than the rise to the peak. re distribution. an important component of our model is the re distribution π(λ_i|ξ) defined in ( ). prior and posterior uncertainty with respect to the hyperparameters ξ generate uncertainty about the re distribution. in the remaining panels of figure we plot draws from the posterior (center column) and prior (right column) distribution of the re density π(λ_i|ξ). each draw is represented by a hairline. because the normalization constant c(ξ) of π(λ_i|ξ) is difficult to compute due to the truncation of a joint normal distribution, we show kernel density estimates obtained from draws from π(λ_i|ξ). we now turn to density forecasts generated from the estimated panel data model. for now, we will focus on the early stage of the pandemic. we use algorithm to simulate trajectories of infection growth rates which, conditional on observations of the initial levels i_it, we convert into stocks of active infections. for each forecast horizon h we use the values y^s_{it+h} and i^s_{it+h}, s = 1, . . . , n_sim to approximate the predictive density. strictly speaking, we are not reporting complete predictive densities. instead, we plot medians and construct equal-tail-probability bands that capture the range between the - % and - % quantiles.
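the back-of-the-envelope calculation behind the slope interpretation is one line of arithmetic; the numbers below are hypothetical, since the specific estimates are not legible in this extraction:

```python
def days_to_peak(initial_growth, gamma_slope):
    """Days until the deterministic growth trend, starting at
    `initial_growth` and declining by |gamma_slope| per day, crosses
    zero (gamma_slope < 0)."""
    return initial_growth / -gamma_slope

# Hypothetical values: a 10% initial daily growth rate that declines
# by 0.005 per day reaches zero after about 20 days.
horizon = days_to_peak(0.10, -0.005)
```

the same arithmetic applied to the flatter post-break slope γ_i + δ_i shows why the decline back to low infection levels takes several times longer than the rise.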
The wider the bands, the greater the uncertainty.

Infections. The path of active infections broadly resembles the paths simulated with the SIR model earlier: the rise of infections during the outbreak tends to be faster than the subsequent decline, a feature that is captured by the break in the conditional mean function of our model for the infection growth rate y_{it} in ( ). The difference between the bands depicted in the second and third rows is that the former reflect parameter uncertainty only (we set future shocks equal to zero), whereas the latter reflect parameter and shock uncertainty. In the case of Germany, shock uncertainty increases the width of the bands by approximately %.

Due to the exponential transformation that is used to recover the levels, the predictive densities are highly skewed and exhibit a large upside risk. This is particularly evident for the U.S. The growth rate prediction in the first row indicates that there is an approximately % probability of a positive infection growth rate throughout April and at least a % probability until the middle of June. Converted into levels, temporarily positive growth rates of infections can generate a rise of infections from less than one million in April to more than five million two months later.

In the bottom row of the figure we plot the cumulative density function for the date of recovery, which we define as the first date when infections fall below the initial level I_i. The density function is calculated by examining each of the future trajectories I^s_{it+h}, h = , ..., generated by the algorithm. For South Korea the probability that infections will fall below I_i over the two-month period is close to %, whereas for Germany and the U.S. the probability is approximately % and %, respectively.
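The conversion of simulated growth-rate paths into level forecasts, and the resulting upside skewness, can be illustrated as follows; all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical simulated growth-rate trajectories: n_sim paths, h days
n_sim, h = 5000, 28
y_draws = rng.normal(0.01, 0.03, size=(n_sim, h))

# levels follow by cumulating growth rates and exponentiating:
# I_{T+h} = I_T * exp(y_{T+1} + ... + y_{T+h})
i_T = 1.0e5
levels = i_T * np.exp(y_draws.sum(axis=1))

lo, med, hi = np.percentile(levels, [10, 50, 90])
# the exponential transform makes the predictive density right-skewed:
# the upper tail sits farther from the median than the lower tail
assert (hi - med) > (med - lo)
```

Even though the growth-rate draws are symmetric, the implied level distribution is lognormal-like, which is why interval rather than point forecasts are reported.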
In the figure we overlay eight weeks of actual infections onto the density forecasts just described.

We now turn to a more systematic evaluation of the forecasts, assessing the accuracy of the density and interval forecasts represented by the bands in the preceding figures. For the reasons previously discussed, we standardize future infections I_{it+h} by the level of infections I_{it} at the forecast origin. A closer inspection of the forecasts for more than countries/regions reveals that the long-run forecasting performance is not particularly good. This is not just a feature of our panel trend-break model, but also a feature of other epidemiological models such as the SIR model, for which we report results below. Thus, in this section we focus on one-week- and four-week-ahead forecasts and do not report results for an eight-week horizon.

Alternative models. In addition to the panel model forecasts, we consider two alternative forecasts. First, as before, we generate time-series forecasts based on the trend-break model ( ) for each location. Second, we estimate a version of the simple SIR model in ( ) with time-varying parameters beta_t and gamma_t. Notice that, by rewriting ( ), we can express beta_t and gamma_t directly as functions of the observables (here we omit i subscripts). This allows us to estimate the AR( ) law of motion in ( ) for each country using Bayesian techniques. The AR( ) models are then used to simulate trajectories (beta_{t+1:t+h}, gamma_{t+1:t+h}) from the posterior predictive distribution. For each parameter sequence, we iterate the SIR model ( ) forward to obtain a predictive distribution of active infections.

Density forecast accuracy. The figure summarizes the one-week-ahead density forecasting performance for once-a-week forecast origins starting on - - and ending on - - . For each location, we compute the probability score CRPS_{i,t+h|t}.
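The CRPS for a location and horizon can be approximated directly from the simulation draws via the standard Monte Carlo identity; a minimal sketch (function name ours):

```python
import numpy as np

def crps_from_draws(draws, y):
    """Monte Carlo estimate of the continuous ranked probability score,
    CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, with X, X' ~ F independent.
    Lower values indicate a better density forecast."""
    draws = np.asarray(draws, dtype=float)
    term1 = np.mean(np.abs(draws - y))
    term2 = 0.5 * np.mean(np.abs(draws[:, None] - draws[None, :]))
    return term1 - term2

rng = np.random.default_rng(0)
# a sharp forecast centered on the realization beats a diffuse one
sharp = crps_from_draws(rng.normal(0.0, 0.1, 2000), 0.0)
wide = crps_from_draws(rng.normal(0.0, 1.0, 2000), 0.0)
assert sharp < wide
```

Averaging such scores across locations, as done in the figures, then summarizes relative density-forecast accuracy for each forecast origin.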
The top row shows the cross-sectional median as a function of the forecast origin, whereas the center and bottom rows show the cross-sectional empirical distribution for two forecast origins: - - and - - . The panels in the left column of the figure cover all locations, whereas the panels in the right column distinguish between early-group and late-group locations. The early group comprises locations that experienced more than infections before - - ; the remaining locations form the late group.

The following additional variables are obtained from the JHU CSSE data set: N is the total population of each country, and S_t is computed as N - I_t - recovered cases - deaths. Based on the specification of the SIR model, we impose beta_t, gamma_t > 0 and 0 <= S_t, I_t, R_t <= N for all t.

The panels in the right column of the figure distinguish between locations that experienced the COVID-19 outbreak at an early stage and locations that were hit by the pandemic at a later stage. The key result is that for forecast origins dated - - or earlier, the panel forecasts for the late group are more accurate than the time-series forecasts from the trend-break model. This result confirms the basic intuition that the panel approach can be advantageous during a slowly spreading pandemic, because the experience of the early-group countries can sharpen inference on the RE distribution for the later countries. Unfortunately, because the time-series approach dominates the panel approach for the early countries, in the aggregate there is no clear advantage to the panel analysis in our data set.

The left panels of the figure show that the panel data forecasts have a smaller average length than the individual-level forecasts, for both groups and in the aggregate. Thus, on balance, in terms of interval forecasting, the panel approach comes out slightly ahead. Finally, the bottom right panel shows that the interval forecasts for the late group are generally wider than for the early group.
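The time-varying SIR rates referenced above can be backed out from the observable compartments; the discrete-time timing convention in this sketch is our assumption and may differ from the paper's:

```python
import numpy as np

def implied_sir_rates(S, I, R, N):
    """Back out time-varying SIR rates from observed compartments.

    Uses the discrete-time SIR updates
        S_{t+1} - S_t = -beta_t * S_t * I_t / N,
        R_{t+1} - R_t =  gamma_t * I_t,
    so beta_t and gamma_t are functions of the observables alone."""
    S, I, R = (np.asarray(a, dtype=float) for a in (S, I, R))
    beta = -np.diff(S) * N / (S[:-1] * I[:-1])
    gamma = np.diff(R) / I[:-1]
    return beta, gamma

# round-trip check on a synthetic SIR path with known constant rates
N, b_true, g_true = 1e6, 0.3, 0.1
S, I, R = [N - 100.0], [100.0], [0.0]
for _ in range(50):
    new_inf = b_true * S[-1] * I[-1] / N
    new_rec = g_true * I[-1]
    S.append(S[-1] - new_inf)
    I.append(I[-1] + new_inf - new_rec)
    R.append(R[-1] + new_rec)
beta, gamma = implied_sir_rates(S, I, R, N)
assert np.allclose(beta, b_true) and np.allclose(gamma, g_true)
```

An AR law of motion can then be fitted to the recovered beta_t and gamma_t series, as done for the SIR benchmark forecasts.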
The additional uncertainty is caused by the difficulty of predicting the change in infection growth rates around the peak. The figure displays results for a four-week horizon. Over this longer horizon, the coverage frequency is generally poor. As for the shorter horizon, the SIR model interval forecasts are substantially worse in terms of coverage frequency and interval length than the panel and time-series forecasts from the trend-break model.

[Figure notes: the nominal coverage probability is %. Left column panels: solid is panel, dashed is country-level, and dashed-dotted is SIR. Right column panels: solid is panel, dashed is country-level. Blue lines correspond to the early group and orange lines to the late group.]

We adopted a panel forecasting model initially developed for applications in economics to forecast active COVID-19 infections. A key feature of our model is that it exploits the experience of countries/regions in which the epidemic occurred early on to sharpen forecasts and parameter estimates for locations in which the outbreak took place later in time. At the core of our model is a specification that assumes that the growth rate of active infections follows a deterministic trend function with a break. Our specification is inspired by infection dynamics generated from a simple SIR model. According to our model, there is a lot of uncertainty about the evolution of infection rates, due to parameter uncertainty and the realization of future shocks. Moreover, due to the inherent nonlinearities and exponential transformations, predictive densities for the level of infections are highly skewed and exhibit substantial upside risk. Consequently, it is important to report density or interval forecasts rather than point forecasts.
A natural extension of our model is to allow for additional, data-determined breaks in the deterministic trend function as the pandemic unfolds and countries/regions adjust their containment policies.

key: cord- -apjwnwky authors: vrugt, michael te; bickmann, jens; wittkowski, raphael title: effects of social distancing and isolation on epidemic spreading: a dynamical density functional theory model date: - - journal: nan doi: nan sha: doc_id: cord_uid: apjwnwky

For preventing the spread of epidemics such as the coronavirus disease COVID-19, social distancing and the isolation of infected persons are crucial. However, existing reaction-diffusion equations for epidemic spreading are incapable of describing these effects.
We present an extended model for disease spread based on combining an SIR model with a dynamical density functional theory (DDFT) in which social distancing and the isolation of infected persons are explicitly taken into account. The model shows an interesting nonequilibrium phase separation associated with a reduction of the number of infections, and it allows for new insights into the control of pandemics.

Controlling the spread of infectious diseases, such as the plague [ , ] or the Spanish flu [ ], has been an important topic throughout human history [ ]. Currently, it is of particular interest due to the worldwide outbreak of the coronavirus disease COVID-19 induced by the novel coronavirus SARS-CoV-2 [ ] [ ] [ ] [ ] [ ] [ ]. The spread of this disease is difficult to control, since the majority of infections are not detected [ ]. Due to the lack of vaccines, attempts to control the pandemic have mainly focused on social distancing [ ] [ ] [ ] [ ] and quarantine [ , ], i.e., the general reduction of social interactions, and in particular the isolation of persons with actual or suspected infection. While political decisions on such measures require a way of predicting their effects, existing theories do not explicitly take them into account. In this article, we present a dynamical density functional theory (DDFT) [ ] [ ] [ ] [ ] for epidemic spreading that allows one to model the effect of social distancing and isolation on infection numbers.

A quantitative understanding of disease spreading can be gained from mathematical models [ ] [ ] [ ] [ ] [ ] [ ]. A well-known theory for epidemic dynamics is the SIR model [ ], which has already been applied to the current coronavirus outbreak [ ] [ ] [ ]. It is a reaction model that describes the numbers of susceptible (S), infected (I), and recovered (R) individuals as functions of time t. Susceptible individuals contract the disease when meeting infected individuals at a rate c. Infected persons recover from the disease at a rate w.
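A minimal simulation of this homogeneous SIR model illustrates the outbreak behavior; all parameter values below are hypothetical:

```python
# Minimal explicit-Euler integration of the homogeneous SIR model.
def sir_step(S, I, R, c, w, dt):
    """One Euler step: infections occur at rate c*S*I, recoveries at w*I."""
    new_inf = c * S * I * dt
    new_rec = w * I * dt
    return S - new_inf, I + new_inf - new_rec, R + new_rec

# hypothetical parameter values with c*S0 > w, so an outbreak occurs
S, I, R = 0.999, 0.001, 0.0
c, w, dt = 1.0, 0.1, 0.01
peak = I
for _ in range(20000):               # integrate to t = 200
    S, I, R = sir_step(S, I, R, c, w, dt)
    peak = max(peak, I)

assert peak > 0.5                    # a large outbreak occurred
assert abs(S + I + R - 1.0) < 1e-9   # total population is conserved
assert I < 1e-6                      # the epidemic has died out by t = 200
```

The reaction terms conserve S + I + R exactly, which mirrors the conserved total number of persons in the spatial model discussed below.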
When persons have recovered, they are immune to the disease. A drawback of this model is that it describes spatially homogeneous dynamics, i.e., it does not take into account the fact that healthy and infected persons are not distributed homogeneously in space, even though this fact can have a significant influence on the pandemic [ , ]. To allow for spatial dynamics, disease-spreading theories such as the SIR model have been extended to reaction-diffusion equations [ ], where a term D_phi ∇²phi with diffusion constant D_phi is added on the right-hand side of the dynamical equation for phi = S, I, R.

Reaction-diffusion equations, however, still have the problem that, being based on the standard diffusion equation, they do not take into account particle interactions other than the reactions. This issue arises, e.g., in chemical reactions in crowded environments such as the interior of a cell. In this case the reactants, which are not pointlike, cannot move freely, which prevents them from meeting and thus from reacting. To obtain an improved model, one can make use of the fact that the diffusion equation is a special case of DDFT. In this theory, the time evolution of a density field rho(r, t) with spatial variable r is given by

∂rho(r, t)/∂t = Γ ∇·(rho(r, t) ∇ δF/δrho(r, t))

with a mobility Γ and a free energy F. Note that we have written this equation without noise terms, which implies that rho(r, t) denotes an ensemble average [ ]. The free energy is given by F = F_id + F_exc + F_ext. Its first contribution is the ideal gas free energy

F_id = β⁻¹ ∫ d^d r rho(r, t) (ln(rho(r, t) Λ^d) - 1),

corresponding to a system of noninteracting particles with inverse temperature β, number of spatial dimensions d, and thermal de Broglie wavelength Λ. If this is the only contribution, the DDFT equation reduces to the standard diffusion equation with D = Γβ⁻¹. The second contribution is the excess free energy F_exc, which takes the effect of particle interactions into account.
It is typically not known exactly and has to be approximated. The third contribution F_ext incorporates the effect of an external potential U_ext(r, t). DDFT can be extended to mixtures [ ], which makes it applicable to chemical reactions. While DDFT is not an exact theory (it is based on the assumption that the density is the only slow variable in the system [ , ]), it is nevertheless a significant improvement over the standard diffusion equation, as it allows one to incorporate the effects of particle interactions, and it generally shows excellent agreement with microscopic simulations. In particular, it allows one to incorporate effects of particle interactions such as crowding into reaction-diffusion equations. This is done by replacing the diffusion term D∇²phi(r, t) in the standard reaction-diffusion model with the right-hand side of the DDFT equation [ ]. Thus, given that its equilibrium counterpart, static density functional theory, has already been used to model crowds [ ], DDFT is a very promising approach for the development of extended models of epidemic spreading.

However, despite the successes of DDFT in other biological contexts such as cancer growth [ , ], protein adsorption [ ], ecology [ ], or active matter [ ], no attempts have been made to apply DDFT to epidemic spreading (or other types of socio-economic dynamics). In this work, we use the idea of a reaction-diffusion DDFT to extend the SIR model given by Eqs. ( )-( ) to a (possibly spatially inhomogeneous) system of interacting persons, which, compared to existing methods, allows for the incorporation of social interactions and social distancing. DDFT describes the diffusive relaxation of an interacting system and is thus appropriate if we make the plausible approximation that the underlying diffusion behavior of persons is Markovian [ ] and ergodic [ ].
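Before interactions are added, the reaction-diffusion baseline described above (a diffusion term added to each SIR compartment) can be sketched in one dimension; the grid and parameter values below are ours, not the paper's:

```python
import numpy as np

def laplacian(f, dx):
    """Periodic 1D Laplacian via central differences."""
    return (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / dx**2

def rd_sir_step(S, I, R, c, w, D, dx, dt):
    """One explicit step of the reaction-diffusion SIR model: a term
    D * laplacian(phi) is added to each compartment phi = S, I, R."""
    dS = -c * S * I + D * laplacian(S, dx)
    dI = c * S * I - w * I + D * laplacian(I, dx)
    dR = w * I + D * laplacian(R, dx)
    return S + dt * dS, I + dt * dI, R + dt * dR

# localized outbreak in a uniform susceptible population
n, dx, dt = 200, 0.05, 1e-4
x = np.arange(n) * dx
S = np.ones(n)
I = 0.01 * np.exp(-((x - x.mean()) ** 2) / 1e-1)
R = np.zeros(n)
total0 = (S + I + R).sum()
i0 = I.sum()
for _ in range(5000):
    S, I, R = rd_sir_step(S, I, R, c=1.0, w=0.1, D=0.01, dx=dx, dt=dt)

# total population is conserved; the infected mass has grown (c*S > w)
assert abs((S + I + R).sum() - total0) < 1e-8 * total0
assert I.sum() > i0
```

This baseline treats persons as noninteracting point particles; the DDFT extension below replaces the bare diffusion term to lift exactly that restriction.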
Using the Mori-Zwanzig formalism [ ], one can connect the DDFT model and its coefficients to the dynamics of the individual persons [ , , ]. The extended model reads

∂S/∂t = Γ_S ∇·(S ∇ δF/δS) - cSI,
∂I/∂t = Γ_I ∇·(I ∇ δF/δI) + cSI - wI - mI,
∂R/∂t = Γ_R ∇·(R ∇ δF/δR) + wI.

Note that we use different mobilities Γ_S, Γ_I, and Γ_R for the different fields S, I, and R, which allows us to model the fact that infected persons, who might be in quarantine, move less than healthy persons. For generality, we have added a term -mI on the right-hand side of the equation for I to allow for the death of infected persons, which occurs at a rate m (cf. the SIRD model [ , ]). Since we are mainly interested in how fast the infection spreads, we set m = 0 in the following. In this case, since the total number of persons is constant, one can easily show that the total density S + I + R obeys a continuity equation with a conserved current.

The ideal gas term F_id in the free energy corresponds to a system of noninteracting persons and ensures that standard reaction-diffusion models for disease spreading [ ] arise as a limiting case. The temperature measures the intensity of the motion of the persons: a normal social life corresponds to an average temperature, while the restrictions associated with a pandemic lead to a lower temperature. Moreover, the temperature can be position-dependent if the epidemic is dealt with differently in different places. The excess free energy F_exc describes interactions. This is crucial here, as it allows us to model the effects of social distancing and self-isolation via repulsive potentials between the different persons. Social distancing is a repulsion between healthy persons, while self-isolation corresponds to a stronger repulsive potential between infected persons and everyone else. Thus, we set F_exc = F_sd + F_si, with F_sd describing social distancing and F_si self-isolation. Note that the effects of such repulsive interactions are not necessarily covered by a general reduction of the diffusivity in existing reaction-diffusion models.
For example, if people practice social distancing, they will keep a certain distance (six feet is recommended [ ]) in places such as supermarkets, where persons accumulate even during a pandemic, or if people live in crowded environments, as was the case on the ship "Diamond Princess" [ ]. In our model, when two particles approach each other, which still happens even at lower temperatures, repulsive interactions reduce the probability of a collision and thus of an infection. Existing models can only incorporate this in an effective way as a reduction of the transmission rate c, which implies, however, that properties of the disease (how infectious is it?) and measures implemented against it (do people stay away from each other?) cannot be modelled independently. Furthermore, interactions allow for the emergence of spatio-temporal patterns.

The final contribution is the external potential U_ext. In general, it allows one to incorporate effects of confinement into DDFT. Here, it corresponds to externally imposed restrictions of movement: travel bans or the isolation of a region with high rates of infection enter the model as potential wells. The advantage of our model compared to the standard SIR theory is that it allows one to study, in a way that is computationally much less expensive than "microscopic" simulations (since the computational cost is independent of the number of persons [ ]), how different actions affect the spread of the disease. For example, people staying at home corresponds to reducing the temperature, quarantine measures correspond to a strongly repulsive potential between infected and healthy persons, and mass events correspond to attractive potentials. Specifically, we assume that both types of interactions can be modelled via Gaussian pair potentials, with parameters C_sd and C_si determining the strength and σ_sd and σ_si determining the range of the interactions.
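The Gaussian pair interactions enter the dynamics through spatial convolutions of the kernel with the density fields; these can be evaluated on a periodic grid with FFTs. A minimal sketch (grid and parameter values ours):

```python
import numpy as np

def periodic_conv(kernel, field, dx):
    """Spatial convolution (K * rho)(x) on a periodic 1D grid via FFT,
    as needed for interaction terms of the form C * (K * rho)."""
    return np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(field))) * dx

# Gaussian kernel K(x) = exp(-sigma * x**2), shifted so that its
# center sits at grid index 0, as circular convolution requires
n, L = 256, 10.0
dx = L / n
x = (np.arange(n) - n // 2) * dx
sigma = 100.0
K = np.roll(np.exp(-sigma * x**2), -(n // 2))

# convolving a constant density must give a constant: rho * integral(K)
rho = np.full(n, 2.0)
out = periodic_conv(K, rho, dx)
expected = 2.0 * np.sqrt(np.pi / sigma)   # Gaussian integral over the line
assert np.allclose(out, expected, rtol=1e-6)
```

The FFT route makes the cost of the interaction terms O(n log n) per step, independent of the number of persons the density represents.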
Combining this assumption with a Ramakrishnan-Yussouff approximation [ ] for the excess free energy and a Debye-Hückel approximation [ ] for the two-body correlations, we obtain the specific SIR-DDFT model, with diffusion coefficients D_phi = Γ_phi β⁻¹ for phi = S, I, R, the kernels K_i(r) = exp(-σ_i r²) for i = sd, si, and the spatial convolution. A possible generalization is discussed in the Supplemental Material.

We begin our investigation with a linear stability analysis of this model, using a general pair potential, in order to determine whether a homogeneous state with I = 0, which is always a fixed point, is stable. The full calculation is given in the Supplemental Material. In the simple SIR model, the S-R plane in phase space (these are the states where everyone is healthy) becomes unstable when cS_0 > w, where S_0 is the initial number of susceptible persons. Thus, the pandemic cannot break out if persons recover faster than they are able to infect others. A linear stability analysis of the full model, performed under the assumption that the initial number of immune persons R_0 is small (which corresponds to a new disease), gives the eigenvalue λ_1 = cS_0 - w - D_I k² with the wavenumber k, such that this instability criterion still holds when interactions are present. This means that social distancing cannot stabilize a state without infected persons and thus cannot prevent the outbreak of a disease. As reported in the literature [ ], the marginal stability hypothesis [ ] gives, based on this dispersion, a front propagation speed of v = 2 sqrt(D_I (cS_0 - w)). However, there are two additional eigenvalues λ_{2/3} = (-D_j + j_0 Γ_j U_sd ĥ_d(k)) k² with j = S, R and the Fourier-transformed social-distancing potential U_sd ĥ_d(k), associated with instabilities due to interactions. Front speeds for dispersions of this form have been calculated by Archer et al. [ ].
If both epidemic and interaction modes are unstable, the fronts might interfere, leading to interesting results depending on their different speeds.

For a further analysis, we solved the model equations numerically. We assume x and t to be dimensionless, such that all model parameters can be dimensionless too. The calculation was done in one spatial dimension on the domain x ∈ [ , ] with periodic boundary conditions, using an explicit finite-difference scheme with step size dx = . (individual simulations) or dx = . (parameter scan) and adaptive time steps. As an initial condition, we use a Gaussian peak with amplitude and variance - , centered at x = . , for S(x, 0), and set I(x, 0) = . S(x, 0) and R(x, 0) = 0. Since the effect of the parameters c and w on the dynamics is known from previous studies of the SIR model, we fix their values to c = and w = . , allowing for an outbreak. Moreover, we set Γ_S = Γ_I = Γ_R = , D_S = D_I = D_R = . , and σ_sd = σ_si = . The relevant control parameters are C_sd and C_si, which control the effects of the social interactions that are the new aspect of our model. We assume these parameters to be ≤ 0, which corresponds to repulsive interactions.

Measures implemented against a pandemic typically have two aims: reduction of the total number of infected persons, i.e., making sure that the final number of noninfected persons S_∞ = lim_{t→∞} S(t) is large, and reduction of the maximum number of infected persons I_max, in order to keep the spread within the capacities of the healthcare system. Using parameter scans, we can test whether social distancing and self-isolation can achieve these effects. As can be seen from the phase diagrams for the SIR-DDFT model shown in fig.
, there is a clear phase boundary between the upper left corner, where low values of I_max and high values of S_∞ show that the spread of the disease has been significantly reduced, and the rest of the phase diagram, where the disease spreads in essentially the same way as in the model without social distancing. Since all simulations were performed with values of c and w that correspond to a disease outbreak in the usual SIR model, this shows that a reduction of social interactions can significantly inhibit epidemic spreading, and that the SIR-DDFT model is capable of demonstrating these effects. The phase boundary shows that, for a reduction of spreading by social measures, two conditions have to be satisfied: first, |C_si| has to be sufficiently large; second, |C_si| has to be larger than |C_sd| by a certain amount. Within our physical model of repulsively interacting particles, this arises from the fact that if healthy "particles" are repelled more strongly by other healthy particles than by infected ones, they will spend more time near infected particles and are thus more likely to become infected themselves. Physically, |C_si| > |C_sd| is thus a very reasonable condition, given that infected persons, at least once they develop symptoms, will be isolated more strongly than healthy persons.

The figure shows the time evolution of the total numbers S(t), I(t), and R(t) of susceptible, infected, and recovered persons, respectively, for the cases without interactions (the usual SIR model with diffusion) and with interactions (our model). If no interactions are present (i.e., C_si = C_sd = 0), I(t) reaches a maximum value of about . and the pandemic is over at time t ≈ . In the case with interactions (we choose C_si = C_sd = - , i.e., parameter values inside the social isolation phase), the maximum is significantly reduced to a value of about . .
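The structure of the SIR-DDFT dynamics can be illustrated with a minimal explicit solver. The coupling of the interaction terms below (healthy compartments feel social distancing from S + R and isolation from I; infected feel the isolation repulsion from everyone) is our reading of the model, and all grid and parameter values are hypothetical rather than those used for the figures:

```python
import numpy as np

def lap(f, dx):
    return (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / dx**2

def grad(f, dx):
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)

def conv(K_hat, f, dx):
    """Periodic convolution (K * f)(x) with a precomputed FFT kernel."""
    return np.real(np.fft.ifft(K_hat * np.fft.fft(f))) * dx

# grid and hypothetical parameters (not the values used in the paper)
n, L = 128, 1.0
dx = L / n
x = np.arange(n) * dx
dt, steps = 2.0e-5, 2000
c, w = 1.0, 0.1                        # transmission and recovery rates
D, G = 0.01, 1.0                       # common diffusivity and mobility
Csd, Csi, sigma = -2.0, -5.0, 100.0    # repulsion strengths and range

xc = (np.arange(n) - n // 2) * dx
K_hat = np.fft.fft(np.roll(np.exp(-sigma * xc**2), -(n // 2)))

# initial outbreak: a Gaussian bump of infected in a uniform population
S = np.ones(n)
I = 0.1 * np.exp(-((x - 0.5) ** 2) / 5e-3)
R = np.zeros(n)
total0 = (S + I + R).sum()

for _ in range(steps):
    # effective interaction potentials (negative C means repulsion in
    # this sign convention, since the flux term carries a minus sign)
    mu_h = Csd * conv(K_hat, S + R, dx) + Csi * conv(K_hat, I, dx)
    mu_i = Csi * conv(K_hat, S + I + R, dx)
    dS = D * lap(S, dx) - G * grad(S * grad(mu_h, dx), dx) - c * S * I
    dI = D * lap(I, dx) - G * grad(I * grad(mu_i, dx), dx) + c * S * I - w * I
    dR = D * lap(R, dx) - G * grad(R * grad(mu_h, dx), dx) + w * I
    S, I, R = S + dt * dS, I + dt * dI, R + dt * dR

# the conservative (flux) form preserves the total number of persons
assert abs((S + I + R).sum() - total0) < 1e-6 * total0
assert np.isfinite(S).all() and np.isfinite(I).all() and np.isfinite(R).all()
```

A parameter scan over (Csd, Csi) with such a solver, tracking I_max and S_∞, is the kind of computation behind the phase diagrams discussed above.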
The final value of R(t), which measures the total number of persons that have been infected during the pandemic, decreases from about . to about . . Moreover, it takes significantly longer (until time t ≈ ) for the pandemic to end. This demonstrates that social distancing and self-isolation have the effects they are supposed to have, i.e., they flatten the curve I(t) in such a way that the healthcare system is able to take care of all cases. The theoretical predictions for the effects of quarantine on the course of I(t) (a sharp rise, followed by a bend and a flat curve) are in good qualitative agreement with recent data from China [ , ], where strict regulations were implemented to control the COVID-19 spread [ ].

To explain the observed phenomena, it is helpful to analyze the spatial distribution of susceptible and infected persons during the pandemic. The figure visualizes I(x, t), where x denotes the spatial coordinate. Interestingly, during the time interval in which the pandemic is present, a phase separation can be observed in which the infected persons accumulate at certain spots separated from the susceptible persons. (As this effect is reminiscent of measures that used to be implemented against the spread of leprosy, we refer to these spots as "leper colonies".) This phase separation is a consequence of the interactions: since the formation of leper colonies reduces the spatial overlap of the functions I(x, t) and S(x, t), i.e., the amount of contact between infected and susceptible persons, the total number of infections decreases.

The leper colony transition is an interesting type of nonequilibrium phase behavior in its own right. Recall that we have motivated the SIR-DDFT model based on theories for nonideal chemical reactions. It is thus very likely that effects similar to the ones observed here can be found in chemistry.
In this case, they would imply that particle interactions can significantly affect the amount of a certain substance that is produced in a chemical reaction, and that such reactions are accompanied by new types of (transient) pattern formation.

In summary, we have presented a DDFT-based extension of the usual models for epidemic spreading that allows one to incorporate social interactions, in particular in the form of self-isolation and social distancing. This has allowed us to analyze the effect of these measures on the spatio-temporal evolution of pandemics. Given the importance of the reduction of social interactions for the control of pandemics, the model provides a highly useful new tool for predicting epidemics and deciding how to react to them. Moreover, it shows an interesting phase behavior relevant for future work on DDFT and nonideal chemical reactions. A possible extension of our model is the incorporation of fractional derivatives [ , ]. Furthermore, enhanced simulations in two spatial dimensions could reveal interesting pattern-formation effects associated with leper colony formation.

Supplemental material: generalized model. Here, we present a possible generalization of our model. In the main text, we have used the decomposition F_exc = F_sd + F_si, which gives the excess free energy (i.e., the contribution from interactions) as a sum of social distancing and self-isolation terms. Instead, one can use a form in which social distancing remains unaffected, but there are now two terms, F_iso and F_ill, determining the way infected persons interact with others. F_iso is the isolation term, which corresponds to a repulsive interaction between infected and healthy individuals. The term F_ill models the interaction of infected persons with other infected persons. This interaction can have various forms: infected persons repel each other if they practice social distancing or self-isolation, but they can also attract each other (e.g., if they intentionally accumulate in a hospital or quarantine station).
Assuming that the interaction corresponding to F_ill is also Gaussian, with parameters C_iso and C_ill for the strength and σ_iso and σ_ill for the range of the infected-noninfected and infected-infected interactions, respectively, the model from the main text generalizes accordingly, with the kernel K_ill(r) = exp(-σ_ill r²) and K_sd as defined in the main text. For C_iso = C_ill = C_si and σ_iso = σ_ill = σ_si, the standard case is recovered. The general model can also allow for attractive interactions between infected persons, or simply for a reduction of the repulsion between them (resulting from the fact that they are already ill).

Supplemental material: linear stability analysis. Here, we perform a linear stability analysis of the extended model from the main text. For the excess free energy, we use the combined Ramakrishnan-Yussouff-Debye-Hückel approximation as in the main text, but now with general two-body potentials U_sd h_d(x - x') for social distancing and U_si h_i(x - x') for self-isolation. In one spatial dimension, we obtain the corresponding dynamical equations (S ). Any homogeneous state with S = S_0, R = R_0, and I = 0, where S_0 and R_0 are constants, is a fixed point. We consider fields S = S_0 + s and R = R_0 + r with small perturbations s and r and linearize in the perturbations; this results in the linearized equations (S ). We then make a plane-wave ansatz for the perturbations of s, i, and r proportional to exp(λt - ikx). This gives an eigenvalue equation in which ĥ_d(k) and ĥ_i(k) denote the Fourier transforms of h_d(x - x') and h_i(x - x'), respectively; the corresponding characteristic polynomial (S ) is of third order in λ. Rather than solving this polynomial exactly, we consider the limit of long wavelengths. For k = 0, which corresponds to the usual SIR model if we assume that k² ĥ_d(k) vanishes, the characteristic polynomial simplifies, and its solutions are λ = cS_0 - w and λ = 0 (with algebraic multiplicity two). This means that the epidemic will start growing when cS_0 > w, since in this case there is a positive eigenvalue.
When interpreting this result, one should take into account that, since a susceptible person that has been infected cannot become susceptible again, the system will, after a small perturbation, not return to the same state as before, even if w > cS_0. Strictly speaking, we have tested the linear stability of the S-R plane in phase space, and the fact that any parameter combination of S_0 and R_0 with I = 0 is a solution of the SIR model is reflected by the existence of the eigenvalue λ = 0 with algebraic multiplicity two (a perturbation within the S-R plane will obviously not lead to an outbreak).

Next, we consider the case k ≠ 0, but assume that we can neglect the cross term proportional to S_0 R_0 Γ_S Γ_R in the characteristic polynomial. This corresponds to assuming either R_0 = 0 (i.e., we consider the beginning of an outbreak of a new disease that no one is yet immune against) or small k (such that higher-order terms in k can be neglected). The characteristic polynomial then factorizes, and we can immediately read off the solutions λ_1 = cS_0 - w - D_I k² and λ_{2/3} = (-D_j + j_0 Γ_j U_sd ĥ_d(k)) k² for j = S, R. The result for λ_1 shows that the initial state still becomes unstable for cS_0 > w, i.e., the interactions cannot stabilize a state without infected persons that would be unstable otherwise. The eigenvalues λ_2 and λ_3, which were zero in the long-wavelength limit, now describe the dispersion due to interparticle interactions, which may lead to instabilities not related to disease outbreak.

For determining the propagation speed of fronts, we can use the marginal stability hypothesis [ ]. We transform to the co-moving frame that moves with velocity v and assume that the growth rate in this frame is zero at the leading edge. For a general dispersion λ(k) with complex wavenumber k, this yields the conditions iv + dλ/dk = 0 and Re[λ(k) + ivk] = 0. For the dispersion λ_1(k) = cS_0 - w - D_I k², the solution of these conditions is v = 2 sqrt(D_I (cS_0 - w)), which is in agreement with results from the literature [ ]. Front speeds for dispersions of the forms λ_2 and λ_3 can be found in Ref. [ ].
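The selected front speed can be checked numerically from the dispersion relation by minimizing the leading-edge speed over the spatial decay rate q; the parameter values below are hypothetical:

```python
import numpy as np

# dispersion of the infected mode: lambda(k) = a - D*k**2, where
# a = c*S0 - w > 0 during an outbreak (hypothetical values below)
a, D = 0.9, 0.1

# a leading-edge profile ~ exp(-q*(x - v*t)) grows at rate
# lambda(i*q) = a + D*q**2, so it moves at v(q) = (a + D*q**2)/q;
# the selected pulled-front speed is the minimum of v(q) over q > 0
q = np.linspace(1e-3, 50.0, 200001)
v_num = ((a + D * q**2) / q).min()
v_theory = 2.0 * np.sqrt(a * D)        # the marginal stability result
assert abs(v_num - v_theory) < 1e-4
```

The minimum is attained at q = sqrt(a/D), reproducing v = 2 sqrt(D_I (cS_0 - w)) for the infected mode.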
* corresponding author: raphael.wittkowski@uni-muenster
- Yersinia pestis: etiologic agent of plague
- Plague
- Spanish flu outdid WWI in number of lives claimed
- Oxford textbook of infectious disease control: a geographical analysis from medieval quarantine to global eradication
- A new coronavirus associated with human respiratory disease in China
- A pneumonia outbreak associated with a new coronavirus of probable bat origin
- Origin and evolution of pathogenic coronaviruses
- A novel coronavirus outbreak of global health concern
- Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
- A novel coronavirus from patients with pneumonia in China
- Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
- COVID-19 and rationally layered social distancing
- Pre-symptomatic transmission in the evolution of the COVID-19 pandemic
- Impact of self-imposed prevention measures and short-term government intervention on mitigating and delaying a COVID-19 epidemic
- Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand
- Transmission potential of the novel coronavirus (COVID-19) onboard the Diamond Princess Cruises ship
- The positive impact of lockdown in Wuhan on containing the COVID-19 outbreak in China
- A dynamical extension of the density functional theory
- Dynamic density functional theory of fluids
- Dynamical density functional theory and its application to spinodal decomposition
- Nonlinear diffusion and density functional theory
- Avalanche outbreaks emerging in cooperative contagions
- Evolution and emergence of infectious diseases in theoretical and real-world networks
- The physics of spreading processes in multilayer networks
- Critical regimes driven by recurrent mobility patterns of reaction-diffusion processes in networks
- Exact solution of generalized cooperative susceptible-infected-removed (SIR) dynamics
- Markovian approach to tackle the interaction of simultaneous diseases
- A contribution to the mathematical theory of epidemics
- Statistics-based predictions of coronavirus 2019-nCoV spreading in mainland China
- A simple stochastic SIR model for COVID-19 infection dynamics for Karnataka: learning from Europe
- Predicting COVID-19 distribution in Mexico through a discrete and time-dependent Markov chain and an SIR-like model
- Correlation between travellers departing from Wuhan before the Spring Festival and subsequent spread of COVID-19 to all provinces in China
- Characterizing the dynamics underlying global spread of epidemics
- Continuum description of a contact infection spread in a SIR model
- Infection fronts in contact disease spread
- Natural human mobility patterns and spatial spread of infectious diseases
- Complex dynamics of a reaction-diffusion epidemic model
- A reaction-diffusion system modeling the spread of resistance to an antimalarial drug
- Asymptotic profiles of the steady states for an SIS epidemic reaction-diffusion model
- Global stability of the steady states of an SIS epidemic reaction-diffusion model
- A SIS reaction-diffusion-advection model in a low-risk and high-risk domain
- Reaction-diffusion processes and metapopulation models in heterogeneous networks
- Exponential stability of traveling fronts in a diffusion epidemic system with delay
- A fully adaptive numerical approximation for a two-dimensional epidemic model with nonlinear cross-diffusion
- Dynamical density functional theory for interacting Brownian particles: stochastic or deterministic?
- Dynamical density functional theory: binary phase-separating colloidal fluid in a cavity
- Selectivity in binary fluid mixtures: static and dynamical properties
- Extended dynamical density functional theory for colloidal mixtures with temperature gradients
- Multi-species dynamical density functional theory
- Derivation of dynamical density functional theory using the projection operator technique
- Projection operators in statistical mechanics: a pedagogical approach
- Mechanism for the stabilization of protein clusters above the solubility curve: the role of non-ideal chemical reactions
- Mechanism for the stabilization of protein clusters above the solubility curve
- Development of reaction-diffusion DFT and its application to catalytic oxidation of NO in porous materials
- Density-functional fluctuation theory of crowds
- Dynamic density functional theory of solid tumor growth: preliminary models
- Dynamical density-functional-theory-based modeling of tissue dynamics: application to tumor growth
- Kinetics and thermodynamics of protein adsorption: a generalized molecular theoretical approach
- Competitive adsorption in model charged protein mixtures: equilibrium isotherms and kinetics behavior
- Dynamic density functional theory of protein adsorption on polymer-coated nanoparticles
- Competitive adsorption of multiple proteins to nanoparticles: the Vroman effect revisited
- Optimizing the search for resources by sharing information: Mongolian gazelles as a case study
- Aggregation of self-propelled colloidal rods near confining walls
- Dynamical density functional theory for colloidal particles with arbitrary shape
- Dynamical density functional theory for microswimmers
- Dynamical density functional theory for circle swimmers
- Particle-scale statistical theory for hydrodynamically induced polar ordering in microswimmer suspensions
- Multi-species dynamical density functional theory for microswimmers: derivation, orientational ordering, trapping potentials, and shear cells
- Active colloidal suspensions exhibit polar order under gravity
- Active Brownian particles in two-dimensional traps
- Active crystals and their stability
- Active Brownian particles at interfaces: an effective equilibrium approach
- Effective equilibrium states in the colored-noise model for active matter I. Pairwise forces in the Fox and unified colored noise approximations
- Effective equilibrium states in the colored-noise model for active matter II. A unified framework for phase equilibria, structure and mechanical properties
- Effective interactions in active Brownian suspensions
- The five problems of irreversibility
- Particle-conserving dynamics on the single-particle level
- On quantum theory of transport phenomena: steady diffusion
- Transport, collective motion, and Brownian motion
- Ensemble method in the theory of irreversibility
- Projection operator techniques in nonequilibrium statistical mechanics
- Mori-Zwanzig projection operator formalism for far-from-equilibrium systems with time-dependent Hamiltonians
- Microscopic derivation of time-dependent density functional methods
- Investigation of epidemic spreading process on multiplex networks by incorporating fatal properties
- A simple mathematical model for Ebola in Africa
- Should, and how can, exercise be done during a coronavirus outbreak? An interview with Dr. Jeffrey A. Woods
- Sedimentation of a two-dimensional colloidal mixture exhibiting liquid-liquid and gas-liquid phase separation: a dynamical density functional theory study
- First-principles order-parameter theory of freezing
- Theory of simple liquids: with applications to soft matter
- Propagating pattern selection
- Pattern propagation in nonlinear dissipative systems
- Solidification fronts in supercooled liquids: how rapid fronts can lead to disordered glassy solids
- Solidification in soft-core fluids: disordered solids from fast solidification fronts
- Generation of defects and disorder from deeply quenching a liquid to form a solid
- Coronavirus COVID-19 global cases by the Center for Systems Science and Engineering at Johns Hopkins University
- UK warns fifth of workforce could be off sick from coronavirus at its peak; army prepared
- Dynamics of Ebola disease in the framework of different fractional derivatives
- Fractional derivatives applied to MSEIR problems: comparative study with real world data

key: cord- - ycmxcuz authors: ifguis, ousama; el ghozlani, mohamed; ammou, fouzia; moutcine, abdelaziz; abdellah, zeroual title: simulation of the final size of the evolution curve of coronavirus epidemic in morocco using the sir model date: - - journal: j environ public health doi: . / / sha: doc_id: cord_uid: ycmxcuz

Since the epidemic of COVID-19 was declared in Wuhan, Hubei Province of China, and in other parts of the world, several studies have been carried out over several regions to observe the development of the epidemic, to predict its duration, and to estimate its final size, using complex models such as the SEIR model or simpler ones such as the SIR model. These studies showed that the SIR model is much more efficient than the SEIR model; we therefore apply it to the Kingdom of Morocco from the appearance of the first case in March 2020, with the objective of predicting the final size of the epidemic.
Around Christmas 2019, COVID-19 caused an epidemic in the city of Wuhan, Hubei Province of China [ ]. It spread to other parts of China and subsequently to many other countries around the world. Morocco is one of the countries affected by COVID-19: on 2 March 2020 the country identified its first case [ , ], and by the end of the month the numbers of confirmed cases and deaths were rising daily [ ]. As the number of infected cases increases, it is necessary for modellers to estimate the severity of the epidemic in terms of the total number of people infected, the total number of confirmed cases, the total number of deaths, and the basic reproduction number, and to predict the duration of the epidemic, the arrival of its peak, and its final size. This information can help public health agencies make informed decisions. In this work, we used the SIR model to predict the development of the epidemic in the Kingdom of Morocco from the identification of the first case in the city of Casablanca [ ], given the reliability of the data and the definition of confirmed cases during this period, and the simplicity of our forecasts and analyses. We determined the detailed results of the SIR model calibration and the predictions of our model, including the distribution of the peak period, the prediction interval of future confirmed cases, and the total number of infected persons. In the SIR model, compartment S refers to the susceptible population in Morocco, I refers to the infectious population, and R refers to removed (confirmed) cases [ ]. The latency of COVID-19 infection is biologically realistic due to an incubation period of up to two weeks; newly infected persons may not be contagious during this time period, as the virus is incubating in the organism.
Here, we highlight the difference between the latency period (the period between when an individual is infected and when he or she becomes infectious) and the incubation period (the period between infection and the appearance of clinical symptoms, including the fever and cough symptomatic of COVID-19). The transfer diagram for the model is shown in Figure, and the biological significance of all model parameters is given in Table. A key assumption in both models is that deaths occurring in compartments S, E, and I are negligible during the model prediction period. The differential-equation system for the SIR model is given as follows [ ]:

dS/dt = −βSI/N, dI/dt = βSI/N − cI, dR/dt = cI.

We can estimate the nature of the disease in terms of the power of infection through the basic reproduction number R0 = β/c, the average number of people infected by one infectious person; if it is high, the probability of a pandemic is also higher. For this study, we used the data accumulated since March 2020 for Morocco [ ]. Four parameters are to be estimated in the SIR model from the data: the basic reproduction number (R0), the contact rate (β), the removal rate (c), and the final number of cases (I). We do not consider the effect of the natural death or birth rate, so that the total population remains constant: (i) N = constant; (ii) N = total population of Morocco. To solve the ordinary differential equations of the SIR model, we first simulate the daily data of the SIR model, compare them with the real data, and then execute the optimization algorithm (reducing the difference between the real data and the corresponding simulated data), so that it searches for the values of β and c that minimize that difference. In the optimal model, we supposed that the predicted infected cases should be close to the actual number of infected cases; the resulting critical number of susceptible cases is given in Table.
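As a sketch of the procedure just described, the SIR system can be integrated numerically before tuning its parameters against reported data. The population value, the rates, and the parameter names below are illustrative assumptions, not the values fitted in the study:

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, beta, c, N):
    """SIR right-hand side: S' = -beta*S*I/N, I' = beta*S*I/N - c*I, R' = c*I."""
    S, I, R = y
    new_inf = beta * S * I / N
    return [-new_inf, new_inf - c * I, c * I]

N = 36_000_000          # assumed total population (illustrative)
beta, c = 0.3, 0.1      # assumed contact and removal rates; R0 = beta/c = 3
sol = solve_ivp(sir_rhs, (0.0, 500.0), [N - 100.0, 100.0, 0.0],
                args=(beta, c, N), rtol=1e-8, atol=1e-8)
S, I, R = sol.y

# sanity check against the classic final-size relation s_inf = exp(-R0*(1 - s_inf))
s_inf = 0.05
for _ in range(200):
    s_inf = np.exp(-(beta / c) * (1.0 - s_inf))
```

In practice, β and c would be found by minimizing the squared difference between the simulated R(t) (confirmed cases) and the reported daily series, e.g. with `scipy.optimize.least_squares`.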
The transfer diagram for the SIR model used in the simulation of the size of the COVID-19 infection in the Moroccan state is provided in detail in Figure. We note from the figure that the number of confirmed cases is not very high; the highest daily confirmed number occurred in late March. This can be explained by the strategy of Morocco, which declared a state of health emergency and confinement on Friday 20 March at 6 p.m., to limit the displacement of the population as much as possible, the only way to keep the coronavirus under control. Based on the optimal SIR models, Figure shows the start of the acceleration of the epidemic in March, the onset of regular growth, and the predicted end of the epidemic in Morocco in April, with the total number of infected cases and the final number of susceptible cases given in Table. The peak of the reported cases was expected around the end of March (Figure). The optimal estimates of R0 are given in Table; our study found a transmissibility of COVID-19 comparable to that of severe acute respiratory syndrome coronavirus (SARS-CoV) [ ] and much higher than that of Middle East respiratory syndrome coronavirus (MERS-CoV) [ ]. Our simulation study on the optimization of the final size of the COVID-19 epidemic evolution in the Kingdom of Morocco with the SIR model has allowed us to predict the peak of the infected and death cases (Table), although the number of people tested was very low through the end of March. The Moroccan government should probably increase the number of cases tested daily in order to identify the true size of the pandemic in Morocco. The data used to support the findings of this study are cited in the article as reference [ ], and the mathematical calculations are provided in the supplementary materials.
- Updating the accounts: global mortality of the 1918-1920 "Spanish" influenza pandemic
- A contribution to the mathematical theory of epidemics
- World Health Organization
- World Health Organization
- Why is it difficult to accurately predict the COVID-19 epidemic?
- Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions
- Unraveling the drivers of MERS-CoV transmission

key: cord- -ozl ztz authors: enrique amaro, josé; dudouet, jérémie; nicolás orce, josé title: global analysis of the covid-19 pandemic using simple epidemiological models date: - - journal: appl math model doi: . /j.apm. . . sha: doc_id: cord_uid: ozl ztz

Several analytical models have been developed in this work to describe the evolution of fatalities arising from the coronavirus COVID-19 worldwide. The death or 'D' model is a simplified version of the well-known SIR (susceptible-infected-recovered) compartment model, which allows the transmission-dynamics equations to be solved analytically by assuming no recovery during the pandemic. By fitting to available data, the D model provides a precise way to characterize the exponential and normal phases of the pandemic evolution, and it can be extended to describe additional spatial-time effects such as the release of lockdown measures. More accurate calculations using the extended SIR or ESIR model, which includes recovery, and more sophisticated Monte Carlo grid simulations (also developed in this work) predict similar trends and suggest a common pandemic evolution with universal parameters. The evolution of the COVID-19 pandemic in several countries shows the typical behavior in accord with our model trends, characterized by a rapid increase of death cases followed by a slow decline, typically asymmetric with respect to the pandemic peak.
The fact that the D and ESIR models predict similar results, without and with recovery respectively, indicates that COVID-19 is a highly contagious virus, but that most people become asymptomatic (D model) and eventually recover (ESIR model).
• Similar results from the ESIR and D models suggest that most susceptibles become infected, asymptomatic, and eventually recover.
• Similar trends suggest a common pandemic evolution with universal parameters.
The SIR (susceptible-infected-recovered) model is widely used as a first-order approximation to the viral spreading of contagious epidemics [ ], mass immunization planning [ , ], marketing, informatics, and social networks [ ]. Its cornerstone is the so-called "mass-action" principle introduced by Hamer, which assumes that the course of an epidemic depends on the rate of contact between susceptible and infected individuals [ ]. This idea was extended to a continuous-time framework by Ross in his pioneering work on malaria transmission dynamics [ ], and finally put into its classic mathematical form by Kermack and McKendrick [ ]. The SIR model was further developed by Kendall, who provided a spatial generalization of the Kermack and McKendrick model in a closed population [ ] (i.e., neglecting the effects of spatial migration), and Bartlett, who, after investigating the connection between the periodicity of measles epidemics and community size, predicted a traveling wave of infection moving out from the initial source of infection [ , ]. More recent implementations have considered the typical incubation period of the disease and the spatial migration of the population. The pandemic has ignited the submission of multiple manuscripts in the last weeks. (Email addresses: amaro@ugr.es (José Enrique Amaro), j.dudouet@ip i.in p .fr (Jérémie Dudouet), jnorce@uwc.ac.za (José Nicolás Orce*); webpage: http://www.ugr.es/~amaro.)
Most statistical distributions used to estimate disease occurrence are of the binomial, Poisson, Gaussian, Fermi, or exponential types. Despite their intrinsic differences, these distributions generally lead to similar results, assuming independence and homogeneity of disease risks [ ]. In this work, we propose a simple and easy-to-use epidemiological model, the death or D model [ ], that can be compared with data in order to investigate the evolution of the infection and deviations from the predicted trends. The D model is a simplified version of the SIR model with analytical solutions under the assumption of no recovery, at least during the time of the pandemic. We apply it globally to countries where the infestation of the COVID-19 coronavirus has spread widely and caused thousands of deaths [ , ]. Additionally, D-model calculations are benchmarked against more sophisticated and reliable calculations using the extended SIR (ESIR) and Monte Carlo Planck (MCP) models, also developed in this work, which provide similar results but allow for a more coherent spatial-time disentanglement of the various effects present during a pandemic. A similar ESIR model has recently been proposed by Squillante and collaborators for infected individuals as a function of time, based on the Ising model (which describes ferromagnetism in statistical mechanics) and a Fermi-Dirac distribution [ ]. This model also reproduces a posteriori the COVID-19 data for infestations in China, as well as other pandemics such as Ebola, SARS, and influenza A/H1N1. The SIR model considers the three possible states of the members of a closed population affected by a contagious disease. It is therefore characterized by a system of three coupled non-linear ordinary differential equations [ ], which involve three time-dependent functions:
• Susceptible individuals, S(t), at risk of becoming infected by the disease.
• Infected individuals, I(t).
• Recovered or removed individuals, R(t), who were infected and may have developed immunity or died.
The SIR model describes well a viral disease, where individuals typically go from the susceptible class S to the infected class I, and finally to the removed class R. Recovered individuals cannot go back to the susceptible or infected classes, as is potentially the case for bacterial infections. The resulting transmission-dynamics system for a closed population is

dS/dt = −λSI, dI/dt = λSI − βI, dR/dt = βI,

where λ > 0 is the transmission or spreading rate, β > 0 is the removal rate, and N is the fixed population size, which implies that the model neglects the effects of spatial migration. Currently, there is no vaccination available for COVID-19, and the only way to reduce the transmission or infection rate λ (often referred to as "flattening the curve") is by implementing strong social-distancing and hygiene measures. The system reduces to a first-order differential equation, which does not possess an explicit solution but can be solved numerically. The SIR model can then be parametrized using actual infection data to solve for I(t), in order to investigate the evolution of the disease. In the D model, we make the drastic assumption of no recovery in order to obtain an analytical formula that describes, instead of infestations, the death evolution. This can be useful as a fast method to foresee the global behavior as a first approach, before applying more sophisticated methods. We shall see that the resulting D model describes well enough the data of the current pandemic in different countries. The main assumption of the D model is the absence of recovery from coronavirus, i.e., R(t) = 0, at least during the pandemic time interval. This assumption may be reasonable if the spreading time of the pandemic is much faster than the recovery time, i.e., λ ≫ β.
The SIR equations are then reduced to the single equation of the well-known SI model, which represents the simplest mathematical form of all disease models, where the infection rate is proportional to both the infected individuals, I, and the susceptible individuals, N − I:

dI/dt = λ I (N − I),

with solution

I(t) = N c0 e^(t/b) / (1 + c0 e^(t/b)),

where we have defined the constants b = 1/(λN) and c0 = I0/(N − I0). The parameter b is the characteristic evolution time of the initial exponential increase of the pandemic, and the constant c0 is the initial infestation rate with respect to the total population N, assuming c0 ≪ 1. In order to predict the number of deaths in the D model, we assume that the number of deaths at some time t is proportional to the infestation at some former time t − τ, that is,

D(t) = µ I(t − τ),

where µ is the death rate and τ is the death time. With this assumption we can finally write the D-model equation as

D(t) = a / (c + e^(−t/b)),

where a = µ I0 e^(−τ/b), c = c0 e^(−τ/b), and a/c yields the total number of deaths predicted by the model. This is the final equation for the D model, which presents a shape similar to the well-known Woods-Saxon potential for the nucleons inside the atomic nucleus, or to the bacterial growth curve. The remaining parameters µ, τ, I0, and N are embedded in the parameters a, b, and c, which represent space-time averages and can be fitted to the available data. Consequently, the D function passes into a well-known logistic model, described by the Riccati equation, but with different constants (e.g., see Ref. [ ] and references therein).
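A minimal sketch of fitting the D-model form D(t) = a/(c + e^(−t/b)) to a cumulative-deaths series; the data here are synthetic and the parameter values are illustrative, not the published fits:

```python
import numpy as np
from scipy.optimize import curve_fit

def D(t, a, b, c):
    """D model: cumulative deaths; D(inf) = a/c is the expected total toll."""
    return a / (c + np.exp(-t / b))

t = np.arange(0, 61, dtype=float)
true_params = (10.0, 6.0, 0.01)          # a, b, c  ->  total deaths a/c = 1000
data = D(t, *true_params)                # stand-in for reported cumulative deaths
popt, pcov = curve_fit(D, t, data, p0=(8.0, 5.0, 0.02))
a, b, c = popt
total = a / c                            # asymptotic death toll
```

With noisy real data the covariance matrix `pcov` gives the parameter uncertainties, which is how error intervals on a/c can be propagated.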
this agreement encourages the application of the d model to other countries in order to investigate the different trends. in order to get insight into the stability and uncertainty of our predictions, fig. shows the evolution of a, b, and c and other model predictions from fits to the daily data in spain. the meaning of these quantities is explained below: • the parameter a is the theoretical number of deaths at the day corresponding to t = . in general, it differs from the experimental value and can be interpreted as the expected value of deaths that day. note that experimental data may be subject to unknown systematic errors and different counting methods. • the parameter b, as mentioned above, is the characteristic evolution time. during the initial exponential behavior, it indicates the number of days for the number of deaths to double. moreover, /b is proportional to the slope of the almost linear behavior in the mid region of the d function. that behavior can be obtained by doing a taylor expansion around t = −b n c and is given by • the parameter c is called the inverse dead factor because d(t → ∞) = a/c provides the asymptotic or expected total number of deaths. ( ) figure shows the stable trend of the parameters between days to (corresponding to march - ), right before reaching the peak of deaths cases, which occurred in spain around april . such stability validates the d-model predictions during this time. however, a rapid change of the parameters is observed, especially for a, once the peak is reached, drastically changing the prediction of the number of deaths given by a/c. this sudden change results in the slowing down of deaths per day and longer time predictions t and t . the parameters of the d model correspond to average values over time of the interaction coefficients between individuals, i.e. they are sensitive to an additional external effect on the pandemic evolution. 
these may include the lockdown effect imposed in spain in march and other effects such as new sources of infection or a sudden increase of the total susceptible individuals due to social migration and large mass gatherings [ ] . it is not possible to identify a specific cause because its effects are blurred by the stochastic evolution of the pandemic, which is why any reliable forecast presents large errors. one can also determine deaths/day rates by applying the first derivative to eq. , which allows for a determination of the pandemics peak and evolution after its turning point. the d model describes well the cumulative deaths because the sum of discrete data reduce the fluctuations, in the same way as the integral of a discontinuous function is a continuous function. however, the daily data required for d have large fluctuations -both statistical and systematic -which normally gives a slightly different set of parameters when compared with the d model. using the d model fitted to cumulative deaths allows to compute deaths/day as where ∆t = day. figure shows that eqs. and yield similar parameters, as the time increment is small enough compared with the time evolution of the d(t) function. hence, the first derivative d (t) can be used to describe deaths per day. in addition, fig. shows that the parameters may be different for both d and d functions using cumulative and daily deaths, respectively, as shown for spain on april . it is also important to note that b is directly proportional to the full width at half maximum (fw hm) of the d (t) distribution, as shown below, the b parameter presents typical values between and for most countries undergoing the initial exponential phase, which yields a minimum and maximum time of and days, respectively, between the two extreme values of the fw hm. some models [ ] include changes in the transmission rate due to various interventions implemented to contain the outbreak. 
Some models [ ] include changes in the transmission rate due to the various interventions implemented to contain the outbreak. The simple D model does not allow for this explicitly, but changes in the spread can be taken into account by considering the total D or D' function as the sum of two or more independent D functions with different parameters, which may reveal the existence of several independent sources, or virus channels. An example is shown in Fig. , where a two-channel function with six parameters has been fitted to the Spanish data. The fit reveals a second, smaller death peak, which substantially increases the number of deaths per day and the duration of the pandemic. This is equivalent to adding a second, independent source of infection several weeks after the initial pandemic. The second peak may as well represent a second pandemic phase driving the effects of quarantine during the descending part of the curve. Additionally, the cumulative D function can also be computed with a two-channel function, which provides, as shown in Fig. , a more accurate prediction for the total number of deaths and clearly illustrates the separate effect of both source peaks. It is interesting to note that for large t the two channels approach common values of a, b, and c; in such a case, the total number of deaths expected during the pandemic is given by D(∞) = a/c. The D model can also be used to estimate I(t) using the initial values I0 = I(0) and the total number of susceptible people N = S(0). The initial value of N is unknown, and not necessarily equal to the population of the whole country, since the pandemic started in localized areas. Here we shall assume a fixed value of N, although plausible values can reach tens of millions. Note that the no-recovery assumption of the D model is unrealistic, and this calculation only provides an estimate of the number of individuals that were infected at some time, independently of whether they recovered or not. From the definition of D(t) in Eq.
( ), the following relations between the parameters of the model can be extracted. Solving the first two of these relations for µ and I0 shows that µ can be computed once N is known, whereas obtaining I0 additionally requires the death time τ; this has been estimated to be about to days for COVID-19 cases, which can be used to compute two estimates of I(t), given in Fig. for the case of Spain. Since there is no recovery in the D model, the total number of infected people approaches I ∼ N for large t. In the figure we have labeled the beginning of the lockdown in Spain (14 March). For the shorter value of τ, most of the susceptible individuals were already infected on that date, and even more so for the longer value, as the pandemic had started almost two months earlier. Most of the individuals got infected, even if a great part of them had no symptoms of illness or disease. Moreover, the top panel of the figure shows the ratio D(t)/I(t) (deaths over infected), as given by Eqs. ( ) and ( ), which also depends on N and τ; the ratio D/I increases similarly to the separate functions D and I between its initial and final values. These results depend on the total susceptible population N. However, the ratio of infected to susceptibles, I/N, is independent of N; this function depends only on τ and is shown in the bottom panel of the figure for the two values of τ, revealing the rapid spread of the pandemic. Accordingly, a large fraction of the susceptibles had been infected by mid-March and, one month later, when the fit was made, all susceptibles had been infected. This does not mean that the full population of the country got infected, since the number N is unknown and, for instance, excludes individuals in isolated regions, and it may additionally change because of spatial migration, not considered in the model. D-model predictions can be compared with more realistic results given by the complete SIR model [ , ], which is characterized by Eqs.
( ), with initial conditions R(0) = 0, I(0) = I0, and S(0) = N − I0. The SIR system of dynamical equations can be reduced to a non-linear differential equation. First, dividing the equation for S by that for R, one obtains dS/dR = −λS/β, which yields the exponential relation between the susceptible and removed functions

S = S0 e^(−λR/β).

Moreover, the relation I = N − S − R between the infected and the removed functions yields, upon insertion, the final SIR differential equation

dR/dt = β (N − R − S0 e^(−λR/β)).

In order to obtain R(t) we only need to solve this first-order differential equation with the initial condition R(0) = 0. Moreover, if we normalize the functions S, I, and R to 1, so that s + i + r = 1, then r(t) verifies

dr/dt = β (1 − r − s0 e^(−λNr/β)),

which can be solved numerically, or by approximate methods in some cases. In Ref. [ ], a solution was found for small values of the exponent λNr/β. For the coronavirus pandemic, however, this exponent is expected to increase and to be close to one at the pandemic end. At this point, we propose a modification of the standard SIR model: instead of solving the equation numerically and fitting the parameters to data, the solution can be parametrized as

R(t) = a / (c + e^(−t/b)),

which presents the same functional form as the D model and, conveniently, provides a faster way to fit the model parameters by avoiding the numerical solution of the differential equation. In fact, numerical solutions of the SIR model present a similar step function for R(t). Additionally, one can assume that D(t) is proportional to R(t) and can be written in the same form with its own parameters, to be fitted to deaths-per-day data together with the three parameters of the R(t) function: a, b, c. Figure shows fits of the ESIR model to daily deaths in Spain during the coronavirus spread. The use of no boundary condition for the number of deaths (left panel) is not an exact solution of the SIR differential equation. A way to solve this problem is to impose the condition D'(∞) = 0, as the number of deaths must stop at some time.
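The logistic parametrization used for R(t), which shares the D-model form R(t) = a/(c + e^(−t/b)), is exactly the Verhulst solution: it satisfies the Riccati-type equation R' = (R/b)(1 − (c/a)R), which is why it mimics the step-like numerical solution of the SIR equation for R. A quick numerical verification with illustrative parameters:

```python
import numpy as np

a, b, c = 1.0, 6.0, 0.01
t = np.arange(-60.0, 60.0, 1e-3)
R = a / (c + np.exp(-t / b))

# compare a finite-difference derivative with the logistic right-hand side
dR = np.gradient(R, t)
rhs = (R / b) * (1.0 - (c / a) * R)
err = np.max(np.abs(dR - rhs))
```

Because the identity holds exactly, fitting the parametrized R(t) replaces an ODE solve inside every iteration of the optimizer with a closed-form evaluation.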
Numerically, it is enough to choose a small value of D'(t) at an arbitrarily large t. The middle and right panels of Fig. show two such boundary conditions, which yield the same results and the expected behavior for a viral disease spreading and declining. It is also consistently observed (see, e.g., the middle and right panels of Fig. ) that at large t, R(t) → a/c ≈ 1, which essentially means that most of the susceptible population N recovers, as we previously inferred from the D model. As shown in Fig. , the eSIR model, where c has been adjusted accordingly, is characterized by a broad plateau structure which, again, does not account for additional spatial-time effects. As previously done with the D model, one can also extend the eSIR model to remedy its failure to take such spatial-time effects into account. Similarly, an extended eSIR model is proposed as a sum of two such terms, with relations imposed among the parameters so that R(∞) → 1; hence, we are left with five free parameters. Figure shows the comparison between the eSIR and D fits to real data for some European countries where COVID-19 has widely spread: Germany, Italy, France, Spain, the United Kingdom, Sweden, Belgium and the Netherlands, which indicates a common pattern for the evolution of the COVID-19 pandemic. Death data are taken from Refs. [ , , ] and use a several-day average smoothing to correct for anomalies in data collection, such as the typical weekend staggering observed in various countries, where weekend data are counted at the beginning of the next week. Real error intervals are extracted from the correlation matrix. As discussed earlier, the reduced D model has been used, with its a and c parameters tied across the two components. Although arising from different assumptions (no recovery in the D model, recovery in the eSIR model), both models provide similar patterns following the data trends, with slightly better values of χ² per degree of freedom for the eSIR model.
It is also interesting to note that the reduced eSIR model, with five parameters, yields results similar to those of the full eSIR model, with eight parameters.

Figure: Reduced eSIR and D model fits to deaths-per-day data up to August.

As data become available, daily predictions vary for both the eSIR and the D models. This is because the model parameters are actually statistical averages over space-time of the properties of the complex system. No model is able to predict changes of these properties over time if the physical causes of the changes are not included. The values of the model parameters are only well defined when the disease spread is coming to an end and time changes in the parameters have little influence. In contrast, Fig. shows clear discrepancies between D and eSIR fits to data, with larger χ²/ndf values. There are several reasons for these anomalies: 1) a second wave surges as lockdown measures are suddenly released, as clearly shown in the case of Iran; 2) different spatial-time effects arise as the virus spreads throughout a large country; or 3) simply defective counting (e.g. weekend and backlog effects). More sophisticated calculations can be compared with the D and eSIR predictions. In particular, Monte Carlo (MC) simulations have also been performed in this work for the Spanish case [ ]. They consist of a lattice of cells that can be in four different states: susceptible, infected, recovered or dead. An infected cell can transmit the disease to any other susceptible cell within some random range R. The transmission mechanism follows principles of nuclear physics for the interaction of a particle with a target: each infected particle interacts a number N of times over the interaction region, according to its energy; the number of interactions is proportional to the interaction cross section σ and to the target surface density ρ; and the discrete energy follows a Planck distribution law depending on the 'temperature' of the system.
For any interaction, an infection probability is applied. Finally, time-dependent recovery and death probabilities are also applied. The resulting virus spread for different sets of parameters can be adjusted to COVID-19 pandemic data. In addition, the parameters can be made time dependent in order to investigate, for instance, the effect of an early lockdown or of large mass gatherings at the rise of the pandemic. As shown in Fig. , our MC simulations present results similar to the D model, which validates the use of the simple D model as a first-order approximation. More details on the MC simulation will be presented in a separate manuscript [ ]. Interestingly, the MC simulations followed the data trend up to May without any changes in the parameters for nearly two weeks. An app for Android devices, in which the Monte Carlo Planck model has been implemented to visualize the simulation, is available from Ref. [ ]. In order to investigate the universality of the pandemic, it is interesting to compare all countries by plotting the D model in terms of the variable (t − t0)/b, where t0 is the position of the maximum of the daily curve, given by t_max = −b ln(c). By shifting the D function by t_max and dividing by its asymptotic value a/c, the normalized D function is obtained. The left panel of Fig. shows similar trends for the normalized D curves of different countries, which suggests a universal behavior of the COVID-19 pandemic. Only Iran seems to deviate slightly from the global trend, which may indicate an early and more effective initial lockdown. A similar approach can be taken for the daily data using the D and eSIR models, as shown in the middle and right panels of Fig. , respectively. Although different countries show similar trends, statistical fluctuations in the daily data do not result in as clean a universal behavior as that of the normalized D curve.
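To make the shift-and-rescale normalization concrete, one can check numerically that a step curve of the form D(t) = a/(c + exp(−t/b)) (a hypothetical functional form consistent with the description; parameter values invented for illustration) has daily increments peaking at t_max = −b·ln(c) and saturates at a/c:

```python
import math

def d_model(t, a, b, c):
    # assumed step-function form D(t) = a / (c + exp(-t/b))
    return a / (c + math.exp(-t / b))

def daily_peak(a=1.0, b=8.0, c=0.05, horizon=120):
    # day with the largest increment D(t) - D(t-1)
    incr = {t: d_model(t, a, b, c) - d_model(t - 1, a, b, c)
            for t in range(1, horizon + 1)}
    return max(incr, key=incr.get)

t_star = daily_peak()
print(t_star, round(-8.0 * math.log(0.05), 1))  # discrete vs analytic peak
```

The discrete daily peak lands within a day of the analytic t_max, so shifting by −b·ln(c) aligns the curves of different countries as described.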
However, the D and eSIR plots show that an effective lockdown is characterized by flatter and broader peaks, best exemplified by the Iranian case, whereas Spain and Germany present the sharpest peaks. The global models considered in this work present some differences with respect to other existing models. First, we have tried to keep the models as simple as possible. This allows theoretically inspired analytical expressions, or semi-empirical formulae, to be used in the data analysis. The use of semi-empirical expressions to describe physical phenomena is recurrent in physics; one of the most famous examples is the semi-empirical mass formula of nuclear physics. Of course, the free parameters need to be fitted from known data, but this allowed predictions to be obtained for unknown elements. In our case we were inspired by the well-known statistical SIR-type models, slightly modified to obtain analytical expressions that carry the leading time dependence. We have found that the D model and its two-component extension allow a fast and efficient analysis of the pandemic in its initial and advanced stages. Our results show that the time dependence of the pandemic parameters due to the lockdown can be effectively simulated by the sum of two D functions with different widths and heights, centered at different times. The distance between the maxima of the two D functions should be a measure of the time between the effective beginning of the pandemic and the lockdown; in the Spanish case this is a few weeks. Taking into account that the lockdown started in March, this places the pandemic starting time in about February. Had the lockdown started on that date, the deaths would have been greatly reduced. The smooth blending between the two peaks provides a transition between the two statistical regimes (or physical phases), with and without lockdown. The Monte Carlo simulation results are in agreement with our previous analysis with the D model and its extension.
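The kind of lattice simulation described above can be sketched in a toy form. The following is not the authors' Monte Carlo Planck model: the grid size, contact number and probabilities are invented, and the nuclear-physics machinery (cross sections, Planck-distributed energies) is replaced by a flat infection probability within a finite range.

```python
import random

S, I, R, D = 0, 1, 2, 3  # susceptible, infected, recovered, dead

def simulate(n=40, radius=2, contacts=4, p_infect=0.3,
             p_recover=0.10, p_die=0.01, steps=60, seed=1):
    rng = random.Random(seed)
    grid = [[S] * n for _ in range(n)]
    for dx in (-1, 0, 1):          # seed a 3x3 infected block
        for dy in (-1, 0, 1):
            grid[n // 2 + dx][n // 2 + dy] = I
    for _ in range(steps):
        infected = [(x, y) for x in range(n) for y in range(n)
                    if grid[x][y] == I]
        for x, y in infected:
            for _ in range(contacts):  # contacts within a finite range
                tx = (x + rng.randint(-radius, radius)) % n
                ty = (y + rng.randint(-radius, radius)) % n
                if grid[tx][ty] == S and rng.random() < p_infect:
                    grid[tx][ty] = I
            u = rng.random()           # then remove with fixed rates
            if u < p_die:
                grid[x][y] = D
            elif u < p_die + p_recover:
                grid[x][y] = R
    flat = [c for row in grid for c in row]
    return [flat.count(state) for state in (S, I, R, D)]

counts = simulate()
print(counts)  # [susceptible, infected, recovered, dead]
```

Reducing `radius` from a fixed step onward mimics the lockdown effect of shrinking the interaction range discussed in the text.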
The Monte Carlo generates events in a population of individuals on a lattice, or grid of cells. We simulate the movement of individuals outside their cells and their interactions with the susceptible individuals within a finite range. The random events follow statistical distributions based on the exponential laws of statistical mechanics for a system of interacting particles, driven by macroscopic magnitudes such as the temperature, and by interaction probabilities between individuals that can be related to interaction cross sections. The Monte Carlo simulation spreads the virus in space and time, and also allows for space-time dependence of the parameters. In this work we have made the simplest assumptions, only allowing for a lockdown effect by reducing the range of the interaction from a fixed day onward. This simple modification was enough to reproduce the Spanish deaths-per-day curve nicely. The lockdown produces a relatively long broadening of the curve and a slow decay. Similar MC calculations can be performed for several countries to infer the devastating effect of a late lockdown as compared with early lockdown measures; the latter is the case of South Africa and other countries that have not reached the exponential growth. The death (D) and extended SIR (eSIR) models are simple enough to provide fast estimates of the pandemic evolution by fitting space-time averaged parameters, and they present a good first-order approximation for understanding secondary effects during the pandemic, such as lockdowns and population migrations, which may help to control the disease. Similar models are available [ , ], but challenges in epidemiological modeling remain [ ] [ ] [ ] [ ]. This is a very complex system, involving many degrees of freedom and millions of people, and even assuming consistent disease reporting (which is rarely the case) there remains an important open question: can any model predict the evolution of an epidemic from partial data?
Or, similarly, is it possible, at any given time and with the data at hand, to measure the validity of an epidemic growth curve? We finally hope to have added new insightful ideas with the death, extended SIR and Monte Carlo models, which can now be applied to any country that has followed the initial exponential pandemic growth. It is important to note that the eSIR and D models predict similar patterns of infected and death cases while assuming very different premises: recovery and no recovery, respectively. This, together with the fact that the eSIR model predicts R → 1 for large t, i.e. that most infected cases eventually recover, leads to the logical conclusion that, in the D model, most people in a fixed population N become asymptomatic and eventually recover from COVID-19. One important question remains: what is N exactly? Is it the whole country, a state or a province, or is it localized to specific areas?

References (titles as cited):
- Discussion: the Kermack-McKendrick epidemic threshold theorem
- Stability analysis of SIR model with vaccination
- Seasonality and the effectiveness of mass vaccination
- Application of SIR epidemiological model: new trends
- Epidemic disease in England: the evidence of variability and of persistency of type
- Report on the prevention of malaria in Mauritius
- An application of the theory of probabilities to the study of a priori pathometry, Part I
- An application of the theory of probabilities to the study of a priori pathometry, Part III
- A contribution to the mathematical theory of epidemics
- Discussion of 'Measles periodicity and community size'
- Measles periodicity and community size
- Deterministic and stochastic models for recurrent epidemics
- Basic models for disease occurrence in epidemiology
- The D model for deaths by COVID-19
- The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health: the latest novel coronavirus outbreak in Wuhan, China
- Situation report
- Attacking the COVID-19 with the Ising model and the Fermi-Dirac distribution function
- The SIR model and the foundations of public health
- Impact of non-pharmaceutical interventions against COVID-19 in Europe: a quasi-experimental study
- Inferring COVID-19 spreading rates and potential change points for case number forecasts
- Special issue on challenges in modelling infectious disease dynamics
- Modeling infectious disease dynamics in the complex landscape of global health
- Mathematical epidemiology: past, present, and future
- True epidemic growth construction through harmonic analysis

The authors thank Emmanuel Clément, Araceli Lopez-Martens, David Jenkins, Ramon Wyss, Azwinndini Muronga, Liam Gaffney and Hans Fynbo for useful comments. This work was supported by the Spanish Ministerio de Economía y Competitividad and European FEDER funds (grant FIS - -C - -P), Junta de Andalucía (grant FQM- ) and the South African National Research Foundation (NRF) under grant .

key: cord- -jfeu tho authors: Fukui, M.; Furukawa, C. title: Power laws in superspreading events: evidence from coronavirus outbreaks and implications for SIR models date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: jfeu tho

While they are rare, superspreading events (SSEs), wherein a few primary cases infect an extraordinarily large number of secondary cases, are recognized as a prominent determinant of aggregate infection rates (R0).
Existing stochastic SIR models incorporate SSEs by fitting distributions with thin tails, i.e. finite variance, and therefore predict almost deterministic epidemiological outcomes in large populations. This paper documents evidence from recent coronavirus outbreaks, including SARS, MERS, and COVID-19, that SSEs follow a power law distribution with fat tails, i.e. infinite variance. We then extend an otherwise standard SIR model with the estimated power law distributions, and show that idiosyncratic uncertainties in SSEs lead to large aggregate uncertainties in infection dynamics even with large populations. That is, the timing and magnitude of outbreaks will be unpredictable. While such uncertainties have social costs, we also find that they on average decrease the herd immunity thresholds and the cumulative infections, because per-period infection rates have decreasing marginal effects. Our findings have implications for social distancing interventions: targeting SSEs reduces not only the average rate of infection (R0) but also its uncertainty. To understand this effect, and to improve inference about the average reproduction numbers under fat tails, estimating the tail distribution of SSEs is vital.

In March, choir members gathered for their rehearsal in Washington. While they were all cautious to keep their distance from one another, and nobody was coughing, three weeks later many members had COVID-19 and two passed away. There are numerous similar anecdotes worldwide. Many studies have estimated the average basic reproduction number (R0) of this coronavirus (e.g. Liu et al.), yet a large share of infected cases do not pass the disease on to any others (Nishiura et al.). Superspreading events (SSEs), wherein a few primary cases infect an extraordinarily large number of others, are responsible for the high average number. As SSEs were also prominent in SARS and MERS before COVID-19, epidemiology research has long sought to understand them (e.g. Shen et al.).
In particular, various parametric distributions of infection rates have been proposed, and their variances have been estimated for many epidemics under the assumption that they exist (e.g. Lloyd-Smith et al.). On the other hand, stochastic susceptible-infectious-recovered (SIR) models have shown that, as long as the infected population is moderately large, the idiosyncratic uncertainties of SSEs will cancel each other out. That is, following the central limit theorem (CLT), stochastic models quickly converge to their deterministic counterparts and become largely predictable. From this perspective, the dispersion of SSEs is unimportant in itself, and is useful only to the extent that it can help target lockdown policies at SSEs to efficiently reduce the average rate R0 (Endo et al.). In this paper, we extend this research by closely examining the distribution of infection rates and rethinking how its dispersion influences the uncertainties of aggregate dynamics. Using evidence from several coronavirus outbreaks, we show that SSEs follow a power law, or Pareto, distribution with fat tails, i.e. infinite variance. That is, the true variance of infection rates cannot be empirically estimated: any estimate will be an underestimate, however large it may be. When the CLT assumption of finite variance does not hold, many theoretical and statistical implications of epidemiology models require rethinking. Theoretically, even when the infected population is large, the idiosyncratic uncertainties in SSEs will persist and lead to large aggregate uncertainties. Statistically, the standard estimate of the average reproduction number (R0) may be far from its true mean, and the standard errors will understate the true uncertainty. Because the infected population for COVID-19 is already large, our findings have immediate implications for statistical inference and current policy. We begin with the evidence.
Figure plots the largest clusters reported worldwide for COVID-19, from data gathered by Leclerc et al. If a random variable follows a power law distribution with an exponent α, then the log of its scale (e.g. the number of cases tested positive aboard a US Navy vessel) and the log of its severity rank (e.g. that Navy case ranking 1st in severity) will have a linear relationship, with slope −α. Figure shows a fine fit of the power law.

Notes: Figure plots the number of total cases per cluster (in log) and their ranks (in log) for COVID-19, last updated in June, and fits a linear regression to the clusters above a minimum size. Source: CMMID COVID-19 Working Group online database (Leclerc et al.), collected by the Centre for Mathematical Modelling of Infectious Diseases COVID-19 Working Group.

Such fat tails translate into substantial uncertainties in aggregate epidemiological outcomes. Concretely, we consider a stochastic model with a population of one million, in which a thousand people are initially infected, and apply epidemiological parameters adopted from the literature. We consider the effects of the tails of the distribution while keeping the average rate (R0) constant. Under thin-tailed distributions, such as the estimated negative binomial distribution or a power law distribution with a high exponent, the epidemiological outcomes are essentially predictable. However, under fat-tailed distributions as estimated from the worldwide COVID-19 data, there are immense variations in all outcomes; for example, the peak infection rate varies widely between its low and high percentiles. Under a thin-tailed distribution such as the negative binomial, the mean and the low and high percentiles of the peak infection rate are all concentrated at essentially one value, generating almost deterministic outcomes.
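The thin- versus fat-tail contrast can be reproduced with a short simulation. The sketch below uses illustrative parameters (the mean number of secondary cases is matched at 2.5 across tail exponents, and 2.5 itself is only a placeholder) and compares how much the realized average infection rate varies across simulated cohorts:

```python
import random

def pareto_mean_matched(rng, alpha, mean):
    # Pareto draw with scale set so that E[z] = mean (requires alpha > 1)
    z0 = mean * (alpha - 1.0) / alpha
    return z0 * (1.0 - rng.random()) ** (-1.0 / alpha)

def spread_of_realized_r(alpha, n=1000, reps=300, mean=2.5, seed=0):
    # 90th-10th percentile gap of the realized average number of
    # secondary cases across `reps` simulated cohorts of `n` infected
    rng = random.Random(seed)
    means = sorted(
        sum(pareto_mean_matched(rng, alpha, mean) for _ in range(n)) / n
        for _ in range(reps)
    )
    return means[int(0.9 * reps)] - means[int(0.1 * reps)]

fat = spread_of_realized_r(alpha=1.1)   # infinite variance
thin = spread_of_realized_r(alpha=3.0)  # finite variance
print(round(fat, 3), round(thin, 3))
```

With α = 1.1 the dispersion of realized averages remains large even with 1,000 infected individuals per cohort, while with α = 3 it is several times smaller: the central limit theorem logic fails under infinite variance.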
While our primary focus is on the effect on aggregate uncertainty, we also find important effects on average outcomes. In particular, under a fat-tailed distribution, the cumulative and peak infections, as well as the herd immunity threshold, are lower, and the timing of the outbreak comes later, on average, than under a thin-tailed distribution. For example, the average herd immunity threshold is markedly lower with the fat-tailed distribution than with the thin-tailed one. These observations suggest that an increase in aggregate uncertainty over R0 has effects analogous to a decrease in the average R0. This relationship arises because average future infections are a concave function of today's infection rate: because of concavity, a mean-preserving spread lowers the average level. In particular, a higher infection rate today has two countervailing effects: while it increases future infections, it also decreases the susceptible population, which decreases them. We provide theoretical interpretations for each outcome by examining the effect of a mean-preserving spread of R0 in analytical results derived from deterministic models. Our findings have critical implications for the design of lockdown policies to minimize the social costs of infection. Here, we study lockdown policies that target SSEs: we assume that, by banning large gatherings, the upper tail of the infection-rate distribution can be truncated with some probability. Because both the uncertainty and the mean of the infection rate under the fat-tailed distribution are driven by tail events, such policies substantially lower the uncertainty and improve the average outcomes. Because the cost of such a policy is difficult to estimate reliably, we do not compute its cost-effectiveness. Nonetheless, we believe this is an important consideration in the current debates on how to re-open the economy while mitigating the uncertainties of subsequent waves. Finally, we also show the implications of a fat-tailed distribution for the estimation of the average infection rate.
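The concavity logic can be illustrated with the deterministic herd immunity threshold 1 − 1/R0, which is concave in R0, so a mean-preserving spread lowers its average (Jensen's inequality); the numbers below are illustrative:

```python
def herd_immunity_threshold(r0):
    # deterministic SIR herd immunity threshold, 1 - 1/R0
    return 1.0 - 1.0 / r0

# mean-preserving spread: R0 equal to 1.5 or 3.5 with probability 1/2,
# versus a certain R0 = 2.5 (all values illustrative)
spread = 0.5 * herd_immunity_threshold(1.5) + 0.5 * herd_immunity_threshold(3.5)
certain = herd_immunity_threshold(2.5)
print(round(spread, 3), round(certain, 3))
```

The uncertain case yields an average threshold strictly below the certain case with the same mean R0, consistent with the lower average thresholds reported above.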
Under such a distribution with small sample sizes, the sample mean yields estimates that are far from the true mean, with standard errors that are too small. To address this possibility, it is helpful to estimate the power law exponent. (For example, it is prohibitively costly to shut down daycare, but it is less costly to prevent a large concert.) A related literature examines the implications for lockdown policies (Acemoglu et al.; Davies et al.; Gollier; Rampini; Glover et al.; Brotherhood et al.). We emphasize another dimension of targeting, namely targeting large social gatherings, and this policy reduces the uncertainty regarding various epidemiological outcomes. Another related paper is Beare and Toda, who document that the cumulative number of infections across cities and countries is closely approximated by a power law distribution and argue that the standard SIR model can explain this fact; we document, instead, that infection at the individual level follows a power law. Finally, it is well known that many variables follow a power law distribution. These include city sizes (Zipf), firm sizes (Axtell), income (Atkinson et al.), wealth (Kleiber and Kotz), consumption (Toda and Walsh), and even the sizes of earthquakes (Gutenberg and Richter) and of moon craters and solar flares (Newman). We are also partly inspired by the economics literature arguing that the fat-tailed distribution of firm sizes has important consequences for macroeconomic dynamics, originating with Gabaix.
We follow a similar route in documenting that SSEs are well approximated by a power law distribution and arguing that such empirical regularities have important consequences for the epidemiological dynamics.

Roadmap. The rest of the paper is organized as follows. Section 2 documents evidence that the distribution of SSEs follows a power law. Section 3 embeds the evidence into an otherwise standard SIR model to demonstrate its implications for the epidemiological dynamics. Section 4 studies estimation of the reproduction numbers under a fat-tailed distribution. Section 5 concludes by discussing what our results imply for the ongoing COVID-19 pandemic.

Let z_it denote the number of secondary cases that individual i infects at time t. Then, given some threshold z̄, an individual i is said to have caused an SSE at time t if z_it ≥ z̄. To make the estimation flexible, the distribution for non-SSEs, z_it < z̄, need not follow the same distribution as that for SSEs. In this paper, we consider a power law (or Pareto) distribution for SSEs. Denoting its exponent by α, the counter-cumulative distribution is P(z_it ≥ z) = π (z/z̄)^(−α) for z ≥ z̄, where π is the probability of SSEs. Notably, its mean and variance may not exist when α is sufficiently low: the mean is finite only for α > 1, and in this paper we formally call a distribution fat-tailed if α < 2, so that it has infinite variance. While the non-existence of mean and variance may appear pathological, a number of socioeconomic and natural phenomena, such as city sizes, income, and earthquake energies, have tails well approximated by this distribution, as reviewed in the introduction. A theoretical reason why this distribution could be relevant for airborne diseases is that the number of connections in social networks often follows a power law (Barabasi and Frangos). This characteristic stands in contrast with the standard assumption in the epidemiology literature that the full distribution of z_it follows a negative binomial (or Pascal) distribution with finite mean and variance.
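A tail of this form is easy to sample by inverting the counter-cumulative distribution, which makes the existence or non-existence of moments tangible. A minimal sketch (z̄ and π normalized to 1; sample sizes illustrative):

```python
import random

def pareto_draw(rng, alpha, z_bar=1.0):
    # inverse-CDF sample from P(Z >= z) = (z / z_bar) ** (-alpha), z >= z_bar
    return z_bar * (1.0 - rng.random()) ** (-1.0 / alpha)

def sample_mean(alpha, n=200_000, seed=42):
    rng = random.Random(seed)
    return sum(pareto_draw(rng, alpha) for _ in range(n)) / n

for alpha in (3.0, 1.1):
    true_mean = alpha / (alpha - 1.0)  # finite only because alpha > 1
    print(alpha, round(sample_mean(alpha), 2), round(true_mean, 2))
```

For α = 3 the sample mean settles near the analytic value α/(α − 1); for α = 1.1 it remains erratic across seeds, and for α ≤ 1 it would fail to converge at all.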
The negative binomial distribution has been estimated to fit the data better than the Poisson or geometric distributions for SARS (Lloyd-Smith et al.) and, given its theoretical basis in branching models (e.g. Gay et al.), it has been a standard distributional assumption in the epidemiology literature (e.g. Nishiura et al.). Denoting its mean by R and its dispersion parameter by k, the variance of the distribution is R + R²/k; it nests the Poisson distribution (as k → ∞) and the geometric distribution (when k = 1).

This paper uses five datasets of recent coronavirus outbreaks to examine the distribution of SSEs: COVID-19 data from (i) across the world, (ii) Japan, and (iii) India, together with (iv) SARS data and (v) MERS data.

(i) COVID-19 data from around the world: clusters of infections found by a systematic review of academic articles and media reports, conducted by the Centre for Mathematical Modelling of Infectious Diseases COVID-19 Working Group (Leclerc et al.). The data are restricted to the first generation of cases and do not include subsequent cases from the infections. The data are continuously updated; in this draft we use the data downloaded in June.

(ii) COVID-19 data from Japan: the numbers of secondary cases of COVID-19 patients across clusters in Japan up to February, reported in Nishiura et al. This survey was commissioned by the Ministry of Health, Labour and Welfare of Japan to identify high-risk transmission cases.

(iii) COVID-19 data from India: state-level data collected by the Ministry of Health and Family Welfare, and individual data collected by covid india.org. We use the data downloaded in May.

(iv) SARS data from around the world: incidents of SSEs from the 2003 SARS outbreak in Hong Kong, Beijing, Singapore, and Toronto, as gathered by Lloyd-Smith et al. through a review of papers. The rate of community transmission was not generally high; for example, infections with unknown route made up only a small percentage of the Beijing cases. The data consist of SSEs, defined by epidemiologists (Shen et al.) as primary cases with more than a given number of secondary cases. For Singapore and Beijing, contact-tracing data are available from Hsu et al. and Shen et al., respectively; when comparing the fit of the power law with that of the negative binomial, we use these contact-tracing data.

(v) MERS data from around the world: MERS clusters reported up to August, classified as clusters when they are linked epidemiologically. The data come from the three published studies used in Kucharski and Althaus.

We use multiple datasets in order to examine the robustness of the findings, since they can address one another's weaknesses: data based on media reports are broad but may be skewed toward capturing extreme events, whereas data based on contact tracing may be more reliable but are restricted to small populations. By using both, we can complement each dataset's weaknesses.
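The negative binomial benchmark can be checked by simulation through its gamma-Poisson mixture representation. In the sketch below, r = 2.5 and k = 0.16 are illustrative values of the same order as SARS estimates in the literature, not the estimates of this paper:

```python
import math
import random

def negbin_draw(rng, r, k):
    # negative binomial via its gamma-Poisson mixture: an individual
    # reproduction rate from Gamma(shape=k, scale=r/k), then a Poisson count
    lam = rng.gammavariate(k, r / k)
    threshold, count, p = math.exp(-lam), 0, 1.0
    while True:  # Knuth's Poisson sampler, fine for moderate rates
        p *= rng.random()
        if p <= threshold:
            return count
        count += 1

def moments(r=2.5, k=0.16, n=200_000, seed=7):
    rng = random.Random(seed)
    xs = [negbin_draw(rng, r, k) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

mean, var = moments()
print(round(mean, 2), round(var, 1))  # variance should be near r + r*r/k
```

The simulated variance comes out near r + r²/k, which is far above the mean when k is small: strong overdispersion, but still finite variance, unlike the power law case.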
( ) through a review of papers. the rate of community transmission was not generally high so that, for example, the infections with unknown route were only about percent in the case of beijing. the data consist of sses, defined by epidemiologists (shen et al., ) as the cases with more than secondary cases. for singapore and beijing, the contact-tracing data is available from hsu et al. ( ) and shen et al. ( ) , respectively. when compare the fit to the negative binomial distribution, we compare the fit of power law to that of negative binomial using these contact tracing data. (v) mers from around the world: this dataset contains mers clusters reported up to august , . the cases are classified as clusters when thee are linked epidemiologically. the data come from three published studies were used in kucharski and althaus ( ) . total of clusters are recorded. we use multiple data sets in order to examine the robustness of findings. having multiple data sets can address each other's weaknesses in data. while data based on media reports is broad, they may be skewed to capture extreme events; in contrast, data based on contact tracing may be reliable, but are restricted to small population. by using both, we can complement each data's weaknesses. the datasets report cumulative number of secondary cases, either ∑ i z it (when a particular event may have had multiple primary cases) or ∑ t z it (when an individual infects many oth- https://www.kaggle.com/sudalairajkumar/covid -in-india. covid india.org is a volunteer-based organization that collects information from municipalities. even though lloyd-smith et al. ( ) had analyzed other infectious diseases, sars was the only one with sufficient sample sizes to permit reliable statistical analyses. he infectious diseases considered here share some commonalities as sars-cov that causes sars, mers-cov that causes mers, and sars-cov- that causes covid- are human coronaviruses transmitted through the air. 
they have some differences in terms of transmissibility, severity, fatality, and vulnerable groups (petrosillo et al., ) . but overall, as they are transmitted through the air, they are similar compared to other infectious diseases. all rights reserved. no reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted june , . . ers through multiple events over time). denoting these cumulative numbers by z, we consider this distribution for some z ≥ z * . as discussed in appendix a. , we can interpret the estimates of this tail distribution as approximately the per-period and individual tail distribution and therefore map directly to the parameter of the sir model in the next section. the thresholds for inclusion, z * , will be chosen to match the threshold for sses when possible, but also adjust for the sample size. for covid- in the world, we apply z = to focus on the tail of the sse distribution. for sars, we apply z = as formally defined (shen et al., ) . for other samples, we apply z = because the sample size is limited. to assess whether the distribution of z follows the power law, we adopt the regressionbased approach that is transparent and commonly used. if z follows power law distribution, then by ( ), the log of z and the log of its underlying rank have a linear relationship: log rank(z) = −α log z + log(nπz α ). this is because, when there are n individuals, the expected ranking of a realized value z is erank(z) p(z ≥ z)n for moderately large n. thus, when n is large, we obtain a consistent estimate of α by the following regression: when n is not large, however, the estimate will exhibit a downward bias because log is a concave function and thus e log rank(z) < log erank(z). 
When n is not large, however, the estimate exhibits a downward bias, because log is a concave function and thus E log rank(z) < log E rank(z). While we present the analysis according to this regression in the figures for expositional clarity, we also report estimates with the small-sample bias correction proposed by Gabaix and Ibragimov, as well as maximum likelihood estimates, in the appendix. Note that when there are ties (e.g. the second- and third-largest clusters have the same number of infections), we assign distinct ranks to the tied observations (e.g. ranks 2 and 3). Next, we also compare the extent to which a power law distribution approximates the distribution of SSEs adequately relative to the negative binomial distribution. First, we plot the log-log relationship that would be predicted by the estimated parameters of the negative binomial distribution. Second, to quantify predictive accuracy, we compute the likelihood ratio of observing the actual data under the two distributions. The table summarizes the estimates of the power law exponent (α), given as the coefficient of the regression of the log number of infections (or log cluster size) on the log of its ranking, with heteroskedasticity-robust standard errors in parentheses; z̄ denotes the threshold number of infections for inclusion, and log(LR) denotes the likelihood ratio, in logs, of the probability of observing the realized data under the power law relative to the estimated negative binomial distribution. Separate columns report estimates for COVID-19, SARS, and MERS. Our analysis shows that the power law finely approximates the distribution of SSEs: the figures visualize this for COVID-19 from across the world and for SARS, MERS, and COVID-19 in Japan and India, with R² values high enough to indicate a close fit. Because our focus is on the upper-tail distribution, the plots are truncated below at dataset-specific cluster sizes; a version with a lower truncation point is presented in the appendix.
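The maximum likelihood counterpart has a closed form, the Hill estimator α̂ = n / Σ ln(z_i / z_min) over the n observations above the threshold. A sketch on synthetic data with true α = 2:

```python
import math
import random

def hill_estimator(zs, z_min):
    # conditional MLE of the Pareto exponent for observations >= z_min
    tail = [z for z in zs if z >= z_min]
    return len(tail) / sum(math.log(z / z_min) for z in tail)

rng = random.Random(11)
zs = [(1.0 - rng.random()) ** (-1.0 / 2.0) for _ in range(10_000)]  # alpha = 2
a_hat = hill_estimator(zs, 1.0)
print(round(a_hat, 2))
```

Both this estimator and the rank-size regression are consistent for Pareto tails, so comparing them, as the appendix referenced in the text does, is a useful robustness check.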
in the Appendix presents a version of Figure  truncated below at .

In addition, the estimates of regression ( ) suggest that the power law exponent, α, is below  and even close to . Table  summarizes the main findings. The estimated exponents near  suggest that extreme SSEs are not uncommon. For COVID- in Japan and India, the estimated exponents are larger than  but often below . Since applying the threshold of z* =  is arguably too low, we must interpret out-of-sample extrapolation from these estimates with caution. When higher thresholds are applied, the estimated exponents tend to be higher. For example, when applying the threshold of z* =  (as in SARS) to COVID- in India, the estimated exponent is .  or . . This pattern is already visible in Figure . Table A.  in Appendix A.  presents results using the bias correction technique of Gabaix and Ibragimov ( ) as well as maximum likelihood; the results are very similar.

Notably, the estimated exponent for India is higher than those for the other data sets. There are two possible explanations. First, the lockdown policies in India were implemented strictly relative to the moderate approaches in Japan and some other parts of the world during the outbreaks. By discouraging and prohibiting large-scale gatherings, sometimes by police enforcement, they may have been successful at targeting SSEs. Second, contact tracing to ensure data reliability may have been more difficult in India until the end of May than in Japan until the end of February. While missing values will not generate any biases if the attrition were proportional to the number of infections, large gatherings may have dropped out of the Indian data more than in Japan, where the SSEs were found through contact tracing. Nonetheless, these estimates suggest that various environments and policies could decrease the risks of extreme SSEs. This observation motivates our policy simulations that target SSEs.

Notes: Figure  plots the number of total cases per cluster (in logs) and their ranks (in logs) for MERS, SARS, and COVID- in Japan and India. The data for SARS are from Lloyd-Smith et al. ( ), and focus on SSEs defined as primary cases that infected more than  secondary cases. The data for MERS come from Kucharski and Althaus ( ). The data for Japan come from periods before February , reported in Nishiura et al. ( ). The data for India are until May , reported by the Ministry of Health and Family Welfare and covid india.org. The plots are restricted to cases larger than .

Notes: Figure  plots the predicted ranking of infection cases given the estimated negative binomial (NB) distribution, in addition to the log-log plots and estimated power law (PL) distributions. The negative binomial distribution is parameterized by (r, k), where r is the mean and k is the dispersion parameter, with variance r(1 + r/k). The estimates for SARS Singapore come from our own maximum likelihood estimates (r = . , k = . ); those for MERS come from the world (r = . , k = . ) estimated in Kucharski and Althaus ( ); and those for COVID- in Japan are from our own maximum likelihood estimates (r = . , k = . ). The estimates for Singapore are slightly different from Lloyd-Smith et al. ( ) because we pool all the samples.
Next, we compare the assumption of a power law distribution to that of a negative binomial distribution. Figure  shows that the negative binomial distributions would predict that extreme SSEs are fewer than in the observed distribution: while they predict the overall probability of SSEs accurately, they suggest that, when SSEs occur, they will not be too extreme in magnitude.

Concretely, there were only  cases with more than one secondary infection reported among ,  primary cases in the data from India; that is, only .  percent of primary cases were reported to have infected more than one person. In contrast, there were  cases with more than one secondary infection among  primary cases in Japan; that is,  percent of primary cases were infectious. This difference in ratios likely reflects data-collection quality rather than actual infection dynamics.

Table : Probabilities of extreme SSEs under each distribution. Notes: the table shows the size of secondary cases at the top , , and  percentiles under each distribution. The negative binomial estimates for SARS are from Singapore, for COVID- from Japan, and for MERS from around the world.

Table  reports the relative likelihood, in logs, of observing the data given the estimated parameters. It shows that, under the estimated power law distribution relative to the estimated negative binomial distribution, it is  –  times more likely to observe the SARS data (  times more for the MERS data, and  times more for the COVID- data in Japan).
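The likelihood-ratio comparison can be sketched as follows. This is our simplification, not the paper's code: it treats cluster sizes as continuous under the Pareto fit while using the discrete negative binomial pmf, and the parameter names are ours.

```python
import numpy as np
from scipy import stats

def log10_likelihood_ratio(z, alpha, z_min, r, k):
    """Log-10 likelihood ratio of the data under a (continuous) Pareto fit
    versus a negative binomial fit with mean r and dispersion k."""
    z = np.asarray(z, dtype=float)
    ll_pl = stats.pareto.logpdf(z, b=alpha, scale=z_min).sum()
    # scipy's nbinom uses (n, p); a mean-r, dispersion-k parameterization
    # maps to n = k and p = k / (k + r)
    ll_nb = stats.nbinom.logpmf(z.astype(int), n=k, p=k / (k + r)).sum()
    return (ll_pl - ll_nb) / np.log(10)
```

A positive value means the power law fits the realized data better, matching the sign convention of log(LR) in the table.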
Such large differences emerge because the negative binomial distribution, given its implicit assumption of finite variance, implies that extreme SSEs are extremely rare when it is estimated on the entire data set. If our objective is to predict the overall incidence of infections parsimoniously, then the negative binomial distribution is well validated and theoretically founded (Lloyd-Smith et al., ). However, if our goal is to estimate the risks of extreme SSEs accurately, then fitting the tail jointly with the entire distribution using only two parameters and finite variance may be infeasible.

These distributional assumptions have critical implications for the prediction of extreme SSEs. Table  presents the magnitudes of the top  %, top  %, and top  % of SSEs under each estimated distribution. Given the estimates of the negative binomial distribution, even the top  % of SSEs above  cases will be around the magnitude of  – . However, given a range of estimates from the power law distribution, the top  % could be as large as . Thus, it is no longer surprising that the largest reported case for COVID- infected over ,  people; in contrast, such incidents have a vanishingly low chance under negative binomial distributions. Since SSEs are rare, researchers have to make inferences about their distribution based on parametric methods. Scrutinizing such distributional assumptions, along with the estimation of the parameters themselves, will be crucial for accurate prediction of the risks of extreme SSEs.

Motivated by the evidence, we extend an otherwise standard stochastic SIR model with fat-tailed SSEs. Unlike with thin-tailed distributions, we show that idiosyncratic risks of SSEs induce aggregate uncertainties even when the infected population is large. We further show that the resulting uncertainties in infection rates have important implications for average epidemiological outcomes. Impacts of lockdown policies that target SSEs are also discussed.
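The tail comparison behind the table above can be reproduced in spirit from the fitted quantile functions. The parameter values below are placeholders in the empirically relevant range, not the paper's estimates.

```python
import numpy as np
from scipy import stats

def tail_quantiles(q, alpha, z_min, r, k):
    """Size of secondary cases at upper quantile q (e.g. 0.99) under a Pareto
    fit versus a negative binomial fit with mean r and dispersion k."""
    pareto_q = stats.pareto.ppf(q, b=alpha, scale=z_min)
    nb_q = stats.nbinom.ppf(q, n=k, p=k / (k + r))
    return pareto_q, nb_q
```

Far enough in the tail, the Pareto quantile dominates the negative binomial one by a wide margin, which is the paper's point about the two fits disagreeing on how extreme the extreme SSEs are.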
Suppose there are i = 1, …, N individuals, living in periods t = 1, 2, …. Infected individuals pass on and recover from infection in heterogeneous and uncertain ways. Let β_it denote the number of new infections in others that an infected individual i causes at time t. Let γ_it ∈ {0, 1} denote recovery/removal, where a person recovers (γ_it = 1) with probability γ ∈ [0, 1]. Note that, whereas z_it in Section  was a stochastic analogue of the "effective" reproduction number, β_it here is the analogue of the "basic" reproduction number. Assuming enough mixing in the population, the two models are directly related.

This model departs from other stochastic SIR models only mildly: we consider a fat-tailed, instead of thin-tailed, distribution of infection rates. Based on the evidence, we consider a power law distribution of β_it, whose countercumulative distribution is P(β_it ≥ x) = π(x/β)^(−α) for x ≥ β, for the exponent α and a normalizing constant β, where π ∈ [0, 1] is the probability that β_it ≥ β. Note that the estimated exponent α can be mapped to this model, as discussed in Appendix A. If we instead assume that β_it follows an exponential or negative binomial distribution, we obtain a class of stochastic SIR models commonly studied in the epidemiological literature (see Britton ( , ) for surveys). We will compare the evolution dynamics under this power law distribution against those under the negative binomial distribution as commonly assumed, keeping the average basic reproduction number the same. To implement this numerically, we introduce a normalization of the distributions.

The evolution dynamics are described by a system of stochastic difference equations, writing the total numbers of infected and recovered/removed by I_t and R_t.
This system is a discrete-time, finite-population analogue of the continuous-time, continuous-population differential-equation SIR models.

Parametrization: we parametrize the model as follows. The purpose of the simulation is a proof of concept rather than to provide realistic numbers. We take the length of a time period to be one week. We set the sum of the recovery and death rates per day to / , following Wang et al. ( ), so that γ = / . As a benchmark case, we set α = . , which is the average of our estimates from SARS and COVID- in Japan, but we explore several other parametrizations, α ∈ { . , . , . ,  }. As documented in Nishiura et al. ( ),  % of people did not infect others; we therefore set π = . . This number is also in line with the evidence from SARS reported in Lloyd-Smith et al. ( ), in which  % of cases were barely infectious. We choose β, which controls the mean of β_it, so that the expected R₀ ≡ Eβ_it/γ per day is . , corresponding to the middle of the estimates obtained in Remuzzi and Remuzzi ( ). This leads us to choose β = .  in the case of α = . .

We contrast the above model with one in which β_it is distributed according to a negative binomial, β_it/γ ∼ NegativeBinomial(R₀, k). The mean of this distribution is Eβ_it/γ = R₀, ensuring that it has the same mean basic reproduction number as in the power law case, and the variance is R₀(1 + R₀/k); smaller values of k indicate greater heterogeneity (larger variance). We use the SARS estimate of Lloyd-Smith et al. ( ), k = . , and set the mean to the same value as in the power law case, R₀ = . .
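The discrete-time dynamics just described can be sketched as follows. This is a minimal sketch, not the paper's calibration: the parameter defaults are illustrative, and drawing the number of new infections as a Poisson with mean Σβ_it·S/N is our simplification of the transmission step.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_beta(n, alpha, beta_bar, pi):
    """Per-individual infection rates: with probability pi the rate is a
    Pareto(alpha) draw above beta_bar, otherwise zero (non-infectious)."""
    active = rng.random(n) < pi
    draws = beta_bar * rng.random(n) ** (-1.0 / alpha)  # inverse-CDF Pareto
    return np.where(active, draws, 0.0)

def simulate_sir(n=1000, i0=10, alpha=1.1, beta_bar=0.5, pi=0.25,
                 gamma=0.5, t_max=52):
    """Discrete-time stochastic SIR with fat-tailed individual infection
    rates; returns the path of the infected population."""
    s, i = n - i0, i0
    path = [i]
    for _ in range(t_max):
        betas = draw_beta(i, alpha, beta_bar, pi)
        new_inf = min(s, rng.poisson(betas.sum() * s / n))
        recov = rng.binomial(i, gamma)
        s -= new_inf
        i = i + new_inf - recov
        path.append(i)
        if i == 0:
            break
    return np.array(path)
```

Re-running `simulate_sir()` with different seeds reproduces the qualitative point of the next paragraph: identical parameters and initial conditions, very different outbreak paths.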
Figure a shows sample paths of the infected population generated by simulating the model with α = . . One can immediately see that even though all the simulations start from the same initial conditions under the same parameters, there is enormous uncertainty in the timing of the outbreak, the maximum number of infected, and the final number of susceptible. The timing of the outbreak is mainly determined by when SSEs occur. To illustrate the importance of a fat-tailed distribution, Figure b shows the same sample paths but with a thin-tailed negative binomial distribution. In this case, as already ,  people are infected in the initial period, the CLT implies that the aggregate variance is very small and the model is largely deterministic. This is consistent with Britton ( ), who shows that when the total population is as large as ,  or , , the model quickly converges to its deterministic counterpart.

The herd immunity threshold is given by the cumulative number of infected at which the infection is at its peak: formally, it is evaluated at t* = arg max_t I_t, with threshold S_{t*}; the peak number of infected is max_t I_t.

Figure  compares the entire distributions of the number of cumulative infections (top left), the herd immunity threshold (top right), the peak number of infected (bottom left), and the days it takes to infect  % of the population (bottom right). The histograms contrast the power law case with α = .  with the negative binomial case.
It is again visible that uncertainty remains in all outcomes when the distribution of infection rates is fat-tailed. For example, cumulative infections vary from  % to  % in the power law case, while almost all simulations are concentrated around  % in the negative binomial case.

Table  further shows the summary statistics for the epidemiological outcomes for various power law tail parameters, α, as well as for the negative binomial distribution. Notes: the table shows the summary statistics from simulations for five different tail parameters for the power law distribution, and for the negative binomial distribution.

With fat tails, i.e. α close to one, the range between the  th and  th percentiles is wide for all statistics, but this range narrows substantially as the tail becomes thinner (α close to  ). For example, when α = .  the peak infection rate varies from  % to  % as we move from the  th to the  th percentile. In contrast, when α = , the peak infection rate is concentrated at  –  %. Moreover, when α = , the model behaves similarly to the model with the negative binomial distribution because the CLT applies in both cases.

While our primary focus is the effect on the uncertainty of epidemiological outcomes, Figure  also shows significant effects on the mean. In particular, a fat-tailed distribution also lowers
the cumulative infection, the herd immunity threshold, and the peak infection, and delays the time it takes to infect  % of the population, on average.

Why do such effects emerge? To understand them, we consider a deterministic SIR model with continuous time and a continuum of population, and examine the effect of small uncertainties (i.e. a mean-preserving spread) in R₀. Such a theoretical inquiry sheds light on the effect because the implication of a fat-tailed distribution is essentially to introduce time-varying fluctuations in the aggregate R₀. We can thus examine how each outcome changes with R₀ and invoke Jensen's inequality to interpret the results.

. Effect on cumulative infection: the cumulatively infected population is given by 1 − S_∞/N, where S_∞ is the ultimate susceptible population as t → ∞. Following the derivations in Harko et al. ( ), Moll ( ), or Toda ( ), the susceptible share s_∞ ≡ S_∞/N satisfies s_∞ = exp(−R₀(1 − s_∞)). In Appendix B, we prove that s_∞ is a convex function of R₀ if R₀ > . , which is likely to be met in SARS or COVID- . Thus, the cumulative infection is concave in R₀, and a mean-preserving spread in R₀ lowers the cumulative infection.

. Effect on herd immunity threshold: denoting the number of recovered/removed and infected by R, the infection stabilizes when R₀(N − R)/N = 1. Rearranging this condition, the herd immunity threshold R* is given by R*/N = 1 − 1/R₀, where R₀ ≡ β/γ. Thus R* is concave in R₀, and a mean-preserving spread in R₀ lowers the herd immunity threshold.

. Effect on timing of outbreak: consider the time t* at which some outbreak threshold is reached. Supposing S/N ≈ 1 at the beginning of the outbreak, infections initially grow exponentially at rate γ(R₀ − 1), so t* is convex in R₀, and a mean-preserving spread in R₀ delays the timing of the outbreak.

. Effect on peak infection rate: the peak infection rate satisfies I_max/N = 1 − (1 + log(s₀R₀))/R₀, where s₀ is the initial susceptible share.
We show in the Appendix that this implies the peak infection, I_max/N, is a concave function of R₀ if and only if R₀ ≥ exp( . )/s₀. If we let s₀ ≈ 1, this implies R₀ ≥ exp( . ) ≈ . . This explains why we found a reduction in the peak infection rate, as we have assumed R₀ = . . Loosely speaking, since the peak infection rate is bounded above by one, it has to be concave for sufficiently high R₀.

Overall, we have found that an increase in the uncertainty over R₀ has effects similar to a decrease in the level of R₀. This is because aggregate fluctuations in R₀ introduce a negative correlation between future infections and the future susceptible population. A high value of today's R₀ ≡ Eβ_it/γ increases tomorrow's infected population, I_{t+1}, and decreases tomorrow's susceptible population, S_{t+1}; that is, Cov(S_{t+1}, I_{t+1}) < 0. Because tomorrow's new infections are a realization of β_{t+1} multiplied by the two (that is, β_{t+1}I_{t+1}S_{t+1}/N), this negative correlation reduces the spread of the virus in the future on average, endogenously reducing the magnitude of the outbreak.

This interpretation also highlights the importance of the intertemporal correlation of infection rates, Cov(β_t, β_{t+1}). When some individuals participate in events in infection-prone environments more frequently than others, the correlation will be positive. Such effects can lead to a sequence of clusters and an extremely rapid rise in infections (Cooper et al., ) that overwhelms the negative correlation between S_{t+1} and I_{t+1} highlighted above. On the other hand, when infections take place in residential environments (e.g. a residential compound in Hong Kong for SARS, and dormitories in Singapore for COVID- ), the infected person is less likely to live in another residential location and spread the virus there; in this case, the correlation will be negative. Considering the correlation of infection rates across periods will thus be crucial.
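The comparative statics above can be checked numerically. The sketch below assumes the standard final-size equation s_∞ = exp(−R₀(1 − s_∞)) with s₀ ≈ 1, and uses a two-point mean-preserving spread of R₀ to illustrate Jensen's inequality; the R₀ values are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

def final_susceptible(r0):
    """Interior root of s = exp(-r0*(1-s)): the final susceptible share
    (well-defined for r0 > 1)."""
    return brentq(lambda s: s - np.exp(-r0 * (1.0 - s)), 1e-9, 1.0 - 1e-9)

def herd_immunity_threshold(r0):
    """Share that must be infected before infections peak: 1 - 1/r0."""
    return 1.0 - 1.0 / r0

# A mean-preserving spread of R0 (half 2.0, half 3.0, versus 2.5 for sure)
# lowers expected cumulative infection, consistent with s_inf convex in R0.
cum_point = 1.0 - final_susceptible(2.5)
cum_spread = 1.0 - 0.5 * (final_susceptible(2.0) + final_susceptible(3.0))
```

The same Jensen-style check applied to `herd_immunity_threshold` (concave in R₀) reproduces the reduction in the expected threshold under a spread.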
Note that the mechanism we identified for herd immunity thresholds is distinct from the ones described in Gomes et al. ( ), Hébert-Dufresne et al. ( ), and Britton et al. ( ). They note that when the population has a permanently heterogeneous activity rate, which captures both the probability of infecting and of being infected, herd immunity can be achieved at a lower threshold level of susceptibles, because the majority of the "active" population becomes infected faster than the remaining population. Our mechanism does not hinge on permanent heterogeneity in the population, which would be captured by Cov(β_it, β_{i,t+1}) > 0; the fat-tailed distribution of infection rates alone creates a reduction in the required herd immunity rate in expectation.

How could the policymaker design mitigation policies effectively if the distribution of infection rates is fat-tailed? Here we concentrate our analysis on lockdown policy. Unlike the traditionally analyzed lockdown policy, we consider a policy that specifically targets SSEs: we assume that the policy can impose an upper bound β_it ≤ β̄ with probability φ. The probability φ is meant to capture imperfections in enforcement or the impossibility of closing some essential facilities such as hospitals and daycare. For tractability, we assume that the government locks down if the fraction of infected exceeds  % of the population, and maintains the lockdown for  months; our results are not sensitive to the particular parameters chosen. We also set β̄ at the  th percentile of the infection rate distribution. While Table B.  in the Appendix presents the results in detail, we briefly summarize the main results here.
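The targeted policy amounts to capping individual draws. A minimal sketch, with illustrative parameter values and enforcement modeled as an i.i.d. draw per individual (a simplification of the policy described above):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_beta_with_lockdown(n, alpha, beta_min, pi, beta_cap, phi):
    """Individual infection rates under a lockdown targeting SSEs: each
    Pareto draw is capped at beta_cap with probability phi (enforcement
    succeeds); with probability 1 - pi the individual is non-infectious."""
    active = rng.random(n) < pi
    beta = beta_min * rng.random(n) ** (-1.0 / alpha)  # inverse-CDF Pareto
    beta = np.where(active, beta, 0.0)
    enforced = rng.random(n) < phi
    return np.where(enforced, np.minimum(beta, beta_cap), beta)
```

With φ = 1 the tail is truncated entirely; with φ = 0 the fat tail is untouched, which is why the policy's effect below is strongest exactly when tails are fat.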
First, the lockdown policy reduces the mean of the peak infection rate, and the policy is more effective when the distribution features fatter tails. Second, the targeted lockdown policy is effective in reducing the volatility of the peak infection rate when such risks exist in the first place. For example, consider the case with α = . : moving from no policy (φ = 0) to a sufficiently targeted lockdown policy (φ = . ) reduces the  th percentile of the peak infection by  %. In contrast, when α =  or with the negative binomial distribution, the policy reduces it by only  % and  %, respectively. The policy is therefore particularly effective in mitigating the upward risk of overwhelming medical capacity. This highlights that while the fat-tailed distribution induces aggregate risk in the epidemiological dynamics, the government can partly remedy this by appropriately targeting the lockdown policy.

We conclude this section by discussing several modeling assumptions. First, we have assumed that {β_it} is independently and identically distributed across individuals and over time. This may not be empirically true. For example, a person who was infected at a big party is more likely to go to a party in the next period. This introduces ex ante heterogeneities as discussed in Gomes et al. ( ), Hébert-Dufresne et al. ( ), and Britton et al. ( ), generating positive correlation in {β_it} along the social network. Or a person who tends to be a superspreader may be more likely to be a superspreader in the next period, inducing a positive correlation in {β_it} over time. If the resulting cascading effects were large, the average effects on the epidemiological outcomes we have found might be overturned. Second, we have exogenously imposed power law distributions without exploring the underlying data-generating mechanisms behind them. The natural next step is to provide a model in which the individual infection rate is endogenously Pareto-distributed.
We believe SIR models with social networks along the lines of Pastor-Satorras and Vespignani ( ), Moreno et al. ( ), Castellano and Pastor-Satorras ( ), May and Lloyd ( ), Zhang et al. ( ), Gutin et al. ( ), and Akbarpour et al. ( ) are a promising avenue for generating an endogenous power law in individual infection rates.

We began with the evidence that SSEs follow a power law distribution with fat tails in many settings, and showed that such distributions substantively change the predictions of SIR models. In this section, we discuss the implications of power law distributions for estimating the effective reproduction number. Estimation of average reproduction numbers (R_t) has been the chief focus of empirical epidemiology research (e.g. Becker and Britton, ). Our estimates across five different data sets suggest that the exponent satisfies α ∈ (1, 2) on many occasions: that is, the infection rates have a finite mean but an infinite variance. Since the mean exists, by the law of large numbers the sample mean estimates used in epidemiology research (see e.g. Nishiura, ) will be consistent (i.e. converge to the true mean asymptotically) and also unbiased (i.e. their expectation equals the true mean in finite samples). Due to the infinite-variance property, however, the sample mean will converge very slowly to the true mean, because the classical CLT requires finite variance. Formally, while the convergence occurs at rate √n for distributions with finite variance (thin tails), it occurs only at rate n^(1−1/α) for power law distributions with fat tails, α ∈ (1, 2) (Gabaix, ).
Under distributions with infinite variance, or fat tails, the sample mean estimates can be far from the true mean at reasonable sample sizes, and their estimated confidence intervals will be too tight. Figure  plots a Monte Carlo simulation of the sample mean's convergence. For thin-tailed distributions such as the negative binomial distribution or the power law distribution with α = , even though the convergence is slow due to their very large variance, they still converge to the true mean reasonably well within a few thousand observations. In contrast, with fat-tailed distributions such as the power law distribution with α = .  or α = . , the sample mean remains far from the true mean. The sample mean estimates behave very differently as the sample size increases: every so often, some extraordinarily high values occur that significantly raise the sample mean and its standard errors, and when such extreme values are not occurring, the sample mean gradually decreases. With thin tails, such extreme values are rare enough not to cause sudden increases in sample means; with fat tails, the extreme values are not so rare. For α = 1 exactly, the convergence occurs at rate ln n.

Notes: Figure  depicts an example of sample mean estimates for thin-tailed and fat-tailed distributions. The draws of observations are simulated through the inverse-CDF method, where the identical uniform random variable is applied so that the sample means are comparable across the four distributions. All distributions are normalized to have a mean of . . The negative binomial (NB) distribution has dispersion parameter k = .  taken from Lloyd-Smith et al. ( ).
The range of power law (PL) parameters is also taken from the empirical estimates.

What methods could address the concern that the sample mean may be empirically unstable? One approach may be to exclude some realizations as outliers and focus on subsamples without extreme values. However, such analysis neglects a major source of risk, even though extreme "outlier" SSEs may fit the power law distributions, as shown in Figure . (In Japan, for instance, the case of over  infections on the cruise ship Diamond Princess was excluded from all other analyses.)

Estimating the mean of distributions with rare but extreme values is notoriously difficult. Consider, for example, a binary distribution of infection rates such that one infects N others with probability 1/N, and  others with probability 1 − 1/N. In this case, the true mean is R_t = 1. Suppose a statistician observes  infected cases for each estimation. If N were , , then with high probability (≈ .  percent) nobody becomes infected, so that R̂_t = 0 and the estimated confidence interval is [0, 0]. In the less likely event that any infection occurs, R̂_t will be far larger than . Thus, the  percent confidence interval contains the true mean less than  percent of the time. To the best of our knowledge, no technique can completely avoid this problem given the fundamental constraint of small sample sizes.

Notes: Figure  plots the estimates of power law exponents and the resulting estimates of the sample median, using the same data as in Figure .
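The convergence experiment described above can be reproduced in spirit as follows; the normalization, seed, and function name are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def pareto_mean_path(alpha, mean_target, n):
    """Running sample mean of n Pareto(alpha) draws rescaled to mean_target.
    For alpha in (1, 2) the variance is infinite and the path is unstable."""
    z_min = mean_target * (alpha - 1.0) / alpha  # E[z] = alpha*z_min/(alpha-1)
    draws = z_min * rng.random(n) ** (-1.0 / alpha)  # inverse-CDF Pareto
    return np.cumsum(draws) / np.arange(1, n + 1)
```

With α well above 2 the path settles near the target within a few thousand draws; with α ≈ 1.2 it keeps jumping whenever an extreme draw arrives, which is the pattern described above.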
Note that while the number of observations contains all observations, the data points contributing to the estimates are only those above the thresholds: fewer than  percent of the data contribute to the estimation of the exponents.

There are approaches that address this formally. With power law distributions, the estimate of the exponent contains information that can improve the estimation of the mean. Figure  shows that the exponent α can be estimated adequately at reasonable sample sizes. If α > , as may be the case for India under strict lockdown, then one can have more confidence in the reliability of the sample mean estimates; if α < , the sample mean may substantially differ from the true mean, and at the least one can be aware of this possibility. One transparent approach is a "plug-in" method: estimate the exponent α̂, and plug it into the formula for the mean, (α̂/(α̂ − 1))z̄, where z̄ is the scale of the power law. This method yields valid confidence intervals (C.I.) for the median, since the estimated α̂ has valid confidence intervals. Figure  shows the estimation results for the same data with α = . , .  as shown in Figure . First, while the sample mean in Figure  substantially underestimated the mean, this estimated median is close to the true mean. Second, while the sample mean estimation imposed symmetry between the lower and upper bounds of the  percent confidence intervals, this estimate reflects the skewness of the uncertainty: upward risks are much higher than downward risks because of the possibility of extreme events. Third, the standard errors are much larger, reflecting the inherent uncertainty given the limited sample sizes. Fourth, the estimates are more stable and robust to extreme values than the sample mean estimates, which jump suddenly after extreme values.

Table  demonstrates the validity of the "plug-in" method through a simulation experiment. The table compares the probability that the constructed  % C.I.
covers the true mean, using ,  Monte Carlo simulations. When the estimate is unbiased and has correct standard errors, this coverage probability is  %. When the power law exponent is close to one, the traditional "sample mean" approach has a C.I. that covers the true mean only  –  % of the time for all sample sizes. By contrast, the "plug-in" method has coverage close to  %. As the tail becomes thinner toward α = , the difference between the two tends to disappear, with the "sample mean" approach sometimes performing better. When the underlying distribution has fat tails, however, estimation using the plug-in method is preferred.

While the C.I. of the plug-in method has adequate coverage probabilities, it is often very large and possibly infinite; Figure  visualizes this. Such a large C.I. occurs especially when α is close to one, because the mean of a power law distribution is proportional to α/(α − 1). How could policymakers plan their efforts given such large uncertainty in R₀? Given the theoretical results in Section  that the epidemiological dynamics will be largely uncertain even when α is perfectly known, we argue that applying the estimated R₀ to a deterministic SIR model will not yield reliable predictions. Instead of focusing on the mean, it will be more adequate and feasible to focus on the distribution of near-future infection outcomes. For example, using the estimated power law distribution, policymakers can compute the distribution of the future infection rate. The following analogy may be useful: in planning for natural disasters such as hurricanes and earthquakes, policymakers do not rely on estimates of average rainfall or average seismic activity; instead, they consider the probabilities of extreme events and propose plans contingent on realizations. Similar planning may be constructive in preparing for future infection outbreaks.
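The plug-in method discussed above can be sketched with the maximum-likelihood (Hill) estimator of the exponent; the function name and the infinite-mean convention for α̂ ≤ 1 are our choices.

```python
import numpy as np

def plug_in_mean(z, z_min):
    """Plug-in estimate of the mean of a Pareto tail: estimate alpha by the
    Hill/maximum-likelihood estimator on observations above z_min, then use
    E[z] = alpha * z_min / (alpha - 1). Returns (alpha_hat, mean_hat);
    mean_hat is infinite when alpha_hat <= 1."""
    z = np.asarray([v for v in z if v >= z_min], dtype=float)
    alpha_hat = len(z) / np.log(z / z_min).sum()  # Hill estimator
    mean_hat = alpha_hat / (alpha_hat - 1.0) * z_min if alpha_hat > 1 else np.inf
    return alpha_hat, mean_hat
```

Because α̂ has standard asymptotics, a confidence interval for α̂ translates into a (possibly one-sided infinite) interval for the mean, which is exactly the skewed-uncertainty behavior described above.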
to overcome data limitations, epidemiologists have developed a number of sophisticated methods, such as backcalculation assuming a poisson distribution (becker et al., ), and ways to account for imported cases. (when the number of observations is less than , the estimated confidence interval of α contains values less than . , turning the upper bound of the mean into ∞. this does not mean that a correct expectation is ∞ infections in the near future, but that there are serious upward risks in infection rates. this is because the estimation through the log-likelihood takes the log of the realized value, instead of its level.) table : coverage probability of % confidence interval. note: the table reports the probability that the % confidence interval, constructed in two different ways, covers the true value in simulation; columns correspond to α = . , . , . . "sample means" simply uses the sample mean. "using power laws" first estimates the pareto exponent using the maximum likelihood, and then converts it to the mean estimate. there are also a number of methods developed to account for fat-tailed distributions (see e.g. stoyanov et al., , for a survey), such as tail tempering (kim et al., ) and separating the data into sub-groups (toda and walsh, ). in the future, it will be important to examine what power law distributions imply about existing epidemiological methods, and how statistical techniques such as plug-in methods can be combined with epidemiological techniques to allow more reliable estimation of risks. most research on infection dynamics has focused on deterministic sir models, and has estimated their key statistic, the average reproduction number (r ). in contrast, some researchers have concentrated on sses, and estimated the dispersion of infection rates using negative binomial distributions.
nonetheless, stochastic sir models based on estimated distributions have predicted that idiosyncratic uncertainties in sses would vanish when the infected population is large, and thus, that the epidemiological dynamics would be largely predictable. in this paper, we have documented evidence from sars, mers, and covid- that sses actually follow a power law distribution with exponent α ∈ ( , ): that is, their distributions have infinite variance rather than the finite variance implied by negative binomial distributions. our stochastic sir model with these fat-tailed distributions has shown that idiosyncratic uncertainties in sses will persist even when the infected population is as large as , , inducing major unpredictability in aggregate infection dynamics. since the currently infected population is estimated to be around million in the covid- pandemic, our analysis has immediate implications for the policies of today. for statistical inference, the aggregate unpredictability suggests that caution is warranted in drawing inferences about underlying epidemiological conditions from observed infection outcomes. first, large geographic variations in infections may be driven mostly by idiosyncratic factors, and not by fundamental socioeconomic factors. while many have looked for underlying differences in public health practices to explain the variations, our model shows that these variations may be more adequately explained by the presence of a few idiosyncratic sses. second, existing stochastic models would suggest that, keeping the distribution of infection rates and pathological environments constant, recent infection trends can predict the future well. in contrast, our analysis shows that even when the average number of new infections may seem to have stabilized at a low level in recent weeks, subsequent waves can suddenly arrive in the future.
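the persistence of aggregate fluctuations under infinite variance can be illustrated with a small simulation. this is a stylized sketch, not the paper's calibration: the exponent α = 1.5, the population of 10,000, the number of epochs, and the thin-tailed exponential benchmark with matched mean are all our own illustrative choices:

```python
import random
import math

def relative_sd(draw, n, reps, rng):
    # coefficient of variation of the per-capita infection total,
    # (1/n) * sum of n individual infection rates, across many epochs
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    m = sum(means) / reps
    var = sum((x - m) ** 2 for x in means) / (reps - 1)
    return math.sqrt(var) / m

rng = random.Random(42)
alpha = 1.5                        # fat tail: infinite variance since alpha < 2
mean_beta = alpha / (alpha - 1.0)  # pareto mean = 3 for alpha = 1.5
pareto = lambda r: r.random() ** (-1.0 / alpha)
expo = lambda r: r.expovariate(1.0 / mean_beta)  # thin tail, same mean

cv_fat = relative_sd(pareto, n=10_000, reps=200, rng=rng)
cv_thin = relative_sd(expo, n=10_000, reps=200, rng=rng)
```

with a thin-tailed distribution the per-capita total concentrates rapidly (relative dispersion of order n^(-1/2)), whereas the fat-tailed draws leave a much larger relative dispersion at the same population size, consistent with the slow n^((1-α)/α) decay of stable-law fluctuations.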
such uncertainties in outbreak timing and magnitude introduce substantial socioeconomic difficulties, and measures to assess and mitigate such risks will be invaluable. because the death rate is shown to increase when medical capacity binds, the social cost of infection is a convex function of the infection rate. in this sense, reducing uncertainties has social benefits. furthermore, uncertainties can severely deter necessary investments and impede planning for reallocation and recovery from the pandemic shocks. to assess such risks, we can estimate the tail distributions to improve our inference on the average number. to address such risks, social distancing policies and individual efforts can focus on large physical gatherings in infection-prone environments. our estimates suggest that, like earthquakes, infection dynamics will be largely unpredictable. but unlike earthquakes, they are a consequence of social decisions, and efforts to reduce sses can significantly mitigate the uncertainty the society faces as a whole. according to worldometers.info, the cumulative infection count worldwide is million, among which million have already recovered or died, as of june , . all rights reserved. no reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted june , . https://doi.org/ . / . . . doi: medrxiv preprint a restaurant and infects two other people at time t, then goes to a shopping mall and infects three other people at time t + , and then goes to meet her two friends and infects them, and so on. however, this interpretation is inconsistent with numerous anecdotes. instead, a superspreader infects many people because he attends an sse in an infection-prone environment at a particular time t.
conferences, parties, religious gatherings, and sports gyms are particular places that can infect many at the same time. moreover, the nishiura et al. ( ) paper, whose data we use, has identified particular environments that have caused sses. this interpretation is important because, if the extremely high cumulative numbers of infection were due to some individuals staying infectious for a long time, or to some events having an extremely high number of primary cases, then our model's prediction of sudden outbreaks due to sses would no longer be valid. second, we may be concerned that the exponent of Φ(z_i) may differ from the exponent of f(β_iτ), even if both have tails that follow power laws. we use two steps to show that this is not a concern: (i) if a random variable has a power law distribution with exponent α, then its weighted sum also has a tail distribution that follows a power law with exponent α (see e.g. jessen and mikosch ( ) or gabaix ( )). thus, neither summation over multiple periods nor the weights s_τ/n will change this. (ii) the tail property of a distribution can be examined by considering the ratio α_f(z) = f(z)/f(cz) for some c ≠ 1 and taking its limit as z → ∞. in particular, if f has a power law distribution with exponent α, then α_f(z) → c^α. denoting the probability mass of g_t(·) by g_t(·), and the normalizing constant of each t by a_t, the exponent of Φ(z_i) will thus be identical to the exponent of f(β_iτ) asymptotically. this discussion suggests that, whenever possible, it is desirable to take the estimates from the tail end of the distribution instead of using moderate values of z. for the worldwide covid- data, the distributions are estimated from the very extreme tail. but when the sample size of sses is limited, the choice of how many observations to include faces a bias-variance trade-off.
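the ratio test in step (ii) can be checked directly for an exact power law. the following minimal check (with illustrative α = 1.5 and c = 2) contrasts a pareto counter-cdf, for which the ratio is constant in z and equals c^α, with an exponential one, for which the ratio diverges and hence no power law tail index exists:

```python
import math

alpha, c = 1.5, 2.0

# pareto counter-cdf f(z) = z^(-alpha) for z >= 1: the ratio f(z)/f(cz)
# is independent of z and equals c**alpha
pareto_ccdf = lambda z: z ** (-alpha)
pareto_ratios = [pareto_ccdf(z) / pareto_ccdf(c * z) for z in (10.0, 100.0, 1000.0)]

# exponential counter-cdf f(z) = exp(-z): the same ratio exp((c-1)z)
# grows without bound as z increases
expo_ccdf = lambda z: math.exp(-z)
expo_ratios = [expo_ccdf(z) / expo_ccdf(c * z) for z in (10.0, 20.0, 30.0)]
```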
nonetheless, as many statistical theories are based on asymptotic results, these arguments show that it is theoretically founded to interpret the exponent of Φ(z_i) as the exponent of f(β_iτ), at least given the data available. we present several robustness checks on our empirical results. in figure , we truncated the cluster size from below at . figure a instead shows results with a cut-off of . the fit is worse at the lower tail of the distribution, which suggests that the lower tail may not be approximated by a power law distribution. this is a common feature among many examples. however, since what matters for the existence of the variance is the upper tail of the distribution, we do not think this is a concern. moreover, given that the data partly come from media reports, the clusters of small sizes likely suffer from omission due to lack of media coverage. gabaix and ibragimov ( ) show that this estimate is biased in small samples and propose a simple bias correction method that replaces the dependent variable with ln(rank − 1/2). [figure: number of total cases per cluster (in log); source: cmmid covid- working group online database (leclerc et al., )] table a summarizes two robustness check exercises for the power law exponent (α). panel a: bias-corrected estimates take log(rank − 1/2) as the dependent variable; this is the small-sample bias correction proposed by gabaix and ibragimov ( ). heteroskedasticity-robust standard errors are reported in parentheses.
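the rank − 1/2 regression can be sketched as follows. this is a minimal illustration on synthetic pareto data; the sample size, seed, and α = 1.5 are our own choices, not the paper's estimation code:

```python
import random
import math

def rank_half_alpha(sample):
    # gabaix-ibragimov estimator: ols of ln(rank - 1/2) on ln(size) over
    # the descending-sorted sample; the negative slope estimates alpha
    xs = sorted(sample, reverse=True)
    lx = [math.log(x) for x in xs]
    ly = [math.log(rank - 0.5) for rank in range(1, len(xs) + 1)]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    alpha = -cov / var
    se = alpha * math.sqrt(2.0 / len(xs))  # asymptotic s.e. alpha*(2/n)^(1/2)
    return alpha, se

rng = random.Random(7)
sample = [rng.random() ** (-1.0 / 1.5) for _ in range(2000)]  # pareto, alpha=1.5
alpha_hat, se_hat = rank_half_alpha(sample)
```

the rank − 1/2 shift removes the leading small-sample bias of the plain log-rank regression, and the reported asymptotic standard error is α̂(2/n)^(1/2).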
panel b presents the maximum likelihood estimates; standard errors are reported in parentheses. in both panels, log(lr) denotes likelihood ratios, expressed in log with base , of the probability of observing the realized data with power law distributions relative to that with the estimated negative binomial distributions. columns ( )-( ) report estimates for covid- ; columns ( )-( ) for sars; and column ( ) for mers. the derivative is d(i_max/n)/dr = (r )^(−2) log(r s ), which is negative if and only if r > s exp( . ). table b shows the simulation results with lockdown policies targeted at sses. φ = corresponds to no policy; φ = . (φ = . ) means that the government can prevent sses with % ( %) probability. as already discussed in the main text, when the distribution is fat-tailed, the targeted policy is effective in reducing not only the mean of the peak infection rate, but also its volatility (the interval between the th and th percentiles). [table: power law (α = . , . , . , . , ) versus negative binomial]
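the effect of a policy that prevents sses with probability φ can be sketched with a toy simulation. all numbers below (α = 1.2, 500 events per epoch, a cap of 10, φ = 0.9, the seed) are illustrative stand-ins, not the values behind table b:

```python
import random

def total_infections(n_events, alpha, phi, cap, rng):
    # sum of per-event infection counts; each event is pareto(alpha)
    # distributed, and with probability phi the policy prevents any sse
    # (a draw exceeding the cap), truncating it to the cap
    total = 0.0
    for _ in range(n_events):
        z = rng.random() ** (-1.0 / alpha)
        if z > cap and rng.random() < phi:
            z = cap
        total += z
    return total

def outcome_stats(phi, alpha=1.2, n_events=500, reps=400, cap=10.0, seed=11):
    rng = random.Random(seed)
    totals = sorted(total_infections(n_events, alpha, phi, cap, rng)
                    for _ in range(reps))
    mean = sum(totals) / reps
    spread = totals[9 * reps // 10] - totals[reps // 10]  # 10th-90th pct range
    return mean, spread

mean_no, spread_no = outcome_stats(phi=0.0)   # no policy
mean_pol, spread_pol = outcome_stats(phi=0.9)  # sses prevented 90% of the time
```

because the fat tail contributes both to the mean and to the dispersion of the totals, truncating the rare large events lowers the mean and the 10th-90th percentile range at once, matching the qualitative claim in the text.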
references:
- the transmissibility of novel coronavirus in the early stages of the - outbreak in wuhan: exploring initial point-source exposure sizes and durations using scenario analysis
- the disease-induced herd immunity level for covid- is substantially lower than the classical herd immunity level
- an economic model of the covid- epidemic: the importance of testing and age-specific policies
- thresholds for epidemic spreading in networks
- pareto rules for malaria super-spreaders and super-spreading
- age-dependent effects in the transmission and control of covid- epidemics
- estimating the overdispersion in covid- transmission using outbreak sizes outside china
- identifying and interrupting superspreading events: implications for control of severe acute respiratory syndrome coronavirus
- the granular origins of aggregate fluctuations
- rank − 1/2: a simple way to improve the ols estimation of tail exponents
- dimensions of superspreading
- assessment of the status of measles elimination from reported outbreaks: united states
- health versus wealth: on the distributional effects of controlling a pandemic
- cost-benefit analysis of age-specific deconfinement strategies
- seismicity of the earth and associated phenomena
- the effect of social distancing on the reach of an epidemic in social networks
- exact analytical solutions of the susceptible-infected-recovered (sir) epidemic model and of the sir model with equal death and birth rates
- beyond r : heterogeneity in secondary infections and probabilistic epidemic forecasting
- severe acute respiratory syndrome (sars) in singapore: clinical features of index patient and initial contacts
- regularly varying functions
- analysis of covid- infection spread in japan based on stochastic transition model
- deterministic and stochastic epidemics in closed populations
- a contribution to the mathematical theory of epidemics
- financial market models with lévy processes and time-varying volatility
- statistical size distributions in economics and actuarial sciences
- the role of superspreading in middle east respiratory syndrome coronavirus (mers-cov) transmission
- what settings have been linked to sars-cov- transmission clusters?
- the reproductive number of covid- is higher compared to sars coronavirus
- superspreading and the effect of individual variation on disease emergence
- infection dynamics on scale-free networks
- lockdowns in sir models
- epidemic outbreaks in complex heterogeneous networks
- power laws, pareto distributions and zipf's law
- time variations in the transmissibility of pandemic influenza in prussia
- assessing the transmission dynamics of measles in japan
- closed environments facilitate secondary transmission of coronavirus disease
- epidemic spreading in scale-free networks
- clinical microbiology and infection: the official publication of the european society of clinical microbiology and infectious diseases
- sequential lifting of covid- interventions with population heterogeneity
- covid- and italy: what next
- a simple stochastic sir model for covid infection dynamics for karnataka: learning from europe
- fat-tailed models for risk estimation
- susceptible-infected-recovered (sir) dynamics of covid- and economic impact
- the double power law in consumption and implications for testing euler equations
- phase-adjusted estimation of the number of coronavirus disease
- a stochastic sir epidemic on scale-free network with community structure
- human behavior and the principle of least effort

a. relating empirical distribution of z to theoretical distribution of β_it. in this paper, we have used the estimates from the data to simulate the evolution dynamics of the epidemiological model.
the key step in our argument is that the tail distribution of ∑_i z_it or ∑_t z_it, the cumulative "effective" number of infections, is equivalent to the tail distribution of β_it, the individual and per-period "basic" number of infections. however, in general, this need not hold: for example, even if β_it were normally distributed (i.e. thin-tailed), z may follow a t-distribution (i.e. fat-tailed). under what conditions is our interpretation of the relationship between the distribution of z and the distribution of β_it valid? is it plausible in the settings of the coronaviruses? to clarify this question, let us lay out a model. formally, z is a mixture distribution of weighted sums of β_it. here, we provide notations for ∑_t z_it, but the identical argument will also apply to ∑_i z_it. specifically, suppose i stays infected for t periods, and let the probability mass be δ_t; in the case of exponential decay as in the sir model, δ_t = γ^t. denoting the countercumulative distribution of z_i by Φ, and that of β_it by f, we can express Φ as such a mixture. first, we may be concerned that, even if Φ is a power law distribution, f may not be a power law distribution. a counterexample is that a geometric brownian motion with a stochastic stopping time that follows an exponential distribution can also generate power law distributions in the tail (beare and toda, ). that is, the tail property of Φ need not be due to the tails of f: for ∑_t z_it, it could also be due to some individuals staying infectious for extremely long periods; for ∑_i z_it, it could also be due to some events having an extremely high number of infected primary cases. while we acknowledge such possibilities, we argue that for superspreaders or sses of the coronaviruses, the main mechanism behind extremely high cumulative infection numbers is some extreme event at a particular time t. let us be concrete.
if the counterexample's reasoning were true for ∑_t z_it, then a superspreader is someone who goes, for example, to

b. theory appendix. we show that s_∞ is a concave function of r . recall that s_∞ is a solution to the final-size relation. by the implicit function theorem, ds_∞/dr = −s_∞( − s_∞)/( − r s_∞) < , because s_∞ < /r . applying the implicit function theorem again, it remains to show that − /s_∞ − r /( − r s_∞) < . we can rewrite this as a condition f(s_∞) > . note that f(·) is minimized at s*_∞ = r . evaluating the minimum value, f(s_∞) > for all s_∞ if and only if r > . this implies that when r > , s_∞ is a concave function of r .

b. proof that i_max is concave in r if and only if r > s exp( . ). recall that the peak infection rate is given by i_max/n = s + i − (1 + log(r s ))/r .

key: cord- - i kg mj authors: mukhamadiarov, ruslan i.; deng, shengfeng; serrao, shannon r.; priyanka; nandi, riya; yao, louie hong; tauber, uwe c. title: social distancing and epidemic resurgence in agent-based susceptible-infectious-recovered models date: - - journal: nan doi: nan sha: doc_id: cord_uid: i kg mj

once an epidemic outbreak has been effectively contained through non-pharmaceutical interventions, a safe protocol is required for the subsequent release of social distancing restrictions to prevent a disastrous resurgence of the infection. we report individual-based numerical simulations of stochastic susceptible-infectious-recovered model variants on four distinct spatially organized lattice and network architectures wherein contact and mobility constraints are implemented. we robustly find that the intensity and spatial spread of the epidemic recurrence wave can be limited to a manageable extent, provided the release of these restrictions is delayed sufficiently (for a duration of at least thrice the time until the peak of the unmitigated outbreak) and long-distance connections are maintained at a low level (limited to less than five percent of the overall connectivity). the covid- pandemic constitutes a severe global health crisis.
many countries have implemented stringent non-pharmaceutical control measures that involve social distancing and mobility reduction in their populations. this has led to remarkably successful deceleration and significant "flattening of the curve" of the infection outbreaks, albeit at tremendous economic and financial costs. at this point, societies are in dire need of designing a secure (partial) exit strategy wherein the inevitable recurrence of the infection among the significant non-immune fraction of the population can be thoroughly monitored with sufficient spatial resolution and reliable statistics, provided that dependable, frequent, and widespread virus testing capabilities are accessible and implemented. until an effective and safe vaccine is widely available, this would ideally allow the localized implementation of rigorous targeted disease control mechanisms that demonstrably protect people's health while the paralyzed branches of the economy are slowly rebooted. mathematical analysis and numerical simulations of infection spreading in generic epidemic models are crucial for testing the efficacy of proposed mitigation measures, and the timing and pace of their gradual secure removal. specifically, the employed mathematical models need to be (i) stochastic in nature in order to adequately account for randomly occurring or enforced disease extinction in small isolated communities, as well as for rare catastrophic infection boosts and (ii) spatially resolved such that they properly capture the significant emerging correlations among the susceptible and immune subpopulations. these distinguishing features are notably complementary to the more detailed and comprehensive computer models utilized by researchers at the university of washington, imperial college london, the virginia bioinformatics institute, and others: see, e.g., ( ) ( ) ( ) ( ) ( ) ( ) . 
we report a series of detailed individual-based kinetic monte carlo computer simulation studies for stochastic variants ( , ) of the paradigmatic susceptible-infectious-recovered (sir) model ( , ) for a community of about , individuals. to determine the robustness of our results and compare the influence of different contact characteristics, we ran our stochastic model on four distinct spatially structured architectures, namely i) regular two-dimensional square lattices, wherein individuals move slowly and with limited range, i.e., spread diffusively; ii) two-dimensional small-world networks that in addition incorporate substantial long-distance interactions and contaminations; and finally on iii) random as well as iv) scale-free social contact networks. for each setup, we investigated epidemic outbreaks with model parameters informed by the known covid- data ( ). to allow for a direct comparison, we extracted the corresponding effective infection and recovery rates by fitting the peak height and the half-peak width of the infection growth curves with the associated classical deterministic sir rate equations that pertain to a well-mixed setting. we designed appropriate implementations of social distancing and contact reduction measures on each architecture by limiting or removing connections between individuals. this approach allowed us to generically assess the efficacy of non-pharmaceutical control measures. although each architecture entails varied implementations of social distancing measures, we find that they all robustly reproduce both the resulting reduced outbreak intensity and growth speed. as anticipated, a dramatic resurgence of the epidemic occurs when mobility and contact restrictions are released too early. yet if stringent and sufficiently long-lasting social distancing measures are imposed, the disease may go extinct in the majority of isolated small population groups.
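the fitting above uses the classical deterministic sir rate equations. for reference, a minimal forward-euler integration (with illustrative rates β = 0.4 and γ = 0.1, i.e. r₀ = β/γ = 4, not the paper's fitted values) reproduces the textbook peak height i_max ≈ 1 − (1 + ln r₀)/r₀:

```python
def sir_rate_equations(beta, gamma, i0=1e-3, dt=0.005, t_max=120.0):
    # forward-euler integration of the well-mixed sir rate equations,
    # ds/dt = -beta*s*i, di/dt = beta*s*i - gamma*i, dr/dt = gamma*i,
    # with s, i, r expressed as population fractions
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(int(t_max / dt)):
        new_inf = beta * s * i * dt
        rec = gamma * i * dt
        s -= new_inf
        i += new_inf - rec
        r += rec
        peak = max(peak, i)
    return peak, r

beta, gamma = 0.4, 0.1            # illustrative: r0 = 4
peak, r_inf = sir_rate_equations(beta, gamma)
```

for r₀ = 4 the analytical peak is 1 − (1 + ln 4)/4 ≈ 0.4034, and the final recovered fraction solves the final-size relation 1 − r_∞ = exp(−r₀ r_∞), giving r_∞ ≈ 0.98.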
in our spatially extended lattice systems, disease spreading then becomes confined to the perimeters of a few larger outbreak regions, where it can be effectively localized and specifically targeted. for the small-world network architecture, it is however imperative that all long-range connections remain curtailed to a very low percentage for the control measures to remain effective. intriguingly, we observe that an infection outbreak spreading through a static scale-free network effectively randomizes its connectivity for the remaining susceptible nodes, whence the second wave encounters a very different structure. in the following sections, we briefly describe the methodology and algorithmic implementations as well as pertinent simulation results for each spatial or network structure; additional details are provided in the supplementary materials. we conclude with a comparison of our findings and a summary of their implications. our first architecture is a regular two-dimensional square lattice with linear extension L = , subject to periodic boundary conditions (i.e., on a torus). initially, N = S(0) + I(0) + R(0) = , individuals with fixed density ρ = N/L² ≈ . are randomly placed on the lattice, with at most one individual allowed on each site. almost the entire population begins in the susceptible state S(0); we start with only . % infected individuals, I(0) = , and no recovered (immune) ones, R(0) = 0. all individuals may then move to neighboring empty lattice sites with diffusion rate d (here we set this hopping probability to ). upon their encounter, infectious individuals irreversibly change the state of neighboring susceptible ones with set rate r: S + I → I + I. any infected individual spontaneously recovers to an immune state with fixed rate a: I → R. (details of the simulation algorithm are presented in the supplementary materials.) for the recovery rate, we choose 1/a ≅ . days (set to .
monte carlo steps, mcs) informed by known covid- characteristics ( ). to determine the infection rate r, we run simulations for various values, fit the peak height and width of the ensuing epidemic curves with the corresponding sir rate equations to extract the associated basic reproduction ratio R₀ (as explained in the supplementary materials, see figure s ), and finally select that value of r for our individual-based monte carlo simulations that reproduces the R₀ ≈ . for covid- ( ). we perform independent simulation runs with these reaction rates, from which we obtain the averaged time tracks for I(t) and R(t), while of course S(t) = N − I(t) − R(t) and R(t) = a ∫₀ᵗ I(t′) dt′. the standard classical sir deterministic rate equations assume a well-mixed population and constitute a mean-field type of approximation wherein stochastic fluctuations and spatial as well as temporal correlations are neglected; see, e.g., ( , ). near the peak of the epidemic outbreak, when many individuals are infected, this description is usually adequate, albeit with coarse-grained 'renormalized' rate parameters that effectively incorporate fluctuation effects at short time and small length scales. however, the mean-field rate equations are qualitatively insufficient when the infectious fraction I(t)/N is small, whence both random number fluctuations and the underlying discreteness and associated internal demographic noise become crucial ( ) ( ) ( ). already near the epidemic threshold, which constitutes a continuous dynamical phase transition far from thermal equilibrium, c.f. figure s in the supplementary materials, the kinetics is dominated by strong critical-point fluctuations. these are reflected in characteristic initial power laws rather than simple exponential growth of the I(t) and R(t) curves ( ), as demonstrated in figure s (supplemental information).
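the lattice dynamics described above (diffusive hopping to empty neighbor sites, contact infection S + I → I + I with probability r per neighbor, and spontaneous recovery I → R with probability a) can be sketched in a few dozen lines. the lattice size, density, seed count, and rates below are illustrative placeholders, not the paper's values:

```python
import random

EMPTY, S, I, R = 0, 1, 2, 3
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def sweep(lat, L, r, a, d, rng):
    # one monte carlo sweep: random-sequential update of all individuals
    sites = [(x, y) for x in range(L) for y in range(L) if lat[x][y] != EMPTY]
    rng.shuffle(sites)
    for x, y in sites:
        if lat[x][y] == EMPTY:     # defensive guard against stale positions
            continue
        if rng.random() < d:       # diffusive hop to a random empty neighbor
            dx, dy = rng.choice(MOVES)
            nx, ny = (x + dx) % L, (y + dy) % L
            if lat[nx][ny] == EMPTY:
                lat[nx][ny], lat[x][y] = lat[x][y], EMPTY
                x, y = nx, ny
        if lat[x][y] == I:
            for dx, dy in MOVES:   # infect susceptible nearest neighbors
                nx, ny = (x + dx) % L, (y + dy) % L
                if lat[nx][ny] == S and rng.random() < r:
                    lat[nx][ny] = I
            if rng.random() < a:   # spontaneous recovery
                lat[x][y] = R

def simulate(L=50, rho=0.5, i0=5, r=0.4, a=0.1, d=1.0, sweeps=200, seed=3):
    rng = random.Random(seed)
    lat = [[EMPTY] * L for _ in range(L)]
    cells = rng.sample([(x, y) for x in range(L) for y in range(L)],
                       int(rho * L * L))
    for x, y in cells:
        lat[x][y] = S
    for x, y in rng.sample(cells, i0):   # seed a few infectious individuals
        lat[x][y] = I
    history = []
    for _ in range(sweeps):
        sweep(lat, L, r, a, d, rng)
        flat = [c for row in lat for c in row]
        history.append((flat.count(S), flat.count(I), flat.count(R)))
    return history

history = simulate()
```

the recorded history gives the S(t), I(t), R(t) counts per sweep; averaging such tracks over many independent runs yields the epidemic curves discussed in the text.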
nor can the deterministic rate equations capture stochastic disease extinction events that may occur at random in regions where the infectious concentration has reached small values locally. the rate equations may be understood to pertain to a static and fully connected network; in contrast, the spreading dynamics on a spatial setting continually rewires any infectious links, keeping the epidemic active ( , ). consequently, once the epidemic outbreak threshold is exceeded, the sir rate equations markedly underestimate the time-integrated outbreak extent reflected in the ultimate saturation level R∞ = R(t → ∞), as is apparent in the comparison figure s (supplemental information). once the instantaneous fraction of the population has reached the threshold %, I(t) = . N, we initiate stringent social distancing that we implement through a strong repulsive interaction between any occupied lattice sites (with n_i = 1), irrespective of their states S, I, or R, and correspondingly an attractive force between filled and empty (n_i = 0) sites, namely the ising lattice gas potential energy V({n_i}) = K Σ_<i,j> (n_i − )(n_j − ) with dimensionless strength K = , where the sum extends only over nearest-neighbor pairs on the square lattice. the transfer of any individual from an occupied to an adjacent empty site is subsequently determined through the ensuing energy change ΔV by the metropolis transition probability w = min{1, exp(−ΔV)} ( , ), which replaces the unmitigated hopping rate d. as a result, both the mobility as well as any direct contact between individuals on the lattice are quickly and drastically reduced. for sufficiently small total density ρ = N/L², most of the individuals eventually become completely isolated from each other. for our ρ = . , the disease will continue to spread for a short period, until the repulsive potential has induced sufficient spatial anti-correlations between the susceptible individuals.
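the metropolis step can be sketched as follows. for simplicity we use a purely repulsive lattice-gas energy V = K Σ_<ij> n_i n_j, which for hops at fixed particle number on a regular lattice differs from the ising lattice-gas form above only by constants; K = 2 and the lattice size are illustrative choices:

```python
import math

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def hop_acceptance(occ, L, src, dst, K=2.0):
    # metropolis probability w = min{1, exp(-dV)} for moving an individual
    # from occupied site src to empty site dst, with repulsive energy K
    # per pair of occupied nearest-neighbor sites
    def filled_neighbors(site, exclude=None):
        x, y = site
        count = 0
        for dx, dy in MOVES:
            p = ((x + dx) % L, (y + dy) % L)
            if p != exclude and occ[p]:
                count += 1
        return count
    # after the move, src is vacated, so it must not count at dst
    dV = K * (filled_neighbors(dst, exclude=src) - filled_neighbors(src))
    return min(1.0, math.exp(-dV))

# tiny demonstration grid: moving away from a crowded source is always
# accepted, moving into a crowd is exponentially suppressed
L = 6
occ = {(x, y): 0 for x in range(L) for y in range(L)}
occ[(0, 0)] = 1                 # the individual attempting to move
occ[(0, 1)] = 1                 # one crowding neighbor at the source
p_away = hop_acceptance(occ, L, src=(0, 0), dst=(3, 3))   # into open space
occ[(3, 4)] = occ[(4, 3)] = occ[(2, 3)] = 1
p_crowd = hop_acceptance(occ, L, src=(0, 0), dst=(3, 3))  # into a crowd
```

with K = 2, the crowded destination here has three occupied neighbors against one at the source, so dV = 4 and the move is accepted with probability exp(−4), while the move into empty space lowers the energy and is accepted with probability one.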
the social-distancing interaction is sustained for a time duration T, and then switched off again. with increasing mitigation duration T, the likelihood for the disease to locally go extinct in isolated population clusters grows markedly. as seen in the bottom row, the prevalence and spreading of the infection thus become confined to the perimeters of a mere few remaining centers. hence we observe drastically improved mitigation effects for extended T: as shown in figure , the resurgence peak in the I(t) curve assumes markedly lower values and is reached after much longer times. in fact, the time τ(T) for the infection outbreak to reach its second maximum increases exponentially with the social-distancing duration, as evidenced in the inset of figure (see also figure below). we emphasize that localized disease extinction and spatial confinement of the prevailing disease clusters represent correlation effects that cannot be captured in the sir mean-field rate equation description. in modern human societies, individuals as well as communities feature long-distance connections that represent 'express' routes for infectious disease spreading, in addition to short-range links with their immediate neighbors. to represent this situation, we extend our regular lattice with diffusive propagation to a two-dimensional newman-watts small-world network ( ), which was previously applied to the study of plant disease proliferation ( ). in contrast to the watts-strogatz model ( ), in which the small-world property is generated through rewiring bonds of a one-dimensional chain of sites, a newman-watts small-world network may be constructed as follows: for each nearest-neighbor bond, a long-distance link (or 'short-cut') is added with probability φ between randomly chosen pairs of vertices. as illustrated in figure s (supplemental information), the resulting network features φL² long-distance links, with mean coordination number ⟨k⟩ = ( + φ).
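the newman-watts construction described above can be sketched directly. in this sketch the base square lattice has coordination number four, so the expected mean degree works out to 4(1 + φ) up to small fluctuations; the lattice size, φ, and seed are illustrative:

```python
import random

def newman_watts_2d(L, phi, seed=0):
    # build the periodic square lattice, then for every nearest-neighbor
    # bond add, with probability phi, a short-cut between two random nodes
    rng = random.Random(seed)
    n = L * L
    idx = lambda x, y: (x % L) * L + (y % L)
    adj = {v: set() for v in range(n)}
    bonds = []
    for x in range(L):
        for y in range(L):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                u, v = idx(x, y), idx(nx, ny)
                adj[u].add(v)
                adj[v].add(u)
                bonds.append((u, v))
    shortcuts = 0
    for _ in bonds:
        if rng.random() < phi:
            u, v = rng.sample(range(n), 2)
            if v not in adj[u]:        # skip duplicates of existing links
                adj[u].add(v)
                adj[v].add(u)
                shortcuts += 1
    return adj, shortcuts

adj, shortcuts = newman_watts_2d(L=30, phi=0.05, seed=4)
mean_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
```

the 2L² lattice bonds spawn roughly φ·2L² short-cuts, each adding two link ends, which is the source of the φ-dependent correction to the mean coordination number.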
again, each vertex may be in either of the states S, I, R, or empty, and each individual can hop to another site along any (nearest-neighbor or long-distance) link with a total diffusivity d. a typical snapshot of the sir model on this small-world architecture is shown in figure s (supplemental information). the unmitigated simulation parameters are: L = , N = , , I(0) = , d = , and φ = . . the presence of long-range links increases the mean connectivity, rendering the population more mixed, which in turn significantly facilitates epidemic outbreaks (see figure s in the supplemental information). we remark that for the sir dynamics, the newman-watts small-world network effectively interpolates between a regular two-dimensional lattice and a scale-free network dominated by massively connected hubs; moreover, as the hopping probability d → 0, the small-world network is effectively rendered static. in the two-dimensional small-world network, we may introduce social-distancing measures through two distinct means: i) we can globally diminish mobility by adopting a reduced overall diffusivity d′ < ; and/or ii) we can drastically reduce the probability of utilizing a long-distance connection to d_φ ≪ . we have found that the latter mitigation strategy of curtailing the infection short-cuts into distant regions has a far superior effect. therefore, in figure we display the resulting data for such a scenario where we set d_φ = . , yet kept the diffusivity unaltered at d = ; as before, this control was triggered once I(t) = . N had been reached in the course of the epidemic. the resurgence peak height and growth rate become even more stringently reduced with extended mitigation duration than for the (distinct) social distancing measures implemented on the regular lattice.
each network node may be in either of the S, I, or R configurations, which are subject to the sir reaction rules, but we do not allow movement among the network vertices. for the random network, we uniformly distribute , , edges among N = , nodes; this yields a poisson distribution for the connectivity with preset mean (equal to the variance) ⟨k⟩ = (Δk)² = . for the scale-free network, we employ the barabasi-albert graph construction ( ) , where each new node is added successively with k = edges, yielding a total of , edges. the connectivity properties of these quite distinct architectures are vastly different, since the scale-free networks feature prominent `hubs' through which many other nodes are linked. in the epidemic context, these hubs represent super-spreader centers through which a large fraction of the population may become infected ( , ) . to implement the stochastic sir dynamics on either contact network, we employ the efficient rejection-free gillespie dynamical monte carlo algorithm: each reaction occurs successively, and the time interval between subsequent events is drawn from the associated probability distribution ( ) (for details, see the supplemental information). the random social network may be considered an emulation of the well-connected mean-field model. indeed, we obtain excellent agreement for the temporal evolution of the sir kinetics in these two systems with a = . per mcs (for the scale-free network, a small adjustment to an effective mean-field recovery rate a ≈ . per mcs is required). we implement a `complete lockdown' mitigation strategy: once the threshold I(t) = . N has been reached, we immediately cut all links for a subsequent duration T; during that time interval, only spontaneous recovery I → R can occur. in figure , we discern a markedly stronger impact of this lockdown on the intensity of the epidemic resurgence in both these static contact network architectures; see also figure a below.
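The rejection-free Gillespie scheme used for the static contact networks can be sketched as follows. This is a hedged illustration assuming the network is given as an adjacency mapping; the rates and seed nodes are placeholders, not the paper's parameters.

```python
import math
import random

def gillespie_sir(adj, infection_rate, recovery_rate, initial_infected, seed=0):
    """Rejection-free Gillespie (dynamical Monte Carlo) SIR on a static
    contact network. adj maps node -> set of neighbors. Each I node
    recovers at recovery_rate; each S node is infected at
    infection_rate times its current number of active S-I links."""
    rng = random.Random(seed)
    state = {v: "S" for v in adj}
    for v in initial_infected:
        state[v] = "I"
    t, history = 0.0, []
    while True:
        # enumerate all possible events and their rates
        events, rates = [], []
        for v, s in state.items():
            if s == "I":
                events.append(("recover", v))
                rates.append(recovery_rate)
            elif s == "S":
                k_active = sum(1 for u in adj[v] if state[u] == "I")
                if k_active:
                    events.append(("infect", v))
                    rates.append(infection_rate * k_active)
        r_tot = sum(rates)
        if r_tot == 0.0:          # no infectious individuals left
            return state, history
        # waiting time drawn from an exponential with mean 1 / r_tot
        t += -math.log(1.0 - rng.random()) / r_tot
        # choose one event with probability proportional to its rate
        x, acc = rng.random() * r_tot, 0.0
        for (kind, v), rate in zip(events, rates):
            acc += rate
            if x < acc:
                state[v] = "R" if kind == "recover" else "I"
                break
        history.append((t, sum(1 for s in state.values() if s == "I")))

# usage: an outbreak seeded at one node of a small ring network
ring = {i: {(i - 1) % 20, (i + 1) % 20} for i in range(20)}
final_state, history = gillespie_sir(ring, infection_rate=1.0,
                                     recovery_rate=0.5, initial_infected=[0])
```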
on the other hand, the mitigation duration influences the second infection wave less strongly, with the time until its peak is reached growing only linearly with T: τ(T) ~ T, as is visible in figure b . there is, however, a sharp descent in the resurgent peak height beyond an apparent threshold T > /a for the random network, and T > /a for the scale-free network. for both the two-dimensional regular lattice and the small-world structure, a similar sudden drop in the total number of infected individuals ( figure b ) requires a considerably longer mitigation duration: in these dynamical networks, the repopulation of nodes with infective individuals facilitates disease spreading, thereby diminishing control efficacy. we remark that if a drastically reduced diffusivity d′ ≪ d is implemented, the small-world results closely resemble those for a randomly connected contact network ( figure a ). moreover, we have observed an unexpected and drastic effective structural change in the scale-free network topology as a consequence of the epidemic outbreak infecting its susceptible nodes. naturally, the highly connected hubs are quickly affected and, through transitioning to the recovered state, become neutralized for further spreading of the disease. as shown in figure , as the infection sweeps through the network (in the absence of any lockdown mitigation), the distribution of the remaining active susceptible-infectious (si) links changes remarkably, from the initial scale-free power law with exponent − / to a more uniform, almost randomized network structure. the disease resurgence wave thus encounters a very different network topology than the original outbreak.
in this study, we implemented social-distancing control measures for simple stochastic sir epidemic models on regular square lattices with diffusive spreading, on two-dimensional newman-watts small-world networks that include highly infective long-distance connections, and on static contact networks with either random connectivity or scale-free topology. in these distinct architectures, all disease-spreading mitigation measures, be it through reduced mobility and/or curtailed connectivity, must of course be implemented at an early outbreak stage, but must also be maintained for a sufficient duration to be effective. in figure , we compare salient features of the inevitable epidemic resurgence subsequent to the elimination of social-distancing restrictions, namely the asymptotic fraction R_∞/N of recovered individuals, i.e., the integrated number of infected individuals, and the time τ(T) that elapses between the release and the peak of the second infection wave, both as functions of the mitigation duration T. we find that the latter grows exponentially with T on both dynamical lattice architectures, but only linearly on the static networks ( figure b). furthermore, as one would expect, the mean-field rate equations pertaining to a fully connected system describe the randomly connected network very well. in stark contrast to the mean-field results (indicated by the purple lines in figure ), the data for the lattice and network architectures reveal marked correlation effects that emerge at sufficiently long mitigation durations T. for T > /a in the static networks, and T > /a in the lattice structures, the count of remaining infectious individuals I becomes quite low; importantly, these individuals are also concentrated in the vicinity of a few persisting infection centers. this leads to a steep drop in R_∞/N, the total fraction of ever-infected individuals, by a factor of about in the static networks, and in the dynamic lattice architectures.
thus, in these instances, follow-up disease control measures driven by high-fidelity testing and efficient contact tracing should be capable of effectively eradicating the few isolated disease resurgence centers. however, to reach these favorable configurations for the implementation of localized and targeted epidemic control, it is imperative to maintain the original social-distancing restrictions for at least a factor of three (better, four) longer than it would have taken the unmitigated outbreak to reach its peak (T ≈ /a … /a in our simulations); for covid-19 that would correspond to about two months. as is evident from our results for two-dimensional small-world networks, which perhaps best represent human interactions, it is also absolutely crucial to severely limit all far-ranging links between groups to less than % of the overall connections during the disease outbreak. the graphs compare the outbreak data obtained without any mitigation (grey) and with social-distancing measures implemented for different durations T, as indicated. in all cases, social distancing is turned on once I(t) reaches the set threshold of % of the total population N. the resurgent outbreak is drastically reduced in both its intensity and growth rate as social distancing is maintained for longer time periods T. (the data for each curve were averaged over independent realizations; the shading indicates statistical error estimates.) inset: time τ to reach the second peak following the end of the mitigation; the data indicate an exponential increase of τ with T. square lattices with diffusive spreading: on our regular square lattice with L² sites set on a two-dimensional torus, we implement the stochastic susceptible-infectious-recovered (sir) epidemic model with the following individual-based monte carlo algorithm:
1. randomly distribute N individuals on the lattice, subject to the restriction that each site may contain at most one individual, and with periodic boundary conditions. a small fraction of the individuals is initially infectious, while the remainder of the population is susceptible to the infection.
2. perform random sequential updates, L² of them constituting one monte carlo step (mcs), by picking a lattice site at random and then performing the following actions:
a. if the selected site contains a susceptible S or a recovered individual R, a hopping direction is picked randomly. if the adjacent lattice site in the hopping direction is empty, the chosen individual is moved to that neighboring site with hopping probability d, which is related to a macroscopic diffusion rate.
b. if the chosen lattice site contains an infectious individual I, it first attempts to infect each susceptible nearest neighbor S with a prescribed infection probability r. if an attempt is successful, the involved susceptible neighbor S immediately changes its state to infected I. after the originally selected infected individual has repeated its infection attempts with all neighboring susceptibles S, it may reach the immune state R with recovery probability a. finally, this particular individual, whether still infectious or recovered, tries to hop in a randomly picked direction with probability d, provided the chosen adjacent lattice site is empty.
3. repeat the procedure in item 2 for a preselected total number of monte carlo steps.
to determine the effective (coarse-grained) basic epidemic reproduction ratio R₀, we fit the infection curves to straightforward numerical integrations of the deterministic sir rate equations dS(t)/dt = −r S(t) I(t)/N, dI(t)/dt = r S(t) I(t)/N − a I(t), dR(t)/dt = a I(t), and adjust the lattice simulation infection probability r ≈ .
and, to a lesser extent, the recovery probability a, to finally match the targeted covid-19 value R₀ ≈ . we note that this slightly `renormalized' value for a is subsequently utilized to set the time axis scale in the figures. on the mean-field level, initially R₀ = (r/a) S(0)/N, since all nodes are mutually connected. in spatial settings, S(0)/N is to be replaced with the mean connectivity (i.e., the coordination number for a regular lattice) to susceptible individuals. the lattice simulation data are fitted with the mean-field result by matching two parameters: the maximum value and the half-peak width of the infectious population curve I(t); see figure s . the lattice simulation curve digresses from the mean-field curves at low I(t) values, far away from the peak region. in the lattice simulations, the initial rise of the infectious population curve exhibits power-law growth, I(t) ~ t^(1.4±0.1) and R(t) ~ t^(2.3±0.1), in clear contrast with the simple exponential rise of the mean-field sir curve as obtained from integrating the mean-field rate equations. we note that these are the standard critical exponents θ and 1 + θ for the temporal growth of an active seed cluster near a continuous non-equilibrium phase transition to an absorbing extinction state ( ) . figures s about here. figure s shows the dependence of the asymptotic number of recovered individuals R_∞ on the density ρ for various sets of hopping rates d and initial infectious population values I(0). these data indicate the existence of a well-defined epidemic threshold, i.e., a percolation-like sharp transition from a state in which only a tiny fraction of individuals is infected to the epidemic state wherein the infection spreads over the entire population ( ) . as one would expect, this critical point depends only on the ratio a/d of the recovery and hopping rates. varying the other lattice simulation parameters just shifts the location of the epidemic threshold.
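The individual-based lattice algorithm described above (random sequential hopping, infection, and recovery updates) can be sketched as follows. This is a minimal illustration with placeholder parameter values, not the authors' code.

```python
import random

S, I, R, EMPTY = "S", "I", "R", None

def mc_step(grid, L, d, r, a, rng):
    """One Monte Carlo step = L*L random sequential updates of the
    stochastic lattice SIR model on a periodic L x L torus.
    grid maps (x, y) -> "S", "I", "R", or None for an empty site."""
    def neighbors(x, y):
        return [((x + 1) % L, y), ((x - 1) % L, y),
                (x, (y + 1) % L), (x, (y - 1) % L)]

    def try_hop(site):
        # pick a random direction; move only if the target site is empty
        target = rng.choice(neighbors(*site))
        if grid[target] is EMPTY and rng.random() < d:
            grid[target], grid[site] = grid[site], EMPTY

    for _ in range(L * L):
        site = (rng.randrange(L), rng.randrange(L))
        occupant = grid[site]
        if occupant in (S, R):
            try_hop(site)
        elif occupant == I:
            for nb in neighbors(*site):       # infection attempts first
                if grid[nb] == S and rng.random() < r:
                    grid[nb] = I
            if rng.random() < a:              # then possible recovery
                grid[site] = R
            try_hop(site)                     # finally, attempt to hop

# usage: three individuals on a 10 x 10 lattice, one initially infectious
rng = random.Random(1)
L = 10
grid = {(x, y): EMPTY for x in range(L) for y in range(L)}
grid[(0, 0)], grid[(0, 1)], grid[(5, 5)] = I, S, S
for _ in range(50):
    mc_step(grid, L, d=0.5, r=0.3, a=0.1, rng=rng)
```

Note that the update conserves the number of individuals: infection and recovery change states in place, and hopping only moves occupants to empty sites.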
once the model parameters are set in the epidemic spreading regime, the system's qualitative behavior is thus generic and robust, and only weakly depends on the precise parameter settings. for our two-dimensional small-world network, whose construction is schematically depicted in figure s , we employ a similar monte carlo algorithm as described above; the essential difference is that individuals may now move to adjacent nearest-neighbor sites as well as to distant lattice sites along the pre-set `short-cut' links. figure s a demonstrates (for fixed diffusivity d = ) that, as a function of the fraction φ of long-distance links in a two-dimensional small-world network, the epidemic threshold resides quite close to zero: the presence of a mere few `short-cuts' in the lattice already implies substantial population mixing. the inset, where the φ axis is scaled logarithmically, indicates that sizeable outbreaks begin for φ ≥ . . figure s b similarly shows the outbreak dependence on the diffusion rate d (here for φ = . ), with the threshold for epidemic spreading observed at d ≈ . . evidently, prevention of disease outbreaks in this architecture requires that both mobility and the presence of far-ranging connections be stringently curtailed. figure s about here. for both the randomly connected and scale-free contact networks, we employ the gillespie or dynamical monte carlo algorithm, which allows for efficient numerical simulation of markovian stochastic processes. it consists of the following steps:
1. initially, a few nodes are assigned to be infected I, while all other nodes are set in the susceptible state S. each susceptible node S is characterized by a certain number of active links that connect it to infected nodes I.
2. we then determine the rate at which each infected node I will recover, and at which each susceptible node S with a non-zero number of active links becomes infected. from these we infer the total event rate r_tot.
3. based on this total rate r_tot, we select the waiting time until the next event occurs from an exponential distribution with mean 1/r_tot.
4. we then select any permissible event with a probability proportional to its rate, update the status of each node, and repeat these processes for the desired total number of iterations.
references:
strategies for mitigating an influenza pandemic
modeling targeted layered containment of an influenza pandemic in the united states
forecasting covid-19 impact on hospital bed-days, icu-days, ventilator-days and deaths by us state in the next months
impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand
evaluating the impact of international airline suspensions on the early global spread of covid-19
the hidden geometry of complex, network-driven contagion phenomena
a lattice model for influenza spreading
networks and epidemic models
a contribution to the mathematical theory of epidemics
mathematical biology, vols. i + ii
critical dynamics: a field theory approach to equilibrium and non-equilibrium scaling behavior
chemical kinetics: beyond the textbook
generalized logistic growth modeling of the covid-19 outbreak in provinces in china and in the rest of the world
controlling epidemic spread by social distancing: do it well or not at all
statistical mechanics of driven diffusive systems
nonequilibrium phase transitions in lattice models
scaling and percolation in the small-world network model
percolation and epidemics in a two-dimensional small world
collective dynamics of 'small-world' networks
topology of evolving networks: local events and universality
reasoning about a highly connected world
temporal gillespie algorithm: fast simulation of contagion processes on time-varying networks
the full simulation movie files are
research was sponsored by the u.s.
army research office and was accomplished under grant no. w nf- - - . the views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the army research office or the u.s. government. the u.s. government is authorized to reproduce and distribute reprints for government purposes notwithstanding any copyright notation herein. s.d. gratefully acknowledges a fellowship from the china scholarship council, grant no. csc . key: cord- - blzw oy authors: malavika, b.; marimuthu, s.; joy, melvin; nadaraj, ambily; asirvatham, edwin sam; jeyaseelan, l. title: forecasting covid-19 epidemic in india and high incidence states using sir and logistic growth models date: - - journal: clin epidemiol glob health doi: . /j.cegh. . . sha: doc_id: cord_uid: blzw oy background: ever since the coronavirus disease (covid-19) outbreak emerged in china, there have been several attempts to predict the epidemic across the world with varying degrees of accuracy and reliability. this paper aims to carry out a short-term projection of new cases; forecast the maximum number of active cases for india and selected high-incidence states; and evaluate the impact of the three-week lockdown period using different models. methods: we used the logistic growth curve model for short-term prediction; sir models to forecast the cumulative and maximum numbers of active cases and the peak time; and a time interrupted regression model to evaluate the impact of the lockdown and other interventions. results: the predicted cumulative number of cases for india was , ( % ci: , , , ) by may , and the observed number of cases was , . the model predicts a cumulative number of , , ( % ci: , , , , , ) cases by may . as per the sir model, the maximum number of active cases is projected to be , on may , .
the time interrupted regression model indicates a decrease in daily new cases after the lockdown period, which is statistically not significant. conclusion: the logistic growth curve model accurately predicts the short-term scenario for india and the high-incidence states. the prediction through the sir model may be used for planning and preparing the health systems. the study also suggests that there is no evidence to conclude that there is a positive impact of the lockdown in terms of a reduction in new cases. since the beginning of the covid-19 epidemic, several mathematical and statistical modelling studies have predicted the global and national epidemic with varying degrees of accuracy and reliability. , the accuracy of a prediction and its uncertainty depend on the assumptions and on the availability and quality of data. the results can vary significantly if there are differences in the assumptions and in the values of the input parameters. during a pandemic like covid-19, the availability and quality of data keep improving as the epidemic progresses, which makes predictions uncertain in the early stages; they are expected to improve in the later stages. moreover, an epidemic may not always behave in the same manner, as pathogens are likely to behave differently over time. in terms of covid-19, different models are used to estimate key features of the disease such as the incubation period, transmissibility, asymptomaticity, severity, and the likely impact of different public health interventions. among the models, susceptible-exposed-infectious-recovered (seir) models, susceptible-infectious-recovered (sir) models, agent-based models, and curve-fitting or extrapolation models such as logistic growth models, which exploit the exponential nature of the epidemic's early growth, are commonly adopted using different biological and social processes.
, [ ] [ ] [ ] [ ] [ ] [ ] in this scenario, the logistic growth models are the better-preferred option. choudhary ( ) predicted the estimated cases very early, till april , using time series models; however, this was found to be a gross underestimation. in spite of the limitations, considering the unprecedented nature of the pandemic, the uncertainties about the disease, and the need for urgent but appropriate social, economic and public health responses, accurate forecasting of the size, severity and duration of the epidemic is critical to inform policies, programmes and strategies. this paper aims to carry out a short-term projection of new cases using the logistic growth curve model; forecast the maximum number of active cases for india and selected high-burden states using the sir model with a correction factor based on china, italy and south korea; and evaluate the impact of the lockdown and other interventions on the incidence of daily cases. logistic growth is characterized by increasing growth in the beginning, but decreasing growth at a later stage, as it approaches the maximum. in covid-19, the maximum limit will be the total population, and the growth will necessarily come down when a greater proportion of the population is sick. the reason for using logistic growth for modelling the coronavirus outbreak is the evidence that the epidemic follows exponential growth in the early stages and is expected to slow down during the later stages. the modified logistic growth model , is presented as follows:
y(t) = c / (1 + a e^(−bt))
where y(t) is the number of cases at any given time t; c is the limiting value, the maximum capacity for y; a = (c / y₀) − 1, with y₀ the initial number of cases; and b is the rate of change.
• the number of cases at the beginning, also called the initial value, is c / (1 + a).
• the maximum growth rate occurs at t = ln(a) / b.
when y is equal to c (that is, the population is at maximum size), y/c will be 1. therefore, (1 − (y/c)) will be 0, and hence the growth will be 0.
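A least-squares fit of the logistic model above can be sketched as follows, assuming scipy is available. The case series here is synthetic and the parameter values are placeholders, not the fitted India data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, c, a, b):
    # y(t) = c / (1 + a * exp(-b t)): c is the limiting size,
    # a = (c / y0) - 1 encodes the initial value, b the growth rate
    return c / (1.0 + a * np.exp(-b * t))

# synthetic cumulative-case series standing in for observed data
t = np.arange(60, dtype=float)
y = logistic(t, 50000.0, 200.0, 0.2)

# least-squares fit of (c, a, b), then the inflection time t = ln(a) / b
popt, _ = curve_fit(logistic, t, y, p0=(2 * y[-1], 100.0, 0.1), maxfev=10000)
c_hat, a_hat, b_hat = popt
peak_growth_time = np.log(a_hat) / b_hat
```

The recovered `c_hat` plays the role of the projected final epidemic size, and `peak_growth_time` is the day of maximum daily growth.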
β is the transmission parameter: the average number of individuals that one infected individual will infect per unit time. it is determined by the chance of contact and the probability of disease transmission. γ is the rate of recovery in a specific period. d, the average time period during which an infected individual remains infectious, is derived from γ: d = 1/γ. the ratio R₀ = β/γ is the basic reproduction number: the average number of people infected by one infected individual over the disease infectivity period, in a totally susceptible population. in order to fit the sir model, the parameters were obtained by minimizing the residual sum of squares between the observed cumulative active cases and the predicted cumulative infected cases. we fixed R₀ and d as . and days, respectively. , therefore, γ is . and β is . . the data for india were taken from the crowd-sourced . invariably, the sir model overestimates the number of active cases. in order to compute the overestimation, the actual number of reported cases from china was obtained up to april and used to estimate the maximum number of active cases in china. subsequently, the ratio of the maximum (peak) active cases projected by the model to the observed peak active cases was computed. similar estimation was done for italy and south korea as well. in order to choose the correction factor most appropriate for india, we compared the age and gender distribution of the populations of these three countries with the age and gender distribution of the population of india. the china correction factor was applied to states such as maharashtra, rajasthan and tamil nadu. as the population size of delhi is small, about four to five times lower than the other states, the sir model was not fitted for delhi. the data used in the modelling are presented in the appendix. time interrupted regression analysis was done to assess the impact of the three-week lockdown on the incidence of new cases.
a dummy variable was introduced at april , . the hypothesis was that there would be a decline in the incidence of new cases after the lockdown period, that is, after april , : the regression coefficient would be significant and negative in direction. as there were only cases reported from january to march , we excluded these time points from the analysis. the results in table are presented in figure a & b. for rajasthan and tamil nadu, it will be , , , and , respectively. the corresponding peak times were expected to be june , june , and june , respectively. the diagrammatic representation of the trend is presented in figure . the results of the interrupted time regression analyses are presented in table . the model indicates a decrease in daily new cases after april , weeks after the lockdown, which is not statistically significant. there have been several studies forecasting the incident cases of covid-19 in various countries; however, there are few peer-reviewed articles about india. forecasting covid-19 through appropriate models can help us to understand the possible spread across the population, so that appropriate measures can be taken to prevent further transmission and to prepare the health systems for medical management of the disease. it is also essential to evaluate the effectiveness of interventions so that appropriate and timely programmatic changes can be made to mitigate the epidemic. we forecasted the cumulative number of cases for india and four other high-incidence states using the logistic growth model, which projected the cumulative cases very close to the observed cases. this model is based on the current trends of the cumulative cases in india and the specific states. we used the logistic growth model due to the exponential nature of growth of the epidemic, which eventually stabilises, as against a pure exponential model. , - end of may, . however, the total number of cases had already crossed , by april , which was a gross underestimation.
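The interrupted (segmented) time-series regression described above can be sketched as follows. The breakpoint and the case series are synthetic placeholders, not the study data.

```python
import numpy as np

def interrupted_time_series(y, breakpoint):
    """Segmented regression y ~ b0 + b1*t + b2*level + b3*trend, where
    `level` is a dummy that switches on at the breakpoint (the end of
    the lockdown period) and `trend` counts days elapsed since then.
    Negative b2 / b3 would indicate a post-intervention drop in
    level / slope."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    level = (t >= breakpoint).astype(float)
    trend = np.where(t >= breakpoint, t - breakpoint, 0.0)
    X = np.column_stack([np.ones(len(y)), t, level, trend])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [intercept, pre-slope, level change, slope change]

# synthetic daily-case series: rising before the breakpoint, flat after it
y = np.concatenate([5.0 + 3.0 * np.arange(20), np.full(15, 65.0)])
b0, b1, b2, b3 = interrupted_time_series(y, breakpoint=20)
```

In a full analysis one would additionally test the significance of the level and slope changes (e.g. via their standard errors), which plain least squares alone does not report.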
the sir model with correction factor predicted , cases as the maximum number of active cases by may , . however, the peak time gets pushed to june in the other states. when we fitted the sir model using the reported cases from china, south korea and italy, we found that the model predicted more active cases than were observed up to the time point for which the data were analysed. in order to address this overestimation, we formulated a correction factor, which is essential for predicting the epidemic accurately. besides, as suggested by ranjan ( ), the sir model depends heavily on the susceptible population; therefore, it may overestimate the maximum number of cases when the epidemic is not generalized in the population. this could be considered a warning signal for preparing the health systems in terms of planning treatment facilities and other interventions. in the covid-19 epidemic, assessing the effectiveness of lockdown is one of the key areas of interest. india had a head start in imposing the lockdown relatively early, in addition to strong public health measures to mitigate the spread of the epidemic. this raises an interesting question: has the lockdown really impacted the incident cases? several studies have assessed the effectiveness of interventions, with varying results. , we carried out interrupted time series analyses, which suggested no significant decline in the number of daily cases immediately after the lockdown. ironically, there is an increase in the number of daily cases immediately after the three weeks of lockdown. this indicates that the lockdown and other interventions did not have any impact on reducing the number of daily cases after a certain period. this may be due to the fact that the number of tests done over this period increased significantly. however, we need to revise the model every week as the data accumulate.
limitations: as in any other projection using models, the limitation is that each model would behave differently, not merely due to differences in underlying assumptions but due to differences in population density, existing capacity of the health systems, current level of interventions, and socio-demographic and economic situations across and within the states and districts. therefore, district-level projections may be required, which would account for the variations between and within the states. in covid-19, there has been a high level of uncertainty about the number of reported confirmed cases due to issues with varying testing strategies, the proportion of asymptomatic cases and the effective transmission rate. because of this, a significant number of confirmed cases may be missed, which may affect the accuracy of any model. in conclusion, the short-term projection through the logistic growth model agrees well with the observed number of cases in india and in the other states. the findings from the sir model may be used for planning the interventions and preparing the health systems for better clinical management of the infected in the country and the respective states. none of the authors have conflicts of interest to report. this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. not required.
references:
early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
geneva: world health organization
mass testing, school closings, lockdowns: countries pick tactics in 'war' against coronavirus
the impact of social distancing and epicenter lockdown on the covid-19 epidemic in mainland china: a data-driven seiqr model study
new delhi: ministry of health and family welfare, government of india
why is it difficult to accurately predict the covid-19 epidemic? infect dis model
when will the coronavirus outbreak peak? nature
epidemic forecasting is messier than weather forecasting: the role of human behavior and internet data streams in epidemic forecast
managing epidemics: key facts about major deadly diseases
modeling and predictions for covid spread in india
age-structured impact of social distancing on the covid-19 epidemic in india
seir and regression model based covid-19 outbreak predictions in india
mathematical modeling of the spread of the coronavirus disease (covid-19) taking into account the undetected infections: the case of china
prudent public health intervention strategies to control the coronavirus disease transmission in india: a mathematical model-based approach
epidemic situation and forecasting of covid-19 in and outside china
covid-19 in india: predictions, reproduction number and public health preparedness
forecasting covid-19 cases in india. towards data science
analysis of a modified logistic model for describing the growth of durable customer goods in china
modeling logistic growth
covid-19 growth modeling and forecasting with prophet
an introduction to compartmental modeling for the budding infectious disease modeler
application of the susceptible-infected-recovered deterministic model in a gii.p emergent norovirus strain outbreak in romania
the reproductive number of covid-19 is higher compared to sars coronavirus
pattern of early human-to-human transmission of wuhan
segmented regression analysis of interrupted time series studies in medication use research
predictions for covid-19 outbreak in india using epidemiological models
predictions, role of interventions and effects of a historic national lockdown in india's response to the covid-19 pandemic: data science call to arms
covid-19: mathematical modelling and predictions
appendix: daily case counts by date (february to march; dates garbled in extraction).
key: cord- -le eifv authors: rahman, mohammad mahmudur; ahmed, asif; hossain, khondoker moazzem; haque, tasnima; hossain, md. anwar title: impact of control strategies on covid-19 pandemic and the sir model based forecasting in bangladesh. date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: le eifv covid-19 is spreading drastically worldwide and has infected nearly two and a half million people so far. cases of covid-19 have been confirmed in bangladesh up to th april, though stage-3/4 transmission has not been validated yet. to project the final number of infections in bangladesh, we used the sir mathematical model. we also tried to demonstrate the impact of control strategies such as social distancing on covid-19 transmission. given the large population and socio-economic characteristics of the country, we assumed that % social distancing and lockdown can be achieved.
assuming that, the predicted final size of infections will be reached on the th day from the first infections. to estimate the impact of social distancing we assumed eight different scenarios; the predicted results confirmed the positive impact of this type of control strategy, suggesting that with strict social distancing and lockdown, covid- infection can be brought under control, after which the infection cases will steadily decrease down to zero. coronavirus disease (covid- ) is posing an unparalleled challenge to mankind. till date ( th april ) there are about . million confirmed cases of covid- and about k reported deaths globally [ ]. nearly % of the world population is currently under lockdown by governments or communities to reduce the transmission of this extremely contagious disease. covid- is the viral infectious disease caused by sars-cov- , for which there is no treatment or vaccine yet. covid- is transmitted by respiratory droplets and fomites, with an incubation period from to days [ ]. the institute of epidemiology, disease control and research (iedcr), bangladesh first reported a covid- case in bangladesh on march , [ ]. since then, there has been a steady increase in the number of infections, with cases on april , among which there are , active cases, recovered cases and deaths. in response, bangladesh has employed international travel bans and a gradual lockdown. however, countries like bangladesh are at greater risk because of large population density and inadequate infrastructure and healthcare systems to provide the required support. initially, it was thought that hot and humid weather [ , ], a large proportion of young population, and probable immunity caused by bcg vaccinations [ ] may help to keep the infection number low. however, a large portion of these findings is preliminary and correlation-based, so additional confirmation is necessary before hard conclusions [ ].
we hereby present mathematical and epidemiological models for covid- transmission in bangladesh. the trends of most pandemics follow rapid exponential growth during the preliminary stage and ultimately fall off [ ]. mathematical epidemic models are therefore based on an exponential fit for short-term and long-term predictions. the susceptible-infectious-recovered (sir) compartment epidemiological model [ ] is used by considering the susceptible, infectious, and recovered or deceased status of individuals during pandemics. this sir model has revealed a significant prognostic ability for the increase of covid- transmission in bangladesh on a day-to-day basis. we have also calculated the probable effects of social distancing and frequent hand washing with soap or sanitizer on the increase of infections. bangladesh announced a countrywide lockdown, except for emergency services, till april. there are no exact data on how many people maintain social distancing in bangladesh, although our previous study (submitted elsewhere) [ ] showed that . % did not maintain social distancing; however, that study was conducted through online cross-sectional methods, so a big portion of the population was not included due to unavailability of the internet. the study also estimated that no one is free of the risk of infection, suggesting that a longer period of lockdown is required for controlling the covid- pandemic. in the present study, we also estimated the possible infections in case %, %, %, %, % and % of the population is in lockdown. however, socio-economic conditions in a country of more than million people with high density cause considerable challenges in implementing strict social distancing. considering this dense population of bangladesh, and to learn the effects of tiny differences in the social distancing percentage, we assumed two further scenarios. we estimated the prediction of the highest infection cases where % and . % of people maintained strict social distancing.
as the sir epidemiological model is entirely dependent on data, it is very important to comment on the character of these data. different diagnostic strategies have been adopted in different countries for the confirmation of covid- cases. in bangladesh, in the beginning, testing was mostly limited to persons travelling from infected countries and their direct contacts. very recently, countrywide testing has started with suspected persons as well as selected pneumonia patients and symptomatic healthcare workers. as of april , bangladesh has tested , samples ( /million) [ ]. a number of recent studies [ ] have shown that the transmissibility of coronavirus infection may vary due to warmer weather. in addition, any differential immunity of bangladeshi people due to the bcg vaccine [ ] is already implicitly reflected in the data through the basic reproduction number. the present sir model forecasts transmission as a result of stage- (persons with a travel history to infected areas/countries) and stage- (person-to-person contact). however, if the confirmed cases of infection start to thoroughly surpass the predicted infections, then the outbreak will have entered a new stage, and no mathematical model explained above will be applicable. nevertheless, as of april th, there is no strong evidence for community transmission. high population density as well as socio-demographic characteristics put bangladesh at high risk for stage-three and stage-four community transmission. even though the social distancing and scrupulous contact-tracing actions taken by the bangladesh authorities may limit virus transmission to small groups, the relocation of laborers, workers and small-income groups could deteriorate the situation. consequently, these factors need to be considered when constructing conclusions based on the current study. missing infections among the expat population could also influence the predictions, but this is a debatable issue as discussed below.
however, if a considerable number of infections were missed, a deviation would already have become visible in the curve by the end of april. the most important and common questions regarding covid- concern its final infection numbers and death toll. to get the answer, a range of mathematical epidemic models have been utilized, such as stochastic [ ], analytical [ ], and phenomenological [ ] models. in this study, we attempt to estimate the final epidemic size of covid- using the classic compartmental susceptible-infected-recovered (sir) model [ ]. with this model, we obtain a series of daily predictions under different circumstances. to predict the maximum infection numbers we used the sir epidemic model [ ]. the sir epidemic model is a method of modeling infectious diseases by categorizing the population based on disease status into susceptible, infected and recovered. the susceptible population is not yet affected but is at risk of infection. infected persons have already been infected by the causative agent and are able to infect susceptible persons. recovered means infected persons who have either recovered from the disease or achieved stable immunity, or are otherwise detached from the population so that they cannot infect the susceptible population (death, quarantine, etc.). the sir model presents the increase or decrease of an outbreak based on some initial data, i.e., the total given population (n), the infection rate of the infectious disease (β), the recovery rate of the disease (Ɣ), the initial susceptible population (s0), the initial infected population (i0) and the initial recovered population (r0). this model assumes a closed population where no one dies or is born, so the population remains constant and every person is part of either s, i or r. the general form of the model is

ds/dt = -β·(s/n)·i
di/dt = β·(s/n)·i - Ɣ·i
dr/dt = Ɣ·i

here, β is the infection rate per day, n is the total given population, and Ɣ is the recovery rate per day (thus, 1/Ɣ is the mean infection time).
again, st + it + rt = n and s0 + i0 + r0 = n indicate that the population is closed and the change in total numbers with respect to time is zero. we considered some initial conditions as well to use this model, namely the initial susceptible, infected and recovered populations. that means: ➢ s0 > 0 (population who are susceptible), ➢ i0 > 0 (at least one infected that can infect susceptible persons), and ➢ r0 ≥ 0 (there may be some already recovered or dead people at the start of the model, or there may be no one). and again, both s0 + i0 + r0 = n and st + it + rt = n for any t. since rt can be found exclusively from st and it, considering these variables, after integration we get st = s0·exp(-(β/(Ɣ·n))·rt). st decreases as t increases (susceptible persons are infected but never added back into the susceptible numbers), and it attains its maximum value when st = (Ɣ/β)·n; if s0 > (Ɣ/β)·n, then it will rise to that highest value before declining to zero. (all rights reserved. no reuse allowed without permission. this preprint, which was not certified by peer review, is posted by the author/funder, who has granted medrxiv a license to display the preprint in perpetuity; this version was posted april , .) however, in case s0 ≤ (Ɣ/β)·n, it will decline to zero and there will be no epidemic. so we can certainly say that it must approach zero as t→∞; since drt/dt = Ɣ·it is strictly positive and based only on it, if it did not approach zero, rt would increase without bound, which is not possible because the population is closed. the dynamics of the sir mathematical model depend on the ratio re = R0·(st/n), where re is the effective rate, and R0 = β/Ɣ is referred to as the basic reproduction ratio or basic reproduction number, as β is the infection rate per day and 1/Ɣ is the average infectious time (the average time an individual stays infected). generally, if R0 > 1, then infected persons are transmitting the disease to susceptible people quicker than the recovery rate, so the disease grows into an epidemic.
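for clarity, the threshold argument above can be restated compactly in standard continuous-time notation (a restatement consistent with the definitions in the text, not taken verbatim from the paper):

```latex
\frac{dS}{dt} = -\beta \frac{SI}{N}, \qquad
\frac{dI}{dt} = \beta \frac{SI}{N} - \gamma I, \qquad
\frac{dR}{dt} = \gamma I,
\qquad\text{so}\qquad
\frac{dI}{dt} > 0 \iff S > \frac{\gamma}{\beta}N = \frac{N}{R_0},
\quad R_0 = \frac{\beta}{\gamma}.
```

the infected count therefore grows while the susceptible pool exceeds n/R0, peaks exactly when s = n/R0, and declines thereafter.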
if R0 < 1, an epidemic does not take place. in a compartmental model such as sir, populations move from one compartment to another. the model can be expressed using recursive iteration; considering these parameters, the number of people who are susceptible, infected or recovered at any time may be calculated with the following equations:

sn = sn-1 - β·(sn-1/s0)·in-1    (e8)
in = in-1 + β·(sn-1/s0)·in-1 - Ɣ·in-1    (e9)
rn = rn-1 + Ɣ·in-1    (e10)

these equations estimate the number of persons in each state today (n), based on the numbers yesterday (n-1) and the rates of infection (β) and recovery (Ɣ); n denotes the current time period and n-1 the prior period. so, with a time period of one day, equation eight (e8) says that the number of susceptible individuals today (sn) equals the number yesterday (sn-1) minus the number of people who become infected today: yesterday's number of susceptible individuals (sn-1) divided by the original susceptible number (s0), multiplied by the rate of infection and the number of individuals who were infected yesterday (in-1). in equation nine (e9), the number of infected persons today (in) equals the number who were infected yesterday (in-1), plus the number of susceptible individuals who became infected today, minus the number who were infected yesterday and recovered today. in equation ten (e10), the number of recovered people today (rn) equals the previous number who had recovered plus the number who were infected yesterday and recovered today.
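as an illustration, the day-by-day recursion just described can be sketched in a few lines of code; the parameter values in the example are invented placeholders, not the paper's fitted estimates.

```python
def sir_discrete(s0, i0, r0, beta, gamma, days):
    """Day-by-day SIR recursion in the form described in the text:
    s_n = s_{n-1} - beta*(s_{n-1}/s0)*i_{n-1}
    i_n = i_{n-1} + beta*(s_{n-1}/s0)*i_{n-1} - gamma*i_{n-1}
    r_n = r_{n-1} + gamma*i_{n-1}
    """
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        new_inf = beta * (s / s0) * i   # newly infected today
        new_rec = gamma * i             # newly recovered today
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        history.append((s, i, r))
    return history

# hypothetical parameters: R0 = beta/gamma = 2.5 > 1, so an epidemic occurs
hist = sir_discrete(s0=1_000_000, i0=3, r0=0, beta=0.35, gamma=0.14, days=365)
peak_day = max(range(len(hist)), key=lambda t: hist[t][1])
print("peak infections on day", peak_day)
```

note that the population stays closed by construction: each day the same new_inf leaves s and enters i, and the same new_rec leaves i and enters r, so s + i + r is constant.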
we obtained covid- infection data for bangladesh from iedcr [ ] and calculated the rate of infections per day and the rate of recovery (recovered and dead) per day based on the numbers of infections, recoveries and deaths as of th march. then we calculated the basic reproduction ratio/number and the highest infected population. to do this we used the total population [ ] of bangladesh, where we first assumed there are no interventions at all, and later used the population who did not practice proper precautions towards covid- . afterward, with the equations (e8, e9 and e10), we calculated the date on which the highest number of infections is reached. we also predicted infection numbers by assuming that %, %, %, %, %, % and . % of the bangladeshi population maintained strict social distancing. we created a scatter plot to compare our model's infection predictions with the actual infections of bangladesh, and we made predictions of infection numbers at the end of april, may, june, july and august . all the analyses were done in microsoft excel and spss using the equations (e8, e9, and e10) described above. graphs were prepared in graphpad prism. according to iedcr, on march , three individuals were confirmed with covid- . since then, infection cases have gradually increased, and till april the count reached (figure). in the beginning, diagnostic tests were conducted by iedcr only; however, in the last week several diagnostic facilities were opened countrywide, and thus more infection cases were detected. although there is little evidence of community transmission, most infections were transmitted from infected persons to relatives and to the health workers who treated them. the results for the sir mathematical models are discussed below. we consider that the sir model will give a good forecast for the stage- and stage- infections, as we assumed there is no stage- transmission yet. in addition, we take all cases to be symptomatic, since estimating the number of asymptomatic cases
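the rate-estimation step described above can be illustrated with a small sketch; the function and the synthetic numbers below are hypothetical, intended only to show how per-day rates and R0 might be extracted from cumulative counts.

```python
def estimate_rates(cum_infected, cum_removed):
    """Estimate per-day infection and removal rates from cumulative series.

    beta  ~ average of (new infections per day) / (currently active cases)
    gamma ~ average of (new removals per day)  / (currently active cases)
    R0    = beta / gamma
    """
    betas, gammas = [], []
    for t in range(1, len(cum_infected)):
        active = cum_infected[t - 1] - cum_removed[t - 1]
        if active <= 0:
            continue
        betas.append((cum_infected[t] - cum_infected[t - 1]) / active)
        gammas.append((cum_removed[t] - cum_removed[t - 1]) / active)
    beta = sum(betas) / len(betas)
    gamma = sum(gammas) / len(gammas)
    return beta, gamma, beta / gamma

# synthetic example built with a 20% infection rate and 10% removal rate per day
beta, gamma, R0 = estimate_rates([100, 120, 142], [0, 10, 21])
print(beta, gamma, R0)
```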
is difficult; this possibly misjudges the real number of cases. the figure shows the predicted cases and confirmed cases till april, suggesting that confirmed infection cases are following the sir model prediction trends (r = . , p < . ). combined prediction results according to the sir model are shown in the figure. due to high population density, it is difficult to control infections as well as to predict them. socio-economic conditions make it complicated to maintain social distancing. though the sir model analysis does not consider all of these factors, we used this classic model to predict the outbreak in bangladesh. the figure shows all the prediction curves of susceptible, infected and recovered (sir) cases. the prediction was conducted based on the total population, assuming no intervention. as the government imposed a lockdown from march , we again estimated the sir model under these new situations. however, due to the dense population, socio-economic conditions, religious orthodoxy and the absence of scientific data to date, it is not clear how many people are maintaining sustainable social distancing in bangladesh.
bearing in mind these factors, we considered eight possible scenarios: no interventions, and %, %, %, %, %, % and . % of people maintaining strict social distancing, resulting in %, %, %, %, %, %, % and . % of the people of bangladesh being susceptible to covid- . the sir model prediction results considering the above-mentioned scenarios are demonstrated in the table. the figure shows the combined curves of susceptible, infected and recovered, confirming that our model worked flawlessly. in the model analysis, the infection rate per day (β) was . and the recovery rate (Ɣ) per day was . on march th, and the basic reproduction number (R0) of . confirms the pandemic conditions. if there is no intervention, then the infection cases among the bangladesh population will reach around . million in days ( th of june) from the first infection (table). then the infection cases will steadily decrease down to zero. on the th day ( th of september) from the first infection there will be no new infections according to the sir prediction model. the sir-model-based prediction of the infection curve was compared with the confirmed cases (figure). the comparison suggested that the confirmed cases are following the predictions till april. we also predicted the infection case numbers at the end of april. the prediction estimated that infections will reach , by that date if the whole population is susceptible. the figure shows the sir model prediction of infections along with the confirmed cases by the end of april. the consequences of social isolation for the covid- pandemic have been observed by several investigators using diverse mathematical models [ , ]. it is well known that the effects of social distancing become evident only some days after the lockdown, since the signs of covid- characteristically take - days to appear after sars-cov- infection.
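the scenario analysis above, in which only a fraction of the population is taken as susceptible, can be mimicked with a short sketch of the same recursion; β and Ɣ here are invented placeholder values, not the fitted rates from this study.

```python
def final_size(frac_susceptible, n=1_000_000, i0=3, beta=0.35, gamma=0.14, days=2000):
    """Run the discrete SIR recursion with only a fraction of n initially
    susceptible, and return the total ever infected (final recovered count)."""
    s0 = n * frac_susceptible
    s, i, r = s0, float(i0), 0.0
    for _ in range(days):
        new_inf = beta * (s / s0) * i
        s, i, r = s - new_inf, i + new_inf - gamma * i, r + gamma * i
    return r

for frac in (1.0, 0.8, 0.5, 0.2):
    print(f"{frac:.0%} susceptible -> ~{final_size(frac):,.0f} total infections")
```

in this formulation the per-day rates and the basic reproduction number are unchanged across scenarios, as stated in the text; shrinking the susceptible pool shrinks the absolute size of the outbreak rather than its growth rate.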
bangladesh announced the lockdown pretty early ( days after the first case, when the number of cases was ) compared to china (at cases) and india (at cases) [ ]. however, owing to mismanagement of the announcement and the transport ban, nearly millions of people moved countrywide from the capital dhaka, most of whom were low-income people. we assumed that these people did not carry the virus from the capital to other regions, as the infection cases were very low at that time. in addition, most of the expats who returned to bangladesh from infected regions did not follow home quarantine and scattered to different parts of the country. some of them might have carried the virus, which is turning out to be true now. as a result, the exact extent of social distancing maintained in bangladesh is indeterminable. therefore, we assumed the eight possible scenarios mentioned above, where %, %, %, %, %, %, % and . % of the people of bangladesh are susceptible, in our current sir-model-based study. as the infection rate per day, the recovery rate per day and the basic reproduction number remain unchanged, we wanted to know whether there is any effect of social distancing on covid- transmission under the above-mentioned scenarios. the prediction results, illustrated in the figure and tabulated in the third column of the table, confirm that by social distancing, covid- infection cases can be controlled and reduced, and the ending of the outbreak will be rapid. in the table we have summarized the predicted infection cases at the end of april, may, june, july and august. comparison of the total population with all possible scenarios suggested that covid- transmission cannot be stopped now; however, it could be decreased to a tolerable level by strict social distancing.
factually, the latter four scenarios are not possible for high-population-density countries like bangladesh. according to the prediction, the peak infection cases will appear in may, and afterwards the count will go down to zero; by august, bangladesh will be free from covid- infection. * denotes the confirmed cases reported by iedcr [ ]. in the sir model, the number of susceptible individuals today equals the number of persons who were susceptible yesterday minus the number who become infected today. as long as the disease is spreading, the remaining susceptible number declines each day. in addition, the number of persons who become infected today equals yesterday's number of susceptibles multiplied by the rate of infection per day, but it might look unusual that we also multiply that result by how many were infected the previous day. this is because the rate of infection per day is per infected individual. if persons are infected, the chance of anyone else becoming infected is times higher than if one individual is infected. so any estimation of the rate of transmission of the disease needs information on the infection rate per day and the numbers of initially infected and originally susceptible persons. at the beginning of an epidemic the number of persons becoming infected each day is perhaps bigger than the number recovering, so the number of infected will keep growing until more persons recover than become infected. the number of susceptible individuals always decreases, while the number of infected at first goes up and then turns down, and the number of recovered keeps increasing. the model used in this study is data-driven, so it is as dependable as the data are.
compared to other model-based studies [ ] of different locations, at the beginning the infection pattern of bangladesh is in the exponential growth stage. according to the available data, we are able to predict that the highest size of the covid- outbreak, using the sir model, will be nearly , , if there is no intervention. with such a large population and such socio-economic conditions, it is not possible to maintain even a % lockdown or social distancing in bangladesh. we assumed that, by law enforcement and self-awareness, a % lockdown and social distancing can be maintained. accordingly, the final size of covid- will be cases, as obtained from the sir model analysis. early strict lockdown and social distancing are the key factors in preventing covid- transmission. studies showed that in several other countries, such as the uk, germany, italy and the usa, this stringent action was employed only after covid- entered the community transmission stage (stage- ), and the outbreak became uncontainable. additionally, different nations have different strategies as well as compliance levels due to several realistic considerations in enforcing the lockdown. this may have an effect on the final size of the outcome. for example, the infection rates in italy and the usa have still not become stable after more than days of lockdown, and they have witnessed the highest percentages of deaths as well. conversely, south korea, japan, singapore, etc. have shown significant declines by imposing lockdowns [ ]. as estimated in a recent study [ ], a reduction of infections in australia can be evident only if the social distancing levels go beyond %.
assuming the same pattern in bangladesh as in australia (even though the lockdown in australia was stricter, with stringent police control over personal movement), we can expect that till april ( days from the lockdown) only a very tiny effect of social distancing will be observed. by this date, bangladesh may have reported as many as cases if % of the bangladesh population is susceptible, as shown in the table. this number could rise significantly if community transmission (stage- ) occurs and if transmission results from the movement of industrial workers and laborers. in addition, on may , bangladesh should observe the predicted peak of transmission, , if no further strict lockdown is imposed. to reach that number of infections by the th of may, bangladesh should expect around patients on a single day. a recent study by mandal et al. [ ] revealed that social distancing can decrease cases by up to %, which confirms the effect of social distancing; similar predictions were made by our model (table and table ). exponential increase is assumed throughout to account for the worst-case scenario. a reduction of - % can bring the situation to a more manageable level. in addition, if bangladesh pursues the case-isolation approach strictly, it is anticipated that the infection curve will begin flattening out soon. naturally, the degree of accuracy of these estimates remains to be seen. in conclusion, qualitatively, both models show that the epidemic is moderating, but recent data show a linear upward trend. the next few days will, therefore, indicate in which direction the epidemic is heading. this investigative estimate shows that the total number of covid- infections in bangladesh could be as high as . million, and that for socio-economic conditions and other practical considerations it is not possible to impose the strictest lockdown. however, even now, the transmission rate per day and the basic reproduction number for bangladesh are nearly within the global range ( . to . ) [ ].
due to the questionably small amount of testing [ ] compared to other countries, a low transmission rate per day, and therefore a basic reproduction number lower than the world's, is reported. the mathematical epidemiological sir model is used to forecast the short-term and long-term outcomes. the sir model assumes all infection cases to be symptomatic, which is a limitation and could underestimate the actual cases because of an unknown number of asymptomatic cases. with this constraint, the sir model satisfactorily predicts the cases till today (april ). the prediction indicates that bangladesh will reach equilibrium by the end of the first week of june, with the estimated total number of cases approximately , if no further stringent measures are taken by the government of bangladesh. it is projected that the effect of social distancing will become visible shortly, by the end of april. however, bangladesh is at the door of community transmission due to reported infringements of quarantine standards by people as well as other socio-demographic characteristics. the predictions made using the epidemiological model in this study will be invalid if the transmission massively enters stage- . in conclusion, the model is only as good as the original data; on account of real-time changes in the data every day, the forecasts will therefore change. for this reason, the outcomes of this study are meant to be used only for qualitative understanding and rational estimation of the nature of the pandemic, and are not meaningful for any decision-making or strategy/policy change. this study was conducted with the available data and concluded with predictions using the sir epidemiologic model. however, the sir model predictions will be invalid if the transmission enters stage- ; thus, no policy-making decision should be made based on these predictions, except the imposing of strict lockdown.
- covid- coronavirus pandemic
- a review of coronavirus disease- (covid- )
- covid- status bangladesh
- effects of temperature variation and humidity on the mortality of covid- in wuhan. medrxiv
- spread of sars-cov- coronavirus likely to be constrained by climate. medrxiv
- the mystery behind childhood sparing by covid-
- the role of absolute humidity on transmission rates of the covid- outbreak
- estimating initial epidemic growth rates
- si, sis, and sir epidemic models
- knowledge, attitude and practices (kap) towards covid- and assessment of risks of infection by sars-cov- among the bangladeshi population: an online cross-sectional survey
- will coronavirus pandemic diminish by summer? available at ssrn
- a note on the derivation of epidemic final sizes
- early estimates of epidemic final sizes
- using phenomenological models for forecasting the ebola challenge
- countries in the world by population
- modelling transmission and control of the covid- pandemic in australia
- predictions for covid- outbreak in india using epidemiological models
- prudent public health intervention strategies to control the coronavirus disease transmission in india: a mathematical model-based approach. the indian journal of medical research

the authors thank and acknowledge all health care workers, including doctors, nurses and assistants, and the law enforcement authorities for their diehard efforts to manage the covid- pandemic conditions in bangladesh. mmr, th and aa conceived the study with input from kmh. mmr, th and aa studied the equations and prepared the study design. mmr led the project, from data collection and analysis to writing, with the help of th and aa. mmr led the solving of the equations and the analysis of the data with help from th and aa. mmr, aa and th produced the first draft of the manuscript; kmh put effort into the writing and corrections of the manuscript. kmh and mah added additional points to the discussion.
mmr, aa, kmh, th and mah finalized the manuscript after necessary corrections and after obtaining suggestions from all authors. mmr and kmh jointly supervised all the work from beginning to end. all authors read and unanimously agreed to submit the manuscript. this study has not received any funds from any institute, organization or government. all authors declare that there is no conflict of interest among them. key: cord- -orh fd c authors: oliveira, a. c. s. d.; morita, l. h. m.; da silva, e. b.; granzotto, d. c. t.; zardo, l. a. r.; fontes, c. j. f. title: bayesian modeling of covid- cases with a correction to account for under-reported cases date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: orh fd c the novel covid- disease started in late , making governments worldwide face a high number of critical cases and deaths, alongside a constant fear of the collapse of their health systems. since the beginning of the pandemic, researchers and authorities have been mainly concerned with carrying out quantitative studies (modeling and predictions) while overcoming the scarcity of tests, which leads to under-reported cases. to address these issues, we introduce a bayesian approach to the sir model with a correction for under-reporting in the analysis of covid- cases in brazil. the proposed model was employed to obtain estimates of important quantities such as the reproductive rate and the average infection period, along with the most likely date when the pandemic peak may occur. several under-reporting scenarios were considered in the simulation study, showing how impactful the lack of information is for the modeling. the covid- epidemic disease is caused by the new sars-cov- coronavirus, associated with severe acute respiratory syndrome (sars), which began in wuhan, china, in late (rodríguez-morales et al., ).
after the first detected case in china, the disease continued to spread globally, with exported cases confirmed on all continents worldwide. in a matter of a few months, the disease overtook thousand reported cases by early april . on march nd, the world health organization (who) declared covid- a pandemic, when more than thousand cases and almost a thousand deaths were registered in the european region -the center of this pandemic according to europe's standing committee (who, ). there are still many unknowns about covid- , and the lack of evidence complicates the design of appropriate response policies -for example, it is impossible to say anything precise about the mortality rate or to determine the disease recurrence rate (lenzer, ). despite the uncertainties, the frightening speed with which this disease spreads across communities and the collapse it is capable of causing in health systems are facts that must be faced. the exponential growth of cases and the consequent number of deaths have been observed within a short period. in mid-january , a few weeks after the first detected covid- case in the world, countries close to the territory of the virus's origin on the asian continent, as well as in the european and american regions, also began to report cases of the disease. five months later, more than countries and territories around the world have reported over to million confirmed cases of covid- and a death toll of about thousand people. in brazil, the first confirmed covid- case occurred on february th, . this first case was a -year-old male who stayed, from february th to february th, in lombardy -an italian region where a significant outbreak was ongoing at that time. on march th, the health authorities in são paulo confirmed the first brazilian death from the new coronavirus. the victim, whose identity has not been disclosed, had been hospitalized in são paulo city.
preserving due proportions, covid-19 is not the first significant outbreak of infection to be declared a public health emergency of international concern by the who. year after year we have also experienced the zika and chikungunya outbreaks in the last decade, and we continue facing the huge consequences of dengue. confronting outbreaks in the large brazilian territory is a twofold problem. the first is the demographic and territorial size of the country, with an estimated population of million according to the brazilian institute for geography and statistics, and the heterogeneity intrinsic to its extensive territory. another problem, pointed out by past epidemics, is the recurring problem of under-reporting (de oliveira et al., ; stoner et al., ). covid-19, given its complexity and behavior, exposed the problem of under-reporting of disease occurrence not only in brazil but in several countries worldwide. as a consequence, the lack of information has raised a warning among researchers worldwide concerning models and estimates, since the available databases may not reliably reflect what has indeed been observed. focusing on modeling and estimation, aiming to preview the behavior and the speed of covid-19 growth, this paper presents an approach to address the problem of under-registration of covid-19 cases in brazil, proposing methodologies to work on the inaccuracy of the officially reported cases. then, we investigate a general framework for correcting under-reported data, making it possible to fit a model in a bayesian framework, which allows great flexibility and leads to complete predictive distributions for the true counts, therefore quantifying the uncertainty in correcting the under-reporting. several scenarios of under-reporting were considered in a simulation study, presenting the real impact of the lack of data. this paper is organized as follows. section describes the methodology for estimating the reported rates.
in section , we introduce the sir model for modeling epidemics. in section , we introduce the bayesian framework for the sir model with a modification to account for under-reporting. in section , we show the model application for covid-19 cases in brazil, and in section , we present a simulation study of the proposed model. finally, in section , we give some concluding remarks. although at first there was a real hunt for the size and the moment of the covid-19 case peak, the most important aspect of the outbreak is the growth rate of the infection. statistical and mathematical models are being used to preview the rates and analyze the growth curve behavior to assist public health managers in decision-making (cotta et al., ). according to kim et al. ( ), estimating the case fatality rate (cfr) is a high priority in response to this pandemic. this fatality rate is the proportion of deaths among all confirmed patients with the disease, which has been used to assess and compare the severity of the epidemic between countries. the rates can also be used to assess the healthcare capacity in response to the outbreak. indeed, several researchers are interested in estimating the cfr at the peak of the outbreak, analyzing its variation among different countries, and checking the influence of other features, such as age, gender, and physical characteristics, on the cfr of covid-19. aiming to estimate the cfr, first of all, let us set up the brazilian scenario of covid-19 case notification: the brazilian ministry of health collects daily all confirmed case data for brazil and all its states. although the data presented by the health authorities are official, they are only from patients with covid-19 confirmed by positive blood and/or swab tests. given the scarcity of tests for all the suspected individuals, the notified patients are only those with severe disease or those demanding hospitalization.
it is relevant to highlight that no clinically diagnosed patient, even those with symptoms compatible with the disease, has been officially counted, evidencing an under-reporting of the case frequency. faced with the lack of covid-19 tests, which naturally leads to under-reported data, before any modeling we wish to correct and update the current numbers, bringing them as close as possible to reality. following russel et al. ( ), we also rely on a delay-adjusted case fatality ratio to estimate under-reporting, using the incidence of cases and deaths to estimate the number of cases with known outcomes by u_t = ∑_{j=0}^{t} f_j c_{t−j}, where c_t is the daily incidence of cases at moment t, f_j is the proportion of cases with a delay of j days between confirmation and death, and µ_t = u_t / ∑_{i≤t} c_i represents the underestimation proportion of cases with known outcomes (nishiura et al., ). then, the corrected cfr is given by ccfr_t = m_t / ∑_{i≤t} u_i, where m_t is the cumulative number of deaths. to estimate the potential for under-reporting, we assume that the baseline cfr is . % with a % confidence interval from . % up to . %, as found in china (guan w-j, ). thus, the potential reporting rate is given by η_t = cfr_baseline / ccfr_t. epidemic models are tools widely used to study the mechanisms by which diseases spread, to predict the course of an outbreak, and to evaluate strategies to control an epidemic disease. several analyses of an epidemic spreading disease can be found in the literature, applying time series models (given the historical data), the log-logistic family of models (the chapman, richards, among others), and compartment models (bjørnstad, ). kermack & mckendrick ( ) proposed a class of compartmental models that simplified the mathematical modeling of infectious disease transmission. entitled the sir model, it is a set of general equations which explains the dynamics of an infectious disease spreading through a susceptible population.
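the delay-adjusted reporting-rate estimate described above can be sketched as follows. this is a minimal python illustration of the russel et al. style calculation; the incidence, death, and delay-distribution numbers below are illustrative assumptions, not the paper's data (the baseline cfr of 1.38% is the value cited from china).

```python
# sketch of the delay-adjusted reporting-rate estimate (russel et al. /
# nishiura et al. style). all inputs below are illustrative assumptions.

def known_outcomes(cases, delay_pmf):
    """u_t = sum_j f_j * c_{t-j}: cases whose outcome (death or recovery)
    is expected to be known by day t, given the confirmation-to-death
    delay distribution f."""
    u = []
    for t in range(len(cases)):
        u.append(sum(delay_pmf[j] * cases[t - j]
                     for j in range(min(t + 1, len(delay_pmf)))))
    return u

def reporting_rate(cases, cum_deaths, delay_pmf, baseline_cfr):
    """eta = baseline cfr / delay-adjusted cfr; values above 1 are capped."""
    u = known_outcomes(cases, delay_pmf)
    ccfr = cum_deaths[-1] / sum(u)   # corrected case-fatality ratio
    return min(1.0, baseline_cfr / ccfr)

# toy example: 10 days of incidence, a short delay distribution,
# and the 1.38% baseline cfr used in the paper
cases = [10, 20, 40, 80, 120, 150, 160, 150, 130, 110]
deaths = [0, 0, 1, 2, 4, 8, 13, 20, 28, 36]   # cumulative deaths
delay = [0.05, 0.10, 0.20, 0.25, 0.20, 0.12, 0.08]
print(round(reporting_rate(cases, deaths, delay, 0.0138), 3))
```

a reporting rate well below 1 here indicates that observed deaths are too numerous for the reported case counts, the signature of under-reporting.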
essentially, the standard sir model is a set of differential equations describing the susceptible (if previously unexposed to the pathogen), infected (if currently colonized by the pathogen), and removed (either by death or recovery) as follows: ds/dt = −βsi/n, di/dt = βsi/n − γi, dr/dt = γi, where s, i and r are the total numbers of susceptible, infected and removed individuals in the population, respectively, γ is the removal rate and β is the infectious contact rate. it is important to note that ds/dt + di/dt + dr/dt = 0, and so the total population, s(t) + i(t) + r(t), remains constant for all t ≥ 0. it is made available under a cc-by-nc-nd international license. the author/funder, who has granted medrxiv a license to display the preprint in perpetuity, is the copyright holder for this preprint. this version was posted may , . from the practical point of view, the most interesting issue is to estimate γ, which determines the average infection period, and the basic reproductive ratio r = β/γ, defined as the expected number of secondary infections from a single index case in a completely susceptible population (keeling & rohani, ). bayesian methods are used in several works (gelman et al., ); (paulino et al., ). the bayesian approach in the context of the sir model is a flexible way to account for uncertainty in the parameters, in the form of the disease transmission dynamic. the dirichlet-beta state-space model appears in some papers, such as osthus et al. ( ) and song et al. ( ). the target distribution for inference is the a posteriori distribution of the quantities of interest, more specifically β, γ, and r: the infectious contact rate, the removal rate, and the propagation rate, respectively. the application of this methodology is through markov chain monte carlo methods (mcmc), namely gibbs sampling and the metropolis-hastings algorithm (chib & greenberg, ).
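the sir system above can be integrated numerically; the paper later does this with a fourth-order runge-kutta scheme. below is a minimal, self-contained sketch in python (the paper's computations were done in r) using proportions of the population, so ds/dt = −βsi, di/dt = βsi − γi, dr/dt = γi. the parameter values are illustrative assumptions, not the paper's estimates.

```python
# minimal sketch: fixed-step rk4 integration of the sir equations
# in population proportions. beta, gamma, and the initial state are
# illustrative assumptions.

def sir_rhs(state, beta, gamma):
    s, i, r = state
    return (-beta * s * i, beta * s * i - gamma * i, gamma * i)

def rk4_step(state, h, beta, gamma):
    def add(u, v, c):  # u + c*v, componentwise
        return tuple(a + c * b for a, b in zip(u, v))
    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(add(state, k1, h / 2), beta, gamma)
    k3 = sir_rhs(add(state, k2, h / 2), beta, gamma)
    k4 = sir_rhs(add(state, k3, h), beta, gamma)
    return tuple(state[j] + h / 6 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j])
                 for j in range(3))

beta, gamma, h = 0.5, 0.1, 0.1     # assumed rates, so r0 = beta/gamma = 5
state = (0.999, 0.001, 0.0)        # initial proportions (s, i, r)
for _ in range(1000):              # integrate over 100 time units
    state = rk4_step(state, h, beta, gamma)
print(round(state[2], 3))          # final removed fraction
```

note that the right-hand sides sum to zero, so s + i + r is conserved by the scheme up to round-off, mirroring the constant-population property stated above.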
the use of the dirichlet distribution for the proportions of susceptible, infected, and removed individuals in the target population is a feasible way to guarantee that the support set of these quantities has boundaries; for example, the number of infected individuals must always be positive. in this section, we present a modification to account for under-reporting in the context of the dirichlet-beta state-space model of osthus et al. ( ). this adaptation is based on a reparametrization of the beta distribution that includes the reported rate estimate, η, from equation ( ). the beta distribution, as is well known, is very flexible for modeling proportions, since its density can have quite different shapes depending on the values of the two parameters that index this distribution (ferrari & cribari-neto, ). for this reason, we reparametrized the beta model in such a way that we could obtain a regression structure for the means of the response variables associated with a precision parameter. let y_t^i be the reported infected proportion, y_t^r be the reported removed proportion, and θ_t = (θ_t^s, θ_t^i, θ_t^r) be the true but unobservable susceptible, infectious, and removed proportions of the population, respectively. hence, we rewrite the sir model in terms of these unobservable proportions. then, the distributions for y_t^i, y_t^r, and θ_t are specified so that φ = (β, γ, θ_0, κ, λ) is the parameter vector of this model and f(θ_{t−1}, β, γ) is the solution of the differential equations in ( ). note that it is necessary to obtain the solutions for the proportions θ_t^s, θ_t^i and θ_t^r.
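the mean-precision reparametrization of the beta distribution mentioned above (ferrari & cribari-neto) writes y ~ beta(λµ, λ(1 − µ)), so that e[y] = µ and larger λ means smaller variance. the quick sanity check below uses illustrative values of µ and λ, not the paper's fitted parameters.

```python
# mean-precision beta reparametrization: y ~ beta(lam*mu, lam*(1-mu))
# has mean mu; lam acts as a precision parameter. values are illustrative.
import random

random.seed(0)
mu, lam = 0.3, 200.0
draws = [random.betavariate(lam * mu, lam * (1 - mu)) for _ in range(20000)]
print(round(sum(draws) / len(draws), 3))   # sample mean, close to mu
```

this is the structure that lets the reported rate η enter the model through the mean of the observed proportions while λ controls the observation noise.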
these solutions can be found using the runge-kutta fourth-order method, in short rk4, for solving non-linear ordinary differential equations (mathews, ), and can be seen in appendix a. the official brazilian data consist of daily collections carried out by the national health department, with records of infected individuals and deaths in all states and the national territory, from february th, when the first case of covid-19 was registered, up to may th, . a lack of testing is notable in brazil, due to the registry of only severe cases, and consequently covid-19 cases are under-reported. taking this fact into account, we consider for this research not only the official data but also estimates of the reported rate. in order to obtain the estimate of the reported rate, assume that the delay from confirmation until death follows the same estimated distribution as that from hospitalization until death. using data from covid-19 in wuhan, china, between december th, , and january nd, , it has a lognormal distribution with mean of , median of . and standard deviation of . days (linton nm, ). this methodology, based on the delay from hospitalization until death, is reasonable since china was considered one of the countries that most tested its population for the virus and, consequently, is supposed to have a tiny under-reporting rate. using the methodology presented in section , the reporting rate in brazil, η, was estimated to be . , with % confidence interval from . up to . . prado et al. ( ) obtained a reporting rate of . with data from brazil until april th, . these results are similar to the analysis of ribeiro & bernardes ( ), which presents a . : under-reporting ratio, meaning that the real cases in brazil should be at least seven times the published number. table presents the rates for all states of brazil, from which we can observe that paraíba has the lowest reported rate, . , while roraima presents the highest reported rate, . . indeed, prado et al.
( ) found that paraíba and pernambuco had a low reporting rate compared with other states. for the adjustment of the bayesian model, the priors and hyper-parameters are specified as follows. γ - we assume that the average infection period is equal to days; thus, the a priori distribution of γ is lognormal with mean of . and variance of . , γ ∼ logn(− . , . ). the average infection period ρ comes directly from the γ parameter, that is, ρ = 1/γ. β - the reproduction number r of the disease is estimated by the ratio r = β/γ. we assume that the a priori distribution of r is lognormal with mean of and variance of ; thus, β values were obtained from the a priori distributions. the priors for k, λ_i, λ_r and θ were chosen according to osthus et al. ( ). the estimates from the a posteriori distributions of r, β, γ, k, λ_i and λ_r were obtained through mcmc methods, specifically gibbs sampling (geman & geman, ). to execute the sampling procedure, we used the r programming language (r core team, ), with the rjags package (plummer, ). the total number of iterations considered, as well as the discard (burn-in) and the minimum distance between one iteration and another (thin), were obtained through the criterion of raftery & lewis ( ) in the analysis of a pilot sample with , iterations.
the convergence diagnosis of the mcmc procedure was verified using the geweke (geweke, ) and the heidelberger and welch (heidelberger & welch, ) criteria, which are available in the coda package (plummer et al., ). table shows the p-values from the geweke and the heidelberger and welch convergence diagnostics, from which we conclude that the chains reached convergence for all parameters (p-value > . ). the inference was made considering the reported rate estimate in brazil, η̂ = . ; a chain of , iterations was generated, with a burn-in of , and a thin of , resulting in a final sample of , values. the parameter estimates are shown in table , in which β̂ = . and γ̂ = . are the major characteristics of the sir model, and k̂ = , . , λ̂_i = , . and λ̂_r = , . express the magnitude of the process error for the unknown proportions (θ) in the bayesian approach. the inference results show that r̂ = . , which expresses a high reproductive rate of the virus. also, ρ̂ = . days shows that the duration of virus infection is very close to a one-month period. using the parameter estimates from table and the latent proportion (θ), we obtained information about the peak of the sir curve for covid-19 transmission in brazil, that is, the time when the proportion of infected individuals reaches its maximum. the peak estimate is june th, , occurring between june nd and june th, and it is shown in figure . to evaluate the effect of the notification rate on the model's estimates, a simulation study was carried out. the model was estimated considering covid-19 data in brazil, assuming a reporting rate between . and . , varying every . .
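the geweke diagnostic used above compares the mean of an early segment of the mcmc chain with the mean of a late segment. the paper uses the coda package in r; the sketch below is a simplified python version that replaces coda's spectral-density variance estimate with the plain sample variance, which is only adequate for weakly autocorrelated chains - an assumption to keep in mind.

```python
# simplified geweke-style convergence check: z-score comparing the mean
# of the first 10% of a chain with the mean of the last 50%. the sample
# variance stands in for coda's spectral-density estimate (assumption).
import random
import statistics

def geweke_z(chain, first=0.1, last=0.5):
    a = chain[: int(len(chain) * first)]
    b = chain[int(len(chain) * (1 - last)):]
    se = (statistics.variance(a) / len(a)
          + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

random.seed(1)
stationary = [random.gauss(0.0, 1.0) for _ in range(5000)]  # converged chain
trend = [i / 5000 for i in range(5000)]                     # drifting chain
print(round(geweke_z(stationary), 2), round(geweke_z(trend), 1))
```

for a converged chain |z| stays small (roughly within ±2), while a drifting chain produces a huge z, which is the behavior the table of p-values summarizes.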
from the practical point of view, we conducted a simulation study to investigate the effects of under-reporting on the parameters of the sir model and how it impacts the pandemic curve behavior. for each value of η, a chain of , iterations was generated, with a burn-in of , and a thin of . figure shows the point estimates and % credible intervals for β and γ versus the reported rate values. it can be observed that as the reported rate increases, the β estimate becomes lower, which means that the infectious contact rate is underestimated when under-reporting is ignored. additionally, the removal rate γ remains almost constant as the reported rate increases, which means that it is not influenced by the rates. the graphics with the point estimates and % credible intervals for r and the infection period ρ versus the reported rates are shown in figure , from which we observe that r decreases as the reported rate increases and ρ remains roughly invariant; we can then conclude that the reproduction rate and infection period can be underestimated when under-reporting is ignored, giving an unrealistic impression of a tiny mean number of secondary individuals that a primary individual can infect, when in fact it is large. figure shows the estimated sir curves for covid-19 versus the reported rate, from which we observe that the lower the reported rate, the earlier the peak is reached, with a higher proportion of infected individuals. it is also observed that the contagion curves become similar to each other as the reported rates increase. these results reveal that the peak estimate of the covid-19 transmission curve
in brazil is compromised when the presence of under-reporting is ignored. finally, table presents the deviance information criterion (dic) (spiegelhalter et al., ), which indicates the sir model with the reported rate of . as the one that best fitted the simulated data, since its dic value is the lowest. these results suggest that the notification rate is very low. in this paper, we show that the method of adjusting cases by delay can be used to determine the reported rate of covid-19 cases. thus, it was possible to estimate that the rate of cases reported in brazil is . , which underestimates the real spread of the pandemic in the country. we therefore proposed a sir model with correction for under-reporting. the bayesian approach is a feasible way to deal with the parameters inherent to the sir model. the methods reached convergence in the application with the brazilian covid-19 data set. a reproductive rate of . was obtained, indicating that the epidemic is still booming in brazil. the simulation study revealed that the parameter estimates from the sir model, and the peak estimate, which is a concern of several researchers and health authorities, are sensitive to reporting rates.
future work may include the use of extended sir models such as the seir model (with compartments of susceptible, exposed, infected, and removed individuals) and, further, the consideration of different scenarios of isolation and quarantine as strategies for covid-19 transmission control.
bjørnstad, o. n. ( ). epidemics: models and data using r. springer.
chib, s., & greenberg, e. ( ). understanding the metropolis-hastings algorithm. the american statistician. url: http://www.jstor.org/stable/ .
cotta, r. m., naveira-cotta, c. p., & magal, p. ( ). parametric identification and public health measures influence on the covid-19 epidemic evolution in brazil. medrxiv. arxiv:https://www.medrxiv.org/content/early/ / / / . . . .full.pdf.
evaluating the accuracy of sampling-based approaches to the calculation of posterior moments
clinical characteristics of coronavirus disease 2019 in china
simulation run length control in the presence of an initial transient
modeling infectious diseases in humans and animals
containing papers of a mathematical and physical character
understanding and interpretation of case fatality rate of coronavirus disease
covid-19: us gives emergency approval to hydroxychloroquine despite lack of evidence
incubation period and other epidemiological characteristics of novel coronavirus infections with right truncation: a statistical analysis of publicly available case data
numerical methods for mathematics, science and engineering
early epidemiological assessment of the virulence of emerging infectious diseases: a case study of an influenza pandemic
a random-censoring poisson model for underreported data
forecasting seasonal influenza with a state-space sir model
lisboa: fundação calouste gulbenkian
rjags: bayesian graphical models using mcmc
coda: convergence diagnosis and output analysis for mcmc
análise de subnotificação do número de casos confirmados da covid-19 no brasil
r: a language and environment for statistical computing. r foundation for statistical computing
comment: one long run with diagnostics: implementation strategies for markov chain monte carlo
estimate of underreporting of covid-19 in brazil by acute respiratory syndrome hospitalization reports
going global - travel and the novel coronavirus. travel medicine and infectious disease
using a delay-adjusted case fatality ratio to estimate under-reporting. available at the centre for mathematical modelling of infectious diseases repository
an epidemiological forecast model and software assessing interventions on covid-19 epidemic in china.
medrxiv.
bayesian measures of model complexity and fit
a hierarchical framework for correcting under-reporting in count data
who announces covid-19 outbreak a pandemic
let f(θ_{t−1}, β, γ) be the runge-kutta rk4 approximation to the sir model. thus, key: cord- -og ybfzf authors: marinov, tchavdar t.; marinova, rossitza s. title: dynamics of covid-19 using inverse problem for coefficient identification in sir epidemic models date: - - journal: nan doi: . /j.csfx. . sha: doc_id: cord_uid: og ybfzf abstract this work deals with the inverse problem in epidemiology based on a sir model with time-dependent infectivity and recovery rates, allowing for a better prediction of the long-term evolution of a pandemic. the method is used for investigating the covid-19 spread by first solving an inverse problem for estimating the infectivity and recovery rates from real data. then, the estimated rates are used to compute the evolution of the disease. the time-dependent parameters are estimated for the world and several countries (the united states of america, canada, italy, france, germany, sweden, russia, brazil, bulgaria, japan, south korea, new zealand) and used for investigating the covid-19 spread in these countries. the inverse problem for estimating the time-dependent transmission and removal rates in the sir epidemic model is derived and solved. the minimization problem uses the entire dataset, with data available on june , for estimating the non-constant rates. the obtained numerical results demonstrate that the transmission and removal rates and the unknown functions are accurately estimated. the numerically computed rates are used for forecasting the covid-19 pandemic for the world and a number of countries. the results of this research give insight into the pandemic in parts of the world and could help in determining policy. the sir model is a good choice for the short period of time of this epidemic; however, it possesses known limitations in the case of a long-term infectious disease.
in future, we plan to use other models. depending on future developments of the disease, we may consider models addressing a non-constant population, latency, reinfection, and vaccines. the covid-19 coronavirus appeared in late 2019 and quickly spread across many countries. according to [ ] and [ ], by june , , there were more than million confirmed cases of infected people, with more than , reported deaths globally. governments closed the so-called non-essential businesses and services for weeks in order to slow down the growth of infections - especially among vulnerable populations - and thus save lives. numerous mathematical models have been developed to forecast the future of the covid-19 epidemic spread worldwide. they are used to assist governments in making decisions to cope with the virus and its consequences. as of june , there are no vaccines for this highly contagious disease, nor efficient antiviral drugs. mathematical modeling and forecasting of the spread of epidemic diseases has a long history; see [ ], [ ], [ ], and [ ]. the earliest published paper on mathematical modeling of the spread of disease was carried out by daniel bernoulli. trained as a physician, bernoulli created a mathematical model to defend the practice of inoculating against smallpox [ ]. infectious diseases include measles, malaria, varicella, hiv, ebola, and sars. a systematic review of the risk of death associated with middle east respiratory syndrome (mers), as well as risk factors for associated complications, is given in [ ]. an analysis and forecast of covid-19 spreading in china, italy and france was given in [ ]. there is no vaccination for some infectious diseases, only preventive practices; see [ ]. kermack and mckendrick [ ] introduced their model in 1927. their theory is a mathematical hypothesis proposed to explain the rapid rise and fall in the number of people infected with a contagious illness in a closed population over time.
it is the origin of sir (susceptible-infected-recovered) type models. this formalism is the basis of all current modeling of the dynamics and evolution of infectious diseases; see [ ], [ ], [ ], [ ], and [ ]. the models are useful in understanding the basic principles of the system, even in choosing a proper policy of infectious disease control. covid-19 has attracted attention in recent publications [ ], [ ], [ ], [ ], [ ], and [ ]. the dynamical systems for epidemics are highly nonlinear - they include diverse characteristics, from population specifics to the immune system of an individual. every new epidemic must be studied and mathematical models constructed to answer important questions on handling the disease. the parameters in the sir model, the rate of transmission a and the rate of removal b, depend on the evolution of the epidemic disease over time; see [ ]. complex and realistic mathematical models are used to assist policy decision making; recent relevant works include [ ]. this work aims to create a method that can accurately identify the time-dependent parameters of the sir system using real data and then use the computed parameter values to predict the spread of the epidemic. if some conditions change, then a projection based on the "old" data will no longer be accurate and will require adjustment. parameter estimation is referred to as the inverse modeling problem. it means adjusting the parameters of a mathematical model to reproduce measured data. model deficiencies are usually due to inaccurate parameters and must always be addressed. the inverse problem is crucial for calibrating the model and for controlling the model parameters. approaches involving inverse problems can be successfully applied to a variety of important processes, including the spread of infectious diseases, allowing epidemiologists and public health specialists to make predictions on epidemics [ ].
the present work extends the method proposed in [ ] for finding optimum values for the infectivity and recovery rates. in [ ], these values are assumed to be constant over the whole interval, because we applied the method to a short outbreak of influenza instead of considering the long-term evolution of a pandemic, as in the case of covid-19. here, we assume the infectivity and recovery rates are functions of time, namely a = a(t) and b = b(t). the sir model is undoubtedly the most famous mathematical model for the spread of an infectious disease. a constant population of size n is divided into three classes: susceptible s, infective i and removed r. removed individuals r are no longer susceptible nor infectious for whatever reason; for example, they have recovered from the disease and are now immune, or they have been vaccinated, or perhaps they have died from the disease. since isolated infectious people are still capable of infecting individuals from class s (see, for example, the large number of infected medical personnel), we count these people as members of the class i. the diagram for the sir model is given in fig. . there exist other models considering additional events and features, such as birth and death rates (non-constant population), vaccine effect, reinfection, latent period, and so on. in this work, we use the sir model because the considered time period is relatively short. we assume that the population under study is well mixed, so that every person has an equal probability of coming into contact with every other person. this major assumption does not hold in many situations [ ], e.g. sexually transmitted diseases and covid-19 social distancing. let a∆t be the probability that a random infectious individual infects a random susceptible individual during the time period ∆t.
then, with s(t) susceptible and i(t) infectious individuals, the expected number of newly infectious individuals in the total population n during time ∆t is a s i ∆t, i.e., the rate of change of the class s(t) is −a s i, where a > 0. we also assume that infectious individuals leave the i(t) class with rate b, and they move directly into the r(t) class. there are reports of individuals recovered from covid-19 who are re-infected; hence, we added a dashed arrow from the class r to the class i. since the number of these cases at the present moment is very limited, we cannot estimate the rate c, and we do not include this option in the system. since the differential equations ( ) and ( ) do not depend on r, it is convenient to split the system into two parts - equations ( ) and ( ) as one system, and equation ( ) by itself. if the coefficients a and b are given, we can derive initial conditions from the given data and solve the problem numerically by any of the well-known methods for initial value problems (ivps). we refer the reader to [ ] for more information on methods for solving ivps numerically. the main question is how to find the coefficients a and b for a given infection, e.g. covid-19. the problem of estimating the constants a and b is an inverse problem solved in [ ]. a similar approach for identifying coefficients in an euler-bernoulli equation from over-posed data is used in [ ] and [ ]. we assume that (from the available data) the values of s and i are known at two time moments, the initial time moment t_i and the final time moment t_f, and if the coefficients a and b are known, the solution of equations ( ) and ( ) can be determined using the initial conditions ( ). in general, the terminal conditions ( ) are not satisfied exactly. therefore, if the coefficients a and b are given, the problem is overdetermined. let us assume that the coefficients a and b are constant and unknown.
in this case, the general solution of the system ( ), ( ) depends on four constants - two constants from the integration and the unknown coefficients a and b. the number of conditions/equations in ( ) and ( ) is also four; hence, the problem of identifying the coefficients along with the functions s and i is well-posed, provided that an exact solution exists (or can be obtained). this type of problem belongs to the class of so-called inverse problems. still, the problem of obtaining (s, i) and (a, b) from equations ( ) and ( ) under the conditions ( ) and ( ) could also be ill-posed. for arbitrary values of s_i, s_f, i_i, i_f, there may be no solution (s, i), (a, b) satisfying equations ( ), ( ) and all of the conditions in ( ), ( ). for this reason, we assume that the problem is posed correctly after tikhonov [ ], i.e., it is known a priori that a solution of the problem exists. in other words, the boundary data have "physical meaning" and, therefore, a solution exists. we are now ready to construct an algorithm for approximating the solution (s, i), (a, b) of the inverse problem ( ), ( ), ( ), ( ). for estimating the coefficients a and b of the sir model, we use an approach that is very similar to the variational method used in [ ]. this method is a generalization of the least squares method. more details about the method for transforming the inverse problem into a correct direct problem can be found in [ ], [ ], [ ], [ ]. the original sir model assumes that the coefficients a and b are constants. in real-life situations, e.g. the covid-19 pandemic, the coefficients vary in time, i.e., a = a(t) and b = b(t). the coefficients are affected by governments' restrictions, societal experiences, treatment, among others. a similar idea was proposed in [ ]. let d be a data set of function values s(t_k), i(t_k) at some time moments t_1, t_2, . . . .
we assume there exists an algorithm for solving the inverse problem ( ), ( ), given the conditions ( ), ( ) and the data set d. here, we consider a and b to be piece-wise constant (step) functions of time. the constants a k and b k can be estimated by solving the inverse problem ( ), ( ), ( ), ( ) using s i = s(t k− ), s f = s(t k ), i i = i(t k− ), and i f = i(t k ), for k = , , . . . the epidemic grows if the derivative di/dt > 0 and decreases if di/dt < 0. an important parameter, which characterizes infectious diseases, is the reproduction rate (also called reproduction number or ratio) of the infection. from equations ( ) and ( ), it follows that ρ(t) = a(t)s(t)/b(t); consequently, the epidemic grows if ρ(t) > 1 and decreases if ρ(t) < 1. the reproduction rate ρ(t) represents the expected number of secondary infections produced by a single primary infected person. the reproduction rate in hubei province (china), using case-report data from january to february , was estimated in [ ] as . (with % confidence interval . . ). the reproduction rate ρ(t) also relates to the fraction of the population that gets sick. another important characteristic of an infectious disease is the so-called herd immunity: the minimum fraction of the population, f̄ , that is required to have immunity in order to prevent an epidemic [ ] . the stability analysis of the sir model shows that f̄ = 1 − 1/ρ. the society can acquire herd immunity in two ways. the "natural" way is to let the epidemic spread until the required fraction of the population, f̄ , obtains immunity. the most common "non-natural" way is vaccination. since vaccines are not presently available for covid- , some countries, e.g., sweden, tried the "natural" way. as of this moment, their approach is questionable, judging by the results for their pandemic spread; see section . . many countries have been practicing social distancing and other measures to slow down the transmission rate a of the epidemic.
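the two threshold quantities just introduced can be written down directly. the sketch below uses the relations from the text, ρ(t) = a(t)s(t)/b(t) and f̄ = 1 − 1/ρ; the numeric values of a, b, and s are illustrative assumptions, not estimates for any real outbreak.

```python
# Reproduction rate rho(t) = a(t) * S(t) / b(t) from the SIR balance
#   dI/dt = a*S*I - b*I = b*I*(rho - 1),
# and the herd-immunity fraction f = 1 - 1/rho implied by it.
# The rate values below are illustrative only.

def reproduction_rate(a: float, b: float, s: float) -> float:
    """Expected secondary infections per primary case, a*S/b."""
    return a * s / b

def herd_immunity_fraction(rho: float) -> float:
    """Minimum immune fraction needed to stop growth (meaningful for rho >= 1)."""
    return 1.0 - 1.0 / rho

rho = reproduction_rate(a=0.3, b=0.1, s=1.0)   # rho = 3 with a fully susceptible population
f_bar = herd_immunity_fraction(rho)            # f = 1 - 1/3 = 2/3
```

at ρ = 1 the herd-immunity fraction is exactly zero, matching the threshold interpretation: no immunity is needed when the epidemic cannot grow.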
a method for solving the inverse problem in the case of constant coefficients a and b is given in [ ] , along with a discussion of some theoretical aspects. here, we present the numerical algorithm for estimating the coefficients when they depend on time, i.e., a = a(t) and b = b(t). the inverse problem for the time-dependent infectivity and recovery rates follows the idea of the method used in [ ] : the original problem is replaced by a minimization problem. for the correctly embedded problem, a difference scheme and a numerical algorithm can be constructed. first, we consider the sub-problem in the case of data values given at two time moments. since we are concerned with the numerical solution of the system ( )-( ), we seek approximations of the functions s(t), i(t), and r(t) at a discrete set {t , t , . . . , t n } of points in the interval [t i , t f ], where t i is the initial time moment, t f is the final time moment, and n is an integer greater than . the mesh of equidistant points is shown in fig. . we define the step size as τ = (t f − t i )/(n − 1). the nodes are the equidistant points t k = t i + kτ , k = , , . . . , n. we are now ready to discretize equations ( ) and ( ) ; this discretization ensures a second order of approximation, o(τ ²). since the problem ( )-( ) is non-linear, we use an iterative procedure, assuming that s̄ k and ī k are given from the previous iteration. consider the function Φ(a, b, s , . . . , s n− , i , . . . , i n− ), where ε k and δ k are the residuals of equations ( ) and ( ), respectively. since the function Φ is a homogeneous quadratic function of ε k and δ k , its absolute minimum is zero. on the other hand, the function Φ attains its minimum if and only if ε k = 0 and δ k = 0 for all k = , , . . . , n − . hence, there exists a one-to-one correspondence between the solution of the system of equations ( ), ( ), ( ), ( ) and the problem of minimizing the function Φ under the conditions ( ) and ( ).
the necessary conditions for the minimization of the function Φ with respect to its arguments s k and i k are given by conditions ( ) , which yield the corresponding difference equations for k = , , . . . , n − . adding the initial and terminal conditions ( ) and ( ), we obtain a well-posed linear system of (n + ) equations for the unknown sets of values (s , s , s , . . . , s n ) and (i , i , i , . . . , i n ). we rewrite the function Φ in the form ( ) , and the necessary conditions for the minimization of the function Φ with respect to a and b then give the solution of the system ( ), ( ) explicitly. let us assume that the number of infectious individuals at some time moments ν , ν , . . . , ν m is known and given by values σ l for l = , , . . . , m, while the values of the coefficients a and b are unknown. suppose that for every ≤ l ≤ m there exists an index k l such that ν l = t k l , i.e., the set of time moments {ν , ν , . . . , ν m } is a subset of the set of mesh nodes {t , t , t , . . . , t n }. to simplify the calculations, let us introduce the notation χ k as follows: if there exists k ∈ { , , , . . . , n} such that t k = ν l , ≤ l ≤ m, then χ k = σ l and µ k > , otherwise χ k = and µ k = , where µ k is the weight of equation ( ) in the lsm. similarly to subsection . , consider the function Φ, where ε k and δ k are the residuals of the equations ( ) and ( ), respectively. the necessary conditions for the minimization of the function Φ with respect to s k and i k are given by equations ( ) . for the case under consideration, we obtain the corresponding difference equations for k = , , . . . , n − . adding the initial and terminal conditions ( ), ( ), we obtain a well-posed linear system with (n + ) equations for the unknown sets of values (s , s , s , . . . , s n ) and (i , i , i , . . . , i n ). the equations for a and b in this case are the same as the equations ( ) and ( ) derived in subsection . .
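the closed-form least-squares solution for a and b can be illustrated on synthetic data. the sketch below is a deliberately simplified stand-in for the full algorithm: it discretizes s' = −a s i and i' = a s i − b i with central differences, treats the trajectory itself as known (unlike the full method, which also solves for s k and i k ), forms the two normal equations ∂Φ/∂a = ∂Φ/∂b = 0 for the coefficients, and solves the resulting 2×2 system. helper names and parameter values are illustrative assumptions.

```python
# Recover constant a and b from a sampled (S, I) trajectory by least squares.
# Residuals: eps_k = dS_k + a*S_k*I_k,  del_k = dI_k - a*S_k*I_k + b*I_k,
# with dS_k, dI_k central differences. Setting dPhi/da = dPhi/db = 0 gives
#   [ 2P  -Q ] [a]   [u]     P = sum (S I)^2,  Q = sum (S I) I,  R = sum I^2,
#   [ -Q   R ] [b] = [v],    u = sum (S I)(dI - dS),  v = -sum I dI.

def simulate_sir(s0, i0, a, b, tau, n_steps):
    """Heun-scheme forward solve, used only to manufacture synthetic data."""
    s, i, out = s0, i0, [(s0, i0)]
    for _ in range(n_steps):
        sp = s + tau * (-a * s * i)
        ip = i + tau * (a * s * i - b * i)
        s, i = (s + 0.5 * tau * (-a * s * i - a * sp * ip),
                i + 0.5 * tau * ((a * s * i - b * i) + (a * sp * ip - b * ip)))
        out.append((s, i))
    return out

def fit_ab(traj, tau):
    P = Q = R = u = v = 0.0
    for k in range(1, len(traj) - 1):
        s, i = traj[k]
        ds = (traj[k + 1][0] - traj[k - 1][0]) / (2 * tau)
        di = (traj[k + 1][1] - traj[k - 1][1]) / (2 * tau)
        si = s * i
        P += si * si; Q += si * i; R += i * i
        u += si * (di - ds); v += -i * di
    det = 2 * P * R - Q * Q      # positive (Cauchy-Schwarz) when I is not identically 0
    a_hat = (u * R + Q * v) / det
    b_hat = (2 * P * v + Q * u) / det
    return a_hat, b_hat

traj = simulate_sir(s0=0.99, i0=0.01, a=0.4, b=0.1, tau=0.02, n_steps=2000)
a_hat, b_hat = fit_ab(traj, tau=0.02)
```

on exact data the normal equations return the true coefficients identically; here the only errors come from the o(τ²) discretization, so the recovered values sit very close to the rates used to generate the trajectory.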
we use the following approach for solving the system ( ), ( ), ( ), ( ): i) with given initial values for ŝ, î, a, b, the system ( ), ( ), ( ), ( ) is solved for the functions s and i. ii) the deviations of the values of s and i from ŝ and î are computed; if they are smaller than a given tolerance ε , the algorithm proceeds to iii); otherwise, ŝ and î are replaced by s and i, respectively, and the calculations return to i). iii) the coefficients a and b are computed from ( ) and ( ) . if the difference between the new and old values of a and b is less than ε , the calculations terminate; otherwise, the iterations continue, going back to i). we use ε = · − in all calculations presented here. . . parameters a, b, and ρ as functions of time. the data set ( ) contains the number of infected individuals for every day over a period of m days. we divide the set into subsets of fixed length of p days (say p = , or , or days), as shown in fig. . next, we apply the algorithm presented in subsection . for every sub-interval [k − p + , k], for k = p, p + , . . . , m. consequently, we obtain a k and b k as functions of time, and for every sub-interval we calculate the value of the reproduction rate. models can help us understand what the worst-case scenario would be and plan proper actions to achieve the best possible outcome under the given constraints; they cannot accurately predict what will happen. the forecasts of the epidemic spread are based on the available covid- data as of june , , at [ ] and [ ] . our experiments indicate that the optimal value of ε is − . the parameters a, b, and ρ as functions of time are computed with τ = / and µ = τ . they are estimated with calculations performed over , , and/or day intervals. the dynamics of the covid- disease are projected based on the values of the parameters taken at the last day of the period used for the transmission and recovery rates estimation. this section presents our results for the covid- pandemic.
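the sliding-window procedure can be sketched as follows. the per-window estimator below is a hedged stand-in for the full variational algorithm: it fits a from s' = −a s i and then b from i' = a s i − b i by simple least squares on central-difference residuals, which is enough to show how piece-wise constant a k , b k emerge from windowed data. the window length, the rates, and the step change in a(t) (mimicking, say, a restriction taking effect) are illustrative assumptions, and all helper names are hypothetical.

```python
# Piece-wise constant estimation of a(t) and b(t) over fixed-length windows.
# Synthetic data: a(t) drops from 0.4 to 0.2 at t = 20, b = 0.1 throughout.

def a_of_t(t):
    return 0.4 if t < 20.0 else 0.2

def simulate(s0, i0, b, tau, n_steps):
    s, i, out = s0, i0, [(s0, i0)]
    for k in range(n_steps):
        a = a_of_t(k * tau)
        sp = s + tau * (-a * s * i)
        ip = i + tau * (a * s * i - b * i)
        s, i = (s + 0.5 * tau * (-a * s * i - a * sp * ip),
                i + 0.5 * tau * ((a * s * i - b * i) + (a * sp * ip - b * ip)))
        out.append((s, i))
    return out

def fit_window(traj, tau, lo, hi):
    """Least-squares a and b over sample indices [lo, hi), window edges excluded."""
    num_a = den_a = 0.0
    pts = []
    for k in range(lo + 2, hi - 2):
        s, i = traj[k]
        ds = (traj[k + 1][0] - traj[k - 1][0]) / (2 * tau)
        di = (traj[k + 1][1] - traj[k - 1][1]) / (2 * tau)
        num_a += -ds * s * i           # 1-D least squares for a from S' = -a S I
        den_a += (s * i) ** 2
        pts.append((s, i, di))
    a_hat = num_a / den_a
    num_b = sum(i * (a_hat * s * i - di) for s, i, di in pts)
    den_b = sum(i * i for _, i, _ in pts)
    return a_hat, num_b / den_b

tau, w = 0.02, 1000                     # each window spans 20 time units
traj = simulate(s0=0.99, i0=0.01, b=0.1, tau=tau, n_steps=2 * w)
a1, b1 = fit_window(traj, tau, 0, w)    # window before the change
a2, b2 = fit_window(traj, tau, w, 2 * w)  # window after the change
```

the two windows recover the two plateau values of a(t) and the constant b, which is exactly the piece-wise constant picture the text builds ρ(t) from.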
in particular, the following characteristics are important for understanding the infectious disease dynamics: • estimated up-to-date transmission rate a(t); • estimated up-to-date removal rate b(t); • estimated up-to-date reproduction rate ρ(t); • projected covid- infectious cases i(t), using the up-to-date estimated rates taken at the last time period for which data are available. the functions i(t), s(t), and r(t) satisfy the direct problem if the coefficients are known. after estimating a and b, we find the numerical solution of the direct problem ( )-( ) with proper initial conditions in order to project the future behavior of the epidemic. in all calculations, we use a second-order runge-kutta method, a type of predictor-corrector method: the first (predictor) step is followed by a corrector step that improves the prediction, for k = , , . . . . we use τ = . for solving the direct problem numerically by the runge-kutta method. it must be noted that these forecasts are valid only as long as the rates remain as estimated; however, since covid- is a highly contagious epidemic about which very little is known, the rates of transmission, removal, and reproduction may suddenly change due to measures and/or events affecting the spread. over the last several months, people have learned how to protect themselves from being infected; consequently, this has helped keep the transmission and reproduction rates low. the values of a, b, and ρ for the last day are given in table . forecasts based on the last , , and day interval estimations of a and b are given in fig. (d) . the reproduction rate ρ is still greater than one, although decreasing. the values for the last day are given in table . the projections based on the last , , and day interval estimations of a and b are given in fig. (d) . the transmission rate a and the reproduction rate ρ have increased lately. the values for the last day are given in table . the projections based on the last , and day interval estimations of a and b are given in fig. (d) . canada started reopening some services and businesses in several provinces around may , .
however, during the month of june, the transmission rate a and the reproduction rate ρ steadily decreased. the values for the last day are given in table . the projections based on the last , , and day interval estimations of a and b are given in fig. (d) . the data show that the epidemic is slowing down significantly and may disappear soon. the values for the last day are given in table . three projections, based on the last , and day interval estimations of a and b, are given in fig. (d) . the reproduction rate had been less than one since the middle of may but, by the end of may, it was again greater than one and the epidemic spread is increasing. figure : estimated values of a, b, ρ used for projections of i for france. . . germany. fig. (a)-(c) presents the obtained values of the parameters a, b, and ρ as functions of time, calculated over , , and day intervals. the values of a, b, and ρ for the last day are given in table . the projected values, based on the last , , and day interval estimations of a and b, are given in fig. (d) . the reproduction rate has been less than one since the middle of april, leading to a decrease of the epidemic spread; over the last week, however, the reproduction rate has tended to increase. figure : estimated values of a, b, ρ used for projections of i for germany. . . sweden. fig. (a)-(c) presents the obtained values of the parameters a, b, and ρ as functions of time, calculated over , , and day intervals. the values of a, b, and ρ for the last day are given in table . three forecasts, based on the last , and day interval estimations of a and b, are given in fig. (d) . the authors could not find detailed data for sweden after may . the values for the last day are given in table . three projections, based on the last , and day interval estimations of a and b, are given in fig. (d) . russia has flattened the curve of the reproduction rate ρ, with values around one lately. the values for the last day are given in table . three projections, based on the last , , and day interval estimations of a and b, are given in fig. (d) .
the transmission rate a decreases; however, the reproduction rate ρ is still greater than one. the values for the last day are given in table . the projections, based on the last , , and day interval estimations of a and b, are given in fig. (d) . the reproduction rate ρ was less than one for a while; however, it started to increase about two weeks ago. the values for the last day are given in table . the projections, based on the last , , and day interval estimations of a and b, are given in fig. (d) . the reproduction rate became less than one in early may, causing the covid- spread to decrease. the values for the last day are given in table . the projections, based on the last , , and day interval estimations of a and b, are given in fig. (d) . the reproduction rate became greater than one during the last three weeks. there are similarities between the epidemic spread in some countries. fig. presents the obtained values of the reproduction rate ρ as a function of time, calculated over day intervals, for four european countries: france, germany, italy, and bulgaria. the obtained values of the reproduction rate ρ as a function of time, also calculated over day intervals, for the world, the united states of america, russia, and brazil are shown in fig. . the inverse problem for estimating the time-dependent transmission and removal rates in the sir epidemic model is derived and solved. the minimization problem uses the entire dataset, with data available as of june , . for further developments of the disease, we may consider models addressing non-constant population, latency, reinfection, and vaccines.
a deterministic model for highly contagious diseases: the case of varicella
classical and modern numerical analysis: theory, methods and practice, chapman & hall/crc numerical analysis and scientific computing
infectious diseases of humans
numerical modeling and theoretical analysis of a nonlinear advection-reaction epidemic system
interepidemic intervals in forced and unforced seir models
modelling mathematical methods and scientific computation
an attempt at a new analysis of the mortality caused by smallpox and of the advantages of inoculation to prevent it
identification of heat-conduction coefficient via method of variational imbedding
a note on the existence and stability of an inverse problem for a sis model
mathematical epidemiology of infectious diseases: model building, analysis and interpretation
analysis and forecast of covid- spreading in china
impact of non-pharmaceutical interventions (npis) to reduce covid mortality and healthcare
qualitative study of a stochastic sirs epidemic model with information intervention
the mathematics of infectious diseases
analysis of sir epidemic model with information spreading of awareness
a contribution to the mathematical theory of epidemics
risk estimation of the sars-cov- acute respiratory disease outbreak outside china
early dynamics of transmission and control of covid- : a mathematical modelling study
global stability of a network-based sirs epidemic model with nonmonotone incidence rate
dynamical behavior of a stochastic multigroup sir epidemic model
novel numerical approach to solitary-wave solutions identification of boussinesq and korteweg-de vries equations
coefficient identification in euler-bernoulli equation from over-posed data
inverse problem for coefficient identification in sir epidemic models, computers and mathematics with applications
coefficient identification in elliptic partial differential equation
inverse problem for coefficient identification in euler-bernoulli equation
clinical determinants of the severity of middle east respiratory syndrome (mers): a systematic review and meta-analysis
mathematical biology. i. an introduction
serial interval of novel coronavirus (covid- ) infections, international journal of infectious diseases
characterization of the covid- pandemic and the impact of uncertainties, mitigation strategies, and underreporting of cases in south korea, italy, and brazil
mathematical models in biology: an introduction
viral evolution and transmission effectiveness
real-time forecasts of the covid- epidemic in china from
covid- infection: origin, transmission, and characteristics of human coronaviruses
the sir model for spread of disease
covid- : epidemiology, evolution, and cross-disciplinary perspectives
mathematics for life science and medicine
inverse problem theory and methods for model parameter estimation
methods for solving incorrect problems
novel coronavirus ( -ncov) situation reports - world health organization (who)
covid- coronavirus pandemic
mathematical modeling and epidemic prediction of covid- and its significance to epidemic prevention and control measures
key: cord- -jhokn bu authors: lachiany, menachem; louzoun, yoram title: effects of distribution of infection rate on epidemic models date: - - journal: phys rev e doi: . /physreve. . sha: doc_id: cord_uid: jhokn bu a goal of many epidemic models is to compute the outcome of the epidemics from the observed early infected dynamics. however, often, the total number of infected individuals at the end of the epidemics is much lower than predicted from the early dynamics. this discrepancy is argued to result from human intervention or nonlinear dynamics not incorporated in standard models.
we show that when variability in infection rates is included in standard susceptible-infected-susceptible ([formula: see text]) and susceptible-infected-recovered ([formula: see text]) models, the total number of infected individuals in the late dynamics can be orders of magnitude lower than predicted from the early dynamics. this discrepancy holds for [formula: see text] and [formula: see text] models in which the assumption that all individuals have the same sensitivity is eliminated. in contrast with network models, fixed partnerships are not assumed. we derive a moment closure scheme capturing the distribution of sensitivities. we find that the shape of the sensitivity distribution does not affect [formula: see text] or the number of infected individuals in the early phases of the epidemics. however, a wide distribution of sensitivities reduces the total number of removed individuals in the [formula: see text] model and the steady-state infected fraction in the [formula: see text] model. the difference between the early and late dynamics implies that in order to extrapolate the expected effect of the epidemics from the initial phase of the epidemics, the rate of change in the average infectivity should be computed. these results are supported by a comparison of the theoretical model to the ebola epidemics and by numerical simulation.
we show that when variability in infection rates is included in standard susceptible-infected-susceptible (sis) and susceptible-infected-recovered (sir) models, the total number of infected individuals in the late dynamics can be orders of magnitude lower than predicted from the early dynamics. this discrepancy holds for sis and sir models in which the assumption that all individuals have the same sensitivity is eliminated. in contrast with network models, fixed partnerships are not assumed. we derive a moment closure scheme capturing the distribution of sensitivities. we find that the shape of the sensitivity distribution does not affect r or the number of infected individuals in the early phases of the epidemics. however, a wide distribution of sensitivities reduces the total number of removed individuals in the sir model and the steady-state infected fraction in the sis model. the difference between the early and late dynamics implies that in order to extrapolate the expected effect of the epidemics from the initial phase of the epidemics, the rate of change in the average infectivity should be computed. these results are supported by a comparison of the theoretical model to the ebola epidemics and by numerical simulation. doi: . /physreve. . an important element in theoretical epidemiology is the epidemic threshold, which specifies the condition for an epidemic to grow. in mean-field epidemiological models, the concept of the basic reproductive number, ro, has been systematically employed as a predictor for epidemic spread and as an analytical tool to study the threshold conditions [ ] [ ] [ ] [ ] . an advantage of ro is that in many models it determines both the threshold for the emergence of an epidemic and the expected final outcome of an outbreak [ , ] . it has thus been widely used to gauge the degree of threat that a specific infectious agent will pose as an outbreak progresses [ , ] .
however, over and over again, differences have been observed between the predicted and observed sizes of epidemics; in most cases the observed epidemic is much smaller than predicted [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . indeed, recent studies raise doubt about the validity of forecasting the outcome of epidemics using the ro-based estimate [ , ] . specifically, given an estimate of ro, one can project, in the ordinary differential equation (ode) based susceptible-infected-recovered (sir) and susceptible-infected-susceptible (sis) models, the future course of the epidemic. however, recent results have shown that these estimates are much larger than the true extent of the disease outcome [ ] . this overestimation of the late dynamics based on the early dynamics has been raised by many authors in general cases, as well as specifically for the ebola virus [ , ] . we build upon these known observations to show that even in fully mixed models the early dynamics cannot be used to estimate the outcome of the late dynamics, and we propose an alternative approach. this overestimate can be explained by a slow decrease in ro. ro can be reduced by human intervention, such as the removal of sick individuals from the society or removal of carriers of the disease [ , ] , or a limitation of movement, either for the entire population or for people expressing clinical signs [ , ] , which reduces the effective reproductive number [ ] . ro can also be reduced by passive vaccination of the population [ ] or aggressive vaccination of the population [ , ] . all the models mentioned above deal with external elements reducing ro, and thus reducing the number of patients in steady state or the total number of removed individuals in sir models. we argue here that the measure of ro may not be indicative of the future number of patients, even if no external factors or vaccinations are involved.
consequently, additional information is required to estimate the number of people that will be affected by the disease in steady state, or when the epidemic is over. we propose here that the estimated change in the relative infection rate, defined as (di/dt)/i, can be a good way to estimate the steady-state number of infected individuals in sis models and the total number of removed individuals in sir models. we introduce an epidemic model in which each of the susceptible individuals has a different probability to get infected and the same recovery probability. this model differs from uniform models as well as from network models. in the standard sir and sis models, the probability of infection is constant for everyone [ ] [ ] [ ] [ ] . however, in reality, different people have different tendencies to become clinically sick and infectious (e.g., elderly people, children, or immune-deficient individuals [ ] [ ] [ ] [ ] [ ] [ ] ). in network sir models, the infectivity of each node is a function of its degree (and thus it varies among nodes). however, network models assume a constant interaction pattern, which may be realistic for sexually transmitted diseases but not for most noncontact (e.g., airborne or vehicle-led) transmission. moreover, in undirected networks, the probabilities to infect and to become infected are symmetric [ ] [ ] [ ] [ ] [ ] [ ] , which is again mainly appropriate for sexually transmitted diseases but not for airborne transmission. we present evidence here that, in a wide class of models, the variability in the probability to get infected can break the link between the early phase of the epidemics and the predicted outcome. we investigate the dynamical processes driving this result, its validity, and its consequences. we then compare the conclusions of this model to observed ebola outbreaks. such results have been presented in network models; however, in such models the connectivity is fixed over time [ ] [ ] [ ] [ ] .
in the current analysis, we show that inhomogeneity has a crucial effect even in fully mixed systems. the models used in this study are based on the sis and sir models. each susceptible individual has a different probability to get infected, but all infected individuals have the same probability to infect other susceptible individuals. specifically, individuals exist in two discrete states, "healthy" or "infected," in the sis model, and in three discrete states, "healthy," "infected," or "recovered," in the sir model. at each time step, each susceptible (healthy) individual i is infected with rate β i . at the same time, infected individuals are cured and become susceptible again with rate γ in the sis model, or recover in the sir model. the value of β i is only a function of the person getting infected and not of the person infecting, and it does not vary over time for a given individual. given a variable probability to get infected in the population, we define the number of susceptible individuals with a β i value between β and β + dβ as s(β)dβ and the number of infected individuals as i (β)dβ. we further define n (β) = s(β) + i (β). note that n (β) does not change over time and is thus equal to its initial condition. we have studied multiple distributions for this initial condition, as detailed below. s is the sum over all susceptible individuals with different probabilities to get infected, i is the sum over all infected individuals with different probabilities to get infected, and n is the sum over all susceptible and infected individuals. formally, β is the disease transmission rate (i.e., the probability that a person would be infected), and γ is the disease recovery rate. note that here we use an integral approximation of the discrete sum; we will further show that this approximation is consistent with numerical simulations.
the equations for the sis model used here are ds(β)/dt = −βs(β)i + γ i (β) and di (β)/dt = βs(β)i − γ i (β), with n = i + s. in the sir model, we add the recovered individuals class: r(β) is defined as the number of recovered individuals with a probability β to get infected. formally, none of the recovered individuals can be infected again in the sir model. in this model, n (β) = s(β) + i (β) + r(β). the equations of the sir model are ds(β)/dt = −βs(β)i, di (β)/dt = βs(β)i − γ i (β), and dr(β)/dt = γ i (β), where i is defined in eq. ( ), s is defined in eq. ( ), and r is defined above in eq. ( ) , with n = i + s + r. in the different models studied here, we implicitly consider four different distributions for the probability of susceptible individuals to get infected: (i) constant infection rate for all susceptible individuals. this is equivalent to the mean-field sis or sir models. we used β = × − , γ = , and a population size of n = ; thus ro = βn/γ = . all other models differ only in the distribution of β. (ii) uniform probability distribution of infection rates within a range × − < β < × − . (iii) normal probability distribution with mean μ and variance σ ², where the probability of having infection rate β is proportional to exp[−(β − μ)²/(2σ ²)]. we used μ = × − and σ = μ. (iv) scale-free distribution, where the probability of having infection rate β is p (β) = β −α , with α = . , and β limited to the range ( × − , × − ). monte carlo simulations of the systems studied have been performed with a population size of n = . we assign a different infection rate to each individual using the four distributions above. the population is initiated with a small number of infected individuals, i = . these initial infected individuals are random individuals (i.e., they have β values randomly chosen from the population). note that β is very low; however, β * n is high enough to allow the epidemics to spread, since n = . we assume full mixing and a mass action formalism, allowing a very small number of infected individuals to spread the pathogen.
formally, we assign each susceptible individual at each iteration a probability β i t to be infected. similarly, each infected individual is assigned in each iteration a probability γ t = t to be removed and become susceptible again. the simulation updating is synchronous. the dynamics are simulated for different parameter values. the odes were solved numerically using the matlab fourth-order runge-kutta method [ ] , as applied in the matlab ode function assuming nonstiff equations [ ] . we study the behavior of an epidemic outbreak assuming that each susceptible individual has a different probability of getting infected; however, once it is infected, its contribution to the total force of infection is constant. in other words, p [s(β) + i → i (β) + i ] = β. this represents, for example, people with different levels of susceptibility to a given disease. within this model, we study two possible models of recovery: either the recovered hosts are immunized, and then an sir model is used, or the recovered hosts become susceptible again, their β value does not change over time, and an sis model is used. to study the effect of the distribution of β (the probability to be infected) on both the initial and the late dynamics, we assume one of four possible β value distributions: constant, uniform, gaussian, or scale-free. in all cases, we keep the expected value of β equal among models. as will be further shown, the initial dynamics are affected only by the first moment of the distribution (the expected value of β), while the total number of infected individuals during the outbreak in the sir model, or the steady-state infected fraction in the sis model, can be strongly affected by the following moments. thus, for some distributions, it is impossible to predict the "outcome" of the epidemics from the observed initial dynamics and the resulting estimate of ro.
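the monte carlo procedure described above can be sketched compactly. the code below is a simplified stand-in for the paper's simulation, with illustrative parameters (population size, rates, horizon, random seeds) rather than the paper's exact settings: a discrete-time sis process in which each susceptible j is infected per step with a probability derived from β j , the current number of infected i, and the step size, comparing a constant-β population against a uniform-β population with the same mean.

```python
import math
import random

# Discrete-time Monte Carlo SIS with heterogeneous susceptibility beta_i.
# A susceptible j is infected in a step of length dt with probability
# 1 - exp(-beta_j * I * dt); an infected node recovers with 1 - exp(-gamma * dt).
# Parameters (N = 1000, gamma = 1, mean beta = 2e-3, so ro = 2) are illustrative.

def run_sis(betas, gamma=1.0, dt=0.1, n_steps=1000, i0=100, seed=0):
    rng = random.Random(seed)
    n = len(betas)
    infected = [False] * n
    for j in rng.sample(range(n), i0):
        infected[j] = True
    p_rec = 1.0 - math.exp(-gamma * dt)
    history = []
    for _ in range(n_steps):
        i_tot = sum(infected)
        nxt = list(infected)
        for j in range(n):
            if infected[j]:
                if rng.random() < p_rec:
                    nxt[j] = False
            elif rng.random() < 1.0 - math.exp(-betas[j] * i_tot * dt):
                nxt[j] = True
        infected = nxt
        history.append(sum(infected))
    return history

N, mu = 1000, 2e-3
draw = random.Random(42)
hist_const = run_sis([mu] * N, seed=1)                                      # constant beta
hist_unif = run_sis([draw.uniform(0.0, 2 * mu) for _ in range(N)], seed=2)  # same mean, wide spread

# quasi-steady-state infected numbers, averaged over the second half of each run
ss_const = sum(hist_const[500:]) / 500.0
ss_unif = sum(hist_unif[500:]) / 500.0
```

even in this small sketch the qualitative result claimed in the text shows up: the wide (uniform) β distribution settles at a visibly lower steady-state infected number than the constant-β population with the same mean.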
to examine the behavior of the infected class as a function of time, we developed a moment closure scheme, using the following notations. for the sir model, in addition to the above definitions, we define the higher-order moments as the products of eq. ( ) with eq. ( ), of eq. ( ) with eq. ( ), of eq. ( ) with eq. ( ) , and of eq. ( ) with eq. ( ). for the sis model in eq. ( ), the number of infected individuals can be estimated via the first-order equation (see appendix a), where i is the first order of eq. ( ) and e n (β) is the first order of eq. ( ). an iterative equation can be developed to estimate the higher orders. this would obviously lead to an infinite number of coupled equations; however, the number of equations can be limited using a simple moment closure method: the highest order of the set of odes is set to be zero, and the order below it is set to be a constant. note that more advanced schemes could be used [ ] [ ] [ ] ; however, this simple scheme agrees well with the simulations and is sufficient for the current analysis. for the sir model in eq. ( ), the number of infected individuals can be estimated similarly (see appendix a), where i is the first order of eq. ( ) , e n (β) is the first order of eq. ( ), and r is the first order of eq. ( ). we developed a similar scheme for the sir model, with eq. ( ) giving the iterative equation for the number of susceptibles, where s n is defined in eq. ( ) . the closure scheme above is also applied here: for example, if the stopping order is s , we set s = , and the previous order is set to a constant. the moment closure scheme is consistent with the dynamics of the infected population simulated using a monte carlo simulation with the corresponding distribution (figs. - ) . increasing the number of moments in the solution of eqs. ( ) and ( ) reduces the difference between the monte carlo simulations and the moment closure results (data not shown). for example, in the sis model, for the three cases of distribution of infection rates, an order of coincides with the results obtained from the simulation. in the sir model, only an eighth order was needed to reach a very high prediction accuracy.
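the coupling that forces a closure can be made concrete. the sketch below uses a binned version of the sis equations di (β)/dt = β(n(β) − i(β))i − γ i(β) and my own moment notation I_n = Σ β^n i and M_n = Σ β^n n (the paper's appendix notation is elided here, so these names are assumptions): multiplying the binned equation by β^n and summing gives dI_n/dt = I·(M_{n+1} − I_{n+1}) − γ I_n, i.e., every order depends on the next higher order, which is exactly why the hierarchy must be truncated. the code verifies this identity numerically for n = 0.

```python
# Moment hierarchy for a binned SIS model: with I_n = sum_j beta_j**n * i_j and
# M_n = sum_j beta_j**n * n_j, the equations di_j/dt = beta_j*(n_j - i_j)*I - gamma*i_j
# imply  dI_n/dt = I*(M_{n+1} - I_{n+1}) - gamma*I_n,
# so each order couples to the next higher one and a closure is needed.
# Bin count, rates, and seed sizes are illustrative assumptions.

B, N, mu, gamma = 50, 1000.0, 2e-3, 1.0
betas = [(j + 0.5) * (2 * mu / B) for j in range(B)]   # beta grid on [0, 2*mu]
n_bins = [N / B] * B

def moment(vals, bs, n):
    return sum((b ** n) * v for b, v in zip(bs, vals))

# advance the binned system a little so the check is non-trivial
dt, i = 1e-4, [0.001 * nj for nj in n_bins]            # small seed in every bin
for _ in range(5000):                                  # advance to t = 0.5
    I = sum(i)
    i = [ij + dt * (b * (nj - ij) * I - gamma * ij)
         for ij, b, nj in zip(i, betas, n_bins)]

# right-hand side of the n = 0 hierarchy equation ...
I = sum(i)
rhs = I * (moment(n_bins, betas, 1) - moment(i, betas, 1)) - gamma * moment(i, betas, 0)
# ... versus the left-hand side dI_0/dt from one explicit Euler step
i_next = [ij + dt * (b * (nj - ij) * I - gamma * ij)
          for ij, b, nj in zip(i, betas, n_bins)]
lhs = (sum(i_next) - sum(i)) / dt
```

the zeroth-order rate of change of the total infected population matches the moment expression exactly (up to floating point), confirming that dI_0/dt already involves the first-order moments of both n(β) and i(β).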
for example, in the sis model, for the three cases of distributed infection rates, an order of coincides with the results obtained from the simulation; in the sir model, only an eighth order was needed to reach a very accurate prediction. [figure caption: results for the sir model. the results of a monte carlo simulation, the full moment closure scheme, and the approximation of the initial dynamics were compared, and a good fit was obtained. for all four distributions studied, n = , γ = , and e n (β) = × − . the rising line is the initial-time approximation, and the two other, highly similar, lines are the full moment closure solution and the simulation results.] for the early dynamics of the sis model in eq. ( ), we can approximate the dynamics by neglecting elements of the order of −i i compared with the other elements; the number of recovered individuals follows eq. ( ). equation ( ) for the sis model and eq. ( ) for the sir model are consistent with the simulations (fig. ) . furthermore, as can be clearly seen, in both sis and sir, all distributions tested (uniform, gaussian, and scale-free) have the same dynamics as the constant infectivity. during the early dynamics, only the first moment of the susceptibility distribution affects the dynamics. thus, the higher moments of the distribution cannot be estimated from the observed value of r . however, these moments may affect the dynamics later in the epidemic. we thus examined whether the term neglected in the previous subsection, (−i i ), has a differential effect as a function of the higher moments of the infection rate distribution. the general solution of the infected class for eq. ( ) can be estimated (see appendix a), and the ratio between the solution of eq. ( ) and the solution without the neglected term in eq. ( ) is such that for t = , r = , while for t ≈ one can estimate it directly. the same analysis can be performed for the sir model. 
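the early-time claim above, that only the first moment of β matters, can be written as a one-line approximation. the exponential form below is a sketch under the stated neglect of the −i·i term; the numeric values are assumptions for the example.

```python
import math

def early_infected(i0, mean_beta, n, gamma, t):
    """Early-time approximation from the text: neglecting the -i*i term,
    i(t) ~ i0 * exp((E_N(beta) * n - gamma) * t), so only the first
    moment of the beta distribution (its mean) enters the early growth."""
    return i0 * math.exp((mean_beta * n - gamma) * t)

# any distribution with mean beta = 0.0024 gives the same early growth
early = early_infected(5, 0.0024, 500, 1.0, t=5.0)   # 5 * e^(0.2*5)
```

this is why, as the text notes, the higher moments of the distribution cannot be read off from the observed early growth rate alone.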
we examined again the effect of the neglected terms on the dynamics, with the general solution of the infected class in eq. ( ) approximated accordingly. the ratio between the solution without the neglected terms [eq. ( ) ] and the solution of eq. ( ), marked by r(t), follows with a similar approximation. one can clearly see that the difference between the models is affected by the value of e i (β)(t); thus, if e i (β)(t) differs among models, so will the resulting dynamics. as can be seen in fig. , for the gaussian and uniform distributions the values are close to those obtained with constant infectivity. however, in the sf distribution, r(t) deviates from the values obtained in the constant infectivity case. the source of the difference is the drastic change in e i (β) in the sf model over time, which affects the denominator of eq. ( ) for both the sis and sir models. this difference results from the effect of rare events in the sf distribution. while in the uniform and gaussian distributions the value of e i (β) is close to e s (β), this is not the case in the sf model, where early in the dynamics e i (β) ≫ e s (β). thus, in eq. ( ), r(t) is expected to decrease much faster than in the constant β scenario, as is indeed the case. the source of the difference between the sf and all other distributions is the presence of very high β values in the population (even if they are rare). people with such high values of β are almost automatically infected, and they increase e i (β) sharply; note that they also slightly decrease e s (β). in all other distributions, the variance of β is limited per definition by the requirement that all individuals have positive β values. thus, while the normal distribution cannot have a large variance, in the case of the sf distribution β could in principle take values in (0,∞). this is not the case in reality, since the sample is finite; still, the upper bound is a few orders of magnitude above the average (fig. ) . 
the same happens for the sir model, with the important distinction that the individuals with high β values are rapidly removed from the population, leading to a decrease in e i (β) following the initial sharp rise. thus, while initially all distributions show dynamics purely determined by r , the dynamics evolve differently as a function of the distribution of β. one is thus led to ask whether the difference in the distribution can be estimated from the early dynamics. the conclusion from the results above is that in order to estimate the future dynamics, it is not enough to know r ; the change in e i (β) should also be estimated. while this cannot be estimated directly, we can directly quantify its effect on the product s · e s (β) through the disease dynamics. specifically, a decrease in e s (β) is expected to lead to a parallel decrease in the relative infection rate i /i (the number of new infections divided by the current number of infected individuals). in the bottom graph of fig. , we calculated the difference in the number of infected individuals in every time step, divided by the total number of infected individuals at the same time. the total decrease in the expected value of β in the s compartment and the increase in the expected value of β in the i compartment are clear in the sf distribution, while the effect is much more limited in all other distributions. this can be clearly seen in the measures of e s (β), of e i (β), and of i /i . the mechanism driving the difference is straightforward: in a normal or uniform distribution the variance in β is limited, so even infecting the most susceptible individuals does not significantly affect the values of e i (β) and e s (β), whereas in the sf model the expected value of β is strongly affected by the tail of the distribution. we have shown in the previous subsections that the distribution of the infection rate has a drastic effect on the dynamics for sf distributions. for other distributions, the infected class dynamics are similar to those of the constant infectivity model, or slightly lower. 
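the per-step measure described above, new infections divided by the current infected count, is easy to compute over a moving window. a minimal sketch, with a made-up epidemic curve as the example input:

```python
def relative_infection_rate(infected, window=7):
    """Moving-window delta-i over i: the change in the infected count over
    the last `window` steps, divided by the current infected count. A
    sharp early drop in this quantity is the signature, described in the
    text, of highly susceptible hosts leaving the susceptible pool."""
    out = []
    for t in range(window, len(infected)):
        i_now = infected[t]
        out.append((i_now - infected[t - window]) / i_now if i_now else 0.0)
    return out

series = [10, 20, 40, 80, 120, 140, 150, 155]   # made-up epidemic curve
rates = relative_infection_rate(series, window=2)
```

on this toy curve the rate starts high during exponential growth and then falls as the outbreak saturates, which is the qualitative shape the text reads off the data.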
at the steady state of the system, the difference is further enlarged. in the sis model [eq. ( )], we can compute the equilibrium frequency of infected individuals (see appendix b) as the solution of an implicit equation whose coefficients c n are the taylor series coefficients obtained by expanding the terms [β − e n (β)] n . as expected from the intermediate period, in the constant, uniform, and gaussian distributions, the same number of infected individuals is obtained in the sis model (fig. ) . however, also in agreement with the intermediate period, the total number of infected individuals in equilibrium is much lower for the sf distribution. a similar analysis can be performed for the sir model, with similar results (fig. ) . the results obtained in both models in simulations and from eq. ( ) are equivalent (fig. ) . since higher moments of β are important, the first-moment approximation (which is affected only by the first moment of β) obviously fails to properly reproduce the number of infected individuals. the intuitive explanation for the effect of the distribution of β on the number of infected individuals in equilibrium, and for the resulting reduction in the number of people affected by the epidemic compared with the case of constant β, is the removal of individuals with high infection probability from the susceptible pool. this effect is directly quantifiable through the higher moments of β in the population and the resulting change in the first moment of β in the susceptible and infected pools. only in distributions where the higher moments are important will there be a difference between the models with constant infectivity and the models with larger variability. to confirm the effect of heterogeneity in β on the outcome of observed epidemics, and that the steady-state number of infected individuals is much smaller than expected from the constant transmission rate model, we compared observed epidemics with the analytical results described above. 
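the implicit equilibrium equation itself is elided in the extraction, so the form used below is an assumed but standard equivalent for this kind of heterogeneous-susceptibility sis model: at equilibrium each β-class satisfies β s(β) i = γ i(β), which gives i = Σ_k w_k n β_k i/(γ + β_k i) for a discretised distribution. a bisection sketch:

```python
def sis_steady_state(betas, weights, gamma, n_total):
    """Solve the (assumed) implicit equilibrium condition
        i = sum_k w_k * n_total * beta_k * i / (gamma + beta_k * i)
    for the endemic level i by bisection. (beta_k, w_k) is a discretised
    susceptibility distribution with weights summing to 1; i = 0 is
    always a root, so we bracket the endemic root inside (0, n_total)."""
    def excess(i):
        return sum(w * n_total * b * i / (gamma + b * i)
                   for b, w in zip(betas, weights)) - i
    lo, hi = 1e-9, float(n_total)
    if excess(lo) <= 0:             # below threshold: disease-free only
        return 0.0
    for _ in range(100):            # bisection on the sign of the excess
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# constant beta recovers the classic result i* = n - gamma/beta = 250
i_const = sis_steady_state([0.004], [1.0], gamma=1.0, n_total=500)

# a spread of beta values with the same mean gives a lower endemic level
bs = [0.0005 + 0.001 * k for k in range(8)]    # mean beta = 0.004
ws = [1.0 / 8] * 8
i_spread = sis_steady_state(bs, ws, gamma=1.0, n_total=500)
```

the spread-out distribution settling below the constant-β level is the numerical counterpart of the reduction argued for in the text.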
we studied the spread of the ebola virus in three african countries. the ebola virus is one of five viruses of the ebolavirus genus [ ] . four of the five known ebola viruses, including ebov, cause a severe and often fatal hemorrhagic fever in humans and other mammals, known as ebola virus disease (evd). the ebola virus has caused the majority of human deaths from evd, and it is the cause of the - ebola virus epidemic in west africa, which resulted in at least suspected cases and confirmed deaths [ ] . the natural reservoir of the ebola virus is believed to be bats, and it is primarily transmitted between humans and from animals to humans through bodily fluids [ ] . we analyzed the ebola virus daily infection rates collected by health authorities in three african countries: guinea, liberia, and sierra leone. we computed the theoretical parameters providing the best fit to the epidemics for each of the three distributions described above as well as for the standard sir model. there are no known cases of reexposure to ebola; we thus fit the sir and not the sis model. the free parameters are as follows: for the sf distribution, the population size n, α, β min , β max , i , and γ; for the gaussian distribution, n, μ, σ, γ, and i ; for the uniform distribution, n, β min , β max , γ, and i ; and for the constant model, n, β, γ, and i . using the observed i leads to suboptimal results for all models, probably because in the early dynamics stochastic fluctuations can affect the results; we thus estimate i from the dynamics later on, when enough cases are available. in fig. , the top graph represents the observed infected class in sierra leone and the bottom graph represents the observed infected class in guinea. we added the real times as labels. both graphs are fitted to the models with the distributions above. while in sierra leone there is practically no difference between the different models (as can be observed in the f scores in fig. 
), in the guinea case there is a large difference: the scale-free and normal distributions provide a better fit than the constant and uniform cases. this rapid decrease of the relative infection rate, even in the early stage of the disease, suggests that highly infectious individuals are rapidly removed from the population. in the bottom graph, an f test was calculated between the best fit in the constant case and all the other distributions, to incorporate the different numbers of free parameters. three asterisks between the classic model and any of the distributions represent p < . . in guinea, all three distributions are much better than the constant case, but there is no significant difference between the quality of the nonconstant distributions. in liberia, the sf and gaussian distributions have a significantly better fit than the uniform distribution, and the same result is obtained for sierra leone. the relative infection rate i /i (the number of new infections divided by the current number of infected individuals) can be used to detect significant deviations from the straightforward dynamics expected from these models quite early in the dynamics. the top of fig. represents the relative infection rate i /i for the observed infected class in sierra leone; we computed from the observed epidemics the number of new infection cases divided by the current number of infected individuals, with a moving window of days. to determine the best-fitting model for multiple countries in africa, we calculated the sum of squared errors (sse) of the optimal fit obtained for each distribution, where the error is the difference between the observed number of infected individuals and the solution of the theoretical model for the appropriate distribution. in the bottom graph, the sse is plotted for three countries: guinea, liberia, and sierra leone. for all countries, the sse of every distribution of the transmission coefficient is smaller than in the classic case. an f test [ ] [ ] [ ] was conducted to determine whether the reduced sse is statistically significant. 
the f statistic is computed using one of two equations, depending on the number of parameters in the models. if both models have the same number of parameters, the f statistic is the ratio of the two sums of squared errors, f = sse /sse , where the numerator is the sse of the first model and the denominator that of the second. the p value of the results is computed using w − v degrees of freedom, where w is the number of data points and v is the number of parameters being estimated (one degree of freedom is lost per parameter estimated). the resulting f statistic can then be compared to an f distribution. if the models have different numbers of parameters, the formula instead weights the difference in sse by the difference in degrees of freedom, where df denotes the number of degrees of freedom of each model (df of model and df of model , respectively). in the bottom of fig. , the f test was calculated between the constant case and all the other distributions. three asterisks between the classic model and any of the distributions represent a significantly better fit of that distribution than the classic case. in guinea, all three distributions are better than the constant case, but we cannot determine which of the three distributions is preferable. in liberia, the sf and gaussian distributions were shown to be preferable to the uniform distribution, and the same result was obtained for sierra leone. we also compared the results of the constant-β susceptible-exposed-infected-recovered (seir) model with the distributed sir models using the f test; the residuals of the seir model are practically identical to those of the sir model, so in this case adding an extra parameter does not improve the fit to the real data. we investigated the sir model and the sis model for the three cases of distribution of the transmission coefficient in the population, and the classic case with a constant transmission coefficient. 
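the two f test variants described above can be sketched in a few lines. the exact formula is elided in the extraction, so the unequal-parameter branch below assumes the usual extra-sum-of-squares form, and the sse values and parameter counts in the example are hypothetical.

```python
def f_statistic(sse1, sse2, df1, df2):
    """F statistic for comparing a simpler model 1 (sse1, residual df1)
    against a fuller model 2 (sse2, residual df2 < df1), with df = w - v
    (data points minus fitted parameters). With equal parameter counts
    the text simply takes the ratio of the two SSE values; otherwise the
    standard extra-sum-of-squares form is assumed here:
        F = ((sse1 - sse2) / (df1 - df2)) / (sse2 / df2)."""
    if df1 == df2:
        return sse1 / sse2
    return ((sse1 - sse2) / (df1 - df2)) / (sse2 / df2)

# hypothetical fits: 100 data points, a 4-parameter constant-beta model
# versus a 5-parameter gaussian-beta model
f_val = f_statistic(sse1=820.0, sse2=610.0, df1=100 - 4, df2=100 - 5)
# f_val would then be compared against an F(df1 - df2, df2) distribution
# to obtain the p value
```

the extra-sum-of-squares form penalises the fuller model for its additional parameter, which is exactly the correction the text says the test incorporates.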
we found that in realistic cases, the heterogeneous models provide a better description of the observed epidemics than the model with a constant transmission coefficient for the entire population. in the early dynamics, only the first moment of β determines the dynamics, and there is no difference between the models as long as they have the same mean transmission rate. later, the second moment of β starts affecting the dynamics for both the sir and sis models. during this period, the scale-free distribution behaves differently from the other distributions and results in a lower number of infected individuals than expected. this reduction is the result of a difference between the first moments of β in the different compartments (s, i ): there is a sharp increase in the average value of β in the infected compartment, accompanied by a smaller decrease in the susceptible compartment, compared with the initial value. such an effect can only be observed in distributions where the second moment of β in the total population is large enough. the difference is then further enlarged in the steady state, where all distributions of the transmission coefficient in the population lead to a smaller number of infected individuals compared with the model with constant β; the biggest difference is in the sf model, again explained by the high second moment of β in this distribution. we then tested this conclusion by studying the spread of the ebola virus in multiple african countries. an f test between the different distributions shows that they all produce a better fit than the constant β model. we attempted to detect, early in the infection, whether the total number of people affected by the epidemic will deviate significantly from the results expected from the early dynamics. classical models predict a large number of infected individuals in most epidemics with r higher than ; however, in reality, many epidemics end with a limited impact. 
the most classical example is perhaps the huge difference between the predictions and the observed amplitude of the jcd [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . many models can explain this discrepancy, including, among others, nonlinear dynamics [ , ] , delays [ , ] , human intervention, passive vaccination [ , ] , and small effective population [ ] [ ] [ ] . we show here that even in the most standard sir and sis models, the initial dynamics cannot determine the total number of people affected by the epidemics. however, quite early in the dynamics, the relative infection rate i /i can be used to detect significant deviations from the straightforward dynamics expected from these models. the large difference is only expected if the distribution of β values is large. such a large difference can be the result of intrinsic differences, but also the result of environmental differences, partial mixing, or subpopulation structure. understanding this distribution in advance would improve our capacity to relate early and late dynamics. we now plan to develop methods to estimate this distribution from finer measures early in the disease dynamics. we thank miriam beller for the language editing. we solve eq. ( ) using iterative equations for the sis model. the equation for the infected compartment from eq. ( ) can be integrated over all values of β as follows: the term ∞ βn(β)dβ is equal to e(β)n , and the term ∞ βi (β)dβ can be defined as i . equation (a ) thus leads to di dt where it has a form similar to eq. ( ) . we can differentiate higher orders of i with respect to time to obtain equations similar to eq. (a ). this can be written at the nth order of i (i n ) as the following general solution: for the early dynamics for the sis model, we dropped the term −i i from eq. ( ) , as explained in the main text, and we obtained in the second order for initial time, the term −i i in eq. (a ) is not neglected. we can write this term as −i e i (β) using eq. ( ) . the new form of eq. 
(a ) leads to we denote −e n (β)n + γ = c and −e i (β) = c to obtain di dt the solution for that equation is we get one can change the constants to be e n (β)n − γ = ξ and e i (β) = ψ, and we obtain the boundary condition is i (t = ) = i o , leading to the solution for the infected class in eq. (a ) is the ratio between the solution without the neglected terms eq. (a ) and the current solution of eq. (a ), marked by r(t), is for t ≈ , we get and then we get where it has the same form as eq. ( ). we drop the terms −i i and −i r for the early dynamics to obtain and r is dr(β) dt = γ i (β). we integrate again and obtain d dt which is in the second order for initial time, the terms −i i and −i r in eq. (c ) will not be neglected. the same development for the term i is identical to that in appendix a. we solve eq. ( ) with the closure scheme by integrating by β: if we differentiate s , we get and the equation will be where we define s = ∞ β s(β)dβ. in the same way, we can define in general and the difference equation is we use the moment closure method to derive the solution for these differential equations. the order where the differential equation stops will be zero, and the order before it will be const. for example, if we want to stop at order s , this order will be s = , and order s = const, where the const = s (t = ) = ∞ β s(β)dβ. 
references:
- infectious diseases in humans
- mathematical tools for understanding infectious disease dynamics
- mathematical epidemiology of infectious diseases: model building, analysis and interpretation
- modeling infectious diseases in humans and animals
- generality of the final size formula for an epidemic of a newly invading infectious disease
- age-of-infection and the final size relation
- the mathematics of infectious diseases
- pandemic potential of a strain of influenza a (h n ): early findings
- representations of mad cow disease
- new phenotypes for new breeding goals in dairy cattle
- mad cow policy and management of grizzly bear incidents
- no brainer-the usda's regulatory response to the discovery of mad cow disease in the united states
- trust in food in the age of mad cow disease: a comparative study of consumers' evaluation of food safety in belgium
- risk perception of the "mad cow disease" in france: determinants and consequences
- villas-boas, consumer and market responses to mad cow disease
- tracking the human fallout from mad cow disease
- receptor recognition and cross-species infections of sars coronavirus
- cross-reactive antibodies in convalescent sars patients' sera against the emerging novel human coronavirus emc ( ) by both immunofluorescent and neutralizing antibody tests
- the genome sequence of the sars-associated coronavirus
- clinical progression and viral load in a community outbreak of coronavirus-associated sars pneumonia: a prospective study
- prion diseases: update on mad cow disease, variant creutzfeldt-jakob disease, and the transmissible spongiform encephalopathies
- early estimation of the reproduction number in the presence of imported cases: pandemic influenza h n - in new zealand
- epidemic models with uncertainty in the reproduction number
- rapid drop in the reproduction number during the ebola outbreak in the democratic republic of congo
- predicting the extinction of ebola spreading in liberia due to mitigation strategies
- spatiotemporal spread of the outbreak of ebola virus disease in liberia and the effectiveness of non-pharmaceutical interventions: a computational modeling analysis
- identifying transmission cycles at the human-animal interface: the role of animal reservoirs in maintaining gambiense human african trypanosomiasis
- transmission and control of african horse sickness in the netherlands: a model analysis
- simulating school closure strategies to mitigate an influenza epidemic
- world health organization writing group, nonpharmaceutical interventions for pandemic influenza, national and community measures
- estimating the effective reproduction number for pandemic influenza from notification data made publicly available in real time: a multi-country analysis for influenza a/h n v
- vaccination and passive immunisation against staphylococcus aureus
- a vaccination model for a multi-city system
- modeling vaccination in a heterogeneous metapopulation system
- strategies for mitigating an influenza pandemic
- clinical recognition and diagnosis of clostridium difficile infection
- ageing and infection
- case definitions of clinical malaria under different transmission conditions in kilifi district
- the management of community-acquired pneumonia in infants and children older than months of age: clinical practice guidelines by the pediatric infectious diseases society and the infectious diseases society of america
- clinical practice guidelines for clostridium difficile infection in adults: update by the society for healthcare epidemiology of america (shea) and the infectious diseases society of america (idsa)
- infection dynamics on scale-free networks
- automata network sir models for the spread of infectious diseases in populations of moving individuals
- epidemic outbreaks in complex heterogeneous networks
- slow epidemic extinction in populations with heterogeneous infection rates
- dynamics of person-to-person interactions from distributed rfid sensor networks
- griffiths phases on complex networks
- epidemic spreading in metapopulation networks with heterogeneous infection rates
- numerical solutions of the euler equations by finite volume methods using runge-kutta time-stepping schemes
- mastering matlab: a comprehensive tutorial and reference
- moment closure and the stochastic logistic model
- novel moment closure approximations in stochastic epidemics
- moment-closure approximations for mass-action models
- proposal for a revised taxonomy of the family filoviridae: classification, names of taxa and viruses, and virus abbreviations
- ebola situation report
- killers in a cell but on the loose-ebola and the vast viral universe
- the analysis of variance: fixed, random and mixed models
- mathematics for the clinical laboratory
- kenward-roger approximate f test for fixed effects in mixed linear models
- mathematical structures of epidemic systems
- a stabilizability problem for a reaction-diffusion system modeling a class of spatially structured epidemic systems
- production and germination of conidia of trichoderma stromaticum, a mycoparasite of crinipellis perniciosa on cacao
- differential equations and applications in ecology, epidemics, and population problems
- effect of vaccination in environmentally induced diseases
- listeriosis: a model for the fine balance between immunity and morbidity
- temporal genetic samples indicate small effective population size of the endangered yellow-eyed penguin
- small effective population size and genetic homogeneity in the val borbera isolate
- effective population size is positively correlated with levels of adaptive divergence among annual sunflowers

key: cord- -ag j obh authors: higgins, g.c.; robertson, e.; horsely, c.; mclean, n.; douglas, j. title: ffp reusable respirators for covid- ; adequate and suitable in the healthcare setting date: - - journal: j plast reconstr aesthet surg doi: . /j.bjps. . . 
"please doctor, could you tell him that i love him?": letter from plastic surgeons at the covid- warfront dear sir, how many times have we heard these words in this time? too many. the covid- pandemic has completely disrupted our normal surgical and clinical routine. in these days, many colleagues of every specialty are regularly employed by their hospitals to face the covid- emergency in italy, europe and worldwide. we are not plastic surgeons anymore. many of us feel lost, unprepared and inadequate for such an emergency. here in bergamo, the centre of the italian epidemic, we felt small and incompetent at the beginning. however, we must remember that first of all we are doctors, then plastic surgeons. in these weeks we are putting our willingness at the service of our patients and colleagues. the numbers of the covid- pandemic in bergamo are impressive: positive patients and over official deaths in about one month. at the same time, the reaction of our hospital, papa giovanni xxiii, has been impressive too: over doctors and over nurses entirely dedicated to covid- positive patients; intensive care beds (in one of the largest intensive care units in europe) and over nonintensive care beds are set aside for those patients. this huge wave of covid- positive patients forced the hospital management to progressively and rapidly recruit, train and put on the wards over physicians of every discipline and nurses from march th. several training programs on covid- infection and management have been scheduled in order to prepare the entire staff. two plastic surgeons of our team (out of a total of six) have been fully dedicated to shifts in the covid medical areas, coordinated by a pulmonologist and an intensivist. 
the main activities focus on clinical examination of patients, adjustment of oxygen therapy, regulation of cpap systems, blood gas analysis, monitoring of blood and radiological exams with consequent modulation of therapy, and the paperwork of admissions, discharges and deaths. in these clinical fields, which are new for a plastic surgeon, we are learning how the isolation of patients, imposed for public health reasons, is the most devastating aspect of the covid- pandemic. every single day we phone and update the relatives of those who, because of the worsening of their respiratory condition, are unable to speak and call home. we are sometimes the ones who communicate the death of a loved one, but also the ones who bring words of hope, words of love: "please doctor, could you tell him that i love him so much?". some of these patients die without the hug of their families. a plastic surgeon is not usually accustomed to facing death, because in our surgery it is not so frequent. we would say that the death of a patient who is alone also takes a part of us away: it acquires a different shade, touches some inner chord, and makes you feel powerless and lost. as plastic surgeons we often take care of the psychological side of patients, and, except for some tumours and traumas, the pathologies we treat -like breast reconstruction -are not fatal diseases. if we compare the contribution of the plastic surgery department in terms of numbers, we are like a drop in the ocean. but as ovid wrote in epistulae ex ponto, "gutta cavat lapidem", i.e., "the drop hollows the stone". thanks to our support, a clinical physician is able to evaluate a larger number of patients, focusing on the most critical ones. this is why we keep going. we want to play our part, working with commitment, dedication and professionalism, and assisting all our patients to the best of our continuously updated knowledge. we are proud to help the bergamo community face the covid- emergency, trying to make a difference in our wounded city. 
we hope this letter will help other colleagues not to consider themselves unprepared or unready. the contribution of everyone is crucial to defeating this ongoing pandemic, which has not only upset our clinical routine but has also woken us up from our everyday life. before covid- everything was scheduled; now there are no plans, and we are not sure about our priorities. only if we act, for as long as necessary, with the awareness of being able to make a difference, will we win this terrible fight against sars-cov- . only together will we go back to hugging, kissing and loving each other. when the critical phase of this emergency is over, it will be necessary to think deeply about socioeconomic development strategies, to discover new horizons and new opportunities for a better future. we will never give up!…and what about you? are you ready to play your part? none. this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. dear sir, covid- is a novel coronavirus with increasing outbreaks occurring around the world. during the past weeks, the emergence of new cases has gradually decreased in china with the help of massive efforts from society and the government. in addition to those directly working with covid- patients in the respiratory, infectious disease, cardiology, nephrology, psychology, and icu departments, all members of the general population may encounter the new coronavirus. medical staff in plastic, reconstructive, and other departments also have a responsibility to prevent the disease spreading in our community. in order to protect both patients and medical staff, selective operations and cosmetic treatments were reduced or postponed in the plastic surgery hospital, beijing, china. gloves and medical masks were saved and donated to the doctors and nurses in wuhan as the demand for protective equipment increased significantly. 
in addition, a standard operating procedure for covid- was proposed in local hospitals. our hospital recommended online consultations to replace face-to-face interactions. hospital websites and official social media accounts provided updated practical disease prevention information instead of plastic surgery information. other colleagues also conducted publicity campaigns on disease prevention online via their own social media accounts for relatives and friends, especially for older persons, who appeared to develop more serious illness. at the early stages of the covid- outbreak in certain areas, the public may not care much about the new disease. as more information about covid- becomes available, people without a medical background may be anxious to seek diagnosis, which may result in potential risks of cross infection in the crowded fever clinics. thus, proper information and guidance can help reduce their panic and anxiety. moreover, if individuals were exhibiting relevant symptoms together with an epidemiologic history, they were advised to seek medical care following the directions of the local health authority. in general, plastic surgeons are particularly good at introducing novel surgical methods to the public and keeping in touch with a great number of patients. as a result, they may be able to present local health authority advice in the form of straightforward images and accessible videos, as well as promote practical information via personal social media or clinic websites. in addition to local doctors and nurses from other departments helping in fever clinics and isolation wards, (as of march , ) members of medical staff from other provinces rushed to help their colleagues in hubei province. plastic surgeons who had completed icu training in beijing and other cities supported wuhan on their own initiative as well. 
We suggest that measures should be taken by medical staff from all departments to help slow further spread and to protect health systems from becoming overwhelmed. Dear Sir, as COVID-19 spreads quickly from Asia via Europe to the rest of the world, hospitals are evolving into hot zones for the treatment and transmission of this disease. With the increasing acceptance that operating theatres are high-risk areas for the transmission of respiratory infections for both patients and surgeons, and with our health care systems generally designed to deal only with occasional high-risk cases, there is an obvious need to evolve our practice. Although social media campaigns via the British Association of Plastic, Reconstructive and Aesthetic Surgeons (#StaySafeStayHome) and the British Society for Surgery of the Hand (#PlaySafeStaySafe) are attempting to raise awareness and reduce preventable injuries, we are still seeing a steady stream of patients presenting to our plastic surgery trauma service. We have had to act immediately so that our systems can support essential surgical care while protecting patients and staff and conserving valuable resources. As a department we have developed a set of standard operating procedures covering the full scope of plastic surgery, from the facilitation of emergent life- and limb-saving surgeries and rationalised oncological management to the management of minor soft tissue and bony injuries. We have been cognisant of the need to reduce footfall to the hospital and of the stratification into "dirty" and "clean" areas, with attempted segregation of non-COVID, suspected, and confirmed COVID-19 cases within inpatient clinical areas. This has resulted in the displacement of assessment and procedure rooms within the unit. The ward itself has been earmarked as an extended intensive care unit due to its layout and facilities.
Standards of practice have changed, with an emphasis on "see and treat", as operating theatre availability has been reduced owing to the reduced availability of nurses and theatre staff and the conversion of theatres into intensive care areas for ventilated patients. There is also an emerging assumption that all patients are COVID-19 positive until proven otherwise. The combination of unfamiliar environments, lack of accessible equipment, the requirement to reduce time spent with patients, and adherence to social distancing has created the need for a more mobile and flexible service. In order to support our mobile service, we have found that, as in other disaster situations where specialised bags have been deployed, using a simple bag containing essential equipment and consumables has revolutionised our ability to work at the point of referral and avoid unnecessary trips to theatre. Despite their simplicity, bags have been fundamental to the development of human civilisation, the word originating from the Norse baggi and comparable to the Welsh baich (load, bundle). Our portable "pandemic pack" is now carried by the first on-call in our department. This pack contains an Ultra Dry Adventurer™ polymer dry bag, as shown in Figure ; the contents are shown in Figure . We have found this adequate for managing the most common plastic surgery trauma and emergency scenarios. The bag is easily cleaned with available chlorine solution (in accordance with Public Health England guidance) after each patient exposure. We have found it useful to make up two packs in advance so that one is available at handover whilst the other is replenished by the outgoing team.
We are sure that this concept has been used elsewhere, but if it is not common practice in your unit, we would advocate implementing such a toolkit to facilitate the management of trauma patients and reduce the amount of time frontline staff need to spend in a potentially "dirty" environment during the COVID-19 pandemic. Teleconsultation-mediated nasoalveolar molding therapy for babies with cleft lip/palate during the COVID-19 outbreak: implementing change at pandemic speed. Dear Sir, cleft lip/palate is among the most common congenital anomalies, requiring multidisciplinary care from birth to adulthood. Nasoalveolar molding (NAM) revolutionized the care provided to babies with a complete cleft, with proven benefits to patients, parents, clinicians, and society. This therapeutic modality requires parents' engagement with NAM care at home and continuous clinician-patient/parent encounters, commencing in the second week of life and finishing just before the lip repair. The rapidly expanding COVID-19 pandemic has challenged clinicians delivering NAM therapy either to stop it fully or to adjust it to protect both the patient/parent and the healthcare team. Based on the current WHO recommendation to maintain social distancing, and on the national regulations for the use of telemedicine, the NAM-related clinician-patient/parent relationship has been adjusted in a timely manner by implementing a non-face-to-face care model. Babies with clefts are consulted individually by clinicians, who proactively establish the initial and subsequent telemedicine consultations and provide an open communication channel for parents. Based on a shared decision-making process, all parents have the option to stop NAM therapy completely or to use only lip taping. Given that each patient is at a particular stage within the continuum of NAM care, numerous patient- and parent-derived issues are being addressed by video-mediated consultations.
Overall, this has helped explain the current COVID-19-related public health recommendations and precautions to parents, while addressing patients' needs and parents' feelings, fears, expectations, and questions. Moreover, clinical support is provided to patients and parents by visual inspection (looking for potential NAM-derived facial irritation) and by checking parents' hands-on maneuvers, such as feeding and placement of the lip taping and the NAM device, with immediate feedback for corrections. Thus, the use of an audiovisual communication tool has considerably reduced the number of in-person consultations. When an issue could not be resolved through telemedicine triage and a face-to-face consultation was required, an additional video-based conversation was implemented, focusing on the key steps established for patient/parent visits to the facility (i.e., frequent hand-cleaning, mask usage, and keeping social distance) and on COVID-19-focused screening. Parents and babies who screened negative for symptoms and exposure have been consulted with time-specific scheduling and minimal waiting time to avoid crowded waiting rooms, by a clinician wearing personal protective equipment (cap, face shield, N95 mask, goggles, gloves, and gown) and working in an environment with constant surface/object decontamination. Parents who screened positive for symptoms (e.g., fever, cough, sore throat) were directed to the appropriate self-care or triage mechanism stipulated by WHO guidelines and local authorities. In the COVID-19 era, care provision should be aligned with the latest clinical evidence. In response to constantly changing needs, clinicians across the globe could adapt telemedicine-based possibilities to their own environment of national/hospital regulatory bodies, technology accessibility, and parents' level of technological literacy.
As most of the issues addressed in the video conversations were recurrent reasons for consultations prior to the COVID-19 outbreak, future investigations could help define the key aspects of the telemedicine-based clinician-patient/parent relationship in delivering NAM therapy, and its impact on NAM-related proxy-reported and clinician-derived outcome measures. There are no conflicts of interest to disclose. Virtual clinics: need of the hour, a way forward in the future. Adapting practice during a healthcare crisis. The whole world is gripped by the novel coronavirus pandemic, with huge pressures on health services globally. Within the coming days this will only increase the pressure on health care services, which demands robust planning and preparedness for this unprecedented situation, lest the whole system be crippled and we see unimaginable mortality and suffering. The concept of social distancing and keeping people in self-isolation has reduced footfall to hospitals, but this is affecting the delivery of routine care to patients with other illnesses, and telehealth is an emerging way to reduce the risk of cross-contamination and close contact without affecting the quality of health care delivered. At Bedford Hospital NHS Trust, for the past year we have been running a virtual clinic for our suspected skin cancer patients: if, after a particular biopsy, the clinical suspicion of malignancy was low, these patients were not given a follow-up clinic appointment and were instead informed of the biopsy result by post, sent both to their GP and to themselves. Most patients welcomed this model, as they did not have to return for an appointment, and it took significant pressure off our clinics. In the event we needed to see a patient, they were informed via a telephone conversation to attend a particular clinic appointment.
From an administrative standpoint, this resulted in fewer unnecessary follow-up appointments in our skin cancer follow-up clinics, which could then be offered to our regular skin cancer follow-up patients as per the recommended guidelines, without having to struggle for appointments. Virtual clinics have previously been shown to be safe and cost-effective alternatives to outpatient visits in surgical departments such as urology and orthopaedics, improving both performance and economic output. We have increased the use of these virtual clinics with the onset of the novel coronavirus pandemic in order to reduce patient footfall to our clinics. Most patients voluntarily chose not to attend, and with the risk being highest amongst the elderly, it was logical to keep them away from hospitals as far as possible. To achieve this, we have started virtual clinics for nearly all patients in order to triage those who can manage without coming to the hospital for now. Telemedicine is the way forward in nearly all aspects of medical practice, and this pandemic may be just the right time to establish such methods. We propose setting up more such clinics in as many subspecialties of plastic surgery as possible, which will not only help in the current crisis but will also be useful in the future to take pressure off our health care services. Conflicts of interest: none declared. Ethical approval: not required. Funding: none. Webinars in plastic and reconstructive surgery training: a review of the current landscape during the COVID-19 pandemic. Dear Sir, the COVID-19 pandemic has resulted in the cancellation of postgraduate courses and the vast majority of elective surgery. Plastic surgery trainees and their trainers have therefore needed to pursue alternative means of training. In the face of cross-specialty cover and redeployment, there is an additional demand for COVID-19-specific education.
The Joint Committee on Surgical Training (JCST) quality indicators for higher surgical training (HST) in plastic surgery state that trainees should have a minimum number of hours of facilitated formal teaching each week. Social distancing requirements have meant that innovative ways of delivering this teaching have had to be found. A seminar is a form of academic instruction based on the Socratic dialogue of asking and answering questions, the word originating from the Latin seminarium, meaning "seed plot". Fast and reliable internet and the ubiquitous nature of webcams have led to the evolution of the seminar into the webinar. Whilst webinars have been commonplace for a number of years, they represent an innovative and indispensable tool for remote learning during the COVID-19 pandemic, where trainees can interact and ask questions to facilitate deep and meaningful learning. Specialty and trainee associations have traditionally used their websites and email lists to publicise training opportunities. However, the COVID-19 pandemic has seen a shift to social media, with people seeking constant updates and information from public figures, brands, and organisations alike. Surgical education has mirrored this trend, and we have increasingly observed that webinars are being launched through specialty and trainee association social channels to keep up with the fast-paced demand for accessible online content. The aim of this study was to audit the cumulative compliance of active, publicly accessible postgraduate plastic surgery training webinars, by frequency and duration, against JCST quality indicators. We used the social listening tool Brand™ (https://brand .com), which monitors social media platforms for selected keywords and provides analysis of search results. We used the search terms "plastic surgery webinar", "reconstructive surgery webinar", "royal college of surgeons", "bapras", "bssh", "british burns association" and "plasta".
Mentions of these terms during May were counted, the majority occurring after 23rd March 2020, the date that lockdown began in the United Kingdom; this represents a marked percentage increase in mentions post-lockdown. We supplemented this search strategy by searching Google™ and YouTube™ for "plastic and reconstructive surgery webinar". These search engines rank results in order of relevance using a relevancy algorithm; we therefore reviewed only the top-ranked results. Additional webinars were identified through a snowballing technique, in which each host webinar webpage was searched for webinars advertised at other institutions. We included any educational webinar series aimed at trainees that was free to access, mirroring weekly plastic surgery HST teaching; free webinars that required membership registration were also included. We excluded webinars aimed at patient or parent education, webinars without at least one video, any historic webinar without an accessible link, and webinars behind a paywall or requiring paid membership. We systematically reviewed the search results from Brand™, Google™ and YouTube™ and identified webinar series currently in progress (Table ) and historic webinar series (Table ). Seven active webinar series and two historic webinar series were identified, all delivered by consultants or equivalent. The active webinar series related to COVID-19, aesthetic surgery, pan-plastic surgery, and hand surgery. The total weekly running time for active webinars, most of it plastic-surgery-specific, exceeded the JCST quality indicator. Limitations of this study include identifying only webinars advertised publicly; we are aware of training programmes in the UK running in-house webinar series to supplement training, so the total available for training is likely to be higher than we have identified.
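The post-lockdown rise in mentions is a simple relative-increase calculation. The letter's actual counts are not preserved in this copy, so the figures below are placeholders only; a minimal sketch of the arithmetic:

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, expressed in percent."""
    if before == 0:
        raise ValueError("baseline count must be non-zero")
    return (after - before) / before * 100

# Hypothetical mention counts, for illustration only:
pre_lockdown = 40    # mentions before the 23rd March lockdown
post_lockdown = 520  # mentions after the 23rd March lockdown
print(f"{percent_increase(pre_lockdown, post_lockdown):.0f}% increase")  # → 1200% increase
```

The same formula reproduces any of the headline percentages once the underlying counts are known.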
We have also not reviewed the quality of the educational content. We acknowledge that there are good-quality webinar series requiring paid membership, such as those provided by the British Association of Aesthetic Plastic Surgeons and the American Society of Plastic Surgeons, but it was not the aim of this study to present them. Innovation flourishes during times of crisis. The education of surgical trainees is of paramount importance and should be maintained, even during the difficult times we currently face. While operative skills will be difficult to develop, technology allows the remote delivery of expert teaching to a large number of trainees at once. In this study we identify a number of freely available webinar series that together provide more teaching hours than the JCST recommends. The training exists; it is up to trainees to make the most of it. None. None. Dear Sir, Salisbury District Hospital (SDH) is based in southwest England and provides a plastic surgery trauma service across the south coast, serving six local hospitals and the designated major trauma centre (MTC). Prior to the COVID-19 pandemic, all patients referred to the trauma service, apart from those with open lower limb trauma, were reviewed in person in the trauma clinic. If surgery was required, patients usually returned on a separate day for their operation, in most instances carried out under general anaesthetic in the main operating theatres. After discharge, patients were referred to the hand therapy and plastics dressing services and returned in person for all follow-up visits, including dressing changes and therapy. Patients with lower limb injuries from the MTC were transferred from Southampton General Hospital as inpatients to SDH for all complex reconstruction, including free tissue transfer.
At the start of the COVID-19 crisis, it quickly became apparent that reducing patient footfall within our department, including inpatient stays, was necessary to protect both patients and staff from the disease. We responded to this challenge in the following ways and hope that our experience will be of assistance to other trauma services over the course of the global pandemic. Firstly, all patient protocols underwent significant redesign, following which changes were made to the layout of our plastic surgery outpatient facility, and patient flow through the department was altered and reduced. Now, when patients are referred to our hand trauma service from peripheral hospitals, the initial consultations are carried out remotely using the 'Attend Anywhere' video platform. We follow the BSSH COVID-19 hand trauma guidelines for patient management, and all patient decisions are discussed with the trauma consultant of the day. We are managing a greater number of patients conservatively, and to aid this we have designed comprehensive patient information leaflets that help our patients better understand their own management. Patients who need to be seen in person are screened for symptoms of COVID-19 and have their temperature taken at the department entrance; appropriate-level PPE is worn by staff at all times. For hand trauma patients requiring surgery, this is provided on the same day to maximise efficiency and reduce the need for multiple visits. We have transformed our minor operating theatres, located adjacent to our clinic, into fully functional theatres equipped with a mini C-arm and all instruments for trauma operating, reducing the need for our patients to enter the main hospital theatre suite. Operations are carried out under local anaesthetic, WALANT, or regional block, depending on complexity. All theatre staff wear appropriate PPE and staffing is kept to a minimum.
All wounds are closed with dissolvable sutures. Immediately post-operation, our on-site hand therapists review patients; splints are made the same day and patients are educated about their post-operative management at this time. All follow-up is subsequently carried out virtually by the hand therapy team using 'Attend Anywhere'. With our hub-and-spoke service for lower limb trauma patients, we have ensured that there is an on-site consultant at the MTC every day. Wound coverage is undertaken for all patients at the MTC, with two plastic surgery consultants operating in conjunction with the orthopaedic team, and all inter-hospital transfers for this group of patients have been stopped. The choice of wound coverage for these patients is designed to minimise inpatient stay and reduce operative time. The changes that we have made to our service in a short period of time have already been beneficial for patients, streamlining their care and reducing time spent in hospital. Figure shows the drop in the number of trauma patients seen during the first four weeks of the UK lockdown compared with January, in line with reports from other UK units. This has given us time to refine our protocols for an expected upsurge of patients as the lockdown is lifted. Furthermore, during this period of extra capacity, our registrars have been trained in new techniques: they now undertake insertion of both midlines and PICC lines for medical inpatients under ultrasound guidance, reducing the burden placed on our anaesthetic and critical care colleagues, who previously would have placed these. It is our expectation that many of the changes we have implemented will be continued in the long term, and we will continue to learn and adapt our protocols as this phase of work continues.
Whilst many of the outcomes of the COVID-19 pandemic will be negative, it has also been the catalyst for significant positive change within the UK NHS. Dear Sir, the COVID-19 pandemic has caused unprecedented disruption to patient care globally, including the management of breast and other cancers. However, cancer care should not be compromised unnecessarily by constraints caused by the outbreak. Clinic availability and operating lists have been drastically reduced, with many hospital staff reassigned to the "frontline". Furthermore, all surgical specialties have been advised to undertake only emergency surgery or unavoidable procedures, with the shortest possible operating times, minimal numbers of staff, and ventilators left available for COVID-19 patients. In consequence, much elective surgery, including immediate breast reconstruction (IBR), has been deferred in accordance with guidance issued by professional organisations such as the Association of Breast Surgery (UK) and the American Society of Plastic Surgeons. This will inevitably lead to a backlog of women requiring delayed reconstruction, and it is therefore imperative that reconstructive surgeons consider ways to mitigate this and adapt local practice in accordance with national guidelines and operative capacity. In the context of the current "crisis" or the subsequent "recovery period", time-consuming and complex autologous tissue reconstruction (free or pedicled flap) should not be performed. Approaches to breast reconstruction might include the following options: (1) a blanket ban on immediate reconstruction and all forms of risk-reducing, contralateral balancing, and revisional/tertiary procedures; where reconstructive delay is neither feasible nor desirable, simple and expedient surgery should be considered, e.g. (a) expanded use of therapeutic mammaplasty as a unilateral procedure in selected cases instead of mastectomy and IBR.
(b) Exploring less technically demanding (albeit "controversial") implant-based forms of IBR: (i) epipectoral breast reconstruction with fixed-volume implants, which adds only a short time to the ablative surgery, as the pre-prepared implant-ADM complex is easily secured with minimal sutures; (ii) a "babysitter" tissue expander/implant, which acts as a scaffold to preserve the breast skin envelope for subsequent definitive reconstruction. (2) During the restrictive and early recovery phases, either a solo oncological breast surgeon or a joint ablative and reconstructive team (breast and plastic surgeon) performs surgery without the assistance of trainees or surgical practitioners; for joint procedures, the plastic surgeon acts as assistant during cancer ablation and as primary operator for the reconstruction. Despite relatively high complication rates for implant-based IBR (risking re-admission, prolonged hospital stays, or repeat clinic visits), avoiding all IBR will lead to long waiting lists and have a negative psychological impact, particularly among younger patients. It will also impair aesthetic outcomes through more extensive scars and the inevitable loss of nipples. Whilst appreciating the restrictions imposed by COVID-19, there is an opportunity to offer some reconstructive options depending on local circumstances, operating capacity, and the pandemic phase. We suggest that these proposals, involving greater use of therapeutic mammaplasty as well as epipectoral and "babysitter" prostheses, be considered in an effort to offset some of the disadvantages of COVID-19 for breast cancer patients, whilst ensuring that their safety and that of healthcare providers comes first. Dear Sir, the COVID-19 pandemic has shifted clinical priorities and resources away from elective and trauma hand surgery under general anaesthesia (GA) towards treating the growing number of COVID-19 patients.
At the time of this correspondence, the pandemic has affected millions of people and caused many deaths worldwide, including in the UK, with numbers still climbing. This has particularly affected our hand trauma service, which serves North London, a population of well over a million, receiving referrals from a network of hospitals in addition to the emergency departments of the Royal Free group of hospitals and numerous GP practices and urgent care centres. In the first week following the British government lockdown, which commenced on 23rd March, we experienced a sharp drop in daily referrals; numbers have since been rising steadily through April. The British Association of Plastic, Reconstructive and Aesthetic Surgeons, the British Society for Surgery of the Hand, and the Royal College of Surgeons of England have all issued guidance, both encouraging patients to avoid risky pursuits that could result in accidental injuries and advising members how to prioritise and optimise services for trauma and urgent cancer work. We have adapted our hand trauma service into a 'one-stop hand trauma and therapy' clinic, where patients are assessed, definitive surgery is performed, and immediate post-operative hand therapy is offered, with therapists making splints and giving specialist advice on wound care and rehabilitation, including an illustrated hand therapy guide. Patients are categorised using the BSSH hand injury triage app. We already have a specific 'closed fracture' hand-therapy-led clinic to manage the majority of our closed injuries. We combined this clinic with the plastic-surgeon-led hand trauma clinic and improved its efficiency further by utilising the mini C-arm fluoroscope within the clinic setting, enabling us to assess fractures immediately and perform fracture manipulation under simple local anaesthesia. We have successfully been able to perform a substantial proportion of our hand trauma operations under wide-awake local anaesthesia no tourniquet (WALANT).
Prior to the pandemic, we used WALANT for selected elective and trauma hand surgical cases. In infected cases, where local anaesthesia is known to be less effective, we have used peripheral nerve blocks. Our previous data showed the breakdown of trauma cases conducted under GA, LA, and brachial or peripheral nerve blocks. We have specifically modified our wound care information leaflets to minimise patient hospital attendance. Afterwards, patients receive further therapy phone consultations and encouragement to use the hand therapy exercise app developed by the Chelsea and Westminster hand therapists. Patients are given the details of a designated plastic surgery NHS Trust email address for direct contact with the plastic surgery team for concerns, questions, and transfer of images. Of the emails received to date, most have come from patients directly and the remainder from referring healthcare providers. The majority of inquiries are followed up via a telephone consultation; only complex cases or complications attend face-to-face follow-up. This model has successfully combined assessment, treatment, and post-operative therapy into a one-stop session, greatly limiting patient exposure to other parts of the hospital, such as the radiology and therapy departments. A further benefit of such a clinic is improved outcomes through combined decision-making, and there is a cost saving compared with our traditional model of patient care. We have so far treated under this model the patients suitable for remote monitoring. On average we have saved several plastics dressing clinic (PDC) visits for wound checks per patient, as a very minimum. We have previously calculated the cost of a PDC visit at our centre; for our patients this translates into a substantial monthly saving on PDC costs alone, and if a similar cohort of patients could be identified for remote monitoring each month, this could potentially lead to an annual saving of many thousands of pounds.
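The projected saving from remote monitoring is straightforward multiplication: dressing-clinic visits avoided per patient, times the cost per visit, times the monthly cohort, annualised over twelve months. The letter's actual figures are not preserved in this copy, so the numbers below are placeholders only; a minimal sketch:

```python
def annual_saving(visits_saved_per_patient: int,
                  cost_per_visit_gbp: float,
                  patients_per_month: int) -> float:
    """Annualised saving from avoided plastics dressing clinic (PDC) visits."""
    monthly_saving = visits_saved_per_patient * cost_per_visit_gbp * patients_per_month
    return monthly_saving * 12

# Hypothetical figures, for illustration only:
print(f"£{annual_saving(3, 25.0, 30):,.0f} per year")  # → £27,000 per year
```

Substituting a unit's own visit cost and cohort size gives a quick local estimate of the same kind reported in the letter.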
In addition, converting the mode of anaesthesia from GA to WALANT has been shown to produce a substantial estimated cost reduction. The concept of a one-stop clinic has already been successfully implemented in the treatment of head and neck tumours following the introduction of NICE guidelines, and the COVID-19 pandemic has made us redesign a busy metropolitan hand injury service along the same lines. We believe this model is a good strategy, and combining it with more widespread use of the WALANT technique, technology such as apps and telemedicine, and encouragement of greater patient responsibility in post-operative care and rehabilitation is the way forward. We hope that sharing this experience will result in improved patient care at this time of crisis. 'This is a Saint Patrick's Day like no other,' declared the Irish Prime Minister on 17th March, whilst announcing sweeping social restrictions in response to the worsening COVID-19 pandemic. This nationwide lockdown involved major restrictions on work, travel, and public gatherings, and signified the government's shift from the suppression to the mitigation phase of the outbreak. The national COVID-19 task force produced a policy specifying the redeployment of health care workers to essential services such as the emergency department and intensive care. With the introduction of virtual outpatient clinics and the curtailment of elective operating lists, the apparent clinical commitments of a plastic surgeon during this pandemic have lessened. Trauma is a continual and major component of our practice; however, a decline in emergency department presentations has fuelled anecdotal reports of a reduction in the trauma workload. With diminishing resources, the risk of staff redeployment, and the consequences of poor patient outcomes, we aimed to assess the effect of the current COVID-19 lockdown on the plastic surgery trauma caseload.
We performed a retrospective review of a prospectively maintained trauma database at a tertiary referral hospital. During the first days of the lockdown, the number of patients attending the plastic surgery trauma clinic, and the proportion undergoing a surgical procedure, were comparable over the same time frame to the two previous years, as seen in Figure . Upper limb trauma accounted for the near majority of referrals. The frequency and type of surgery performed during the lockdown were similar to the previous two years, as seen in Table , and the percentage of patients requiring general anaesthesia was only slightly higher during the lockdown than in the preceding years. We have thus refuted anecdotal suggestions of a decline in the plastic surgery trauma caseload during the COVID-19 nationwide lockdown: compared with the same period in previous years, the lockdown has produced an equivalent trauma volume. Despite the widespread and necessary restriction of routine elective work, somewhat surprisingly the pattern and volume of trauma remain similar to preceding years. With people confined to their households, 'DIY at home' injuries contribute to this trend, as does the exemption from restrictions of certain industries such as agriculture and the food preparation chain. Whilst not every trauma risk can be mitigated, the potential for these DIY injuries to overwhelm the healthcare service has led the British Society for Surgery of the Hand (BSSH) to caution the general public on the safety of domestic machinery. As healthcare systems are stretched further than ever before, we must all recognise the need for adaptation and structural reorganisation to treat those patients most in need during this pandemic. Staff redeployment is a necessary tool to maintain frontline services; nonetheless, we wish to highlight the outcomes of this study to the clinical directors charged with the challenging job of allocating resources.
As our trauma presentations have not reduced during the first days of this pandemic, resources (staff and theatre) should still be accessible to the plastic surgery trauma team, with observance of all the appropriate risk reduction strategies documented by the British Association of Plastic, Reconstructive and Aesthetic Surgeons. None. None.

In light of the ongoing COVID-19 pandemic, the American Society of Plastic Surgeons (ASPS) has released a statement urging the suspension of elective, non-essential procedures. This necessary and rational suspension will have detrimental financial effects on the plastic surgery community. Given the simultaneous economic downturn inflicted by public health social-distancing protocols, there will be a bear market for elective surgery lasting well past the lifting of the bans on elective procedures. This effect will largely be due to the elimination of discretionary spending as individuals attempt to recover from weeks to months of lost earnings. As demonstrated during the - recession, economic decline was associated with a decrease in both elective and non-elective surgical volume. Private practices performing mostly cosmetic procedures were particularly vulnerable to these fluctuations and demonstrated a significant positive correlation with GDP. The surgical community must prepare for the economic impact that this pandemic will have on current and future clinical volumes. These effects are likely to be more severe than in the previous recession, as surgeons are currently unable to perform elective surgeries for an indefinite period, coupled with the immense strain on hospital resources at this time. Given this burden, elective surgery cases may be some of the last to be added back to the hospital schedule once adequate resources are restored. While surgeons are temporarily unable to operate, they do have the potential to use telehealth to arrange preoperative consults and postoperative follow-up appointments.
This could be accomplished in private practice settings with the use of telehealth services such as Teladoc Health, American Well, or Zoom, which allow for live consultation with patients without unnecessary exposure of patients or providers to potential infection. The main limitation of these types of appointments is the lack of an in-person physical exam, so providers have found that billing based on time spent with the patient is more effective with this tool. This could generate revenue and facilitate future surgical cases after the suspension of in-person elective patient care has been lifted. Several strategies should be considered by the elective surgery community to minimize financial losses. Many financial entities have changed their policies in order to support small businesses. Examples include the Small Business Administration offering expanded disaster impact loans and the deferment of federal income tax payments by three months to July . Another option employers may leverage is temporarily laying off employees so that they can apply for and collect an expanded unemployment package from federal and state governments, thereby reducing the payroll burden on stagnant practices with no cash flow while providing employees with a steady source of income during the pandemic. The employer's incentive to do this may be reduced by the potential suspension of the payroll tax on employers and loan forgiveness for employers who continue to pay employees' wages. Once elective procedures are again permitted, plastic surgeons who have retained a reconstructive practice should make the strategic business decision to increase reconstructive surgery and emergent hand surgery bookings, as historically these procedures fluctuate less with the economy. Other options to maintain aesthetic case volume include price reductions or temporary promotions. However, it is important that these be adopted universally in order to minimize price wars between providers.
As physicians, it is a matter of principle that surgeons practice non-maleficence and minimize non-essential patient contact for the time being. However, this time of financial standstill should be used constructively to prepare for the financial uncertainty in the months to come. None.

Guidelines during the COVID-19 pandemic advise certain groups to stringently follow social distancing measures. Inevitably, some health care workers fall into these categories, and working in a hospital places them at high risk of exposure to the virus. Studies have shown human-to-human transmission from COVID-19-positive patients to health care workers, demonstrating that this threat is real, and, as in other infectious diseases, it is worse in certain situations such as aerosol-generating and airway procedures. There is therefore a part of our workforce that has been out of action, reducing the available workforce at a time of great need. In our hospital, a group of vulnerable surgical trainees ranging from CT to ST , and also consultants, have been able to keep working while socially isolating within their usual workplace. In light of COVID-19, our hospital, a regional trauma centre for burns, plastic surgery and oral and maxillofacial surgery, was reorganized to increase capacity for both trauma and cancer work. As part of this, a virtual hand trauma service has been set up. The primary aim of the new virtual hand trauma clinic was to allow patients to be triaged in a timely manner while adhering to social distancing guidelines by accessing the clinic remotely from home. Further aims were to reduce time spent in hospital and the time between referral and treatment. In brief, patients referred to our virtual hand trauma clinic from across the region receive a video or telephone consultation using Attend Anywhere software, supported by NHS Digital. Following the virtual consultation, patients are then triaged to theatre or further clinic, or discharged.
Our group of isolating doctors, plus a pharmacist and a trauma coordinator, have been redeployed away from their usual face-to-face roles and now work solely in the virtual trauma clinic. They provide this service from an isolated part of the hospital named the 'virtual nest.' The nest is not accessible in a face-to-face manner by non-isolating staff or patients, which allows a safe, 'clean' environment to be maintained. The virtual team is able to participate in morning handover with other areas of the hospital via video conferencing using WebEx software. The nest workspace is large enough to allow social distancing between clinicians, and by being on site they benefit from the availability of dedicated workspaces with suitable IT equipment and bandwidth. It is widely recognised that the reconfiguration of hospitals and redeployment of staff have meant that training is effectively 'on hold' for many trainees. We have found that a benefit of the new virtual hand trauma clinic is that trainees can continue to engage with the Intercollegiate Surgical Curriculum Programme through work-based assessments in a surgical field. While direct observation of procedural skills and procedure-based assessments are not feasible, case-based discussions and clinical evaluation exercises have been easily achievable, as trainees manage patients with the involvement of supervising senior colleagues in decision making. This, plus the varied case mix seen, has enhanced the development of knowledge, decision making, leadership and communication skills. As trainees are unable to attend theatre, practical skills may suffer depending on how long clinicians are non-patient-facing. This has been acknowledged by the GMC in the skill fade review; skills have been shown to decline over - months. Although it can only be postulated at the current time, colleagues who are patient facing but redeployed may face a similar skill decline.
The structure of the team is akin to the firm structure of days gone by, with the benefits that brings in terms of support and mentorship. Patients benefit from having access to a group of knowledgeable trainees, supported by consultants, and a service accessible from their own home. This minimizes footfall within our hospital and exposure to, and spread of, COVID-19. Local assessment of our practice is ongoing, but we have found that this model has enabled a cohort of vulnerable plastic surgery trainees to continue to work successfully whilst reducing the risk of exposure to COVID-19 and providing gold standard care for patients. None. Nothing to disclose.

Dear Sir, a Scottish Sarcoma Network (Glasgow Centre) special study day was held on th March at the School of Simulation and Visualisation, Glasgow School of Art, with representatives from Sarcoma UK, Beatson Cancer Charity and the BBC. Traditional patient information leaflets inadequately convey medical information due to poor literacy levels: - % of the UK population have the lowest adult literacy level and % the lowest 'health literacy' level (the ability to obtain, understand, act on, and communicate health information). It was hypothesised that an entirely visual approach, such as AR, may obviate literacy problems by facilitating comprehension of the complex three-dimensional concepts integral to reconstructive surgery. We report the first use of augmented reality (AR) in patient information leaflets in plastic surgery. To our knowledge, we are among the first in the world to develop, implement, and evaluate an AR patient information leaflet in any speciality. Developed for sarcoma surgery, the AR patient leaflet centred around a prototypical leg sarcoma. A storyboard takes patients through tumour resection, reconstruction, and the potential post-operative outcomes. Input from specialist nurses, sarcoma patients, and clinicians during a Scottish Sarcoma Network special study day in March informed the final content (Figure ).
When viewed through a smartphone camera (HP Reveal Studio, HP, Palo Alto, California, USA), photos in the AR leaflet automatically trigger the display of additional content without the need for QR codes or internet connectivity, including sequential tumour resection and reconstruction; a 3D ALT flap model was developed using BodyParts3D (Research Organization of Information and Systems, Database Center for Life Science, Japan) and custom anatomical data. Leaflet evaluation by consecutive lower limb sarcoma patients was exempted from ethics approval by the Greater Glasgow and Clyde NHS research office as part of service evaluation. AR leaflets were compared with pooled data from traditional information sources: Sarcoma UK website patient leaflets ( ), self-directed internet searches ( ), and generic sarcoma patient leaflets ( ); some patients used more than one source. The mental effort rating scale evaluated the perceived difficulty of comprehension (or extraneous cognitive load) as a key outcome measure in comparison to traditional information sources. Patient satisfaction was assessed by Likert scale (ranging from 'very, very satisfied' to 'very, very dissatisfied'). Statistical analysis was performed with Social Science Statistics. AR leaflets were rated as . (very, very low mental effort) and traditional information sources as . (high mental effort) [unpaired t-test, p < . ]. Likert-scale satisfaction was . , indicating very, very high satisfaction. When asked 'Do you think the AR leaflet would make you less anxious about surgery?', / ( %) patients responded 'yes'. When asked 'Do you think other patients would like to have a similar AR leaflet before surgery?' and 'Would you like to see further AR leaflets developed in the future?', % responded 'yes'. No correlation was found between age or educational level and mental effort rating scale scores for the AR patient leaflet (data not shown).
Subjective feedback analysis found that self-directed internet searches contained too much unfocussed information: '(I) didn't want to Google as (I) may end up with all sorts' and '(there is) good and bad stuff on the internet, (you) don't know what you're looking at'. All patients felt the visual content in the AR leaflets helped their understanding: 'incredible… that would have made a flap easier to understand', 'tremendous… a good way of explaining things to my family', 'so much better seeing the pictures, gives an idea in your head', and 'helpful for others with dyslexia'. Traditional patient leaflets were often difficult to comprehend: '(I) didn't fully understand the sarcoma leaflets', '(I) couldn't take information in from the leaflets'. Feedback recommended adding simple instructions to the leaflet; however, the AR leaflet is intended for use by the clinician in clinic, and to be so simple that no instructions are required once the software is downloaded to the patient's smartphone (i.e., point and shoot without technical expertise, menus, or website addresses). All patients desired an actual paper leaflet for reassurance, preferring something physical to show their family rather than direction to a website or video. This study demonstrates a significant reduction in extraneous cognitive load (the mental effort required to understand a topic) with AR patient leaflets compared to traditional information sources (p < . ). AR visualisation may make inherently difficult topics (high intrinsic cognitive load), such as reconstructive surgery, easier to understand and process. Significant learning advantages exist over traditional leaflets or web-based videos, including facilitating patient control, interactivity, and game-based learning. All contribute to increased motivation, comprehension, and enthusiasm in the learning process.
AR leaflets reduced anxiety ( % of patients) and scored very highly for patient satisfaction with information, which is notable given increasing evidence that such satisfaction is a strong independent determinant of overall health outcomes. This study provided the impetus for investment in the concurrent development of other AR leaflets across the breadth of plastic surgery and non-plastic surgery specialties. Chief Scientist Office (CSO, Scotland) funding was recruited to aid the development of improved, free, fully interactive 3D AR patient information leaflets and a downloadable app. Ethical approval is in place for a randomised controlled trial to quantify the perceived benefits of AR in patient education. Our belief is that AR leaflets will transform and redefine the future plastic surgery patient information landscape, empowering patients and bridging the health literacy gap. None.

Dear Sir, we investigated whether age has an influence on wound healing. Wound healing can result in hypertrophic scars or keloids. From previous studies we know that age influences the different stages of wound healing. A general assumption seems to be that adults make better scars than children. Knowledge of the influence of age on healing and scarring may provide opportunities to intervene in the wound healing process to minimize scarring. It could guide patients in their decision of when to revise a scar. It could also guide patients and physicians in their decision on the timing of a surgery, if the kind of surgery allows this. This study is a retrospective cohort study at the Department of Plastic, Reconstructive, and Hand Surgery of the Amsterdam University Medical Center. All patients underwent cardiothoracic surgery through a median sternotomy incision and had to be at least one year post-surgery at the time of investigation. Hypertrophic scars were defined as raised mm above skin level while remaining within the borders of the original lesion.
Keloid scars were defined as raised mm above skin level and extending beyond the borders of the original lesion. The scars were scored with the Patient and Observer Scar Assessment Scale (POSAS) as the primary outcome measure. As secondary outcome measures, we looked at wound healing problems and scar measurements. To ensure that the results of this study were influenced as little as possible by the already known risk and protective factors for hypertrophic scarring, the patients were questioned about co-existing diseases, scar treatment, allergies, medication, height, weight, cup size (females) and smoking. Their skin type was classified with the Fitzpatrick scale I to VI. All calculations were performed using SPSS, and the level of significance was set at p ≤ . . Patients were enrolled in this study; group contained children and group contained adults. There was a significant difference between the two groups in the amount of pain in the scar scored by the patient, with higher scores given by adults than by children (p = . ). There was no significant difference between the two groups for the other POSAS items (itchiness, color, stiffness, thickness, and irregularity), the total score of the scar, or the overall opinion of the scar scored by the patient (Table ). There was a significant difference between the two groups in the pliability of the scar scored by the observer: the POSAS item pliability of the children's scars was assessed higher, thus stiffer, than in adults (p = . ). There was no significant difference between the two groups for the other POSAS items (vascularization, pigmentation, thickness, relief, and surface), the total score of the scar, or the overall opinion of the scar scored by the observer (Table ). There was no significant difference between children and adults in the occurrence of wound problems post-surgery, nor in scar measurements.
In children we found three hypertrophic scars and two keloid scars; in adults, seven hypertrophic scars and three keloid scars. For both groups together, that is a percentage of . hypertrophic and keloid scars (Table ). Patients with Fitzpatrick skin types I and IV-VI scored significantly higher, thus worse, in their overall opinion of the scar (p = . ) than patients with skin types II and III. Both observer and patient assessed the overall opinion of the scar as significantly higher (worse) in people who had gone through wound problems (p = . and p = . , respectively) than in those who had not. We found no significant differences in the primary outcome measure between men and women, cup sizes A-C and D-G, smokers and non-smokers, BMI < and BMI > , allergies and no allergies, or scar treatment and no scar treatment. Age at the creation of a sternotomy wound does not seem to influence the scar outcome. This is contrary to what is often the fear of a parent of a child who needs surgery early in life. Comparing scars remains difficult because of the many factors that can influence scar formation. We found that scars have a tendency to change, even years after they are made. A limitation of the study is its retrospective design; the long follow-up period after surgery is a strength. To the best of our knowledge, this is the first study that compares the scars of children and adults to specifically look at the clinical impact of age on scar tissue. In order to detect even more reliable and possibly significant differences between children and adults, more patients should be enrolled in future prospective studies. For now, we can conclude that there is no significant difference in the actual scar outcome between children and adults for the sternotomy scar. If we extend these results to other scars, the timing of surgeries should not depend on the age of the patient. None. None. METC reference number: w _ # . .
We published a systematic review of randomized controlled trials (RCTs) on early laser intervention to reduce scar formation in wound healing by primary intention. While comparing our results with two other systematic reviews on the same topic, we identified various overt methodological inconsistencies in those other systematic reviews.

Issue : including duplicate data (Table ). Karmisholt et al. included two RCTs, both of which reported identical data on five people. The inclusion of duplicate data can bias the results of a systematic review and should be prevented in the quantitative as well as the qualitative synthesis of evidence. Abbreviations: ID, identity; n.l.t., no laser treatment; PCS, prospective cohort study; PMID, PubMed identifier; RCT, randomized controlled trial. Table footnotes:
a) Listed are RCTs which were included by at least one of the three identified systematic reviews; the systematic reviews are ordered by search date from left to right.
b) "Search date" refers to the searching of bibliographic databases by the authors of the corresponding systematic reviews.
c) "Publication date" refers to the publication history status according to the MEDLINE®/PubMed® data element (field) descriptions.
d) "n.l.t." means that the authors of the RCTs compared laser treatment with no treatment or a treatment without laser.
e) "PCS" means that the authors used this term to label the corresponding RCT.
f) "-" indicates that an RCT could not have been identified because the corresponding RCT was published after the search date.
g) "Missing study" means that an RCT could have been identified because a corresponding RCT was published before the search date.
h) "Excluded" means that the authors of the present review excluded the corresponding RCT based on the exclusion criteria provided.
i) "Not analyzed" means that an RCT was reported within an article but the corresponding data were not included in the meta-analysis.
j) "Other laser" means that the authors of the RCTs compared various types of laser treatment.

Issue : the label 'prospective cohort' was attached to almost all considered studies, including RCTs and seven nonrandomized studies. In RCTs, subjects are allocated to different interventions by the investigator based on a random allocation mechanism. In cohort studies, subjects are not allocated by the investigator but rather allocated in the course of usual treatment decisions or people's choices based on a nonrandom allocation mechanism. We believe that 'cohort study' is certainly not an appropriate label for RCTs. Furthermore, it has long been known that the shorthand labeling of a study using the words 'prospective' and 'retrospective' may create confusion, because these words carry contradictory and overlapping meanings.

Issue : mixing data from various study designs. Karmisholt et al. did not clearly separate randomized from nonrandomized studies. Combinations of different study design features should be expected to differ systematically, and different design features should be analyzed separately.

Issue : unclear definition of outcomes and measures of treatment effect. Kent et al. reported, quote: "the primary outcome of the meta-analysis is the summed measure of overall efficacy provided by the pooling of overall treatment outcomes measured within individual studies." We think that this so-called "summed measure" is neither defined nor understandable. The meta-analysis reported in that article included mean and standard deviation values from four RCTs. These RCTs applied endpoints and time periods for assessment that differed considerably among the included studies. It remains obscure to us which data were transformed in what way to finally arrive in the meta-analysis. We believe that traceability and reproducibility of data analyses are mainstays of systematic reviews.

Issue : missing an understandable risk of bias assessment. Kent et al.
reported, quote: "the risk of bias assessment tool provided by revman indicated that all studies had - categories of bias assessed as high risk." The term "RevMan" is short for the software Review Manager, provided by Cochrane for preparing their reviews. The Cochrane risk-of-bias tool for randomized trials is structured into a fixed set of bias domains, including bias arising from the randomization process, due to deviations from intended interventions, due to missing outcome data, in measurement of the outcome, and in selection of the reported result. We believe that the risk of bias assessment reported by Kent et al. is not readily understandable and presumably does not match standard requirements. Systematic reviews of healthcare interventions aim to evaluate the quality of clinical studies, but they may have quality issues in their own right. The identification of various inconsistencies in two systematic reviews on platelet-rich plasma therapy for pattern hair loss should prompt future authors to consult the Cochrane Handbook ( https://training.cochrane.org/handbook ) and the EQUATOR Network ( http://www.equator-network.org/ ). The latter provides information on various reporting standards such as PRISMA for systematic reviews, CONSORT for RCTs, and STROBE for observational studies. The authors declare no conflict of interest.

Dear Sir, journal clubs have contributed to medical education since the th century. Along the way, different models and refinements have been proposed. Recently, there has been a shift towards "virtual" journal clubs, often using social media platforms. Our team has refined the face-to-face journal club model and successfully deployed it at two independent UK National Health Service (NHS) trusts in . We believe there are reproducible advantages to this model. Over months at one NHS trust, journal club events were held, with iterative changes made to increase engagement and buy-in from the surgical team.
Overall, tangible outputs included submissions of letters to editors, of which have been accepted. Following this, the refined model was deployed at a second NHS trust, where expanded academic support increased its impact. Over months, journal club events were held, with submissions of letters to editors, of which have been accepted. Thus, in months of , the two sequential journal clubs generated submissions for publication, with different authors. These tangible outputs are matched by other intangible benefits, such as improving critical appraisal skills. This is assessed at UK surgical training entry selection and is also a key skill for evidence-based professional practice. Therefore, we feel this helps our team members' career progression and clinical effectiveness. Key aspects of the model include:

. Face-to-face meetings continue to have multiple intangible benefits. There is a trend towards social media and online journal clubs. While such initiatives have considerable benefits, maintaining face-to-face contact in a department allows for efficient discussion and enhances team-building. Instead of replacing face-to-face meetings with virtual ones, we use social media platforms, such as WhatsApp, to support our events. This includes communications to arrange the event in advance and to maintain momentum on post-event activities, such as authoring letters to journals arising from the discussion. While some articles describing journal club models highlight the benefit of expert input in article selection, we also view it as a learning opportunity. A surgical trainee is allocated to present each journal club, with one of our three academically appointed consultant surgeons chairing and overseeing. Trainees are encouraged to screen the literature and identify articles beforehand and make a shared decision with the consultant. The article must be topical and have the potential to impact clinical practice.
Doing this prior to the session allows the article to be circulated to attendees with adequate time to read it. We routinely use both reporting guidelines (e.g., PRISMA for systematic reviews) and methodological quality guidance (e.g., AMSTAR-2 for systematic reviews) to guide trainees and structure the journal club presentation. In addition to three consultants with university appointments guiding critical appraisal, a locally based information scientist also joins our meetings. During the journal club discussion, emphasis is placed on relating the article to the clinical experience of team members. This provides context and aids clinical learning for trainees. While undertaking critical appraisal may be a noble endeavour, in busy schedules it is important that it adds value for everyone involved. Reviewing contemporary topics can inform clinical practice for all levels of surgeon in the team, presenting the article improves trainees' presentation skills, and publishing the appraisal generates outputs that help trainees to progress.

. Publishing summaries of journal club appraisals can have impact on multiple levels. Journal club does not only contribute to our trainees' development and departmental clinical practice: it benefits our own research strategy and quality, and open discussion of the literature in plastic surgery contributes to a global culture of improving evidence. Scheduling events on a regular basis increases familiarity with reporting and quality guidance and allows for the study of complementary article types (e.g., systematic review, randomised trial, cohort study).
Our iterations suggest that the following structure is most effective: joint article selection one week before the event; dissemination to the audience; a set time and location during departmental teaching; chairing by an academic consultant, with an information scientist and senior surgeons present; presentation led by a surgical trainee; open-floor discussion of the article and its implications for our own practice; summary; and drafting of a letter to the editor if appropriate. As we have used variations of this model successfully at two independent NHS trusts, we believe that these tactics can be readily adapted and deployed by others. Nil.

Dear Sir, surgical ablation of advanced scalp malignancies requires wide local excision of the lesion, including segmental craniectomies. The free latissimus dorsi (LD) flap is a popular choice for scalp reconstruction due to its potential for resurfacing a large surface area, its ability to conform to the natural convexity of the scalp, reliable vascularity and reasonable pedicle length. One of the disadvantages of LD free flap use is the perceived need for harvest in a lateral position. This necessitates a change in the patient's position intraoperatively for flap raise and can add to the overall operative time. Current literature on microvascular procedures in the elderly demonstrates that a longer operative time is the only predictive factor associated with an increased frequency of post-operative medical and surgical morbidity. As most patients undergoing scalp malignancy resection are elderly, it is important to reduce surgical time in this cohort of patients. We present our experience of reconstruction of composite cranial defects with LD flaps using synchronous tumour resection and flap harvest with a supine approach to reduce operative times and potential morbidity. All patients undergoing segmental craniectomies with prosthetic replacement and LD reconstruction under the care of the senior surgeons were included in the study.
Patients were positioned supine with a head ring to support the neck; a sandbag is placed between the scapulae, and the arm on the chosen side of flap raise is free-draped. A curvilinear incision is made posterior to the midaxillary line (Figure ). The lateral border of the LD muscle is identified, and dissection is continued in a subcutaneous plane inferiorly, superiorly and medially until the midline is approached. The muscle is divided at its inferior and medial borders, and the flap is lifted towards the pedicle. Once the pedicle is identified, the assistant can manipulate the position of the free-draped arm to aid access into the axilla; the pedicle is clipped once adequate length has been obtained. The flap is delivered through the wound and detached (Figure ). Donor site closure is carried out conventionally. The flap inset is performed using a 'vest over pants' technique, placing scalp over muscle by undermining the remaining scalp edges. A non-meshed skin graft is used to enhance the aesthetic outcome. A total of patients underwent free LD muscle flaps, all combined with split-thickness skin grafts. The study population included ten male patients and one female. The age range was - years, with a mean age of . years. The defect area ranged from cm to cm . A titanium mesh fixed with self-drilling × . mm cortical screws was utilised for dural cover in all patients. The primary recipient vessels used were the superficial temporal artery and vein; however, in cases where a simultaneous neck dissection and parotidectomy were necessary for regional disease, the facial artery and vein (n = in this series) or the contralateral superficial temporal vessels were used. The ischaemia time ranged from - min, with a mean of . min. There were no take-backs for flap re-exploration. The overall flap success rate was %. Marginal flap necrosis with secondary infection occurred in one patient with a massive defect (at one week post-op).
the area was debrided and a second ld flap was used to cover the resultant defect ( %). a further posterior transposition flap was used to cover a minor area of exposed mesh. the scalp healed completely. the total operating time ranged between - min, with a mean of min. all patients were followed up at and then four weeks for wound checks. the ld flap remains a popular choice due to its superior size and ability to conform to the natural convexity of the scalp compared with other flap choices. also, unlike composite flaps, which often require postoperative debulking procedures, the ld muscle flap atrophies and contours favourably to the skull. however, the traditional means of access to this flap requires lateral decubitus positioning of the patient, which can hinder simultaneous oncological resection. the supine position facilitates access for neck dissection, especially if bilateral access is required. our approach ensures that tumour ablation and reconstruction are carried out in a time-efficient manner in an attempt to reduce postoperative medical and surgical complications. synchronous ablation and reconstruction are key to reducing overall operative time and complication risk, and are practised preferentially at our institute. it is important to maintain a degree of flexibility to achieve this - there may be situations where supine positioning overall is more favourable. likewise, there are situations relating to flap topography where a lateral approach to tumour removal and reconstruction is preferred. the resecting surgeon or reconstructive surgeon may have to compromise to achieve synchronous operating, but this is worthwhile to reduce overall operative time. none. not required. once established, lymphorrhea typically persists and can present as an external lymphatic fistula. lymphorrhea occurs in limbs with severe lymphedema, as a complication after lymphatic damage, and in obese patients.
some cases are refractory to conservative treatment and require surgical intervention to reconstruct a lymphatic drainage route. patient characteristics are summarised in table (abbreviations: bmi, body mass index; f, female; m, male). three patients had primary lymphedema, four had age-related lymphedema (aging of the lymphatic system and its function is thought to be the cause of age-related lymphedema), three had obesity-related lymphedema, and two had iatrogenic lymphorrhea (table ). in the cases of iatrogenic lymphorrhea, the lesions were located in the groin; the others were in the lower leg. one of the cases of lymphorrhea in the inguinal region was caused by lymph node biopsy and the other by revascularization after resection of a malignant soft tissue sarcoma. compression therapy had been performed preoperatively in cases (using cotton elastic bandages in cases). four patients wore a jobst compression garment. compression therapy was difficult to apply in patients. the duration of lymphorrhea ranged from to months. the severity of lymphedema ranged from campisi stage to (table ). the clinical diagnosis of lymphorrhea was confirmed by observation of fluorescent discharge from the wound on lymphography. no signs of venous insufficiency or hypertension were observed in the subcutaneous vein intraoperatively. all anastomoses were performed between distal lymphatics and proximal veins. postoperatively, lymph was observed to be flowing from the lymphatic vessels to the veins. two to lvas were performed in the region distal to the lymphorrhea and - in the region proximal to the lymphorrhea in patients with lower limb involvement. six lvas were performed in patients with lymphorrhea in the inguinal region (table ). all patients were successfully treated with lvas without perioperative complications.
the volume of lymphorrhea decreased within days following the lva surgery in all cases and had resolved by weeks postoperatively. the compression therapy used preoperatively was continued postoperatively. there has been no recurrence of lymphorrhea or cellulitis since the lvas were performed. an -year-old woman had gradually developed edema in her lower limbs over a period of - years. she had also developed erosions on both lower legs (figure ). compression with cotton bandages failed to stop the percutaneous discharge; about ml of lymphatic discharge through the erosion was noted each day. ultrasonography did not suggest a venous ulcer resulting from venous thrombosis, varix, or reflux. four lvas were performed in each leg ( distal and proximal to the leak). the lymphorrhea had mostly resolved by days postoperatively. the erosions healed within weeks of the surgery. no recurrence of lymphorrhea was noted during months of follow-up. iatrogenic lymphorrhea occurs after surgical intervention involving the lymphatic system. it is also known to occur in patients with severe lymphedema, and obesity and advancing age are further risk factors. most patients with lymphorrhea respond to conservative measures, but some require surgical treatment, and patients with lymphorrhea are at increased risk of lymphedema. lymphorrhea that occurs after surgery or trauma is caused by damage to lymphatic vessels large enough to produce a persistent leak. lymphorrhea that occurs in association with lipedema or age-related lymphedema reflects an accumulation of lymph that has progressed to leakage. lymphorrhea can be treated by other methods, including macroscopic ligation, compression, or negative pressure wound therapy; however, these procedures cannot reconstruct a lymphatic drainage route. we hypothesized that lymphorrhea can be managed by using lva to treat the underlying lymphedema.
lva is a microsurgical technique whereby an operating microscope is used to perform microscopic anastomoses between lymphatic vessels and veins to re-establish a lymph drainage route. the primary benefits of lva are that it is minimally invasive and can be performed under local anesthesia through incisions measuring - cm. one anastomosis is adequate to treat lymphorrhea and serves to divert the flow of the lymphorrhea-causing lymph to the venous circulation. if operative circumstances allow, or more anastomoses are recommended for the treatment of lymphorrhea complicated by lymphedema. lymphedema is a cause of delayed wound healing, and lva procedures are considered to improve wound healing in lymphedema via pathophysiologic and immunologic mechanisms. lva is a promising treatment for lymphorrhea because it can treat both lymphorrhea and lymphedema simultaneously. the focus in treating lymphedema has now shifted to risk reduction and prevention, so it is important to consider the risk of lymphedema when treating lymphorrhea. none. over-meshing : meshed skin graft. we were curious to learn whether it is feasible to mesh already-meshed skin grafts. we run our skin bank at the department of plastic surgery and used allograft skin that had tested microbiologically positive and was thus not suitable for patient use. grafts were cut into cm x . cm pieces, meshed to : using mesh carriers, and over-meshed with : . . we used two kinds of mesh carrier for the : . meshes. the meshed grafts were maximally expanded and measured again. the results were expressed as ratios (figure ). we found that over-meshing results in a . -fold increase in graft area regardless of the mesh carrier used. figure shows a close-up picture of the over-meshed graft, in which the small : incisions are still visible. in those undesirable "oh no, the graft is too small" or "the graft is too large" situations, this technique has its advantages.
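the expansion ratio reported above is simply the maximally expanded graft area divided by the original graft area; a minimal sketch of that calculation with hypothetical measurements (the study's actual dimensions are not reproduced here):

```python
def expansion_ratio(orig_w_cm: float, orig_h_cm: float,
                    exp_w_cm: float, exp_h_cm: float) -> float:
    """ratio of expanded graft area to original graft area."""
    return (exp_w_cm * exp_h_cm) / (orig_w_cm * orig_h_cm)

# hypothetical graft piece: 5.0 x 2.5 cm, expanding to 8.0 x 4.0 cm after meshing
ratio = expansion_ratio(5.0, 2.5, 8.0, 4.0)
print(f"{ratio:.2f}-fold increase in area")
```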
we have used an over-meshed graft on a skin graft harvest site (supplemental figure) with an acceptable outcome. the tiny extra incisions in the over-meshed skin graft do not appear to detract from the aesthetic outcome of the : . mesh. the clinical value of the tiny incisions is unknown, but we estimate it to be minimal, if anything. to the best of our knowledge, only one previous publication has addressed the over-meshing of skin grafts. henderson et al. showed in porcine split-thickness skin grafts that over-meshing resulted in a ratio increase of . , slightly larger than our results. taken together, the results suggest that meshing an already meshed graft is feasible and does not destroy the architecture of the original or succeeding mesh. each author declares no financial conflicts of interest with regard to the data presented in this manuscript. supplementary material associated with this article can be found, in the online version, at doi: . /j.bjps. . . . numerous autologous techniques for gluteal augmentation flaps have been described. in the well-known, currently employed technique for gluteal augmentation, it is noticeable that the added volume is unevenly distributed in the buttock. in fact, morphological analysis makes clear that volume is added to the upper buttock at the expense of the lower buttock. according to wong's ideal buttock criteria, the most prominent posterior portion lies at the midpoint on the side view. additionally, mendieta et al. suggest that the ideal buttock needs equal volume in the four quadrants, with its point of maximum projection at the level of the pubic bone. we describe a technique of autologous gluteal augmentation using a para-sacral artery perforator propeller flap (psap). this new technique can fill all the quadrants vertically with a voluminous flap shaped like an anatomic gluteal implant. gluteal examination is done in standing and prone positions.
patients must have a body mass index less than kg/m , an indication for body lift contouring surgery, gluteal ptosis with platypygia, and substantial steatomery on the lower back. steatomery is defined as substantial when the pinch test is greater than cm.
preoperative markings: the ten steps
a. standing position
. limits of the trunk. the median limit (mlt) and the vertical lateral limit (llt) of the trunk are marked.
. limits of the buttock. the inferior gluteal fold (igf) is drawn. the vertical lateral limit of the buttock (llb) is defined at the outer third between the mlt and the llt.
. lateral key points. points c and c' are located on the vertical lateral limits: point c is to cm below the iliac crest, depending on the type of underwear. point c' is determined by an inferior strong-tension pinch test performed from point c.
. identification of perforators. a mhz probe is used: this diagnostic tool is easy to access, non-invasive and, above all, reliable in the identification of perforating arteries, with a sensitivity and positive predictive value of almost %. usually, one to three perforators are identified on each side and marked.
. design of the gluteal pocket. the shape is oval, with dimensions similar to those of the flap. the base is truncated and suspended from the lower resection line. the width of the pocket is one to two centimetres from the mlt laterally and two centimetres from the llt medially. the inferior border of the pocket is not more than two fingerbreadths above the igf. therefore, the pocket lies medially in the gluteal region.
. design of the flap. the flap is shaped like a "butterfly wing" with its long axis following a horizontal line. after a ° medial rotation, the flap has a shape similar to an anatomical gluteal prosthesis. the medial boundary is two fingerbreadths from the median limit of the buttock, and the width is defined by the two resection limits.
the patient is placed in a prone position, arm in abduction.
the flap is harvested in a lateral-to-medial direction, first in a supra-fascial plane and then sub-fascially when approaching the llb. the dissection is complete when the rotation arc of the flap is free of restriction ( °− °); viewing or dissection of the perforators is usually not required. to create the pocket, custom undermining is done in the sub-fascial plane according to the markings. the flap is then rotated and positioned into the pocket. the superficial fascial system is closed with vicryl (ethicon), and the deep and superficial dermis are closed with a buried intradermal suture and a running subcutaneous suture with . monocryl (ethicon). a compressive garment (medical z lipo-panty elegance coolmax h model, ec/ -h) was worn postoperatively for one month (figure ). rhinoplasty is one of the most common procedures in plastic surgery, and - % of patients undergo revision. dorsal asymmetry is the leading ( %) nasal flaw in secondary patients. careful management of the dorsum is necessary to achieve a smooth transition from radix to tip. camouflage techniques are well-known maneuvers for correcting dorsal irregularities. cartilage, fascia, cranial bone, and acellular dermal matrix have previously been used for this aim. bone dust is an orthotopic option that is easily moldable into a paste. it is especially useful in closed rhinoplasty, where direct visualisation of the dorsum is reduced. we introduce a new tool, a minimally invasive bone collector, as an effective and safe device for harvesting bone dust from the nasal bony pyramid to obtain camouflage on the dorsum while performing ostectomy simultaneously. patients were operated on for nasal deformity by the senior author (o.b.) with closed rhinoplasty between february and november . in all cases, a minimally invasive bone collector was used for ostectomy and the harvest of bone dust. included patients were primary cases with standardized photos, complete medical records, and -year follow-up.
written informed consent for the operation and for publishing photographs was obtained, and the study was performed in accordance with the standards of the declaration of helsinki. the authors have no financial disclosure or conflict of interest to declare. patient data were obtained from rhinoplasty data sheets, and photographs were used for the analysis of nasal dorsum height, symmetry, and contour. physical examinations were carried out to detect irregularities. micross (geistlich pharma north america inc., princeton, new jersey) is a bone collector that allows easy harvest, especially in narrow areas. micross comes packaged with a sterile disposable scraper. it is externally mm in diameter and has a cutting blade tip. a collection chamber allows harvesting a maximum of . cc of graft at once. the sharp cutting technique improves graft viability. the incisions for lateral osteotomies were used to introduce micross when the planned ostectomy site was the nasomaxillary buttress; an infracartilaginous incision was used when the desired ostectomy site was the dorsal cap or radix. bone dust was collected into the chamber with a rasping movement. the graft mixes with blood during harvest, yielding an easily moldable bone paste (the surgical technique is described in the video). after completion of the osteotomies and cartilaginous vault closure, the bone paste was placed on the site of the bony dorsum most likely to show irregularities postoperatively. a nasal splint was used to maintain contour. the bone graft was not wrapped in any other graft. eighteen patients underwent primary closed rhinoplasty with -year follow-up. seventeen patients were female and one was male. harvesting sites were the nasomaxillary buttress in patients, the radix in patients and the dorsal cap in patients. the total graft volume was between . and . cc per patient. the nasal dorsum height, symmetry, contour, and dorsal esthetic lines were evaluated using standardized preoperative and postoperative photographs.
dorsal asymmetry, overcorrection of the dorsal height, and residual hump were not observed in of the patients (figures - ). only patient had a visible irregularity of the dorsum. physical examination revealed palpable irregularities in patients. none of the patients required surgical revision for residual or iatrogenic dorsum deformity. asymmetries and irregularities of the upper one-third of the nose lead to poor esthetic outcomes and secondary revision surgeries. to treat an open roof after hump resection, lateral osteotomies, spreader grafts, flaps and camouflage grafts are commonly used. warping, resorption and migration, visibility, limited volume, donor site morbidity, and the risk of infection are the main disadvantages of grafts. öreroğlu et al. have presented their technique of using diced cartilage combined with bone dust and blood. tas has reported results of harvesting bone dust with a rasp and using it for dorsal camouflage. the disadvantages of harvesting with a rasp were the difficulty of collecting dust from the teeth of the rasp and the loss of a certain amount of graft material during harvest. with micross, the harvested graft is collected in the chamber, eliminating the risk of losing graft material. the concept of replacing "like with like" tissue is important, and the reconstruction of a bone gap can therefore be achieved successfully with bone grafts. to limit donor site morbidity, we prefer to harvest bone from the dorsal cap, which was preoperatively planned to be resected. choosing the lateral osteotomy lines as the donor site facilitates the osteotomies by thinning the bone. the device allows us to harvest bone effectively under reduced surgical exposure. simultaneous harvest and ostectomy contribute to a reduced operative time. the operative cost is relatively low in comparison with alloplastic materials. in this series, we did not experience resorption, migration, visibility problems, or infection with bone grafts.
a new practical, safe, and efficient tool for rhinoplasty was introduced. graft material was successfully used for smoothing the bony dorsum without any significant complications. none. not required. the authors have no financial disclosure or conflict of interest to declare in relation to the content of this article. no funding was received for this article. the work is attributed to ozan bitik, m.d. (private practice of plastic, reconstructive and aesthetic surgery in ankara, turkey). dear sir, early diagnosis of wound infections is crucial, as they have been shown to increase patient morbidity and mortality. hence, it is important that such infections are detected early to guide decision-making and management. currently, the most common methods of identifying wound infection are clinical assessment and semi-quantitative analysis using wound swabs. bedside assessment is subjective, and it has been shown that bacterial infection can often occur without any clinical features. on the other hand, swabs have the disadvantages of missing relevant bacterial infection at the periphery of the wound, due to the sampling technique, and of delaying diagnostic confirmation, which may lead to a change in the bioburden of the wound. although tissue biopsy is the gold-standard diagnostic tool, it is seldom used as it is invasive, technically demanding and more expensive. a hand-held, portable point-of-care fluorescence imaging device (moleculight i:x imaging device, moleculight, toronto, canada) was introduced to address the limitations of the other diagnostic methods. this device takes advantage of the fluorescent properties of certain by-products of bacterial metabolism such as porphyrins and pyoverdine. when excited by violet light (wavelength nm), porphyrins emit a red fluorescence whereas pyoverdine has a cyan/blue fluorescence. the types of bacteria that produce porphyrins include s. aureus, e.
coli, coagulase-negative staphylococci, beta-hemolytic streptococci and others, whereas pyoverdine, which emits cyan fluorescence, is specific to pseudomonas aeruginosa. this allows users to localise areas of bacterial colonisation at loads ≥ amongst healthy tissue, which instead emits green fluorescence. the benefits of this device are that it is portable, non-contact (minimising cross-contamination) and non-invasive, and that it provides real-time localization of bacterial infection. all these features make it a useful tool to aid diagnosis and guide further investigation and management. many previous studies have examined the efficacy of autofluorescence imaging in diagnosing infections in chronic wounds. however, identifying infections in acute wounds is equally important and will help guide antimicrobial management as well as surgical debridement. often, broad-spectrum antibiotics are given where clinical assessment remains inconclusive; this, however, may contribute to antimicrobial resistance. therefore, the use of the moleculight i:x to identify infections in acute open wounds in hand trauma was evaluated. we collected data from patients who attended the hand trauma unit over a -week period, prior to irrigation and/or debridement. wounds were inspected for clinical signs of infection, and autofluorescence images were taken using the moleculight i:x device. wound swabs were taken, and the results interpreted according to the report by the microbiologist. autofluorescence images were interpreted by a clinician blinded to the microbiology results. patients were included, and data collected from wounds. wounds ( . %) showed positive clinical signs of infection, ( . %) were positive on autofluorescence imaging and ( . %) of wound swab samples were positive for significant infection. autofluorescence imaging correlated with clinical signs and wound swab results for wounds ( . %).
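the percentages above are simple proportions derived from a 2x2 comparison of imaging results against the swab reference; a minimal sketch of that calculation with hypothetical counts (not the study's data):

```python
def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> dict:
    """sensitivity, specificity and overall agreement of a test against a reference standard."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # proportion of reference positives detected
        "specificity": tn / (tn + fp),   # proportion of reference negatives correctly called clear
        "agreement": (tp + tn) / total,  # overall concordance between the two methods
    }

# hypothetical example: imaging vs swab across 20 wounds
summary = diagnostic_summary(tp=2, fp=1, fn=0, tn=17)
print(summary)
```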
in one case, the clinical assessment and autofluorescence imaging showed positive signs of infection but the wound swabs were negative. to the best of our knowledge, this is the first time the use of autofluorescence imaging in an acute setting has been investigated. in this study, out of of the wound swab samples that were positive, autofluorescence imaging correctly identified both ( %) (fig. ). one of the autofluorescence images that showed red fluorescence on the wound, and which was clinically identified as infected, showed growth of usual regional flora on microbiological studies. the reason for this could be the method of sampling from the centre of the wound: on the autofluorescence image, the areas of significant bacterial growth were at the edges of the wound (fig. ). this example illustrates the potential of using autofluorescence imaging to guide more accurate wound sampling, as has also been shown in a non-randomised clinical trial performed by ottolino-perry et al. from a surgeon's perspective, autofluorescence imaging can guide surgical debridement by providing real-time information on the infected areas of the wound. furthermore, because of its portability, the device can also be used intra-operatively to provide evidence of sufficient debridement. although easy to use, the requirement for a dark environment causes a logistical problem. the manufacturers have recognised this limitation and have created a single-use black polythene drape called "darkdrape", which connects to the moleculight i:x using an adapter to provide optimal conditions for fluorescence imaging. while autofluorescence imaging can help clinicians decide whether or not to start antibiotics, it does not provide any information on the sensitivities of the bacteria.
another limitation of autofluorescence imaging we encountered in our study is the difficulty of imaging acutely bleeding wounds, where blood appears black on fluorescence and may therefore mask any underlying infection. in conclusion, autofluorescence imaging in acute open wounds may be useful to provide real-time confirmation of wound infection and therefore guide management. none declared. none received. supplementary material associated with this article can be found, in the online version, at doi: . /j.bjps. . . . when compared with the two previously published studies, publication rates have improved from and have not continued to decline. interestingly, the number of publications in jpras has fallen. this may be explained by a rise in the impact factor of the journal, increasing competitiveness for publication, as well as an expansion in the number of surgical journals. we observed that the journal impact factor for free paper publications was significantly greater, which likely reflects the stringency of the bapras abstract vetting process. comparison with other specialties is inherently difficult, primarily due to differences in study design and inclusion criteria. exclusion of posters, inclusion of abstracts published prior to presentation, and studies not referenced in pubmed all affect the reported publication rates. a large meta-analysis assessing publication of abstracts reported rates of %. rates from other specialties are shown in figure . although our figure of close to % may seem to rank low against other specialties, including abstracts published prior to presentation would increase the publication rate to %, making it more comparable; however, this would not be a direct comparison with the two previous bapras studies. one may argue that the academic value of a meeting should be judged on its abstract publication ratio.
however, the definition of a publication is itself clouded, with an increasing number of journals, including a number of open access journals, not referenced in the previous 'gold standard' of pubmed. most would still argue the importance of stringent peer review as the hallmark of a valuable publication, and perhaps this, along with citability, should remain the benchmark. in an age where publications are key components of national selection, and indeed of lifelong progression in many specialties, we must ensure that some element of quality control remains so as not to dilute the production of meaningful data. we have been able to reassess the publication rates for the primary meeting of uk plastic surgery. the bapras meeting remains a high-quality conference providing a platform to access the latest advances in the field. significant differences in the methodology of the available literature make comparisons with other specialities challenging; however, when these are accounted for, publication rates are similar. within a wider context, with the increase in open access journals, it has become ever more difficult to define a 'publication'. if publication rate is to be used as a surrogate for meeting quality, then only abstracts published after the date of the meeting should be included. in order to continually assess the quality of papers presented at bapras meetings, the conversion to publication should be regularly re-audited. none. dear sir, global environmental impact and sustainability have been heated topics in recent years. plastics and single-use items are widely, and perhaps unnecessarily, used in the healthcare sector. various recent articles discuss the negative impacts of this in the surgical world, but can we look at nhs sustainability as a bigger picture?
whilst it is a positive step to be considering how we can reduce the environmental impact of modern operating practice, this risks falling into the trap of being overly focused and not taking a holistic view of how the health service as a whole can become more environmentally responsible and reduce costs. in fact, the operating theatre is one of the more difficult places to make change. single-use medical devices seem like obvious items to replace with more environmentally friendly re-usable alternatives, but what about patient safety? such a change would require the implementation of new workflows and supervision structures to make sure patient safety is maintained. these take time to create, will meet resistance in their design and implementation, and may not ultimately be adopted. in order to overcome these challenges, we must take a holistic view of the hospital environment - doing so reveals numerous opportunities for improvement with minimal impact on patient safety. the nhs incurs significant waste through using energy unnecessarily. some examples are readily visible after working in a hospital for just a few weeks: computers are left on standby through the night and at weekends; lights are left on throughout the night; and empty rooms are heated or cooled when left unoccupied. other sources of energy waste are less visible, but it is likely that some machinery (particularly air conditioning units) would show a rapid return on investment through energy savings if replaced on a more regular basis. in the past, saving energy would have required a sustained campaign to educate staff, and would still have been subject to the vagaries of human management (forgetting to switch the heating off on a friday night could lead to more than two days of wasted energy if not revisited until monday). today, solutions based on internet of things (iot) technology can use sensors to monitor the environment and take action to reduce consumption.
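the kind of rule such sensor-driven systems encode can be sketched minimally as follows (hypothetical thresholds, sensor fields and action names, not any specific product's logic):

```python
from dataclasses import dataclass

@dataclass
class RoomReading:
    """one sensor snapshot for a single room (hypothetical schema)."""
    occupied: bool
    temperature_c: float
    lights_on: bool

def control_actions(reading: RoomReading,
                    setpoint_c: float = 21.0,
                    setback_c: float = 16.0) -> list:
    """return energy-saving actions for one room based on its sensor reading."""
    actions = []
    if not reading.occupied:
        if reading.lights_on:
            actions.append("lights_off")
        # let an empty room drift down to a lower setback temperature
        if reading.temperature_c > setback_c:
            actions.append("heating_setback")
    elif reading.temperature_c < setpoint_c:
        actions.append("heat_to_setpoint")
    return actions

# an unoccupied, lit, warm room triggers both savings actions
print(control_actions(RoomReading(occupied=False, temperature_c=22.0, lights_on=True)))
```

a real deployment would of course feed such rules from live sensors and add the predictive scheduling described above; this sketch only illustrates the reactive core.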
with the use of ai and machine learning, these systems are becoming advanced enough to monitor and anticipate energy usage, allowing rooms to be heated or cooled so that they are at the ideal temperature when staff arrive. the nhs is starting to use such technology, with wigan hospital the first example to install intelligent lighting. adoption should not be limited to lighting, however, and the nhs needs to adopt best practice from the commercial sector. for example, sensorflow, based in singapore, provides an intelligent system that optimises cooling/heating costs for hotels around south east asia, saving the operators up to % in energy costs. without doubt, these systems could also be applied to hospital infrastructure and help the nhs further reduce energy consumption. in addition to reducing energy consumption, the reduction of single-use plastics has become a key focus in recent years, and the nhs has started to address this issue. at least million single-use plastic items were purchased by the nhs last year. the target to phase out plastic items used by retailers in the next months is laudable; however, a significant amount of disposable plastic is also used in staff coffee rooms and hospital canteens. getting rid of such items completely and encouraging staff to use reusable coffee cups and metal cutlery could compound the cost-saving and environmental benefits. the nhs has established an early leadership position in tackling environmental challenges - the first european intelligent lighting installation and ambitious targets to cut disposable plastic items - but more needs to be done. to maximise impact, the nhs needs to be seen as a whole (not by department), with the most senior executives in the health service driving national-level change. we read with interest the recent article 'healthcare sustainability - the bigger picture'.
the wider picture of the nhs's environmental impact and sustainability clearly needs to be addressed. however, large-scale improvement projects to hospital buildings, such as intelligent lighting and heating systems, are likely to require huge investment in infrastructure and modernisation that the nhs in its current form is unfortunately unlikely to be able to make. we believe that the field of medical academia should similarly be contributing to environmental sustainability. firstly, the shelves of hospital libraries and offices internationally are lined with print copies of journals. we reviewed the surgical journals with the highest impact factors and found that all were still offering the option of a subscription to print copies, with of these printing monthly issues. readers are able to access all journals electronically through institutional subscriptions or via the nhs openathens platform, which in our view is a more time-efficient way to search for articles, read them and reference them. as such, we commend jpras on its recent move to online-only publication. additionally, the increasing use of social media to discuss research, and the creation of visual abstracts for articles to encourage readership, are likely to encourage this shift further. secondly, the environmental impact of the current academic conferencing culture must be addressed. by the end of training, a uk surgical trainee spends an average of £ attending academic conferences, but beyond this personal expenditure, what is the environmental cost? for each conference we attend, the printing of poster presentations, conference programmes and certificates all detrimentally impacts our environment. furthermore, consider the conference sponsor bags we receive, filled with further printed material, plastic keyrings, stress-balls and disposable pens, all contributing to the build-up of plastic in our oceans.
conferences, such as the british association of plastic and reconstructive surgeons scientific meeting, have now started using electronic poster submissions, with presentations being held consecutively on large television screens - but further measures are possible. a well-designed conference smartphone app forgoes the need for printed programmes and leaflet advertising from sponsors and could include measures to reduce the carbon footprint, such as promotion of ride-share options for venue travel. the concept of virtual conferences has also been explored. organisers of an international biology meeting recently asked psychologists to assess the success of a parallel virtual meeting, with satellite groups organising local social events afterwards. more than % of the delegates joined online and there was an overall % increase in those attending the conference; a full analysis of the success of this approach is awaited. virtual conferences may enable delegates to sign in from multiple time zones and minimise travel, disruption of clinical commitments and time away from family. this option is being pursued by the reconstructive surgery trials network (rstn) in the uk, whereby the annual scientific meeting will be delivered using teleconferencing technology at four research-active hubs across the uk, substantially reducing delegate travel and, in turn, the conference's carbon footprint. there is a clear but unmeasurable benefit to face-to-face networking for the formation of personal connections, exchange of knowledge and opportunities for collaboration. the use of social media, instant messaging applications and modern teleconferencing technology is vital to retaining this valuable aspect of academic conferencing. equally, perhaps there is a balance to be found, with societies currently holding biannual meetings moving to include one virtual meeting, or running a parallel virtual event for those travelling long distances.
the academic community must play a role in environmental sustainability by reducing the carbon footprint of our journals and conferences. jcrw is funded by the national institute for health research (nihr) as an academic clinical fellow. none for completion of submission. none.

we read with interest the study by sacher et al., who compared body mass index (bmi) and abdominal wall thickness (awt) with the diameter of the respective diea perforator and siea. they found a significant (p < . ) positive correlation between these variables, concluding that this association may mitigate the increased perioperative risk seen in patients with high bmi. their findings disagree with a previous smaller study by scott et al. reconstruction in the high-bmi patient group can be challenging and is associated with higher complication rates. despite this, satisfaction with autologous reconstruction appears similar across bmi categories. as the authors discuss, perfusion, as a function of perforator diameter, is of key relevance to the safety of performing autologous breast reconstruction in patients with higher bmi. larger perforator sizes relative to total flap weight have been suggested to reduce the risk of post-operative flap skin or fat necrosis. while this is likely an oversimplification, as flap survival will also depend on multiple factors including the perforator row compared to the abdominal zones harvested, it does suggest that if the high-bmi patient group has reliably larger perforators then their risk profile may be reduced. however, we suggest caution regarding reliance on the correlation they found between bmi or awt and perforator size when planning free tissue transfer. while they demonstrate p values suggesting correlation between bmi or awt and perforator diameter, the r (correlation coefficient) values that they determined through pearson correlation analysis are low, ranging from . to . .
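the gap between a statistically significant correlation and real explanatory power can be made concrete with a short numerical sketch. the data below are purely hypothetical synthetic values, not the study's measurements, and the variable names are illustrative only:

```python
import numpy as np

# hypothetical illustration (not the study's data): a mostly-noisy
# linear relationship between bmi and perforator diameter.
rng = np.random.default_rng(0)
bmi = rng.uniform(20, 40, 200)                         # simulated bmi values
diameter = 1.5 + 0.01 * bmi + rng.normal(0, 0.3, 200)  # weak signal + noise

r = np.corrcoef(bmi, diameter)[0, 1]  # pearson correlation coefficient
r_squared = r ** 2                    # coefficient of determination

# even a "significant" r can leave most of the variation in
# perforator diameter unexplained by bmi.
print(f"r = {r:.2f}, r-squared = {r_squared:.2f}")
```

with a large enough sample, even a weak correlation like this can reach statistical significance, which is why the coefficient of determination, rather than the p value, indicates how much of the variation is actually accounted for.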
the resulting r-squared (coefficient of determination) values are therefore in the range . - . , suggesting that only . - % of the variation in perforator diameter can be related to bmi or awt. it is therefore likely that other variables, such as height and historical abdominal wall thickness, that were not accounted for in the correlation analysis also play roles in determining perforator size, in addition to anatomical variation. furthermore, their analysis and results depend on a linear relationship between the variables, which may not be the case. therefore, although the authors demonstrate a correlation between abdominal wall thickness and perforator size, there is substantial variation between individual patients, and so this relationship cannot be relied upon when planning autologous reconstruction.

we read with interest pescarini et al.'s article entitled 'the diagnostic effectiveness of dermoscopy performed by plastic surgery registrars trained in melanoma diagnosis'. the article is of great interest in highlighting the potential of plastic surgery registrar training in domains such as dermoscopy, especially for those trainees looking to specialise in skin cancer. training in these experiential skill domains is essential to building a diagnostic framework, and the comparable accuracy in diagnosis to dermatologists reflects this. it would be of great benefit to understand further how diagnostic accuracy evolves along the inevitable learning curve experienced using the dermoscope. pescarini et al. comment briefly on the method of training, but we believe the timeline is key, as are mentorship and regular appraisal. terushkin et al. found that during the first year of dermoscopy training, benign-to-malignant ratios in fact increased in trainee dermatologists before going on to decrease, potentially secondary to picking up more anomalies without yet having the skill set to determine whether these are benign or not.
there is no reason to suggest that plastic surgery trainees' learning curves should differ significantly. this would of course skew the data presented in terms of accuracy at the end of the three-year study period. more helpful would be a demonstration of how accuracy changes with time and experience, as one would expect, and of course how these rates compare to those of dermatologists. this would have implications for training programmes where specific numbers of skin lesions or defined timeframes for skin exposure during training are set as benchmarks for qualification. this is particularly pertinent for uk trainees; the nice guidelines for melanoma state that dermoscopy should be undertaken for pigmented lesions by 'healthcare professionals trained in this technique'. understanding the number of lesions that trainee plastic surgeons have to assess with a dermoscope before their diagnostic accuracy improves - or the time needed to achieve that accuracy - might be a key factor in placement duration and the numbers required for trainees to become consciously competent dermoscopic practitioners. reproducible training programmes in this regard are therefore vital. it must be pointed out that the role of the dermoscope for plastic surgeons is likely to be narrower than for our dermatological colleagues. within the uk, the role of the plastic surgeon is primarily reconstructive, with some subspeciality involvement in the diagnosis of melanomas and a range of non-melanomatous skin cancers and skin lesions. for plastic surgeons, the dermoscope is primarily a weapon in the diagnosis of in situ or early melanoma where the diagnosis remains unclear following a referral for consideration of surgical removal. where doubt remains over a naevus, surgical excision is still the normal safe default. dermatologists use dermoscopes for a broad range of diagnostic purposes on a wide variety of skin conditions.
the familiarity and expertise with this instrument that they garner is therefore not surprising. we must be clear in resource-limited healthcare systems about what our specific roles are as plastic surgeons and how the burden of patient assessment is shared, to appropriately deploy our skills within the context of a broader multidisciplinary framework. accuracy with the dermoscope is essential to safely treating patients in a binary fashion - should the lesion be removed or monitored? comparison with dermatological expertise is helpful as a guide, and dermoscopy has an important diagnostic role for plastic surgeons, but we should not strive to be equivalent in skills to dermatologists with dermoscopes at the expense of the development of vital surgical reconstructive skills and excellence throughout plastic surgery training.

response to the comment made on the article "the diagnostic effectiveness of dermoscopy performed by plastic surgery registrars trained in melanoma diagnosis"

we strongly agree that it would be of benefit to understand the learning curve experienced by plastic surgery registrars using the dermoscope. as stated in our article, the limitation of our study is its retrospective nature. moreover, the training and the level of competence differed between the three registrars. at the beginning of the data collection, two of them were in their third year of specialist training and had been using the dermoscope for at least one year, while the other was in his first year. all the registrars attended specific but different dermoscopy courses, and all of them completed a h on-site training with a competent consultant. for this reason, the expertise partially differed among the three registrars. nevertheless, we believe a years' period should be long enough to homogeneously estimate their accuracy in the diagnosis of melanoma. in fact, townley et al.
demonstrated that attendance at the first international dermoscopy course for plastic surgeons, oxford, improved the accuracy of diagnosing malignant skin lesions by dermoscopy compared with naked-eye examination. we believe a well-planned prospective study would be of great benefit in terms of planning a reproducible, plastic surgery-oriented dermoscopy training programme. this could help to estimate when a clinician can be considered a competent dermoscopic practitioner. it should be underlined that learning how to use the dermoscope is not something that can be done from time to time; it needs effort and self-study. we believe it is important to properly plan formal training in dermoscopy for all the plastic surgery registrars who will use this tool in their practice. vahedi et al. stated that, as per their survey, only one of the % of plastic surgery trainees who used the dermoscope in their practice had formal training. as all trainees perform outpatient appointments dealing with skin lesions, and especially for trainees looking to specialise in skin cancer, we believe the expertise gained through specific courses and training is not at the expense of the development of surgical reconstructive skills; instead, it can lead to improvement in performing outpatient appointments. proper use of the dermoscope will make the skin cancer specialised plastic surgeon more confident, if not in detecting melanoma then at least in leaving evident benign lesions alone. keeping in mind a multidisciplinary approach, close cooperation between dermatologists and plastic surgeons is of paramount importance in skin cancer treatment. there is no conflict of interest for all of the authors.

dear sir, as the author mentioned in this publication, the correction of the infra-orbital groove by microfat injection did increase postoperative satisfaction with lower blepharoplasty surgery. in this study, we want to explore whether this procedure can replace the previous fat pad transposition.
months after the microfat injection, we have observed that fat continues to be present but its volume gradually decreases, and in some patients it totally vanishes. with fat pad transposition, the fat volume does not decrease. it seems that both have their advantages and disadvantages, because the volume of fat transplanted after lower blepharoplasty might gradually disappear over time. survival of fat transposed through fat pad transposition is the best, creating a more natural look at the tear trough; however, the volume of augmentation might not be enough. it would be exceptional if we could combine both advantages, that is, to administer microfat injection after fat transposition. but prior to that, we would like to share the experience of the author. the fat pad is usually transposed to the periosteum in two ways: one is the transposition of the medial fat pad to the inner groove, and the other is the transposition of the central fat pad to the centre of the infra-orbital groove. as mentioned by the author, we fill the superficial layer (under the skin) and the periosteum layer (deep layer). injection into the deeper layer is performed not after lower blepharoplasty but before the musculocutaneous flap is closed. after fat pad transposition is completed, we first cover the musculocutaneous flap before asking the patient to sit up. then, the surgeon assesses whether a further filling of the groove with fat is needed. if necessary, the musculocutaneous flap is opened and more fat is injected into the groove between the fat pads, but definitely not into the fat pads. the reason why we do the injection before the flap is closed is to perform the insertion accurately and to avoid entering the intra-orbital fat pad, which may worsen the presence of eye bags. we inject the superficial fat only after the flap wound is closed. this procedure modifies the groove under the eye more accurately.
we share with you our surgical methods in the hope that fat utilization and fat pad transposition will greatly improve surgical satisfaction.

dear sir, eiben and gilbert are thanked for their comments. they may be correct in the original description of the respective flaps, but the five-flap z-plasty in our experience has always been known colloquially as the jumping man flap. indeed, extra caution is required in burns secondary reconstruction. the skin of these patients is typically thin, often scarred and unforgiving. flaps should never be undermined unless in an area of completely virgin tissue. the modification we presented does result in an apparently thinner base for the 'arm limb' flaps, but traditionally wider-based flaps would have been transferred and then trimmed with the same outcome. the tiny sizes involved in paediatric eyelid surgery would not be the best forum to experiment, and certainly mustardé's original design would seem safest in that setting. we had uniquely sought to also measure precisely the geometric gain in length, and felt that the result was impressive. none

letter to the editor: evaluating the effectiveness of plastic surgery simulation training for undergraduate medical students

we read with interest the recent correspondence regarding the effectiveness of plastic surgery simulation for training undergraduate medical students. we are in wholehearted agreement with the statement regarding medical school curricula lacking exposure to plastic surgery, and commend the authors for their efforts to pique the interest of medical students in our specialty. we wish, however, to point out some vagueness that, unless clarified, could be misleading to your readership. the correspondence states: "the decrease in competition ratios for plastic surgery". we believe that current data supports the opposite view.
taking into account published data from health education england over the last years, there has in fact been a % rise in the competition ratios from to (fig. ), suggesting an increasing interest in the specialty. highlighting this increase in demand supports the authors' desire for more undergraduate exposure to plastic surgery. this increased input in the uk curriculum would also help all medical students become aware of the support plastic surgeons can provide to other specialties, as this is a particular feature of the specialty. in an increasingly specialised medical world, we feel it is important that all doctors are equipped with the knowledge to best serve their patients. no funding has been received for this work and the authors have no competing interest.

dear sir/madam, in response to critical personal protective equipment (ppe) shortages during the covid- pandemic, medsupplydriveuk was established by ent trainee ms. jasmine ho, and medsupplydriveuk scotland by two plastic surgery trainees (ms. gillian higgins and mrs. eleanor robertson). we applied the principles of creative problem solving and multidisciplinary collaboration instilled by our specialty. since march, we have recruited over volunteers to mobilise over , pieces of high quality ppe donated from industry to the nhs and social care. we have partnered with academics and leaders of industry to manufacture surgical gowns, scrubs and visors using techniques including laser cutting, injection molding and d printing. we have engaged with nhs boards and trusts and with politicians at local, regional and national level to advocate for healthcare worker protection in accordance with health and safety executive and coshh legislation, including engineering controls and ppe that is adequate for the hazard and suitable for the task, user and environment. public health england (phe) currently advise ffp level of protection only in the context of a list of aerosol generating procedures. a surgical mask confers x ( %) protection, ffp /n x ( - %) and ffp x ( > %) protection (figure ). as sars-cov- is a novel pathogen, evidence is naïve and evolving, and since transmission occurs via aerosols, droplets and fomites from the aerodigestive tract, all uk surgical associations have issued guidance to use higher levels of ppe for procedures that are not included in the phe list. cbs, entuk and baoms have issued statements supporting the use of reusable respirators and powered air-purifying respirators, and their use is approved by phe, health protection scotland, public health agency, public health wales, the nhs and the academy of medical royal colleges. the first author has experienced the need to quote bapras guidance in defence of their use of ppe. medsupplydrive (uk and scotland) hope to empower all healthcare workers to demand provision of adequate (i.e. will protect from sars-cov- ) and suitable (for the task, user and environment) ppe by engaging with their employers directly or through unions, royal colleges and associations. as a nation we must learn from other countries that successfully protected their workforce. data suggest that staff deaths are avoidable with the use of occupational health measures and ffp grade ppe, despite which at least uk health care workers have died of covid- . the strain placed on systems by sars-cov- , with reduced access to operating theatres, beds, equipment and staff, has the potential for serious detrimental consequences for surgical training. ppe shortages and the subsequent necessity for rationing are causing additional harm. due to global demand and supply chain failures, ffp disposable masks for people with small faces are in particularly short supply. the majority of these individuals are female, and they are currently provided with no solution apart from avoiding "high risk" operating if/when this resource runs out, further depriving them of training opportunities.
reusable respirators provide superior respiratory protection over disposable ffp masks due to design characteristics. they are more likely to provide a reliable fit due to increased seal surface area (half face mm, full face mm). as they are designed to be decontaminated between patients and after each shift, they are both economically and ecologically advantageous, whilst also reducing the fit-testing burden and negating reliance upon precarious supply chains. there are factories in the uk which already make reusable respirators, and medsupplydrive have been contacted by uk manufacturers looking to retool to meet this demand. although some nhs trusts remain reluctant to use reusable respirators, others have already adopted them routinely, using manufacturer decontamination and filter change advice. one nhs trust has supplied every member of its workforce with a reusable respirator as a sustainable plan for ongoing pandemic waves. it is apparent that healthcare workers are unable to access sufficient quantities of high quality respiratory protection. reusable respirators provide adequate protection from sars-cov- as well as being eminently suitable for a wide range of users, tasks and environments. we call on those reviewing decontamination and filter policy for reusable respirators to appreciate the urgency of the situation and expedite the process to enable all health and social care workers to access the respiratory protection that they need.
none. the authors have no financial interests to declare in relation to the content of this article and have received no external support related to this article. no funding was received for this work. the authors would like to thank catriona graham, sarcoma specialist nurse, who helped in the evaluation of this study. the authors kindly thank the beatson cancer charity, uk (grant application number - - ), the jean brown bequest fund, uk, and the canniesburn research trust, uk for funding this study. the sponsors had no influence on the design, collection, analysis, write-up or submission of the research. supplementary material associated with this article can be found, in the online version, at doi: . /j.bjps. . . . none. the authors declare no funding. jeremy rodrigues provided data from the two nhs trust journal clubs and invaluable advice. nil. all authors declare that there were no funding sources for this study and they approved the final article.
all authors disclose any commercial associations or financial disclosures. none. all authors agree that there are no conflicts of interest to declare. no funding was provided for this letter. the authors have no financial or personal relationships with other people or organizations which could inappropriately influence the work in this study. the authors have no financial disclosure or conflict of interest to declare in relation to the content of this article. no funding was received for this article. supplementary material associated with this article can be found, in the online version, at doi: . /j.bjps. . . .

dear sir, the term 'publish or perish' has long been considered medical doctrine, and publication has historically been a prerequisite for progression in research-driven specialties such as plastic surgery. national, or indeed international, presentation is pivotal to disseminating information, but also provides a stepping-stone to future publications. in the uk, bapras meetings have always represented the ideal platform for this. of significant interest is the conversion of accepted abstracts into peer-reviewed publications. previous studies have assessed abstract publication for bapras meetings and have shown a declining conversion rate. we re-assessed this in order to establish whether this reported downtrend is continuing and how plastic surgery compares to other specialties. all abstracts from bapras meetings between winter and summer were analysed. later meetings were excluded to allow adequate lag time for publication. abstracts were identified retrospectively from conference programmes accessible via the bapras website ( www.bapras.org.uk ). pubmed ( https://www.ncbi.nlm.nih.gov/pubmed/ ) and google scholar ( https://scholar.google.com/ ) databases were used to search for full publications.
cross-referencing of published papers with abstracts for content was completed to ensure matched studies. abstracts published prior to the conference date were excluded. two-tailed t-testing was used to assess for statistical significance between variables. none. none. dear sir, diver and lewis described a modification of the "jumping man flap". in fact, what they have described is a modification of the -flap z-plasty, which was described by hirschowitz et al. it is not a jumping man as it has no body. the true jumping man flap was described by mustarde for the correction of epicanthal folds and telecanthus. we have used the -flap z-plasty particularly for the release of st web space contractures following burns, for the modification of raised curved scars of the trunk and limbs following burns, and for the correction of epicanthal folds in small children. using the diver and lewis modification in burn cases results in thin and less vascular flaps. when correcting epicanthal folds in children, the flaps are so small that reducing their size in any way would make it near impossible to suture the flaps correctly. no conflicts of interest. key: cord- -ditadt l authors: mitarai, o.; yanagi, n. title: suppression of covid- infection by isolation time control based on the sir model and an analogy from nuclear fusion research date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: ditadt l the coronavirus disease (covid- ) has been damaging our daily life since the declaration of a pandemic. therefore, we have started studying the characteristics of the susceptible-infectious-recovered (sir) model to understand the nature of this infectious disease and our future.
after detailed studies of the sir model's dependence on parameters such as the outing restriction (lockdown) ratio and the vaccination rate, we noticed that the second term (isolation term) in the differential equation for the number of the infected is quite similar to the "helium ash particle loss term" in deuterium-tritium (d-t) nuclear fusion. based on this analogy, we have found that isolation of the infected is not actively controlled in the sir model. we therefore introduce the isolation control time parameter q and study its effect on this pandemic. the isolation time required to terminate covid- can be estimated by the proposed method. to demonstrate this isolation control effect, we choose tokyo for the model calculation because of its high population density. we determine the reproduction number and the isolation ratio in the initial uncontrolled phase, and then estimate the future number of the infected under various conditions. if each confirmed case can be isolated within ~ days through widely performed testing, this pandemic could be suppressed without awaiting vaccination. if mild outing restriction and vaccination are applied together, the isolation control time can be longer. we consider that this isolation time control might be the only solution for overcoming the pandemic while no vaccine is available. since the wuhan pneumonia was reported in china in jan., we have been studying covid- infection using the sir model. we then noticed that the second term (the quarantine term) used in the sir model is quite similar to the he ash particle balance equation in nuclear fusion research (mitarai and muraoka, ). by this analogy, we have found that the time in the isolation term can be interpreted as natural isolation, which means that isolation is not conducted in a controlled manner in the traditional sir model. this fact inspired us to consider how to suppress this infectious disease.
we show that it is possible to calculate the required isolation time (from infection to isolation) needed to suppress covid- . in this paper we have chosen tokyo, whose large population should make it suitable for applying the sir model. the future number of the infected is estimated under various conditions of the outing restriction ratio and the isolation control parameters. when infected people are identified by massive testing and then quickly isolated from society, the infectious disease could be terminated. in this paper we have used the traditional sir model (kermack and mckendrick, ; nishiura and inaba, ). in this model we interpret the isolation term using the analogy from nuclear fusion research, providing a new insight into this field. the sir model equations are given by

ds/dt = -β s(t) i(t)
di/dt = β s(t) i(t) - γ i(t)
dr/dt = γ i(t)

where s is the susceptible fraction, i is the infectious fraction, and r is the recovered fraction, which is the sum of the death toll and the recovered. β is the infection rate per day and γ is the isolation ratio or the recovery rate per day. the second term on the right-hand side of the equation for i is called the isolation term or quarantine term, and it can also be expressed as

γ i(t) = i(t)/τ,

where τ = 1/γ has the dimension of time, expressing the average lifetime of the infectious disease. this is also interpreted as meaning that an infected person is finally removed from society by recovery or death after a time period of τ (kermack and mckendrick, ). this isolation term is quite similar to the "he ash particle loss term" used in nuclear fusion research (mitarai and muraoka, ). this loss time is called the "particle confinement time" in nuclear fusion, and expresses the average time for a he particle to escape from the confined plasma. this analogy inspired us to consider how to terminate the covid- pandemic. detailed explanations of this analogy are given in the appendix.
if isolation can be conducted within a short time, the isolation term is increased, and the number of infected people is reduced. this is because the contact time with infected people is shortened. in this sir model, the total population is constant, as n(t) = s(t)+i(t)+r(t) = constant by adding the three equations. we have therefore calculated the number of the infected in tokyo, where the population is effectively constant over a short time. to solve the sir model, the initial values are set to s( )= , r( )= and i( ) given. we also note that this traditional sir model assumes permanent immunity. using the effective reproduction number defined by r_eff = βs(t)/γ, the equation for i can be written as di/dt = γ(r_eff - 1)i. as the basic reproduction number r_o = β/γ corresponds to the initial value s( )= , r_eff = s(t)r_o holds. we see that the number of infected people increases for r_eff > 1 and decreases for r_eff < 1, where r_eff determines the sign of the derivative di/dt. r_eff only shows the tendency at the present time slice. we divided the calculated time range into a first phase and a second phase in the following. the first phase starts on may , in this study and is assumed not to be affected by any governmental control measures. as day from the start (april , ) provides the largest number of the infected per day, two days later (day ) was taken as the first day of the second phase. the first phase is given by the traditional sir model expressed by the basic reproduction number r_o. as the equation for r is the same as the second term in the equation for i with the opposite sign, we do not write it down in the following, for simplicity. here only one infected person is assumed as the initial value, i( ) = / , , = . x - , and s( ) = for the population of tokyo, , , , as given in table . we calculate the fractions s, i and r, and multiply by the population to obtain the final values. the above simultaneous equations have been solved with mathematica (wolfram research).
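as a minimal sketch (not the authors' mathematica code), the sir equations above can be integrated with a simple forward-euler loop. the parameter values below (β, γ, population) are illustrative assumptions chosen so that r_o = β/γ is above one, not the fitted values from this paper.

```python
# Minimal forward-Euler integration of the SIR model described above.
# beta, gamma, n_pop are illustrative assumptions, not the paper's fitted values.

def simulate_sir(beta, gamma, n_pop, days, dt=0.1):
    s, i, r = 1.0 - 1.0 / n_pop, 1.0 / n_pop, 0.0  # fractions, one initial case
    history = [(s, i, r)]
    for _ in range(int(days / dt)):
        ds = -beta * s * i
        di = beta * s * i - gamma * i   # second term is the isolation/quarantine term
        dr = gamma * i
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
        history.append((s, i, r))
    return history

hist = simulate_sir(beta=0.357, gamma=0.143, n_pop=14_000_000, days=300)  # R0 ~ 2.5
peak_infected_fraction = max(i for _, i, _ in hist)
```

because ds + di + dr = 0 at every step, the total s + i + r stays constant, matching the n(t) = constant property noted above, and with r_o > 1 the infected fraction first grows and then declines.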
although we need the parameters β and γ, and hence the basic reproduction number r_o, it is difficult to determine γ from the equation for r because the data are insufficient and not smooth in this case. as the basic reproduction number is reported to lie in a range of values by recent research (liu, y., et al.), we have used this range. as the main purpose of this paper is to estimate the isolation time, not to analyze the present situation in detail, the recovery time is most important. we therefore surveyed the recovery time using the cumulative numbers of confirmed cases and of cases discharged from hospital, as shown in fig. -(a). the time difference between the two curves is considered to be approximately the recovery time, which is less than days. as the equation for i provides the number of the infected, γ and r_o were chosen by trial and error to fit the infection data, as shown in fig. -(b). we then employed r_o = . as given by liu et al. (liu, t., et al.), and γ = . , corresponding to a recovery time τ = . days (less than days), which provides the coefficient β = . . [the copyright holder for this preprint, which was not certified by peer review, is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. this version was posted september , . doi: medrxiv preprint] (okumura, ). in this work the second phase starts days after the state of emergency in japan (april ). after day we introduce the outing restriction ratio c = % into the sir model, which means that % of the total population can move freely outside and % are staying home. this calculation can be done by employing the basic reproduction number r_o multiplied by (1 - c) (kado, ). this means that the number of susceptible people is reduced by (1 - c), namely s(t)(1 - c), and the s(t)c non-susceptible people are staying home without infection. with the outing restriction ratio c, the equations for s and i can be written as

ds/dt = -β s(t)(1 - c) i(t)
di/dt = β s(t)(1 - c) i(t) - γ i(t)

the calculated results are shown in fig.
for various outing restriction ratios c. the initial values in the second phase are taken from the last ones in the first phase. a small value of c is not effective at all in reducing infection. a large value, such as c = . , can reduce the number of the infected substantially, but it takes two months (june , ) to reduce it down to persons per day, and four months down to persons per day. on the other hand, if c = . is taken it will take days to reduce it down to persons per day. if c = . is taken, the number of infected per day is not reduced and stays constant. this value can be derived from the condition r_o(1 - c) = 1, obtained by setting di/dt = 0 and s(t) ~ 1 in the equation for i, providing c = 1 - 1/r_o = 1 - 1/ . = . . when c = . is chosen, the number of the infected increases, as shown in fig. -(a). we should note that a variation in the outing restriction leads to a large difference in the number of the infected. we added the data until july , to compare with the actual situation, as shown in fig. -(b). at first glance, the state of emergency was very effective in reducing the infection. the actual number of the infected is much smaller than the curves for c = . and . near the end of the state of emergency. this is because we assumed the outing restriction is imposed only on the susceptible people and that the infected patients are free to move. as the outing of the infected patients is also limited, the first term in the equation for i should be s(t)(1 - c) β i(t)(1 - c) under the assumption of the same outing restriction ratio. therefore, the outing restriction of c = . actually corresponds to c' = 1 - (1 - . )^2 = . if we transform the form (1 - c)^2 to (1 - c'). although the larger value c = . was proposed by the committee, the even larger value c' = . was achieved in the actual situation. this estimation can be justified by the two-body collision theory also used in nuclear fusion research (mitarai and muraoka), corresponding to the first term in eq. (a- ) in the appendix. but the red curve with c' = . is still higher than the actual values.
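two of the relations above can be sketched in a few lines; the numeric values used here (r_o = 2.5, c = 0.6) are illustrative assumptions, since the digits are elided in the text: the flat-curve condition r_o(1 - c) = 1 gives a critical restriction c* = 1 - 1/r_o, and restricting both the susceptible and the infected by c is equivalent to a one-sided effective ratio c' = 1 - (1 - c)^2.

```python
# Flat-curve condition R0*(1-c) = 1 and the two-body (both-parties-restricted)
# transformation c' = 1 - (1-c)**2.  R0 = 2.5 and c = 0.6 are assumptions.

def critical_restriction(r0):
    """Smallest outing-restriction ratio that stops growth (R_eff = 1, S ~ 1)."""
    return 1.0 - 1.0 / r0

def effective_restriction(c):
    """One-sided ratio equivalent to restricting both parties by c."""
    return 1.0 - (1.0 - c) ** 2

c_star = critical_restriction(2.5)   # 0.6 for the assumed R0
c_eff = effective_restriction(0.6)   # 1 - 0.4**2 = 0.84
```

note that c' > c for any 0 < c < 1, which is why a restriction that also limits the infected acts like a stronger one-sided restriction.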
therefore, we increased the isolation ratio from γ = . (red line) to γ = . (blue line), providing a reasonable fit. the larger isolation ratio could be justified by the active isolation conducted during this period. fig. : temporal evolution of the number of the infected for various outing restriction ratios in the second phase during and after the state of emergency. the green line at april shows the state of emergency. (a) calculated on april , , and (b) calculated on july , . note that red lines show the cases of the isolation ratio γ = . (r_o = . ) and the blue line γ = . (r_o = . ). it has been criticized that polymerase chain reaction (pcr) testing is not fully conducted to find covid- patients in japan; therefore, more infected patients are expected (tanaka and oku). however, as the purpose of this work is to see the future tendency and to study the isolation effect, we use the presently available data without discussing this problem. even after the first wave is over, some portion of the infected people still exists. therefore, the number of infected patients would increase again once the outing restriction is lifted. we predict the number of the infected as a function of the outing restriction ratio in the third phase, which started after days. in this calculation, the outing restriction ratio in the second phase ( to days) is assumed to be c = . . in fig. -(a), the numbers of the infected in the third phase are shown for c = . to . . when the outing restriction is set to c = . , it increases again up to infected people per day in september. for c = . , it increases to the same level by the end of . however, when the outing restriction ratio is given by c = .
, the number of the infected per day becomes constant. of course, when c = . or . is maintained, it will decrease, as seen in fig. -(a). therefore, to keep the infection number low, the outing restriction ratio should be larger than c = . . however, it might be very difficult to maintain such a strong measure from the viewpoint of the economy. when no restriction is set any more, the number of infected people rapidly increases as in the first wave. a strong restriction can again suppress the number of the infected. this oscillatory situation can be seen in the report (ferguson, et al.). accordingly, in tokyo after the state of emergency was lifted on may , , the number of the infected slowly increased, as shown in fig. -(b). on july , persons were confirmed as positive. to estimate the level of outing restriction without the state of emergency, we assumed that the third phase started at day ( infected people) and fitted the data with curves for c = to . , as shown in fig. -(b). the isolation ratio γ = . (the blue line) is assumed in the third phase because the time difference between the confirmed date and the date of discharge from hospital is almost days after day , obtained by the same method as in fig. -(b). as the cases c = . ~ . give almost the best-fitting curves, the new way of life without outing restriction, such as mask wearing, hand washing, and social distancing, may be providing this value. we note that if the smaller isolation ratio γ = . is taken, c is further increased, to c = . ~ . . we also note that the initial number of patients at day sensitively determines the subsequent number of the infected.
fig. : prediction of the number of the infected as a function of the outing restriction in the third phase until the end of . each isolation ratio is the same as in fig. . (a) calculated on april , and (b) calculated on july , . red lines are for γ = . (r_o = . ), and blue lines for γ = . (r_o = . ). to avoid an outing restriction or lockdown, it is clear that we need vaccination. however, we have to wait for a safe vaccine to be obtained. therefore, we have to manage to keep the number of the infected as low as possible, although that is not good for testing the effect of a vaccine. in this section, we propose a comprehensive sir model that unifies the outing restriction, vaccination and isolation together, and show their equivalent relationship. we show in fig. how we can consider the sir epidemic curve when the outing restriction and vaccination are taken into account. figure -(a) corresponds to the case of no restriction on human behavior. the basic reproduction number is r_o = . as in section . figure -(b) shows the case of % outing restriction (c = . ). the number of susceptible people with free movement, s(t)(1 - c), is reduced by the outing restriction, and the rest, s(t)c, are staying home. as the horizontal line is given by the initial susceptible number s( )(1 - c), the epidemic curve lies below this line. as the effective reproduction number is r_eff = r_o(1 - c) = . for s(t) ~ in this case, the peak fraction of the infected i(t) is delayed and reduced. when the susceptible people are vaccinated with ratio v, the sir model equations become

ds/dt = -β s(t)(1 - c)(1 - v) i(t)
di/dt = β s(t)(1 - c)(1 - v) i(t) - γ i(t)

and the result is shown in fig. -(d) for the vaccination ratio v = . and the outing restriction ratio c = . .
we do see that the infected people appear earlier without any restriction, and later with restrictions. we show the various sir curves in fig. to see the effect of the outing restriction ratio. when the outing restriction ratio c is increased, the peak fraction of the infected is reduced and delayed. the condition for reducing the infected cases is given by an effective reproduction number less than 1, i.e. r_eff = s(t) r_o (1 - c)(1 - v) ≤ 1, when both the outing restriction and vaccination are taken into account. hence, we have the relationship c ≥ 1 - 1/{s(t) r_o (1 - v)}, which is shown in fig. for s(t) ~ 1. when the vaccination ratio increases, the outing restriction ratio can be relaxed. of course, this depends on the basic reproduction number, as seen in fig. . in the case of a large basic reproduction number, the vaccination ratio should be large, because the infectious disease is more transmissive. so-called "herd immunity" holds for a vaccination ratio over ~ % without the outing restriction, as shown in fig. . however, the herd immunity strategy is quite dangerous, because many people are infected and the death toll increases, and it is also not yet clear whether antibodies last long for covid- . we note that if the new way of life with c = . ~ . is maintained, as shown in fig. -(b), herd immunity could be established with a vaccination ratio larger than %. at present, as we have no effective and safe vaccine, how can we suppress covid- without an outing restriction or lockdown? we seek such a solution based on the sir model. fig. : relationship between the outing restriction ratio c and the vaccination ratio v to suppress the infectious disease. the number of infected people depends on the isolation coefficient γ.
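the trade-off between the outing restriction and vaccination stated above, c ≥ 1 - 1/{s(t) r_o (1 - v)}, can be sketched as follows; r_o = 2.5 and s ~ 1 are illustrative assumptions (the numeric values are elided in the text).

```python
# Minimum outing-restriction ratio c that keeps R_eff = S*R0*(1-c)*(1-v) <= 1
# for a given vaccination ratio v.  R0 = 2.5 and S = 1 are assumptions.

def required_restriction(r0, v, s=1.0):
    """Minimum outing-restriction ratio c given vaccination ratio v."""
    c = 1.0 - 1.0 / (s * r0 * (1.0 - v))
    return max(c, 0.0)  # no restriction needed if vaccination alone suffices

c_no_vax = required_restriction(2.5, v=0.0)  # c* = 1 - 1/R0 = 0.6
c_herd = required_restriction(2.5, v=0.6)    # 0.0: v = 1 - 1/R0 gives herd immunity
```

as the text notes, raising v relaxes the required c, and the herd-immunity threshold v = 1 - 1/r_o is exactly the vaccination ratio at which no restriction is needed at all.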
this value of γ was determined in the first phase, when no control measures had been taken yet. here, to understand how the number of the infected increases, we have calculated the first term (solid curve) and the second term (dotted curve) in the equation for i for various isolation coefficients γ, as shown in fig. . when the isolation coefficient is as small as γ = . , the peak value of the first term βs(t)i(t) is much larger than that of the second term γi(t). we see that the second term is delayed relative to the first term. therefore, the number of infected people increases. for example, γ = . has been used in the analysis for tokyo as shown in figs. , and . on the other hand, for a larger isolation coefficient such as γ = . , the difference between their peaks is small and their time delay becomes shorter with the isolation ratio. in other words, the first term and the second term become comparable. this means that the number of the infected does not increase if isolation takes place as soon as the case is confirmed. this can also be interpreted as follows: as the basic reproduction number is given by r_o = β/γ, a large isolation coefficient γ provides a smaller r_o, reducing the number of infected people. fig. : the time dependence of the first term and the second term in the equation for i for various isolation ratios. outing restriction (or lockdown) causes a large economic loss. therefore, we should consider how this epidemic could be overcome without vaccine and outing restriction. when the outing restriction ratio c is imposed, the effective reproduction number is given from the equation for i as r_eff = βs(t)(1 - c)/γ. on the other hand, when excess isolation q takes place (non-infected persons are also isolated, e.g. through testing error), we can use γ(1 + q) as the isolation term coefficient. the effective reproduction number in this case is given by r_eff = βs(t)/{γ(1 + q)} = s(t) r_o/(1 + q). this leads to the following linear approximation in the case of q << 1: r_eff ≈ s(t) r_o (1 - q).
this is exactly the same expression as for the outing restriction with ratio q. we have thus found that there is an equivalence between outing restriction (lockdown) and isolation in the sir model. however, this holds only for small values of q less than 1. if we instead introduce the isolation term as γ i(t)/(1 - q), it can be used for values close to q = 1. this provides the effective reproduction number r_eff = s(t) r_o (1 - q). a large q value provides a smaller effective reproduction number. this situation is shown in fig. . during this recovery time, the patient is infectious but not actually isolated from the susceptible space in the traditional sir model. by introducing the isolation time control parameter q, we can artificially control this infectious time. fig. : concept of natural isolation in the sir model and the isolation time control proposed in this work. isolation control has the same effect as the outing restriction. to confirm this, we calculated the second phase using the isolation time control, changing q from . to . , as shown in fig. . the number of the infected actually decreases with the isolation time control parameter q, as in the outing restriction case. the required isolation times τ_γ = τ(1 - q) are also shown in parentheses for each q. it is seen that to days are necessary to reduce the number of the infected to a small value. the horizontal line is obtained when r_o(1 - q) = 1 is satisfied, namely q = . for r_o = . , which agrees with the numerical result. if the isolation time control parameter q is larger than this value, the number of the infected decreases. the required isolation time can be longer when the outing restriction is imposed at the same time, as will be seen in the next section. the required isolation time thus obtained is quite reasonable.
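the isolation time control described above can be sketched numerically; r_o = 2.5 and τ = 7.0 days are illustrative assumptions (the numeric values are elided in the text).

```python
# Isolation time control: replacing gamma by gamma/(1-q) shortens the
# effective isolation time to tau_gamma = tau*(1-q) and gives
# R_eff = S*R0*(1-q).  R0 = 2.5 and tau = 7.0 days are assumptions.

def isolation_control(r0, tau, q, s=1.0):
    """Return (effective isolation time in days, effective reproduction number)."""
    tau_gamma = tau * (1.0 - q)
    r_eff = s * r0 * (1.0 - q)
    return tau_gamma, r_eff

def critical_q(r0, s=1.0):
    """q at which the epidemic curve becomes flat: S*R0*(1-q) = 1."""
    return 1.0 - 1.0 / (s * r0)

q_star = critical_q(2.5)  # 0.6 for the assumed R0
tau_needed, r_eff = isolation_control(2.5, 7.0, q_star)
# isolating within tau*(1-q*) days holds the infection flat (R_eff = 1)
```

any q above q_star makes r_eff < 1, so the faster the confirmed cases are isolated, the faster the number of the infected decays.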
if isolation can be done as soon as an infected person is identified, it reduces the infection. fig. : temporal evolution of the number of the infected for various isolation time control parameters q without any outing restriction, c = . (τ_γ = τ(1 - q) and τ = . days). red lines are for γ = . (r_o = . ). when the outing restriction is imposed at the same time, the isolation control is relaxed. the isolation time can be longer than with isolation control alone. the equation for i can be written in this case as di/dt = β s(t)(1 - c) i(t) - γ i(t)/(1 - q), and the effective reproduction number is given by r_eff = β s(t)(1 - c)(1 - q)/γ = s(t) r_o (1 - c)(1 - q). this means that the outing restriction is equivalent to the isolation control. for example, the outing restriction ratio c = . and the isolation time control parameter q = . provide (1 - c)(1 - q) = . , which is almost the same as the outing restriction ratio c = . alone. a long-term prediction until the end of , corresponding to fig. , is conducted for various isolation control parameters q = . to . . the results in fig. show how, in the third phase, the number of the infected is controlled by the isolation time control parameter together with the outing restriction c = . , corresponding to the parameters of the new way of life as shown in fig. -(b). for q = . , the number of the infected is reduced to persons per day at the end of . for the case of q = . , the number of the infected is . persons after days. thus we have found that a longer isolation time is permitted when the outing restriction is imposed at the same time, which could be called combined control. the condition for the flat curve in the third phase is given by r_o(1 - c)(1 - q) = 1, with r_o = . and c = . , as shown in fig. .
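the combined control can be sketched as follows; the values r_o = 2.5 and c = 0.3 are illustrative assumptions, since the numeric values are elided in the text.

```python
# Combined control: with restriction c and isolation control q,
# R_eff = S*R0*(1-c)*(1-q), so a mild restriction lowers the q (i.e. the
# isolation speed) needed for a flat curve.  R0 = 2.5, c = 0.3 are assumptions.

def combined_r_eff(r0, c, q, s=1.0):
    """Effective reproduction number under restriction c and isolation control q."""
    return s * r0 * (1.0 - c) * (1.0 - q)

def required_q(r0, c, s=1.0):
    """Isolation control parameter giving a flat curve under restriction c."""
    return 1.0 - 1.0 / (s * r0 * (1.0 - c))

q_with_restriction = required_q(2.5, c=0.3)  # smaller than q* = 0.6 at c = 0
```

this mirrors the text's observation: the same (1 - c)(1 - q) product can be reached by many combinations, so a mild outing restriction permits a longer isolation time.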
to compare with the case without the outing restriction in the third phase, the same conditions are used except for c = . . to suppress the infectious disease it is necessary to isolate the infected people within . days (q = . ) without the outing restriction, as shown in fig. . however, with the outing restriction ratio c = . , . days (q = . ) is enough to suppress it (fig. ). thus we have found that it is easier to suppress the infectious disease by isolation control if a mild outing restriction is imposed at the same time. fig. : dependence of the number of the infected on the isolation parameters without the outing restriction, c = . (τ_γ = τ(1 - q) and τ = . days). when people are vaccinated under an outing restriction, the equation for i is finally given by di/dt = β s(t)(1 - c)(1 - v) i(t) - γ i(t)/(1 - q). it is possible to suppress this infectious disease within by isolation control with q > . (isolation within days), as shown in fig. , without outing restriction. thus, so as not to affect daily life and the social economy, this isolation control strategy is a hopeful solution. we have noticed interesting papers on the effectiveness of isolation and quarantine in overcoming infectious disease (niu and xu). we have used the word "isolation" as used in the reference (wilder-smith and freedman). accurate data on viral load with respect to the time from infection, as shown in the references (hellewell et al.; glasser et al.), could improve the estimation proposed in this work, because only an average time has been used here. however, such work is beyond this paper. a big problem with covid- is that there are a lot of asymptomatic but infectious cases (dickens et al.; long et al.). therefore, active and massive testing is very important for quick isolation of identified persons. such technology is under active development.
for example, a new method has recently been invented by takarabio in the usa to allow faster and larger-scale pcr testing, where , tests can be done in about two hours (takarabio). as the national institutes of health is aiming at million pcr tests per day, such an aggressive research plan is promising for suppressing this pandemic (tromberg et al.). recently, an innovative technique based on vocal biomarkers was proposed by mit lincoln laboratory to identify asymptomatic covid- -positive people (quatieri et al.). vocal changes before and after infection could be detected by processing speech recordings. in addition, contact-tracing techniques using communication tools are also useful (hellewell et al.). various new ideas from different research areas should be developed to overcome this pandemic. to understand the truth of this infectious disease, we have studied the characteristics of the sir model equations in detail. based on the he ash removal study in d-t nuclear fusion (mitarai and muraoka), we have found that the time in the isolation term can be interpreted as a natural isolation time, which means that isolation is not conducted in a controlled manner in the traditional sir model. we therefore introduce the isolation time control parameter q to control the isolation time. we have finally found that the outing restriction ratio c, the vaccination rate v, and the isolation time control parameter q are equivalent with respect to the basic reproduction number. this means that isolation control has the same function as vaccination or the outing restriction. in an actual situation, especially after the first wave when the number of patients decreases, massive pcr testing etc. should be conducted, and then infected people, especially those without symptoms, should be identified and isolated as soon as possible. we have to construct such an advanced social system with great care for human rights.
without such a system, economic activity cannot be maintained until a safe vaccine is supplied. in this study, the required isolation time is ~ days without any outing restriction, but it can be as long as ~ days with the outing restriction c = . to suppress covid- . we note that this value is obtained under the new way of life shown in fig. -(b). although this isolation time depends on the basic reproduction number, the required isolation time obtained in this study is quite realistic and could be achieved. the estimated time above could also be justified by the fact that the median duration of viral shedding in the asymptomatic group is days (long et al.). this value is quite similar to our employed value of the infectious period or recovery time, τ = . days. this isolation control concept may be able to explain part of the success of south korea and taiwan in suppressing covid- . in conclusion, with a reliable, simple and massive testing method, if isolation can be done in a short time after case identification, this pandemic could be overcome without a strong outing restriction such as a lockdown, before a vaccine is completed. medical research institutes and pharmaceutical companies are actively developing vaccines to curtail this pandemic. in contrast, only the central and local governments can perform wide, repeated and quick covid- testing and quick isolation. the bureaucratic system, such as the public health centers and the ministry of health, can play an important role in managing the isolation control together with various laws. considering the simple theory proposed here, we believe government should definitely be a main player in terminating this infectious disease. none declared.
appendix: analogy between nuclear fusion research and the sir model. in deuterium-tritium (d-t) nuclear fusion, the following reaction takes place: d + t → ⁴he (3.5 mev) + n (14.1 mev). the high-energy alpha particles (⁴he) should be removed from the d-t plasma after losing their energy to the surrounding plasma. the ⁴he population can be expressed by the particle balance equation [eq. in mitarai and muraoka]: dn_he/dt = ⟨σv⟩_dt n_d n_t − n_he/τ_he, where n_d is the deuterium density, n_t is the tritium density, ⟨σv⟩_dt is the d-t fusion rate, σ is the cross section of the d-t fusion reaction, and v is the relative velocity between d and t particles. the first term, the ⁴he production by d-t reactions, is analogous to the first term of the sir model, which expresses the collision (close contact) between susceptible and infective persons. the second term expresses ⁴he particles escaping from the plasma with an average escape time τ_he, as shown in fig. a. this term is exactly the same as the isolation term in the sir model equation: it can be interpreted as the infected person moving to the isolation space after the time τ_γ, during which the infectious person moves freely; the infected person is not isolated immediately after infection, just like the ⁴he ash. we have long studied how to exhaust these ⁴he ash particles quickly from fusion plasmas, such as in tokamak and helical reactors. we have noticed that, especially in advanced neutron-lean fusion reactors, exhaust of ash particles is crucially important (mitarai et al.), and the particle confinement time in the second term should be as short as possible to build a fusion reactor. this fusion concept is quite similar to the epidemic strategy of maintaining economic activity by isolating infected people quickly.
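the he-ash balance above can be sketched numerically; this is an illustrative forward-euler integration in which a constant source term stands in for ⟨σv⟩_dt n_d n_t, and all values and units are hypothetical rather than the reactor parameters of the paper:

```python
def helium_ash_level(source, tau_he, t_end=50.0, dt=0.001):
    """Integrate the He-ash balance dn_He/dt = S - n_He / tau_He with forward
    Euler, where the constant S stands in for <sigma v>_DT * n_D * n_T.
    Values and units are illustrative, not the paper's reactor parameters."""
    n = 0.0
    for _ in range(int(t_end / dt)):
        n += dt * (source - n / tau_he)
    return n
```

the steady-state ash level is s·τ_he, so shortening the confinement (isolation) time τ_he directly lowers the accumulated ash, mirroring the benefit of fast isolation of infectious individuals.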
references cited above (titles as extracted from the source):
• csv data
• institutional, not home-based, isolation could contain the covid-19 outbreak
• impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand (imperial college covid-19 response team)
• modeling and public health emergency responses: lessons from sars
• feasibility of controlling covid-19 outbreaks by isolation of cases and contacts
• growth or decline: will this epidemic lead to?
• physical meaning of the reproduction number r and its determination: to assess our behavior modification based on a simple model equation (reprint from rad-it web journal)
• contributions to the mathematical theory of epidemics, part i
• transmission dynamics of novel coronavirus (2019-ncov)
• the reproductive number of covid-19 is higher compared to sars coronavirus
• clinical and immunological assessment of asymptomatic sars-cov-2 infections (nature medicine)
• ignition analyses for burn control and diagnostics developments in iter
• ignition studies of d-³he spherical tokamak reactor
• prediction of infectious disease outbreak with particular emphasis on the statistical issues using transmission model
• deciphering the power of isolation in controlling covid-19 outbreaks
• a framework for biomarkers of covid-19 based on coordination of speech-production subsystems
• high-throughput viral detection via qpcr
• estimation of true number of covid-19 infected people in japan using line questionnaire
• special report on "rapid scaling up of covid-19 diagnostic testing in the united states: the nih radx initiative"
• isolation, quarantine, social distancing and community containment: pivotal role for old-style public health measures in the novel coronavirus (2019-ncov) outbreak

key: cord-uid vh s vi
authors: libotte, gustavo barbosa; lobato, fran sérgio; platt, gustavo mendes; neto, antônio josé da silva
title: determination of an optimal control strategy for vaccine administration in covid-19 pandemic treatment
journal: nan doi: nan cord_uid: vh s vi
abstract: for decades,
mathematical models have been used to predict the behavior of physical and biological systems and to define strategies that minimize the effects of different types of diseases. nowadays, the development of mathematical models to simulate the dynamic behavior of the novel coronavirus disease (covid-19) is an important topic, given the number of infected people worldwide. in this work, the aim is to determine an optimal control strategy for vaccine administration in covid-19 pandemic treatment considering real data from china. for this purpose, an inverse problem is formulated and solved in order to determine the parameters of the compartmental sir (susceptible-infectious-recovered) model. to solve this inverse problem, the differential evolution (de) algorithm is employed. after this step, two optimal control problems (mono- and multi-objective), which determine the optimal strategy for vaccine administration in covid-19 pandemic treatment, are proposed. the first consists of minimizing the number of infected individuals during the treatment. the second considers minimizing both the number of infected individuals and the prescribed vaccine concentration during the treatment, i.e., a multi-objective optimal control problem. the solutions of the optimal control problems are obtained using the de and multi-objective differential evolution (mode) algorithms, respectively. the results of the proposed multi-objective optimal control problem provide a set of solutions from which an optimal strategy for vaccine administration can be chosen according to a given criterion. in the last decades, countless mathematical models have been proposed to evaluate the spread and control of infectious diseases. these models are very important in different fields, such as policy making, emergency planning, risk assessment and the definition of control programs, promoting the improvement of various health-economic aspects (al-sheikh).
in general, such models aim to describe the states of infection (susceptible and infected) and the process of infection (the transition between these states) by using compartmental relations, i.e., the population is divided into compartments under assumptions about the nature and time rate of transfer from one compartment to another (trawicki; blackwood and childs). several studies can be cited using models for measles vaccination (bauch et al.; widyaningsih et al.), hiv/aids (mukandavire et al.), tuberculosis (bowong and kurths), dengue (weiss), pertussis epidemiology (pesco et al.) and alzheimer's disease (ebrahimighahnavieh et al.), among others. recently, the world has been experiencing the dissemination of a new virus, referred to as covid-19 (coronavirus disease 2019). covid-19 is an infectious disease that emerged in china in december 2019 and has rapidly spread to many other countries worldwide (gorbalenya et al.; world health organization). the common symptoms are severe respiratory illness, fever, cough, and myalgia or fatigue, especially at the onset of illness (huang et al.). transmission may happen person to person, through direct contact or droplets (chan et al.; li et al.; riou and althaus). since the covid-19 outbreak in wuhan in december 2019, various computational model-based predictions have been proposed and studied. lin et al. proposed a susceptible-exposed-infectious-removed (seir) model for the covid-19 outbreak in wuhan that considers essential elements including individual behavioral response, governmental actions, zoonotic transmission and the emigration of a large proportion of the population in a short period. benvenuto et al. proposed an auto-regressive integrated moving average (arima) model to predict the spread, prevalence and incidence of covid-19. roda et al.
used a susceptible-infectious-removed (sir) model to predict the covid-19 epidemic in wuhan after the lockdown and quarantine, and demonstrated that non-identifiability in model calibrations using confirmed-case data is the main reason for the wide variations among predictions. prem et al. proposed a seir model to simulate the spread of covid-19 in wuhan in which all demographic changes in the population (births, deaths and ageing) were ignored; the simulations showed that control measures aimed at reducing social mixing in the population can be effective in reducing the magnitude and delaying the peak of the covid-19 outbreak. regarding the global stability and equilibrium points of these models, li and muldowney studied a seir model with nonlinear incidence rates in epidemiology, in terms of the global stability of the endemic equilibrium. al-sheikh evaluated a seir epidemic model with a limited resource for treating infected people, investigating the existence and stability of the disease-free and endemic equilibria. li and cui studied a seir model with a vaccination strategy that incorporates distinct incidence rates for the exposed and infected populations, and proved global asymptotic stability of the disease-free equilibrium. singh et al. developed a simple and effective mathematical model for the transmission of infectious diseases that takes human immunity into consideration, evaluated in terms of the local stability of both the disease-free and the endemic equilibrium. widyaningsih et al. proposed a seir model with immigration and determined the system equilibrium conditions. kim et al. developed a coxian-distributed seir model considering an empirical incubation period, and a stability analysis was also performed. in order to reduce the spread of covid-19 worldwide, various procedures have been adopted. as mentioned by zhai et al.
and wei et al., quarantine and isolation (social distancing) can effectively reduce the spread of covid-19. in addition, wearing masks, washing hands and disinfecting surfaces contribute to reducing the risk of infection. according to the u.s. food and drug administration, there are no specific therapies for covid-19 treatment; however, treatments including antiviral agents, chloroquine and hydroxychloroquine, corticosteroids, antibodies, convalescent plasma transfusion and radiotherapy are being studied. as an alternative to these treatments, drug (vaccine) administration arises as an interesting option to face this pandemic. it must be emphasized that there is currently no vaccine for covid-19, but there is a huge effort to develop one in record time, which justifies the present study (lurie et al.). mathematically, the determination of an optimal protocol for vaccine administration characterizes an optimal control problem (ocp). this particular optimization problem consists in the determination of control variable profiles that minimize (or maximize) a given performance index (bryson and ho; biegler et al.). in order to solve this problem, several numerical methods have been proposed (bryson and ho; feehery and barton; lobato; lobato et al.). these methods fall into three broad categories: direct optimization methods, methods based on pontryagin's maximum principle (pmp), and hjb (hamilton-jacobi-bellman) based methods. the direct approach is the most traditional strategy for solving an ocp, due to its simplicity: the original problem is transformed into a finite-dimensional optimization problem through the parametrization of the control, or of the control and state variables (feehery and barton).
from an epidemiological point of view, neilan and lenhart proposed an optimal control problem to determine a vaccination strategy over a specific period of time so as to minimize a cost function. in their work, the propagation of a disease is controlled by a limited number of vaccines, while minimizing a percentage of the overall number of deaths by infection and a cost associated with vaccination. biswas et al. studied different mathematical formulations of an optimal control problem for a susceptible-exposed-infectious-recovered model, evaluating the solution of such problems when mixed state-control constraints are used to impose upper bounds on the available vaccines at each instant of time; the possibility of imposing upper bounds on the number of susceptible individuals, with and without limitations on the number of vaccines available, was also analyzed. optimal control theory was applied to obtain optimal vaccination schedules and control strategies for epidemic models of human infectious diseases. in this work, the objective is to determine an optimal control strategy for vaccine administration in covid-19 pandemic treatment considering real data from china. in order to determine the parameters that characterize the proposed mathematical model (based on the compartmental sir model), an inverse problem is formulated and solved using the differential evolution (de) algorithm (storn and price; price et al.). after this step, two optimal control problems (mono- and multi-objective), used to determine the optimal strategy for vaccine administration in covid-19 pandemic treatment, are proposed. the mono-objective optimal control problem considers minimizing the quantity of infected individuals during the treatment.
on the other hand, the multi-objective optimal control problem considers minimizing both the quantity of infected individuals and the prescribed vaccine concentration during the treatment. to solve these problems, the de and multi-objective differential evolution (mode) algorithms (lobato and steffen) are employed, respectively. this work is organized as follows: the next section presents the mathematical model considered to represent the evolution of the covid-19 pandemic; the general aspects of the formulation and solution of an ocp are then presented, followed by a brief review of de and its extension to multi-criteria optimization; the proposed methodology and the corresponding results are then presented and discussed, and the conclusions are outlined at the end. in the specialized literature, various compartmental models used to represent the evolution of an epidemic can be found (forgoston and schwartz; pesco et al.; shaman et al.; cooper et al.; azam et al.). the study of these models is very important to understand the spreading mechanism of an epidemic and, consequently, to investigate the transmission dynamics in a population (forgoston and schwartz). as mentioned by keeling and rohani, these compartmental models can be divided into two groups: i) population-based models and ii) agent-based (individual-based) models. the first group can be subdivided into deterministic or stochastic models in continuous time (ordinary differential equations, partial differential equations, delay differential equations or integro-differential equations) or in discrete time (represented by difference equations). the second group can be subdivided into usually stochastic and usually discrete-time models.
in the context of population-based models, deterministic modeling can be represented, in general, by the interaction among susceptible individuals (denoted by s: an individual who is not infected by the disease pathogen), exposed individuals (denoted by e: an individual in the incubation period after being infected by the disease pathogen, with no visible clinical signs), infected/infectious individuals (denoted by i: an individual who can infect others) and recovered individuals (denoted by r: an individual who survived the infection, is no longer infectious, and has developed natural immunity to the disease pathogen). considering a population of size n, and based on the nature of the disease and its spreading pattern, the compartmental models can be represented as (keeling and rohani; hethcote): • susceptible-infected (si): the population is described by groups of susceptible and infected individuals; • susceptible-infected-recovered (sir): the population is described by groups of susceptible, infected and recovered individuals; • susceptible-infectious-susceptible (sis): the population is also described by groups of susceptible and infected individuals; in this particular case, recovery from some pathologies does not guarantee lasting immunity, and individuals may become susceptible again; • susceptible-exposed-infectious-recovered (seir): the population is described by groups of susceptible, exposed, infected and recovered individuals. it is important to mention that in all these models, terms associated with birth, mortality and vaccination rates can be added.
in addition, according to keeling and rohani and to hethcote, these models can include: i) time-dependent parameters to represent the effects of seasonality; ii) additional compartments to model vaccinated and asymptomatic individuals, and different stages of disease progression; iii) multiple groups to model heterogeneity, age, spatial structure or host species; and iv) human demographic parameters, for diseases in which the time frame of the disease dynamics is comparable to that of human demographics. human demographics can be modeled by adopting a constant immigration rate, constant per capita birth and death rates, a density-dependent death rate or a disease-induced death rate. thus, the final model depends on the assumptions made during the formulation of the problem. in this work, the sir model is adopted to describe the dynamic behavior of the covid-19 epidemic in china. the choice of this model is due to the study conducted by roda et al., who demonstrated that the sir model represents the information contained in confirmed-case data more adequately than the seir model. the schematic representation of this model is presented in fig. . mathematically, the model has the following characteristics: • an individual is susceptible to infection and the disease can be transmitted from any infected individual to any susceptible individual. the evolution of the susceptible population is given by ds/dt = −βsi − µs, with initial condition s(0) = s₀, where t is the time and β and µ represent the probability of transmission by contact and the per capita removal rate, respectively. • any infected individual may transmit the disease to a susceptible one, following di/dt = βsi − γi, with initial condition i(0) = i₀, where γ denotes the per capita recovery rate.
• once an individual has moved from infected to recovered, it is assumed that it cannot be infected again. this condition is described by dr/dt = γi, with initial condition r(0) = r₀. it is important to emphasize that the population size n along time t is defined as n(t) = s(t) + i(t) + r(t). in practice, the model parameters must be determined so as to represent a particular epidemic; for this purpose, it is necessary to formulate and solve an inverse problem. more details on the formulation and solution of this problem are presented in the section describing the methodologies adopted in this work. mathematically, an ocp can be formulated as follows (bryson and ho; feehery and barton; lobato). let z be the vector of state variables and u the vector of control variables, and let the performance index be j = Ψ(z(t_f)) + ∫ l(z, u, t) dt, where Ψ and l are its first (terminal) and second (integral) terms, respectively. the minimization problem is arg min_u j, subject to the state equations dz/dt = f(z, u, t) with consistent initial conditions z(t₀) = z₀. according to optimal control theory (bryson and ho; feehery and barton), the solution of this ocp satisfies the co-state equations and the stationary condition, λ̇ = −∂h/∂z and ∂h/∂u = 0, where h = l + λᵀf is the hamiltonian function. this system of equations is known as the euler-lagrange equations (optimality conditions), which constitute a boundary value problem (bvp); an appropriate methodology, such as the shooting method or a collocation method, must be used to solve it (bryson and ho).
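the optimality conditions referenced above were lost from the extracted text; the following is a textbook reconstruction (as in bryson and ho), consistent with the definitions of Ψ, l and f given here, including the terminal condition on the co-state vector:

```latex
H(\mathbf{z},\mathbf{u},\boldsymbol{\lambda},t) = L(\mathbf{z},\mathbf{u},t)
  + \boldsymbol{\lambda}^{\mathsf{T}}\,\mathbf{f}(\mathbf{z},\mathbf{u},t)

\dot{\boldsymbol{\lambda}} = -\left(\frac{\partial H}{\partial \mathbf{z}}\right)^{\mathsf{T}},
\qquad
\boldsymbol{\lambda}(t_f) = \left.\left(\frac{\partial \Psi}{\partial \mathbf{z}}\right)^{\mathsf{T}}\right|_{t_f},
\qquad
\frac{\partial H}{\partial \mathbf{u}} = \mathbf{0}
```

together with the state equations and their initial conditions, these relations form the two-point boundary value problem mentioned in the text.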
as mentioned by bryson and ho and by feehery and barton, the main difficulties associated with ocps are the following: the existence of end-point conditions (or region constraints) implies multipliers and associated complementarity conditions that significantly increase the complexity of solving the bvp by an indirect method; constraints involving the state variables, treated by the slack variable method, may introduce differential-algebraic equations of higher index; and the lagrange multipliers may be very sensitive to the initial conditions. differential evolution is a powerful optimization technique for mono-objective optimization problems, proposed by storn and price. this evolutionary strategy differs from other population-based algorithms in the schemes used to generate new candidate solutions (storn and price; price et al.; lobato and steffen). the population evolution in de follows three fundamental steps: mutation, crossover and selection. the optimization process starts by creating a vector of np individuals, called the initial population, randomly distributed over the entire search space. during g_max generations, each individual of the current population is subjected to the genetic operators of the algorithm. in the first step, the mutation operator creates a mutant vector by adding the weighted difference between two individuals to a third member of the population, v^(g+1) = x_r1^(g) + f (x_r2^(g) − x_r3^(g)), where the parameter f is the amplification factor that controls the contribution added by the vector difference, with f ∈ (0, 2]. storn and price proposed various mutation schemes for the generation of trial vectors (candidate solutions) by combining vectors randomly chosen from the current population, such as rand/1, rand/2, best/1 and best/2.
the second step of the algorithm is the crossover procedure. this genetic operator creates new candidates by combining the attributes of the individuals of the original population with those resulting from the mutation step: the trial vector is formed component-wise as u_jk^(g+1) = v_jk^(g+1) if randb(k) ≤ cr or k = rnbr(j), and u_jk^(g+1) = x_jk^(g) otherwise, for k = 1, …, d, where d denotes the dimension of the problem and randb(k) ∈ [0, 1] is a random real number with uniform distribution. the choice of the attributes of a given individual is thus governed by the crossover coefficient cr ∈ [0, 1], a constant parameter defined by the user, while rnbr(j) ∈ {1, …, d} is a randomly chosen index that guarantees at least one component is inherited from the mutant vector. after the generation of the trial vector by mutation and crossover, the evolution of the best individuals is decided by a greedy strategy during the selection step. price et al. have defined some simple rules for choosing the key parameters of de for general applications: typically, np is chosen between 5 and 10 times the dimension d of the problem, and f between 0.4 and 1.0, with f = 0.5 as a good initial choice; in the case of premature convergence, f and np might be increased. the multi-objective optimization problem (mop) is an extension of the mono-objective optimization problem. due to the conflict between the objectives, there is no single point capable of optimizing all functions simultaneously; instead, the best obtainable solutions are called pareto-optimal solutions, which form the pareto curve (deb). the notion of optimality in a mop differs from that of single-objective optimization. the most common concept found in the literature was originally proposed by edgeworth and further generalized by pareto: one solution is said to dominate another if it is not worse in any of the objectives and strictly better in at least one of them.
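the three de operators described above (rand/1 mutation, binomial crossover and greedy selection) can be sketched as one generation update; this is an illustrative implementation, not the authors' code, and the function and parameter names are hypothetical:

```python
import random

def de_step(pop, f_obj, F=0.5, CR=0.9):
    """One generation of rand/1/bin Differential Evolution (sketch).
    pop is a list of real-valued vectors; f_obj is the objective to minimize."""
    d = len(pop[0])
    new_pop = []
    for j, target in enumerate(pop):
        # mutation: v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3 distinct and != j
        r1, r2, r3 = random.sample([k for k in range(len(pop)) if k != j], 3)
        v = [pop[r1][k] + F * (pop[r2][k] - pop[r3][k]) for k in range(d)]
        # binomial crossover: component jrand is always inherited from the mutant
        jrand = random.randrange(d)
        trial = [v[k] if (random.random() < CR or k == jrand) else target[k]
                 for k in range(d)]
        # greedy selection: keep the trial only if it is no worse than the target
        new_pop.append(trial if f_obj(trial) <= f_obj(target) else target)
    return new_pop
```

because selection is greedy, iterating `de_step` drives the best objective value in the population monotonically downward.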
as a pareto-optimal solution is not dominated by any other feasible point in the search space, all such solutions are considered better than any dominated one. therefore, multi-objective optimization consists of finding a set of points that represents the best trade-off with respect to minimizing all objectives simultaneously, i.e., a collection of solutions balancing objectives that, in most cases, conflict with each other. let f(x) = (f₁(x), …, f_m(x))ᵀ be the objective vector, with f_k : p → ℝ for k = 1, …, m, where x ∈ p is called the decision vector and its entries are called decision variables. mathematically, a mop is defined as arg min_x f(x) subject to x ∈ p (deb; lobato). due to the favorable performance of de in solving mono-objective optimization problems in different fields of science and engineering, lobato and steffen proposed the multi-objective differential evolution (mode) algorithm to solve multi-objective optimization problems. basically, this evolutionary strategy differs from other algorithms by incorporating two operators into the original de algorithm: the mechanisms of rank ordering (deb; zitzler and thiele) and of exploration of the neighborhood of potential candidate solutions (hu et al.). a brief description of the algorithm is presented next. at first, an initial population of size np is randomly generated and all objectives are evaluated. all dominated solutions are removed from the population by using the fast non-dominated sorting operator (deb); this procedure is repeated until each candidate vector becomes a member of a front. three parents are then selected at random in the population and an offspring is generated from them (this process continues until np children are generated). starting from the current population of size np, neighbors are generated for each individual of the population.
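the dominance-based ranking at the heart of these steps can be sketched as follows; this is an illustrative o(n²)-per-front version of the front-splitting idea, not the fast non-dominated sorting implementation of deb et al. or the authors' code:

```python
def non_dominated_sort(objs):
    """Split a list of objective vectors into Pareto fronts (minimization).
    Returns lists of indices: fronts[0] is the non-dominated set, fronts[1]
    the non-dominated set of the remainder, and so on (simplified sketch)."""
    def dominates(a, b):
        # a dominates b: no worse in every objective, strictly better in one
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

for example, with two conflicting objectives the first front collects the trade-off solutions that no other candidate improves upon in both objectives at once.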
these neighbors are classified according to the dominance criterion, and only the non-dominated ones are merged with the current population. the resulting set is then classified again according to the dominance criterion and, if its number of individuals exceeds a predefined limit, it is truncated according to the crowding distance criterion (deb), a metric that describes the density of candidate solutions surrounding an arbitrary vector. a complete description of mode is presented by lobato and steffen. as mentioned earlier, the first objective of this work is to determine the parameters of the sir model adopted to predict the evolution of the covid-19 epidemic considering experimental data from china, which requires formulating and solving an inverse problem. such a problem arises from the requirement of determining the parameters of a theoretical model so that it can simulate the behavior of the system under different operating conditions. basically, the estimation procedure consists of obtaining the model parameters by minimizing the difference between calculated and experimental values. in this work it is assumed that, since the outbreak persists for a relatively short period of time, the rates of birth and death in the population are insignificant; thus we take µ = 0, since there are probably few births/deaths in the corresponding period. we are interested in determining the following parameters of the sir model: β, γ and i₀. let i_i^exp and i_i^sim be the experimental and simulated infected populations, respectively, and m the total number of experimental data points; the objective function is f = Σ_{i=1}^{m} (i_i^exp − i_i^sim)². in this case, the sir model must be simulated with the parameters calculated by de in order to obtain the number of infected people estimated by the model and, consequently, the value of the objective function f.
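this least-squares objective can be sketched as follows, with forward euler standing in for the runge-kutta-fehlberg integration used by the authors; the function name, time step and parameter packing are illustrative assumptions:

```python
def objective_f(params, t_obs, i_obs):
    """Least-squares misfit f = sum_i (I_i^exp - I_i^sim)^2 between reported
    and simulated infected fractions. params = (beta, gamma, i0); mu = 0 and
    S(0) = 1 - i0, as in the paper. Euler integration is a simplification."""
    beta, gamma, i0 = params
    s, i = 1.0 - i0, i0
    dt, t, k, sse = 0.01, 0.0, 0, 0.0
    while k < len(t_obs):
        if t >= t_obs[k]:            # record the misfit at each observation time
            sse += (i - i_obs[k]) ** 2
            k += 1
        s, i = s - dt * beta * s * i, i + dt * (beta * s * i - gamma * i)
        t += dt
    return sse
```

a candidate parameter vector proposed by de is scored by one call to this function; the misfit vanishes only when the simulated infected curve passes through the data.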
as the number of measured data points m is usually much larger than the number of parameters to be estimated, the inverse problem is formulated as a finite-dimensional optimization problem in which we aim at minimizing f. in order to formulate both ocps, the parameters estimated by the proposed inverse problem are used. as proposed by neilan and lenhart and by biswas et al., a new variable w, which denotes the number of vaccines used, is introduced in order to determine the optimal control strategy for vaccine administration: the rate at which vaccines are administered during the whole period of time is proportional to us. physically, u represents the portion of susceptible individuals being vaccinated per unit of time (biswas et al.), and it acts as the control variable of the system: u = 0 means there is no vaccination, and u = 1 indicates that the entire susceptible population is vaccinated. a schematic diagram of the disease transmission among individuals for the sir model with vaccination is shown in fig. . mathematically, the sir model in the presence of control is written as ds/dt = −βsi − us, di/dt = βsi − γi, dr/dt = γi and dw/dt = us, where w(0) = w₀ is the initial condition for the total amount of vaccines. it is important to emphasize that the population size n after the inclusion of this new variable w along the time t is defined as n(t) = s(t) + i(t) + r(t) + w(t). the first formulation aims to determine the optimal vaccine administration u that minimizes the infected population, represented by the functional Ω₁ = ∫ i dt over the treatment period. the ocp is thus defined as arg min_u Ω₁, subject to the controlled model equations and to u_min ≤ u ≤ u_max, where t₀ and t_f represent the initial and final times, and u_min and u_max are the lower and upper bounds of the control variable, respectively.
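the controlled model just described can be sketched as a forward simulation under a bang-bang control; the switching times, rates and function names below are illustrative assumptions, not values from the paper:

```python
def sir_with_vaccination(beta, gamma, i0, switch_times, t_end, dt=0.01):
    """Normalized SIR model with a bang-bang control u(t) in {0, 1}: u starts
    at u_min = 0 and toggles at each switching time. W accumulates vaccinated
    individuals at rate u*S. Forward-Euler sketch with illustrative values."""
    s, i, r, w = 1.0 - i0, i0, 0.0, 0.0
    t, u, k = 0.0, 0, 0
    while t < t_end:
        if k < len(switch_times) and t >= switch_times[k]:
            u, k = 1 - u, k + 1        # toggle the control at a switching node
        ds = -beta * s * i - u * s
        di = beta * s * i - gamma * i
        dr = gamma * i
        dw = u * s
        s, i, r, w = s + dt * ds, i + dt * di, r + dt * dr, w + dt * dw
        t += dt
    return s, i, r, w
```

since ds + di + dr + dw = 0, the extended population s + i + r + w stays at 1, and switching vaccination on lowers the final infected fraction relative to u ≡ 0.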
the second formulation considers two objectives: determining the optimal vaccine administration so as to minimize the number of infected individuals and, at the same time, the number of vaccines needed. the total number of vaccines can be determined as Ω₂ = ∫ us dt, whereas the number of infected people is given by the functional Ω₁. thus, the multi-objective optimization problem is formulated as arg min_u (Ω₁, Ω₂), subject to the controlled model equations and to u_min ≤ u ≤ u_max. in both problems, the control variable u must be discretized. the approach proposed here transforms the original ocp into a nonlinear optimization problem: the time interval [t₀, t_f] is discretized using n_elem time nodes t_i, with t₀ ≤ t_i ≤ t_f, and on each subinterval [t_i, t_i+1] the control variable is taken as constant by parts. in order to obtain an optimal control strategy for vaccine administration that can be used in medical practice, we consider a bang-bang control, i.e., a binary feedback control that turns either on (u = u_max = 1) or off (u = u_min = 0) at different time points determined by the system feedback. since the control strategy u is constant by parts and the control values at the initial and final times are known, the proposed optimal control problem has n_elem − 2 unknown parameters. the resulting nonlinear optimization problems are solved using de, in the case of the mono-objective problem, and mode, for the multi-objective problem. in order to apply the proposed methodology to solve the inverse problem described previously, the following steps are established: • objective function: minimize the functional f; • design space: bounds on β, γ and i₀
(all defined after preliminary executions); • de parameters: population size, number of generations, perturbation rate, crossover rate and strategy rand/1, as presented previously; the evolutionary process is halted when a prescribed number of generations is reached, and independent runs of the algorithm were made with different seeds for the generation of the initial population; • the runge-kutta-fehlberg method was used to evaluate the sir model during the optimization process; • initial conditions: s(0) = 1 − i₀, i(0) = i₀ and r(0) = 0, where i₀ is chosen as the first reported datum for the number of infected individuals in the time series; • the data used in the formulation of the inverse problem refer to the population of china, from january to april 2020, taken from the johns hopkins coronavirus resource center. table presents the results (best and standard deviation) obtained using de. it can be observed that de obtained good estimates for the unknown parameters and, consequently, for the objective function, as can be verified by visual inspection of fig. . these results were obtained, as mentioned earlier, from the independent runs, and the small standard deviations demonstrate that the algorithm converges to practically the same optimum (best) in all executions. physically, the estimated β indicates a high probability of transmission by contact in the chinese population, while the estimated γ implies a moderate per capita recovery rate. one must consider that, since many cases may not be reported, for different reasons (for example, an asymptomatic infected person), the value of i₀ may vary, as well as the behavior of the model over time.
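the inverse-problem workflow above (de minimizing the misfit between simulated and reported infected numbers) can be sketched in miniature; here only β is treated as unknown, with γ and i₀ held fixed, and every numerical value, bound and function name is an illustrative assumption rather than the paper's setting:

```python
import random

def simulate_infected(beta, gamma=0.1, i0=0.01, t_obs=(5.0, 10.0, 15.0), dt=0.02):
    """Infected fraction of the normalized SIR model (mu = 0) at the
    observation times, via forward Euler (illustrative values throughout)."""
    s, i = 1.0 - i0, i0
    out, t, k = [], 0.0, 0
    while k < len(t_obs):
        if t >= t_obs[k]:
            out.append(i)
            k += 1
        s, i = s - dt * beta * s * i, i + dt * (beta * s * i - gamma * i)
        t += dt
    return out

def fit_beta(i_obs, pop_size=15, gens=40, F=0.7, seed=0):
    """Toy inverse problem: rand/1 DE estimating only beta by minimizing the
    squared misfit against the observations i_obs (sketch, not the paper's code)."""
    rng = random.Random(seed)
    sse = lambda b: sum((a - o) ** 2 for a, o in zip(simulate_infected(b), i_obs))
    pop = [rng.uniform(0.05, 1.0) for _ in range(pop_size)]
    for _ in range(gens):
        for j in range(pop_size):
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != j], 3)
            # rand/1 mutation, clamped to the design space; in one dimension
            # binomial crossover always passes the mutant through
            trial = min(1.0, max(0.05, pop[r1] + F * (pop[r2] - pop[r3])))
            if sse(trial) <= sse(pop[j]):
                pop[j] = trial          # greedy selection
    return min(pop, key=sse)
```

running independent fits with different seeds and reporting the best value and standard deviation reproduces, in miniature, the reporting pattern of the table described above.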
it is important to emphasize that when choosing i as a design variable, the initial condition for the susceptible population (s ) is automatically defined, that is, s = − i , since there is not, at the beginning of an epidemic, a considerable number of recovered individuals and, thus, r = is a reasonable choice. in this case, the available data refer to the number of infected individuals, and these represent only the portion of individuals in the population that have actually been diagnosed. this is due, among other factors, to the lack of tests to diagnose the disease in all individuals who present symptoms. thus, as the number of susceptible individuals at the beginning of the epidemic is dependent on the value of i , in this work it is considered that the total size of the population, typically defined as n = s + i + r, is actually a portion of the total population, since the number of infected individuals available is also a fraction of those who have actually been diagnosed. in this case, the results presented below represent only the fraction of the infected population that was diagnosed and, consequently, the fraction of individuals susceptible to contracting the disease. qualitatively, the results presented are proportional to the number of individuals in the population who were diagnosed with the disease. in order to evaluate the sensitivity of the solutions obtained, in terms of the objective function, the best solution (β = . , γ = . , and i = . ) was analyzed considering a perturbation rate given by δ. for this purpose, the range [( − δ) θ k , ( + δ) θ k ] was adopted, for k ∈ { , , }, where θ = (β, γ, i ). thus, in each analysis, one design variable is perturbed and the value of f in relation to this noise is computed. figure presents the sensitivity analysis for each estimated parameter, in terms of the objective function, considering δ equal to . and equally spaced points in the interval of interest.
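the one-at-a-time sensitivity scan described above can be sketched as follows; the toy quadratic objective and all parameter values are placeholders for the real f and the estimated (β, γ, i ).

```python
# sketch of the one-at-a-time sensitivity scan over
# [(1 - delta) * theta_k, (1 + delta) * theta_k]; `objective` and theta
# are placeholders, not the paper's objective or estimates.

def scan(objective, theta, k, delta, n_points=11):
    """perturb component k of theta across the +/- delta range, return f values."""
    lo, hi = (1 - delta) * theta[k], (1 + delta) * theta[k]
    values = []
    for j in range(n_points):
        perturbed = list(theta)
        perturbed[k] = lo + (hi - lo) * j / (n_points - 1)
        values.append(objective(perturbed))
    return values

# toy quadratic objective whose minimum sits at the unperturbed solution
f = lambda p: (p[0] - 0.2) ** 2 + (p[1] - 0.1) ** 2 + (p[2] - 0.01) ** 2
curve = scan(f, [0.2, 0.1, 0.01], k=0, delta=0.2)
```

the midpoint of each curve corresponds to the unperturbed best solution, so any perturbation should worsen f, as observed in the sensitivity figures.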
in these figures, it is possible to observe that perturbing each parameter results, as expected, in a worse value of f . in addition, the design variable most sensitive to the δ perturbation is β, since a wide range of values of f was obtained. we consider two distinct analyses in this section, in order to evaluate the proposed methodology considered to solve the mono-objective optimization problem: i) solution of the proposed mono-objective optimal control problem and ii) evaluation of the influence of the maximum amount of vaccine, by defining an inequality constraint. for this purpose, the following steps are established: • objective function: minimize the functional Ω , given by eq. ( ); • the previously calculated parameters (β, γ and i ) are employed in the simulation of the sir model; • design space: ≤ t i ≤ t f , for i = , . . . , t n elem − , and n elem = . it is important to mention that this value was chosen after preliminary runs, i.e., increasing this value does not produce better results in terms of the objective function; • de parameters: population size ( ), number of generations ( ), perturbation rate ( . ), crossover rate ( . ) and strategy rand/ (as presented in section . ). the evolutionary process is halted when a prescribed number of generations is reached (in this case, ). independent runs of the algorithm were made, with different seeds for the generation of the initial population; • to evaluate the sir model during the optimization process, the runge-kutta-fehlberg method was used; • initial conditions: s ( ) = − i , i( ) = i , and r( ) = . as in the previous case, i is chosen as the first reported data in relation to the number of infected individuals in the time series; table presents the best solution obtained by using de and considering ten control elements, in terms of the number of individuals. the objective function obtained (about .
individuals) is less than in the case in which no control is considered (about . individuals), i.e., the number of infected individuals is lower when a control strategy is considered (see figs. (a) and (c)). if the number of infected individuals is reduced, due to control action, the number of susceptible individuals rapidly decreases until its minimum value ( . × − individuals) and, consequently, the number of recovered individuals rapidly increases until its maximum value ( . individuals), as observed in figs. (b) and (d), respectively. in terms of the action regarding the control variable, the effectiveness is readily verified in the beginning of the vaccine administration. furthermore, the administration is conducted in specific intervals of time, which preserves the health of the population, as observed in fig. (e). the evolution of the number of vaccinated individuals is presented in fig. (f). in this case, due to control action, the vaccinated population increases rapidly until the value saturates ( . individuals). in summary, all obtained profiles are coherent from the physical point of view. finally, it is important to mention that the standard deviation for each result is approximately equal to − , which demonstrates the robustness of de in solving the proposed mono-objective optimal control problem. in this model, the evaluation of the number of vaccinated individuals is associated with an inequality constraint. this relation bounds the quantity of individuals that can be vaccinated due to the limitation related to the production of vaccines. for this purpose, two control elements are incorporated into the model: if w(t ) ≤ w lim , then u = ; otherwise, u = (t is the instant of time at which w(t ) = w lim , and w lim is the upper bound for the number of vaccinated individuals). table presents the results obtained considering different quantities for the parameter w lim .
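the vaccine-supply constraint can be sketched as a switch on the cumulative number of vaccinated individuals w(t); the normalised dynamics and all rates below are illustrative assumptions, not the calibrated model.

```python
# sketch of the supply constraint: vaccinate at rate u_max until the
# cumulative vaccinated fraction w reaches w_lim, then switch off.
# sir-with-vaccination in fractions (n = 1); all rates are illustrative.

def simulate(beta, gamma, u_max, w_lim, i0, days, dt=0.01):
    s, i, w, peak = 1.0 - i0, i0, 0.0, i0
    for _ in range(int(days / dt)):
        u = u_max if w < w_lim else 0.0   # constraint-driven switch
        new_inf = beta * s * i * dt
        vac = u * s * dt                  # susceptibles vaccinated this step
        s -= new_inf + vac
        i += new_inf - gamma * i * dt
        w += vac
        peak = max(peak, i)
    return i, w, peak

# a tight vaccine budget versus a generous one
_, w_small, peak_small = simulate(0.3, 0.1, 0.1, w_lim=0.05, i0=0.01, days=100)
_, w_large, peak_large = simulate(0.3, 0.1, 0.1, w_lim=0.50, i0=0.01, days=100)
```

raising w_lim lets more individuals be vaccinated and lowers the infection peak, matching the trend reported in the results.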
as expected, the insertion of this constraint limits the maximum number of vaccinated individuals and, consequently, a lower number of individuals are vaccinated. increasing the parameter w lim reduces the objective function value and the numbers of infected and recovered individuals and, consequently, increases the number of susceptible individuals. these analyses can be observed in fig. . as presented previously, a multi-objective optimal control problem was proposed in order to minimize the number of infected individuals (Ω ) and to minimize the quantity of vaccine administered (Ω ). to evaluate the proposed methodology considered to solve this multi-objective optimization problem, the following steps are established: • objective functions: minimize both Ω and Ω together, which are defined by eqs. ( ) and ( ), respectively; • the previously calculated parameters (β, γ and i ) are employed in the simulation of the sir model; • design space: ≤ t i ≤ t f , for i = , . . . , t n elem − , and n elem = . • mode parameters: population size ( ), number of generations ( ), perturbation rate ( . ), crossover rate ( . ), number of pseudo-curves ( ), reduction rate ( . ), and strategy rand/ (as presented in section . ). the stopping criterion adopted is the same as in the previous cases. • to evaluate the sir model during the optimization process, the runge-kutta-fehlberg method was used; • initial conditions: s ( ) = − i , i( ) = i , r( ) = , and w( ) = . table . it must be stressed that the pareto curve presents the non-dominated solutions, as described in section . . the point a represents the best solution in terms of the minimization of the number of infected individuals, with Ω = . , that is, the number of infected individuals at t f assumes its lowest value, which is equal to i(t f ) = . , but considering a larger amount of vaccine administered (Ω = . ).
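the non-dominated filtering that underlies such a pareto curve can be sketched as follows (toy objective pairs; both objectives are minimised):

```python
# sketch: extracting the non-dominated (pareto) set from a list of
# (omega_1, omega_2) objective pairs, minimising both objectives.

def non_dominated(points):
    """keep every point that no other point dominates in both objectives."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return front

# toy objective pairs: (infected measure, vaccine measure)
pts = [(1.0, 5.0), (2.0, 3.0), (4.0, 4.0), (3.0, 1.0)]
front = non_dominated(pts)   # (4.0, 4.0) is dominated by (2.0, 3.0)
```

points such as a, b and c in the results are then particular members of this front: the extremes of each objective and a compromise in between.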
on the other hand, the point b represents the best solution in terms of the quantity of vaccine administered, with Ω = . , i.e., the minimization of such value when t = t f . however, for this point, the number of infected individuals is high (Ω = . ). the point c is a compromise solution, which is a good solution in terms of both objectives simultaneously, with intermediary values for both objectives -Ω = . and Ω = . . table . in figure (e) it is possible to observe the activation of the control variable when vaccine is introduced. besides, in both results obtained, the action of such treatment is readily verified in the population during a larger interval of time in the beginning of the vaccine administration. in figures (b), (c), (d) and (f) the susceptible, infectious, recovered and number of vaccines profiles are presented, respectively, for each point described in table . in these figures we can visualize the importance of the control strategy used. for example, the points a and c are good choices in terms of the minimization of infected individuals, although the point a has the highest value in terms of the objective Ω . on the other hand, point b is satisfactory in terms of minimizing the amount of vaccines administered, but, from a clinical point of view, it is not a good choice, as the number of infected individuals is not minimized. in this contribution, an inverse problem was proposed and solved to simulate the dynamic behavior of the novel coronavirus disease (covid- ), considering real data from china. the parameters of the compartmental sir (susceptible, infectious and recovered) model were determined by using differential evolution (de). considering the parameters obtained with the solution of the proposed inverse problem, two optimal control problems were proposed. the first consists of minimizing the quantity of infected individuals. in this case, an inequality that represents the quantity of vaccines available was analyzed.
the second optimal control problem considers minimizing, simultaneously, the quantity of infected individuals and the prescribed vaccine concentration during the treatment. this problem was solved using multi-objective differential evolution (mode). in general, the solution of the proposed multi-objective optimal control problem provides information from which an optimal strategy for vaccine administration can be defined. the use of mathematical models associated with optimization tools may contribute to decision making in situations of this type. it is important to emphasize that the quality of the results is dependent on the experimental data considered. in this context, one may cite the following limitations regarding the sir model: i) poor quality of reported official data and ii) the simplifications of the model, which omits terms such as birth rate, differential vaccination rate, and weather changes and their effect on the epidemiology. finally, it is worth mentioning that the problem formulated in this work is not normally considered in the specialized literature (only the minimization of the infected individuals is normally proposed). in this context, the formulation of the multi-objective optimization problem and its solution by using mode represent the main contribution of this work.
modeling and analysis of an seir epidemic model with a limited resource for treatment numerical modeling and theoretical analysis of a nonlinear advection-reaction epidemic system scheduling of measles vaccination in low-income countries: projections of a dynamic model application of the arima model on the covid- epidemic dataset advances in simultaneous strategies for dynamic process optimization a seir model for control of infectious diseases with constraints an introduction to compartmental modeling for the budding infectious disease modeler parameter estimation based synchronization for an epidemic model with application to tuberculosis in cameroon applied optimal control: optimization, estimation and control a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster forecasting the spread of mosquito-borne disease using publicly accessible data: a case study in chikungunya multi-objective optimization using evolutionary algorithms deep learning to detect alzheimer's disease from neuroimaging: a systematic literature review dynamic simulation and optimization with inequality path constraints predicting unobserved exposures from seasonal epidemic data severe acute respiratory syndrome-related coronavirus: the species and its viruses-a statement of the coronavirus study group the mathematics of infectious diseases a new multi-objective evolutionary algorithm: neighbourhood exploring evolution strategy clinical features of patients infected with novel coronavirus in wuhan mapping -ncov modeling infectious diseases in humans and animals global stability of an seir epidemic model where empirical distribution of incubation period is approximated by coxian distribution dynamic analysis of an seir model with distinct incidence for exposed and infectives global stability for the seir model in epidemiology early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
a conceptual model for the coronavirus disease (covid- ) outbreak in wuhan, china with individual reaction and governmental action hybrid approach for dynamic optimization problems multi-objective optimization for engineering system design determination of an optimal control strategy for drug administration in tumor treatment using multi-objective optimization differential evolution a new multi-objective optimization algorithm based on differential evolution and neighborhood exploring evolution strategy developing covid- vaccines at pandemic speed mathematical analysis of a sex-structured hiv/aids model with a discrete time delay an introduction to optimal control with an application in disease modeling cours d'Économie politique. f. rouge modelling the effect of changes in vaccine effectiveness and transmission contact rates on pertussis epidemiology the effect of control strategies to reduce social mixing on outcomes of the covid- epidemic in wuhan, china: a modelling study differential evolution: a practical approach to global optimization pattern of early human-to-human transmission of wuhan novel coronavirus ( -ncov) why is it difficult to accurately predict the covid- epidemic? 
infectious disease modelling inference and forecast of the current west african ebola outbreak in guinea stability of seir model of infectious diseases with human immunity differential evolution-a simple and efficient heuristic for global optimization over continuous spaces deterministic seirs epidemic model for modeling vital dynamics, vaccinations, and temporary immunity clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china radiotherapy workflow and protection procedures during the coronavirus disease (covid- ) outbreak: experience of the hubei cancer hospital in wuhan the sir model and the foundations of public health susceptible exposed infected recovery (seir) model with immigration: equilibria points and its application the epidemiology, diagnosis and treatment of covid- multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach this study was financed in part by the coordenação de aperfeiçoamento de pessoal de nível superior-brasil (capes)-finance code , fundação carlos chagas filho de amparo à pesquisa do estado do rio de janeiro (faperj), and conselho nacional de desenvolvimento científico e tecnológico (cnpq). key: cord- - qjae x authors: mbuvha, rendani; marwala, tshilidzi title: bayesian inference of covid- spreading rates in south africa date: - - journal: plos one doi: . /journal.pone. sha: doc_id: cord_uid: qjae x the severe acute respiratory syndrome coronavirus (sars-cov- ) pandemic has highlighted the need for performing accurate inference with limited data. fundamental to the design of rapid state responses is the ability to perform epidemiological model parameter inference for localised trajectory predictions.
in this work, we perform bayesian parameter inference using markov chain monte carlo (mcmc) methods on the susceptible-infected-recovered (sir) and susceptible-exposed-infected-recovered (seir) epidemiological models with time-varying spreading rates for south africa. the results find two change points in the spreading rate of covid- in south africa as inferred from the confirmed cases. the first change point coincides with state enactment of a travel ban and the resultant containment of imported infections. the second change point coincides with the start of a state-led mass screening and testing programme which has highlighted community-level disease spread that was not well represented in the initial largely traveller based and private laboratory dominated testing data. the results further suggest that due to the likely effect of the national lockdown, community level transmissions are slower than the original imported case driven spread of the disease. the first reported case of the novel coronavirus (sars-cov- ) in south africa was announced on march , following the initial manifestation of the virus in wuhan, china in december [ ] [ ] [ ] . due to its further spread and the severity of its associated clinical outcomes, the disease was subsequently declared a pandemic by the world health organisation (who) on march [ , ] . in south africa, by april , people had been confirmed to have been infected by the coronavirus with fatalities [ ] . numerous states have attempted to minimise the growth in the number of covid- infections [ , , ] . these attempts are largely based on non-pharmaceutical interventions (npis) aimed at separating the infectious population from the susceptible population [ ] . these initiatives aim to strategically reduce the increase in infections to a level where their healthcare systems stand a chance of minimising the number of fatalities [ ] .
some of the critical indicators for policymaker response planning include projections of the infected population, estimates of health care service demand and whether current containment measures are effective [ ] . as the pandemic develops in a rapid and varied manner in most countries, calibration of epidemiological models based on available data can prove to be difficult [ ] . this difficulty is further escalated by the high number of asymptomatic cases and the limited testing capacity [ , ] . a fundamental issue when calibrating localised models is inferring parameters of compartmental models such as susceptible-infectious-recovered (sir) and the susceptible-exposedinfectious-recovered (seir) that are widely used in infectious disease projections. in the view of public health policymakers, a critical aspect of projecting infections is the inference of parameters that align with the underlying trajectories in their jurisdictions. the spreading rate is a parameter of particular interest which is subject to changes due to voluntary social distancing measures and government-imposed contact bans. the uncertainty in utilising these models is compounded by the limited data in the initial phases and the rapidly changing dynamics due to rapid public policy changes. to address these complexities, we utilise the bayesian framework for the inference of epidemiological model parameters in south africa. the bayesian framework allows for both incorporation of prior knowledge and principled embedding of uncertainty in parameter estimation. in this work we combine bayesian inference with the compartmental seir and sir models to infer time-varying spreading rates that allow for quantification of the impact of government interventions in south africa. compartmental models are a class of models that is widely used in epidemiology to model transitions between various stages of disease [ , , ] .
we now introduce the susceptible-exposed-infectious-recovered (seir) and the related susceptible-infectious-recovered (sir) compartmental models that have been dominant in covid- modelling literature [ , , , ]. the susceptible-exposed-infectious-recovered model. the seir is an established epidemiological model for the projection of infectious diseases. the seir models the transition of individuals between four stages of a condition, namely: • being susceptible to the condition, • being infected and in incubation, • having the condition and being infectious to others, and • having recovered and built immunity for the disease. the seir can be interpreted as a four-state markov chain which is illustrated diagrammatically in fig . the seir relies on solving the system of ordinary differential equations below representing the analytic trajectory of the infectious disease [ ]: ds/dt = −λsi/n, de/dt = λsi/n − σe, di/dt = σe − μi, dr/dt = μi, where s is the susceptible population, i is the infected population, r is the recovered population and n is the total population where n = s + e + i + r. λ is the transmission rate, σ is the rate at which individuals in incubation become infectious, and μ is the recovery rate. /σ and /μ therefore become the incubation period and contagious period respectively. we also consider the susceptible-infectious-recovered (sir) model which is a subclass of the seir model that assumes direct transition from the susceptible compartment to the infected (and infectious) compartment. the sir is represented by three coupled ordinary differential equations rather than the four in the seir. fig depicts the three states of the sir model. the basic reproductive number r . the basic reproductive number (r ) represents the mean number of additional infections created by one infectious individual in a susceptible population. according to the latest available literature, without accounting for any social distancing policies the r for covid- is between and . [ , , , ].
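using the symbols defined above, the seir system can be integrated numerically as in the sketch below; the euler step and all parameter values are illustrative, not those used in this study.

```python
# sketch of the seir equations above under simple euler integration;
# parameter values and population size are illustrative only.

def seir_step(s, e, i, r, lam, sigma, mu, n, dt):
    ds = -lam * s * i / n            # susceptibles becoming exposed
    de = lam * s * i / n - sigma * e # exposures in, incubation exits out
    di = sigma * e - mu * i          # incubation ends; recoveries leave i
    dr = mu * i
    return s + ds * dt, e + de * dt, i + di * dt, r + dr * dt

n = 1_000_000
s, e, i, r = n - 10.0, 10.0, 0.0, 0.0
dt, days = 0.01, 100
for _ in range(int(days / dt)):
    s, e, i, r = seir_step(s, e, i, r, lam=0.5, sigma=0.2, mu=0.1, n=n, dt=dt)
total = s + e + i + r                # compartments should still sum to n
```

since the four derivatives sum to zero, the total population is conserved along the trajectory, which is a useful sanity check on any integrator.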
r can be expressed in terms of λ and μ as: r = λ/μ. extensions to the seir and sir models. we use an extended version of the seir and sir models of [ ] that incorporates some of the observed phenomena relating to covid- . first we include a delay d in becoming infected (i new ) and being reported in the confirmed case statistics, such that the confirmed reported cases cr t at some time t are in the form [ ]: cr t = i new (t − d). we further assume that the spreading rate λ is time-varying rather than constant with change points that are affected by government interventions and voluntary social distancing measures. we follow the framework of [ ] to perform bayesian inference for model parameters on the south african covid- data. the bayesian framework allows for the posterior inference of parameters which updates prior beliefs based on a data-driven likelihood. the posterior inference is governed by bayes theorem as follows: p(w|d, m) = p(d|w, m)p(w|m)/p(d), where p(w|d, m) is the posterior distribution of a vector of model parameters (w) given the model (m) and observed data (d), p(d|w, m) is the data likelihood and p(d) is the evidence. the likelihood. the likelihood indicates the probability of observing the reported case data given the assumed model. in our study, we adopt the student-t distribution as the likelihood as suggested by [ ]. similar to a gaussian likelihood, the student-t likelihood allows for parameter updates that minimise discrepancies between the predicted and observed reported cases. priors. parameter prior distributions encode some prior subject matter knowledge into parameter estimation. in the case of epidemiological model parameters, priors incorporate literature based expected values of parameters such as recovery rate (μ), spreading rate (λ), change points based on policy interventions etc. the prior settings for the model parameters are listed in table . we follow [ ] by selecting lognormal distributions for λ and σ such that the initial mean basic reproductive number is .
which is consistent with literature [ , , , , ]. we set a lognormal prior for σ such that the mean incubation period is five days. we use the history of government interventions to set priors on change points in the spreading rate. the priors on change-points include / / when a travel ban and school closures were announced, and / / when a national lockdown was enforced. we keep the priors for the lognormal distributions of the spreading rates after the change points weakly-informative by setting the same mean as λ and higher variances across all change points. this has the effect of placing greater weight on the data-driven likelihood. similar to [ ] we adopt weakly-informative half-cauchy priors for the initial conditions for the infected and exposed populations. markov chain monte carlo (mcmc). given that the closed-form inference of the posterior distributions on the parameters listed in table is infeasible, we make use of markov chain monte carlo to sample from the posterior. monte carlo methods approximate such solutions by sampling [ , ]. in this work, we explore inference using metropolis-hastings (mh), slice sampling and the no-u-turn sampler (nuts). metropolis hastings (mh). mh is one of the simplest algorithms for generating a markov chain which converges to the correct stationary distribution. the mh generates proposed samples using a proposal distribution. a new parameter state w t * is accepted or rejected probabilistically based on the posterior likelihood ratio: min(1, p(w t * |d, m)/p(w t |d, m)). a common proposal distribution is a symmetric random walk obtained by adding gaussian noise to a previously accepted parameter state. random walk behaviour of such a proposal typically results in low sample acceptance rates. while sample acceptance is guaranteed with slice sampling, a large slice window can lead to computationally inefficient sampling while a small window can lead to poor mixing. hybrid monte carlo (hmc) and the no-u-turn sampler (nuts).
metropolis-hastings (mh) and slice sampling tend to exhibit excessive random walk behaviour-where the next state of the markov chain is randomly proposed from a proposal distribution [ ] [ ] [ ] . this results in low proposal acceptance rates and small effective sample sizes. hmc proposed by [ ] reduces random walk behaviour by adding auxiliary momentum variables to the parameter space [ ] . hmc creates a vector field around the current state using gradient information, which assigns the current state a trajectory towards a high probability next state [ ] . the dynamical system formed by the model parameters w and the auxiliary momentum variables p is represented by the hamiltonian h(w, p) written as follows [ , ]: h(w, p) = m(w) + k(p), where m(w) is the negative log-likelihood of the posterior distribution in eq , also referred to as the potential energy. k(p) is the kinetic energy defined by the kernel of a gaussian with a covariance matrix m [ ]: k(p) = pᵀm⁻¹p/2. the trajectory vector field is defined by considering the parameter space as a physical system that follows hamiltonian dynamics [ ] . the dynamical equations governing the trajectory of the chain are then defined by hamiltonian equations at a fictitious time t as follows [ ]: dw/dt = ∂h/∂p and dp/dt = −∂h/∂w. in practical terms, the dynamical trajectory is discretised using the leapfrog integrator. in the leapfrog integrator, to reach the next point in the path, we take half a step in the momentum direction, followed by a full step in the direction of the model parameters-then ending with another half step in the momentum direction. due to the discretising errors arising from leapfrog integration, a metropolis acceptance step is then performed in order to accept or reject the new sample proposed by the trajectory [ , ] . in the metropolis step the parameters proposed by the hmc trajectory w* are accepted with the probability [ ]: min(1, p(w*|d, α, β, h)/p(w|d, α, β, h)). algorithm shows the pseudo-code for the hmc, where ε is a discretisation stepsize.
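a minimal sketch of one hmc transition (leapfrog discretisation followed by the metropolis acceptance step) is given below for a one-dimensional standard-normal target with an identity mass matrix; the stepsize, trajectory length and target are all illustrative choices, not the samplers used in this study.

```python
# sketch of one hmc transition (leapfrog + metropolis accept) for a
# one-dimensional standard-normal target with identity mass matrix;
# m(w) = w^2 / 2 is the potential energy, so its gradient is w.
import math, random

def hmc_step(w, epsilon, l, rng):
    p = rng.gauss(0.0, 1.0)                  # resample auxiliary momentum
    w_new, p_new = w, p
    p_new -= 0.5 * epsilon * w_new           # half momentum step
    for _ in range(l - 1):
        w_new += epsilon * p_new             # full position step
        p_new -= epsilon * w_new             # full momentum step
    w_new += epsilon * p_new                 # last position step
    p_new -= 0.5 * epsilon * w_new           # final half momentum step
    h_old = 0.5 * w * w + 0.5 * p * p        # h = m(w) + k(p)
    h_new = 0.5 * w_new * w_new + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return w_new                         # metropolis accept
    return w                                 # reject: stay at current state

rng = random.Random(0)
samples, w = [], 0.0
for _ in range(5000):
    w = hmc_step(w, epsilon=0.2, l=10, rng=rng)
    samples.append(w)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
```

because the leapfrog error is small at this stepsize, acceptance is high and the sample mean and variance approach those of the standard normal target.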
the leapfrog steps are repeated until the maximum trajectory length l is reached. the hmc algorithm has multiple parameters that require tuning for efficient sampling, such as the stepsize and the trajectory length. a trajectory length that is too short leads to random walk behaviour similar to mh, while a trajectory length that is too long results in a trajectory that inefficiently traces back. the stepsize is also a critical parameter for sampling: small stepsizes are computationally inefficient, leading to correlated samples and poor mixing, while large stepsizes compound discretisation errors, leading to low acceptance rates. tuning these parameters requires multiple time-consuming trial runs. nuts automates the tuning of the leapfrog stepsize and trajectory length. in nuts the stepsize is tuned during an initial burn-in phase by targeting particular levels of sample acceptance. the trajectory length is tuned by iteratively adding steps until either the chain starts to trace back (u-turn) or the hamiltonian explodes (becomes infinite). we use the samplers described above to calibrate the seir and sir models on daily new cases and cumulative cases data for south africa up to and including april provided by johns hopkins university's center for systems science and engineering (csse) [ ]. sir and seir model parameter inference was performed using confirmed cases data up to and including april and the mcmc samplers described in the methodology section. each of the samplers is run such that samples are drawn with burn-in and tuning steps. we use the leave-one-out (loo) cross-validation error of [ ] to evaluate the goodness of fit of each model. table shows the loo validation errors of the various models. it can be seen that the sir model with two change points provides the best model fit, with the lowest mean loo of . . the seir model with two change points showed a mean loo of . .
we note that [ ] similarly finds that the sir model displayed superior goodness of fit to the seir on german data. we now further present detailed results of the sir and seir models with inference using nuts; the trace plots from these models, indicating stationarity in the sampling chains, are provided in s and s figs. the trace plots for the sir and seir models using mh are provided in s and s figs, while similar trace plots for slice sampling are provided in s and s figs. the trace plots largely indicate that the nuts sampler displays greater agreement between parallel chains and thus lower rhat values. time-varying spread rates allow for inference of the impact of various state and societal interventions on the spreading rate. fig shows the fit and projections based on sir models with zero, one and two change points. as can be seen from the plot, the two change point model best captures the trajectory in the development of new cases relative to the zero and one change point models. the superior goodness of fit of the two change point model is also illustrated in table . the fit and projections showing similar behaviour on the seir model with various change points are shown in fig . the mean reporting delay time in days was found to be . (ci [ . , . ]); literature suggests this delay includes both the incubation period and the test reporting lags. the posterior distribution of the incubation period from the seir model is shown in fig . the inference of parameters is dependent on the underlying testing processes that generate the confirmed case data. the effect of the mass screening and testing campaign was to change the underlying confirmed case data generating process by widening the criteria of those eligible for testing.
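the change-point mechanism can be sketched as a piecewise-constant spreading rate λ(t); the times and rates below are placeholders, not the inferred posterior values.

```python
# sketch of a piecewise-constant spreading rate with two change points;
# the change times and rate levels are illustrative placeholders.

def lambda_t(t, change_points, rates):
    """rates[k] applies before change_points[k]; rates[-1] applies after all."""
    for cp, rate in zip(change_points, rates):
        if t < cp:
            return rate
    return rates[-1]

# e.g. an initial rate, a drop after a lockdown-like intervention, and a
# partial rebound once wider testing reveals community transmission
lam = lambda t: lambda_t(t, change_points=[20.0, 35.0], rates=[0.30, 0.12, 0.18])
```

plugging such a λ(t) into the sir or seir equations in place of a constant λ is what lets the posterior attribute trajectory breaks to specific intervention dates.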
while initial testing focused on individuals that either had exposure to known cases or travelled to known covid- affected countries, mass screening and testing further introduced detection of community level transmissions which may contain undocumented contact and exposure to covid- positive individuals. we have performed bayesian parameter inference of the sir and seir models using mcmc and publicly available data as at april . the resulting parameter estimates fall in line with the existing literature in terms of mean baseline r (before government action), mean incubation time and mean infectious period [ , , , ]. we find that initial government action that mainly included a travel ban, school closures and stay-home orders resulted in a mean decline of % in the spreading rate. further government action through mass screening and testing campaigns resulted in a second trajectory change point. this latter change point is mainly driven by the widening of the population eligible for testing, from travellers (and their known contacts) to include the generalised community who would probably not have afforded the private lab testing which dominated the initial data. this resulted in an increase of r to . . the effect of mass screening and testing can also be seen in fig , indicating a mean increase in daily tests performed from to . the second change point illustrates the possible existence of multiple pandemics, as suggested by [ ]. thus testing after march is more indicative of community-level transmissions that were possibly not as well documented in terms of contact tracing and isolation relative to the initial imported infection driven pandemic. this is also supported by the documented increase in public laboratory testing (relative to private) past this change point, suggesting health care access might also play a role in the detection of community-level infections [ ].
we have utilised a bayesian inference framework to infer time-varying spreading rates of covid- in south africa. the time-varying spreading rates allow us to estimate the effects of government actions on the dynamics of the pandemic. the results indicate a decrease in the mean spreading rate of %, which mainly coincides with the containment of imported infections, school closures and stay-at-home orders. the results also indicate the emergence of community-level infections, which are increasingly being highlighted by the mass screening and testing campaign. the development of the community-level transmissions (r ≈ . (ci [ . , . ])) of the pandemic at the time of publication appears to be slower than that of the initial traveller-based pandemic (r ≈ . (ci [ . , . ])). a future improvement to this work could include extensions to regional and provincial studies, as current data suggest varied spreading rates both regionally and provincially. as more government interventions come into play, priors on more change points might also be necessary.

references:
- on data-driven management of the covid- outbreak in south africa. medrxiv
- covid- )
- an interactive web-based dashboard to track covid- in real time. the lancet infectious diseases
- coronavirus disease (covid- ) case data - south africa
- impact of nonpharmaceutical interventions (npis) to reduce covid mortality and healthcare demand
- inferring covid- spreading rates and potential change points for case number forecasts
- an epidemiological forecast model and software assessing interventions on covid- epidemic in china
- compartmental models in epidemiology
- an introduction to compartmental modeling for the budding infectious disease modeler
- fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the sars-cov- epidemic
- feasibility of controlling -ncov outbreaks by isolation of cases and contacts.
medrxiv
- detecting suspected epidemic cases using trajectory big data
- bayesian automatic relevance determination for feature selection in credit default modelling
- sampling techniques in bayesian finite element model updating
- automatic relevance determination bayesian neural networks for credit card default modelling
- bayesian learning via stochastic dynamics
- bayesian learning for neural networks
- mcmc using hamiltonian dynamics. handbook of markov chain monte carlo
- practical bayesian model evaluation using leave-one-out cross-validation and waic
- sa's covid- pandemic trends and next steps
- update on covid-

key: cord- -y w f authors: pinter, g.; felde, i.; mosavi, a.; ghamisi, p.; gloaguen, r. title: covid- pandemic prediction for hungary; a hybrid machine learning approach date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: y w f

several epidemiological models are being used around the world to project the number of infected individuals and the mortality rates of the covid- outbreak. advancing accurate prediction models is of utmost importance to take proper actions. due to a high level of uncertainty, or even a lack of essential data, the standard epidemiological models have been challenged regarding the delivery of higher accuracy for long-term prediction. as an alternative to the susceptible-infected-resistant (sir)-based models, this study proposes a hybrid machine learning approach to predict the covid- outbreak, and we exemplify its potential using data from hungary. the hybrid machine learning methods of adaptive network-based fuzzy inference system (anfis) and multi-layered perceptron-imperialist competitive algorithm (mlp-ica) are used to predict time series of infected individuals and the mortality rate. the models predict that by late may, the outbreak and the total mortality will drop substantially. the validation is performed for nine days with promising results, which confirms the model accuracy.
it is expected that the model maintains its accuracy as long as no significant interruption occurs. based on the results reported here, and due to the complex nature of the covid- outbreak and the variation in its behavior from nation to nation, this study suggests machine learning as an effective tool to model the outbreak. this paper provides an initial benchmarking to demonstrate the potential of machine learning for future research. severe acute respiratory syndrome coronavirus , also known as sars-cov- , is reported as a virus strain causing the respiratory disease covid- [ ] . the world health organization (who) and the global nations confirmed the coronavirus disease to be extremely contagious [ , ] . the covid- pandemic has been widely recognized as a public health emergency of international concern [ ] . to estimate the outbreak, identify the peak ahead of time, and also predict the mortality rate, epidemiological models have been widely used by officials and media. outbreak prediction models have been shown to be essential to communicate insights into the likely spread and consequences of covid- . furthermore, governments and other legislative bodies have used the insights from prediction models to suggest new policies and to assess the effectiveness of the enforced policies [ ] . the covid- pandemic has been reported to spread extremely aggressively [ ] . due to the complex nature of the covid- outbreak and its irregularity in different countries, the standard epidemiological models, i.e., susceptible-infected-resistant (sir)-based models, have been challenged regarding delivering high performance for individual nations. furthermore, as the covid- outbreak showed significant differences from other recent outbreaks, e.g., ebola, cholera, swine fever, h n influenza, dengue fever, and zika, advanced epidemiological models have emerged to provide higher accuracy [ ] .
nevertheless, due to several unknown variables involved in the spread, the complexity of population-wide behavior in various countries, and differences in containment strategies, model uncertainty has been reported to be inevitable [ ] [ ] [ ] . consequently, standard epidemiological models face new challenges in delivering more reliable results. the strategy of standard sir models is formed around the assumption of transmitting the infectious disease through contacts, considering three different classes: susceptible, infected, and recovered [ ] . the susceptible-to-infection (class s), infected (class i), and removed (class r) populations build the foundation of epidemiological modeling. note that the definition of the various classes of an outbreak may vary. for instance, r often refers to those that have recovered, developed immunity, been isolated, or passed away. however, in some countries, r is susceptible to being infected again, and there exist uncertainties in allocating r a value. advancing sir-based models requires several assumptions. it is assumed that class i transmits the infection to class s, where the number of probable transmissions is proportional to the total number of contacts, computed using basic differential equations of the form [ ] [ ] [ ] :

ds/dt = -β s i

where i, s, and β represent the infected population, the susceptible population, and the daily reproduction rate, respectively. the value of s in the time-series produced by the differential equation gradually declines. at the early stage of the outbreak, it is assumed that s ≈ 1, where the equation becomes linear. eventually, class i can be stated as

di/dt = β s i - γ i

where γ regulates the daily rate of spread. furthermore, the individuals excluded from the model are computed as follows:

dr/dt = γ i
considering the above assumptions, the outbreak modeling with sir is finally computed as the coupled system

ds/dt = -β s i
di/dt = β s i - γ i
dr/dt = γ i

furthermore, to evaluate the performance of the sir-based models, the median success of the outbreak prediction is used ( ). several analytical solutions to the sir models have been provided in the literature [ , ] . as different nations take different actions toward slowing down the outbreak, the sir-based model must be adapted according to the local assumptions [ ] . the inaccuracy of many sir-based models in predicting the outbreak and mortality rate has been evidenced during covid- in many nations. the key to the success of an sir-based model lies in choosing the right model according to the context and the relevant assumptions. sis, sird, msir, seir, seis, mseir, and mseirs models are among the popular models used for predicting covid- outbreaks worldwide. the more advanced variations of sir-derived models carefully consider vital dynamics and a constant population [ ] . for instance, in the absence of the long-lasting immunity assumption, i.e., when immunity is not realized upon recovery from infection, the susceptible-infectious-susceptible (sis) model was suggested [ ] . in contrast, the susceptible-infected-recovered-deceased model (sird) is used when immunity is assumed [ ] . in the case of covid- , different nations took different approaches in this regard. seir models have been reported among the most popular tools to predict the outbreak. seir models, through considering the significant incubation period of an infected person, are reported to present relatively more accurate predictions. in the case of varicella and zika outbreaks, seir models showed increased model accuracy [ , ] . seir models assume that the incubation period is a random variable and, similarly to the sir model, that there is a disease-free equilibrium [ , ] .
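the sir system described above can be integrated numerically; the following is a minimal forward-euler sketch with populations scaled by the total n, so that s/n plays the role of the normalised susceptible fraction. the population size and parameter values are illustrative assumptions, and a real study would use a proper ode solver:

```python
# minimal forward-euler integration of the sir system. beta is the daily
# transmission rate and gamma the removal rate; the population size and
# parameter values are illustrative assumptions only.

def simulate_sir(s0, i0, r0, beta, gamma, days, dt=1.0):
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, conserved by the model
    history = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / n * dt
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history

traj = simulate_sir(s0=9990, i0=10, r0=0, beta=0.3, gamma=0.1, days=160)
peak_infected = max(i for _, i, _ in traj)
```

because every unit leaving s enters i, and every unit leaving i enters r, the total s + i + r stays constant at every step, which mirrors the constant-population assumption of the sir-derived models discussed above.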
it should be noted, however, that seir models cannot fit well where the contact networks are non-stationary through time [ ] . social mixing, as a key factor of non-stationarity, determines the reproductive number, i.e., the expected number of secondary infections generated by a single infected individual. the value of r for covid- was estimated to be , which greatly triggered the pandemic [ ] . the lockdown measures aimed at reducing this value down to . nevertheless, the seir models are reported to be difficult to fit in the case of covid- due to the non-stationarity of mixing, caused by nudging intervention measures. therefore, to develop accurate sir-based models, in-depth information about social movement and the quality of lockdown measures would be essential. another drawback of sir-based models is the short lead-time. as the lead-time increases, the accuracy of the model declines. for instance, for the covid- outbreak in italy, the accuracy of the model reduces from = for the first five days to = . for day [ ] . overall, the sir-based models would be accurate if, firstly, the status of social interactions is stable and, secondly, class r can be computed precisely. to better estimate class r, several data sources can be integrated with sir-based models, e.g., social media and call data records (cdr), though a high degree of uncertainty and complexity still remains [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . considering the above uncertainties involved in the advancement of sir-based models, the generalization ability is yet to be improved to achieve scalable models with high performance [ ] . due to the complexity and the large-scale nature of the problem in developing epidemiological models, machine learning has recently gained attention for building outbreak prediction models. ml has already shown promising results in contributing to higher generalization ability and greater prediction reliability for longer lead-times [ ] [ ] [ ] [ ] [ ] .
machine learning has already been recognized as a computing technique with great potential in outbreak prediction. the application of ml in outbreak prediction includes several algorithms, e.g., random forest for swine fever [ , ] , survival prediction [ ] , and icu demand prediction [ ] . furthermore, non-peer-reviewed sources suggest numerous potentials of machine learning to fight covid- . among the applications of machine learning, improvement of the existing prediction models, identifying vulnerable groups, early diagnosis, advancement of drug delivery, evaluation of the probability of the next pandemic, advancement of integrated systems for spatio-temporal prediction, evaluating the risk of infection, advancing reliable biomedical knowledge graphs, and data mining of social networks are being noted. as stated in our former paper, machine learning can be used for data preprocessing. improving the quality of data can particularly improve the quality of the sir-based model. for instance, the number of cases reported by worldometer is not precisely the number of infected cases (e in the seir model), and the number of infectious people (i in seir) cannot be easily determined, as many people who might be infectious may not turn up for testing. likewise, the number of people who are admitted to hospital or deceased will not support r, as most covid- positive cases recover without entering hospital. considering this data problem, it is extremely difficult to fit seir models satisfactorily. considering such challenges, for future research, the ability of machine learning to estimate the missing information on the number of exposed (e) or infected (i) individuals can be evaluated. along with the prediction of the outbreak, prediction of the total mortality rate (n(deaths)/n(infected)) is also essential to accurately estimate the number of potential patients in critical condition and the required beds in intensive care units.
(cc-by . international license: it is made available by the author/funder, who has granted medrxiv a license to display the preprint in perpetuity; not certified by peer review; this version posted may , .)

although the research is in the very early stage, the trend in outbreak prediction with machine learning can be classified in two directions: firstly, improvement of the sir-based models, e.g., [ , ] , and secondly, time-series prediction [ , ] . consequently, the state-of-the-art machine learning methods for outbreak modeling suggest two major research gaps for machine learning to address: firstly, improvement of sir-based models, and secondly, advancement in outbreak time-series prediction. considering the drawbacks of the sir-based models, machine learning should be able to contribute. this paper contributes to the advancement of time-series modelling and prediction of covid- . although ml has long been established as a standard tool for modeling natural disasters and weather forecasting [ ] [ ] [ ] [ ] [ ] , its application in modeling outbreaks is still in the early stages. more sophisticated ml methods are yet to be explored. a recent paper by ardabili et al. [ ] explored the potential of mlp and anfis in time-series prediction of covid- in several countries. the contribution of the present paper is to improve the quality of prediction by proposing a hybrid machine learning method and comparing the results with anfis. in the present paper, the time series of the total mortality is also included. the rest of this paper is organized as follows. section two describes the methods and materials. the results are given in section three. section four presents conclusions. the dataset is related to the statistical reports of covid- cases and the mortality rate of hungary, which is available at: https://www.worldometers.info/coronavirus/country/hungary/.
figures and present the total and daily reports of covid- statistics, respectively, from -march to -april. in the present study, modeling is performed by machine learning methods. training is the basis of these methods, as well as of many artificial intelligence (ai) methods [ , ] . according to some psychologists, humans and other living things interact with their surroundings by trial and error and achieve the best performance to reach a goal. based on this theory, and using the ability of computers to repeat a set of instructions, these conditions can be provided for computer programs to interact with the environment by updating values and optimizing functions according to the results of the interaction, in order to solve a problem or achieve a specific goal. how values and parameters are updated in successive repetitions by a computer is called a training algorithm [ ] [ ] [ ] . one of these methods is neural networks (nn), in which, modeled on the connection of neurons in the human brain, software programs are designed to solve various problems. to solve these problems, nn operations such as classification, clustering, or function approximation are performed using appropriate learning methods [ , ] . training the algorithm is the initial and most important step in developing a model [ , ] . developing a predictive ai model requires a dataset categorized into two sections, i.e., input(s) (independent variable(s)) and output(s) (dependent variable(s)) [ ] . in the present study, time-series data have been considered as the independent variables for the prediction of covid- cases and the mortality rate (the dependent variables). the time-series dataset was prepared based on two scenarios, as described in table .
the first scenario categorizes the time-series data into four inputs, the last four consecutive odd days' cases or mortality rates, for the prediction of xt as the next day's cases or mortality rate; the second scenario categorizes the time-series data into four inputs, the last four consecutive even days' cases or mortality rates, for the prediction of xt as the next day's cases or mortality rate. [ , ] . for this reason, hybrid methods have been growing in popularity [ , ] . hybrid methods contain a predictor and one or more optimizers [ , ] . the present study develops a hybrid mlp-ica method as a robust hybrid algorithm for building a platform to predict the covid- cases and mortality rate in hungary. the ica is a method in the field of evolutionary computation that seeks to find the optimal answer to various optimization problems. this algorithm, through mathematical modeling, provides a socio-political evolutionary algorithm for solving mathematical optimization problems [ ] . like all algorithms in this category, the ica starts from an initial set of possible answers. these answers are known as countries in the ica. the ica gradually improves the initial responses (countries) and ultimately provides the appropriate answer to the optimization problem [ , ] . the algorithm is based on the policies of assimilation, imperialist competition, and revolution. by imitating the process of social, economic, and political development of countries, and by mathematically modeling parts of this process, it provides operators in the form of a regular algorithm that can help to solve complex optimization problems [ , ] .
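returning to the two input scenarios of the table above (four odd-lag or four even-lag past values predicting the next value xt), they can be sketched as a simple windowing routine. the exact lag layout (t-1, t-3, t-5, t-7 versus t-2, t-4, t-6, t-8) is our reading of the description, not the paper's code:

```python
# sketch of the two windowing scenarios: scenario 1 feeds the four most
# recent odd-lag values and scenario 2 the four most recent even-lag values
# to predict x_t. the lag layout is an assumption based on the text.

def make_windows(series, lags):
    """build (inputs, target) training pairs from a series using given lags."""
    max_lag = max(lags)
    samples = []
    for t in range(max_lag, len(series)):
        inputs = [series[t - lag] for lag in lags]
        samples.append((inputs, series[t]))
    return samples

odd_lags = [7, 5, 3, 1]    # scenario 1: x_{t-7}, x_{t-5}, x_{t-3}, x_{t-1}
even_lags = [8, 6, 4, 2]   # scenario 2: x_{t-8}, x_{t-6}, x_{t-4}, x_{t-2}

series = list(range(20))   # toy stand-in for the cumulative case counts
scenario1 = make_windows(series, odd_lags)
scenario2 = make_windows(series, even_lags)
```

each (inputs, target) pair is one training sample for the mlp or anfis predictor; the only difference between the two scenarios is which lagged observations form the four inputs.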
in fact, this algorithm views the answers to the optimization problem in the form of countries and tries to gradually improve these answers during a repetitive process, eventually reaching the optimal answer to the problem [ , ] . in an nvar-dimensional optimization problem, a country is a 1 × nvar array, defined as country = [p1, p2, ..., pnvar]. to start the algorithm, ncountry initial countries are created, and the nimp best members of the population (the countries with the lowest cost-function values) are selected as imperialists. the remaining ncol countries form colonies, each belonging to an empire. to divide the initial colonies among the imperialists, each imperialist owns a number of colonies proportional to its power [ , ] . figure symbolically shows how the colonies are divided among the colonial powers (figure : the initial empires generation [ ] ). the integration of this model with the neural network means that the error of the network is defined as a cost function, and with changes in the weights and biases, the output of the network improves and the resulting error decreases [ ] . an adaptive neuro-fuzzy inference system (anfis) is a type of artificial neural network based on the takagi-sugeno fuzzy system [ , ] . this method was developed in the early s. since this system integrates neural networks and concepts of fuzzy logic, it can take advantage of the capabilities of both methods, and it can model nonlinear functions [ , ] . figure presents the architecture of the developed anfis model.
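the country/imperialist/colony scheme described above can be sketched on a toy one-dimensional cost function. this is a minimal, assimilation-only sketch: revolution and the full imperialist-competition step are omitted, the colony-to-empire assignment is a simple round-robin rather than power-proportional, and all parameter values are illustrative:

```python
import random

# minimal sketch of the imperialist competitive algorithm (ica):
# initialise countries, keep the nimp best as imperialists, and move each
# colony toward its imperialist (assimilation). revolution and imperialist
# competition are omitted for brevity; parameters are illustrative.

def ica_minimize(cost, n_countries=30, n_imperialists=3, iterations=100,
                 lower=-10.0, upper=10.0, beta=2.0, seed=0):
    rng = random.Random(seed)
    countries = [rng.uniform(lower, upper) for _ in range(n_countries)]
    for _ in range(iterations):
        countries.sort(key=cost)                    # best (lowest cost) first
        imperialists = countries[:n_imperialists]
        colonies = countries[n_imperialists:]
        moved = []
        for idx, colony in enumerate(colonies):
            imp = imperialists[idx % n_imperialists]     # round-robin empires
            step = beta * rng.random() * (imp - colony)  # assimilation move
            moved.append(colony + step)
        countries = imperialists + moved
    return min(countries, key=cost)

best = ica_minimize(lambda x: (x - 2.0) ** 2)
```

with beta > 1 a colony can overshoot its imperialist and land on a better point, which is how the population improves; in the hybrid mlp-ica, the cost function would be the network error as a function of its weights and biases rather than this toy quadratic.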
evaluations were conducted using the determination coefficient, root mean square error (rmse), and mean absolute percentage error (mape) values. these factors compare the target and output values and calculate a score as an index of the performance and accuracy of the developed methods [ , ] . table presents the evaluation criteria equations, e.g.,

rmse = sqrt( (1/n) Σ (x - y)² )

where n is the number of data points, and x and y are, respectively, the predicted (output) and desired (target) values. the performance of the proposed algorithm is evaluated using both training and validation data. the training data are used to train the algorithm and define the best set of parameters to be used in anfis and mlp-ica. after that, the best setup for each algorithm is used to predict outbreaks on the validation samples. it is worth mentioning that, due to the lack of adequate sample data, to avoid overfitting the training set is used to evaluate the model with higher performance. the training step for anfis was performed by employing three mf (membership function) types, as described in tables and . according to table , it can be claimed that the gaussian mf provided the lowest error and the highest accuracy compared with the other mf types for the prediction of the mortality rate. it can also be claimed that, for the selected mf type, scenario provides higher performance than scenario for the prediction of the mortality rate.
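the three evaluation criteria named above can be written out explicitly, assuming the standard definitions of each metric (the paper's table gives the exact forms used); x is the predicted (output) series and y the desired (target) series, matching the notation in the text:

```python
import math

# the three evaluation criteria, assuming their standard definitions:
# root mean square error, mean absolute percentage error, and the
# coefficient of determination (r-squared).

def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(y))

def mape(x, y):
    return 100.0 / len(y) * sum(abs((b - a) / b) for a, b in zip(x, y))

def r2(x, y):
    """determination coefficient: 1 minus residual over total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

predicted = [105.0, 198.0, 310.0]  # toy outputs for illustration
target = [100.0, 200.0, 300.0]
scores = (rmse(predicted, target), mape(predicted, target), r2(predicted, target))
```

lower rmse and mape, and an r² closer to 1, indicate a better fit; note that mape is undefined when a target value is zero, which matters for early-outbreak series that start at zero cases.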
figure presents the plot diagram for the selected models according to table . according to the figure, mlp-ica under scenario provides lower deviation from the target values, followed by mlp-ica under scenario , than anfis under both scenarios. figures and present the total cases and total mortality rate, respectively, and figures and present the daily predictions of the results from -april to -july. each figure has two sections, including the reported data. table represents the validation of the mlp-ica and anfis models for the period of - april. the proposed mlp-ica model presented promising values of rmse and determination coefficient for the prediction of both the outbreak and the total mortality. this approach outperforms commonly used prediction tools in the case of hungary. more work is required to determine whether this technique is adequate in all cases and for different population types and sizes. nonetheless, the learning approach could overcome imperfect input data.
incomplete catalogs can occur because infected persons are asymptomatic, not tested, or not listed in databases. tests in closed environments such as large aircraft carriers in france and the us have shown that up to % of the infected personnel were asymptomatic. of course, military personnel are not representative of large and mixed populations. nonetheless, it shows that false negatives can be abundant. in emerging countries, access to the laboratory equipment required for testing is extremely limited. this will introduce a bias in the counting. finally, it is unclear if all the cases are registered. in the uk, for example, it took public pressure for the government to make the casualties in retirement hospices known. and there is still doubt in the community that china produced complete data, for political reasons. at the same time, national governments and local administrations implemented containment measures such as confinement and social distancing. these actions have a huge impact on transmissions and thus on cases and casualties. access to modern medical facilities is also a parameter that mainly affects the number of casualties. all these aspects will affect traditional estimation procedures, whereas learning algorithms might be able to adapt, especially if multiple datasets are available for a given region. not only can our approach outperform the commonly used sir models, but it also requires fewer input data to estimate the trends. while we provide successful results for hungarian data, we need to further test these novel approaches on other databases. nonetheless, the presented results are promising and should incite the community to implement these new tools rapidly. although sir-based models have been widely used for modeling the covid- outbreak, they include some degree of uncertainty. several advancements are emerging to improve the quality of sir-based models suitable for the covid- outbreak.
as an alternative to the sir-based models, this study proposed machine learning as a new trend in advancing outbreak models. the machine learning approach makes no assumptions about the pandemic and the spread of the infection. instead, it predicts the time series of the infected cases as well as the total mortality cases. in this study, the hybrid machine learning model mlp-ica and anfis are used to predict the covid- outbreak in hungary. the models predict that by late may the outbreak and the total mortality will drop substantially. based on the results reported here, and due to the complex nature of the covid- outbreak and the variation in its behavior from nation to nation, this study suggests machine learning as an effective tool to model the outbreak. two scenarios were proposed: scenario considered sampling the odd days and scenario used the even days for training the data. the two machine learning models, anfis and mlp-ica, were trained under both scenarios. a detailed investigation was also carried out to explore the most suitable number of neurons. furthermore, the performance of the proposed algorithm is evaluated using both training and validation data. the training data are used to train the algorithm and define the best set of parameters to be used in anfis and mlp-ica. after that, the best setup for each algorithm is used to predict outbreaks on the validation samples. the validation is performed for nine days with promising results, which confirms the model accuracy. in this study, due to the lack of adequate sample data, to avoid overfitting the training set is used to choose and evaluate the model with higher performance. in future research, as covid- progresses in time and with the availability of more sample data, further testing and validation can be used to better evaluate the models.
both models showed promising results in terms of predicting the time series without the assumptions that epidemiological models require. both machine learning models, as alternatives to epidemiological models, showed potential in predicting the covid- outbreak as well as estimating the total mortality. yet, mlp-ica outperformed anfis by delivering accurate results on the validation samples. considering the availability of only a small amount of training data, further investigation would be essential to explore the true capability of the proposed hybrid model. it is expected that the model maintains its accuracy as long as no major interruption occurs. for instance, if other outbreaks were to initiate in other cities, or the prevention regime changed, naturally the model would not maintain its accuracy. for future studies, advancing deep learning and deep reinforcement learning models is strongly encouraged, as are comparative studies on various ml models for individual countries.

references:
- the species severe acute respiratory syndrome-related coronavirus: classifying -ncov and naming it sars-cov-
- a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster
- novel coronavirus ( -ncov): situation report
- coronavirus disease (covid- ): situation report
- covid- and italy: what next?
lancet
- predicting the impacts of epidemic outbreaks on global supply chains: a simulation-based analysis on the coronavirus outbreak (covid- /sars-cov- ) case
- the forecasting of dynamical ross river virus outbreaks
- inter-outbreak stability reflects the size of the susceptible pool and forecasts magnitudes of seasonal epidemics
- on the predictability of infectious disease outbreaks
- real-time forecasting of hand-foot-and-mouth disease outbreaks using the integrating compartment model and assimilation filtering
- supervised forecasting of the range expansion of novel non-indigenous organisms: alien pest organisms and the h n flu pandemic
- testing predictability of disease outbreaks with a simple model of pathogen biogeography
- short-term forecasting of bark beetle outbreaks on two economically important conifer tree species
- real-time predictions of the - ebola virus disease outbreak in the democratic republic of the congo using hawkes point process models
- a note on the derivation of epidemic final sizes
- mathematical models of sir disease spread with combined non-sexual and sexual transmission routes. infectious disease modelling
- effective containment explains sub-exponential growth in confirmed cases of recent covid- outbreak in mainland china
- the effectiveness of fallowing strategies in disease control in salmon aquaculture assessed with an sis model.
preventive veterinary medicine
- mathematical modelling of the transmission dynamics of ebola virus
- research about the optimal strategies for prevention and control of varicella outbreak in a school in a central city of china: based on an seir dynamic model
- calibration of a seir-sei epidemic model to describe the zika virus outbreak in brazil
- fitting the seir model of seasonal influenza outbreak to the incidence data for russian cities
- transmission dynamics of zika fever: a seir based model
- real-time prediction of influenza outbreaks in belgium
- forecasted size of measles outbreaks associated with vaccination exemptions for schoolchildren
- simple framework for real-time forecast in a data-limited situation: the zika virus (zikv) outbreaks in brazil from to as an example. parasites vectors
- predicting social response to infectious disease outbreaks from internet-based news streams
- effective network size predicted from simulations of pathogen outbreaks through social networks provides a novel measure of structure-standardized group size
- google trends predicts present and future plague cases during the plague outbreak in madagascar: infodemiological study
- prediction of dengue outbreaks based on disease surveillance, meteorological and socio-economic data
- forecasting respiratory infectious outbreaks using ed-based syndromic surveillance for febrile ed visits in a metropolitan city
- superensemble forecast of respiratory syncytial virus outbreaks at national, regional, and state levels in the united states
- the norovirus epidemiologic triad: predictors of severe outcomes in us norovirus outbreaks
- consensus and conflict among ecological forecasts of zika virus outbreaks in the united states
- seasonal difference in temporal transferability of modified seir and ai prediction of the epidemics trend of covid- in china under public health interventions
- covid- classification using ct images by machine learning methods
- covid- : automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
- prediction of survival for severe covid- patients with three clinical features: development of a machine learning-based prognostic model with clinical data in wuhan
- critical care utilization for the covid- outbreak in lombardy, italy: early experience and forecast during an emergency response
- regression model based covid- outbreak predictions in india
- a machine learning methodology for real-time forecasting of the - covid- outbreak using internet searches, news alerts, and estimates from mechanistic models
- covid- outbreak prediction with machine learning
- flood prediction using machine learning models: literature review
- forecasting shear stress parameters in rectangular channels using new soft computing methods
- hybrid model of morphometric analysis and statistical correlation for hydrological units prioritization
- complete statistical analysis to weather forecasting
- advances in machine learning modeling reviewing hybrid and ensemble methods
- hybrid machine learning model of extreme learning machine radial basis function for breast cancer detection and diagnosis; a multilayer fuzzy expert system
- list of deep learning models. engineering for sustainable future
- deep learning and machine learning in hydrological processes, climate change and earth systems: a systematic review
- prediction of combine harvester performance using hybrid machine learning modeling and response surface methodology. engineering for sustainable future, lecture notes in networks and systems, springer nature switzerland
- ardabili, s.; mosavi, a.; varkonyi-koczy, a. systematic review of deep learning and machine learning models in biofuels research.
engineering for sustainable future performance analysis of combine harvester using hybrid model of artificial neural networks particle swarm optimization comparative analysis of ann-ica and ann-gwo for crop yield prediction prediction of combine harvester performance using hybrid machine learning modeling and response surface methodology state of the art survey of deep learning and machine learning models for smart cities and urban sustainability imperialist competitive algorithm: an algorithm for optimization inspired by imperialistic competition an imperialist competitive algorithm with memory for distributed unrelated parallel machines scheduling imperialist competitive algorithm for minimum bit error rate beamforming evolving artificial neural network and imperialist competitive algorithm for prediction oil flow rate of the reservoir adaptive neuro-fuzzy inference system for prediction of water level in reservoir an expert system approach based on principal component analysis and adaptive neuro-fuzzy inference system to diagnosis of diabetes disease prediction of the strength and elasticity modulus of gypsum using multiple regression, ann, and anfis models online genetic-anfis temperature control for modeling the effects of ultrasound power and reactor dimension on the biodiesel production yield: comparison of prediction abilities between response surface methodology (rsm) and adaptive neuro-fuzzy inference system (anfis) application of anfis-based subtractive clustering algorithm in soil cation exchange capacity estimation using soil and remotely sensed data modeling and analysis of significant process parameters of fdm d printer using anfis application of anfis to predict crop yield based on different energy inputs water quality prediction model utilizing integrated wavelet-anfis model with cross-validation estimation of wind speed profile using adaptive neuro-fuzzy inference system (anfis) . faizollahzadeh_ardabili; mahmoudi, a.; mesri gundoshmian, t. 
key: cord- -hdf blpu authors: ahmetolan, semra; bilge, ayse humeyra; demirci, ali; peker-dobie, ayse; ergonul, onder title: what can we estimate from fatality and infectious case data using the susceptible-infected-removed (sir) model? a case study of covid-19 pandemic date: - - journal: front med (lausanne) doi: . /fmed. . sha: doc_id: cord_uid: hdf blpu
the rapidly spreading covid-19, which affected almost all countries, was first reported at the end of 2019. as a consequence of its highly infectious nature, countries all over the world have imposed extremely strict measures to control its spread. since the earliest stages of this major pandemic, academics have done a huge amount of research in order to understand the disease, develop medication, vaccines and tests, and model its spread. among these studies, a great deal of effort has been invested in the estimation of epidemic parameters in the early stage for the countries affected by covid-19, and hence in predicting the course of the epidemic, but the variability of the controls over the course of the epidemic complicated the modeling processes. in this article, the determination of the basic reproduction number, the mean duration of the infectious period, and the timing of the peak of the epidemic wave is discussed using early-phase data. daily case reports and daily fatalities for china, south korea, france, germany, italy, spain, iran, turkey, the united kingdom and the united states over the period january , –april , are evaluated using the susceptible-infected-removed (sir) model. for each country, the sir models fitting cumulative infective case data within % error are analyzed.
it is observed that the basic reproduction number and the mean duration of the infectious period can be estimated only in cases where the spread of the epidemic is over (for china and south korea in the present case). nevertheless, it is shown that the timing of the maximum and the timings of the inflection points of the proportion of infected individuals can be robustly estimated from the normalized data. the validation of the estimates by comparing the predictions with actual data has shown that the predictions were realized for all countries except the usa, as long as lock-down measures were retained.
the coronavirus disease caused by severe acute respiratory syndrome coronavirus 2 (sars-cov-2) is a highly contagious disease affecting huge numbers of people all over the world. the earliest case was identified in china in december 2019. after the first diagnosis, the disease spread very quickly to other countries, in spite of efforts to slow and stop the transmission of covid-19, such as self-isolation, quarantine, social distancing, contact tracing, and travel limitations. as a result of its rapid spread and very high infection rates, the world health organization (who) declared covid-19 a pandemic in march 2020 ( ). as of april , even though the pandemic has passed its early stage and there are % fewer cases in china as a consequence of successful containment measures, the disease is rapidly expanding in europe, america, asia, the middle east, and africa. despite the application of travel restrictions by many countries, there have been no substantial delays in the arrival of the pandemic in non-affected areas, as in the case of the a(h1n1) epidemic in 2009 ( ). a great deal of effort has been invested in the estimation of the epidemic parameters of covid-19 in the early stage for china and some other countries ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ).
in ( ), the authors analyzed the temporal dynamics of the disease in china, italy and france in the period between nd of january and th of march . in ( ), the potential for sustained human-to-human transmission to occur in locations outside wuhan is assessed based on estimations of how transmission in wuhan varied between december, , and february, . the difficulties related to accurate prediction of the pandemic are discussed in ( ). in ( ), the authors used phenomenological models that were developed for previous outbreaks to generate and assess short-term forecasts of the cumulative number of confirmed reported cases in hubei province and for the overall trajectory in china ( ). an epidemic analysis of the disease in italy is presented in ( ) by means of dynamical modeling ( ). forecasting covid-19 is investigated in ( ) by using a simple iteration method that needs only the daily values of confirmed cases as input. a cumulative distribution function (cdf) and its first derivative are used to predict how the pandemic will evolve in ( ). in ( ), the authors proposed a segmented poisson model for the estimation. in ( ), a meta-population model of disease transmission in england and wales was adapted to predict the timing of the peak of the epidemic. in addition, it was shown that the change in the epidemic behavior of various countries can be traced by the use of data-driven systems ( ). one of the common features of these works is the existence of variations in these parameter estimations.
in the present work, the determination of the following parameters is discussed: (1) the basic reproduction number ℜ, (2) the mean duration of the infectious period t, (3) the time t_m (days) at which the number of infectious cases reaches its maximum, i.e., the first derivative of i(t) is zero, (4) the time t_a (days) at which the rate of increase in the number of infectious cases reaches its maximum, i.e., the time at which the second derivative of i(t) is zero and the first derivative is positive, (5) the time t_b (days) at which the rate of decrease in the number of infectious cases reaches its maximum, i.e., the time at which the second derivative of i(t) is zero and the first derivative is negative. by employing the susceptible-infected-removed (sir) model, we show that the quantities that can be most robustly estimated from normalized data are the timing of the maximum and the timings of the inflection points of the proportion of infected individuals. these values correspond to the peak of the epidemic and to the highest rates of increase and decrease in the number of infected individuals. the stability of the estimations is discussed by comparing predictions based on data with long time spans. publicly accessible data that have been released by the state offices of each country are used for the analysis. the data set of each country is collected according to published official reports and is available at the website http://www.worldometers.info/coronavirus/ (access: april , ). updated data are also available at the website http://epikhas.khas.edu.tr/. data used for the analysis cover the period january -april , and in the following, "day " corresponds to january , . the analysis uses the susceptible-infected-removed (sir) model ( ) and solutions are obtained by numerical methods. updated data covering the period april- july are used to assess the performance of the model.
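the three timings defined above can be read off a discrete i(t) series from sign changes of its first and second differences. the following is a small illustrative sketch of our own (not the authors' code); the single-peaked synthetic curve used in the example is an assumption standing in for real case data.

```python
def key_timings(i):
    """return (t_a, t_m, t_b) for a single-peaked discrete series i[0..n-1]:
    t_m where the first difference changes sign (the peak), and t_a / t_b where
    the second difference changes sign before / after the peak (inflection points)."""
    d1 = [i[k + 1] - i[k] for k in range(len(i) - 1)]      # first differences
    d2 = [d1[k + 1] - d1[k] for k in range(len(d1) - 1)]   # second differences
    t_m = next(k for k in range(1, len(d1)) if d1[k - 1] > 0 >= d1[k])
    t_a = next(k for k in range(1, t_m) if d2[k - 1] > 0 >= d2[k])
    t_b = next(k for k in range(t_m, len(d2)) if d2[k - 1] < 0 <= d2[k])
    return t_a, t_m, t_b
```

on a bell-shaped pulse peaking at day 50 with inflection points near days 40 and 60, the function recovers those three timings to within a day, which is the discrete analog of locating the zeros of the first and second derivatives of i(t).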
the susceptible-infected-removed (sir) model is a system of ordinary differential equations modeling the spread of epidemics in a closed population, under the assumption of permanent immunity and homogeneous mixing ( ). these equations are ds/dt = −β s i, di/dt = β s i − η i, dr/dt = η i. since the right hand sides of these equations add up to zero, the sum s + i + r is a constant that is equal to the total number of individuals in the population. thus by normalizing, we may assume that s, i, and r are proportions of individuals in the respective groups. since the covid-19 infection has an incubation period, the right model to use is the seir system. but in previous work ( ) it was shown that the parameters of the seir model cannot be determined from the time evolution of the normalized curve of removed individuals. thus, the seir model should not be used in the absence of additional information that might be obtained by clinical studies. in the present work, since we assume no clinical information, we use the sir model, with the necessary modifications for the interpretation of the results, as indicated in ( ). the ratio β/η, called the basic reproduction number and denoted as ℜ, is the key parameter in both the sir and seir models. this number is related to the growth rate of the number of infected individuals in a fully susceptible population and determines the final value of r, denoted by r_f, that is the proportion of individuals that will be affected by the disease. this proportion includes individuals who gain immunity without showing symptoms, those who are treated, as well as disease-related fatalities. the reciprocal of the parameter η, t = 1/η, is considered as a representative of the mean infectious period. the relation between ℜ and r_f is determined as follows. note that r(t) is a monotonically increasing function, and hence it can be used as an independent variable instead of t.
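the sir system above can be integrated numerically; the following is a minimal sketch (our own, not the authors' code) using a fixed-step euler scheme, with the parameter values ℜ = 2, t = 4 days and the initial proportion of infected assumed purely for illustration.

```python
def simulate_sir(r0, t_inf, i0=1e-6, days=300, dt=0.01):
    """integrate ds/dt = -b*s*i, di/dt = b*s*i - e*i, dr/dt = e*i with euler steps,
    returning one (s, i, r) sample per day; b = r0/t_inf and e = 1/t_inf, so that
    the basic reproduction number is r0 = b/e."""
    beta, eta = r0 / t_inf, 1.0 / t_inf
    s, i, r = 1.0 - i0, i0, 0.0          # normalized so that s + i + r = 1
    per_day = round(1.0 / dt)
    daily = []
    for n in range(int(days / dt)):
        if n % per_day == 0:
            daily.append((s, i, r))
        ds = -beta * s * i               # susceptibles becoming infected
        di = beta * s * i - eta * i      # new infections minus removals
        dr = eta * i                     # removals (recovery or death)
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
    return daily

# example run with the assumed values r0 = 2, t = 4 days
trajectory = simulate_sir(2.0, 4.0)
```

consistent with the text, at the sampled peak of i(t) the susceptible proportion is close to 1/ℜ, and the final value of r approaches the final size r_f determined by the relation derived next.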
the derivative of s with respect to r is given by ds/dr = −ℜ s. assuming the initial conditions s → 1 and r → 0 as t approaches negative infinity, one can integrate and obtain s = exp(−ℜ r). then, as t approaches positive infinity, since i → 0, s + r = 1 yields 1 − r_f = exp(−ℜ r_f). ℜ can be solved from this equation as a function of r_f, and their relation is displayed on figure . the graph of r_f vs. ℜ is shown on figure , together with the ranges of ℜ for well-known diseases. it can be seen that for ℜ > . , r_f is > %. the figure also shows that the increase in r_f with respect to ℜ is very slow for ℜ > . it is generally accepted that the ℜ for covid-19 is > despite all containment measures ( ) ( ) ( ). thus, unless vaccination is applied, one would expect that at least % of the population would be affected by the disease. in addition, the knowledge of its precise value would have little effect on the planning of healthcare measures. it should also be kept in mind that containment measures provide a temporary control of the spread of the epidemic, just to the point of reducing the burden of the epidemic to a manageable size. according to the centers for disease control and prevention (cdc), at the time we completed the data collection phase of our research, it was still unknown when viral shedding begins or how long it lasts for, nor was the period of covid-19's infectiousness known. like infections with mers-cov and sars-cov, sars-cov-2 rna may be detectable in the upper or lower respiratory tract for weeks after illness onset, though the presence of viral rna is no guarantee of the presence of the infectious virus. it has been reported that the virus was found without any symptoms being shown (asymptomatic infections) or before symptoms developed (pre-symptomatic infections) with sars-cov-2, though the role they may play in transmission remains unknown. according to prior studies, the incubation period of sars-cov-2, like other coronaviruses, may last for - days.
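the final-size relation 1 − r_f = exp(−ℜ r_f) has no elementary closed-form solution for r_f, but the nontrivial root is easy to obtain numerically; a small sketch of our own (plain bisection, no library root-finder assumed) follows.

```python
import math

def final_size(r0, iters=200):
    """solve 1 - r_f = exp(-r0 * r_f) for the nontrivial root r_f in (0, 1).
    for r0 > 1, f(r) = 1 - r - exp(-r0 * r) is positive just above r = 0 and
    negative at r = 1, so plain bisection brackets the root."""
    f = lambda r: 1.0 - r - math.exp(-r0 * r)
    lo, hi = 1e-9, 1.0                 # exclude the trivial root r_f = 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# r_f rises quickly just above r0 = 1 and saturates for larger r0,
# matching the slow increase of r_f with respect to large ℜ noted in the text
print(final_size(1.5), final_size(2.0), final_size(4.0))
```

for example, ℜ = 2 gives r_f ≈ 0.797, i.e., roughly 80% of the population affected in the absence of intervention, which illustrates why knowing the precise value of a large ℜ adds little planning information.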
to illustrate an example of an sir model, ℜ, t, and r(0) are chosen as , , and − , respectively, and the related graphs are given on figure . from figure , it can be seen that for the parameter values ℜ = , t = day, the duration of the epidemic is about days. the peak of the epidemic occurs approximately at day . note that the derivative of i(t) vanishes at the time t_m when s(t_m) = 1/ℜ. in this example, s(t_m) = . , i(t_m) = . , and r(t_m) = . . the final values of s(t) and r(t) are s_f = . and r_f = . at the end of the epidemic. it is generally accepted that the number of fatalities represents the number of removed individuals and the number of confirmed cases represents the number of infected individuals. in the initial phase of the epidemic, little information was available on the proportionality constants, but as long as they do not change in time, one can work with the normalized case reports and normalized fatalities and look for the determination of the epidemic parameters from the shape of these normalized curves. in section estimation of the sir model parameters, it will be shown that for the covid-19 data, total cases would be a better representative of the number of removed individuals. according to the sir model, given by the equations in ( ), the rate of change of the number of removed individuals is proportional to the number of infectious cases. in terms of observations, this corresponds to the fact that the ratio of, for example, daily fatalities to daily infectious cases should be constant. in the literature on the analysis of historical epidemics, fatality reports are usually the only available data, hence models are necessarily based on the assumption that cumulative fatalities represent the cumulative number of removed individuals. for the covid-19 pandemic, as daily fatality and infectious case reports are available, further evaluation of the representation of r(t) in terms of fatality data is presented.
daily infections and total fatalities are displayed on figure for all countries. normalized daily infectious cases and total fatalities are shown on figure . from figure , it can be seen that the epidemic cycle has been completed in china over the course of about days. the jump in total fatalities is due to a change in the reporting scheme. as our analysis is based on total infectious cases, this change has no effect on the models. for south korea, the epidemic is in a state of slow decrease at the end of about days, but the rate of infections is still high. this qualitative behavior is an indication of the fact that ℜ for south korea is expected to be much higher than the one for china ( , ). for france, germany and iran, the epidemic is in the decline phase. for the rest of the countries, further analysis is needed in order to assess the epidemic phase. as noted above, the knowledge of ℜ determines the total proportion of individuals that would be affected, r_f. furthermore, the peak of i(t) occurs at the time t_m, at which the proportion of susceptible individuals falls to the value 1/ℜ. this information is useful for the determination of the proportion of people that have to be vaccinated in order to bring the proportion of susceptible individuals below this threshold. the basic reproduction number is "defined" as the number of new infections per unit time in a fully susceptible population. thus, it is a quantity that might be measured by direct on-site observations. on the other hand, the knowledge of ℜ by itself does not give any information on the timing of the progress of the epidemic. it will be seen that ℜ and t can be estimated only for china, where the spread of the epidemic is over. for other countries, ℜ and t cannot be estimated from the normalized data, but the timings of the key events, t_m, t_a, and t_b, can be determined quite reliably.
methods for estimating the parameters ℜ, t, t_m, t_a, and t_b. these parameters are determined by a "brute force" approach: the models are run for a broad range of parameters. then the difference between data and the model is compared by using various norms. finally, the models that match data within % are selected. if the scatter plot of the errors vs. the parameter to be estimated has a sharp minimum, it is concluded that the corresponding parameter can be determined from the shape of the normalized data. the parameter ranges for the sir model and the initial values, with < k < , are chosen as given above; for south korea, these parameter ranges are extended appropriately. in the sir model, since r′ = ηi, that is, the rate of change in the number of removed individuals is proportional to the number of infected individuals, it is expected that the cumulative cases are proportional to cumulative fatalities. thus, the sir model predicts the simultaneity of the daily fatalities and daily infections. the verification of this fact requires the availability of data both for infections and for fatalities. the data for the a(h1n1) epidemic collected at certain major hospitals ( ) is valuable in the sense of reflecting information on both infections and fatalities. the peculiarity of this data is a shift of about days between total infections and total fatalities, the peak of infections occurring days prior to the peak of fatalities. this time shift was explained by a multi-stage sir model ( ). cumulative cases and cumulative fatalities for covid-19 do not show such a clear time shift. on the contrary, in china and korea, fatalities increase faster than infections. in germany, there is a slight lead for infections, while for other countries the two curves more or less coincide. the lead of fatalities over infections that is observed in china and in korea is an unexpected fact, which is possibly due to irregularities in the statistics, in medical treatment practices, etc.
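the "brute force" matching described above can be sketched as a plain grid search; the code below is an illustrative reconstruction, not the authors' program, and the grid ranges, step sizes, and the use of synthetic data in place of case reports are our assumptions. candidate sir models are simulated, their r(t) curves normalized by the final value, and the relative l2 error against the data is recorded for each (ℜ, t) pair.

```python
def sir_r_curve(r0, t_inf, days=120, dt=0.05, i0=1e-5):
    """daily samples of r(t) for an sir model with r0 = beta/eta, t_inf = 1/eta."""
    beta, eta = r0 / t_inf, 1.0 / t_inf
    s, i, r = 1.0 - i0, i0, 0.0
    out, per_day = [], round(1.0 / dt)
    for n in range(int(days / dt)):
        if n % per_day == 0:
            out.append(r)
        ds, di, dr = -beta * s * i, beta * s * i - eta * i, eta * i
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
    return out

def rel_l2(model, data):
    """relative error between two curves in the discrete l2 norm."""
    num = sum((m - d) ** 2 for m, d in zip(model, data)) ** 0.5
    den = sum(d ** 2 for d in data) ** 0.5
    return num / den

def grid_fit(data, r0_grid, t_grid):
    """score every (r0, t) candidate against normalized cumulative-case data."""
    d = [v / data[-1] for v in data]            # normalize data by its final value
    scored = []
    for r0 in r0_grid:
        for t in t_grid:
            curve = sir_r_curve(r0, t)
            m = [v / curve[-1] for v in curve]  # normalize the model the same way
            scored.append((rel_l2(m, d), r0, t))
    return sorted(scored)                       # smallest error first
```

with complete synthetic data the minimum sits at the generating (ℜ, t); with only early-phase data, many (ℜ, t) pairs fit comparably well, which is the degeneracy along a curve of "best" parameters reported in the paper.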
we should also note that the progression of the covid-19 epidemic is unique in the sense that new treatment methods were applied during the initial phase in china and these methods have been applied in other countries. for china, several programs were run, first by fitting the predicted r(t) to the total fatality data, then to the cumulative infectious case data. in the first case, about models fitting cumulative fatalities within % error and about , models that fit cumulative infections within % error are found. furthermore, in the latter case, the minima for the quantities that were aimed to be determined were much sharper. for south korea, as will be explained later, the model matching was not successful. for other countries, as the difference between total infections and total fatalities was negligible, total infections are used as a representative of r(t) of the sir model. our main result is that it is not possible to determine the basic reproduction number and the mean duration of the infectious period from the shape of the normalized data (unless there are reasonable estimates for either of these parameters). in order to make a reliable determination of the parameters ℜ and t by using the early stage data, a certain period of time has to pass. this period is ∼ days for the a(h1n1) epidemic ( ). however, this period for covid-19 is still uncertain. this is possibly the reason why the parameters for countries other than china and south korea cannot be established. on the other hand, the timings of the peak of the infectious cases, the peak of the rate of increase and the rate of decrease of the infectious cases can be determined more precisely from the shape of the normalized data. the "best" estimations of the parameters ℜ and t lie on a curve that is nearly linear when an sir model is used to fit the data of an epidemic.
this fact has been observed in previous work ( ), in the study of the a(h1n1) epidemic, and it was explained by the fact that the duration of the epidemic pulse (appropriately defined in terms of a fraction of the peak of infections) was nearly invariant for values of ℜ and t with ℜ/t constant. in order to visualize this situation, the solutions of the system of differential equations of the sir model ( ) for the parameter range < ℜ < and β = ℜ/t = / are obtained. the graphs of the normalized solutions (after an appropriate time shift) are given in figure . the scatter plots of the mean infectious period t vs. ℜ, and the scatter plots of the modeling error vs. the parameters, are presented in figures - , where i_t and i_tt represent the values of the first and the second derivatives of i(t) at the last day of the data, april , , respectively. the error stands for the relative error between the normalized r(t) of the model and the normalized total infectious cases, in the l norm. figure | normalized values of i(t) for < ℜ < and β = ℜ/t = /, together with the inflection points (t_a, t_b), the peak point (t_m), and the timings of the initial (t ) and final (t ) points at % of the maximum value of i(t); (b) dependency of t , t_a, t_m, t_b, t on ℜ. figure | china: the th and th graphs indicate that the epidemic is in phase iv. south korea: the th and th graphs indicate that the epidemic is in phase iv; the values for ℜ and t do not seem to fall in reasonable ranges, and the data for south korea should be studied more closely. in figures - , the first graph, in the upper left of the panel, is the scatter plot of the mean duration of the infectious period, t, with respect to the basic reproduction number ℜ, for models that fit data within % error in the norm described above. for all countries, the "best" parameters lie on a curve, instead of being agglomerated around a mean.
this indicates that although the sir model fitting the normalized data is unique, the parameters ℜ and t cannot be determined precisely from the normalized data. the colors blue, red, and yellow in figures - represent the results according to whether the last day of the analysis, t_f, is , , or , respectively. in figures - , the second (first row, right panel) and the third (second row, left panel) graphs display the scatter plot of the modeling error with respect to ℜ and t, respectively. for china, there are well-defined minima in the modeling errors at nearly ℜ = and t = . for south korea, the minimum of the error in ℜ seems to be located beyond ℜ = , and the minimal error in t corresponds to t = approximately. these parameter values are not in the ranges reported in the literature. data for south korea show different characteristics, which might be due to the strategy of extensive testing and filiation, as opposed to lock-down measures. figure | france: the th and th graphs indicate that the epidemic is at the beginning of phase iii. germany: the th and th graphs indicate that the epidemic is at the beginning of phase iv. figure | italy: the th and th graphs indicate that the epidemic is in phase iii. spain: the th and th graphs indicate that the epidemic is in phase iii. an indication of the extensive testing policy is the fact that ∼ . percent of confirmed coronavirus patients in south korea were in their s, showing that asymptomatic cases are also included in the statistics. for all of the remaining countries, the ranges of ℜ and t corresponding to minimal modeling errors are too large to attempt any reasonable estimation of these parameters. if either ℜ or t is estimated by using alternative methods (medical observations etc.), it would be possible to obtain better estimates and improve the model by bootstrapping. the fourth (second row, right panel) graph in figures - shows the scatter plot of the modeling error vs.
t_m, the timing of the peak of the number of infections. for all of the countries analyzed, this parameter can be estimated quite sharply. in order to study the reliability of this estimation, the model matching process is repeated for t_f = , , and . the proportion of infected individuals i(t) has two inflection points. the first inflection point (t_a) is located to the left of the maximum (t_m), whereas the second one (t_b) is located to the right of t_m. t_a and t_b correspond to the highest rates of increase and decrease in i(t), respectively. in figures - , the right and left panels of the third row display the scatter plots of the error in these quantities. their variation with respect to t_f is also investigated. the values of the first and second derivatives at t_f are shown in the fourth row, left and right panels, respectively. if the first derivative is positive (negative), i(t) is in the rising (falling) phase, while if the second derivative is positive (negative) the curve is concave up (down). the epidemic phases, which are shown in figure , are categorized by the signs of the first and the second derivatives of i(t). in section results for each country, it can be seen that although ℜ and t cannot be determined, it was possible to estimate t_m, t_a, and t_b quite sharply from the data. in this section, the reliability of these estimates is discussed by comparing predictions based on data with different time spans. the best sir models fitting data for , , and days are obtained, and the data and graphs of the best models for each time span are plotted in figures , . for china and south korea, for which the epidemic cycle is more or less complete, estimations based on time spans varying by days give the same result, as can be observed in figure . on the other hand, for those countries that are as yet before or around the peak of the epidemic, the situation may be different, as can be observed in figure .
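the phase categorization by derivative signs can be made concrete in code. the four-phase mapping used below (phase i: i′ > 0, i″ > 0; phase ii: i′ > 0, i″ < 0; phase iii: i′ < 0, i″ < 0; phase iv: i′ < 0, i″ > 0) is our reading of the text, chosen so that t_a, t_m, and t_b mark the phase boundaries; the bell-shaped test curve is likewise an illustrative assumption.

```python
import math

def phase(i_prev, i_cur, i_next):
    """classify the epidemic phase at a point of a discrete i(t) series from the
    signs of centered first and second differences; the i/ii/iii/iv mapping is an
    assumed convention consistent with t_a, t_m, t_b as phase boundaries."""
    d1 = (i_next - i_prev) / 2.0            # approximates i'(t)
    d2 = i_next - 2.0 * i_cur + i_prev      # approximates i''(t)
    if d1 >= 0.0:
        return "I" if d2 >= 0.0 else "II"   # rising: accelerating vs. slowing
    return "III" if d2 < 0.0 else "IV"      # falling: accelerating vs. slowing

# illustrative single-peaked curve (a gaussian pulse standing in for i(t))
curve = lambda t: math.exp(-((t - 50.0) ** 2) / 200.0)
labels = [phase(curve(t - 1), curve(t), curve(t + 1)) for t in (20, 45, 55, 80)]
```

sampling the pulse well before its left inflection point, between that point and the peak, just after the peak, and far into the tail yields phases i through iv in order, mirroring the sign table above.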
accuracy of the estimates was ascertained through comparison with the real data between th april and st july. these comparisons are given in figures , as red dashed curves. the observations are as follows. when the initial analysis was performed, china and south korea were in phase . our estimates and the real data for both countries are consistent. the estimate for france is not consistent with the real data post day . french authorities loosened quarantine restrictions on th may (day ); this event may be the reason for the fluctuations in the number of infectious cases. the estimates for germany and iran are consistent with the real data. however, germany is going through the third and the fourth phases faster than expected. besides, the active infectious cases post rd may (day ) show a continuous increase; the decrease in the active infectious cases up to this date was close to our estimates. the estimates for italy are consistent with the real data. on the other hand, the maximum of the infectious cases occurred slightly later than expected. in addition, italy is going through the third and the fourth phases more slowly than expected. the most recent data conform closely to our predictions. spain has not shared the data for daily discharged patients since the th of may (day ); therefore, the estimates are compared with the real data up to th may (day ). our estimates and the real data for spain are consistent. however, the maximum of the infectious cases occurred slightly later than expected, and, as in italy, spain is going through the third and the fourth phases more slowly than expected. our estimates and the real data for turkey are consistent. the maximum of the infectious cases occurred slightly later than expected. the decrease in the active infectious cases was close to our estimates up to a certain date; later, the number of infectious cases shows fluctuations. loosening quarantine restrictions on st june may be the reason for these fluctuations.
the united kingdom has not shared the data for daily discharged patients for a long time. we cannot compare our estimation with the real data. as for the usa, the spread of the epidemic has been beyond all predictions and it is still growing. the discrepancies between estimates and real data, and the failure to estimate parameters for the usa, can be explained as follows. the basic reproduction number ℜ is β/η, and β is a product of the virulence of the virus and the contact rate in the society. the contact rate depends crucially on lock-down measures. as these measures change, the course of the epidemic follows a different dynamic. the epidemic parameters of covid- for selected countries are estimated by using the data released by the state offices. these parameters include the basic reproduction number, mean duration of infectious period, the time at which the number of infectious cases reaches its maximum, the time at which the rate of increase in the number of infectious cases reaches its maximum, and the time at which the rate of decrease in the number of infectious cases reaches its maximum. for each country, the best susceptible-infected-removed (sir) models fitting cumulative case data are obtained. a wide variety of intervals with different scales of the parameters, basic reproduction number ℜ and infectious period t, are observed. more specifically, the basic reproduction number and mean duration of infectious period are estimated only for china, since the spread of the disease there is over. these parameters are found to be and , respectively. the fact that the median incubation and infection periods are ∼ days supports the observations for ℜ and t. however, the basic reproduction number and infectious period for other countries cannot be predicted from the normalized data, but the timing of key events can be estimated quite reliably.
to summarize, we show that the quantity that can be most robustly estimated from the normalized data is the timing of the highest rate of increase in the number of infections, i.e., the inflection point of the number of infected individuals. however, it should be pointed out that the analysis performed by the sir model for south korea provides dissimilar results, which can be explained by the unique age distribution of the confirmed cases. publicly available datasets were analyzed in this study. this data can be found here: http://www.worldometers.info/coronavirus/; http://epikhas.khas.edu.tr/. ab and ad performed the computations. ad collected the data. oe provided the medical insights. sa and ap-d performed the literature survey and wrote the paper. all authors contributed to the article and approved the submitted version.
references:
world health organization
real-time numerical forecast of global epidemic spreading: case study of
analysis and forecast of covid- spreading in china, italy and france
early dynamics of transmission and control of covid- : a mathematical modelling study
why is it difficult to accurately predict the covid- epidemic?
real-time forecasts of the covid- epidemic in china from
propagation analysis and prediction of the covid- epidemic
analysis of covid- in italy by dynamical modeling. ssrn
extended sir prediction of the epidemics trend of covid- in italy and compared with hunan
forecasting covid-
can we predict the occurrence of covid- cases? considerations using a simple model of growth
predicting turning point, duration and attack rate of covid- outbreaks in major western countries
a spatial model of covid- transmission in england and wales: early spread and peak timing. medrxiv
estimation of the final size of the covid- epidemic.
medrxiv
qualitative analyses of communicable disease models
on the uniqueness of epidemic models fitting a normalized curve of removed individuals
modelling the epidemic trend of the novel coronavirus outbreak in china. biorxiv
estimation of the transmission risk of the -ncov and its implication for public health interventions
novel coronavirus -ncov: early estimation of epidemiological parameters and epidemic predictions. medrxiv
preliminary estimation of the basic reproduction number of novel coronavirus
samanlioglu f, bilge ah, ergonul o. a susceptible-exposed-infected-removed (seir) model for the - a/h n epidemic in istanbul
on the time shift phenomena in epidemic models
determination of epidemic parameters from early phase fatality data: a case study of the a (h n ) pandemic in europe
the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. copyright © ahmetolan, bilge, demirci, peker-dobie and ergonul. this is an open-access article distributed under the terms of the creative commons attribution license (cc by). the use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. no use, distribution or reproduction is permitted which does not comply with these terms.
key: cord- - d j n authors: hong, hyokyoung g.; li, yi title: estimation of time-varying reproduction numbers underlying epidemiological processes: a new statistical tool for the covid- pandemic date: - - journal: plos one doi: . /journal.pone. sha: doc_id: cord_uid: d j n
the coronavirus pandemic has rapidly evolved into an unprecedented crisis. the susceptible-infectious-removed (sir) model and its variants have been used for modeling the pandemic.
however, time-independent parameters in the classical models may not capture the dynamic transmission and removal processes, governed by virus containment strategies taken at various phases of the epidemic. moreover, few models account for possible inaccuracies of the reported cases. we propose a poisson model with time-dependent transmission and removal rates to account for possible random errors in reporting and estimate a time-dependent disease reproduction number, which may reflect the effectiveness of virus control strategies. we apply our method to study the pandemic in several severely impacted countries, and analyze and forecast the evolving spread of the coronavirus. we have developed an interactive web application to facilitate readers' use of our method. a recent work [ ] demonstrated that R_0 is likely to vary "due to the impact of the performed intervention strategies and behavioral changes in the population". the merits of our work are summarized as follows. first, unlike the deterministic ode-based sir models, our method does not require transmission and removal rates to be known, but estimates them using the data. second, we allow these rates to be time-varying. some time-varying sir approaches [ ] directly integrate into the model the information on when governments enforced, for example, quarantine, social-distancing, compulsory mask-wearing and city lockdowns. our method differs by computing a time-varying R_0, which gauges the status of coronavirus containment and assesses the effectiveness of virus control strategies. third, our poisson model accounts for possible random errors in reporting, and quantifies the uncertainty of the predicted numbers of susceptible, infectious and removed. finally, we apply our method to analyze the data collected from the aforementioned github time-series data repository.
we have created an interactive web application (https://younghhk.shinyapps.io/tvsirforcovid /) to facilitate users' application of the proposed method. we introduce a poisson model with time-varying transmission and removal rates, denoted by β(t) and γ(t). consider a population with N individuals, and denote by S(t), I(t), R(t) the true but unknown numbers of susceptible, infectious and removed, respectively, at time t, and by s(t) = S(t)/N, i(t) = I(t)/N, r(t) = R(t)/N the fractions of these compartments. the following ordinary differential equations (ode) describe the change rates of s(t), i(t) and r(t):

ds(t)/dt = −β(t)s(t)i(t),   ( )
di(t)/dt = β(t)s(t)i(t) − γ(t)i(t),   ( )
dr(t)/dt = γ(t)i(t),   ( )

with an initial condition: i(0) = i_0 and r(0) = r_0, where i_0 > 0 in order to let the epidemic develop [ ] . here, β(t) > 0 is the time-varying transmission rate of an infection at time t, which is the number of infectious contacts that result in infections per unit time, and γ(t) > 0 is the time-varying removal rate at t, at which infectious subjects are removed from being infectious due to death or recovery [ ] . moreover, γ^{−1}(t) can be interpreted as the infectious duration of an infection caught at time t [ ] . from ( )-( ), we derive an important quantity, the time-dependent reproduction number

R_0(t) = β(t)/γ(t).

to see this, dividing ( ) by ( ) leads to

(di/dr)(t) = R_0(t)s(t) − 1,

where (di/dr)(t) is the ratio of the change rate of i(t) to that of r(t). therefore, compared to its time-independent counterpart, R_0(t) is an instantaneous reproduction number and provides a real-time picture of an outbreak. for example, at the onset of the outbreak and in the absence of any containment actions, we may see a rapid ramp-up of cases compared to those removed, leading to a large (di/dr)(t) in ( ), and hence a large R_0(t). with the implemented policies for disease mitigation, we will see a drastically decreasing (di/dr)(t) and, therefore, declining R_0(t) over time.
the turning point is the time t at which R_0(t) = 1; when the outbreak is controlled, (di/dr)(t) < 0. under the fixed population size assumption, i.e., s(t) + i(t) + r(t) = 1, we only need to study i(t) and r(t), and re-express ( )-( ) as

di(t)/dt = β(t)i(t){1 − i(t) − r(t)} − γ(t)i(t),
dr(t)/dt = γ(t)i(t),

with the same initial condition. as the numbers of cases and removed are reported on a daily basis, t is measured in days, e.g. t = 1, . . ., T. replacing derivatives in ( ) with finite differences, we can consider a discrete version of ( ):

i(t + 1) − i(t) = β(t)i(t){1 − i(t) − r(t)} − γ(t)i(t),
r(t + 1) − r(t) = γ(t)i(t),

where β(t) and γ(t) are positive functions of t. we set i(0) = i_0 and r(0) = r_0, with t = 0 being the starting date. model ( ) admits a recursive way to compute i(t) and r(t):

i(t + 1) = {1 + β(t) − γ(t)}i(t) − β(t)i(t){i(t) + r(t)},
r(t + 1) = r(t) + γ(t)i(t),

for t = 0, . . ., T − 1. the first equation of ( ) implies that β(t) < γ(t), or R_0(t) = β(t)γ^{−1}(t) < 1, leads to i(t + 1) < i(t), i.e., the number of infectious cases drops, meaning the spread of the virus is controlled; otherwise, the number of infectious cases will keep increasing. to fit the model and estimate the time-dependent parameters, we can use nonparametric techniques, such as splines [ ] [ ] [ ] [ ] [ ] [ ] , local polynomial regression [ ] and the reproducing kernel hilbert space method [ ] . in particular, we consider a cubic b-spline approximation [ ] . denote by B(t) = {B_1(t), . . ., B_q(t)}^T the q cubic b-spline basis functions over [0, T] associated with the knots 0 = w_1 < w_2 < . . . < w_{q−3} < w_{q−2} = T. for added flexibility, we allow the number of knots to differ between β(t) and γ(t) and specify

log β(t) = Σ_j b_j B_j(t),   log γ(t) = Σ_j g_j B_j(t).

when b_1 = · · · = b_{q_1} and g_1 = · · · = g_{q_2}, the model reduces to a constant sir model [ ] . we use cross-validation to choose q_1 and q_2 in our numerical experiments. denote by β = (b_1, . . ., b_{q_1}) and γ = (g_1, . .
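the discrete recursion above can be iterated directly once β(t) and γ(t) are given; a minimal sketch (β and γ are assumed to be supplied as plain functions of the day index, and the function name is illustrative):

```python
import numpy as np

def simulate_tv_sir(beta, gamma, i0, r0, T):
    """Iterate the discrete time-varying SIR recursion
        i(t+1) = {1 + beta(t) - gamma(t)} i(t) - beta(t) i(t) {i(t) + r(t)}
        r(t+1) = r(t) + gamma(t) i(t)
    where i(t), r(t) are fractions of the population and beta, gamma are
    callables returning the transmission and removal rates on day t."""
    i = np.empty(T + 1)
    r = np.empty(T + 1)
    i[0], r[0] = i0, r0
    for t in range(T):
        b, g = beta(t), gamma(t)
        i[t + 1] = (1 + b - g) * i[t] - b * i[t] * (i[t] + r[t])
        r[t + 1] = r[t] + g * i[t]
    return i, r
```

with β(t) < γ(t) the infectious fraction shrinks day by day, and with β(t) > γ(t) it grows, matching the R_0(t) threshold argument in the text.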
., g_{q_2}) the unknown parameters, by Z_I(t) and Z_R(t) the reported numbers of infectious and removed, respectively, and by z_I(t) = Z_I(t)/N and z_R(t) = Z_R(t)/N the reported proportions. also, denote by I(t) and R(t) the true numbers of infectious and removed, respectively, at time t. we propose a poisson model to link Z_I(t) and Z_R(t) to I(t) and R(t) as follows:

Z_I(t) | I(t) ~ Poisson(I(t)),   Z_R(t) | R(t) ~ Poisson(R(t)).

we also assume that, given I(t) and R(t), the observed daily numbers {Z_I(t), Z_R(t)} are independent across t = 1, . . ., T, meaning the random reporting errors are "white" noise. we note that ( ) is directly based on the "true" numbers of infectious cases and removed cases derived from the discrete sir model ( ). this differs from the markov process approach, which is based on the past observations. with ( ), ( ) and ( ), R(t) and I(t) are functions of β and γ. given the data (Z_I(t), Z_R(t)), t = 1, . . ., T, we obtain (β̂, γ̂), the estimates of (β, γ), by maximizing the likelihood or, equivalently, maximizing the log likelihood function

ℓ(β, γ) = Σ_t { Z_I(t) log I(t) − I(t) + Z_R(t) log R(t) − R(t) } + C,

where C is a constant free of β and γ. see the s appendix for additional details of optimization. we then estimate the variance-covariance matrix of (β̂, γ̂) by inverting the second derivative of −ℓ(β, γ) evaluated at (β̂, γ̂). finally, for t = 1, . . ., T, we estimate I(t) and R(t) by Î(t) = N î(t) and R̂(t) = N r̂(t), where î(t) and r̂(t) are obtained from ( ) with all unknown quantities replaced by their estimates; estimate β(t) and γ(t) by β̂(t) and γ̂(t), obtained by using ( ) with (β, γ) replaced by (β̂, γ̂); and estimate R_0(t) by R̂_0(t) = β̂(t)/γ̂(t). estimation: let N be the size of the population of a given country. the date when the first case was reported is set to be the starting date with t = 0, i_0 = Z_I(0)/N and r_0 = Z_R(0)/N. the observed data are {Z_I(t), Z_R(t), t = 1, . . ., T}, obtained from the github data repository website mentioned in the introduction. we maximize ( ) to obtain β̂ = (b̂_1, . . ., b̂_{q_1}) and γ̂ = (ĝ_1, . . ., ĝ_{q_2}).
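the poisson log-likelihood, up to its additive constant, can be written compactly once the model-implied counts are available; a sketch (the function name is illustrative, and the model counts I(t), R(t) are passed in as arrays rather than recomputed from β and γ):

```python
import numpy as np

def poisson_loglik(z_i, z_r, I, R):
    """Poisson log-likelihood (up to an additive constant) of the observed
    daily counts Z_I(t), Z_R(t) given model-implied true counts I(t), R(t):
        l = sum_t [ Z_I(t) log I(t) - I(t) + Z_R(t) log R(t) - R(t) ]."""
    z_i, z_r, I, R = map(np.asarray, (z_i, z_r, I, R))
    return float(np.sum(z_i * np.log(I) - I + z_r * np.log(R) - R))
```

in the paper's setting, I(t) and R(t) are functions of the spline coefficients (β, γ), so this quantity would be maximized over those coefficients by a numerical optimizer.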
the optimal q_1 and q_2 are obtained via cross-validation. since the first case of covid- was detected in china, it quickly spread to nearly every part of the world [ ] . covid- , conjectured to be more contagious than the previous sars and h n [ ] , has put great strain on healthcare systems worldwide, especially among the severely affected countries [ ] . we apply our method to assess the epidemiological processes of covid- in some severely impacted countries. the country-specific time-series data of confirmed, recovered, and death cases were obtained from a github data repository website (https://github.com/ulklc/covid -timeseries). this site collects information from various sources on a daily basis at gmt : , converts the data to the csv format, and conducts data normalization and harmonization if inconsistencies are found. in particular, the current population size of each country, N, came from the website of worldometer. our analyses covered the periods between the date of the first reported coronavirus case in each nation and june , . in the beginning of the outbreak, assessment of i_0 and r_0 was problematic, as infectious but asymptomatic cases tended to go undetected due to lack of awareness and testing. to investigate how our method depends on the correct specification of the initial values r_0 and i_0, we conducted monte carlo simulations. as a comparison, we also studied the performance of the deterministic sir model in the same settings. fig shows that, when the initial value i_0 was mis-specified to be times the truth, the curves of i(t) and r(t) obtained by the deterministic sir model ( ) were considerably biased. on the other hand, our proposed model ( ), by accounting for the randomness of the observed data, was robust toward the mis-specification of i_0 and r_0: the estimates of r(t) and i(t) had negligible biases even with mis-specified initial values.
in an omitted analysis, we mis-specified i_0 and r_0 to be only twice the truth, and obtained similar results. our numerical experiments also suggested that using the time series starting from the date when both cases and removed were reported may generate more reasonable estimates. using the cubic b-splines ( ), we estimated the time-dependent transmission rate β(t) and removal rate γ(t), based on which we further estimated R_0(t), I(t) and R(t). to choose the optimal number of knots for each country when implementing the spline approach, we used -fold cross-validation, minimizing the combined mean squared error for the estimated infectious and removed cases. fig shows sharp variations in transmission rates and removal rates across different time periods, indicating the time-varying nature of these rates. the estimated I(t) and R(t) overlapped well with the observed numbers of infectious and removed cases, indicating the reasonableness of the method. the pointwise % confidence intervals (in yellow) represent the uncertainty of the estimates, which may be due to error in reporting. fig presents the estimated time-varying reproduction number, β̂(t)/γ̂(t), for several countries. the curves capture the evolving trends of the epidemic for each country. in the us, though the first confirmed case was reported on january , , lack of immediate action in the early stage let the epidemic spread widely. as a result, the us saw soaring infectious cases, and R_0(t) reached its peak around mid-march. from mid-march to early april, the us tightened its virus control policy by suspending foreign travel and closing borders, and the federal government and most states issued mandatory or advisory stay-home orders, which seemed to have substantially contained the virus. the high reproduction numbers of china, italy, and sweden at the onset of the pandemic imply that the spread of the infectious disease was not well controlled in its early phases.
with the extremely stringent mitigation policies, such as city lockdown and mandatory mask-wearing, implemented at the end of january, china was reported to have brought its epidemic under control, with a quickly dropping R_0(t) in february. this indicates that china might have contained the epidemic, with more people removed from infectious status than those who became infectious. sweden is among the few countries that imposed more relaxed measures to control coronavirus and advocated herd immunity. the swedish approach has initiated much debate. while some criticized that this may endanger the general population in a reckless way, some felt this might terminate the pandemic more effectively in the absence of vaccines [ ] . fig demonstrates that sweden has a large reproduction number, which however keeps decreasing. the "big v" shape of the reproduction number around may might be due to reporting errors or lags. our investigation found that the reported number of infectious cases in that period suddenly dropped and then quickly rose back, which was unusual. around february , a surge in south korea was linked to a massive cluster of more than , cases [ ] . the outbreak was clearly depicted in the time-varying R_0(t) curve. since then, south korea appeared to have slowed its epidemic, likely due to expansive testing programs and extensive efforts to trace and isolate patients and their contacts [ ] . fig: estimated I(t), R(t), β(t), γ(t), and R_0(t) for the us (left) and china (right), based on the data up to june , ; the blue dots and the red dashed curves represent the observed data and the model-based predictions, respectively, with % confidence intervals. more broadly, fig categorizes countries into two groups. one group features the countries which have contained coronavirus. countries such as china and south korea took aggressive actions after the outbreak and presented sharper downward slopes.
some european countries such as italy and spain, and mideastern countries such as iran, which were hit later than the east asian countries, share a similar pattern, though with much flatter slopes. on the other hand, the us, brazil, and sweden are still struggling to contain the virus, with their R_0(t) curves hovering above 1. we also caution that, among the countries whose R_0(t) dropped below 1, the curves of the reproduction numbers are beginning to uptick, possibly due to resumed economic activities. we have developed a web application (https://younghhk.shinyapps.io/tvsirforcovid /) to facilitate users' application of the proposed method to compute the time-varying reproduction number, and to estimate and predict the daily numbers of active cases and removed cases for the presented countries and other countries; see fig for an illustration. our code was written in r [ ] , using the bs function in the splines package for cubic b-spline approximation, the nlm function in the stats package for nonlinear minimization, and the jacobian function in the numderiv package for computation of gradients and hessian matrices. graphs were made using the ggplot package. our code can be found on the aforementioned shiny website. the rampaging pandemic of covid- has called for developing proper computational and statistical tools to understand the trend of the spread of the disease and evaluate the efficacy of mitigation measures [ ] [ ] [ ] [ ] . we propose a poisson model with time-dependent transmission and removal rates. our model accommodates possible random errors and estimates a time-dependent disease reproduction number, R_0(t), which can serve as a metric for timely evaluation of the effects of health policies. there have been substantial issues, such as biases and lags, in reporting infectious cases, recoveries, and deaths, especially at the early stage of the outbreak.
as opposed to the deterministic sir models that heavily rely on accurate reporting of initial infectious and removed cases, our model is more robust to mis-specifications of such initial conditions. applications of our method to study the epidemics in selected countries illustrate the results of the virus containment policies implemented in these countries, and may serve as epidemiological benchmarks for future preventive measures. several methodological questions need to be addressed. first, we analyzed each country separately, without considering the traffic flows among these countries. we will develop a joint model for the global epidemic, which accounts for the geographic locations of and the connectivity among the countries. second, incorporating the timing of public health interventions, such as the shelter-in-place order, into the model might be interesting. however, we opted not to follow this approach, as no such information exists for the majority of countries. on the other hand, the impact of the interventions, or the change point, can be embedded into our nonparametric time-dependent estimates. third, the validity of the results of statistical models eventually hinges on data transparency and accuracy. for example, the results of chinazzi et al. [ ] suggested that in china only one in four cases was detected and confirmed. also, asymptomatic cases might have gone undetected in many countries. all of these might have led to underestimation of the actual number of cases. moreover, the collected data could be biased toward patients with severe infection and with insurance, as these patients were more likely to seek care or get tested. more in-depth research is warranted to address the issue of selection bias. finally, our present work is within the sir framework, where removed individuals include recoveries and deaths, who hypothetically are unlikely to infect others.
although this makes the model simpler and widely adopted, the interpretation of the γ parameter is not straightforward. our subsequent work is to develop a susceptible-infectious-recovered-deceased (sird) model, in which the number of deaths and the number of recovered are separately considered. we will report this elsewhere. containment of covid- requires the concerted effort of health care workers, health policy makers as well as citizens. measures, e.g. self-quarantine, social distancing, and shelter in place, have been executed at various phases by each country to prevent community transmission. timely and effective assessment of these actions constitutes a critical component of the effort. sir models have been widely used to model this pandemic. however, constant transmission and removal rates may not capture the timely influences of these policies. we propose a time-varying sir poisson model to assess the dynamic transmission patterns of covid- . with the virus containment measures taken at various time points, R_0 may vary substantially over time. our model provides a systematic and daily updatable tool to evaluate the immediate outcomes of these actions. it is likely that the pandemic is ending and many countries are now shifting gear to reopen the economy, while preparing to battle a second wave of virus attack [ , ] . our tool may shed light on and aid the implementation of future containment strategies.
references:
coronaviruses: an overview of their replication and pathogenesis
bats are natural reservoirs of sars-like coronaviruses
discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus
human coronavirus and severe acute respiratory infection in southern brazil.
pathogens and global health
evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of coronavirus disease
johns hopkins coronavirus resource center
real-time epidemic forecasting for pandemic influenza
mathematical models of infectious disease transmission
modelling transmission and control of the covid- pandemic in australia
challenges in control of covid- : short doubling time and long delay to effect of interventions
individual vaccination as nash equilibrium in a sir model with application to the - influenza a (h n ) epidemic in france
estimating epidemic parameters: application to h n pandemic data
bayesian estimation of the dynamics of pandemic (h n ) influenza transmission in queensland: a space-time sir-based model. environmental research
modeling super-spreading events for infectious diseases: case study sars
deterministic sir (susceptible-infected-removed) models applied to varicella outbreaks
an introduction to compartmental modeling for the budding infectious disease modeler
risk analysis foundations, models, and methods
a contribution to the mathematical theory of epidemics
statistics based predictions of coronavirus -ncov spreading in mainland china. medrxiv
a time delay dynamical model for outbreak of -ncov and the parameter identification
epidemic analysis of covid- in china by dynamical modeling
preliminary prediction of the basic reproduction number of the wuhan novel coronavirus -ncov
effective containment explains sub-exponential growth in confirmed cases of recent covid- outbreak in mainland china
lessons from the history of quarantine, from plague to influenza a. emerging infectious diseases
a time-dependent sir model for covid- with undetectable infected persons
sir model with time dependent infectivity parameter: approximating the epidemic attractor and the importance of the initial phase
an epidemiological forecast model and software assessing interventions on covid- epidemic in china.
medrxiv
modeling count data
methods for estimating disease transmission rates: evaluating the precision of poisson regression and two novel methods
fitting outbreak models to data from many small norovirus outbreaks
multi-species sir models from a dynamical bayesian perspective
the estimation of the basic reproduction number for infectious diseases
mathematical epidemiology of infectious diseases: model building
transmission potential of smallpox: estimates based on detailed data from an outbreak
measurability of the epidemic reproduction number in data-driven contact networks
a time-dependent sir model for covid- with undetectable infected persons
notes on r
a practical guide to splines
parameter estimation for differential equations: a generalized smoothing approach
modelling transcriptional regulation using gaussian processes
linear latent force models using gaussian processes
latent force models
mechanistic hierarchical gaussian processes
empirical-bias bandwidths for local polynomial nonparametric regression and density estimation
new reproducing kernel functions. mathematical problems in engineering
a review of spline function procedures in r. bmc medical research methodology
a note on the jackknife, the bootstrap and the delta method estimators of bias and variance
covid- , chronicle of an expected pandemic
covid- : how doctors and healthcare systems are tackling coronavirus worldwide
'closing borders is ridiculous': the epidemiologist behind sweden's controversial coronavirus strategy
why a south korean church was the perfect petri dish for coronavirus
coronavirus cases have dropped sharply in south korea
r: a language and environment for statistical computing
current status of global research on novel coronavirus disease (covid- ): a bibliometric analysis and knowledge mapping.
available at ssrn
investigating the cases of novel coronavirus disease (covid- ) in china using dynamic statistical techniques
the impact of social distancing and epicenter lockdown on the covid- epidemic in mainland china: a data-driven seiqr model study. medrxiv
covid- italian and europe epidemic evolution: a seir model with lockdown-dependent transmission rate based on chinese data
the effect of travel restrictions on the spread of the novel coronavirus (covid- ) outbreak
as china's virus cases reach zero, experts warn of second wave
asian nations face second wave of imported cases
key: cord- -dvgqouk authors: anzum, r.; islam, m. z. title: mathematical modeling of coronavirus reproduction rate with policy and behavioral effects date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: dvgqouk
in this paper a modified mathematical model based on the sir model is used, which can predict the spreading of the coronavirus disease (covid- ) and its effects on people in the days ahead. this model takes into account all the death, infected and recovered characteristics of this disease. to determine the extent of the risk posed by this novel coronavirus, the transmission rate (R_0) is utilized for a time period from the beginning of the spread of the virus. in particular, it includes a novel policy to capture the R_0 response to the virus spreading over time. the model estimates the vulnerability of the pandemic with a prediction of new cases by estimating a time-varying R_0 to capture changes in the behavior of the sir model implied by new policies taken at different times and different locations of the world. this modified sir model with different values of R_0 can be applied to different country scenarios using the real-time data reports provided by the authorities during this pandemic. the effective evaluation of R_0 can forecast the necessity of lockdown as well as reopening the economy.
this is a new virus and the world is facing a new situation [ ] , as no vaccine has yet been made to combat this virus. in this situation, on january , the world health organization (who) declared it to be a public health emergency of global concern [ ] . as of may , the disease was confirmed in more than cases reported globally, and death cases were reported. the world health organisation (who) declared it a pandemic situation caused by coronavirus spreading [ ] . in this paper, the literature review consists of some relevant mathematical models which tried to describe the dynamics of the evolution of covid- . some of the phenomenological models tried to generate and assess short-term forecasts using the cumulative number of reported cases. the sir model is a traditional one to predict the vulnerability of a pandemic and can also be used to predict the future scenario of coronavirus cases. however, this model is modified [ ] , [ ] by including other variables to calibrate the possible infection rate over time. coronavirus transmission (how quickly the disease spreads) is indicated by its reproduction number (R_0, pronounced r-nought or r-zero), which indicates the actual number of people to whom the virus can be transmitted by a single infected case. who predicted (on jan. ) that R_0 would be between . and . . other research measured R_0 with various values somewhere between . and . , and . . the long-used value of R_0 is . for flu and . for sars [ ] [ ] . in this research, firstly the sir model is simulated taking a constant value of R_0 = . to observe the response. since reducing face-to-face contact among people and staying home in lockdown can help reduce the further infection rate, R_0 is then taken as time-varying rather than a fixed constant, to observe the overall scenario of coronavirus spreading.
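the constant-R_0 versus time-varying-R_0 comparison described here can be sketched with a simple forward-euler sir simulation; a minimal sketch, where γ = 0.1, R_0 = 2.5, and a lockdown dropping R_0 to 0.8 at day 40 are illustrative values chosen for this example, not the paper's estimates:

```python
import numpy as np

def sir(r0_of_t, gamma, i0, days):
    """Forward-Euler SIR in fractions s, i, r with a (possibly time-varying)
    reproduction number: beta(t) = R0(t) * gamma. Returns the daily
    infectious fraction i(t)."""
    s, i, r = 1.0 - i0, i0, 0.0
    out = []
    for t in range(days):
        beta = r0_of_t(t) * gamma
        ds = -beta * s * i        # newly infected leave s
        dr = gamma * i            # removals leave i
        s, i, r = s + ds, i - ds - dr, r + dr
        out.append(i)
    return np.array(out)

gamma = 0.1                                # removal rate: ~10-day infectious period
const = sir(lambda t: 2.5, gamma, 1e-4, 200)                       # no intervention
locked = sir(lambda t: 2.5 if t < 40 else 0.8, gamma, 1e-4, 200)   # lockdown at day 40
```

dropping R_0 below 1 partway through the outbreak caps the infectious peak well below the no-intervention curve, which is the qualitative effect the time-varying R_0 is meant to capture.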
the organization of the paper is as follows: section , the literature review, briefly surveys mathematical models related to coronavirus spreading; section contains the modeling of coronavirus spreading; section presents the results and discussion; and section concludes the paper. the sir epidemic model used in this work is one of the simplest compartmental models, first used by kermack and mckendrick ( ). a compartmental model is a mathematical model of infectious diseases in which the population is separated into compartments, for example s, i, or r (susceptible, infectious, or recovered). many works during the coronavirus pandemic have utilized compartmental models [ ] , [ ] , and the imperial college covid- response team ( ) provides a useful overview of this classic model, showing that such models can be applied to understand the current health hazards. much interest has been generated among economists in identifying the impact of the current pandemic on economic sectors by exploring compartmental models alongside standard economic models using econometric techniques [ ] , [ ] . this preprint is made available under a cc-by-nd . international license; the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. this version was posted june , . https://doi.org/ . / . . . doi: medrxiv preprint. it has been argued by economists that many of the parameters controlling movement among compartments are not structural but instead depend on individual decisions and policies. for example, according to eichenbaum et al. [ ] and farboodi, jarosch and shimer [ ] , the number of new infections is a function of the endogenous labor supply and consumption choices of individuals, which are determined by the rate of contact; in standard decision-theory models the rate of contact is amenable to policy.
similarly, the death rates and recovery rates are not just clinical parameters. they can also be functions of policy decisions: for example, expanding emergency hospital capacity may increase the recovery rate and decrease death rates. the fatality ratio, too, is a complex function, because it depends on many clinical factors. therefore the selection-into-disease mechanisms are themselves partly the product of endogenous choices [ ] . concerning the identification problems of compartmental models, the economists atkeson [ ] and korolev [ ] have found that these models lack identified sets of parameters able to fit the observed data efficiently before the long-run consequences unfold. some researchers, linton [ ] and kucinskas [ ] , used time-series models in the econometric tradition rather than compartmental models. many economists, however, are pushing the study of compartmental models in a multitude of dimensions. acemoglu et al. [ ] and alvarez et al. [ ] characterize the optimal lockdown policy for a planner who wants to control the fatalities of a pandemic while minimizing the output cost of the lockdown. berger et al. [ ] examine the role of testing and case-dependent quarantines. bethune and korinek [ ] estimate the infection externalities associated with covid- . bodenstein et al. [ ] examine a compartmental model combined with a multi-sector dynamic general equilibrium model. other researchers, such as garriga, manuelli and sanghi [ ] , hornstein [ ] , and karin et al. [ ] , study a variety of containment policies. toda [ ] estimated a sird model to explore the optimal mitigation policy that controls the timing and intensity of social distancing. flavio toxvaerd [ ] also developed a simple economic model emphasizing endogenous social distancing.
furthermore, many economists have commented that coronavirus transmission cannot be a biologically induced constant; rather, it varies with human behavior, and changes in human behavior can be predicted in response to changing social policies. another form, a multi-risk sir model with an assumed targeted lockdown period, is provided by the economists daron acemoglu et al. [ ] ; it is an epidemiological model with economic incentives. the authors of that paper argue that herd immunity might be reached at a much lower threshold if super-spreaders, such as people in hospitals, emergency service providers, and bus drivers, are immunized first. therefore, in this paper we have tried to put emphasis on these ideas by allowing the infection rate to change over time as social distancing is imposed among people. moreover, we focus on the reproduction rate r , which is not a constant but varies with time based on demographic and policy heterogeneity. according to the conventional sir model, for a constant population of n people, each person may be in one of five states with respect to time; a susceptible person can be infected by the disease when coming into contact with an infectious person. the sir model is used to predict the vulnerability of any pandemic, but it may not be applicable to coronavirus cases since it assumes a constant reproduction rate. the virus would seem to diminish only when all affected people have recovered, which is practically not possible. people affected by coronavirus are highly contagious to others who come into contact with them, and thus the spread of coronavirus infection increases day by day.
therefore, a mathematical model of its spread can help predict its vulnerability and indicate when to take efficient measures to lower the contact rate. the modified behavioral sir model is presented by [ ] : in the sir model, β is a constant, and hence a constant reproduction rate r = is used. the disease initially grows exponentially; the growth is eventually confined by the declining number of susceptible people in the total population. moreover, each infected person is considered recovered after a certain time, but in the actual scenario this does not happen, because of deaths from coronavirus infection. in the actual sir model a lowered β was used. however, by lowering the reproduction rate r , the further spread of coronavirus can be suppressed. after the lockdown, most people reduced their contacts, which proportionally decreases the growth in the number of infectious people. since β is a rate and cannot be negative, a logarithmic function can be used. however, in the absence of sufficient continuous testing, the number of infectious people at any time cannot be observed directly; therefore the expanded model can be expressed utilizing the current death rate. the logarithmic function is used to capture the early decline, with super-spreading activities considered eliminated. let us assume the total population is million.
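the behavioral idea above — a fixed transmission rate versus an r that declines over time under lockdown — can be sketched numerically with a simple euler integration of the sir equations. the population size, γ, and the r schedule below are illustrative assumptions, not the paper's calibration.

```python
# Minimal SIR sketch with a (possibly time-varying) reproduction rate R0.
# All parameter values are illustrative assumptions.

def simulate_sir(n=1_000_000, i0=1, gamma=0.1, r0_of_t=lambda t: 2.4, days=365):
    """Euler-integrate S-I-R day by day; beta(t) = R0(t) * gamma."""
    s, i, r = n - i0, float(i0), 0.0
    history = []
    for t in range(days):
        beta = r0_of_t(t) * gamma       # transmission rate implied by R0
        new_inf = beta * s * i / n      # new infections this day
        new_rec = gamma * i             # recoveries/removals this day
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        history.append((t, s, i, r))
    return history

# constant R0 = 2.4 vs. R0 decaying toward 1 after a "lockdown" on day 30
fixed = simulate_sir()
declining = simulate_sir(
    r0_of_t=lambda t: 2.4 if t < 30 else max(1.0, 2.4 * 0.97 ** (t - 30)))

peak_fixed = max(i for _, _, i, _ in fixed)
peak_declining = max(i for _, _, i, _ in declining)
print(peak_fixed > peak_declining)  # lowering R0 over time flattens the peak
```

with the decaying schedule the epidemic still grows while r exceeds one, but the infected peak is orders of magnitude smaller than in the constant-r run, matching the qualitative behavior described in the text.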
in this model we used the same parameter values as the researchers chad and jesús [ ] . hereby, γ = . , corresponding to days of infectiousness on average; if the death rate is %, the calibration of α is computed at an infection rate of . %, i.e., deaths in a population of million. the standard sir model simulation with the above-mentioned parameter values is as follows: in the graph, the first day starts with only one infected person; the number of infections then increases exponentially with the growing number of newly infected people, peaking at almost half the population. after days of the infection spreading, herd immunity begins as the maximum number of infected people is reached. the number of sick people who are resolving peaks a few days later. the pandemic period lasts over months and is expected to subside after that, with everyone in the total million population infected, resulting in . % mortality, or deaths. in the behavioral modified sir model simulation, by contrast, the system reacts by reducing the transmission rate, meaning that r varies as a function of time. the graph shows that about people will be infected, rather than the total million population.
the pandemic gets going exponentially (blue line); when the infection rate reaches per million, there is a noticeably acute reduction in the further reproduction rate, indicated by the black dashed line. moreover, the transmission rate r asymptotes in this model, with a declining infection rate as well as fewer death cases per day. in this process, it may take a long period to achieve herd immunity if all the initiatives taken to lessen contacts are stopped. assuming the reproduction rate asymptotes to r = , the following graph shows: this simulation graph shows an aggressive response: α is increased by a factor of , shown by the dashed red line in the first graph. from the vertical scale of this graph it is clear that fewer people are vulnerable to the infection, implying that the overall rate of infection is comparatively low. being stricter about social gatherings and possible contacts does not change the qualitative picture, since the reproduction or transmission rate still asymptotes, with r declining to one; however, in such a scenario we get a low infection rate on a daily basis. after the number of infected people declines over time, letting r vary over time and increasing α by a factor of for days, the following graph shows: after a few weeks, deaths from the infection are found to be steady over time, and the situation is then considered under control.
after letting people go back to normal life with no restrictions, the situation may return with more severe effects. the coronavirus pandemic might then become uncontrolled, with high deaths in the first wave, whereas taking possible measures, together with people's consciousness in avoiding contact, can reduce the infection rate and death rate. furthermore, after the death rate is lowered, people ease up, and a second wave of infections appears, and so forth. therefore, the reproduction rate r cannot be constant, as it varies with the behavioral changes of people and the policies taken to control the pandemic. covid- is a current global issue which has spread to almost every country of the world and caused restrictions on the free movement of people, resulting in massive economic loss worldwide. the transmission rate (r ), the number of newly infected individuals derived from a single case, is the factor that calibrates the reproduction rate of coronavirus in the sir model. in the traditional sir model, taking r as a constant cannot predict the actual scenario of coronavirus spreading: the value of r can differ for different places and time periods, and r changes with the behavioral changes of people under the policies adopted by the respective authorities. therefore, in this research r is not considered a constant; rather, it is used as a time-varying function. by taking possible measures to reduce social contact, r can be minimized over time, causing fewer deaths and informing forecasts for reopening the economy. population biology of infectious diseases: part mathematical formulation and validation of the be-fast model for classical swine fever virus spread between and within farms a novel spatial and stochastic model to evaluate the within-and between-farm transmission of classical swine fever virus. i.
general concepts and description of the model the global dynamics for an age-structured tuberculosis transmission model with the exponential progression rate a novel coronavirus outbreak of global health concern severe acute respiratory syndrome-related coronavirus: the species and its viruses -a statement of the coronavirus study group. biorxiv naming the coronavirus disease (covid- ) and the virus that causes it unique epidemiological and clinical features of the emerging novel coronavirus pneumonia (covid- ) implicate special control measures statement-on-the-second-meeting-of-the-international-health-regulations director-general's opening remarks at the media briefing on covid- - estimating and simulating a sird model of covid- for many countries, states, and cities. no. w early dynamics of transmission and control of covid- : a mathematical modelling study. the lancet infectious diseases the mathematics of infectious diseases data gaps and the policy response to the novel coronavirus policy implications of models of the spread of coronavirus: perspectives and opportunities for economists the macroeconomics of epidemics internal and external effects of social distancing in a pandemic what does the case fatality ratio really measure? how deadly is covid- ? understanding the difficulties with estimation of its fatality rate identification and estimation of the seird epidemic model for covid- when will the covid- pandemic peak? tracking r of covid- a multi-risk sir model with optimally targeted lockdown a multi-risk sir model with optimally targeted lockdown an seir infectious disease model with testing and conditional quarantine covid- infection externalities: trading off lives vs.
livelihoods social distancing and supply disruptions in a pandemic optimal management of an epidemic: an application to covid- social distancing, quarantine, contact tracing, and testing: implications of an augmented seir-model adaptive cyclic exit strategies from lockdown to suppress covid- and allow economic activity susceptible-infected-recovered (sir) dynamics of covid- and economic impact cambridge working papers in economics individual variation in susceptibility or exposure to sars-cov- lowers the herd immunity threshold estimating and simulating a sird model of covid- for many countries, states, and cities. no. w key: cord- -kvh qt authors: kumar, sunny title: predication of pandemic covid- situation in maharashtra, india date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: kvh qt presently, the world is infected by the covid virus, which has created a public health emergency. to control the spread of the virus, we have to prepare precautions and forward-looking calculations of infection spread. the coronavirus affects the population of the world, including india. here, we study the virus spreading rate in the state of maharashtra, which is part of india. we predict the number of infected people with the sir model, one of the most effective models for predicting the spreading rate of a virus, and we validate the model against the current spreading rate. this study will help to stop the epidemic spread because it is in an early stage in the maharashtra region. a virus [ ] is micro/nanometer in size and reproduces inside the living cells of an organism. presently, coronavirus has created a health emergency for the world population and become a pandemic [ ] , [ ] , [ ] . initially, this virus was transferred from bats to humans [ ] . further, this virus has shown human-to-human transmission [ ] . the covid virus spreads between people via respiratory droplets and contact [ ] .
previously, several mathematical models have been reported [ ] . the sir model is a simple and effective model which can give predictions for different pandemic situations [ ] . here, we study the spreading effects of covid in the maharashtra state, one of the states of india. this state has a total population of . crore, which is ~ . % of the overall population of india. the first case in maharashtra was observed march [ ] , a couple who had returned from dubai. in india, the first case was observed on january [ ] . in march, the world health organization (who) announced that covid is a pandemic, the term used by disease experts when epidemics are growing in multiple countries and continents at the same time [ ] . in the current situation, after march, coronavirus cases have increased rapidly in india, and several cases have also been observed in maharashtra. this study explains the epidemic growth for this state using the sir model, which helps in controlling this epidemic. this preprint is made available under a cc-by-nc-nd . international license; the copyright holder is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the ratio of the infection rate to the recovery rate is known as the reproduction ratio (r). in the results and discussion section, the present rate of virus infection is studied, the infected population is predicted by the sir model, and the reproduction ratio parameter is examined in relation to the spreading rate of the virus. where i represents the infected people and t represents the days, the area under the curve is covered by the black line, which follows the equation. after that, if we extend the plot up to april, the infected population will be ~ . this prediction follows a similar trend when there is no recovery or death in the population.
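the early-phase trend just described — infected people i growing with days t before recovery and death matter — is commonly summarized by an exponential fit, i(t) = i0·e^(g·t). the study's own fitted equation is elided in this copy, so the sketch below fits that form to made-up illustrative counts, not the maharashtra data.

```python
# Least-squares fit of exponential growth to hypothetical early-phase counts.
# The case numbers are illustrative assumptions, not the paper's data.
import math

days = list(range(10))
cases = [2, 3, 5, 8, 12, 19, 30, 47, 74, 116]  # hypothetical daily case counts

# fit ln(cases) = ln(i0) + g * t by ordinary least squares
n = len(days)
xbar = sum(days) / n
ybar = sum(math.log(c) for c in cases) / n
g = sum((t - xbar) * (math.log(c) - ybar) for t, c in zip(days, cases)) / \
    sum((t - xbar) ** 2 for t in days)
i0 = math.exp(ybar - g * xbar)

print(round(g, 3))                   # estimated daily growth rate
print(round(i0 * math.exp(g * 14)))  # extrapolated count on day 14
```

extrapolating the fitted curve forward plays the same role as extending the black line in the study's plot: it predicts the infected population at a future date under the assumption of no recovery or death.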
the infected population rises to a certain peak; after a certain period, the infected population decreases. the blue line shows the recovered or dead population, which increases with time. image (b) shows the effect of the reproduction ratio (r) on the infected people. the reproduction ratio is varied from . to , and the infected population increases at the higher reproduction ratios. when the susceptible population is higher, there will be more infected people, as shown in figures (c)-(d), together with the corresponding infected populations at different reproduction ratios. during the lockdown, people will maintain proper social distancing, which will reduce the spread of infection. the study explains the spreading of covid in the maharashtra region, which is part of india. in this particular region, the effect of the virus is studied with the sir model. the model predicted that the maximum infected population will be almost ~ after days, at the peak point. after reaching the peak point, recovery will dominate, and the virus infection will be high for r ~ . . this study considers only the population migrating in the state, which can be affected by the lockdown condition. if the migrating population increases, infection increases and more of the population is infected. this can be controlled only by the lockdown presently imposed by the governments on the people. there should be a global health community that unites to urgently avert pandemic issues.
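the dependence on the reproduction ratio noted above — a larger r producing a larger infected peak — can be illustrated with a small discrete sir run. the population size, recovery rate, and the r values below are illustrative assumptions, not the study's calibration.

```python
# Sweep the (constant) reproduction ratio R and record the infected peak.
# Parameter values are illustrative assumptions.

def peak_infected(r, n=100_000, gamma=0.1, days=2000):
    """Peak of I(t) in a discrete SIR run with constant R = beta/gamma."""
    beta = r * gamma
    s, i = n - 1.0, 1.0
    peak = i
    for _ in range(days):
        new_inf = beta * s * i / n
        new_rec = gamma * i
        s, i = s - new_inf, i + new_inf - new_rec
        peak = max(peak, i)
    return peak

peaks = {r: peak_infected(r) for r in (1.2, 1.5, 2.0, 3.0)}
for r, p in peaks.items():
    print(r, round(100 * p / 100_000, 1))  # peak prevalence as % of population
```

the peak grows monotonically with r, which is the pattern the study reports when varying the reproduction ratio; a higher susceptible population likewise raises the absolute peak, since new infections scale with s·i.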
the ancient virus world and evolution of cells the lessons of the pandemic a review of the herald pandemic wave: importance for contemporary pandemic response strategies a sars-like cluster of circulating bat coronaviruses shows potential for human emergence a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster respiratory virus shedding in exhaled breath and efficacy of face masks the mathematical theory of infectious diseases and its applications forecasting seasonal influenza with a state-space sir model. this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. the ethical approval or individual consent was not applicable. the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. key: cord- -nvi h t authors: dinh, ly; parulian, nikolaus title: covid‐ pandemic and information diffusion analysis on twitter date: - - journal: proc assoc inf sci technol doi: . /pra . sha: doc_id: cord_uid: nvi h t the covid‐ pandemic has impacted all aspects of our lives, including the information spread on social media. prior literature has found that information diffusion dynamics on social networks mirror those of a virus, but applying the epidemic susceptible‐infected‐removed (sir) model to examine how information spreads is not sufficient to claim that information spreads like a virus.
in this study, we explore whether there are similarities among the simulated sir model (sirsim), the observed sir model based on actual covid‐ cases (siremp), and the observed information cascades on twitter about the virus (infocas) by using network analysis and diffusion modeling. we propose three primary research questions: (a) what are the diffusion patterns of covid‐ virus spread, based on sirsim and siremp? (b) what are the diffusion patterns of information cascades on twitter (infocas), with respect to retweets, quote tweets, and replies? and (c) what are the major differences in diffusion patterns between sirsim, siremp, and infocas? our study makes a contribution to the information sciences community by showing how epidemic modeling of a virus and information diffusion analysis of online social media are distinct but interrelated concepts. one study asserts that the sir model can be applied to examine the decline of diffusion activities on the friendster online social network, and found that the decline started when popular users left friendster (labeled as r in the sir model). however, applying the epidemic sir model to examine how information spreads is not sufficient to claim that information spreads like a virus (lerman, ; wu, huberman, adamic, & tyler, ) . there are different mechanisms that influence how information spreads from one user to another but do not influence how a virus spreads from one person to another, and vice versa (lerman & ghosh, ; mønsted, sapieży nski, ferrara, & lehmann, ) . in this study, we examine in parallel the epidemic and information diffusion processes, and the mechanisms by which both contribute to covid- 's spread. specifically, we compare the covid- virus's (a) sir-modeled and (b) empirically observed diffusion patterns with (c) information cascades of retweeting, quote tweeting, and replying behaviors on the twitter social network to understand the relationships between information and virus diffusion.
to do this, first, we create an sir simulation (we call this sirsim) of covid- 's diffusion with respect to empirically validated parameters such as reproductive rate (r ), incubation period, and symptom length range. secondly, we create an sir model from actual confirmed cases with data gathered from johns hopkins university (jhu-csse, ) (we call this siremp). thirdly, we construct information cascades from our collected twitter data (we call this infocas) based on three dimensions: retweets, quote tweets (the same as retweets but with a comment included), and replies to tweets. for the information cascades, we also categorize each piece of information as either susceptible (a new tweet about the virus), infected (retweets, quoting of retweets, or replying to tweets), or removed (a tweet not shared by others after a period of time). consistent with these aspects of the study, we propose three primary research questions: rq : what are the diffusion patterns of covid- virus spread, based on sirsim and siremp? rq : what are the diffusion patterns of information cascades on twitter (infocas), with respect to retweets, quote tweets, and replies? rq : what are the major differences in diffusion patterns between sirsim, siremp, and infocas? our study makes a contribution to the information sciences community by showing how epidemic modeling of a virus and information diffusion analysis of online social media are distinct, but interrelated, concepts. with the advent of social networking sites and online microblogs such as twitter, individuals can create and exchange information with larger numbers of people in less time. these online social networks are thus instrumental for researchers to examine what types of information diffuse between individuals and what underlying mechanisms facilitate the diffusion.
in the context of social networks, information diffusion is formally defined as a process by which a piece of information is passed down from one node to another node through an edge (gruhl, guha, liben-nowell, & tomkins, ; guille, hacid, favre, & zighed, ) . two seminal models have been widely adopted to examine diffusion dynamics with network structure considered, namely independent cascade models (goldenberg, libai, & muller, ) and linear threshold models (granovetter, ) . independent cascade models assume that each node has a certain fixed probability to spread, or "infect" a piece of information to a neighboring node. on the other hand, linear threshold models posit that a node would be "infected" by a piece of information if a certain threshold of neighboring nodes have also been infected by that information. both models have been widely used to detect influential topics (gruhl et al., ) and influential users (yang & leskovec, ) in online social networks and the impacts they have on diffusion rate. (gruhl et al., ) focus on the spread of topics on blogs based on rss (rich site summary) feeds and found that topics were either consistently popular (called "chatter") or only popular for a short time (called "spikes"). the authors also observed that topics with high chatter also contained larger and more frequent spikes. (yang & leskovec, ) demonstrate that an influential node can be detected with respect to how many nodes have been influenced by that particular node before. in addition to the independent cascade model and linear threshold model, scholars studying information diffusion from a wide range of disciplines have also found the utility of modeling diffusion as an epidemic process. in particular, the sir model has been frequently used to explain how information in an online social network becomes "infectious" and passes from one node to another. sir is known as a compartmental model. 
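the independent cascade model described above can be sketched in a few lines: each newly activated node gets a single chance to activate each neighbor, with a fixed probability per edge. the toy follower graph and the value of p below are illustrative assumptions, not data from the cited studies.

```python
# Minimal sketch of the independent cascade model on a directed graph.
# The graph and probability p are illustrative assumptions.
import random

def independent_cascade(graph, seeds, p=0.3, rng=None):
    """Return the set of nodes eventually activated from the seed set."""
    rng = rng or random.Random(0)
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for node in frontier:
            for nb in graph.get(node, ()):
                # each edge is tried exactly once, with fixed probability p
                if nb not in active and rng.random() < p:
                    active.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return active

# toy follower graph: node -> users who see its posts
graph = {0: [1, 2, 3], 1: [2, 4], 2: [4, 5], 3: [5], 4: [6], 5: [6]}
spread = independent_cascade(graph, seeds=[0], p=0.5)
print(sorted(spread))
```

the linear threshold model differs only in the activation test: instead of an independent coin flip per edge, a node activates once the fraction (or weighted sum) of its already-active neighbors crosses a per-node threshold.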
it categorizes an individual as being in one of three states at a given point in time: susceptible (s), infected (i), or removed (r) (kermack & mckendrick, ). an individual may transition between states due to influence from another individual in the same network, and the transitions are linear (s→i, i→r). the first transition, s→i, occurs because a susceptible individual was in contact with an infected individual and therefore contracted the virus. infection at this transition point is assumed to occur at a constant rate of β per time unit. the second transition, i→r, occurs when an infected individual either recovers from the virus and gains immunity to it, or is removed (i.e., has died). at this transition, the model assumes that the recovery rate is fixed at γ per time unit. these assumptions are stated in the following set of equations for (s), (i), (r) at time (t): ds/dt = −βsi, di/dt = βsi − γi, dr/dt = γi. (abdullah & wu, ) examine how trending news spreads on twitter by sorting users into three compartments: s for users who saw tweets from an infected user, i for users who tweet about a news topic, and r for users who no longer tweet about a topic after a predefined timeframe of h. the authors also assume a fixed infection rate β and recovery rate γ in their epidemic simulation and observed model with twitter data, and found a strong fit between the models. in addition to news, scholars have also examined whether false rumors and disinformation diffuse on social networks in a manner similar to how an infectious disease spreads (jin, dougherty, saraf, cao, & ramakrishnan, ; nekovee, moreno, bianconi, & marsili, ) . research by (nekovee et al., ) conceptualizes rumor spreading as an epidemic transition process between ignorants, spreaders, and stiflers. they found that the rumor spread rate is higher in scale-free networks than in random graphs. their finding is consistent with (lerman & ghosh, )'s observation that information cascades on twitter follow a power-law distribution.
(jin et al., ) also refine the sir model to examine rumor diffusion by adding exposed (e) and skeptical (z) individuals, and found that the rate of rumor infection (i) increases as the rate of e decreases, and the susceptible (s) rate decreases as z increases. other works have also found sir models to be useful in explaining the diffusion of content on other social networking platforms such as flickr (cha, mislove, adams, & gummadi, ) and digg (ver steeg et al., ). on the other hand, several studies observe that there are clear differences between the sir epidemic model and the information diffusion process. (goel, munagala, sharma, & zhang, ) do not find a strong correlation between the sir model and observed retweet cascades, as the epidemic model does not take into account users' characteristics. similarly, (liu & zhang, ) point out that the information diffusion process includes variables not in the sir model, such as the content of the information, the strength of ties among individuals, and other social factors. in light of diverse findings on the extent to which sir models can explain information diffusion on social networks, we examine whether there are similarities among our simulated sir model (sirsim), the observed sir model based on actual covid- cases (siremp), and the observed information cascades on twitter about the virus (infocas). we empirically test whether there are similarities between the information diffusion process on twitter about covid- topics and the diffusion of the virus itself between individuals. to do this, we develop three different networks. the first two networks are created to capture the diffusion of the covid- virus in the entire population, via an sir simulated model (sirsim) and an observed model based on reported data about infected (i) and removed (r) cases (siremp).
the third network is constructed from information cascades on twitter (we call this infocas), where infected (i) are tweets that interacted with the original tweets about covid- by either retweeting, quoting, or replying, and removed (r) includes tweets that are no longer interacted with for a defined period. we describe the datasets used and the process of constructing each network in the following sections. all data collected and code used in this work are available on figshare (dinh & parulian, ). we implement an sir simulation model of covid- on netlogo, an open-source environment for agent-based modeling. we extended an existing netlogo model of virus spread and refined the model parameters based on official sources' information about covid- spread, as shown in table . we keep the parameters constant throughout the simulation, and set the duration of the simulation to days, reflecting the timeframe between december , and march , . we choose december , rather than december , as the first date of covid- to take into account the days (see table for virus symptom length) of symptoms leading up to the confirmation of the infected case. the initial population for our model is the entire world population, at . billion people. figure shows the netlogo interface of our sirsim model, with additional parameters included to simulate the transitions of agents from s→i and i→r. adhering to the sir model, s agents represent individuals susceptible to the virus, i agents are those infected, and r are agents who are removed due to death. due to computational limitations that make it difficult to represent each individual as an agent, we group million people into each agent (#-people-per-agent setting). thus, our model contains , agents interacting with one another. the first agent represents patient zero, and originates in the city of wuhan on our world map (x-axis: , y-axis: - ).
we assign agents to move around major cities across the world (e.g., new york city, paris, tokyo, moscow) (see table a in the appendix). all agents initially start in the s state, except for patient zero, who then spreads the disease by coming into contact with agents from other cities through two modes of traveling: driving (parameter mode = "human") or flying (parameter mode = "plane"). we set these parameters through the patches (pixel) feature, which enables each agent to move a certain distance depending on the patch size. the circumference of our simulated "world" is pixels, and with the given circumference of , miles, each patch covers about miles in our model. to simulate driving, we calculate the average mileage driven per day ( . miles), and then derive a movement of . patches per day for each agent. to simulate flying, each agent has a random chance to create an airplane and fly to any other major city. while our model accounts for many parameters that are reflective of actual virus spread dynamics, we do not take into account any virus control strategies such as quarantine or social distancing. we repeat the simulation over iterations to ensure the reliability of the experimental results. each iteration result is presented as a network that contains multiple types of nodes: susceptible, infected, and removed. an edge can form between any two node types, and a node's type can change over time (e.g., from susceptible to infected if there is an edge between the two nodes), except when a node has been labeled as removed. we gather actual cumulative cases of covid- from the johns hopkins center for systems science and engineering (jhu csse) data repository. this repository contains global confirmed cases, death cases, and recovered cases from january to march , , for over countries (jhu-csse, ). to our knowledge, this data repository is the most comprehensive so far, with triangulation of case counts from sources (e.g., who, china cdc, italy ministry of health, worldometers).
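the mechanics described above — agents moving on a wrapped world, infecting co-located agents, and being removed after an infectious period — can be sketched as a minimal python analogue. this is not the study's netlogo model: the grid size, step length, infection probability, infectious period, and agent count are all illustrative assumptions.

```python
# minimal agent-based sir analogue: agents take random steps on a wrapped grid,
# susceptibles sharing a cell with an infected agent may become infected, and
# infected agents are removed after a fixed number of days. all parameter
# values are illustrative assumptions, not the study's settings.
import random

def run_abm(n_agents=2000, width=60, steps=120, step_len=2,
            infect_prob=0.25, recover_days=14, seed=42):
    rng = random.Random(seed)
    # state codes: 0 = susceptible, 1 = infected, 2 = removed
    xs = [rng.randrange(width) for _ in range(n_agents)]
    ys = [rng.randrange(width) for _ in range(n_agents)]
    state = [0] * n_agents
    state[0] = 1                       # "patient zero" starts infected
    days_infected = [0] * n_agents
    for _ in range(steps):
        # movement: each agent takes a random step, wrapping at the edges
        for a in range(n_agents):
            xs[a] = (xs[a] + rng.randint(-step_len, step_len)) % width
            ys[a] = (ys[a] + rng.randint(-step_len, step_len)) % width
        # infection: susceptibles on a cell occupied by an infected agent may flip
        infected_cells = {(xs[a], ys[a]) for a in range(n_agents) if state[a] == 1}
        for a in range(n_agents):
            if state[a] == 0 and (xs[a], ys[a]) in infected_cells:
                if rng.random() < infect_prob:
                    state[a] = 1
        # removal: infected agents are removed after a fixed infectious period
        for a in range(n_agents):
            if state[a] == 1:
                days_infected[a] += 1
                if days_infected[a] >= recover_days:
                    state[a] = 2
    return state.count(0), state.count(1), state.count(2)

if __name__ == "__main__":
    s, i, r = run_abm()
    print(f"susceptible={s} infected={i} removed={r}")
```

as in the netlogo setup, agents never leave the removed state, so the population total is conserved across the three compartments throughout the run.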
we analyze this dataset within the assumptions of the sir model, where s are individuals in the population that are not yet infected nor immune to the virus, i is equivalent to "confirmed cases" in the dataset, and r is equivalent to "death cases". we do not include the "recovered" cases in our model, as the data does not indicate whether these cases re-enter the "confirmed cases" in later time-frames. in the original dataset there is no inclusion of s, given that susceptible nodes include all members of the world population. the third dataset we use for this research is twitter data that contains information about covid- . we collect tweets during the period of december , to march , with a maximum of , samples (a limit set by firehose) for each day from the crimson hexagon firehose. we collect , tweets that include one or more of the hashtags #coronavirus, #covid , #ncov. we construct information cascades based on three primary behaviors that occur between tweets in our dataset: ( ) retweet, ( ) quote tweet, and ( ) reply. we exclude all tweet content originating from european countries, in recognition of the general data protection regulation (gdpr). based on the sir model, we define the conditions for infected nodes and removed nodes below. our approach does not consider susceptible nodes because, in this context, susceptible tweets are all tweets that exist on twitter. new information: an original tweet that has yet to be retweeted or interacted with is counted in this category; there are , tweets in this category. infected: if a tweet interacts with an s tweet through either retweeting, quoting, or replying, the tweet is counted in this category (a retweet is an action of reposting an original tweet without changing the original tweet content). removed: an original tweet that has not been retweeted, quoted, or replied to by other tweets in a defined period. we used the average delta time between each activity on the original tweet as our incubation period.
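the removal rule above — a tweet is removed once the gap since its last interaction exceeds the average delta time between its past interactions — can be sketched as follows. the function name, inputs, and timestamps are illustrative assumptions, not the study's actual pipeline.

```python
# sketch of the cascade removal rule: an original tweet is "removed" once no
# new interaction (retweet, quote, or reply) has arrived within the average
# delta time between its previous interactions. all names and timestamps here
# are illustrative assumptions.
def classify_tweet(interaction_times, now):
    """return 'infected' while the cascade is active, 'removed' otherwise.

    interaction_times: sorted interaction timestamps (e.g., in hours)
    for a single original tweet; now: current timestamp.
    """
    if len(interaction_times) < 2:
        # too few interactions to estimate a delta; treat the cascade as active
        return "infected"
    deltas = [b - a for a, b in zip(interaction_times, interaction_times[1:])]
    avg_delta = sum(deltas) / len(deltas)       # the "incubation period"
    time_since_last = now - interaction_times[-1]
    return "removed" if time_since_last > avg_delta else "infected"

if __name__ == "__main__":
    times = [0.0, 1.0, 3.0, 4.0]                # average delta = 4/3 hours
    print(classify_tweet(times, now=4.5))       # recent activity -> infected
    print(classify_tweet(times, now=9.0))       # stale cascade   -> removed
```

running the rule per original tweet per day yields the daily i and r counts that the cascade-growth analysis aggregates.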
therefore, if there is no user interaction with the tweet within the average time frame since the latest spread, we consider the tweet removed. average delta time statistics for each type of cascade (retweet, quote tweet, and reply tweet) can be seen in table . there are , tweets in total in this category. an information cascade is determined by the period during which other tweets (i) interact with an original tweet (s) in this dataset: given an original tweet (t ) at time t, the cascade c at time t (c t ) comprises the tweets that have interacted with it by time t. for each type of information cascade, we analyze the cascade growth by aggregating the s, i, and r tweets for each day. our first research question asks about the diffusion patterns of covid- based on both a simulated sir model (sirsim) and the actual number of cases from empirically-validated sources (siremp). for sirsim, across iterations of our simulation, we find the average count of susceptible agents to be , . million, the average count of infected to be . million, and the average count of removed to be . million. thus, the proportion of healthy but susceptible agents is . % (s) in our model. only % (i) of agents are infected by the virus, and only . % (r) are removed due to death. figure shows the distribution of infected (blue line, left) and removed (red line, left) agents per day, noncumulatively; both trendlines show an increasing pattern. the proportion of removed cases is much lower than that of infected cases, as shown in the network visualization in figure . we then compare these results to siremp, which shows that as of march , , there were , infected cases and , removed cases (deaths only). as a proportion of the world population, therefore, infected cases are . %, and removed cases are a minimal percentage. by comparison, the empirically-validated results show substantially lower proportions of infected and removed agents, and in turn a higher proportion of susceptible agents.
we also analyze the distribution of infected (blue line, right) and removed cases (red line, right) for siremp, and find multiple spikes in the blue line but a flat distribution for the red line. the spikes in infected counts are due to the inclusion of cases from countries such as the u.s., south korea, and italy. in comparison to the distributions from sirsim, the distribution of removed cases in siremp is relatively static throughout. table (network statistics) shows the sizes of the three network cascades within infocas: retweet, quote tweet, and reply tweets. we find that the retweet cascade is times larger in size than the quote tweet cascades, and times larger than the reply cascades. this finding is consistent with the notable differences in the number of cascades present in each network, in which the retweet network has times more cascades than the quote tweet network, and times more cascades than the reply tweet network.
figure: distribution of infected and removed agents for the sirsim (left) and siremp (right) models.
figure: sirsim network. blue nodes = infected cases, red nodes = removed (death) cases.
figure presents the rapid growth in tweet activities, with a stark increase in retweets, quote tweets, and reply tweets during mid-january. we find that the growth distributions for all three tweet types follow a logarithmic curve. in addition, the number of infected users, equivalent to individuals spreading the information, is much higher than the amount of new information, consistent across the three observations. we also observe that the cascade growth for retweets is substantially higher than the growth for quote tweets and reply tweets. table shows the coefficients and parameters for each linear fit of the number of tweets to the day-period. as we can see from the table, the slope for retweets is the highest, followed by quote tweets and reply tweets. the slope for removed information is the lowest compared to infected and new information, and this is consistent for all cascade types.
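the logarithmic-curve claim above corresponds to a linear fit of daily tweet counts against the logarithm of the day index. the sketch below implements that least-squares fit; the synthetic counts are stand-ins for illustration, not the study's data.

```python
# sketch of fitting a logarithmic growth curve, counts = slope*ln(day) + intercept,
# via ordinary least squares on ln(day). the data below are synthetic stand-ins.
import math

def log_fit(days, counts):
    """least-squares fit of counts against ln(day); returns (slope, intercept)."""
    xs = [math.log(d) for d in days]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(counts) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, counts))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

if __name__ == "__main__":
    days = list(range(1, 91))
    counts = [100.0 * math.log(d) + 5.0 for d in days]   # exactly logarithmic
    slope, intercept = log_fit(days, counts)
    print(f"slope={slope:.2f} intercept={intercept:.2f}")
```

comparing the fitted slopes across the retweet, quote tweet, and reply tweet series is what allows the ordering of growth rates reported in the table.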
this indicates that as new information is introduced each day, some portion of the information stops spreading. we aggregate the data from the sir simulation over iterations (sirsim) and csse's real-infection data (siremp), and analyze their correlation with twitter's information growth (infocas) for the same time period. table shows the correlational values, in terms of pearson's correlation, for each sir state. for the cascades of infected nodes, we find the highest correlation between sirsim and infocas retweets (r = . ). the second-highest correlation is between retweets and quote tweets (r = . ). another notable correlation is between sirsim and quote tweets (r = . ). siremp has low correlations with all other types of cascades, with correlations ranging from . to . .
figure: retweet, quote tweet, and reply tweet growth for each day during the covid- outbreak period. x-axes represent the day, y-axes represent the number of tweets. new information represents the original source of information, infected represents an interaction with another user, and removed represents the end of the information spread after a defined period.
in terms of the removed-node cascades, there is also a high correlation between infocas retweets and quote tweets (r = . ). retweets also correlate highly with reply tweets (r = . ). these two correlations show that retweet cascades are most correlated with quote tweets and reply tweets with respect to tweets that are no longer interacted with, and thus can no longer spread that particular tweet content in the network. the correlation between infocas and sirsim is relatively lower (r = . - . ), showing a weaker relationship between the simulated and observed twitter removed cascades. similarly, there is a weak relationship between siremp and all infocas cascades, especially with reply tweets (r = . ).
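the table's values come from pearson's correlation between pairs of daily time series (e.g., sirsim infected counts versus infocas retweet counts). a minimal implementation, with synthetic series standing in for the study's data:

```python
# sketch of pearson's correlation between two daily time series, as used to
# compare sir-state counts across sirsim, siremp, and infocas. the two series
# below are synthetic stand-ins.
import math

def pearson(x, y):
    """pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

if __name__ == "__main__":
    sim_daily = [1, 2, 4, 8, 16, 32]     # e.g., simulated infected per day
    retweets = [3, 5, 9, 17, 33, 70]     # e.g., retweet interactions per day
    print(f"r = {pearson(sim_daily, retweets):.3f}")
```

r ranges from −1 to 1; values near 1 indicate that the two series rise and fall together, which is how the stronger sirsim-retweet relationship is read off the table.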
our study focuses on the diffusion patterns of the covid- virus itself and the information shared online about the virus. to capture the diffusion patterns of the virus, we create an sir model (sirsim) based on empirically-validated transmission dynamics of covid- (e.g., reproductive ratio, incubation period), and then compare it with actual confirmed cases of covid- from january to march , (siremp). to examine the diffusion patterns of information discussed online about covid- , we construct three cascades (infocas) based on retweets, quote tweets, and reply tweets on twitter that mentioned covid- during the period of december st to march , . our first research question asks about the diffusion patterns of the covid- virus, based on the epidemiological assumptions of sir. from our sirsim model, we find the proportion of infected cases to be only % of the entire world population, and the proportion of removed (dead) cases to be only . % of the population. our model accounts for days since the first case of the virus, and the upward trajectory beyond linear growth suggests to us that the rates of infection and death may increase logarithmically. this is consistent with current findings on covid- , which show that the distributions of infected cases follow a logarithmic distribution (cao et al., ; maier & brockmann, ). (cao et al., ) find the logarithmic growth rate suitable considering that covid- is relatively in its early stage, and thus growth is increasing slowly. we also find notable differences between the simulated model and the actual confirmed cases of covid- (from siremp). in fact, the distribution of removed cases in siremp is flat, as opposed to the increasing distribution observed in sirsim. there are two reasons for the mismatch between the simulated and actual distributions of sir cases.
the first is that our model does not take into account preventive measures such as social distancing, self-quarantine, and shelter-in-place, which have been found to be effective in "flattening the curve" (lewnard & lo, ; parmet & sinha, ). the second reason may be that the quantification of infection and death rates needs further modification, specifically because there is still limited testing (ioannidis, ) and there are reporting delays (gardner, zlojutro, & rey, ). the second research question asks about the diffusion patterns of information cascades on twitter about covid- . we construct a retweet cascade, a quote tweet cascade, and a reply cascade (we call these infocas) to fully capture the different types of interactions between users on twitter. all three cascades show a strong fit with a linear-log distribution, suggesting a power-law decay in the diffusion of new information about covid- over time. given this finding, along with the cascade length of each tweet type, we expect the retweet cascade to decay at the fastest rate, given that its cascade length is only approximately hours. on the other hand, we find the quote tweets' average cascade length to be about days, which means that each original tweet that has been interacted with via quotes remains active for a longer duration. this is also observed for reply tweets, where the average cascade length is about days. the third research question focuses on the correlation in diffusion patterns among sirsim, siremp, and infocas, to address the connection between epidemic and information diffusion dynamics. based on the examination of infected cascades, we find the strongest positive correlations between sirsim and infocas retweets (r = . ) and quote tweets (r = . ). on the other hand, we observe low correlations between siremp and all three infocas types (r = . - . ). this shows that the distributions of infected agents are more correlated between infocas and sirsim, and much less so with siremp.
with the rapid spread dynamics seen in sirsim, this correlation shows that tweets about covid- get retweeted most quickly, followed by quote tweets and then reply tweets. the correlation between sirsim and siremp is relatively low (r = . ), which may indicate either that the simulated model potentially overestimates the infection rate, or that the actual reported cases may underestimate it. for the removed cascades, we find the strongest correlations among infocas cascades, specifically between retweets and quote tweets (r = . ), retweets and reply tweets (r = . ), and quote tweets and reply tweets (r = . ). we find weaker correlations between infocas and sirsim (r = . - . ), and the weakest correlations between infocas and siremp (r = . - . ). this result is consistent with our observation that the removed distribution in siremp is more uniform and flat compared to the other distributions. it is also expected that the removed distribution for infocas would differ from sirsim, given that the likelihood of tweets transitioning from infected to removed is notably higher. overall, we find complex relationships among the diffusion dynamics of covid- from the simulated virus spread model, the actual reported cases of the virus spread, and the information shared and discussed online. our study demonstrates how epidemic modeling, in combination with examining information cascades about the virus, can help capture the many activities surrounding the covid- pandemic. in future work, we hope to expand our data collection to more recent dates, given the constantly-changing nature of the pandemic. additionally, we aim to improve our simulated epidemic model (sirsim) to include additional control variables that reflect prevention strategies, namely social distancing, self-quarantine, and shelter-in-place.
references
an epidemic model for news spreading on twitter
estimating the effective reproduction number of the -ncov in china. medrxiv
coronavirus disease
characterizing social cascades in flickr
covid datasets to examine diffusion patterns
modeling the spreading risk of -ncov
who director-general's opening remarks at the media briefing on covid-
a note on modeling retweet cascades on twitter
talk of the network: a complex systems look at the underlying process of word-of-mouth
threshold models of collective behavior
information diffusion through blogspace
information diffusion in online social networks: a survey
a fiasco in the making? as the coronavirus pandemic takes hold, we are making decisions without reliable data
coronavirus -ncov global cases by johns hopkins csse
epidemiological modeling of news and rumors on twitter
a contribution to the mathematical theory of epidemics
the incubation period of coronavirus disease (covid- ) from publicly reported confirmed cases: estimation and application
information is not a virus, and other consequences of human cognitive limits
information contagion: an empirical study of the spread of news on digg and twitter social networks
scientific and ethical basis for social-distancing interventions against covid-
information spreading on dynamic social networks
effective containment explains sub-exponential growth in confirmed cases of recent covid- outbreak in mainland china
evidence of complex contagion of information in social media: an experiment using twitter bots
theory of rumour spreading in complex social networks
covid- : the law and limits of quarantine
the collapse of the friendster network started from the center of the core
what stops social epidemics
information flow in social groups
characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention
global health crises are also information crises: a call to action
modeling information diffusion in implicit networks