key: cord-0787225-p5aj5k2g authors: Park, Sang Woo; Bolker, Benjamin M.; Champredon, David; Earn, David J.D.; Li, Michael; Weitz, Joshua S.; Grenfell, Bryan T.; Dushoff, Jonathan title: Reconciling early-outbreak estimates of the basic reproductive number and its uncertainty: framework and applications to the novel coronavirus (SARS-CoV-2) outbreak date: 2020-02-02 journal: nan DOI: 10.1101/2020.01.30.20019877 sha: 0ff4ad5359ee2df3568605385a8616d8da66bf2c doc_id: 787225 cord_uid: p5aj5k2g A novel coronavirus (SARS-CoV-2) has recently emerged as a global threat. As the epidemic progresses, many disease modelers have focused on estimating the basic reproductive number Ro -- the average number of secondary cases caused by a primary case in an otherwise susceptible population. The modeling approaches and resulting estimates of Ro vary widely, despite relying on similar data sources. Here, we present a novel statistical framework for comparing and combining different estimates of Ro across a wide range of models by decomposing the basic reproductive number into three key quantities: the exponential growth rate $r$, the mean generation interval $bar G$, and the generation-interval dispersion $kappa$. We then apply our framework to early estimates of Ro for the SARS-CoV-2 outbreak. We show that many early Ro estimates are overly confident. Our results emphasize the importance of propagating uncertainties in all components of Ro, including the shape of the generation-interval distribution, in efforts to estimate Ro at the outset of an epidemic. Since December 2019, a novel coronavirus (SARS-CoV-2) has been spreading in China and other parts of the world (World Health Organization, 2020d). Although the virus is believed to have originated from animal reservoirs (Centers for Disease Control and Prevention, 2020), the ability of SARS-CoV-2 ability to directly transmit between humans has posed a greater threat for its spread World Health Organization, 2020c) . As of February 27, 2020, the World Health Organization (WHO) has confirmed 82,294 cases of the coronavirus disease , including 3,664 confirmed cases in 46 different countries, outside China (World Health Organization, 2020a). As the disease continues to spread, many researchers have already published their analyses of the outbreak as pre-prints (e.g., Bedford et al. (2020) ; ; Liu et al. (2020) ; Majumder and Mandl (2020) ; Read et al. (2020a) ; ) and in peer-reviewed journals (e.g., ; Riou and Althaus (2020b) ; Wu et al. (2020) ; ), focusing in particular on estimates of the basic reproductive number R 0 (i.e., the average number of secondary cases generated by a primary case in a fully susceptible population (Anderson and May, 1991; Diekmann et al., 1990) ). Estimates of the basic reproductive number are of interest during an outbreak because they provide information about the level of intervention required to interrupt transmission (Anderson and May, 1991) , and about the potential final size of the outbreak (Anderson and May, 1991; Ma and Earn, 2006) . We commend these researchers for their timely contribution and those who made the data publicly available. However, it can be difficult to compare a disparate set of estimates of R 0 from different research groups (as well as the associated degrees of uncertainty) when the estimation methods and their underlying assumptions vary widely. Here, we show that a wide range of approaches to estimating R 0 can be understood and compared in terms of estimates of three quantities: the exponential growth rate r, the mean generation intervalḠ, and the generation-interval dispersion κ. The generation interval, defined as the interval between the time when an individual becomes infected and the time when that individual infects another individual (Svensson, 2007) , plays a key role in shaping the relationship between r and R 0 (Wearing et al., 2005; Roberts and Heesterbeek, 2007; Wallinga and Lipsitch, 2007; Park et al., 2019) ; therefore, estimates of R 0 from different models directly depend on their implicit assumptions about the generation-interval distribution and the exponential growth rate. Early in an epidemic, information is scarce and there is inevitably a great deal of uncertainty surrounding both case reports (affecting the estimates of the exponential growth rate) and contact tracing (affecting the estimates of the generation-interval distribution). We suggest that disease modelers should make sure their assumptions about these three quantities are clear and reasonable, and that estimates of uncertainty in R 0 should propagate error from all three sources (Elderd et al., 2006) . We compare seven disparate models published online between January 23-26, 2020 that estimated R 0 for the SARS-CoV-2 outbreak (Bedford et al., 2020; Liu et al., 2020; Majumder and Mandl, 2020; Read et al., 2020a; Riou and Althaus, 2020a; . We use a Bayesian multilevel model to construct pooled estimates for the three key quantities: r,Ḡ, and κ; the pooled estimates reflect the uncertainties present in modeling approaches and their underlying assumptions. We use these pooled estimates to illustrate the importance of propagating different sources of error, particularly uncertainty in both the growth rate and the generation interval. We also use our framework to tease apart which assumptions of these different models led to their different estimates and confidence intervals. Despite the availability of more recent and/or updated estimates of R 0 , we restrict ourselves to the estimates above in order to focus on the resolution of uncertainty in the earliest stages of an epidemic. We gathered information on estimates of R 0 and their assumptions about the underlying generation-interval distributions from 7 articles that were published online between January 23-26, 2020 (Table 1) . Five studies (Liu et al., 2020; Majumder and Mandl, 2020; Read et al., 2020a; Riou and Althaus, 2020a; were uploaded to pre-print servers (bioRxiv, medRxiv, and SSRN); one report was posted on the web site of Imperial College London ; and one report was posted on nextstrain.org (Bedford et al., 2020) . Their modeling approaches vary widely: a branching process model (Bedford et al., 2020; Riou and Althaus, 2020a) , a deterministic Susceptible-Exposed-Infected-Recovered (SEIR) model (Read et al., 2020a) , an exponential growth model , a Poisson offspring distribution model (Liu et al., 2020) , and the Incidence 3 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint Table 1 : Reported estimates of the basic reproductive number and the assumptions about the generation-interval distributions. Estimates of R 0 and their assumptions about the shape of the generation interval distributions were collected from 7 studies. * We treat these intervals as a 95% confidence interval in our analysis. † We assume κ = 0.5 in our analysis. ‡ The authors presented R 0 estimates under different assumptions regarding the reporting rate; we use their baseline scenario in our analysis to remain consistent with other studies, which do not account for changes in the reporting rate. Decay and Exponential Adjustment (IDEA) model (Majumder and Mandl, 2020) . Four studies estimated R 0 by directly fitting their models to incidence data (Read et al., 2020a; Liu et al., 2020; Majumder and Mandl, 2020) . The remaining three studies estimated R 0 by comparing the predicted number of cases from their models with the estimated number of total cases by January 18 (between 1,000 and 9,7000 ) Some of these studies have now been published in peer-reviewed journals (Riou and Althaus, 2020b; or have been updated with better uncertainty quantification (Read et al., 2020b) . Early in an outbreak, R 0 is difficult to estimate directly; instead, R 0 is often inferred from the exponential growth rate r, which can be estimated reliably from incidence data (Ma et al., 2014) . Given an estimate of the exponential growth rate r and an intrinsic generationinterval distribution g(τ ) (Champredon and Dushoff, 2015) , the basic reproductive number can be estimated via the Euler-Lotka equation (Wallinga and Lipsitch, 2007) : In other words, estimates of R 0 must depend on the assumptions about the exponential growth rate r and the shape of the generation-interval distribution g(τ ). Here, we use the gamma approximation framework (Park et al., 2019) to (i) characterize the amount of uncertainty present in the exponential growth rates and the shape of the generation-interval distribution and (ii) assess the degree to which these uncertainties affect the estimate of R 0 . Assuming that generation intervals follow a gamma distribution with the meanḠ and the squared coefficient of variation κ, we have 4 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . This equation demonstrates that a generation-interval distribution that has a larger mean (higherḠ) or is less variable (lower κ) will give a higher estimate of R 0 for the same value of r (Wallinga and Lipsitch, 2007). As most studies do not report their estimates of the exponential growth rate, we first recalculate the exponential growth rate that correspond to their model assumptions. We do so by modeling reported distributions of the reproductive number R 0 , the mean generation intervalḠ, and the generation-interval dispersion parameter κ with appropriate probability distributions; we used gamma distributions to model values reported with confidence intervals and uniform distributions to model values reported with ranges. For example, Study 3 estimated R 0 = 2.92 (95% CI: 2.28-3.67); we model this estimate as a gamma distribution with a mean of 2.92 and a shape parameter of 67, which has a 95% probability of containing a value between 2.28 and 3.67 (see Table 2 for a complete description). For each study i, we construct a family of parameter sets by drawing 100,000 random samples from the probability distributions ( Table 2 ) that represent the estimates of R 0i and the assumed values ofḠ i and κ i and calculate the exponential growth rate r i via the inverse of Eq. 2: This allows us to approximate the probability distributions of the estimated exponential growth rates by each study; uncertainties in the probability distributions that we calculate for the estimated exponential growth rates will reflect the methods and assumptions that the studies rely on. We construct pooled estimates for each parameter (r,Ḡ, and κ) using a Bayesian multilevel modeling approach, which assumes that the parameters across different studies come from the same gamma distribution. The pooled estimates, which are represented as probability distributions rather than point estimates, allow us to average across different modeling approaches, while accounting for the uncertainties in the assumptions they make: where µ r , µ G , µ κ represent the pooled estimates, and σ r , σ G , and σ κ represent between-study standard deviations. We account for uncertainties associated with r i ,Ḡ i and κ i (and their correlations), by drawing a random set from the family of parameter sets for each study at each Metropolis-Hastings step. Since the gamma distribution does not allow zeros, we use κ = 0.02 instead for Study 7. We note that this approach does not account for nonindependence between the parameter estimates made by different modelers. As we add more models, the pooled estimates can become sharper even when the models no longer add more information. Thus, the pooled estimator should be interpreted with care. . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint Table 2 : Probability distributions for R 0 ,Ḡ, and κ. We use these probability distributions to obtain a probability distribution for the exponential growth rate r. The gamma distribution is parameterized by its mean and shape α. Constant values are fixed according to Table 1 . * We do not account for this uncertainty during our recalculation of the exponential growth rate r because the reported estimate of R 0 and its uncertainty assumesḠ = 8; the original article reports three R 0 (and 95% CIs) estimates using three different values ofḠ: 7.6 (MERS-like), 8 (average), and 8.4 (SARS-like). We still account for this uncertainty in our pooled estimates (µ G ). † Study 6 uses the IDEA model (Fisman et al., 2013) , through which the authors effectively fit an exponential curve to the cumulative number of confirmed cases without propagating any statistical uncertainty. Instead of modeling R 0 with a probability distribution and recalculating r, we use r = 0.114 days −1 , which explains all uncertainty in the reported R 0 , when combined with the considered range ofḠ. We use weakly informative priors on hyperparameters: µ r ∼ Gamma(mean = 1 week −1 , shape = 2) µ G ∼ Gamma(mean = 1 week, shape = 2) µ κ ∼ Gamma(mean = 0.5, shape = 2) (σ r , σ G , σ κ ) ∼ half-normal(0, 10). We followed recommendations outlined in Gelman et al. (2006) , parameterizing the top-level gamma distributions in terms of their means and standard deviations and imposing weakly informative prior distributions on between-study standard deviations, i.e., half-normal(0, 10). We had initially used gamma priors with small shape parameters (< 1) on between-study shape parameters (= µ 2 /σ 2 ) but found this put too much prior probability on large betweenstudy variances. This phenomenon is a known problem (Gelman et al., 2006) . Alternative choices of prior for the between-study shape parameters are also suboptimal: imposing strong priors (e.g. half-t(µ = 0, σ = 1, ν = 4) assumes a priori that between-study variance is large, while weak priors (e.g. half-Cauchy(0,5)) can lead to poor mixing. We run 4 independent Markov Chain Monte Carlo chains each consisting of 500,000 burnin steps and 500,000 sampling steps. Posterior samples are thinned every 1000 steps. Convergence is assessed by ensuring that the Gelman-Rubin statistic is below 1.01 for all hyperparameters (Gelman et al., 1992) ; trace plots and marginal posterior distribution plots are presented in Appendix. 95% confidence intervals are calculated by taking 2.5% and 97.5% quantiles from the marginal posterior distribution for each parameter. 6 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint Squared coefficient of variation Figure 1 : Comparisons of the reported parameter values with our pooled estimates. We inferred point estimates (black), uniform distributions (orange) or confidence intervals (purple) for each parameter from each study, and combined them into pooled estimates (red; see text). Open triangle: we assumed κ = 0.5 for Study 2 which does not report generation-interval dispersion. Fig. 1 compares the reported values of the exponential growth rate r, mean generation in-tervalḠ, and the generation-interval dispersion κ from different studies with the pooled estimates that we calculate from our multilevel model. We find that there is a large uncertainty associated with the underlying parameters; many models rely on stronger assumptions that ignore these uncertainties. Surprisingly, no studies take into account how the variation in generation intervals affects their estimates of R 0 : all studies assumed fixed values for κ, ranging from 0 to 1. Assuming fixed parameter values can lead to overly strong conclusions (Elderd et al., 2006 ). Fig. 2 shows how propagating uncertainty in different combinations would affect estimates and CIs for R 0 . For illustrative purposes, we use our pooled estimates, which may represent a reasonable proxy for the state of knowledge as of January 23-26 ( Fig. 2A) . Comparing the models that include only some sources of uncertainty to the "all" model, we see that propagating error from the growth rate (which all but one of the studies reviewed did) is absolutely crucial: the middle bar ("GI mean"), which lacks growth-rate uncertainty, is relatively narrow. In this case, propagating error from the mean generation interval has negligible effect compared to propagating the uncertainty in r. Uncertainty in the generation-interval dispersion also has important effects as it determines the functional form of the relationship between r and R 0 (compare "growth rate + GI mean" with "all"). For example, reducing the dispersion parameter κ from 1 (assuming exponentially distributed generation intervals) to 0 (assuming fixed generation intervals) changes the r-R 0 relationship from linear to exponential, therefore increasing the sensitivity of R 0 estimates to r andḠ. As uncertainty associated with the exponential growth rate decreases, accounting for uncertainties in generation intervals becomes even more important (Fig. 2B) . Propagating error only from the growth rate gives very narrow confidence intervals in this case. Likewise, propagating errors from the growth rate and the mean generation interval gives wider but still too narrow confidence intervals. We expect this hypothetical example to better reflect more recent scenarios, as increased data availability will allow researchers to estimate r with more certainty. Figure 2 : Effects of r,Ḡ, and κ on the estimates of R 0 . We compare estimates of R 0 under five scenarios that propagate different combinations of uncertainties (A) based on our pooled estimates (µ r , µ G , and µ κ ) and (B) assuming a 4-fold reduction in uncertainty of our pooled estimate of the exponential growth rate (using (µ r + 3 × median(µ r ))/4, instead). base: R 0 estimates based on the median estimates of µ r , µ G , and µ κ . growth rate: R 0 estimates based on the the posterior distribution of µ r while using median estimates of µ G and µ κ . GI mean: R 0 estimates based on the the posterior distribution of µ G while using median estimates of µ r and µ κ . growth rate + GI mean: R 0 estimates based on the the joint posterior distributions of µ r and µ G while using a median estimate of µ κ . all: R 0 estimates based on the joint posterior distributions of µ r , µ G , and µ κ . Vertical lines represent the 95% confidence intervals. We also compare the estimates of R 0 across different studies by replacing their values of r,Ḡ, and κ with our pooled estimates (µ r , µ G , and µ κ , respectively) one at a time and recalculating the basic reproductive number R 0 (Fig. 3) . This procedure allows us to assess the sensitivity of the estimates of R 0 across appropriate ranges of uncertainties. We find that incorporating uncertainties one at a time increases the width of the confidence intervals in all but 7 cases. We estimate narrower confidence intervals for Study 3, Study 6, and Study 7 when we account for proper uncertainties in the generation-interval dispersion because they assume a narrow generation-interval distribution (compare "base" with "GI variation"); when higher values of κ are used, their estimates of R 0 become less sensitive to the values of r andḠ, giving narrower confidence intervals. We estimate narrower confidence 8 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . intervals for Study 5 and Study 7 when we account for proper uncertainties in the mean generation interval (compare "base" with "GI mean") because the range of uncertainty in the mean generation intervalḠ they consider is much wider than the pooled range (Fig. 1) . Substituting the reported r orḠ from Study 1 with our pooled estimates give narrower confidence intervals for similar reasons. We find that accounting for uncertainties in the estimate of r has the largest effect on the estimates of R 0 in most cases (Fig. 3) . For example, recalculating R 0 for Study 7 by using our pooled estimate of r gives R 0 = 3.9 (95% CI: 2.3-8.6), which is much wider than the uncertainty range they reported (2.0-3.1). There are two explanations for this result. First, even though the exponential growth rate r and the mean generation intervalḠ have identical mathematical effects on R 0 in our framework (Eq. 2 in Methods), r is more influential in this case because it is associated with more uncertainty (Fig. 1) . Second, assuming a fixed generation interval (κ = 0) makes the estimate of R 0 too sensitive to r andḠ. One exception is Study 1: we find this estimate of R 0 is most sensitive to generation-interval dispersion κ. This is because Study 1 assumes an exponentially distributed generation interval (κ = 1): estimates that rely on this assumption make R 0 relatively insensitive and thus tend to have particularly narrow confidence intervals. Finally, we incorporate all uncertainties by using posterior samples for µ r , µ G , and µ κ to recalculate R 0 and compare it with the reported R 0 estimates. Our estimated R 0 from the pooled distribution has a median of 2.9 (95% CI: 2.1-4.5). While the point estimate of R 0 is similar to other reported values from this date range, the confidence intervals are wider than all but one study. This result does not imply that assumptions based on the pooled estimate are too weak; we believe that this confidence interval more accurately reflects the level of uncertainties present in the information that was available when these models were fitted. In fact, because the pooled estimate does not account for overlap in data sources used by the models, we feel that it is more likely to be over-confident than under-confident. Our median estimate averages over the various studies, and therefore particular studies have higher or lower median estimates. We note in particular that, while the baseline example we used from Study 6 may appear to be an outlier, the authors of this study also explore different scenarios involving changes in reporting rate over time, under which their estimates of R 0 are similar to other reported estimates. Here, our focus is on estimating uncertainty, not on identifying potential explanations for these discrepancies. Estimating the basic reproductive number R 0 is crucial for predicting the course of an outbreak and planning intervention strategies. Here, we use a gamma approximation (Park et al., 2019) to decompose R 0 estimates into three key quantities (r,Ḡ, and κ) and apply a multilevel Bayesian framework to compare estimates of R 0 for the novel coronavirus outbreak. Our results demonstrate the importance of accounting for uncertainties associated with the underlying generation-interval distributions, including uncertainties in the amount of dispersion in the generation intervals. Our analysis of individual studies shows that many 9 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint Basic reproductive number q base growth rate GI mean GI variation all Figure 3 : Sensitivity of the reported R 0 estimates with respect to our pooled estimates of the underlying parameters. We replace the reported parameter values (growth rate r, GI meanḠ, and GI variation κ) with our corresponding pooled estimates (µ r , µ G , and µ κ ) one at a time and recalculate R 0 (growth rate, GI mean, and GI variation). The pooled estimate of R 0 is calculated from the joint posterior distribution of µ r , µ G , and µ κ (all); this corresponds to replacing all reported parameter values with our pooled estimates, which gives identical results across all studies. Horizontal dashed lines represent the 95% confidence intervals of our pooled estimate of R 0 . The reported R 0 estimates (base) have been adjusted to show the approximate 95% confidence interval using the probability distributions that we defined if they had relied on different measures for parameter uncertainties. early estimates of R 0 rely on strong assumptions. Of the seven studies that we reviewed, two of them directly fit their models to cumulative number of confirmed cases. This approach can be appealing because of its simplicity and apparent robustness, but fitting a model to cumulative incidence instead of raw incidence can both bias parameters and give overly narrow confidence intervals, if the resulting nonindependent error structures are not taken into account (Ma et al., 2014; King et al., 2015) . Naive fits to cumulative incidence data should therefore be avoided. Many sources of noise affect real-world incidence data, including both dynamical, or "process", noise (randomness that directly or indirectly affects disease transmission); and observation noise (randomness underlying how many of the true cases are reported). Disease modelers face the choice of incorporating one or both of these in their data-fitting and 10 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . modeling steps. This is not always a serious problem, particularly if the goal is inferring parameters rather than directly making forecasts (Ma et al., 2014) . Modelers should however be aware of the possibility that ignoring one kind of error can give overly narrow confidence intervals (King et al., 2015; Taylor et al., 2016) . There are other important phenomena not covered by our simple framework. Examples that seem relevant to this outbreak include: changing reporting rates, reporting delays (including the effects of weekends and holidays), and changing generation intervals. For emerging pathogens such as SARS-CoV-2, there may be an early period of time when the reporting rate is very low due to limited awareness or diagnostic resources; for example, (Study 6) demonstrated that estimates of R 0 can change from 5.47 (95% CI: 4.16-7.10) to 3.30 (95% CI: 2.73-3.96) when they assume 2-fold changes in the reporting rate between January 17, when the official diagnostic guidelines were released (World Health Organization, 2020b), and January 20. Delays between key epidemiological timings (e.g., infection, symptom onset, and detection) can also shift the shape of an observed epidemic curve and, therefore, affect parameter estimates as well as predictions of the course of an outbreak (Tariq et al., 2019) . Even though a constant delay between infection and detection may not affect the estimate of the growth rate, it can still affect the associated confidence intervals. Finally, generation intervals can become shorter throughout an epidemic as intervention strategies, such as quarantine, can reduce the infectious period (Hethcote et al., 2002) . Accounting for these factors is crucial for making accurate inferences. Here, we focused on the estimates of R 0 that were published within a very short time frame (January 23-26). During early phases of an outbreak, it is reasonable to assume that the epidemic grows exponentially (Anderson and May, 1991) . However, as the number of susceptible individuals decreases, the epidemic will saturate, and estimates of r used for R 0 should account for the possibility that r is decreasing through time. Although our analysis only reflects a snapshot of a fast-moving epidemic, we expect certain lessons to hold: confidence intervals must combine different sources of uncertainty. In fact, as epidemics progress and more data becomes available, it is likely that inferences about exponential growth rate (and other epidemiological parameters) will become more precise; thus the risk of over-confidence when uncertainty about the generation-interval distribution is neglected will become greater. We strongly emphasize the value of attention to accurate characterization of the transmission chains via contact tracing and better statistical frameworks for inferring generationinterval distributions from such data (Britton and Scalia Tomba, 2019) . A combined effort between public-health workers and modelers in this direction will be crucial for predicting the course of an epidemic and controlling it. We also emphasize the value of transparency from modelers. Model estimates during an outbreak, even in pre-prints, should include code links and complete explanations. We suggest using methods based on open-source tools allow for maximal reproducibility. In summary, we have provided a basis for comparing exponential-growth based estimates of R 0 and its associated uncertainty in terms of three components: the exponential growth rate, mean generation interval, and generation interval dispersion. We hope this framework will help researchers understand and reconcile disparate estimates of disease transmission 11 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . early in an epidemic. 12 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.01.30.20019877 doi: medRxiv preprint 18 . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.01.30.20019877 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101 /2020 Infectious diseases of humans: dynamics and control Genomic analysis of nCoV spread Estimation in emerging epidemics: Biases and remedies 2019 Novel Coronavirus (2019-nCoV) Intrinsic and realized generation intervals in infectious-disease transmission On the definition and the computation of the basic reproduction ratio R 0 in models for infectious diseases in heterogeneous populations Uncertainty in predictions of disease spread and public health responses to bioterrorism and emerging diseases An IDEA for short term outbreak projection: nearcasting using the basic reproduction number Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper) Inference from iterative simulation using multiple sequences Effects of quarantine in six endemic models for infectious diseases Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet Report 2: Estimating the potential total number of novel Coronavirus cases in Wuhan City Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola Early transmission dynamics in Wuhan, China, of novel coronavirusinfected pneumonia Transmission dynamics of 2019 novel coronavirus Estimating initial epidemic growth rates Generality of the final size formula for an epidemic of a newly invading infectious disease Early transmissibility assessment of a novel coronavirus in Wuhan A practical generationinterval-based approach to inferring the strength of epidemics from their speed Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions Pattern of early human-to-human transmission of wuhan 2019-nCoV Pattern of early human-to-human transmission of Wuhan Model-consistent estimation of the basic reproduction number from the incidence of an emerging infection A note on generation times in epidemic models Assessing reporting delays and the effective reproduction number: The Ebola epidemic in DRC Stochasticity and the limits to confidence when estimating R 0 of Ebola and other emerging infectious diseases How generation intervals shape the relationship between growth rates and reproductive numbers Appropriate models for the management of infectious diseases Coronavirus disease 2019 (COVID-19) Situation Report -38 Laboratory testing for 2019 novel coronavirus (2019-nCoV) in suspected human cases Novel Coronavirus (2019-nCoV) Situation Report -6 Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak We thank Daihai He for providing helpful comments on the manuscript. SWP and JD developed the statistical framework. SWP reviewed the published literature. SWP performed the analysis. SWP, BMB, and JD created the figures. SWP and JD wrote the first draft. All authors contributed to the writing and approval of the final report. BMB and DJDE were supported by Natural Sciences and Engineering Research Council (NSERC). ML was supported by Canadian Institutes of Health Research (CIHR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We declare no competing interests.