key: cord-0980411-fmmxoa97
authors: You, Chong; Deng, Yuhao; Hu, Wenjie; Sun, Jiarui; Lin, Qiushi; Zhou, Feng; Pang, Cheng Heng; Zhang, Yuan; Chen, Zhengchao; Zhou, Xiao-Hua
title: Estimation of the time-varying reproduction number of COVID-19 outbreak in China
date: 2020-05-11
journal: Int J Hyg Environ Health
DOI: 10.1016/j.ijheh.2020.113555
sha: b5d475f0a67034f1bd432df91907ef606801be48
doc_id: 980411
cord_uid: fmmxoa97

BACKGROUND: The 2019 novel coronavirus (COVID-19) outbreak in Wuhan, China has attracted world-wide attention. As of March 31, 2020, a total of 82,631 cases of COVID-19 in China were confirmed by the National Health Commission (NHC) of China.

METHODS: Three approaches, namely Poisson likelihood-based method (ML), exponential growth rate-based method (EGR) and stochastic Susceptible-Infected-Removed dynamic model-based method (SIR), were implemented to estimate the basic and controlled reproduction numbers.

RESULTS: A total of 198 chains of transmission together with dates of symptoms onset and 139 dates of infections were identified among 14,829 confirmed cases outside Hubei Province as reported as of March 31, 2020. Based on this information, we found that the serial interval had an average of 4.60 days with a standard deviation of 5.55 days, the incubation period had an average of 8.00 days with a standard deviation of 4.75 days and the infectious period had an average of 13.96 days with a standard deviation of 5.20 days. The estimated controlled reproduction numbers, $R_c$, produced by all three methods in all analyzed regions of China are significantly smaller compared with the basic reproduction numbers $R_0$.

CONCLUSIONS: The controlled reproduction number in China is much lower than one in all regions of China by now.
It fell below one within 30 days of the implementation of unprecedented containment measures, which indicates that the strong measures taken by the Chinese government were effective in containing the epidemic. Nonetheless, efforts are still needed to end the current epidemic, as imported cases from overseas pose a high risk of a second outbreak.

On December 29, 2019, four cases of pneumonia with unknown etiology were reported in Wuhan, the capital city of Hubei Province in Central China [1]. Since then, the outbreak has dramatically worsened over a short span of time and has received considerable global attention. On January 7, 2020, the pathogen of the current outbreak was identified as a novel coronavirus (2019-nCoV), and its gene sequence was quickly submitted to the WHO (the coronavirus was renamed COVID-19 by the WHO on February 12) [2, 3]. On January 30, the WHO announced the listing of this novel coronavirus-infected pneumonia (NCP) as a "public health emergency of international concern". As of March 31, 2020, the National Health Commission (NHC) of China had confirmed a total of 82,631 cases of COVID-19 in China, including 3,321 fatalities and 76,415 recoveries.

Since January 19, 2020, strict containment measures, including travel restrictions, contact tracing, entry and exit screening, non-hospital isolation, quarantine and awareness campaigns, have been implemented by the Wuhan municipal government and quickly adopted by other cities within China with the aim of minimizing virus transmission via human-to-human contact. In 2009, similar measures were employed in China in response to the outbreak of the H1N1 virus. This article investigates the change in the basic reproduction number and controlled reproduction number since the outbreak of COVID-19.
We have found that the estimated controlled reproduction numbers in all regions are significantly smaller than the basic reproduction numbers, which indicates that the containment measures carried out by the Chinese government were effective and efficient.

A form was designed to collect and standardize information from each individual confirmed case. The form had two parts.

(A) Part A directly retrieved publicly available data from provincial or municipal health commissions as well as apps and websites managed by local governments in China. The following details were included for each confirmed case: case ID, region, age, gender, first sign of symptoms, date of symptom onset, date of diagnosis, history of travel or previous residency in Hubei province, contact history with other confirmed cases, and others. More detail on the information collected in Part A of the form is given in the Supplement.

(B) Part B includes probable transmission chains, which were inferred from the history of travel to or previous residency in Hubei and other related information. The transmission chain was inferred as follows: (1) if individual X has not recently been to Hubei province, but has been in close contact with an imported case Y from Hubei, then individual X was determined to be infected by the imported case Y, and the pair formed a transmission chain; (2) if individual X has not recently been in Hubei province, but has been in close contact with a local case Y who was clearly infected before the contact, then individual X was determined to be infected by the local case Y. Note that if the individual has been to Hubei Province, the transmission history was not recorded even when contact tracing information existed. See Part B of an empty form in the Supplement.
The date of infection was not included in the form, but for some cases it can be inferred from the form as follows: (1) if the individual has not been to Hubei province recently, but was exposed within a three-day period (i.e., the individual had contact with confirmed cases within three consecutive days), then the date of infection is inferred as the middle of the exposure period; (2) if the individual previously traveled to Hubei province but returned within three days, then the date of infection is inferred as the middle of the travel period.

All data collection and processing were done manually, except that the data from Shenzhen were collected semi-automatically (a spreadsheet for Part A can be downloaded from an official website; the information in Part B still had to be inferred and entered manually). Data collectors were trained and divided into 6 groups according to the regions of confirmed cases, with 2 collectors in each group, to ensure the efficiency and accuracy of data collection. Data sources (typically the website address) and the names of the collectors were also recorded in the form of each case for quality control purposes. The collected data were consistently monitored and spot checked by two assigned supervisors. As of March 31, 2020, we had collected a total of 14,829 confirmed cases detected outside Hubei province in China.

The generation time is the time difference between the dates of infection of successive cases in a transmission chain, while the serial interval is the difference in dates of symptom onset between a primary case and its secondary case. The incubation period is defined as the time difference between contraction of the disease and symptom onset. The infectious period is the duration during which an infected individual can transmit pathogens to a susceptible host.
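The date-of-infection rule and the interval definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure (which was manual): the function names are hypothetical, the three-day limit is read as an inclusive window of at most three calendar days, and rounding the midpoint down for a two-day window is an assumption, since the text does not say how ties are resolved.

```python
from datetime import date, timedelta
from typing import Optional


def infer_infection_date(window_start: date, window_end: date,
                         max_days: int = 3) -> Optional[date]:
    """Return the middle day of an exposure (or travel) window.

    Returns None when the window is longer than `max_days` calendar days,
    in which case no infection date is inferred.
    """
    n_days = (window_end - window_start).days + 1  # inclusive window length
    if n_days < 1 or n_days > max_days:
        return None
    # Assumption: for an even-length window the earlier middle day is used.
    return window_start + timedelta(days=(n_days - 1) // 2)


def interval_days(earlier: date, later: date) -> int:
    """Signed difference in days; negative if `later` precedes `earlier`."""
    return (later - earlier).days


# Hypothetical case: exposed January 20-22, symptom onset January 28.
infection = infer_infection_date(date(2020, 1, 20), date(2020, 1, 22))
onset = date(2020, 1, 28)
incubation = interval_days(infection, onset)

# Hypothetical pair: the secondary case shows symptoms before the primary case,
# which gives a negative serial interval.
serial = interval_days(date(2020, 1, 30), onset)
```

The same `interval_days` helper covers both the incubation period (infection to onset) and the serial interval (primary onset to secondary onset), since both are plain date differences.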
In this study, the infectious period is defined as the time difference between the date of infection and the date of diagnosis, because there is strong evidence that a diseased individual is infectious even during the incubation period, and would be isolated immediately upon positive diagnosis and hence lose transmissibility. All of these are key quantities that characterize an epidemic and are essential for estimating the basic/controlled reproduction number, $R_0$/$R_c$.

Among the 645 chains of transmission identified from 14,829 confirmed cases recorded outside Hubei Province as of March 31, 2020, very few have known dates of infection, but 198 have dates of symptom onset available. Hence, we only calculate the serial interval and not the generation time. Some serial intervals are negative, which suggests that COVID-19 is infectious during incubation; the negative values are caused by differences in the length of the incubation period between individuals. The average of the serial intervals is 4.60 days and the standard deviation is 5.55 days (see Table 1). Note that the serial interval of SARS-nCoV in Hongkong was 8.4 days on average [4]. In fact, the distribution of the serial interval may be biased for estimating the generation time, especially when the disease is infectious during incubation, in that the variance of the generation time could be overestimated [5].

In addition, the dates of infection of a total of 169 cases in the collected data could be identified according to the method described in the previous section. The histogram of the infectious period is in Figure 1, and the numerical summary is in Table 1. We acknowledge that isolation could occur before the date of diagnosis, since suspected cases could be isolated without a formal positive result; hence the infectious period could be overestimated. Furthermore, the incubation periods of these 169 cases can be identified.
See Figure 1 and Table 1 for the histogram and numerical summary of the incubation period. We found no significant demographic differences between the subsets of cases used to estimate the serial interval and infectious period and the cases in the full dataset. Therefore, the inference made on the serial interval, incubation period and infectious period based on the corresponding subsets should be representative of the full dataset.

The reproduction number is defined as the (average) number of new infections generated by one infected individual during the entire infectious period in a fully susceptible population [6]. The basic reproduction number $R_0$ reflects the ability of an infection to spread under no control. When the size of the susceptible population is limited, the effective reproduction number $R_e$ is used instead of $R_0$. Similarly, the controlled reproduction number $R_c$ should be used to describe the ability of the disease to spread when interventions (such as quarantine, isolation, or traffic control) are taking place. Hence an intervention is effective to the extent that it reduces $R_c$. Note that the disease will decline and eventually die out if $R_c < 1$. The basic reproduction number can be estimated through a variety of models [7]. In this section, we compare the three most popular estimates of $R_0$ or $R_c$, as shown below.

(1) Poisson likelihood-based (ML) method

This method assumes that the total number of secondary cases infected by a single primary case follows a Poisson distribution. The number of individuals infected on Day $t$ is usually approximated by the number of newly reported confirmed cases on Day $t$, and the generation time is approximated by its corresponding serial interval. Let $N_t$ be the number of reported new confirmed cases on Day $t$. Suppose that the serial interval has a maximum of $k$ days and the number of new cases generated by an infected individual is assumed to follow a Poisson distribution with parameter $R_\cdot$ (standing for either $R_0$ or $R_c$).
[8] The probability that the serial interval of an individual is $j$ days is $p_j$, which can be estimated from the empirical distribution of the serial interval or by setting up a discretized Gamma prior on it. Note that only the non-negative values of the serial interval are used here. Thus, the likelihood function can be reduced to a thinned Poisson:

$$L(R_\cdot, p) = \prod_{t=1}^{T} \frac{e^{-\mu_t}\,\mu_t^{N_t}}{N_t!}, \qquad \mu_t = R_\cdot \sum_{j=1}^{\min(k,t)} p_j N_{t-j}.$$

The reproduction number $R_\cdot$ can be estimated by maximizing the likelihood function. Note that if the empirical distribution of the serial interval is used, or the $p_j$'s are given, then

$$\hat{R}_\cdot = \frac{\sum_{t=1}^{T} N_t}{\sum_{t=1}^{T} \sum_{j=1}^{\min(k,t)} p_j N_{t-j}}.$$

(2) Exponential growth rate-based (EGR) method

In the early period of an epidemic, the number of infected cases rises exponentially. As in the ML method, the number of individuals infected on Day $t$ is approximated by the number of newly reported confirmed cases, and the generation time is approximated by its corresponding serial interval. Suppose the exponential epidemic growth rate (Malthusian coefficient) is $r$, which can be estimated by fitting a least-squares line to the daily number of reported new confirmed cases on a log scale, namely $\log N_t$. Let $g(\cdot)$ denote the probability density function of the serial interval. The reproduction number can then be calculated according to the Euler-Lotka equation in moment generating form [9]:

$$R_\cdot = \frac{1}{\int e^{-r\tau} g(\tau)\, d\tau}.$$

(3) Stochastic SIR model-based (SIR) method

Here we consider a stochastic Susceptible-Infected-Removed (SIR) model rather than a standard deterministic one. The major advantage of a stochastic dynamic model is that it accounts better for real variability and offers more opportunity to quantify uncertainty [10]. We denote by $S(t)$, $I(t)$ and $R(t)$ the numbers of susceptible individuals, infected but not lab-confirmed cases (including those in the incubation period) and removed individuals (including recoveries, fatalities and confirmed cases) at time $t$, respectively, and note that $N = S(t) + I(t) + R(t)$ is a constant. The removed count $R(t)$ should not be confused with the reproduction number $R_\cdot$. There was some evidence indicating that COVID-19 is infectious during its incubation period.
[11] Due to this unique nature of COVID-19, individuals in state $I$ are contagious even during the incubation period. Under the assumption of equal transmissibility during the whole infectious period, individuals in state $I$ pass pathogens to the susceptible population at a constant transmission rate $\beta$. The removed individuals are no longer infectious since they have been isolated in hospital. Suppose that the infectious period of an individual is a random variable $D \sim \mathrm{Exp}(\gamma)$; then the reproduction number is $R_\cdot = \beta\, E[D] = \beta/\gamma$, where $\gamma$ and $\beta$ are the removal rate and transmission rate respectively in the system of ordinary differential equations (ODE) below:

$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I.$$

The maximum likelihood method is used to estimate the model parameters, where the likelihood is obtained by the sequential Monte Carlo method, and the parameters are estimated using the Iterated Filtering algorithm (IF2) [12] implemented as mif in the R package pomp [13]. Here we set $S(0)$ equal to the population of the region, $R(0) = 0$, $I(0)$ to 14 times the average number of confirmed cases from Day 0 to Day 7, and $\gamma = 1/13.96$, the inverse of the mean infectious period obtained from the collected data described before.

Whether the transmission rate $\beta$ is constant over the whole infectious period is debatable; some studies indicate that $\beta$ may vary and possibly peak on or before symptom onset [14]. However, this is beyond the scope of this study. In addition, this model does not consider asymptomatic and undocumented cases, quarantine measures or human mobility; a more sophisticated and realistic model reflecting the unique features of COVID-19 is worthy of a separate paper. Nonetheless, this SIR model should at least provide some useful insight into the reproduction numbers, and is a better alternative to the popular SEIR model, which assumes a noninfectious incubation period.

In this section we estimate the basic reproduction number $R_0$ and the controlled reproduction number $R_c$.
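To make the stochastic SIR dynamics described above concrete, the following Python sketch simulates one realization with a daily chain-binomial step. It is only an illustration under stated assumptions: the study fits the model with iterated filtering in the R package pomp, whereas this sketch does no fitting at all; the daily binomial discretization, the function names, and the example values of the transmission rate (`beta`), the population and the initial infected count are all hypothetical. Only the removal rate of 1/13.96 per day comes from the text.

```python
import numpy as np

MEAN_INFECTIOUS_PERIOD = 13.96  # days, the mean infectious period from the text


def reproduction_number(beta: float, gamma: float) -> float:
    """Reproduction number of the SIR model: transmission rate / removal rate."""
    return beta / gamma


def simulate_sir(beta: float, gamma: float, population: int, i0: int,
                 days: int, seed: int = 0) -> np.ndarray:
    """Simulate daily (S, I, R) counts with binomial transitions.

    Assumption: one-day time steps, with per-step probabilities derived
    from the ODE rates (an approximation to the continuous-time process).
    """
    rng = np.random.default_rng(seed)
    s, i, r = population, i0, 0        # S(0) = population, R(0) = 0, as in the text
    n = s + i + r                      # total size stays constant
    path = [(s, i, r)]
    for _ in range(days):
        p_infect = 1.0 - np.exp(-beta * i / n)  # chance a susceptible is infected today
        p_remove = 1.0 - np.exp(-gamma)         # chance an infected is removed today
        new_infected = rng.binomial(s, p_infect)
        new_removed = rng.binomial(i, p_remove)
        s, i, r = s - new_infected, i + new_infected - new_removed, r + new_removed
        path.append((s, i, r))
    return np.array(path)


gamma = 1.0 / MEAN_INFECTIOUS_PERIOD
path = simulate_sir(beta=0.2, gamma=gamma, population=1_000_000, i0=140, days=30)
```

Running the simulation with different seeds gives a spread of trajectories for the same parameters, which is the kind of variability a deterministic ODE solution cannot express.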
Since January 19, 2020, various containment measures have been strictly implemented, especially after the State Council agreed on January 20 to include COVID-19 under the Management of the Infectious Diseases Law and the Health and Quarantine Law. Wuhan was locked down on January 23, and the inter-provincial flow of people was greatly reduced (https://qianxi.baidu.com), which effectively controlled the exportation of infected individuals. Based on the average 13.96-day infectious period estimated from our collected data, we expect a flatter rate of increase starting from the end of January. Figure 2 plots the number of daily new cases on a log scale against date, and, as anticipated, the trend supports this expectation. Therefore, the basic reproduction number $R_0$ and the early-phase controlled reproduction number $R_c$ are estimated from the collected data in two separate periods, i.e., from January 21 (the starting date of daily nationwide updates of confirmed cases) to January 28, and from January 29 to February 5, respectively. The estimates are given in Table 2 and Table 3.

Despite the disagreement between the estimation methods, all three indicate notable reductions from $R_0$ to $R_c$, which suggests an improvement from January 21 to February 5. This is possibly due to the effective interventions and prompt actions of the local and central governments to minimize further spreading. We also notice that EGR yields smaller estimates of $R_c$ than the other methods. This might be because the number of infected patients no longer grows exponentially after such strict containment measures; hence EGR is only recommended for estimating $R_0$ in the early stage of an epidemic.

Furthermore, the time-varying controlled reproduction number $R_c(t)$ can be estimated through the Poisson likelihood (ML) method, where $t$ runs from February 1 to 29, 2020. Note that there have been very few new cases confirmed outside Hubei province since March 1, 2020; hence we only updated $R_c(t)$ up to February 29, 2020.
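As an illustration of how a Poisson likelihood estimate can be turned into a time-varying one, the sketch below applies the closed-form maximum likelihood estimator for known serial-interval probabilities to a sliding window of daily counts. It is a minimal sketch under assumptions, not the authors' code: the function names, the `window` argument and the toy data are hypothetical, and the serial-interval probabilities are taken as given for lags of 1 day and longer rather than estimated jointly.

```python
import numpy as np


def ml_reproduction_number(cases, si_probs) -> float:
    """Closed-form Poisson MLE of the reproduction number.

    cases[t]      : new confirmed cases on day t (t = 0, 1, ...)
    si_probs[j-1] : probability that the serial interval is j days (j = 1..k)

    Estimate = (sum of cases on days 1..T) divided by
               (sum over those days of sum_j p_j * cases[t - j]).
    """
    cases = np.asarray(cases, dtype=float)
    probs = np.asarray(si_probs, dtype=float)
    k = len(probs)
    observed = 0.0
    pressure = 0.0
    for t in range(1, len(cases)):
        lags = range(1, min(k, t) + 1)
        pressure += sum(probs[j - 1] * cases[t - j] for j in lags)
        observed += cases[t]
    if pressure <= 0:
        raise ValueError("no infection pressure in the window; estimate undefined")
    return observed / pressure


def rolling_reproduction_number(cases, si_probs, window: int) -> list:
    """Apply the estimator to each run of `window` consecutive days."""
    cases = list(cases)
    return [ml_reproduction_number(cases[end - window:end], si_probs)
            for end in range(window, len(cases) + 1)]


# Toy data: counts double daily and every serial interval is exactly 1 day,
# so each case produces 2 cases one day later.
toy_cases = [1, 2, 4, 8, 16]
overall = ml_reproduction_number(toy_cases, [1.0])
rolling = rolling_reproduction_number(toy_cases, [1.0], window=3)
```

Each windowed estimate only reflects transmission inside its window, so the series responds to changes in the trend with a delay roughly equal to the window length.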
For each Day $t$, the number of daily reported new cases from Day $t - 9$ to Day $t$ is used to estimate $R_c(t)$. Figure 3 plots the estimated controlled reproduction number $R_c(t)$ along with its 95% confidence interval (CI) for selected regions of China. Note that the estimated $R_c(t)$ reflects the average spreading ability of the epidemic in a short period prior to Day $t$. As a result, the real-time $R_c(t)$ might be overestimated if the general trend of $R_c(t)$ is declining.

Despite the continuous increase in new confirmed cases between January 21, 2020 and February 5, 2020, the estimated controlled reproduction numbers $R_c$ produced by all three methods in all analyzed regions are significantly smaller than the basic reproduction numbers $R_0$. Outside Hubei province, $R_c(t)$ dropped below 1 around February 11, 2020, while in Hubei $R_c(t)$ fell below 1 around February 19, 2020, which indicates that the containment measures carried out by the Chinese government were effective and efficient. The controlled reproduction number has now been much lower than one in all regions of China for some time; however, it has not yet reached zero. A possible explanation is that the number of foreign imported cases has grown significantly since late February, posing a high risk of a second outbreak. Efforts are needed to end the current epidemic, especially improved quarantine measures at the border.

In this study, we estimated the reproduction number of COVID-19 in China based on three approaches, namely the Poisson likelihood-based method (ML), the exponential growth rate-based method (EGR) and the stochastic Susceptible-Infected-Removed dynamic model-based method (SIR). The EGR method can only be used in the early period of an epidemic, when the number of confirmed cases grows exponentially.
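The EGR calculation referred to above can be sketched as two steps: fit a straight line to the log of the daily counts to get the growth rate, then plug that rate into the Euler-Lotka relation using a sample of serial intervals. The Python below is a hedged illustration only: the function names and toy data are hypothetical, and replacing the serial-interval density by the empirical average over an observed sample is an assumption about how the integral might be evaluated.

```python
import numpy as np


def growth_rate(cases) -> float:
    """Least-squares slope of log(daily cases) against day index."""
    cases = np.asarray(cases, dtype=float)
    if np.any(cases <= 0):
        raise ValueError("log scale needs strictly positive daily counts")
    days = np.arange(len(cases))
    slope, _intercept = np.polyfit(days, np.log(cases), 1)
    return float(slope)


def egr_reproduction_number(r: float, serial_intervals) -> float:
    """Euler-Lotka estimate: 1 / E[exp(-r * serial interval)].

    The expectation is approximated by the mean over the observed sample.
    """
    si = np.asarray(serial_intervals, dtype=float)
    return 1.0 / float(np.mean(np.exp(-r * si)))


# Toy data: counts grow at exactly 10% per day on the log scale,
# and every serial interval is 5 days.
toy_cases = 10.0 * np.exp(0.1 * np.arange(8))
r_hat = growth_rate(toy_cases)
r_number = egr_reproduction_number(r_hat, [5.0, 5.0, 5.0])
```

When the counts stop growing exponentially the fitted slope no longer describes the epidemic, which is why the estimate is only meaningful in the early phase.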
The SIR method is not able to provide a time-varying reproduction number, and the existing literature on COVID-19 shows that the SIR model is likely to overestimate the basic reproduction number, since a large proportion of susceptible individuals were isolated due to the strong control measures implemented [15, 16, 17]. Hence, the ML method is preferred in this study. Note that this study omits the effect of human mobility before the lockdown starting from January 23, 2020, which may cause an overestimation of the basic reproduction number outside Hubei. A more realistic transmission model with spatial spread, such as a metapopulation disease model, could be used, but this is beyond the scope of this study [18].

The dataset used in this study is based on the confirmed cases reported by the NHC of China. However, during our period of data collection, the official guidelines for diagnosis and treatment of COVID-19 underwent six updates. The criteria for confirmation have evolved from the original "whole genome sequencing of the respiratory excretion" to "positive viral nucleic acid results by the RT-PCR of the respiratory excretion or viral gene sequence" in the 5th edition and, as of now, the inclusion of positive nucleic acid results from blood samples. The confirmation process has been simplified by removing the accreditation of confirmed cases by the national expert committee. The fourth edition of the official guidelines for diagnosis and treatment granted the accrediting authority to municipalities [19]. In addition, the medical resources in Hubei province, especially in Wuhan, received a remarkable boost from early February 2020. All of these changes might have resulted in a temporary surge of confirmed cases and led to an overestimation of $R_c$ during mid-February, especially in Hubei Province.
The current containment measures in China mainly aim to cut human-to-human transmission via respiratory droplets, and have achieved significant success by reducing the reproduction number below one within 30 days of the implementation of the measures (namely by February 19, 2020). However, other transmission pathways, including fecal-oral transmission and aerosol transmission, cannot yet be excluded based on current evidence. If other transmission mechanisms do exist, the values of $R_c$ could increase in the future unless further measures interrupt these transmission pathways.

The China NHC has reported the number of asymptomatic cases since April 1. Asymptomatic cases are typically hard to detect. If asymptomatic cases are not isolated, there may be a second outbreak. Note that the incubation period, serial intervals and infectious period estimated in this study only apply to symptomatic patients. Furthermore, although the outbreak is currently contained effectively in China, the number of imported cases will potentially grow with the development of the global pandemic. As reported by the China NHC, there were a total of 788 confirmed imported cases and 4 domestic cases related to these imported cases during March 2020. In brief, the main goal of epidemic prevention in China has shifted from preventing the transmission of domestic confirmed cases to preventing the spread of asymptomatic cases and foreign imported cases.

We thank Taojun

Table 2: Estimates and 95% confidence intervals of basic reproduction number in some selected provinces (or cities) of China, from Jan 21 to Jan 28, 2020. Columns: ML, EGR, SIR.

Table 3: Estimates and 95% confidence intervals of controlled reproduction number in some selected provinces (or cities) of China.

1. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
2. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet
3. A novel coronavirus outbreak of global health concern
4. Transmission dynamics and control of severe acute respiratory syndrome
5. Estimation in emerging epidemics: Biases and remedies
6. Infectious diseases of humans: dynamics and control
7. Comparison of methods to Estimate Basic Reproduction Number (R0) of influenza, Using Canada 2009 and 2017-18 A (H1N1) Data
8. A likelihood-based method for real-time estimation of the serial interval and reproductive number of an epidemic
9. How generation intervals shape the relationship between growth rates and reproductive numbers
10. Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola
11. Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany
12. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps
13. Statistical inference for partially observed Markov processes
14. Temporal dynamics in viral shedding and transmissibility of COVID-19
15. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
16. Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions
17. An updated estimation of the risk of transmission of the novel coronavirus (2019-nCov)
18. Modeling the spatial spread of infectious diseases: the GLobal Epidemic and Mobility computational model

Yuhao Deng: writing, data analysis
Wenjie Hu: data analysis
Jiarui Sun: data analysis
Qiushi Lin: data collection
Feng Zhou: data collection
Cheng Heng Pang: writing
Yuan Zhang: writing
Zhengchao Chen: writing
Xiao-Hua Zhou: overall design

We have no financial relationships (regardless of amount of compensation) with any entities. There is no conflict of interest.