key: cord-0921038-tvzn0112
authors: Tian, Huaiyu; Liu, Yonghong; Li, Yidan; Wu, Chieh-Hsi; Chen, Bin; Kraemer, Moritz U. G.; Li, Bingying; Cai, Jun; Xu, Bo; Yang, Qiqi; Wang, Ben; Yang, Peng; Cui, Yujun; Song, Yimeng; Zheng, Pai; Wang, Quanyi; Bjornstad, Ottar N.; Yang, Ruifu; Grenfell, Bryan T.; Pybus, Oliver G.; Dye, Christopher
title: An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China
date: 2020-03-31
journal: Science
DOI: 10.1126/science.abb6105
sha: 7ec7962b9724b94f0085b4974f86575066a0c9df
doc_id: 921038
cord_uid: tvzn0112

Responding to an outbreak of a novel coronavirus (agent of COVID-19) in December 2019, China banned travel to and from Wuhan city on 23 January and implemented a national emergency response. We investigated the spread and control of COVID-19 using a unique data set including case reports, human movement and public health interventions. The Wuhan shutdown was associated with the delayed arrival of COVID-19 in other cities by 2.91 days (95%CI: 2.54-3.29). Cities that implemented control measures pre-emptively reported fewer cases, on average, in the first week of their outbreaks (13.0; 7.1-18.8) compared with cities that started control later (20.6; 14.5-26.8). Suspending intra-city public transport, closing entertainment venues and banning public gatherings were associated with reductions in case incidence. The national emergency response appears to have delayed the growth and limited the size of the COVID-19 epidemic in China, averting hundreds of thousands of cases by 19 February (day 50).

In order to quantify the association between the Wuhan travel shutdown (23 January 2020) and COVID-19 spread, we used data collected between 31 December 2019 and 28 January 2020. The association between distance, human movement, interventions and timing of COVID-19 spread was assessed by a linear regression. Among five possible regression models examined (Table S3) , the model judged best by the Akaike Information Criterion) was:

Dependent variable Yi is the arrival time (day) of the first confirmed case in city i, a measure of the spatial spread of COVID-19. The βj's are the regression coefficients. α is the intercept. TotalFlowi represents the passenger volume from Wuhan to city i by airplane, train and road during the whole of 2018. Popi is the population of city i. Lati and Longi represent the latitude and longitude of city i. The binary dummy variable Shutdowni is used to identify whether the arrival time of COVID-19 in newly-infected city i is associated with the Wuhan travel ban. For each city, Shutdown was set to 0 for arrival before 23 January 2020 and 1 for arrival on or after 23 January 2020. The regression analysis was performed using the R version 3.4.0. All of the candidate models examined (Table S3) produced similar estimates for the estimated delay in the arrival time due to the shutdown.

Diagnostic analyses were performed to check whether model assumptions have been violated. "Residuals" in this section refers to the residuals of linear regression model for travel ban. Spatial coordinates (Lat and Long) and passenger volume from Wuhan (TotalFlow) were included to adjust for possible spatial and temporal autocorrelation, respectively. However, we performed additional checks to investigate whether there was spatial autocorrelation among the residuals, which could contribute to data dependency.

If there were spatial autocorrelation in the residuals, then the pairwise residual differences of cities closer together would tend to be smaller than those of cities far apart. In other words, we would expect a high correlation between the pairwise differences in residuals and pairwise distances of cities. Such a correlation was not evident (r=0.03).

Differential accessibility from Wuhan to other cities may contribute to data-dependence. Therefore, we examined the timing of peak inflow from Wuhan City and the arrival time of COVID-19 in each city. The result shows no evident correlation between them (r=-0.07, P=0.25). In addition, we calculated the correlation between the residuals and peak time of Wuhan inflow during 11 to 23 January (15 days before Chinese New Year to the Wuhan shutdown). Again, we found no evident correlation (r=0.08).

We did not detect issues regarding heteroscedasticity (studentized Breuch-Pagan test P=0.20) nor normality (Shapiro-Wilk test P=0.06). The residuals generally lie close to the straight line in the Q-Q plot (Fig. S6) . Therefore, it is not evident that the distribution of errors departs from a normal distribution.

The associations between transmission control measures and the number of cases reported during the first week of an outbreak in a new location

The Level 1 national emergency response required suspected and confirmed cases of COVID-19 to be isolated and reported immediately in all cities. Using data for 296 cities across China, we investigated the associations between the epidemic intensity and three transmission control measures: closure of entertainment venues and banning public gatherings (B); suspension of intra-city public transport (S); and prohibition of travel by any means to and from other cities (P). The timing of implementation was recorded for each control measure in each city, including the delay in implementation since 31 December 2019 (day 0 of the epidemic). Each city was regarded as implementing an intervention when the official policy was announced publicly (Table S1 ). Other transmission control measures included delineating control areas, closure of schools, isolation of suspected and confirmed cases, and the disclosure of information. We could not investigate the associations between these interventions and development of the epidemic because they were reportedly applied in all cities uniformly and without delay.

We perform an association analysis using Poisson regression to investigate how the interventions B, S and P were associated with the dependent (Poisson) variable Yi-the total number of confirmed cases that were reported during the first seven days of the outbreak in city i. We detected heteroscedasticity in the Pearson residuals of the Poisson regression model. In attempt to remedy this we applied negative-binomial regression and quasi-poisson regression. Neither methods resolved the issue with heteroscedasticity. Therefore, we resorted to regression models that does not have dependent variables as counts.

To check and confirm the validity of results obtained with the Poisson regression model, we repeated the analysis with a log-linear model. The first step was to standardize case counts by dividing by the number of people in each city (incidence per capita) and the number of people arriving from Wuhan, giving dependent variable Y * . The loglinear model is then:

The subscripts of the coefficients (j) are consistent with a Poisson regression model. We found that the relationship between Y * and the timing of B is nonlinear. To correct for this, we discretize the timing variable Ti,B to obtain an ordinal variable Ci,B with four categories, namely 1) 30 December 2019-22 January 2020, 2) 23 January 2020, 3) 24

January 2020 and 4) 25 January onwards. To preserve statistical power, proportional change of E[Y * ] was assumed when moving onto a later category. For example, the difference in E[log(Yi * )] time categories 1 and 2 was assumed the same as that between time categories 2 and 3.

To avoid heteroscedasticity, variables describing the distance from Wuhan, and the implementation and timing of P (prohibiting inter-city travel) were removed. Further exploration of the model showed that these variables did not help to explain the variation in Y * . Table S3 presents the results of the log-linear regression analysis, which uphold the conclusions reached from the Poisson regression model.

As the purpose of the log-linear regression analysis is to validate the results from the Poisson regression analysis, we performed various diagnostic checks on the log-linear regression model. In summary, we did not find any issues that might invalidate the results. The details of the diagnostic analysis are presented below. "Residuals" in this section refers of the residuals of the log-linear regression model.

As with the regression analysis on the Wuhan city travel ban, we explored whether there is any spatial autocorrelation in the residuals. There was no evident correlation between the pairwise differences in residuals and those of cities (r=-0.09). We also examined whether the timing of Wuhan inflow induced dependence in the data. To this end, we evaluated the correlation between the residuals and peak time of inflow from Wuhan to each city during the period from 11 to 23 January. Such correlation is not evident (r=-0.04),

In terms of temporal correlation within the residuals, there was no evidence that the mean residuals with arrival time on day j associated with the residuals with arrival time on day j+1 (P=0.80).

We found no evidence of heteroscedasticity in the residuals (studentised Breusch-Pagan test P=0.148). The Shapiro-Wilk test provides evidence against residual normality (P<0.01). However, in a sufficiently large dataset such as this one, the Shapiro-Wilk test is very sensitive to departure from normality but does not indicate the severity level of the departure. The Q-Q plot (Fig. S7) indicates a moderate departure from the normal distribution in the residuals, where the histogram indicates that there is a longer right tail (Fig. S7) . As the distribution is not severely asymmetric, and our sample size is sufficiently large, this should not be an issue by the central limit theorem (21, 22) .

Nevertheless, to ensure the robustness of the conclusions, we also evaluated bootstrap estimates of the regression coefficients, confidence interval and P-value (Table S3) using the R package boot (23, 24) . The bootstrap estimates are very similar to the least squared estimates from log-linear regression model (data not shown).

For each province, we estimated the effect of transmission control measures by fitting an SEIR model (25) where S, E, I, and R are the number of susceptible, exposed (latent), infectious, and removed individuals on day t in province i. This standard SEIR model makes some simplifying assumptions: for example, the human population is homogeneous (e.g. not stratified by age or sex), contacts between infectious and susceptible people are also homogeneous (e.g. not stratified by social group) and infection is fully immunizing (1). The model and its assumptions apply only to the first 50 days of the epidemic, the scope of the analysis in this study.

Even with these assumptions, the model describes the data quite accurately for the whole country (Fig. 4A) , and reasonably well for each province, with the notable exceptions of Beijing and Shanghai, which had relatively few cases among their well-connected populations (Fig. 4A, Fig. S3 ). The accuracy of the model's description of national data is remarkably good, given that the model fitting was done separately for each province, and then aggregated to obtain the national total.

The basic reproductive number of the model is R0 =β/γ, where β is the per capita transmission rate per day and 1/δ and 1/γ are, respectively, the mean latent and infectious periods. emergency responses (stage 1), with effects measured as C1 (Fig. S4) . Because the effects of control measures varied among provinces during the scale-up, C1 was grouped into high C1_high, medium C1_medium, and low C1_low. The allocation of provinces to groups was made by proposing several alternative hypotheses and testing each by model fitting ( Table 3, Table S4 ). Stage 2 of control (C2) began when more than 95% of cities in a province had implemented control measures, including the closure of entertainment venues, suspension of intra-city public transport or prohibition of travel by any means to and from other cities (see above). In Hubei Province (except Wuhan city), stage 2 included the use of shelter or "Fang Cang" hospitals from early February onwards.

Model fitting was performed using the Metropolis-Hastings Markov chain Monte Carlo (MCMC) algorithm with the MATLAB (version R2016b) toolbox DRAM (Delayed Rejection Adaptive Metropolis). Prior estimates of the mean and (Gaussian) variance of R0, δ, and γ were derived from epidemiological surveys (27) . There was no evidence to inform a prior for the reporting rate ρ, the proportion of cases that were reported among all latent and infectious individuals in Wuhan. Systematic surveys of infection (e.g. by serological testing) have not yet been reported. In the absence of any guiding data, ρ was given a prior uniform distribution between 0 and 1.

After a burn-in of 1 million iterations, we ran the MCMC simulation for a further 10 million iterations, sampled at every 1000th step to avoid auto-correlation. Trace plots and Gelman and Rubin diagnostics were used to judge convergence of the MCMC chains (Fig. S4 ). Each fitting exercise was repeated three times to test the robustness of results, which converged to the same estimates on each occasion (Fig. S5) . We used the fitted SEIR model, with posterior estimates of parameter values, to simulate outbreaks outside Wuhan, with and without the Wuhan travel ban and with and without the national emergency response (Fig. 4B) . Synchrony is measured by the correlation between the number of cases reported in two provinces on each day, using a spatial non-parametric correlation function (28) . Table S1 . Candidate statistical models used to study the association between the Wuhan city travel ban and the arrival time of COVID-19 in other cities (see Table 1 of the main text). The chosen model is shown in bold face. Table S4 . Candidate models used to characterize the association between control measures and the new confirmed cases reported each day in different provinces (see Table 3 of the main text). Based on the deviance information criterion (DIC), the chosen model is shown in bold face. High, medium and low represent the efficacy of control measures in three groups of provinces.

China Novel Coronavirus Investigating and Research Team, A novel coronavirus from patients with pneumonia in China

Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding

A new coronavirus associated with human respiratory disease in China

A pneumonia outbreak associated with a new coronavirus of probable bat origin

Roles of Different Transport Modes in the Spatial Spread of the 2009 Influenza A(H1N1) Pandemic in Mainland China

A novel coronavirus outbreak of global health concern

COVID-19 control in China during mass population movements at New Year

The effect of human mobility and control measures on the COVID-19 epidemic in China

Tibet activates highest-level public health alert

Travelling waves and spatial hierarchies in measles epidemics

The hidden geometry of complex, network-driven contagion phenomena

Quantifying the impact of human mobility on malaria

Strategies for containing an emerging influenza pandemic in Southeast Asia

Global trends in emerging infectious diseases

The challenge of emerging and re-emerging infectious diseases

Synchrony, waves, and spatial hierarchies in the spread of influenza

Impact of human mobility on the emergence of dengue epidemics in Pakistan

Code for: An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China

Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study

Essential medical statistics

Probability and Statistics

boot: Bootstrap R (S-Plus) functions

Bootstrap methods and their application

Infectious Diseases of Humans: Dynamics and Control

Discrete time modelling of disease incidence time series by using Markov chain Monte Carlo methods

Spatial population dynamics: Analyzing patterns and processes of population synchrony