key: cord-0696351-sz26gchn authors: Bandoy, DJ Darwin R; Weimer, Bart C title: Pandemic dynamics of COVID-19 using epidemic stage, instantaneous reproductive number and pathogen genome identity (GENI) score: modeling molecular epidemiology date: 2020-03-20 journal: nan DOI: 10.1101/2020.03.17.20037481 sha: e0565d1bce8a2ea67608adee91eb894e86087457 doc_id: 696351 cord_uid: sz26gchn

Background: Global spread of COVID-19 created an unprecedented infectious disease crisis that progressed to a pandemic with >180,000 cases in >100 countries. Reproductive number (R) is an outbreak metric estimating the transmission of a pathogen. Initial R values were published based on the early outbreak in China, with a limited number of cases with whole genome sequencing. Initial comparisons failed to show a direct relationship between viral genomic diversity and epidemic severity for SARS-CoV-2.

Methods: Each country's COVID-19 outbreak status was classified according to epicurve stage (index, takeoff, exponential, decline). Instantaneous R estimates (Wallinga and Teunis method) with a short and a standard serial interval were used to examine asymptomatic spread. Whole genome sequences were used to quantify the pathogen genome identity (GENI) score, which was used to estimate transmission time and epicurve stage. Transmission time was estimated based on an evolutionary rate of 2 mutations/month.

Findings: The country-specific R revealed variable infection dynamics between and within outbreak stages. Outside China, R estimates revealed propagating epidemics poised to move into the takeoff and exponential stages. Population density and local temperatures had a variable relationship to the outbreaks. GENI scores differentiated countries in the index stage with cryptic transmission. Integration of incidence data with genome variation showed that cases increased directly with genome variation.

Interpretation: R was dynamic for each country and during the outbreak stage.
Integrating the outbreak dynamics, dynamic R, and genome variation revealed a direct association between cases and genome variation. Synergistically, GENI provides an evidence-based transmission metric that can be determined by sequencing the virus from each case.

We calculated an instantaneous country-specific R at different stages of outbreaks and formulated a novel metric for infection dynamics using viral genome sequences to capture gaps in untraceable transmission. Integrating epidemiology with genome sequencing allows evidence-based dynamic disease outbreak tracking with predictive evidence.

Outbreaks are defined by the reproductive number (R) 1,2, a common measure of transmission. Probability of further disease spread is evaluated based on the threshold value [...] to 3.9) using serial intervals for 424 patients in Wuhan, China 4. Recalculation with 2033 cases based on the refined estimate from China. However, these estimates fall short in predicting the spread of the pandemic and its expansion within individual locations, suggesting that R is unlikely to be constant and is instead dynamic for each outbreak location, so that assuming a constant value results in underestimates of the spread rate. This limitation hinders the analysis of epidemic dynamics because, as previously noted, the parameter is context specific and dynamic 1,2. Hence, there is a need to rapidly estimate country-specific R values during the epidemic. This will provide global comparisons of expansion at each location.

The Wallinga and Teunis method for R estimation requires input of outbreak incidences and the serial interval (i.e. the period between the manifestation of symptoms in the primary case and the onset of symptoms in secondary cases) 6. This approach was implemented in a web resource to estimate R during epidemics 7. A key advantage of the approach is the ease of production of credible intervals compared to other maximum likelihood estimation approaches.
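As a rough illustration of what a Wallinga and Teunis style estimator computes from those two inputs (daily incidence and a serial interval distribution), the following Python sketch assigns each case to earlier cases in proportion to the serial interval weights and sums the expected number of secondary cases per case. The function name, the daily-count input format, and the toy numbers are assumptions made for this illustration; it is not the code behind the web resource cited above, and it produces point estimates only, without credible intervals.

```python
def wallinga_teunis(incidence, serial_weights):
    """Case reproduction number per onset day (Wallinga-Teunis style sketch).

    incidence[t]        -- number of cases with symptom onset on day t
    serial_weights[d-1] -- assumed probability that the serial interval is d days
    Returns one estimate per day (None on days without cases).
    """
    def w(delay):
        if 1 <= delay <= len(serial_weights):
            return serial_weights[delay - 1]
        return 0.0

    n_days = len(incidence)
    offspring = [0.0] * n_days  # expected secondary cases per single case on day j

    for i in range(n_days):  # onset day of the infectee
        if incidence[i] == 0:
            continue
        # Total infection pressure on day i from all earlier cases.
        pressure = sum(incidence[k] * w(i - k) for k in range(i))
        if pressure == 0:
            continue  # no plausible infector in the data (e.g. an imported case)
        for j in range(i):  # onset day of a candidate infector
            # Relative likelihood that a case on day i was infected
            # by one particular case on day j.
            p_ij = w(i - j) / pressure
            offspring[j] += incidence[i] * p_ij

    return [offspring[t] if incidence[t] > 0 else None for t in range(n_days)]


# Toy epicurve: cases double daily and every serial interval is exactly 1 day.
estimates = wallinga_teunis([1, 2, 4], [1.0])
```

In this toy case the first two days each yield an estimate of 2, while the last day yields 0 because its secondary cases have not yet been observed; estimates for the most recent days are always biased downward for that reason.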
Yet to be done is integration of viral genetic variation with R estimates, but one study found that there was no obvious relationship between R, severity of the epidemic and COVID-19 genome diversity 20.

COVID-19 has spread to all continents except Antarctica and was declared a pandemic by the World Health Organization (WHO) in March 2020 [8] [9] [10]. The outbreak dynamics differ between countries and also vary within individual countries. In part this is due to varying and diverse healthcare systems, socio-cultural contexts, and the rigor of testing. Considering the lack of containment globally, except in Singapore, Hong Kong, and Taiwan, we hypothesized that previously calculated R values do not provide reliable estimates because they are more dynamic than is being considered, and that influx of new cases and viral mutation are likely sustaining expansion. While viral sequencing is occurring, it is not being effectively integrated with epidemiological information because there is no existing framework for that to systematically occur.

This preprint (which was not certified by peer review) was posted March 20, 2020. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. https://doi.org/10.1101/2020.03.17.20037481

[...] genome is changing over the outbreak, but there is controversy about the impact and specifics of the exact mutations. In this study, we used incidence data to derive R and compared country-specific COVID-19 infection dynamics with viral population genome diversity. By incorporating R, epidemic curve timing, and viral genome diversity we created a systematic framework that deduced how viral genome diversity can be used to describe epidemiological features of an outbreak before new cases were observed.
This was done by creating a genome diversity metric that was directly and systematically integrated to provide context and that allowed quantification, with genomic evidence, of infection dynamics globally that are divergent from the early estimates. We call this approach the pathogen genome identity (GENI) scoring system. Using GENI differentiated each stage of the outbreak. It also indicated cryptic local transmission missed by surveillance systems. This is a defining advantage of using sequences, as previous cryptic transmission can be inferred from the genomic sequences.

[...] (Baltimore, MD, USA) that was accessed on March 1, 2020 15. We constructed epidemic curves, or epicurves, from the incidence data and classified country status accordingly. We defined four groups that characterize increasing expansion with a decline phase.

The extracted time series case data served as the input for determining the instantaneous reproductive number on a daily basis, to effectively capture dynamic changes due to newly detected cases and reductions in cases due to social distancing and nonpharmaceutical interventions. A prior mean of 2 and a prior standard deviation of 5 were selected for R to allow for fluctuations in the reporting of cases in the exponential phase. As access to case-level epidemiological data is limited, a parametric-with-uncertainty (offset gamma) distributional estimate of the serial interval was used. Means of 2 and 7 days, each with a standard deviation of 1, were used to capture the short and standard serial interval assumptions, using 50 samplings of the serial interval distribution.
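To make the parameter choices above more concrete, the Python sketch below converts a mean and standard deviation into gamma shape and scale parameters (as would apply to the prior on R with mean 2 and standard deviation 5) and builds a simple daily serial interval distribution for the 2-day and 7-day scenarios. All function names are hypothetical. The discretization simply evaluates the gamma density at whole days and renormalizes, and it omits both the one-day offset and the resampling of serial interval parameters, so it is a simplification under stated assumptions rather than a reproduction of the procedure used in the study.

```python
import math


def gamma_shape_scale(mean, sd):
    """Moment-matched gamma parameters: mean = shape*scale, var = shape*scale**2."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def gamma_pdf(x, shape, scale):
    """Gamma probability density at x (zero for non-positive x)."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def discretized_serial_interval(mean, sd, max_days=30):
    """Approximate daily serial interval weights for days 1..max_days.

    Assumption: the density is sampled at whole days and renormalized,
    which is cruder than integrating over each day.
    """
    shape, scale = gamma_shape_scale(mean, sd)
    weights = [gamma_pdf(day, shape, scale) for day in range(1, max_days + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


# Prior on R described in the text: mean 2, standard deviation 5.
prior_shape, prior_scale = gamma_shape_scale(2, 5)

# Short (2-day) and standard (7-day) serial interval scenarios, sd of 1 day each.
short_si = discretized_serial_interval(2, 1)
standard_si = discretized_serial_interval(7, 1)
```

The wide prior (shape 0.16, scale 12.5) places little constraint on R, which is consistent with the stated aim of tolerating large fluctuations in reported cases.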
The Wallinga and Teunis method, as implemented by Ferguson 7, is a likelihood-based estimation procedure that captures the temporal pattern of effective reproduction numbers from an observed epidemic curve. R was calculated using the web application EpiEstim App (https://shiny.dide.imperial.ac.uk/epiestim/) 7. Descriptive statistics were used to compute the mean and confidence intervals of the instantaneous reproductive number.

The GENI score is anchored on the principle of rapid pathogen evolution between transmission events. This requires defining a suitable reference sequence for the outbreak, which in the early stages is the sequence nearest in time to the index case. In the case of COVID-19, the reference sequence is Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1 NC_045512.2 16. Publicly available virus sequences were retrieved from GISAID [...] average mutation per isolate was divided by the total epidemic curve days to derive a daily epidemic mutation rate and scaled to a monthly rate. We calculated the average nucleotide change per month to be 1.7 (95% CI 1.4-2.0), which was within the boundaries of another estimate with a substitution rate of 0.9 × 10^-3 (95% CI 0.5-1.4 × 10^-3) substitutions per site per year 20. We derived a transformed value of this rate before integrating it with epidemiological information. The output from the variant calling step was then used to determine the GENI score by calculating the nucleotide difference. The GENI score cutoffs used to estimate transmission dates are derived from accepted evolutionary inference of mutation rates of COVID-19.
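The comparison between the two rate estimates can be checked with a short calculation. The Python sketch below converts a per-site, per-year substitution rate into nucleotide changes per genome per month and then turns a count of nucleotide differences from the reference into an approximate elapsed time. The genome length of 29,903 nt is that of the NC_045512.2 reference; the function names and the simple "differences divided by rate" timing rule are assumptions for illustration, since the exact transformation and cutoffs used for GENI are not given in the text.

```python
GENOME_LENGTH_NT = 29903  # length of the Wuhan-Hu-1 reference, NC_045512.2


def per_site_year_to_per_genome_month(rate_per_site_per_year):
    """Convert substitutions/site/year into substitutions/genome/month."""
    return rate_per_site_per_year * GENOME_LENGTH_NT / 12


def months_since_reference(nucleotide_differences, mutations_per_month=1.7):
    """Rough elapsed time implied by a count of differences from the reference.

    Assumption: mutations accumulate at a constant rate (simple linear clock).
    """
    return nucleotide_differences / mutations_per_month


# Published estimate cited in the text: 0.9e-3 (95% CI 0.5e-3 to 1.4e-3).
central = per_site_year_to_per_genome_month(0.9e-3)
lower = per_site_year_to_per_genome_month(0.5e-3)
upper = per_site_year_to_per_genome_month(1.4e-3)

# Example: four differences from the reference at 2 mutations/month.
elapsed_months = months_since_reference(4, mutations_per_month=2.0)
```

Under these assumptions the cited per-site rate corresponds to roughly 1.2 to 3.5 changes per genome per month, an interval that contains the 1.7 changes per month reported here.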
We defined four epidemic curve (epicurve) stages to provide a clear method to define increases in the outbreak. The 'index stage' is characterized by the first report (index case) or limited local transmission, indicated by intermittent zero incidence creating an undulating epicurve. The second stage, which is distinctly different from stage 1, is the 'takeoff stage' in which the troughs are [...] stage. Three countries were in the exponential stage and five countries in the takeoff stage (Figure 1). China was the only country that had reached the peak of the epicurve and was characterized as being in the decline stage (decreasing cases). At this point there was no evidence of any other country near the decline stage, and some countries were poised to move into the takeoff and exponential phases.

Instantaneous R sensitively described real-time shifts of COVID-19 incidence captured within each epicurve stage (Figure 2). The decline stage in China was reflected by a decrease in R estimates in the latter stages of the outbreak relative to the early estimates: 1.6 (95% CI 0.4-2.9) and 1.8 (95% CI 1.0-2.7) for the 2- and 7-day serial intervals, respectively. Superspreading events inflated the R estimates seen in the exponential stage, as observed in South Korea: 2.8 (95% CI 0.6-5.3) and 25.6 (95% CI 3.0-48.2) for the 2- and 7-day serial intervals, respectively. Efficient disease control was instituted in Singapore, enabling it to remain in the index stage, while Japan was moving to the takeoff stage, characterized by increased R estimates of 3.6 (95% CI 0.4-7.3) and 2.2 (95% CI 1.3-3.0) for the 2- and 7-day serial intervals, respectively. The R estimates overlap for all exemplar country outbreak stages in the two serial interval scenarios, suggesting that the transmission could be as short as 2 days.
These estimates were relatively lower than previously reported, bringing to light the possibility of transmission during the incubation period, which is associated with rapidly expanding outbreaks such as those currently being observed in many European countries.

Low detection of COVID-19 was observed in representative countries in the index stage with low R values (<2), which can be attributed to the effectiveness of social distancing interventions (e.g. Hong Kong) or to under-detection in countries with limited testing (e.g. United States) (Figure 3a). Sustained local transmission was occurring in five countries that were progressing into the takeoff stage (Japan, Germany, Spain, Kuwait and France), as measured by R values (>2) (Figure 3b). The magnitude of spread was apparent with relatively higher R estimates (>10) in Italy, Iran and South Korea, which demonstrated sudden surges in incidence due in part to previously undetected clusters, although other factors may contribute to this observation (Figure 3b). This significantly increased the instantaneous R estimates relative to other methods of estimation, but it gives a more obvious depiction of the surge of cases and allows differentiation of the takeoff stage from the exponential stage.

We further examined the value of computing country-specific instantaneous R by comparing different temperature ranges (tropical versus temperate) and population densities.
Population density of key cities (Table 2) [...]

We determined the relationship of epicurve stage with viral genetic variation using a metric that merges absolute genome variation with the rate of genome change. The resulting GENI metric anchors population genome diversity to the rate of evolution of SARS-CoV-2. To examine how viral genome diversity was associated with the epicurve stages, we first examined the index stage (Singapore) and the exponential stage (South Korea). Integration of GENI scores successfully distinguished the index from the exponential stage (Figure 4). An increase in GENI scores was associated with the exponential stage, with a median score of 4, suggesting that viral diversity and rate of mutation were directly proportional to case [...]

[...] determined that R is much more dynamic in the COVID-19 pandemic than previously appreciated, both by country and over the course of the outbreak within each country (Fig 2-3). The instantaneous R estimation with a serial interval of 2 was extremely sensitive to shifts in the epicurve during the index phase (Fig 2-3). Singapore is an excellent example of effectively controlling and containing the COVID-19 outbreak. It had previously designated a response system called DORSCON (Disease Outbreak Response System Condition) 22, providing a systematic approach to control, so that it has not moved past the index phase. In contrast, most other countries in this phase are poised to move into the takeoff phase (Fig 3).
The transition into the takeoff phase signified a transition from a 2-day serial interval to a 7-day serial interval that was more sensitive to shifts in the epicurve.

While estimates of R alone are insightful in retrospect, gaps in epidemiological surveillance due to several factors create blind spots that hinder the ability to determine interventions. To overcome this limitation, we merged GENI estimates based on whole genome sequence variation and mutation rate with the epicurve and R, providing a predictive triad of measurements that yielded insight that accurately refined case expansion (Fig. 4). Each [...] metric that is scientifically robust and at the same time can convey complex biological properties to enable an efficient characterization of an outbreak in combination. Complex pathogen characteristics were made usable for the public health and medical fields by using the GENI score as a complete merged information set with other characteristics of the outbreak.

Previous outbreaks, such as Ebola, employed state-of-the-art analysis using phylodynamics that is anchored on genetic evolution 13. Inference such as time to most recent common ancestor allowed estimation of outbreak origin, population size, and R, yet this was not [...]

Prior work forewarned against the practice of being overly dependent on early estimates of R alone 28. By having the most accurate possible information for a dynamic metric and taking into account the complex dynamics that factor in the calculation of R along with merging this the [...]
these resources opens unexpected collaboration and avenues for applying relevant bioinformatic and disease modelling skills across the scientific community to solve global public health problems. Examples of hindrances to this were observed in several countries, which led to cryptic spread of the disease. Additionally, a lack of epidemiological infrastructure and genome sequencing capabilities limits this approach, which is not acceptable for modern public health. However, without the appropriate technical skills in performing complicated phylogenetic inference, the utility of such innovation will be limited. A protocol for merging epidemiology and genomics was defined in this work (Fig. 5) and can be instituted globally.

This study integrated population genomics into epidemiological methods to provide a framework [...]

[...] sequences from GISAID's COVID-19 Genome Database. We also thank the global community for rapid information sharing that enabled integration of these data.
The concept of R0 in epidemic theory