key: cord-0748358-wun5rwsk authors: Washington, Nicole L.; Gangavarapu, Karthik; Zeller, Mark; Bolze, Alexandre; Cirulli, Elizabeth T.; Schiabor Barrett, Kelly M.; Larsen, Brendan B.; Anderson, Catelyn; White, Simon; Cassens, Tyler; Jacobs, Sharoni; Levan, Geraint; Nguyen, Jason; Ramirez, Jimmy M.; Rivera-Garcia, Charlotte; Sandoval, Efren; Wang, Xueqing; Wong, David; Spencer, Emily; Robles-Sikisaka, Refugio; Kurzban, Ezra; Hughes, Laura D.; Deng, Xianding; Wang, Candace; Servellita, Venice; Valentine, Holly; De Hoff, Peter; Seaver, Phoebe; Sathe, Shashank; Gietzen, Kimberly; Sickler, Brad; Antico, Jay; Hoon, Kelly; Liu, Jingtao; Harding, Aaron; Bakhtar, Omid; Basler, Tracy; Austin, Brett; MacCannell, Duncan; Isaksson, Magnus; Febbo, Phillip G.; Becker, David; Laurent, Marc; McDonald, Eric; Yeo, Gene W.; Knight, Rob; Laurent, Louise C.; de Feo, Eileen; Worobey, Michael; Chiu, Charles Y.; Suchard, Marc A.; Lu, James T.; Lee, William; Andersen, Kristian G. title: Emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States date: 2021-03-30 journal: Cell DOI: 10.1016/j.cell.2021.03.052 sha: 3966d03d72440c939928b54f0d024f49181db116 doc_id: 748358 cord_uid: wun5rwsk The highly transmissible B.1.1.7 variant of SARS-CoV-2, first identified in the United Kingdom, has gained a foothold across the world. Using S gene target failure (SGTF) and SARS-CoV-2 genomic sequencing, we investigated the prevalence and dynamics of this variant in the United States (U.S.), tracking it back to its early emergence. We found that while the fraction of B.1.1.7 varied by state, the variant increased at a logistic rate with a roughly weekly doubling rate and an increased transmission of 40-50%. We revealed several independent introductions of B.1.1.7 into the U.S. as early as late November 2020, with community transmission spreading it to most states within months. We show that the U.S. is on a similar trajectory as other countries where B.1.1.7 became dominant, requiring immediate and decisive action to minimize COVID-19 morbidity and mortality. Since the onset of the COVID-19 pandemic, there has been concern about the possibility of 55 novel SARS-CoV-2 variants emerging that are more transmissible than the ancestral lineage 56 first identified in Wuhan, China. During the third quarter of 2020, the SARS-CoV-2 Variant of 57 Concern (VOC) 202012/01 (a.k.a. 501Y.V1; B.1.1.7) lineage carrying the N501Y mutation 58 emerged and took hold in the United Kingdom (U.K.), followed by several European countries. 59 The N501Y mutation is also shared with other VOCs first identified in South Africa The B.1.1.7 lineage has been shown to be inherently more transmissible, with a growth rate that 68 has been estimated to be 40-70% higher than other SARS-CoV-2 lineages in multiple countries, 69 which is hypothesized to be partly due to the N501Y mutation increasing receptor binding 70 affinity of the SARS-CoV-2 spike protein with angiotensin-converting enzyme 2 (ACE2) (Volz et 71 al., 2021). While initially thought to have comparable clinical outcomes to other SARS-CoV-2 72 variants, recent reports indicate that infection with B.1.1.7 may lead to ~30-50% higher mortality 73 rates (Iacobucci, 2021) . Though the exact origin of the B.1.1.7 variant is unclear, the proactive 74 and large scale SARS-CoV-2 genomic surveillance program in the U.K. facilitated the initial 75 detection after investigators in South Africa had observed an association between N501Y and 76 increased transmission (Chand et al., 2020) . 77 Routinely administered RT-qPCR SARS-CoV-2 diagnostic tests can provide hints to the 78 presence of viral lineages with sequence-based differences when mutations occur at the test's 79 target probe location(s) that lead to unexpected results. Importantly, the 69-70 deletion in the 80 SARS-CoV-2 spike (S) gene, present in B.1.1.7 and other variants, can be characterized by the 81 failure to detect the S gene using certain tests, such as the Thermo Fisher TaqPath™ COVID-82 19 assay, known as S gene target failure (SGTF) (Bal et al., 2020) . Retrospective analyses from 83 the U.K. show that the proportion of B.1.1.7 in SGTF samples rose from 3% during the week of 84 October 12, 2020 to more than 90% during the week of November 30, 2020. It has since 85 reached near-fixation across most of the U.K. (Chand et al., 2021) . 86 The earliest samples of the B.1.1.7 variant were sequenced in Southern England in late 87 September, 2020 and has since been detected in over 75 countries . In 88 this study, we sought to understand the prevalence and growth dynamics of this variant in the 89 U.S., from early emergence to rapid onward transmission. Surveillance programs typically select 90 a subset of samples tested by SARS-CoV-2 RT-PCR for sequencing, and therefore prioritizing 91 SGTF samples for sequencing serves to enrich for detection of B.1.1.7. Here, we describe the 92 introduction and early spread of B.1.1.7 in the U.S., based on historical SGTF rates in RT-PCR 93 SARS-CoV-2 tests and a nationwide SGTF viral sequencing program. We find that B.1.1.7 94 J o u r n a l P r e -p r o o f arrived in the U.S. towards the end of November, 2020 and, as of February, 2021, has since 95 spread to over 40 U.S. states (CDC, 2021) . Importantly, similarly to what has been observed in 96 other countries, we find that the B.1.1.7 variant is 40-50% more transmissible across the U.S., 97 doubling in relative frequency every week to week and a half. These findings show that B.1.1.7 98 will probably become the dominant variant in many U.S. states by March, 2021, likely leading to 99 further surges of COVID-19 in the country, unless urgent mitigation efforts are immediately 100 implemented. 101 Results 102 The proportion of SGTF samples is rapidly increasing in the U.S. 103 We examined the prevalence of SGTF in all SARS-CoV-2 positive samples from across the 104 U.S. tested at Helix since July 2020 (~0.5 million samples; Figure 1A ). The Helix ® COVID-19 105 Test calls positive samples when at least two of three targets (N, Orf1ab, and S) are detected 106 using the Thermo Fisher TaqPath™ assay. We only considered samples to be SGTF if they 107 were positive for both N and Orf1ab, and negative for S. We restricted our analyses to positive 108 samples with Cycle quantification (Cq) <27 for the N gene based on previous reports that single 109 target failures were more frequent at higher Cq (Bal et al., 2020; Kara Steel And, 2021) . 110 We began to observe consistent, low frequency SGTF in early October 2020, with 1.4% of daily 111 SARS-CoV-2-positive tests exhibiting this pattern during October 18 to October 24, 2020, 112 followed by a steady increase in the weeks that followed (Data S1). We found that the 113 nationwide proportion of SGTF increased from an average of 0.8% in the first week of January 114 (January 1 to January 7) to 10.6% in the third week of February (February 14 to February 19) 115 (Data S1). 116 To investigate regional differences across the U.S., we examined the nationwide distribution of 117 SGTF. By grouping our samples based on patient state of residence, we observed SGTF in 25 118 out of 53 U.S. States and territories during January and February, 2021, several of which had 119 SGTF frequencies consistently above 1% (Data S1). When restricting our analysis to assess 120 U.S. states with more than 500 positive tests in January and February, 2021, we observed that 121 the fraction of SGTF varies significantly across the nation (Data S1). A caveat to this analysis is 122 that our testing footprint does not evenly cover the U.S. (Figure 1A) 24% sequences from other U.S. states (Data S1). We found that 662 (67% of all SGTF 132 sequences) samples were from the B.1.1.7 lineage (insert; Figure 2A and Data S1), distributed 133 J o u r n a l P r e -p r o o f across 15 U.S. states (insert; Figure 2A) Figure 1C) . Figure S1C) . Clade "TX-1" had a median TMRCA of January 23, 2020 [January 215 17 -January 26) (Supplemental Figure S1C) . We found that the other U.S. clades had median 216 TMRCAs in December, 2020 and January, 2021 (Supplemental Figure S1C) , suggesting 217 repeated introductions of B.1.1.7 into the U.S. from international locations from November, 2020 218 through the present. 219 B.1.1.7 has likely been spreading between U.S. states since late 2020 220 In addition to the main B.1.1.7 clades that contained sequences primarily from individual states, 221 including "CA", "CA-1" to "CA-3", "FL", "FL-1" to "FL-9", "TX-1", "PA", "NY", and "GA", we found 222 that eight clades ("Mixed-1" to "Mixed-8") were diverse with respect to geographic sampling, 223 containing SARS-CoV-2 genomes from across multiple U.S. states (Figure 2A and 224 Supplemental Figure S1A ). These findings indicate that B.1.1.7 has been spreading locally 225 between different U.S. states ( Figure 2D and Supplemental Figure S1D ), likely since at least 226 December, 2020 based on our TMRCA estimates ( Figure 2C and Supplemental Figure S1D ). In addition, we found that clade "TX-2" with 18 sequences from Texas, and three sequences 228 from Louisiana (TX-2; Figure cluster. In addition, we also expect the growing number of sequences from B.1.1.7 to add 268 further evidence for local transmission within the U.S. presented in this study and potentially 269 also elucidate the directionality of the spread of this lineage. 270 In addition to well-supported local clades in California, Florida, and Georgia from our 271 phylogenetic analyses, many of the initial B.1.1.7 cases in the U.S. did not report recent 272 international travel prior to infection (Davis, 2020; Romo, 2020). These findings suggest that 273 significant community transmission of B.1.1.7 is already ongoing across the U.S., which is likely 274 fueled by the increased growth rate and transmissibility of B.1.1.7. We found the growth rate in 275 Florida (~8% / day) to be slightly lower than in California (~10% / day). This difference may be 276 due to differences in statewide or regional social distancing protocols or mobility patterns, 277 population density, biases in sampling and/or demographics, or potential competition from other 278 SARS-CoV-2 variants. In addition, other countries have observed that earlier estimates of 279 increased B.1.1.7 transmission have been lower than later, more robust, estimates. We expect 280 the same may be true in the U.S. and our growth estimates might therefore increase as we 281 obtain more data. 282 The nationwide growth rate of the proportion of B.1.1.7 cases of ~7.5% / day is slightly lower 283 than those observed in Portugal (10% / day (Borges, 2021)), Denmark (10.3% / day (Statens 284 Serums Institute)) and the U.K. (10.4% / day (Davies et al., 2020) ). This potential difference 285 requires further investigation, but, as described above, in addition to social, demographic, and 286 policy factors, may likely be down to the relative sparsity of currently available U.S.-wide data 287 and we expect it may increase as more data is collected. 288 Our study shows that although SGTF is not yet a universal proxy for the B. help with logistics. We thank the healthcare workers, frontline workers, and patients who made 327 the collection of this SARS-CoV-2 dataset possible and all those who made genomic data 328 available for analysis via GISAID (Data S2). This work has been funded by CDC BAA contracts 329 75D30121P10258 . There is a ~2 week lag 361 between sequence data and testing data. We had sequence data until Feb 2nd but we had 362 testing data until Feb 19th. To fully utilize the testing data, we used the average proportion of See also Data S1. 373 in total positive tests were inferred using, 451 !"#"!$%"& "' (. 1.1.7 %& #"+%$%,-$-+$+ = . . . . 452 The winter storm in Texas led to a delay in shipments in every state but California from Feb 12th 453 to Feb 18th. Hence, we only include testing data until Feb 11th for every state apart from 454 California. In addition, there is a ~2 week lag between sequence data and testing data. We had 455 sequence data until Feb 2nd but we had testing data until Feb 19th with a 1-2 day variation 456 between different states. In order to fully utilize the testing data, we used the mean proportion of 457 observed B. Genome assembly of viral reads and variant calling were performed using an in-house 557 automated bioinformatics pipeline as previously described (Deng et al., 2020) . 558 Consensus these positions. We constructed a maximum-likelihood tree using this dataset under a HKY 572 nucleotide substitution model implemented in iqtree2 (Minh et al., 2020) . Using this phylogeny, 573 we selected 954 sequences, including all the sequences in our dataset (n=662) and we 574 subsampled sequences outside the US such that we retained one sequence at every polytomy 575 in the maximum-likelihood tree to represent global diversity of the B.1.1.7 lineage (n=291 and 576 NC_045512 as an outgroup root). We estimated the time-resolved phylogeny using a HKY 577 nucleotide substitution model with discrete-gamma distributed rate variation under an 578 uncorrelated relaxed clock model implemented in BEASTv1.10.5pre (Suchard et al., 2018) . We 579 used a relatively uninformative conditional reference prior on the overall clock rate (Ferreira and 580 Suchard, 2008) , an exponentially growing population prior over the unknown tree and the 581 BEAGLE library to improve computational performance (Ayres et al., 2019) . We ran four 582 independent chains for 100 million steps each and summarized the estimates across the four 583 runs. However, we also find that each of the 4 chains converges to similar estimates as shown 584 in Data S2. The effective sample size of all scientifically relevant parameters in the combined 585 log file was > 200 as shown in Data S2. The phylogenetic tree was visualized using baltic 586 (https://github.com/evogytis/baltic). The BEAST XML, the log files for the 4 independent runs, 587 and the combined log file are available at https://github.com/andersen-lab/paper_2021_early-588 b117-usa. 589 QUANTIFICATION AND STATISTICAL ANALYSIS 590 Statistical analyses were performed using BEAST and R and are described in the Figure 591 legends Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical 616 Phylogenetics Two-step strategy for the identification of 619 SARS-CoV-2 variants co-occurring with spike deletion H69-V70 Tracking SARS-CoV-2 VOC 202012/01 (lineage B.1.1.7) dissemination in 622 Portugal: insights from nationwide RT-PCR Spike gene drop out data US COVID-19 Cases Caused by Variants COVID-19 Variant First Found in Other Countries and States Now Seen More 626 Frequently in California Investigation of novel SARS-CoV-2 variant Variant of 630 Concern Investigation of novel SARS-CoV-2 variant Variant 633 of Concern Alarming COVID variants show vital role of genomic surveillance Estimated transmissibility and severity of 638 novel SARS-CoV-2 Variant of Concern 202012/01 in England San Diego man tests positive for new, more contagious UK variant of COVID-641 19 Genomic surveillance reveals multiple introductions of 645 SARS-CoV-2 into Northern California Risk related to the spread of new SARS-CoV-2 variants of concern in the 647 EU/EEA -first update Genomic characterisation of 650 an emergent SARS-CoV-2 lineage in Manaus: preliminary findings Bayesian analysis of elapsed times in continuous-652 time Markov chains Emergence of SARS-CoV B.1.1.7 Lineage -United States Tracking virus outbreaks in the twenty-first century An amplicon-based sequencing 662 framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar Nextstrain: real-time tracking of pathogen evolution Covid-19: New UK variant may be linked to increased death rate, early 668 data indicate Coronavirus (COVID-19) infection survey, UK -office for national 670 statistics Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM 672 IQ-TREE 2: New Models and Efficient Methods for Phylogenetic 675 Inference in the Genomic Era Phylogenetic Assignment of Named Global 677 Outbreak LINeages The 679 COVID-19 Genomics UK (COG-UK) consortium, Network for Genomic Tracking the 681 international spread of SARS-CoV-2 lineages B Investigation of novel SARS-CoV-2 variant: Variant of Concern 683 202012/01 (GOV.UK) Multiplex PCR method for 686 MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples Preliminary genomic characterisation of an emergent 690 SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations (COVID-19 Genomics 691 Consortium UK (CoG-UK)) A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist 694 genomic epidemiology Florida Becomes 3rd U.S. State To Identify New Coronavirus Variant Prospective mapping of viral mutations that escape antibodies used to 698 treat COVID-19 Statens Serums Institute Estimerede scenarier for udviklingen i cluster B.1.1.7 27.01 Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10 Emergence and rapid spread of a new severe 705 acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike 706 mutations in South Africa (medRxiv) TSA checkpoint travel numbers Transmission of SARS-CoV-2 Lineage 711 B.1.1.7 in England: Insights from linking epidemiological and genetic data (medRxiv) S gene 713 dropout patterns in SARS-CoV-2 tests suggest spread of the H69del/V70del mutation in the US is ~50% more transmissible than other circulating lineages in the US. 2. The proportion of cases caused by B.1.1.7 is increasing at a rate of ~7 Genomic epidemiology analyses explain the introduction and transmission of the B.1.1.7 variant of SARS-CoV-2 into the United States Percentage of B.1.1.7 in sequenced samples with SGTF over time in states with >500 SARS-CoV-2 positive tests at Helix from October 2020 to January 29, 2020 and across the United States (U.S.), Related to Figure 1. The USA panel reflects all sequences and SGTF results from across the whole of the U.S., without the minimum threshold as was applied for the individual states. Note that more than 29.6% of the sequenced SGTF samples represented in the USA panel were drawn from states outside of CA and FL. Figure 2C , we used the mean proportion from the last 5 days with sequence data and this is labelled, "mean last 5 days" on the x-axis. The error bars indicate the 95% CI on the estimated logistic growth rate.