key: cord-314445-4cb4a9r5 authors: McNamara, Ryan P.; Caro-Vegas, Carolina; Landis, Justin T.; Moorad, Razia; Pluta, Linda J.; Eason, Anthony B.; Thompson, Cecilia; Bailey, Aubrey; Villamor, Femi Cleola S.; Lange, Philip T.; Wong, Jason P.; Seltzer, Tischan; Seltzer, Jedediah; Zhou, Yijun; Vahrson, Wolfgang; Juarez, Angelica; Meyo, James O.; Calabre, Tiphaine; Broussard, Grant; Rivera-Soto, Ricardo; Chappell, Danielle L.; Baric, Ralph S.; Damania, Blossom; Miller, Melissa B.; Dittmer, Dirk P. title: High-density amplicon sequencing identifies community spread and ongoing evolution of SARS-CoV-2 in the Southern United States date: 2020-06-19 journal: bioRxiv DOI: 10.1101/2020.06.19.161141 sha: doc_id: 314445 cord_uid: 4cb4a9r5 SARS-CoV-2 is constantly evolving. Prior studies have focused on high case-density locations, such as the Northern and Western metropolitan areas in the U.S. This study demonstrates continued SARS-CoV-2 evolution in a suburban Southern U.S. region by high-density amplicon sequencing of symptomatic cases. 57% of strains carried the spike D614G variant. The presence of D614G was associated with a higher genome copy number and its prevalence expanded with time. Four strains carried a deletion in a predicted stem loop of the 3’ untranslated region. The data are consistent with community spread within the local population and the larger continental U.S. No strain had mutations in the target sites used in common diagnostic assays. The data instill confidence in the sensitivity of current tests and validate “testing by sequencing” as a new option to uncover cases, particularly those not conforming to the standard clinical presentation of COVID-19. This study contributes to the understanding of COVID-19 by providing an extensive set of genomes from a non-urban setting and further informs vaccine design by defining D614G as a dominant and emergent SARS-CoV-2 isolate in the U.S. 
The current COVID-19 pandemic is an urgent public health emergency with over 112,000 deaths in the United States (U.S.) alone. COVID-19 is caused by infection with the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). The typical symptoms for COVID-19 may include the following: fever, cough, shortness of breath, fatigue, myalgias, headache, sore throat, abdominal
To provide finer granularity about biological changes during SARS-CoV-2 transmission, we employed next generation sequencing (NGS) as an independent screening modality. This allowed us to reconstruct the mutational landscape of cases seen at a tertiary clinical care center in the southeastern U.S. from the start of the U.S. epidemic on March 3, 2020, until past the peak of the first
distribution of 10x coverage for all samples is presented in Figure 1A. As expected, more mapped reads yielded higher coverage. Of the 33 negative controls, none had >10^2 total reads aligned. Of the positive samples, more than 5×10^3 total mapped reads were needed to obtain 1x coverage of the whole genome, and a minimum of 3.1×10^4 reads was needed to obtain >90% coverage at 10x. The number of reads aligned varied depending on the viral load, as determined by real-time qPCR using CDC primer N1, but not on the total RNA of the samples, as determined using RNAse P (Figure 1B). In this assay, any CP <35 for SARS-CoV-2 qPCR yielded reliable coverage, which increased linearly with viral load. At a CP ≥35, most positive samples still yielded reads that mapped to the target genome and thus allowed detection of SARS-CoV-2 sequences; however, the results were less consistent, and coverage was more variable. As expected, total RNA (measured by RNAse P) was not associated with sequencing coverage and varied considerably across samples, even though each sample used the same amount of virus transport medium (VTM).
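The coverage figures quoted above (1x coverage of the whole genome, >90% coverage at 10x) are breadth-of-coverage statistics: the fraction of genome positions with at least a given read depth. The following is a minimal sketch of that calculation, not the pipeline used in this study. The function name `coverage_breadth` and the toy depth vector are hypothetical, and the per-position depths are assumed to have been exported beforehand by whatever alignment tool is in use.

```python
from typing import Sequence


def coverage_breadth(depths: Sequence[int], min_depth: int) -> float:
    """Return the fraction of positions covered by at least `min_depth` reads.

    `depths` holds one read-depth value per reference position
    (hypothetical input; assumed to come from an upstream alignment step).
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Toy example: a 10-position "genome" (illustrative values, not study data).
toy_depths = [0, 5, 12, 30, 10, 9, 15, 20, 11, 10]

breadth_1x = coverage_breadth(toy_depths, 1)    # 9 of 10 positions have depth >= 1
breadth_10x = coverage_breadth(toy_depths, 10)  # 7 of 10 positions have depth >= 10

print(f"1x breadth:  {breadth_1x:.0%}")
print(f"10x breadth: {breadth_10x:.0%}")
```

In this toy case the sample reaches 90% breadth at 1x but only 70% at 10x, which illustrates why a higher number of mapped reads is needed to satisfy a 10x criterion than a 1x criterion.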
The coverage level distribution is shown in Figure 1C
Independently derived consensus genomes from the SARS-CoV-2/human/USA-WA1/2020 isolates showed evidence of divergence between the original isolate, the seed stock, and the commercially distributed standard (Figure 2B). Similar culture-associated changes were recently reported for a second, culture-amplified reference isolate: Hong Kong/VM20001061/2020. This is not surprising, given that any large-scale virus amplification in culture is accompanied by virus evolution, but it raises concerns about the utility of using a natural isolate, rather than a molecular clone (Graham et al., 2018; Thao et al., 2020), as a standard for sequencing.
The phylogeny based on whole genome nucleotide sequences revealed several interesting facets. Predictably, all UNC isolates of SARS-CoV-2 were significantly different from SARS-CoV and RaTG13 (Figure 2B, purple color). RaTG13 was used as an outgroup for clustering. The first NC case (NC_6999; Figure 2B, arrow labeled "WA") was a person returning from Washington (WA) and
One large deletion was identified in four independent samples: 14 nucleotides were deleted beginning at position 29745 (indicated in Figure 2C by a delta symbol). This region is within the previously recognized "Coronavirus 3' stem-loop II-like motif (s2m)". The deletion was confirmed in multiple isolates and supported by multiple, independent junction-spanning reads (Figure 3A, B). Junctions were mapped to single nucleotide resolution directly from individual reads. The variant 3' end does not destroy overall folding but introduces a shorter stable hairpin (Figure 3C, D). How this mutation affects viral fitness remains to be established.
In sum, this study generated exhaustive SNV information representing the introduction and spread of SARS-CoV-2 across a suburban low-density area in the Southern U.S.
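The statement that deletion junctions were mapped to single nucleotide resolution directly from individual reads can be illustrated with a short sketch. It assumes reads aligned in SAM format, where each alignment carries a 1-based leftmost reference position and a CIGAR string. The read positions and CIGAR strings below are invented for illustration, and the helper name `deletions_from_cigar` is hypothetical; this is not the authors' actual procedure.

```python
import re
from collections import Counter
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM format).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
_CONSUMES_REF = set("MDN=X")


def deletions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (start, length) for each deletion, in 1-based reference coordinates.

    `pos` is the 1-based leftmost mapped position of the read.
    """
    ref = pos
    found = []
    for length_str, op in _CIGAR_OP.findall(cigar):
        n = int(length_str)
        if op == "D":
            # The deletion starts at the next unconsumed reference base.
            found.append((ref, n))
        if op in _CONSUMES_REF:
            ref += n
    return found


# Hypothetical junction-spanning reads (illustrative, not study data).
reads = [
    (29695, "50M14D50M"),
    (29700, "5S45M14D60M"),
    (29600, "100M"),
]

# Count how many independent reads support each (start, length) junction.
support = Counter(d for pos, cigar in reads for d in deletions_from_cigar(pos, cigar))
print(support)
```

With these toy inputs, two reads support a 14-nucleotide deletion beginning at position 29745 and the third read carries no deletion. Requiring several independent supporting reads per junction is one simple way to guard against alignment artifacts.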
All samples were from symptomatic cases and the majority of genomes clustered with variants that predominate in the outbreak in the U.S., rather than Europe or China. This supports the notion that the majority of U.S.
There seems to be partial overlap between the bulged stem-loop and the pseudoknot, suggesting that these two structures are mutually exclusive and may serve as a switch to regulate the ratio of full-length RNA and defective RNA (Goebel et al., 2004). These two structures are also present in SARS-CoV. These isolates represent full-length genomes from symptomatic patients rather than disjointed RNA fragments recovered after clinical disease had subsided; thus, we speculate that these deletion
About half of the specimens not clinically tested for SARS-CoV-2 tested positive by sequencing. This was not surprising, as to this day testing capabilities are limited, and probable cases are triaged based on clinical and public health indications. These unknown cases were not asymptomatic but represent patients with a clinically indicated need for upper respiratory sampling. Finding additional SARS-CoV-2 cases in this population suggests that case counts based on NAT represent a lower estimate of SARS-CoV-2 prevalence. It may also suggest that the current triage criteria for SARS-CoV-2 testing are too limited to understand spread of this virus. In sum, this study underscores the sensitivity and accuracy of current NAT assays and demonstrates the utility of testing by sequencing. It contributes to the worldwide effort to understand and combat the COVID-19 pandemic by providing the first set of full-length SARS-CoV-2 genomes from a non-urban setting.
Coronavirus Susceptibility to the Antiviral Remdesivir (GS-5734) Is Mediated by the Viral Polymerase and the Proofreading Exoribonuclease
The proximal origin of SARS-CoV-2
Presymptomatic SARS-CoV-2 Infections and Transmission in a Skilled Nursing Facility
SARS-CoV-2 viral spike G614 mutation exhibits higher case fatality rate
Covid-19 in Critically Ill Patients in the Seattle Region - Case Series
Genomic variance of the 2019-nCoV coronavirus
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in China: a descriptive study
Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-
Could the D614G substitution in the SARS-CoV-2 spike (S) protein be associated with higher COVID-19 mortality?
Coast-to-Coast Spread of SARS-CoV-2 during the Early Epidemic in the United States
Phylogenetic network analysis of SARS-CoV-2 genomes
Genomic epidemiology of hCoV-19
Characterization of the RNA components of a putative molecular switch in the 3' untranslated region of the murine coronavirus genome
A live, impaired-fidelity coronavirus vaccine protects in an aged, immunocompromised mouse model of lethal disease
Evaluation of a recombination-resistant coronavirus as a broadly applicable
Clinical Characteristics of Coronavirus Disease 2019 in China
Nextstrain: real-time tracking of pathogen evolution
Temporal dynamics in viral shedding and transmissibility of COVID-19
SARS-CoV-2 Transmission from Presymptomatic Meeting Attendee
Faster quantitative real-time PCR protocols may lose sensitivity and show increased variability
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
Infection and Rapid Transmission of SARS-CoV-2 in Ferrets
Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Efficiency clustering for low-density microarrays and its application to QPCR
Antibody responses to SARS-CoV-2 in patients with COVID-19
Genomic Epidemiology of SARS-CoV-2 in Guangdong Province
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
US CDC Real-Time Reverse Transcription PCR Panel for Detection of Severe Acute Respiratory Syndrome Coronavirus 2
The neighbor-joining method: a new method for reconstructing phylogenetic trees
Burden of respiratory viral infection in persons with human immunodeficiency virus. Influenza Other Respir Viruses
Structural basis of receptor recognition by SARS-CoV-2
GISAID: Global initiative on sharing all influenza data - from vision to reality
Prospects for inferring very large phylogenies by using the neighbor-joining method
Coronavirus Disease 2019 in Children - United States
Rapid reconstruction of SARS-CoV-2 using a synthetic genomics platform
Aerosol and Surface Stability of SARS-CoV-2 as Compared with SARS-CoV-1
Emergence of genomic diversity and recurrent mutations in SARS-
An outbreak of severe Kawasaki-like disease at the Italian epicentre of the SARS-CoV-2 epidemic: an observational cohort study
Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus
A phylogenetically conserved hairpin-type 3' untranslated region pseudoknot functions in coronavirus RNA replication
Virological assessment of hospitalized patients with COVID-2019
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
A new coronavirus associated with human respiratory disease in China
Factors associated with prolonged viral RNA shedding in patients with COVID-19
Characteristics of pediatric SARS-CoV-2 infection and potential evidence for persistent fecal viral shedding
Quantitative Detection and Viral Load Analysis of SARS-CoV-2 in Infected Patients
Viral and host factors related to the clinical outcome of COVID-19
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
A pneumonia outbreak associated with a new coronavirus of probable bat origin
A Novel Coronavirus from Patients with Pneumonia in China
SARS-CoV-2 Viral Load in Upper Respiratory Specimens of Infected Patients
Genetic interactions between an essential 3' cis-acting RNA pseudoknot, replicase gene products, and the extreme 3' end of the mouse coronavirus genome
This work was funded by public health service grants CA016086, CA019014, and CA239583.