key: cord-0917688-3bqmyeoi
authors: Hartley, Paul D.; Tillett, Richard L.; AuCoin, David P.; Sevinsky, Joel R.; Xu, Yanji; Gorzalski, Andrew; Pandori, Mark; Buttery, Erin; Hansen, Holly; Picker, Michael A.; Rossetto, Cyprian C.; Verma, Subhash C.
title: Genomic surveillance revealed prevalence of unique SARS-CoV-2 variants bearing mutation in the RdRp gene among Nevada patients
date: 2020-09-11
journal: medRxiv
DOI: 10.1101/2020.08.21.20178863
sha: f03ba69d56d7bbc7d7347be98e024faa289ef0f3
doc_id: 917688
cord_uid: 3bqmyeoi

Patients with signs of COVID-19 were tested with CDC-approved diagnostic RT-PCR for SARS-CoV-2 using RNA extracted from nasopharyngeal/nasal swabs. To determine the variants of SARS-CoV-2 circulating in the state of Nevada, 200 patient specimens from positively identified cases were sequenced using our robust protocol for sequencing SARS-CoV-2 genomes directly from nasopharyngeal or nasal swabs. This protocol enabled the identification of specific nucleotide variants, including those coding for D614G and clade-defining mutations. Additionally, these sequences were used to determine the phylogenetic relationships of SARS-CoV-2 genomes of public health importance occurring in the state of Nevada. Our study reports the occurrence of a novel variant in the nsp12 (RdRp, RNA-dependent RNA polymerase) protein in which residue 323 (residue 314 of orf1b) is changed from proline (P), present in the original isolate of SARS-CoV-2 (Wuhan-Hu-1), to phenylalanine (F). This 323F variant is found at a very high frequency (46% of the tested specimens) in Northern Nevada, possibly because the virus accumulated this mutation while circulating in the community and the shelter-in-place orders restricted the introduction and spread of other variants into this region. Structural modeling of the RdRp with the P323F variant did not show any significant difference in protein conformation, but the phenotypic effect is unknown and is an area of active investigation.
In conclusion, our results highlight the introduction and spread of specific SARS-CoV-2 variants at very high frequency within a distinct geographic location, which is important from clinical and public health perspectives for understanding the evolution and transmission of SARS-CoV-2.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted September 11, 2020. https://doi.org/10.1101/2020.08.21.20178863

IMPORTANCE: SARS-CoV-2 genomes accumulate nucleotide mutations while passing through the human population, and these mutations may confer phenotypic differences, including altered immune response and antiviral drug resistance. We developed a robust workflow to sequence SARS-CoV-2 directly from nasal/nasopharyngeal swabs, including those containing very low viral loads (samples with Ct values >35). Our protocol does not rely on amplicon [...]

Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), was first identified and reported in December 2019 in Wuhan, Hubei province, China. RNA sequencing and phylogenetic analysis of specimens taken during the initial outbreak in Wuhan determined that the virus is most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been identified in bats in China (2, 4).
Coronaviruses have a recent history as emerging infections, first SARS-CoV [...] SARS-CoV-2 in patients has primarily occurred using RT-qPCR to detect viral RNA from respiratory specimens (primarily nasal and nasopharyngeal swabs). While RT-PCR results can be quantified by determining a cycle threshold (Ct) value for each sample, the assay does not yield sequence data from which genomic variants can be described. To further the study of such variants, and to better understand the epidemiology of the virus in the state of Nevada, we developed a workflow that allowed us to sequence SARS-CoV-2 genomic RNA from patient swabs containing a broad range of viral loads. Of the SARS-CoV-2 sequences currently submitted to common databases (GenBank and GISAID), several were obtained after the virus had been passaged in Vero cells (9, 10) and others came directly from patient specimens (11, 12). Certain data have suggested a potential for lab-acquired mutations following passage in cell culture (9, 13). Specifically, one report described SARS-CoV-2 passage in Vero cells that resulted in a spontaneous 9-amino-acid deletion within the spike (S) protein overlapping the furin cleavage site (13). The loss of this site is suggested to increase viral entry into Vero cells (14).
For both research and epidemiological purposes, sequencing SARS-CoV-2 directly from patient specimens reduces the possibility of laboratory-acquired mutations following passage in cell culture, reduces the time that would be spent growing the virus from patient specimens, and consequently reduces the handling of larger amounts of infectious virus. Additionally, one of the goals in developing an optimized SARS-CoV-2 NGS protocol was to generate adequate depth of coverage of the viral genome while minimizing the sequencing of non-viral RNA, which allows more specimens to be multiplexed together during sequencing.

Our workflow employs a combination of RNA amplification, conversion into Illumina-compatible sequencing libraries, and enrichment of SARS-CoV-2 library molecules prior to sequencing. Using this novel methodology, we sequenced SARS-CoV-2 from a total of 200 patient specimens originating from Nevada and collected over a three-month period. Of the 200 selected, 173 were sequenced with sufficient quality to determine SARS-CoV-2 nucleotide variants, perform further phylogenetic analysis, and study the viral epidemiology within the state of Nevada. Analysis of the data suggests a specific epidemiological course for the local epidemic within Northern Nevada. This was characterized by an initial observation of variants closely resembling isolates originating directly from China or Europe.
Subsequent to a government-mandated period of restrictions on business and social activity, we observed that a viral isolate not seen elsewhere in the world emerged within Northern Nevada cases (nucleotides 14,407 and 14,408). This isolate contains an amino acid change at residue 323 of RdRp (nsp12), P323L/F. Furthermore, we found that sampled viral isolates in Southern Nevada, unlike those in Northern Nevada, closely resembled the makeup of the United States in general. [...] surrounding cities). Nucleotide similarity and variants were determined and used to measure the phylogenetic relationships (Supplemental Fig. S2). The combined nucleotide diversity across the entire SARS-CoV-2 genome for the Nevada specimens is shown in Figure 1D, along with the genomic areas that were assessed for change in frequency, corresponding to amino acids D614G and P323L/F and to nucleotide 379.

During the sequencing analysis we also examined the correlation between Ct values from the diagnostic RT-PCR and percentage coverage of the viral genome to determine the performance and robustness of our sequencing method in relation to the viral RNA available in a specimen of a given Ct (Fig. 1C). In general, a Ct value less than 31 resulted in at least 90% coverage of the SARS-CoV-2 genome. Importantly, our in-house method for viral genome enrichment and sequencing directly from patient specimens (nasal and nasopharyngeal swabs) was robust and yielded sequences covering over 90% of the genome even in some samples with a very high Ct (~40).
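The relationship between Ct value and genome coverage described here can be explored with a short script. The following is a minimal Python sketch, not the authors' pipeline. It assumes that per-base read depths have already been computed for each specimen, uses the 29,903-nt length of the Wuhan-Hu-1 reference (NC_045512.2), and counts a position as covered when its depth is at least 1. The function names, the depth threshold, and the numbers in the usage note are hypothetical.

```python
from typing import Iterable, List, Tuple

# Length of the Wuhan-Hu-1 reference genome (NC_045512.2), in nucleotides.
REFERENCE_LENGTH = 29_903


def breadth_of_coverage(depths: Iterable[int], min_depth: int = 1,
                        genome_length: int = REFERENCE_LENGTH) -> float:
    """Return the percentage of reference positions with depth >= min_depth.

    min_depth=1 is an assumption; a stricter cutoff may suit variant calling.
    """
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / genome_length


def fraction_meeting_target(samples: List[Tuple[float, float]],
                            max_ct: float,
                            target_pct: float = 90.0) -> float:
    """Among samples with Ct < max_ct, return the fraction whose genome
    coverage is at least target_pct.

    Each sample is a (Ct value, percent coverage) pair.
    """
    selected = [cov for ct, cov in samples if ct < max_ct]
    if not selected:
        raise ValueError("no samples below the Ct cutoff")
    return sum(1 for cov in selected if cov >= target_pct) / len(selected)
```

As a usage illustration with made-up values, `fraction_meeting_target([(20.0, 99.5), (30.0, 92.0), (30.5, 85.0), (38.0, 95.0)], max_ct=31)` considers only the three samples with Ct below 31 and reports the share of them that reach 90% coverage.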
The ability to recover more than 90% of the genome from such high-Ct samples is highly significant and shows the power of our workflow in sequencing the SARS-CoV-2 genome from a spectrum of samples, including those with inadequate amounts of specimen (due to variability in collection) or lower viral loads in nasal secretions. Consequently, because our method eliminates the need for virus culture, our sequencing protocol avoids the molecular epidemiological bias that may be acquired through cell culture-based amplification, especially in specimens with high Ct (low viral load).

Earlier studies have revealed the emergence, spread and potential importance of an alteration, D614G (genomic change 23403A>G), in the spike protein (15). This missense mutation has become a clade-distinguishing locus that differentiates viral isolates originating in Asia from those that have emerged from Europe. A total of 173 cases were analyzed to determine the number and relative proportion of specimens that carried the D614G spike protein variant in Nevada. The cumulative frequencies of D614 and G614 were plotted from March 6 to June 5 (Fig. [...]

To assess the introduction and spread of the clades in Nevada, the cumulative frequencies of the clades were plotted from March 6 to June 5 (Fig. 3A). The earliest sequenced specimens from Nevada were collected in the beginning of March (March 6 to
March 15) and are predominantly from clades 19A and 19B. Additional sequenced specimens collected from March to June revealed a shift to a higher frequency of 20C [...] were collected and sequenced from March to June there was a shift to a higher frequency of L323 and F323 (Fig. 4A). We performed phylogenetic reconstruction of the Nevada specimens and marked the P323L/F variants on the circular dendrogram with the indicated colors (Fig. 4B). Interestingly, analysis of the Northern Nevada and Southern Nevada specimens showed very different dominant variants (Fig. 4B). In Northern Nevada F323 was more prevalent, while in Southern Nevada L323 was more prevalent. We used a subsampling of Nextstrain.org data to assess the frequency of P323L/F in the United States and globally during the same time period (March 6 to June 5). P323 is the predominant variant in Asia, while L323 is more prevalent in other areas of the world and F323 is only appreciably noted in North America (Fig. 4D and 4E).

While we are investigating the phenotypic significance of this predominant variant of RdRp, we performed in silico structural modeling of RdRp to determine the location of residue 323 on RdRp in complex with its accessory proteins, nsp7, nsp8 and nsp13.
Our data showed that residue 323 is located in the interface domain of RdRp and that changing P323 to L or F did not significantly change the conformation of the protein (Fig. 5). This suggests that the P323 variants are most likely neutral mutations in RdRp that confer no fitness advantage or disadvantage in transmission or pathogenicity, although ongoing investigation will be needed to confirm their significance.

We have developed a novel method that combines specific depletion and enrichment strategies and results in efficient SARS-CoV-2 RNA-seq with high genome coverage and depth. An advantage of this protocol is that it generates sequence data directly from swab specimens without the need to passage the virus in cell culture, thereby reducing the handling of infectious material and the induction of culture-acquired mutations. An obstacle in sequencing directly from swab specimens is that most FDA-approved commercially available RNA extraction kits, which are specifically optimized to recover low amounts of total nucleic acids, include carrier polyA RNA that can be converted into sequenceable molecules, as has been observed previously with RNA-seq of Lassa- or Ebola-positive clinical specimens (17).

The workflow incorporates amplification of low-abundance RNA into micrograms of DNA, followed by conversion of a fraction of the DNA into Illumina-compatible sequencing libraries and enrichment of these libraries for SARS-CoV-2 sequences.
In addition, during the reverse transcription step a reagent was incorporated to reduce the subsequent amplification of host ribosomal RNA. This approach is robust in that it converts low amounts of RNA into microgram quantities of DNA representative of all the RNA species (aside from rRNA) present in the specimen. This DNA can be stored indefinitely to be interrogated by multiple techniques at a later date. Additionally, RNA amplification is likely less sensitive to low viral abundance compared to RT-PCR. Finally, the use of probes to enrich for coronavirus-specific sequencing library molecules is less sensitive to variants compared to tiling PCR amplicon approaches (18) (19) (20) (21) (22).

The data herein indicate that early in the pandemic, before the "stay-at-home" order on April 1st, there were multiple introductions of SARS-CoV-2 into the state of Nevada. From April 1st to the beginning of June, Nevada experienced a period of semi-isolation, as the casinos and most hotels shut down and tourism and travel to the state essentially stopped. Because of the stay-at-home order and the social distancing measures put in place, there was less mobility of people within and between states. It is possible that these measures, compounded by potential inherent transmission variability of some viral isolates, influenced the change in the frequency of D614G, clades and P323L/F that we noted during this time period within Nevada.
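The cumulative frequencies discussed here (for D614G, clades, and P323L/F between March 6 and June 5) can be derived from a list of dated variant calls. A minimal Python sketch follows, assuming each specimen is represented by a collection date and an allele label. The function name and the example records in the usage note are hypothetical and are not data from this study.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple


def cumulative_frequencies(
    records: List[Tuple[date, str]]
) -> List[Tuple[date, Dict[str, float]]]:
    """For each collection date, return every allele's share of all
    specimens collected up to and including that date."""
    counts: Counter = Counter()
    result: List[Tuple[date, Dict[str, float]]] = []
    ordered = sorted(records, key=lambda r: r[0])
    i = 0
    while i < len(ordered):
        day = ordered[i][0]
        # Add every specimen collected on this day before computing shares.
        while i < len(ordered) and ordered[i][0] == day:
            counts[ordered[i][1]] += 1
            i += 1
        total = sum(counts.values())
        result.append((day, {allele: n / total for allele, n in counts.items()}))
    return result
```

For example, with two hypothetical P323 specimens on March 6, one L323 specimen on April 10, and one F323 specimen on May 1, the final entry reports P323 at 0.5 and L323 and F323 at 0.25 each. Plotting these per-date shares gives a cumulative frequency curve of the kind described in the text.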
We also found 379C>A at a high prevalence in our study specimens compared to the subsampling of sequences from the United States and globally (Figure 6). This is a synonymous mutation in nsp1, so the biological relevance of this nucleotide variant remains to be elucidated. [...] study 62 of the 133 specimens from Northern Nevada contain P323F. That is, 46% of specimens from Northern Nevada contain P323F, compared to 0.04% of NCBI-deposited SARS-CoV-2 isolates. We performed structural modeling of the P323L/F variation and did not find any significant change to the nsp12 conformation with either P323L or P323F; therefore this variant is most likely a neutral mutation that does not confer either a fitness advantage or disadvantage to transmission or pathogenicity of SARS-CoV-2.

We find that the overall trend for D614G in Nevada during this time period was similar to what was observed in other states and internationally, with the exception of Asia, where the D614 allele had originated. We noted that there were differences between Northern Nevada and Southern Nevada. In Northern Nevada clade 20C and F323 were more frequent, while during this same time period in Southern Nevada clade 20A and L323 were more prevalent. These data indicate that there were distinct genomic profiles of the SARS-CoV-2 viruses that were circulating in these populations during the initial months of the pandemic, while stay-at-home orders were in place to help prevent transmission of the virus.

[...] carried out using random hexamers.
The synthesized DNA molecules were ligated to one another, followed by isothermal linear amplification. Amplified DNA (1 µg) was enzymatically sheared to an average insert size of 300 bp, and Illumina-compatible dual-indexed sequencing adapters were ligated to the ends. Next, about 300 ng of adapter-ligated sample was amplified with 6 cycles of PCR with KAPA HiFi HotStart polymerase (Roche [...] sequence was conducted with a myBaits kit and coronavirus-specific biotinylated probes (Arbor Biosciences). Each enrichment used 500 ng of PCR-amplified DNA, was carried out according to the manufacturer's instructions at a hybridization temperature of 65 °C for 16 hours, and was completed with 8-16 cycles of PCR using KAPA HiFi HotStart polymerase. Samples were sequenced using an Illumina NextSeq mid-output kit (2 x 75). The FASTQ files generated from the sequencing reaction were analyzed as described below. The data [...] and combined with our own samples to determine their phylogenetic placement (29, 30).

The sequence of the nsp12 (RdRp) protein of SARS-CoV-2 (YP_009725307.0) was retrieved from the NCBI protein database and a 3D model was built based on a previously published report (PDB ID: 6XEZ) (32). In addition to nsp12 (chain A), the model also contains nsp7 (chain C), nsp8 (chains B and D), nsp13 (chains E and F), ligands (Zn2+, Mg2+) and RNA template and product strands.
Mutational changes to residue 323 within nsp12 were performed using the PyMOL Molecular Graphics System (version 2.0, Schrödinger LLC). The original proline (P) was mutated to either leucine (L) or phenylalanine (F) as indicated. These residues, along with residues containing side chains within 5 Å of P323L/F, are shown as sticks. The rotamers for each P323L/F substitution were assessed, and those with the least rotational strain and steric hindrance were used to generate the final image. To identify any NCBI-deposited sequences that contain the P323F variant, standard protein BLAST from the BLASTp suite was used to find nsp12 protein sequences which [...]
[...] period from a subsampling of sequences deposited in Nextstrain.org. The size of the pie chart corresponds to the relative specimen number for each region.

[...] Nextstrain.org were used to generate the proportion of 379C>A in (d) the indicated states within the U.S. and (e) internationally. The size of the pie chart corresponds to the relative specimen number for each region.

Figure 6

References (titles as recovered from the source):

The Architecture of SARS-CoV-2 Transcriptome
Isolation and Full-Length Genome Characterization of SARS-CoV-2 from COVID-19 Cases in Northern Italy
An 81-Nucleotide Deletion in SARS-CoV-2 ORF7a Identified from Sentinel Surveillance in Arizona
A new coronavirus associated with human respiratory disease in [...]
Characterisation of the transcriptome and proteome of SARS-CoV-2 reveals a cell passage induced in-frame deletion of the furin-like cleavage site from the spike glycoprotein
[...] Antigenicity of the SARS-CoV-2 Spike Glycoprotein
[...] SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
[...] Republic of China, in February, 2003. Lancet 362:1353 [...]