key: cord-0845087-3jphe0y8 authors: Martin, Ross; Li, Jiani; Parvangada, Aiyippa; Perry, Jason; Cihlar, Tomas; Mo, Hongmei; Porter, Danielle; Svarovskaia, Evguenia title: Genetic Conservation of SARS-CoV-2 RNA Replication Complex in Globally Circulating Isolates and Recently Emerged Variants from Humans and Minks Suggests Minimal Pre-Existing Resistance to Remdesivir date: 2021-02-05 journal: Antiviral Res DOI: 10.1016/j.antiviral.2021.105033 sha: 73d6ecfb2085a4300b44ec50d74f27d51aac2bd1 doc_id: 845087 cord_uid: 3jphe0y8

Remdesivir (RDV) exhibits potent antiviral activity against SARS-CoV-2 and is currently the only drug approved for the treatment of COVID-19. However, little is currently known about the potential for pre-existing resistance to RDV and the possibility of SARS-CoV-2 genetic diversification that might impact RDV efficacy as the virus continues to spread globally. In this study, >90,000 SARS-CoV-2 sequences from globally circulating clinical isolates, including sequences from recently emerged United Kingdom and South Africa variants, and >300 from mink isolates were analyzed for genetic diversity in the RNA replication complex (nsp7, nsp8, nsp10, nsp12, nsp13, and nsp14) with a focus on the RNA-dependent RNA polymerase (nsp12), the molecular target of RDV. Overall, low genetic variation was observed, with only 12 amino acid substitutions present in the entire RNA replication complex in ≥0.5% of analyzed sequences. The highest overall frequency (82.2%) was observed for nsp12 P323L, which consistently increased over time. Low sequence variation in the RNA replication complex was also observed among the mink isolates. Importantly, the coronavirus nsp12 mutations previously selected in vitro in the presence of RDV were identified in only 2 isolates (0.002%) within all the analyzed sequences.
In addition, among the sequence variants observed in ≥0.5% of clinical isolates, including P323L, none were located near the established polymerase active site or sites critical for the RDV mechanism of inhibition. In summary, the low diversity and high genetic stability of the RNA replication complex observed over time and in the recently emerged SARS-CoV-2 variants suggest a minimal global risk of pre-existing SARS-CoV-2 resistance to RDV.

[...] variant was described as rapidly spreading across the United Kingdom [21]. The B.1.1.7 variant contained a signature of spike amino acid changes including the H69/V70 deletion, the Y144 deletion, N501Y, A570D, D614G, P681H, T716I, S982A, and D1118H. Another SARS-CoV-2 variant, B.1.351, commonly referred to as 501Y.V2, was described to be spreading quickly across South Africa [22]. The B.1.351 variant also contains a signature of spike amino acid changes including K417N, E484K, and N501Y. In addition, the transmission of SARS-CoV-2 between humans and minks has been reported near mink farms in Denmark and the Netherlands [23]. These events raise concerns of possible further diversification of SARS-CoV-2 genomes and the potential impact on the efficacy of therapeutics and vaccines.

In this report, we analyzed amino acid substitutions in all critical parts of the RNA replication complex of SARS-CoV-2 in geographically diverse human clinical isolates, including the recently emerged United Kingdom and South Africa variants, and in isolates from minks, using publicly available sequences to assess global genetic diversification and its potential impact on the susceptibility of circulating viruses to RDV.

2. METHODS

SARS-CoV-2 full genome sequences from clinical isolates and minks collected from December 2019 through early September 2020 were obtained from the GISAID EpiCoV database [24].
Sequences were aligned to the Wuhan-Hu-1 viral isolate sequence (NC_045512) using the MAFFT Sequence Aligner v7.394 [25]. Nucleotide and amino acid changes across the SARS-CoV-2 genome were tabulated. Sequences that contained ambiguous bases within RdRp (nsp12) were excluded (N=3,590 clinical and N=5 mink isolate sequences). A total of N=92,334 clinical isolate sequences and N=333 mink sequences were included in the analysis. The nsp12 nucleotide sequences were adjusted for the ribosomal slippage at nucleotide position 13468 and translated for amino acid analysis. For nucleotide and amino acid analysis, if indels spanned multiple consecutive positions, only one change was counted. Ambiguous nucleotides (N) were ignored in counts of nucleotide changes. A linear least-squares regression was used to find trends in frequencies of amino acid substitutions over time (per month).

Additional SARS-CoV-2 sequences containing the Spike N501Y mutation were downloaded on December 21, 2020. These sequences were separated into B. [...]

To assess the genetic variation of nsp12 and other proteins involved in RNA replication, publicly available SARS-CoV-2 sequences from the ongoing pandemic were downloaded. A total of N=92,334 full genome sequences from human clinical isolates and N=333 isolates from minks were selected and aligned to the reference strain Wuhan-Hu-1 viral isolate (NC_045512). Of note, the USA-WA-1 viral isolate (MN985325), against which most RDV antiviral activity testing was previously performed, contains no amino acid substitutions in the RNA replication complex as compared to the Wuhan-Hu-1 viral isolate. The number of sequences obtained, together with the time and geographical location of sample collection, are summarized in Table 1. The clinical isolate sequences were obtained from 109 countries from December 2019 up to September 2020.
The mink sequences were obtained from the Netherlands and Denmark from April to September 2020.

The number of nucleotide changes relative to the reference strain Wuhan-Hu-1 (NC_045512) across the full genome and the counts of amino acid substitutions across all ORFs were calculated for each human clinical isolate. Among the 92,334 analyzed SARS-CoV-2 clinical isolates, there were on average 9.8 (range 0-196, median 9) nucleotide changes and 5.6 (range 0-117, median 5) amino acid changes compared to the reference sequence (Supplementary Table 1). The average number of both nucleotide and amino acid changes increased steadily from December 2019 through September 2020 (Supplementary Table 1).

The number of amino acid substitutions within the RNA replication complex (nsp7, nsp8, nsp10, nsp12, nsp13, and nsp14) was calculated for the whole set of analyzed human clinical isolates. Low variation was observed across all genes encoding proteins of the RNA replication complex (see Supplementary Figures 1-6), with only 12 amino acid substitutions present in ≥0.5% of all clinical isolates (see Table 2). No amino acid substitutions with frequency ≥0.5% were observed in nsp8 or nsp10. The most prevalent substitution in the RNA replication complex was nsp12 P323L, which was observed in 75,892/92,334 (82%) clinical isolates from 103 of 109 countries. Residue P323 of nsp12 is polymorphic across different coronaviruses, but conserved within each coronavirus (see Supplementary Figure 8 and Supplementary Table 3). Excluding P323L, no other substitutions were observed in nsp12 in 87% of SARS-CoV-2 clinical isolates. No amino acid substitutions were observed in 97%, 97%, 99%, 88%, and 91% of clinical isolates in nsp7, nsp8, nsp14, nsp10, and nsp13, respectively.

The frequency of amino acid substitutions within the RNA replication complex was assessed in N=333 mink isolates.
Low variation was observed across all proteins, with only N=12 amino acid substitutions in ≥0.5% of mink isolates. No amino acid substitutions with frequency ≥0.5% were observed in nsp14 or nsp10. Similar to human clinical isolates, the most prevalent substitution was nsp12 P323L, which was observed in 89.5% of mink isolates. Nsp12 T739I was observed in 26.1%, nsp13 I285V in 27.9%, and nsp13 R392C in 10.5% of mink isolates. Each of these substitutions except for nsp12 P323L was observed in <0.5% of human clinical isolates (Table 3).

Human Clinical Isolates

In the next step, we focused our analysis on temporal changes in the genetic diversity of SARS-CoV-2 replication complex genes. In order to obtain reliable and interpretable data, we included only months with ≥1000 clinical isolate sequences available and focused on amino acid substitutions with total frequency ≥0.5%. In the genes of the RNA replication complex, only the nsp12 P323L substitution consistently increased in frequency in human clinical isolates over time. Nsp13 substitutions P504L and Y541C decreased in frequency over time (Supplementary Table 2). Available sequence data for mink isolates were limited and therefore excluded from this analysis.

[...] Tables 4 and 5), respectively. The most prevalent substitution in the RNA replication complex was nsp12 P323L, which was observed in 100% of the B.1.1.7 isolates and the B.1.351 isolates analyzed. Each of the two SARS-CoV-2 variants also contained one prevalent substitution in nsp13: K460R was observed in 51% of B.1.1.7 isolates and T588I was observed in 14% of B.1.351 isolates. RDV has been shown to have similar antiviral activity against early-lineage SARS-CoV-2 and the B.1.1.7 variant in both human gastrointestinal and lung epithelial cells [26]. None of the other substitutions observed are expected to reduce RDV susceptibility, but experimental testing will be needed to validate this.
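The temporal analysis described above (monthly substitution frequencies, restricted to months with at least 1000 sequences, followed by a linear least-squares fit per month) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the record layout (a month index paired with a set of substitution labels), and the toy counts are all hypothetical and serve only to show the shape of the calculation.

```python
from collections import defaultdict


def monthly_frequencies(records, substitution, min_per_month=1000):
    """Fraction of isolates carrying `substitution` in each month.

    `records` is an iterable of (month_index, set_of_substitutions);
    this layout is an assumption made for the example. Months with
    fewer than `min_per_month` sequences are dropped.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for month, subs in records:
        totals[month] += 1
        if substitution in subs:
            hits[month] += 1
    return {
        month: hits[month] / totals[month]
        for month in sorted(totals)
        if totals[month] >= min_per_month
    }


def least_squares_trend(freq_by_month):
    """Ordinary least-squares fit of frequency against month index.

    Returns (slope, intercept); a positive slope means the
    substitution frequency increases over time.
    """
    xs = list(freq_by_month)
    ys = [freq_by_month[x] for x in xs]
    if len(xs) < 2:
        raise ValueError("need at least two months to fit a trend")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def toy_records():
    """Made-up data: three well-sampled months and one sparse month."""
    counts = {0: (1000, 200), 1: (1000, 500), 2: (1000, 800), 3: (10, 0)}
    for month, (total, with_sub) in counts.items():
        for i in range(total):
            yield month, ({"nsp12:P323L"} if i < with_sub else set())


if __name__ == "__main__":
    freqs = monthly_frequencies(toy_records(), "nsp12:P323L")
    print(freqs, least_squares_trend(freqs))
```

In this sketch the sparse fourth month is excluded before fitting, mirroring the ≥1000-sequence inclusion rule; the sign of the fitted slope is what distinguishes an increasing substitution from a decreasing one.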
Amino acid sequence alignments demonstrate that nsp12 of SARS-CoV-2 exhibits 96% sequence identity with SARS-CoV, 71% with MERS-CoV, and 66% with MHV. However, the residues that define the nsp12 polymerase active site show significantly higher conservation, approaching 100% [17]. Furthermore, the nsp12 residues F476 and V553 that changed in MHV as a result of RDV in vitro selection in prior studies [19], corresponding to F480 and V557 in SARS-CoV-2, are 100% conserved across all Alpha, Beta, and Gamma CoVs (see Supplementary Figure 7). Finally, the nsp12 S861 residue that is responsible for RDV's delayed chain termination is 100% conserved across Alpha, Beta, and Delta CoVs, but is an alanine in Gamma CoVs.

Among the whole set of >90,000 analyzed SARS-CoV-2 human clinical isolates, the nsp12 amino acid substitution F480L was detected in a single isolate sequence and V557A in a single independent isolate sequence. This observation indicates an extremely low frequency (0.002%) of potentially pre-existing reduced susceptibility to RDV among circulating clinical isolates of SARS-CoV-2. S861F was observed in another single independent isolate sequence. In the [...] S861 residues in the context of SARS-CoV-2 viral replication are in progress and will be reported independently.

CoV-2 Clinical Isolates

All observed amino acid substitutions in nsp12, nsp7, nsp8, and nsp13 with a frequency greater than 0.5% in clinical isolates were mapped to a cryo-EM structure of the replication complex (PDB 6XEZ [4], see Figure 1). A similar map was generated on a homology model of nsp14 [27] (see Supplementary Figure 9). With respect to nsp12, none of the observed substitutions were determined to have any direct interaction with pre-incorporated RDV-TP, the site of delayed chain termination near S861, or the site of template-dependent inhibition near A558.
P323L, A97V, and T141I are all solvent-exposed residues that are >25 Å from the polymerase active site. A449V, with an occurrence of 0.57%, is in the fingers domain and could have an indirect impact on residues in the F-motif, including V557 and A558. A similar indirect effect on RDV susceptibility was recently identified for Ebola virus, where an F548S mutation, also in the fingers domain, was seen to confer low-level resistance [28]. While prediction of the potential effect of substitutions on drug resistance can involve multiple factors beyond a simple direct interaction with the inhibitor molecule, our assessment is that the potential for significant impact of the identified pre-existing nsp12 substitutions on RDV susceptibility appears to be low.

With the recent transmission of SARS-CoV-2 to minks, genetic differences were evaluated for potential impact on RDV. In the mink dataset, low variation was also observed in the RNA replication complex and nsp12 P323L was the most frequent amino acid substitution, consistent with the human data. The transmission of SARS-CoV-2 from humans to other species and back to humans may introduce further genetic diversification over time and should continue to be monitored for the emergence of changes that could impact the efficacy of therapeutics and vaccines. From the mink dataset evaluated in this report, the genetic diversification impacting RDV efficacy seems to be minimal.

Previous characterization of the in vitro resistance profile of RDV using related CoVs indicated a high genetic barrier to RDV resistance, and identified substitutions at two highly conserved nsp12 amino acid residues corresponding to F480 and V557 in SARS-CoV-2 to be associated with low-level reduced susceptibility of coronaviruses to RDV in vitro [19].
Among >90,000 circulating human clinical isolates of SARS-CoV-2 analyzed in the present study, only one isolate was found to have an amino acid substitution at residue F480, and one other independent isolate was found to have a substitution at residue V557. Another residue, S861, which was previously implicated in the delayed chain termination of RDV [17], was found to be conserved in all but a single isolate. In the B.1.1.7 and B.1.351 SARS-CoV-2 variants, no changes were observed at F480, V557, or S861.

Given the acute nature of SARS-CoV-2 infection, high sequence conservation in the SARS-CoV-2 RNA replication complex, and a short treatment duration with RDV, the probability of resistance development in patients treated with RDV is believed to be low. However, further studies are needed to assess the potential for resistance development specifically in the context of RDV treatment of patients with active SARS-CoV-2 infection. For instance, based on a recent study of Ebola virus resistance to RDV [28], a focus on residues in the fingers domain, such as A449V observed in 0.57% of clinical isolates, and their potential impact on template-dependent inhibition may be warranted. Currently, in vitro resistance selections with RDV and SARS-CoV-2 are being conducted to compare with the findings from previous studies with related CoVs [19] and to potentially identify additional residues that might confer reduced susceptibility to RDV. To determine the potential emergence of RDV resistance and its implications for a clinical response to RDV treatment, viral sequencing of SARS-CoV-2 isolates before, during, and after RDV treatment is in progress as a part of the analysis of both the completed and ongoing clinical studies.
In addition, the SARS-CoV-2 sequence analysis methods combined with the drug target structural analysis described herein can be applied to monitor pre-existing or emerging resistance to any other direct-acting antiviral drugs, including other nucleoside RdRp inhibitors, viral protease inhibitors, or neutralizing antibodies.

In conclusion, the results of this extensive sequence analysis across >90,000 global SARS-CoV-2 isolates, including the recently emerged variants, highlight the low diversity and high genetic stability of the RNA replication complex and suggest a minimal global risk of pre-existing SARS-CoV-2 resistance to RDV.

Acknowledgements

We would like to thank the GISAID data submitters and curators supporting progress in the understanding of SARS-CoV-2 genetics.

Journal Pre-proof

[...] nsp7 is in pink, two subunits of nsp8 are in yellow and blue, and two subunits of nsp13 are in orange and white. Only amino acid substitutions in nsp12 are annotated. None of the mapped mutations are seen to interact directly with the nsp12 polymerase active site (Pol) or locations associated with either delayed chain termination or template-dependent inhibition.

• Little is known about the potential for pre-existing resistance to RDV, and the global spread of SARS-CoV-2 and its transmission between humans and other species may lead to genetic diversification.
• A large set of sequences from SARS-CoV-2 human clinical isolates (>90,000), including the recently emerged United Kingdom and South Africa variants, and from mink isolates (>300) was investigated for genetic changes in the RNA replication complex since the start of the pandemic.
• Low genetic diversity was observed in the RNA replication complex in human clinical isolates and mink isolates.
• Amino acid substitutions previously identified to cause reduced susceptibility to RDV in vitro were observed at extremely low frequency (0.002%).

A Novel Coronavirus from Patients with Pneumonia in China
A Structural View of SARS-CoV-2 RNA Replication Machinery: RNA Synthesis, Proofreading and Final Capping
Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors
Structural Basis for Helicase-Polymerase Coupling in the SARS-CoV-2 Replication-Transcription Complex
Broad-Spectrum Antiviral GS-5734 Inhibits Both
GS-5734 and its parent nucleoside analog inhibit Filo-, Pneumo-, and Paramyxoviruses