key: cord-0941762-hcprycwh
authors: Kang, Lin; He, Guijuan; Sharp, Amanda K.; Wang, Xiaofeng; Brown, Anne M.; Michalak, Pawel; Weger-Lucarelli, James
title: A selective sweep in the Spike gene has driven SARS-CoV-2 human adaptation
date: 2021-03-05
journal: bioRxiv
DOI: 10.1101/2021.02.13.431090
sha: 5a54883a74ce5dd67ff028d25c0002fd68af2771
doc_id: 941762
cord_uid: hcprycwh

Summary

While SARS-CoV-2 likely has animal origins [1], the viral genetic changes necessary to adapt this animal-derived ancestral virus to humans are largely unknown, mostly due to low levels of sequence polymorphism and the notorious difficulties in experimental manipulations of coronavirus genomes. We scanned more than 182,000 SARS-CoV-2 genomes for selective sweep signatures and found that a distinct footprint of positive selection is located around a non-synonymous change (A1114G; T372A) within the Receptor-Binding Domain of the Spike protein, which likely played a critical role in overcoming species barriers and accomplishing interspecies transmission from animals to humans. Structural analysis indicated that the substitution of threonine with an alanine in SARS-CoV-2 concomitantly removes a predicted glycosylation site at N370, resulting in more favorable binding predictions to human ACE2, the cellular receptor. Using a novel bacteria-free cloning system for manipulating RNA virus genomes, we experimentally validated that this SARS-CoV-2-unique substitution significantly increases replication in human cells relative to its putative ancestral variant. Notably, this mutation’s impact on virus replication in human cells was much greater than that of the Spike D614G mutant, which has been widely reported to have been selected for during human-to-human transmission [2, 3].
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of Coronavirus Disease 2019 (COVID-19), has caused over 60 million infections with at least 1.3 million deaths worldwide as of early November 2020 [4]. The virus was first described in late 2019 in Wuhan, China, and quickly spread globally [1]. SARS-CoV-2 is closely related to SARS-CoV, which caused a more limited outbreak in several countries in 2003 [5, 6]; however, several bat and pangolin-derived viruses are even more closely related to SARS-CoV-2, indicative of a zoonotic origin [7-9]. Bat coronavirus RaTG13, originally isolated in China from Rhinolophus affinis bats in 2013, shares 96% nucleotide identity with SARS-CoV-2 across the genome and ~97% amino acid identity in the Spike (S) protein, which mediates receptor binding and membrane fusion and is the key coronavirus determinant of host tropism [10]. Similarly, several viruses found in Malayan pangolins (Manis javanica) are closely related to SARS-CoV-2, with up to 97.4% amino acid concordance in the receptor-binding domain (RBD) of the S protein [8, 9]. However, the exact origin and mechanism of cross-species transmission of the SARS-CoV-2 progenitor are still unknown.

In the past two decades, the emergence of severe acute respiratory syndrome coronavirus (SARS-CoV) [6, 11, 12] and Middle East respiratory syndrome coronavirus (MERS-CoV) [13] in humans, and of swine acute diarrhoea syndrome coronavirus (SADS-CoV) in pigs, has highlighted the epidemic potential of coronaviruses [14]. Typically, only modest changes to a virus are required to initiate adaptation to a new host; for example, only two amino acid changes were necessary to produce a dramatic difference in human adaptation in both SARS-CoV and MERS-CoV S proteins [15, 16].
This phenomenon is readily observed in other viruses: the human adaptation of Ebola virus following spill-over from bats was at least partly mediated by a single alanine-to-valine mutation at position 82 in the glycoprotein [17, 18]. Similarly, individual amino acid changes have been associated with recent outbreaks of several RNA viruses: chikungunya virus [19], West Nile virus [20, 21], and Zika virus [22]. While an individual mutation that likely increases replication of SARS-CoV-2 in humans has been identified (a single aspartic acid to glycine change at position 614 in the S protein [2, 3]), this occurred after emergence into humans, and the genetic determinants of SARS-CoV-2's expansion from an animal reservoir into humans remain entirely unknown.

For a virus recently acquired through a cross-species transmission, rapid evolution and a strong signature of positive selection are expected. For example, several rounds of adaptive changes have been demonstrated in SARS-CoV genomes during the short SARS epidemic in 2002-2003 [23, 24]. However, in its brief epidemic, SARS-CoV-2 has been characterized by relatively low genetic variation, concealing signals of positive selection and leading to contradictory reports of limited positive selection [25], "relaxed" selection [26], or even negative (purifying) selection [27, 28]. These results, though, are based on dN/dS tests that are traditionally designed for eukaryotic interspecies comparisons and are thus ill-equipped to detect hallmark signatures of positive selection in viral lineages with limited sequence divergence [29]. Here, we employ highly sensitive methods enabling detection of selective sweeps, in which a selectively favorable mutation spreads all or part of the way through the population, causing a reduction in the level of sequence variability at nearby genomic sites [30].
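The reduction in local sequence variability that defines a sweep can be illustrated with a sliding-window scan of nucleotide diversity (mean pairwise differences per site). The following is a minimal sketch on a made-up toy alignment; it is not the method used in this study (OmegaPlus and RAiSD rely on more elaborate statistics), and the function names, window size, and sequences are all hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs, start, end):
    """Mean number of pairwise differences per site in columns [start, end)."""
    pairs = list(combinations(seqs, 2))
    if not pairs or end <= start:
        return 0.0
    diffs = sum(
        sum(a != b for a, b in zip(s1[start:end], s2[start:end]))
        for s1, s2 in pairs
    )
    return diffs / (len(pairs) * (end - start))


def sliding_pi(seqs, window, step):
    """Return (window_start, diversity) for each window along the alignment."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity(seqs, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


# Hypothetical toy alignment: the middle window is invariant across all
# sequences, mimicking the local loss of variation expected around a
# swept site.
toy_alignment = [
    "ACGTACGTACGT",
    "ATGTACGTACGA",
    "ACGTACGTACTT",
    "AGGTACGTACGT",
]

for start, pi in sliding_pi(toy_alignment, window=4, step=4):
    print(f"window starting at {start}: pi = {pi:.3f}")
```

In a real scan, a window whose diversity is markedly lower than that of its flanks would only be a candidate; dedicated sweep detectors additionally test linkage disequilibrium and site-frequency patterns before calling a region.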
With unprecedented statistical power that leverages information from more than 182,000 SARS-CoV-2 genomes, we demonstrate that positive selection has played a critical role in the adaptive evolution of SARS-CoV-2, manifested as selective sweeps in Spike and several other regions, also providing candidate mutations for further analysis and interventions. Given its role in coronavirus host tropism, we hypothesized and experimentally validated that the selective sweep identified in the S protein involves an adaptive mutation increasing replication in human lung cells, which, in turn, could facilitate more efficient human-to-human transmission.

Selective sweeps analysis identified a Spike region with high confidence from 182,792 sequences. OmegaPlus [31] and RAiSD [32] were used to find putative selective sweep regions in 182,792 SARS-CoV-2 genomes downloaded from the publicly available GISAID EpiCov database (www.gisaid.org). Eight selective sweep regions were detected, including four in ORF1ab and four in the Spike region (Fig. 1 and Table 1). The Spike protein plays an important role in receptor recognition and cell membrane fusion during viral infection, and this protein is highly conserved among all coronaviruses. Next, we screened genomic sites in the Spike region that may be involved in the adaptive evolution of SARS-CoV-2 in the new host by comparing the non-synonymous differences between SARS-CoV-2 and four other Sarbecovirus members (one pangolin coronavirus and three bat coronaviruses; see Materials and Methods). A total of six such sites were identified (Supplementary Table 1); notably, only a single site (A1114G) was centrally located in one of the sweep regions, whereby the threonine at amino acid position 372 in the Spike protein of the four Sarbecovirus members was substituted with alanine (Thr372Ala) in human SARS-CoV-2.
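The correspondence between the nucleotide change A1114G and the amino acid change Thr372Ala follows from simple codon arithmetic, sketched below. The sketch also checks the N-X-S/T glycosylation sequon that the Summary says is lost at N370. All function names are hypothetical, and the serine at position 371 is taken from the SARS-CoV-2 reference Spike sequence rather than from this text, so treat it as an assumption.

```python
THR_CODONS = {"ACT", "ACC", "ACA", "ACG"}  # standard genetic code, threonine
ALA_CODONS = {"GCT", "GCC", "GCA", "GCG"}  # standard genetic code, alanine


def codon_of(nt_pos):
    """Map a 1-based coding-sequence position to (1-based codon, 0-based offset)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3


def substitute(codon, offset, base):
    """Return the codon with the base at the given offset replaced."""
    return codon[:offset] + base + codon[offset + 1:]


def has_sequon(tripeptide):
    """True if the tripeptide matches the N-linked sequon N-X-S/T (X not proline)."""
    n, x, st = tripeptide
    return n == "N" and x != "P" and st in ("S", "T")


codon_number, offset = codon_of(1114)
print(codon_number, offset)  # nucleotide 1114 is the first base of codon 372

# An A-to-G change at the first codon position turns every Thr codon into Ala.
print(sorted(substitute(c, offset, "G") for c in THR_CODONS))

# Residues 370-372: N, S (assumed from the reference sequence), then T or A.
print(has_sequon("NST"), has_sequon("NSA"))
```

Because the change falls on the first codon position, the result does not depend on which of the four threonine codons the ancestral sequence carried.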
Out of the 182,792 SARS-CoV-2 genomes, no sequence polymorphism was found at this position (1114G), suggesting a rapid fixation of this mutation via a hard sweep. The alternative, putatively ancestral, coronavirus variant (A1114) was perfectly conserved in Sarbecovirus members from bats and pangolin.

Structure-based analysis of SARS-CoV-2 S protein variants. Comparative molecular modeling of WT (372A, SARS-CoV-2), 372T, and 614G S protein was performed to connect the selective sweep G1114A mutation to structural data (Fig. 2). Structures were energy-minimized after mutation and analyzed for change in ACE2 binding.

[Fig. 2 caption fragment: The N370-glycosylated 372T S protein, colored by residue side-chain properties: green, hydrophobic; blue, positively charged; red, negatively charged; teal, polar uncharged; gray, neutral.]

We generated the Spike A372T reverse mutant with the putatively ancestral A1114 nucleotide, using a bacteria-free cloning approach we have previously developed to prevent bacterial toxicity associated with manipulating unstable viral genomes in bacteria [34, 35]. Concurrently, we generated the Spike D614G mutant that has been associated with higher titers in nasopharyngeal swabs in humans and increased replication in human cells and hamsters [2, 3]. Both mutants were constructed in an infectious clone, originally produced in yeast [36], of SARS-CoV-2 strain 2019-nCoV BetaCoV/Wuhan/WIV04/2019 [1]. A schematic of the A372T mutant is presented in Figure 3a; while not depicted, the D614G mutant was made by replacing the WT codon (GAT) with the glycine-encoding codon (GGC). Following virus rescue, viral plaque morphology was similar for all three viruses, although the A372T mutant plaques appeared slightly smaller (Fig. 3b). We next evaluated the replication kinetics of each virus (WT, Spike A372T, and Spike D614G) in Vero E6 and Calu-3 cells, which are monkey kidney and human lung epithelial cell lines, respectively.
Following infection in Vero E6 cells, viral titers rose rapidly for all three viruses, and only minor differences in peak titers were observed among the viruses (Fig. 3c). In Calu-3 cells, the D614G mutant produced significantly higher titers than WT at 1 day post-infection (dpi), but levels were similar for the remaining timepoints (Fig. 3d; p=0.0066 by two-way ANOVA with Dunnett's correction at 1 dpi). No differences were observed 24 hours after infection between the WT and A372T mutant, but later timepoints showed a marked reduction in replication for the A372T mutant (p=0.0033, <0.0001, and <0.0001 for 2, 3, and 4 dpi, respectively). Compared to WT, D614G titers differed only modestly, by 2.9-, 2.9-, 1.3-, and 0.8-fold on 1, 2, 3, and 4 dpi, respectively; in contrast, A372T titers were 1.8-, 5.5-, 31.1-, and 64.1-fold lower than WT on 1, 2, 3, and 4 dpi, respectively (Fig. 3d). These data indicate that an alanine at Spike position 372 confers a robust fitness advantage over several timepoints in human lung cells and that this effect is considerably more substantial than the change at position 614.

Based on structural analysis, others postulated that the SARS-CoV-2 Spike trimer would have higher thermal stability than the Spike of bat virus RaTG13 [37]. To determine if A372T altered SARS-CoV-2 thermal stability, we incubated 10^5 PFU of WT SARS-CoV-2, D614G, or A372T at room temperature (~25°C) or 37°C to mimic environmental and human body temperature, respectively. A372T titers did not significantly differ from WT at any timepoint for either temperature (Fig. 3e-f). Following 48 hours of incubation at room temperature, the titer of D614G was higher than that of WT SARS-CoV-2 (p=0.0303), which is consistent with previous reports [3].

nucleotide polymorphism (SNP) that is fixed in all SARS-CoV-2 genomes sequenced to date, while an alternative, and presumably ancestral SNP, is fixed in the other members of the Sarbecovirus lineage.
Residue 372 lies within the RBD (Fig. 2a), which mediates viral entry through the human ACE2 receptor [1]. Although residue 372 is only adjacent to the ACE2 interface of the RBD, an alanine at this position (372A) is predicted to remove a glycosylation site present at the asparagine at position 370 [37], which may alter S protein maturation or receptor binding (Fig. 2). Indeed, molecular modeling of an N-glycan at N370 in an open conformation of 372T S protein shows a highly solvent-accessible glycan site (Fig. 2). In the closed conformation of 372T, the N370 glycan site becomes less solvent exposed and further fills a solvent-accessible region on the outer edge of the RBD. N-glycans are known to modulate the RBD of S protein, with glycans at positions N165 and N234 influencing the open/closed metastable conformation states of the RBD, and N-glycans at N331 and N343 serving more of a shielding role for the RBD itself regardless of state [33]. The N370 glycosylation site is in close structural proximity to the N-glycan site at N343 and is relatively near the RBM and RBD/ACE2 interface (Fig. 2a-c). The free energy of binding of ACE2 to S protein indicates a decrease in relative binding affinity of ACE2 to S protein in the N370-glycosylated 372T variant compared to WT (-69.825 kcal/mol vs -180.503 kcal/mol, respectively). We hypothesize that while an additional N-glycan at N370 in 372T S protein could contribute to glycan shielding of the RBD, its proximity to the RBM and central position between essential glycan sites could impair RBD/ACE2 interaction or the ability to transition between open and closed states, decreasing RBD/ACE2 affinity and subsequent virulence.

Using a reverse genetics system to generate a SARS-CoV-2 mutant containing the putative ancestral Calu-3 human lung epithelial cells (Fig. 3d). Further, the growth of the A372T S mutant was greatly reduced for multiple days, which may be indicative of an impact on viral shedding kinetics in humans.
Of note, we also generated the D614G S mutant here, which has been widely reported to increase SARS-CoV-2 infectivity [2]; it increased viral titers by a maximum of only 2.9-fold in Calu-3 cells compared to WT, a finding that is consistent with previous results [3]. We also observed a slight attenuation for the A372T S mutant in Vero E6 cells (3.8-fold lower titers compared to WT at 2 dpi). The large difference in replication between the two cell lines suggests a cell-specific mechanism of attenuation. In fact, besides their species of origin, Calu-3 and Vero E6 cells differ in several important aspects. First, Vero E6 cells are deficient in type I interferon signaling [40], which inhibits SARS-CoV-2 replication [41, 42]. However, the S protein is not known to antagonize IFN production, and, therefore, interferon is unlikely to be driving the differences observed here. Additionally, the S protein requires host-mediated proteolytic cleavage to undergo fusion, which can be driven by several proteases, including TMPRSS2 at the cell surface and cathepsins B and L (CatB/L) in endosomes [43]. Notably, Calu-3 cells express low levels of cathepsins but high levels of TMPRSS2, suggesting a TMPRSS2-dependent entry mechanism in Calu-3 cells [44]. In contrast, SARS-CoV-2 infection of Vero E6 cells is CatB/L dependent [43]. Clinical isolates of coronaviruses prefer entry through TMPRSS2 as opposed to CatB/L [45, 46]; accordingly, Calu-3 cells closely mimic the human environment in terms of S protein priming. Thus, these data hint that the A372T S mutant's attenuation could be mediated by inefficient TMPRSS2 cleavage of Spike. Host proteases have been implicated in the cross-species transmission of MERS-CoV from bats to humans [16, 47]. Hence, it will be important for future studies to define the importance of TMPRSS2-mediated cleavage of the S protein in the context of these mutations.

We did not observe large temperature stability differences between viruses here.
A previous report predicted that the SARS-CoV-2 S protein would have higher thermal stability than the S protein of bat coronavirus RaTG13 [37]; however, it does not appear that the residue difference at position 372 dictates this difference. Differences might have been observed at other timepoints or temperatures; nonetheless, these data suggest that thermal stability is not a likely driving factor in the emergence of the variants at position 372 or 614 in the S protein.

Overall, our data supply solid evidence that S protein residue 372 is critical for replication in human

Although the experimental data presented here clearly demonstrate the dramatic impact of the S protein A372T mutation on virus replication in human lung cells, we cannot definitively conclude that it enabled efficient human-to-human transmission or that it was necessary for cross-species transmission. Our findings suggest, though, that efficient replication in a human would be unlikely with a threonine at S protein position 372, from which we could infer that transmission would be equally unlikely. Since the true SARS-CoV-2 ancestor has not been isolated, it is impossible to know when this mutation may have arisen. Phylogenetic estimates suggest SARS-CoV-2 emerged in late November 2019 to early December 2019 [53], though the first known case was not detected until December 1st, 2019 [54]. However, this case had no connection to the Huanan seafood market, indicating that transmission was ongoing before early December or that the seafood market is not the origin of the pandemic, but rather a spreading point. While it is impossible to know SARS-CoV-2's exact emergence date, it seems likely that transmission occurred unnoticed for some period of time, thereby providing a window for SARS-CoV-2's ancestor to adapt to replication in humans.

Ethics and biosafety. The generation of recombinant SARS-CoV-2 was approved by the Institutional Biosafety 36 .
The viral rescue procedure is described below. however, it was included for consistency with our previous studies [34]. We then amplified the circular product using the FemtoPhi DNA Amplification (RCA) Kit with Random Primers (Evomic Science).

Virus rescue. RCA reactions were linearized with EagI-HF (NEB) and then column purified (Macherey-Nagel). The N expression plasmid was linearized using EcoRV-HF (NEB). Capped RNA was produced using the mMESSAGE mMACHINE T7 Transcription Kit (Invitrogen) by overnight incubation (~16 h) at 20°C using 2-3 μg of DNA. We used this lower temperature to obtain more full-length transcripts [67]. Reactions for full-length viral transcripts were supplemented with an additional 4.5 mM of GTP. We electroporated the RNA transcripts into a mixture of Vero E6 (75%) and BHK-21 (25%) cells containing a total of 2x10^7 cells per electroporation [36]. The Bio-Rad Gene Pulser Xcell Electroporation System was used with the following conditions: 270 volts, resistance set to infinity, and capacitance of 950 μF [68]. Before pulsing, the cells were washed thoroughly and then resuspended in Opti-MEM (Invitrogen). Following a single pulse, cells were allowed to incubate at room temperature for 5 minutes; we then added fresh growth media before seeding a T-75 flask and placing it at 37°C with 5% CO2. The cells were monitored daily and the supernatant was harvested at 25% CPE. Sequences were confirmed by Sanger sequencing of virus stocks. Only virus taken directly from transfection (p0 stock) was used for further characterization. Virus titers were assessed by plaque assay on Vero E6 cells.

Plaque assays and growth curves. Viral titration was performed on Vero E6 cells by plaque assay. Briefly, serial ten-fold dilutions of each sample were made and then added to confluent monolayers of Vero E6 cells.
An overlay containing 0.6% tragacanth gum (Millipore Cat# 104792) was then added; plaques were visualized following formalin fixation and staining with crystal violet. For growth curves, Vero E6 and Calu-3 cells were infected at a multiplicity of infection (MOI) of 0.05 with each virus. Following one hour of infection, we removed the virus inoculum, washed once with 1x PBS, and added fresh growth media. We then collected supernatant as the 0-day timepoint and daily thereafter until 50% cytopathic effect (CPE) was observed, each time replacing the volume taken with fresh growth media. Infectious virus was measured by plaque assay on Vero E6 cells.

Temperature stability. A virus stock containing 10^5 PFU of each virus was prepared in RPMI-1640 containing 2% FBS and 10 mM HEPES. The virus stock was aliquoted into tubes in triplicate or quadruplicate for each timepoint; a 0-hour timepoint was collected immediately and stored at -80°C for normalization. At each timepoint, we placed a subset of the tubes at -80°C for storage until virus titration by plaque assay. The remaining virus was calculated by dividing the individual titers at each timepoint by the average of the viral titer at the 0-hour timepoint.

Statistics. Statistical analyses were performed in GraphPad version 9. Viral titers were compared to WT using a two-way ANOVA with Dunnett's multiple comparisons test; a p-value of less than 0.05 was considered significant. The limit of detection for our plaque assays is 2.3 log10 PFU/mL; however, negative samples were given an arbitrary value of 0.9 plaques for a ten-fold diluted sample, which corresponds to 2.26 log10 PFU/mL.
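The limit-of-detection arithmetic and the normalization of remaining virus described above can be reproduced with a short sketch. The inoculum volume is not stated in the text; the value of 0.05 mL used here is an assumption, chosen because it makes one plaque in a ten-fold diluted sample equal 2.3 log10 PFU/mL and 0.9 plaques equal 2.26 log10 PFU/mL. Function names and the example titers are hypothetical.

```python
import math

INOCULUM_ML = 0.05  # assumed plating volume; not given in the text


def titer_log10(plaques, dilution_factor, inoculum_ml=INOCULUM_ML):
    """Convert a plaque count to log10 PFU/mL."""
    return math.log10(plaques * dilution_factor / inoculum_ml)


def fraction_remaining(titers, zero_hour_titers):
    """Divide each titer by the mean titer of the 0-hour replicates."""
    baseline = sum(zero_hour_titers) / len(zero_hour_titers)
    return [t / baseline for t in titers]


# One plaque at the ten-fold dilution defines the limit of detection.
print(round(titer_log10(1, 10), 2))

# Samples with no plaques are assigned 0.9 plaques at the same dilution.
print(round(titer_log10(0.9, 10), 2))

# Hypothetical PFU/mL values for one timepoint against two 0-hour replicates.
print(fraction_remaining([5e4, 2.5e4], [1e5, 1e5]))
```

Assigning a value just below the detection limit keeps undetectable samples on the log scale for plotting and statistics while marking them as distinct from any measured titer.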
A pneumonia outbreak associated with a new coronavirus of probable bat origin Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus Spike mutation D614G alters SARS-CoV-2 fitness COVID-19 Map -Johns Hopkins Coronavirus Resource Center Coronavirus as a possible cause of severe acute respiratory syndrome Characterization of a novel coronavirus associated with severe acute respiratory syndrome A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein Isolation of SARS-CoV-2-related coronavirus from Malayan pangolins Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission A novel coronavirus associated with severe acute respiratory syndrome Identification of a novel coronavirus in patients with severe acute respiratory syndrome Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia Fatal swine acute diarrhoea syndrome caused by an HKU2-related coronavirus of bat origin Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2 Two Mutations Were Critical for Bat-to-Human Transmission of Middle East Respiratory Syndrome Coronavirus Human Adaptation of Ebola Virus during the West African Outbreak Ebola Virus Glycoprotein with Increased Infectivity Dominated the 2013-2016 Epidemic Two Chikungunya isolates from the outbreak of La Reunion (Indian Ocean) exhibit different patterns of infection in the mosquito, Aedes albopictus A newly emergent genotype of West Nile virus is transmitted earlier and more efficiently by Culex mosquitoes Evolutionary enhancement of Zika virus infectivity in Aedes aegypti mosquitoes Characterization of severe acute respiratory syndrome coronavirus genomes in Taiwan: molecular epidemiology and genome evolution Molecular evolution of the SARS coronavirus during the course of the 
SARS epidemic in China Computational Inference of Selection Underlying the Evolution of the Novel Coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2 The origin and underlying driving forces of the SARS-CoV-2 outbreak Emergence of SARS-CoV-2 through recombination and strong purifying selection Comparative genomic analysis revealed specific mutation pattern between human coronavirus SARS-CoV-2 and Bat-SARSr-CoV RaTG13. Cold Spring Harbor Laboratory The population genetics of dN/dS The hitch-hiking effect of a favourable gene OmegaPlus: a scalable tool for rapid detection of selective sweeps in whole-genome datasets RAiSD detects positive selection based on multiple signatures of a selective sweep and SNP vectors Beyond Shielding: The Roles of Glycans in the SARS-CoV-2 Spike Protein Using barcoded Zika virus to assess virus population structure in vitro and in Aedes aegypti mosquitoes Molecularly barcoded Zika virus libraries to probe in vivo evolutionary dynamics Rapid reconstruction of SARS-CoV-2 using a synthetic genomics platform SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects Summary of probable SARS cases with onset of illness from 1 Middle East respiratory syndrome coronavirus (MERS-CoV) Defectiveness of interferon production and of rubella virus interference in a line of African green monkey kidney cells (Vero) Antiviral activities of type I interferons to SARS-CoV-2 infection Inhibition of SARS-CoV-2 by type I and type III interferons SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor Calu-3 cells are largely resistant to entry driven by filovirus glycoproteins and the entry defect can be rescued by directed expression of DC-SIGN or cathepsin L Wild-type human coronaviruses prefer cell-surface TMPRSS2 to endosomal cathepsins for cell entry Clinical Isolates of Human Coronavirus 229E Bypass the Endosome for Cell Entry Receptor usage 
and cell entry of bat coronavirus HKU4 provide insight into bat-to-human transmission of MERS coronavirus Positive Selection of ORF1ab, ORF3a, and ORF8 Genes Drives the Early Evolutionary Trends of SARS-CoV-2 During the 2020 COVID-19 Pandemic Pathogenesis and transmission of SARS-CoV-2 in golden hamsters Simulation of the Clinical and Pathological Manifestations of Coronavirus Disease 2019 (COVID-19) in a Golden Syrian Hamster Model: Implications for Disease Pathogenesis and Transmissibility Infection and Rapid Transmission of SARS-CoV-2 in Ferrets SARS-CoV-2 is transmitted via contact and via the air between ferrets Clinical features of patients infected with 2019 novel coronavirus in Wuhan Minimap2: pairwise alignment for nucleotide sequences MAFFT multiple sequence alignment software version 7: improvements in performance and usability Receptor binding and priming of the spike protein of SARS-CoV-2 for membrane fusion Release 2020-2: Maestro. Schrodinger, LLC The PyMOL molecular graphics system Prediction of Glycosylation Sites in Proteins Antibody structure determination using a combination of homology modeling, energy-based refinement, and loop prediction Structure-based approach to the prediction of disulfide bonds in proteins Applying physics-based scoring to calculate free energies of binding for single amino acid mutations in protein-protein complexes A hierarchical approach to all-atom protein loop prediction On the Role of the Crystal Environment in Determining Protein Side-chain Conformations Identification of a 450-bp region of human papillomavirus type 1 that promotes episomal replication in Saccharomyces cerevisiae Improved synthesis of full-length RNA probe at reduced incubation temperatures An Infectious cDNA Clone of SARS-CoV-2 Competing interests. The authors declare no competing interests.