key: cord-0787144-rfetytbf authors: Lasek-Nesselquist, E.; Pata, J.; Schneider, E.; St. George, K. title: A tale of three SARS-CoV-2 variants with independently acquired P681H mutations in New York State date: 2021-03-12 journal: nan DOI: 10.1101/2021.03.10.21253285 sha: ce0c919077d5248723b2c7fc070855a5a249da02 doc_id: 787144 cord_uid: rfetytbf Several SARS-CoV-2 variants of concern have independently acquired some of the same Spike protein mutations, notably E484K, N501Y, S477N, and K417T, associated with increased viral transmission and/or reduced sensitivity to neutralization by antibodies. Repeated evolution of the same mutations, particularly in variants that are now rapidly spreading in various regions of the world, suggests a fitness advantage. Mutations at position P681 in Spike, possibly affecting viral transmission, have also evolved multiple times, including in two variants of concern. Here, we describe three variants circulating in New York State that have independently acquired a P681H mutation and the different trajectories they have taken. While one variant rose to high prevalence since later summer 2020 it appears to be in decline. The other two variants were more recently detected in New York and harbor additional Spike mutations that might be cause for continued monitoring. The latter two P681H variants have shown moderate increases in prevalence but ultimately all might be subject to the same fate as more competitive variants come to dominate the scene New York State has served as a bellwether during the SARS-CoV-2 pandemic in the United States (US). It was the initial epicenter of the pandemic in the US, highlighting both the effects of delayed containment efforts and successful mitigation strategies (Rosenberg et al. 2020; Lasek-Nesselquist, Singh, et al. 2021; Maurano et al. 2020; Gonzalez-Reiche et al. 2020 ). An outbreak in Westchester County led to the widespread distribution of the 20C clade, demonstrating the rapidity with which SARS-CoV-2 could spread through a community and beyond (Lasek-Nesselquist, Singh, et al. 2021; Deng et al. 2020; Zhang et al. 2020; Gonzalez-Reiche et al. 2020) . New York was also one of the first to identify the presence of the B.1.1.7 variant in the US (associated with 40-70% increased growth rate; (Volz et al. 2021 ) and possibly 30% increased mortality rate; Iacobucci 2021), serving as both a source and a sink for this variant of concern (Alpert et al. 2021). Recently, the number of B.1.526 variants harboring Spike E484K or S477N mutations (possibly enabling increased transmission and/or evasion of monoclonal antibodies) were identified as rapidly growing within the state (Annavajhala et al. 2021; Lasek-Nesselquist, Lapierre, et al. 2021; West et al. 2021) , underscoring the need for fortified surveillance efforts. Here we report on the trajectory of three SARS-CoV-2 variants in New York with P681H mutations in the Spike protein. The Spike protein is required for entry into the host cell and is composed of an S1 subunit that attaches to the host cell receptor angiotensin-converting enzyme 2 (ACE2), an S2 subunit necessary for fusion between virus and cell membranes, and a multibasic furin cleavage site that separates S1 and S2 subunits upon cleavage, thereby initiating subsequent events that lead to membrane fusion (Shang et al. 2020; Hoffmann, Kleine-Weber, Schroeder, et al. 2020; Walls et al. 2020; Bosch et al. 2003) . P681 is adjacent to the furin cleavage site, which facilitates efficient SARS-CoV-2 transmission and infection (Peacock et al. 2020; Hoffmann, Kleine-Weber & Pöhlmann 2020) . Mutations at this location (P681H and P681R) have been acquired by the B.1.1.7 variant rapidly spreading around the world and the A.23.1 lineage, recently reported as the dominant lineage in Uganda (Rambaut et al. 2020; Bugembe et al. 2021 ). Currently, the P681H mutation is the most commonly identified Spike mutation in SARS CoV-2 genomes from New York State (aside from D614G, which characterizes the B.1. lineage and is essentially ubiquitous in the US). While the majority of these genomes are members of the B.1.243 lineage, we detected independently acquired P681H mutations in several lineages belonging to clades 19B, 20A, 20C, 20D, and 20G. Notably, we identified a lineage B.1.1.222 variant (clade 20B) with both P681H and T478K Spike mutations and a lineage B.1 variant (clade 20C) variant with Spike mutations P681H and S494P. T478 and S494 sit within the receptor binding domain (RBD) responsible for binding the spike protein to the ACE2-receptor of host cells. Mutations at both positions were shown to reduce the activity of some monoclonal antibodies (Liu et al. 2021; Greaney et al. 2021a; Weisblum et al. 2020) . Evidence of convergent evolution as well as the structural location of P681 suggest that mutations at this position could contribute to an adaptive advantage. While the two variants described here show only moderate increases since their first appearances in New York, the combination of P681H and T478K or P681H and S494P mutations could confer additional benefits to the virus and are worthy of continued surveillance. New York is thus an experimental field, where variants endowed with differing fitness compete. It remains to be seen whether the P681H variants described here gain a foothold or succumb to more dominant forms on the rise. As of 2020-03-04, over 12% of all SARS-CoV-2 genomes sequenced in New York carried a P681H mutation (Figure 1 ). After an initial peak from September to November 2020, the proportion of genomes with this mutation remained stable until a small increase in early winter 2020 ( Figure 1 ). The rapid spread of B.1.1.7, which also has a P681H mutation as well as outbreaks in congregant settings, likely contributed to the increase in later months ( Figure 1 ). In fact, excluding B.1.1.7 genomes from the analysis revealed a small decline in the number of P681H-containing genomes by the end of February 2021 ( Figure 1 ). The majority of SARS-CoV-2 genomes within New York state that carry a P681H mutation are members of the B. (Figure 4) . Thus, the foothold this variant had gained in certain locations, mainly Hawaii and New York, could reflect the effects of random processes, such as founder effects, rather than an adaptive advantage. A second variant carrying a P681H mutation has emerged within the 20B clade, lineage B.1.1.222, which also harbors a T478K mutation in the spike protein. Phylogeographic analyses suggest that the B.1.1.222 variant emerged in Mexico around Mid-September 2020 (Mexico 100% CI, 2020-08-26 to 2020-09-29; Figure 2 ) with subsequent spread, including to 48/50 states (the greatest proportion occurring in Texas, California, and New York; Supplementary Table 2 ). The B.1.1.222 variant first appeared in New York on 2020-12-23 and now represents the majority of B.1.1.222 genomes (66%; Table 2 ) from multiple introductions into the state ( Figure 2 ). However, B.1.1.222 variants have increased only ~2% since December 2020 and represent a minor fraction of the circulating viral population in New York (Table 1 and Figure 4) . A third variant carrying a P681H mutation evolved independently within the 20C clade, lineage B.1, with an inferred date of introduction into New York in early November 2020 (NY 100% CI 2020-10-05 to 2020-12-06; data not shown). Additionally, this variant harbors T716I and S494P mutations in the spike protein, the latter of which has been shown to reduce the efficacy of some monoclonal antibodies (Greaney et al. 2021; Liu et al. 2021; Weisblum et al. 2020 ). Examination of almost 6500 B.1 genomes from the United States that were deposited in GISAID with collection dates between 2020-10-01 and 2021-02-26 revealed that only 4.4% harbored both S494P and P681H mutations. Of these, 68% derived from New York, representing the majority of cases (Supplementary Table 2 ). New Jersey, the state with the second highest number of B.1 variants had only 6% (Supplementary Table 2 ). Since its initial detection in New York on 2020-12-01, the number of SARS-CoV-2 genomes with a P681H mutation attributable to this variant has risen, approaching 50% of all SARS-CoV-2 genomes with a P681H mutation in February 2021 (Supplementary Figure 1) . The proportion of SARS-CoV-2 genomes containing P681H and S494P mutations in New York also increased by over 4% since December 2020 (Table 1 and Figure 4 ). This is less than the increase demonstrated by B.1.1.7 and both B.1.526 variants but greater than B.1.427 and B.1.429 (Table 1 and . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 12, 2021. ; Multiple lineages have independently acquired a P681H mutation in New York and globally ( Figure 2 ). Within New York, B.1.243 genomes with a P681H mutation represent ~7% of the total number of SARS CoV-2 genomes sequenced. This variant showed an initial spike in frequency in late summer 2020 and slowly rose in prevalence only to decrease again in recent months. Sampling biases likely do not substantially influence the decline in the B.1.243 variant. While the regionally diverse sampling that occurred in September and October 2020 coincided with an uptick in COVID-19 cases due to this variant, it was already in decline by January 2021 when more New York counties were sampled at a greater depth ( Figure 5 ). Analyses of B.1.243 in other states suggests that the rise of the B.1.243 variant in New York could be due to chance rather than increased fitness. Indeed, the initial spike of cases due to B.1.243 variants in October and November 2020 and a later smaller spike in December 2020 are associated with outbreaks in congregant settings, where founder effects could be responsible for rapid propagation. Alternatively, it is possible that P681H conferred a minor advantage to the B.1.243 lineage, allowing it to spread more easily in congregant settings, but was outcompeted by more recently emerged, highly transmissible variants, such as B.1.1.7 and B.1.526, and even other variants with a P681H mutation. Indeed, the dramatic decline of B.1.243 variants in February 2021 coincides with the explosion of other variants in New York ( Figure 4 ). However, we cannot rule out the effects of some sampling bias as almost all SARS-CoV-2 genomes sequenced during this time period derived from the NYC metropolitan area (New York City, the Mid-Hudson, and Long Island), which had fewer B.1.243 variants in comparison to the rest of the state ( Figure 5 ). In contrast to the recent decline in B.1.243 variants, other variants with a P681H mutation carrying T478K or S494P mutations have modestly increased in frequency in New York. P681 is located five residues upstream of a multibasic site in the Spike protein, the cleavage of which facilitates processes (such as membrane fusion) required for viral entry into the cell (Shang et al. 2020; Papa et al. 2021) . This mutation also occurs in B.1.1.7 and could contribute to its increased rate of transmission (Davies et al. 2021 ). T478 is located in the RBD of the Spike protein, near the interface with ACE2 ( Figure 6) . A T478I mutation conferred moderate resistance to neutralization by two monoclonal antibodies and by convalescent serum from two patients (Liu et al. 2021) . We would expect the T478K mutation to have a greater impact on neutralization, by introducing a larger residue with a positive charge at this position. The T478K mutation was identified in an in vitro evolution experiment to discover RBD mutations with increased affinity to ACE2, but it was not characterized further (Zahradnik et al., 2021) . It is unclear why the T478K mutation would result in increased binding affinity, although it could be caused by either the change in the electrostatic potential in that region or by inducing a conformational change. S494 is located in a beta strand in the RBD, at the interface with the ACE2 receptor ( Figure 6 ). An in vitro screen for spike mutations that escape neutralization by monoclonal antibodies and convalescent serum identified S494P (Lui et al., 2021) , which was later shown to reduce neutralization by 3-5-fold in some convalescent sera (Greaney et al. 2021b . However, this mutation was not as potent at neutralization as E484K (Greaney et al. 2021b Tchesnokova et al. 2021 ). However, we cannot rule out the effects of sampling biases, particularly for February 2021, when almost all specimens sequenced derived from the NYC metropolitan area -the location where the majority of B.1.1.222 and B.1 variants were circulating ( Figure 5 ). Considering the phenotypic characteristics conferred by the mutations present in B.1 and B.1.1.222 variants, their moderate increase in New York, and that convergent evolution is often indicative of positive selection, we believe both variants are worth monitoring. However, if B.1.243 serves as an example, any potential selective advantage these variants have might be relatively inconsequential in the face of more transmissible or infectious forms, such as B.1.526 and B.1.1.7. Thus, this could be a case of enhanced surveillance noting variants of interest before they became a concern. Respiratory swabs in viral transport medium (VTM, UTM or MTM) were received at the Virology Laboratory, Wadsworth Center, for SARS-CoV-2 diagnostic testing. Total nucleic acid was extracted using either easyMAG or eMAG (bioMerieux, Durham, NC), MagNA Pure 96 and Viral NA Large Volume Kit (Roche, Indianapolis, IN), or EZ1 with the DSP Virus Kit (QIAGEN, Hilden, Germany) following manufacturers' recommendations. Extracted nucleic acid was tested for SARS-CoV-2 RNA using the 2019-Novel Coronavirus Real-Time RT-PCR Diagnostic Panel (Centers for Disease Control and Prevention) according to the Instructions for use on an ABI 7500Dx. Library preparation and whole genome amplicon sequencing of SARS-CoV-2 was performed using a modified ARTIC protocol (https://artic.network/ncov-2019) as described previously (Lasek-Nesselquist, Lapierre, et al. 2021 ). Reads were processed with ARTIC nextflow pipeline (https://github.com/connor-lab/ncov2019artic-nf/tree/illumina, last updated April 2020) as previously described (Lasek-Nesselquist, Lapierre, et al. 2021 ). All SARS-CoV-2 genomes from New York were downloaded from GISAID as of 2021-03-04 or 2021-02-26. Genomes were annotated with NextClade (https://clades.nextstrain.org/), the results of which were inspected for changing trends in the data and further queried for mutations of interest. Pangolin lineages, geographic locations, and collection dates for genomes carrying mutations of interest were then extracted from the associated metadata provided by GISAID. Lineage frequencies were calculated on a monthly basis by dividing the number of genomes of a lineage by the total number of genomes sequenced each month. This was also performed for mutations and variants of interest. Stacked area plots and bubble maps were rendered in R v4.0.3 (https://www.r-project.org/) with ggplot2 and map packages and coloring provided by the viridis package. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 12, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 This work was approved by the New York State Department of Health Institutional Review Board, under study numbers 02-054 and 07-022. The authors declare no conflicts of interest. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 12, 2021. ; Table 1 . Number and proportion of variants in New York for December 2020 to February 2021. Count, the total number of variants for each month; Percent, the proportion of the total number of SARS-CoV-2 genomes sequenced (n = 2197, 3079, 2612 for December, January, and February, respectively as of 2021-03-04) that the variants represent; Increase, the percent difference between February 2021 and December 2020. Jan - . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 12, 2021. ; Stacked area plot of several mutations detected in SARS-CoV-2 genomes from New York. E484K is associated with three variants of concern: P.1, B.1.351, and B.1.526. L452R is characteristic of B.1.429 and B.1.427 lineages, which were identified as variants of concern in California. N501Y is also present in three variants of concern: B.1.1.7, B.1.351, and P.1. Other is the frequency of all other spike mutations. Q677H has recently been noted for its independent evolution in multiple lineages (Hodcroft et al. 2021) . P681H currently represents the most frequently encountered spike mutation (aside from D614G) in SARS-CoV-2 genomes from New York. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 12, 2021. ; . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 12, 2021. ; . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 12, 2021. ; Other . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 12, 2021. ; Figure 6 was prepared using the PyMOL Molecular Graphics System v2.3.5 (Schrödinger, LLC). . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted March 12, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 The Coronavirus Spike Protein Is a Class I Virus Fusion Protein: Structural and Functional Characterization of the Fusion Core Complex A SARS-CoV-2 lineage A variant (A.23.1) with altered spike has emerged and is dominating the current Uganda epidemic Estimated transmissibility and severity of novel SARS-CoV-2 Variant of Concern 202012/01 in England A Genomic Survey of SARS-CoV-2 Reveals Multiple Introductions into Northern California without a Predominant Lineage Introductions and early spread of SARS-CoV-2 in the New York City area Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition Comprehensive mapping of mutations to the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human serum antibodies Emergence in late 2020 of multiple lineages of SARS-CoV-2 Spike protein variants affecting amino acid position 677 SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells Covid-19: New UK variant may be linked to increased death rate, early data indicate A statewide analysis of SARS-CoV-2 transmission in New York The localized rise of a B.1.526 variant containing an E484K mutation in New York State Identification of SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization Sequencing identifies multiple early introductions of SARS-CoV-2 to the New York City Region Furin cleavage of SARS-CoV-2 Spike promotes but is not essential for infection and cell-cell fusion The furin cleavage site of SARS-CoV-2 spike protein is a key determinant for transmission due to enhanced replication in airway cells Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations COVID-19 Testing, Epidemic Features, Hospital Outcomes, and Household Prevalence Cell entry mechanisms of SARS-CoV-2 Acquisition of the L452R mutation in the ACE2-binding interface of Spike protein triggers recent massive expansion of SARS-Cov-2 variants Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants Next generation sequencing was performed by the Advanced Genomic Technologies Core of the Wadsworth Center. Initial funding for sequencing was generously provided by the New York Community Trust. This publication was also supported by Cooperative Agreement number NU50CK000516, funded by the Centers for Disease Control and Prevention. Its contents are