key: cord-0795120-6xrv0908 authors: McCarthy, Kevin R.; Rennick, Linda J.; Nambulli, Sham; Robinson-McCarthy, Lindsey R.; Bain, William G.; Haidar, Ghady; Duprex, W. Paul title: Natural deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape date: 2020-11-19 journal: bioRxiv DOI: 10.1101/2020.11.19.389916 sha: 688933f7bd812be7df2965478e8c8451a72d1293 doc_id: 795120 cord_uid: 6xrv0908
Zoonotic pandemics follow the spillover of animal viruses into highly susceptible human populations. Often, pandemics wane and the viruses become endemic pathogens. Sustained circulation requires evasion of protective immunity elicited by previous infections. The emergence of SARS-CoV-2 has initiated a global pandemic. Because coronaviruses have a lower substitution rate than other RNA viruses, there was hope that the spike glycoprotein would be an antigenically stable vaccine target. However, we describe an evolutionary pattern of recurrent deletions at four antigenic sites in the spike glycoprotein. Deletions abolish binding of a reported neutralizing antibody. Circulating SARS-CoV-2 variants are continually exploring genetic and antigenic space via deletion in individual patients and at global scales. In viruses where substitutions are relatively infrequent, deletions represent a mechanism to drive rapid evolution, potentially promoting antigenic drift.
These data from PLTI1 prompted us to interrogate a number of patient metadata sequences deposited in GISAID (17). In searching for viruses similar to those obtained from PLTI1 we found eight patients with deletions in the S protein that had viruses sampled longitudinally over a period of weeks to months (Figs. 1A and S1A). For each, early time points had intact S sequences and later time points had deletions within the S gene. Six had deletions that were identical to, overlapping with or adjacent to those in PLTI1. Deletions at a second site developed in the other two patients (Fig. 1B).
Viruses from seven patients possessed unique constellations of substitutions that were present at both early and late time points (Fig. S1B). These differentiate the viruses from each patient and strongly suggest that the deletion variants were not acquired in the community or nosocomially. Two unrelated patients with similar deletions have been recently reported by Avanzato & Matson (18) and Choi & Choudhary (19) and their respective colleagues. These sequences are included in our analysis. The most parsimonious explanation is that each deletion arose independently in response to a common and strong selective pressure, to produce strikingly convergent outcomes.
We searched the GISAID sequence database (17) for additional instances of deletions within S protein. From a dataset of 146,795 sequences (deposited from 12/01/2019 to 10/24/2020) we identified 1,108 viruses with deletions in the S gene. When mapped to the S gene, 90% occupied four discrete sites within the NTD (Fig. 2A). We term these sites recurrent deletion regions (RDRs) and number them 1-4 from the 5' to 3' end of the gene. RDR2 corresponds to the deletion in Fig. 1A and RDR4 to Fig. 1B.
The vast majority of deletions appear to have arisen and been subsequently retained in replicating viruses. If deletion lengths were random, only one third of deletions would be in frame, that is, a multiple of three nucleotides in length. We observed a preponderance of in-frame deletions with lengths of 3, 6, 9 and 12 nucleotides (Fig. 2B). Among all deletions, 93% are in frame and do not produce a stop codon (Fig. 2C). In the NTD, >97% of deletions maintain the open reading frame, with most mapping to RDRs 1 to 4. Other spike domains do not follow this trend. Deletions in the receptor binding domain (RBD) and S2 preserve the reading frame 30% and 37% of the time, respectively. Tolerance and enrichment for deletions are therefore an intrinsic feature of RDRs.
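The reading-frame argument above can be illustrated with a short Python sketch. It classifies deletions by length alone (a multiple of three nucleotides preserves the frame) and reports the in-frame fraction, which can then be compared with the one-third expected by chance. The function names and the example lengths are hypothetical and are not taken from the GISAID dataset; the sketch also does not check for newly created stop codons, which would require the sequence itself.

```python
from collections import Counter
from typing import Iterable


def classify_deletion(length_nt: int) -> str:
    """Label a deletion by whether its length preserves the reading frame."""
    if length_nt <= 0:
        raise ValueError("deletion length must be a positive number of nucleotides")
    return "in-frame" if length_nt % 3 == 0 else "frameshift"


def in_frame_fraction(lengths: Iterable[int]) -> float:
    """Fraction of deletions whose length is a multiple of three."""
    counts = Counter(classify_deletion(n) for n in lengths)
    total = sum(counts.values())
    return counts["in-frame"] / total if total else 0.0


# Hypothetical deletion lengths (nucleotides), for illustration only.
example_lengths = [3, 3, 6, 9, 12, 1, 3, 6, 2, 3]

if __name__ == "__main__":
    observed = in_frame_fraction(example_lengths)
    print(f"observed in-frame fraction: {observed:.2f}")
    print(f"expected by chance:         {1 / 3:.2f}")
```

Applied per spike domain, the same calculation would give the kind of contrast reported in the text: a strong excess of in-frame deletions in the NTD and roughly chance-level values in the RBD and S2.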
The RDRs harbor a spectrum of deletions, from those that appear only in a single virus to those that recur at the same length and position. Deletions at RDRs 1 and 3 were strongly biased to a single site while RDRs 2 and 4 are composed of many different overlapping deletions. Preferences to remove specific nucleotides are apparent from the histograms in Fig. 2D. For all four RDRs, it appears that selection and perhaps transmission favors specific deletions over others.
We compared the geographic distribution and GISAID clade designations of viruses with deletions in RDRs to our entire dataset (Fig. 2E-F). Viruses with deletions in RDRs 2 and 4 generally reflected the geographic and genetic diversity in the GISAID database. This patterning is consistent with recurrent, independent deletion events at these sites. In contrast, viruses with deletions at RDRs 1 and 3 were overwhelmingly from Europe (and Oceania for RDR3) and from clades G and GR, respectively. This indicates that viruses recurrently explore deletions at RDRs 1 and 3, and that selection has favored specific deletions in certain clades that circulate in limited geographies.
The geographic and genetic distributions of some RDR variants suggest human-to-human transmission. We identified, for each RDR, instances where viruses with identical deletions were isolated from different patients around the same time. Two patients in France (male, age 58, EPI_ISL_582112 and female, age 59, EPI_ISL_582120) were found to have viruses that were 100% identical, including a six-nucleotide deletion in RDR1. We identified a cluster of four individuals in Senegal that shared a three-nucleotide deletion in RDR2 and a deletion in Orf1ab (1605 to 1608). These viruses group together among all Senegalese samples (Fig. 3A). The RDR2 deletion is identical to those in PLTI1, MSK-4, MSK-6 and MSK-8, demonstrating that this mutation arises independently and transmits between humans.
Four patients from Ireland had viruses that share a three-nucleotide deletion in RDR3. These sequences form distinct branches among Irish SARS-CoV-2 sequences (Fig. 3B). A cluster of sequences from Switzerland, from at minimum two individuals, share a nine-nucleotide deletion in RDR4 (Fig. 3C). These examples are illustrative. Most sequences lack sufficient accompanying data to distinguish between recurrent sampling of a single patient and viruses from multiple patients. We found 599 sequences with the same three-nucleotide deletion in RDR1 that were sequenced by centers across the United Kingdom (UK). Similarly, other sequences from the UK shared three-nucleotide deletions in either RDR2 (n=87) or RDR3 (n=48).
We examined the prevalence of RDR variants throughout the global pandemic from December 2019 to October 2020 (Fig. 3D). Representatives at each site are present throughout. Deletions at RDRs 1 and 3 were the most frequent. For each of these, a single variant predominates: Δ69-70 in RDR1 and Δ210 in RDR3 (Fig. 3E). RDR2 deletions appear to be more diverse, with Δ145 predominating. The Δ69-70 variant has rapidly increased in abundance, from 0.01% of all viral sequences in July 2020 to ~2.5% in October 2020 (1st to 24th). The rise-and-fall pattern in the frequencies of Δ69-70, Δ210, and likely Δ145 is best explained by bursts of natural transmission between humans.
The recurrence and convergence of RDR deletions, particularly during long-term infections, is indicative of selection and escape from a common and strong selective pressure. RDRs 2 and 4 and RDRs 1 and 3 occupy two distinct surfaces on the S protein NTD (Fig. 4A). Both sites are the targets of antibodies (16, 20, 21). The epitope for neutralizing antibody 4A8 is formed entirely by beta sheets and their extended connecting loops that harbor RDRs 2 and 4. We generated a panel of S protein mutants representing the four RDRs.
We transfected cells with plasmids expressing these mutants and used indirect immunofluorescence to determine if RDR deletions modulated 4A8 binding. The two RDR2 deletions and one RDR4 deletion completely abolished binding of 4A8 whilst still allowing recognition by a monoclonal antibody targeting the S protein RBD (Fig. 4B). Deletions at RDRs 1 and 3 had no impact on the binding of either monoclonal antibody, confirming that they alter independent sites. Convergent evolution operates both within single RDRs and between RDRs to produce functionally equivalent adaptations by altering the same epitope. These observations demonstrate that naturally arising and circulating variants of SARS-CoV-2 S have altered antigenicity.
Historically, pandemics have waned and left behind endemic human pathogens. This transition is contingent upon evading immunity imparted by previous infection. Influenza viruses exemplify this pattern, having followed it at least four successive times in the past century. Unlike the error-prone RdRps of most human respiratory pathogens, coronaviruses like SARS-CoV-2 possess polymerases with proofreading activity (11-13). However, proofreading cannot correct deletions, which can rapidly alter entire stretches of amino acids and the structures they form. We have identified an evolutionary signature defined by prevalent and recurrent deletions in the S protein. Deletion is followed by human-to-human transmission of variants with altered antigenicity. The simplicity of using deletion to drive diversity is biologically compelling.
The most recent sequences in our dataset are strongly biased to the UK, and we show that many variants with deletions in RDRs 1, 2, and 3 circulated widely across England, Northern Ireland, Scotland and Wales. These deletions alter one antigenic site (16, 21, 26) and likely alter another. The UK is a site for at least one Phase III trial of a SARS-CoV-2 vaccine.
Given that deletion variants alter the antigenicity of SARS-CoV-2 S protein, potential mismatches between circulating variants and vaccine candidates may confound estimates of efficacy. SARS-CoV-2 appears to be on a trajectory to become an endemic human pathogen and antigenic sites will continue evolving to evade preexisting immunity. Deletions that rapidly alter entire stretches of amino acids at specific antigenic sites are already playing an important role. Efforts to track and monitor these recurrent, rapidly arising, geographically widespread variants are vital.
All identified deletion variants and MN985325 were aligned using MAFFT (27, 28) and adjusted manually in recurrent deletion regions for consistency. Phylogenetic analyses utilized all sequences in our dataset from a country at a specific time or, in the case of Senegalese sequences, the entirety of the pandemic. For non-Senegalese samples, sequences obtained within 1-2 months of the variants of interest were aligned to MN985325 using MAFFT (27, 28). FastTree (29) was used to generate a preliminary phylogeny from which we extracted the sequences corresponding to the lineage of interest and adjacent outgroups. These sequences were realigned using MAFFT. Maximum-likelihood phylogenetic trees were calculated using RAxML (30) with a general time reversible model with optimization of substitution rates (GTR GAMMA setting), starting with a completely random tree, using rapid bootstrapping and a search for the best-scoring ML tree. Between 1,000 and 10,000 bootstrap replicates were performed.
Cell lines: Human 293F cells were maintained at 37° Celsius with 5% CO2 in FreeStyle 293 Expression Medium (ThermoFisher) supplemented with penicillin and streptomycin. Vero E6 cells were maintained at 37° Celsius with 5% CO2 in high glucose DMEM (Invitrogen) supplemented with 1% (v/v) Glutamax (Invitrogen) and 10% (v/v) fetal bovine serum (Invitrogen).
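As a rough illustration of the alignment and phylogenetic steps described above, the following Python sketch assembles command lines for MAFFT, FastTree and RAxML without running them. It is not the authors' script. The flag choices are assumptions that map the stated settings onto common command-line usage of these tools: `--auto` for MAFFT, `-nt` for FastTree, and for RAxML 8 `-m GTRGAMMA`, `-d` (random starting tree) and `-f a` (rapid bootstrapping plus best-scoring ML tree search). The executable name `raxmlHPC`, the file names and the random seed are hypothetical.

```python
import shlex
from typing import List


def mafft_cmd(fasta_in: str) -> List[str]:
    """MAFFT writes the alignment to stdout; --auto (an assumption) picks a strategy."""
    return ["mafft", "--auto", fasta_in]


def fasttree_cmd(alignment: str) -> List[str]:
    """Preliminary tree from a nucleotide alignment; the tree is written to stdout."""
    return ["FastTree", "-nt", alignment]


def raxml_cmd(alignment: str, run_name: str,
              bootstraps: int = 1000, seed: int = 12345) -> List[str]:
    """RAxML 8 run: GTRGAMMA, random starting tree, rapid bootstrap + ML search."""
    if not 1000 <= bootstraps <= 10000:
        raise ValueError("the text reports between 1,000 and 10,000 bootstraps")
    return [
        "raxmlHPC",            # hypothetical executable name; builds differ
        "-f", "a",             # rapid bootstrapping and best-scoring ML tree
        "-d",                  # start from a completely random tree
        "-m", "GTRGAMMA",      # GTR model with gamma rate heterogeneity
        "-p", str(seed),       # seed for the starting tree
        "-x", str(seed),       # seed for rapid bootstrapping
        "-#", str(bootstraps), # number of bootstrap replicates
        "-s", alignment,
        "-n", run_name,
    ]


if __name__ == "__main__":
    # Hypothetical file names; the commands are only printed, not executed.
    for cmd in (mafft_cmd("sequences.fasta"),
                fasttree_cmd("aligned.fasta"),
                raxml_cmd("lineage_aligned.fasta", "lineage_ml")):
        print(shlex.join(cmd))
```

Building the commands as lists keeps them easy to inspect and to pass to `subprocess.run` later, with stdout redirected to a file for the MAFFT and FastTree steps.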
A Novel Coronavirus from Patients with Pneumonia in China
A new coronavirus associated with human respiratory disease in China
A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural 9
Integrating influenza antigenic dynamics with molecular evolution
Mapping the antigenic and genetic evolution of influenza virus
Rates of evolutionary change in viruses: patterns and determinants
Coronaviruses: an RNA proofreading machine regulates replication fidelity and diversity
Discovery of an RNA virus 3'->5' exoribonuclease that is critically involved in coronavirus RNA synthesis
A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants
Low genetic diversity may be an Achilles heel of SARS-CoV-2
RBD MAb; red) detects all mutant forms of the protein (Δ69-70, Δ141-144, Δ146, Δ210, Δ243-244) and the unmodified protein (wild-type). 4A8 monoclonal antibody (4A8 MAb; green) does not detect mutants containing deletions in RDR2 or RDR4 (Δ141-144, Δ243-244, Δ146)
Cloning and transfection of SARS-CoV-2 spike protein deletion mutants: A series of deletion mutants were generated in HDM_SARS2_Spike_del21_D614G (33) a plasmid containing SARS-CoV-2 S protein lacking the 21 C-terminal amino acids. HDM_SARS2_Spike_del21_D614G was a gift from Jesse Bloom (Addgene plasmid # 158762 Cloning strategies were designed to delete S protein amino acids Assemblies were transformed into DH5-alpha chemically competent cells (New England Biolabs) and correct clones were identified by restriction profile and Sanger sequencing (Genewiz) of small scale After the 30 minute incubation Opti-MEM in the wells was replaced with 500 µl per well Opti-MEM and 100 µl per well of transfection mixes were added. Transfected cells were incubated at 37° Celsius, 5% (v/v) CO2 for 24 hours.
Indirect immunofluorescence assay: Indirect immunofluorescence was performed as previously reported (34).
Briefly, cells transfected with the SARS-CoV-2 S protein deletion mutants and controls were washed once with DPBS (Fisher Scientific), fixed with 4% (w/v) paraformaldehyde in PBS (Boston Bioproducts) for 20 minutes at room temperature, rinsed twice with DPBS and permeabilized with 0.1% (v/v) Triton-X100 (Sigma) in DPBS for 30 minutes at 37° Celsius. Cells were washed three times with DPBS and secondary antibodies [goat anti-rabbit Alexa Fluor-568, Invitrogen, and goat anti-human Alexa Fluor-488, Invitrogen, diluted 1:400 in DPBS containing 0.1% (v/v) Triton X-100] were added and incubated at 37° Celsius for 1 hour. Cells were washed three times with DPBS and nuclei were counterstained with 4',6-diamidino-2-phenylindole (DAPI) nuclear stain (300 nM DAPI stain solution in PBS; Invitrogen) for 10 minutes at room temperature. Fluorescence was observed with a DMi 8 UV microscope (Leica) and photomicrographs were acquired using a camera (Leica) and LAS X software (Leica).
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
FastTree: computing large minimum evolution trees with profiles instead of a distance matrix
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
A neutralizing human antibody binds to the N-terminal domain of the Spike protein of SARS-CoV-2
Antibodies to a Conserved Influenza Head Interface Epitope Protect by an IgG Subtype-Dependent Mechanism
H3N2 influenza infection elicits more cross-reactive and less clonally expanded anti-hemagglutinin antibodies than influenza vaccination
Protocol and Reagents for Pseudotyping Lentiviral Particles with SARS-CoV-2 Spike Protein for Neutralization Assays
SARS-CoV-2 growth, furin-cleavage-site adaptation and neutralization using serum from acutely infected hospitalized COVID-19 patients