key: cord-0936044-ltp7rxpy
authors: Di Giacomo, Simone; Mercatelli, Daniele; Rakhimov, Amir; Giorgi, Federico M.
title: Preliminary report on severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) Spike mutation T478K
date: 2021-05-15
journal: J Med Virol
DOI: 10.1002/jmv.27062
sha: 4f1c0bee58e0240fd4fe6f3c121bbcd3dcbbc69a
doc_id: 936044
cord_uid: ltp7rxpy

Several severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) variants have emerged, posing a renewed threat to coronavirus disease 2019 containment and to vaccine and drug efficacy. In this study, we analyzed more than 1,000,000 SARS‐CoV‐2 genomic sequences deposited up to April 27, 2021, on the GISAID public repository, and identified a novel T478K mutation located on the SARS‐CoV‐2 Spike protein. The mutation is structurally located in the region of interaction with human receptor ACE2 and was detected in 11,435 distinct cases. We show that T478K has appeared and risen in frequency since January 2021, predominantly in Mexico and the United States, but we could also detect it in several European countries.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiological cause of coronavirus disease 2019 (COVID-19), is responsible for the most severe pandemic outbreak of the current century. 1 Naturally, it is the object of unprecedented scientific scrutiny, with more than one million SARS-CoV-2 genomic sequences having been generated and publicly shared since December 2019. This avalanche of data was made possible by the efforts of thousands of contributing laboratories across the world, and was collected by the GISAID initiative database. 2 It currently allows nearly real-time genomic surveillance, in which the evolution of the virus is tracked over time and across geographic regions.
3 In the first 17 months since the appearance of SARS-CoV-2, genomic surveillance has proven fundamental in tracking viral outbreaks 4 and in identifying potential new variants of clinical concern. One of these is the variant B.1.1.7, 5 characterized by 18 mutations relative to the reference genomic sequence (NCBI entry NC_045512.2), most notably the mutation A23063T, which causes the amino acid change N501Y in the domain of the viral Spike protein that interacts with the human receptor angiotensin-converting enzyme 2 (ACE2). 6 The interaction with ACE2, a surface protein expressed in human respiratory epithelial cells, is one of the key mechanisms for viral entry into the host, and it is a molecular mechanism directly connected with host specificity, early transmissibility, 7 and higher viral infectivity. 8 N501Y is only one of the 9 Spike mutations of variant B.1.1.7, which is also characterized by mutations in the polyprotein encoded by open reading frame 1a (ORF1a) and in the ORF8 and nucleocapsid (N) proteins. 9 Another mutation in the Spike protein, D614G, was prevalent in early 2020 and is currently present in more than 90% of all circulating SARS-CoV-2 viruses; this mutation is not located in the interaction domain with ACE2, but it has been associated with increased entry efficiency into human host cells. 10 The US Centers for Disease Control and Prevention defines as "variants of concern" all those mutations and lineages that have been associated with an increase in transmissibility and virulence, a

In this short communication, we report on a novel SARS-CoV-2 Spike mutation, T478K, which is also located at the interface of the Spike/ACE2 interaction and is worryingly rising in prevalence among SARS-CoV-2 sequences collected since the beginning of 2021.

We downloaded all publicly available SARS-CoV-2 genomic sequences from the GISAID database on April 27, 2021. This yielded 1,180,571 samples, annotated with features such as collection date, region of origin, and the age and sex of the infected patient.
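As a minimal sketch of how a genomic change such as A23063T corresponds to the Spike change N501Y mentioned above, the Python snippet below converts a genome coordinate into a Spike residue number and translates the affected codon. It is not the Coronapp pipeline. The Spike coding start (21563 in NC_045512.2), the reference codon AAT supplied by the caller, and all function names are assumptions made for illustration and should be checked against the annotation file.

```python
# Assumed 1-based start of the Spike (S) coding sequence in NC_045512.2.
# Verify against the GFF3 annotation before relying on it.
SPIKE_CDS_START = 21563

# Deliberately tiny codon table: only the two codons used in this example.
CODON_TO_AA = {"AAT": "N", "TAT": "Y"}


def spike_residue(genome_pos):
    """Return (1-based Spike residue number, 0-based offset inside the codon)."""
    offset = genome_pos - SPIKE_CDS_START
    if offset < 0:
        raise ValueError("position lies upstream of the Spike coding sequence")
    return offset // 3 + 1, offset % 3


def describe(genome_pos, ref_base, alt_base, ref_codon):
    """Build an amino acid change label such as 'N501Y'.

    ref_codon is the reference codon covering genome_pos; it is passed in
    by the caller here (an assumption) instead of being read from a genome.
    """
    residue, codon_offset = spike_residue(genome_pos)
    if ref_codon[codon_offset] != ref_base:
        raise ValueError("reference base does not match the supplied codon")
    alt_codon = ref_codon[:codon_offset] + alt_base + ref_codon[codon_offset + 1:]
    return f"{CODON_TO_AA[ref_codon]}{residue}{CODON_TO_AA[alt_codon]}"


label = describe(23063, "A", "T", "AAT")
print(label)
```

A real annotation step would read codons from the reference sequence and use a complete genetic code table; both are omitted here to keep the sketch short.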
Only viruses collected from human hosts were kept for further processing, discarding, for example, environmental samples or viruses obtained from other mammals. We compared all these sequences with the SARS-CoV-2 Wuhan genome NC_045512.2, using a gene annotation file in GFF3 format available as Supplementary File 1. This yielded 27,388,937 mutations relative to the reference. These nucleotide mutations were then converted into their corresponding cumulative effects on protein sequence using the Coronapp pipeline. 18 The 3D rendering of the location of S:T478K in the SARS-CoV-2 Spike/Human ACE2 complex was based on a published crystal structure, 19 deposited in the Protein Data Bank 20 as entry 6VW1. Comparison between wild-type and mutant Spike was performed using the PyMOL suite 21 with the Adaptive Poisson-Boltzmann Solver plugin. 22 All statistical analyses, algorithms, and plots were implemented in R. 23

In total, we could detect the Spike:T478K (S:T478K) mutation in for this lineage do not make it a variant of concern yet. S:T478K does not seem to be significantly associated with patient age (one-way analysis of variance test p > .1, Figure 1B and Figure S2), nor with patient sex (Figure 1C). The geographic distribution of S:T478K (Figure 1D and Figure S3) shows a noticeable prevalence in Figure 1D and Figure S4 ). S:T478K has also been detected in European countries, such as Germany, Sweden, and Switzerland (Figure 1E and Table S1).

One reason for concern about S:T478K is that it is growing rapidly over time, both in the number of detected samples (Figure 2A) and in prevalence, calculated as the number of cases over the total number of sequenced genomes (Figure 2B). We Figure 3A ). The amino acid change from the polar but uncharged threonine (T) to a basic, charged lysine (K) is predicted to shift the electrostatic potential of the Spike surface toward more positive values, in a region directly contacting ACE2 (Figure 3B).
Also, the larger side chain of lysine is predicted to increase the steric hindrance of the mutant, possibly further affecting the Spike/ACE2 interaction (Figure 3C).

S:T478K frequently co-occurs with three other Spike mutations located outside the canonical ACE2 interaction region. One is D614G (99.83% co-occurrence), one of the founding events of SARS-CoV-2 lineage B, currently the most widespread worldwide (Table 1). The other two are P681H and T732A, with 93.8% and 88.7% co-occurrence with S:T478K, respectively (Table 1). We could detect S:T478K together with other Spike mutations as well, but currently all at much lower frequencies (<4%). S:T478K also frequently co-occurs with mutations in other proteins, such as the widespread two-amino-acid nucleocapsid mutation N:RG203KR, and with mutations in nonstructural proteins (NSPs) derived from the polyprotein encoded by ORF1, which include, for example, the viral RNA-dependent RNA polymerase NSP12 (Table 2).

FIGURE 3 (A) 3D representation of the SARS-CoV-2 Spike/Human ACE2 interacting complex, derived from a published crystal structure. 19 (B) Representation of the SARS-CoV-2 Spike electrostatic potential calculated with the Adaptive Poisson-Boltzmann Solver (APBS) program 22 implemented in PyMOL. 21 The molecular surface was colored according to the molecular electrostatic potential (ranging from −2.0 [red] to 2.0 [blue]) in T478 (reference, above) and in K478 (more recent mutation, below). (C) 3D detail of the structural superposition of WT SARS-CoV-2 RBD and S:T478K. T478 side chains are colored in cyan, while K478 side chains are colored in red. ACE2, angiotensin-converting enzyme 2; RBD, receptor-binding domain; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; WT, wild-type

In this short communication, we report the distribution of the Spike mutation S:T478K and its recent growth in prevalence in the SARS- but we could detect it also in several European countries.
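Prevalence, defined in the results as the number of cases carrying the mutation over the total number of sequenced genomes, can be sketched per collection month as follows. This is an illustrative Python example and not the authors' R code; the function name, the mutation labels, and the toy samples are hypothetical and do not reproduce any GISAID data.

```python
from collections import Counter


def monthly_prevalence(samples, mutation):
    """Fraction of genomes carrying `mutation` in each collection month.

    `samples` is an iterable of (month, mutations) pairs, where month is a
    'YYYY-MM' string and mutations is a set of labels such as 'S:T478K'.
    """
    total = Counter()
    carrying = Counter()
    for month, mutations in samples:
        total[month] += 1
        if mutation in mutations:
            carrying[month] += 1
    # Cases carrying the mutation divided by all sequenced genomes that month.
    return {month: carrying[month] / total[month] for month in sorted(total)}


# Hypothetical toy data, invented purely to exercise the function.
toy_samples = [
    ("2021-01", {"S:D614G"}),
    ("2021-01", {"S:D614G", "S:T478K"}),
    ("2021-01", {"S:D614G"}),
    ("2021-01", {"S:D614G"}),
    ("2021-02", {"S:D614G", "S:T478K"}),
    ("2021-02", {"S:D614G"}),
]

prevalence = monthly_prevalence(toy_samples, "S:T478K")
for month, value in prevalence.items():
    print(month, f"{value:.2%}")
```

Normalizing by the number of genomes sequenced in the same month keeps the measure comparable across periods with very different sequencing effort.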
T478K has been detected in other lineages that are not phylogenetically derived from B.1.1.519, supporting the hypothesis that this mutation arose more than once in distinct events. Since the highest abundance of this mutation seems to be in Mexico and the USA, this suggests a founder effect, in which a chance founder event was followed by natural selection, since the frequency of the mutation has slowly but steadily increased in the first months of 2021.

The location of S:T478K in the interaction complex with human ACE2 may affect the affinity for human cells and therefore influence viral infectivity. An in silico molecular dynamics study on the protein structure of Spike has predicted that the T478K mutation, substituting a non-charged amino acid (threonine) with a positively charged one (lysine), may significantly alter the electrostatic surface of the protein (Figure 3), and therefore the interaction with ACE2, drugs, or antibodies, 25 and that the effect can be increased if combined with other co-occurring Spike mutations (see Table 1). Another experiment showed that T478K and T478R mutants were enriched when SARS-CoV-2 viral cultures were tested against weak neutralizing antibodies, 26 highlighting, at least in vitro, a possible genetic route the virus can follow to escape immune recognition. Everything considered, we believe that the continued genetic and clinical monitoring of S:T478K and other Spike mutations is of paramount importance to better understand COVID-19 and to better counteract its future developments.

The authors are very grateful to the GISAID Initiative and all its data contributors, that is, the authors from the originating laboratories responsible for obtaining the specimens and the submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this study is based. This study was funded by the Italian Ministry of University and Research, Montalcini grant.
SARS-CoV-2 variants and ending the COVID-19 pandemic
GISAID: global initiative on sharing all influenza data-from vision to reality
Geographic and genomic distribution of SARS-CoV-2 mutations
Web tools to fight pandemics: the COVID-19 experience
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Structural genetics of circulating variants affecting the SARS-CoV-2 Spike/human ACE2 complex
Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom
Higher infectivity of the SARS-CoV-2 new variants is associated with K417N/T, E484K, and N501Y mutants: an insight from structural data
Emergence of SARS-CoV-2 B.1.1.7 lineage-United States
SARS-CoV-2 D614G Spike mutation increases entry efficiency with enhanced ACE2-binding affinity
SARS-CoV-2 variants of concern
Evolutionary and structural analysis elucidates mutations on SARS-CoV2 Spike protein with altered human ACE2 binding affinity
A vaccine targeting the RBD of the S protein of SARS-CoV-2 induces protective immunity
SARS-CoV-2 immune evasion by variant B.1.427/B.1.429
E484K mutation in SARS-CoV-2 RBD enhances binding affinity with HACE2 but reduces interactions with neutralizing antibodies and nanobodies: binding free energy calculation studies
Neutralization of SARS-CoV-2 Spike 69/70 deletion, E484K and N501Y variants by BNT162b2 vaccine-elicited Sera
Efficacy of the ChAdOx1 NCoV-19 Covid-19 vaccine against the B.1.351 variant
Coronapp: a web application to annotate and monitor SARS-CoV-2 mutations
Structural basis of receptor recognition by SARS-CoV-2
The protein data bank
PyMOL and Inkscape bridge the data and the data visualization
Improvements to the APBS biomolecular solvation software suite
A tale of three SARS-CoV-2 variants with independently acquired P681H mutations in New York State. medRxiv
Comparative molecular dynamics study of the receptor-binding domains in SARS-CoV-2 and SARS-CoV and the effects of mutations on the binding affinity
Development of potency, breadth and resilience to viral escape mutations in SARS-CoV-2 neutralizing antibodies

The authors declare that there are no conflicts of interest.

TABLE 1 Ten SARS-CoV-2 Spike mutations most frequently co-occurring with S:T478K

TABLE 2 Ten SARS-CoV-2 non-Spike mutations most frequently co-occurring with S:T478K
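The co-occurrence percentages summarized in Tables 1 and 2 can be illustrated with a short Python sketch. It assumes that co-occurrence means the share of S:T478K-carrying genomes that also carry a given second mutation, which is our reading of the text and not a stated definition. The function name and the toy genomes are hypothetical and do not reproduce the published figures.

```python
from collections import Counter


def cooccurrence(samples, target):
    """Percentage of target-carrying samples that also carry each other mutation.

    `samples` is a list of sets of mutation labels, one set per genome.
    Results are ordered from the most to the least frequent partner.
    """
    carriers = [mutations for mutations in samples if target in mutations]
    if not carriers:
        return {}
    counts = Counter(m for mutations in carriers for m in mutations if m != target)
    return {m: 100.0 * n / len(carriers) for m, n in counts.most_common()}


# Hypothetical toy genomes, invented purely to exercise the function.
toy_genomes = [
    {"S:T478K", "S:D614G", "S:P681H"},
    {"S:T478K", "S:D614G", "S:P681H"},
    {"S:T478K", "S:D614G"},
    {"S:T478K", "S:D614G", "S:T732A"},
    {"S:D614G"},  # does not carry the target, so it is ignored
]

table = cooccurrence(toy_genomes, "S:T478K")
for mutation, percent in table.items():
    print(f"{mutation}\t{percent:.1f}%")
```

Taking the first ten entries of the returned mapping would give a ranking of the same shape as the two tables, under the stated assumption about the denominator.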