key: cord-0768103-ae828tmp
authors: Muttineni, Radhakrishna; R.N., Binitha; Putty, Kalyani; Marapakala, Kavitha; K.P., Sandra; Panyam, Jaslin; Vemula, Aravind; Singh, Shashi Mohan; Balachandran, Subin; S.T., Viroji Rao; Kondapi, Anand Kumar
title: SARS‐CoV‐2 variants and spike mutations involved in second wave of COVID‐19 pandemic in India
date: 2022-03-24
journal: Transbound Emerg Dis
DOI: 10.1111/tbed.14508
sha: 462ef2b1b8b3fb8adb9d0744ad333ac017fb16ce
doc_id: 768103
cord_uid: ae828tmp

Against the backdrop of the second wave of the COVID‐19 pandemic in India that started in March 2021, we have monitored the spike (S) protein mutations in all the reported (GISAID portal) whole‐genome sequences of SARS‐CoV‐2 circulating in India from 1 January 2021 to 31 August 2021. In the 43,102 SARS‐CoV‐2 genomic sequences analysed, we have identified 24,260 amino acid mutations in the S protein, based on which 265 Pango lineages could be categorized. The dominant lineage in most of the 28 states of India and its 8 union territories was B.1.617.2 (the delta variant). However, the states Madhya Pradesh, Jammu & Kashmir, and Punjab had B.1.1.7 (alpha variant) as the major lineage, while the Himachal Pradesh state reported B.1.36 as the dominating lineage. A detailed analysis of various domains of the S protein was carried out for detecting mutations having a prevalence of >1%; 70, 18, 7, 3, 9, 4, and 1 (N = 112) such mutations were observed in the N‐terminal domain, receptor binding domain, C‐terminal domain, fusion peptide region, heptapeptide repeat (HR)‐1 domains, signal peptide domain, and transmembrane region, respectively. However, no mutations were recorded in the HR‐2 and cytoplasmic domains of the S protein. Interestingly, 13.39% (N = 15) of these mutations were reported to increase the infectivity and pathogenicity of the virus; 2% (N = 3) were known to be vaccine breakthrough mutations, and 0.89% (N = 1) were known to escape neutralizing antibodies.
The biological significance of 82% (N = 92) of the reported mutations is yet unknown. As SARS‐CoV‐2 variants are emerging rapidly, it is critical to continuously monitor local viral mutations to understand national trends of virus circulation. This can tremendously help in designing better preventive regimens in the country and in avoiding vaccine breakthrough infections.

researchers worldwide who are generating and sharing virus whole-genome sequences publicly, multiple variants of SARS-CoV-2 have been documented globally throughout this pandemic. Currently, more than 4 million SARS-CoV-2 sequences are available via the Global Initiative on Sharing All Influenza Data (GISAID) (GISAID-Initiative, n.d.). Sequence data greatly aid the identification of mutations that potentially change viral properties and the detection of emerging SARS-CoV-2 variants.

The entry of SARS-CoV-2 into host cells is mediated by its spike (S) protein binding to the host cell-surface receptor, angiotensin-converting enzyme 2 (ACE2). The S protein is heavily glycosylated and induces a protective immune response (F. Li, 2016; Walls et al., 2020). The S protein is divided into two biologically significant subunits, S1 and S2. S1 is responsible for receptor binding, and S2 is associated with the membrane fusion that facilitates cell entry (Xia, 2021). The S1 region has four domains: signal peptide (1-13 aa), N-terminal (14-305 aa), receptor binding (319-541 aa), and C-terminal SD1 and SD2 (542-685 aa). The S2 region has five domains: fusion peptide (688-811 aa), heptapeptide repeats 1 and 2 (812-1142 aa), transmembrane (1213-1236 aa), cytoplasmic domain, and extra amino acids (1274-1943 aa) (Winger & Caspari, 2021). Mutations in the S protein have always been of major concern as they play a major role in the emergence of new SARS-CoV-2 variants.
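The domain boundaries listed above can be turned into a simple lookup that assigns a spike mutation label to a domain. The following is a minimal sketch, not the authors' pipeline: the names `S_DOMAINS`, `residue_position`, and `domain_of` are hypothetical, the ranges are copied from the text as quoted, and residues that fall outside every quoted range (for example 306-318, or the cytoplasmic domain, for which no range is given) are simply reported as "unassigned".

```python
import re

# Residue ranges (1-based, inclusive) as quoted in the text above.
# The cytoplasmic domain is omitted because the text gives no range for it.
S_DOMAINS = [
    ("signal peptide", 1, 13),
    ("N-terminal domain", 14, 305),
    ("receptor-binding domain", 319, 541),
    ("C-terminal domain (SD1/SD2)", 542, 685),
    ("fusion peptide", 688, 811),
    ("heptapeptide repeats 1 and 2", 812, 1142),
    ("transmembrane", 1213, 1236),
]


def residue_position(mutation: str) -> int:
    """Extract the residue number from labels such as 'T19R' or 'V70del'."""
    match = re.match(r"[A-Z](\d+)", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {mutation!r}")
    return int(match.group(1))


def domain_of(mutation: str) -> str:
    """Return the name of the quoted domain containing the mutated residue."""
    pos = residue_position(mutation)
    for name, start, end in S_DOMAINS:
        if start <= pos <= end:
            return name
    return "unassigned"


if __name__ == "__main__":
    for label in ("H11Q", "T19R", "L452R", "P681R", "T716I", "P1228L"):
        print(label, "->", domain_of(label))
```

A table-driven lookup keeps the boundaries in one place, so they can be swapped for a different annotation source without touching the logic.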
The World Health Organization (WHO) classification of variant viruses as variants of concern (VOC) and variants of interest (VOI) is also based on mutations in the S protein (Tracking SARS-CoV-2 Variants, n.d.) (Table 1).

In India, the second wave of COVID-19 emerged in March 2021 (and started declining towards August 2021), with a new set of symptoms affecting the gastrointestinal system along with the previous (first wave) symptoms that primarily included respiratory problems (Kamble et al., 2021). Oxygen requirement, mortality rate, and the rate of disease spread increased during the second wave even though vaccines like COVAXIN (inactivated whole virus vaccine) and COVISHIELD (adenovirus vectored vaccine) were available and in use since January 2021 (albeit in the initial stages of rollout by the time the second wave emerged in India). In India, an estimated 62,76,35,244 people have completed 2 doses of vaccination; however, vaccine breakthrough infections are not uncommon (Dash et al., 2021).

With this perspective, we initiated this study, in which we analysed the S protein sequences of SARS-CoV-2 variants reported in India and submitted to the GISAID portal from 1 January 2021 to 31 August 2021. Mutations throughout the S protein domains were studied in detail and analysed (at a translational level), based on the literature available on these mutations, to understand whether they affect virus properties and antigenic consequences such as infectivity, transmissibility, and resistance to the available vaccines.

are analysed in detail. These mutations were studied in detail in S protein region-1 and S protein region-2. Based on existing literature, the mutations in the S protein regions were analysed for increased infectivity, decreased infectivity, and decreased neutralising sensitivity. The S protein structure (PDB ID 6VXX) was downloaded from the PDB, and the infectivity-increasing mutations were mapped onto it.
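The >1% prevalence cut-off used throughout the analysis can be illustrated with a short sketch. The text does not spell out how prevalence was computed, so this assumes that prevalence means the percentage of analysed genomes carrying a given S protein mutation; the function name `prevalent_mutations` and the toy data are hypothetical and are not taken from the GISAID data set.

```python
from collections import Counter
from typing import Dict, Iterable, List


def prevalent_mutations(
    genomes: List[Iterable[str]], threshold_pct: float = 1.0
) -> Dict[str, float]:
    """Return {mutation: prevalence in %} for mutations above the threshold.

    Assumption: prevalence = genomes carrying the mutation / all genomes * 100.
    Each genome is given as an iterable of spike mutation labels.
    """
    total = len(genomes)
    if total == 0:
        return {}
    # set() ensures a mutation is counted at most once per genome.
    counts = Counter(m for genome in genomes for m in set(genome))
    prevalence = {m: 100.0 * c / total for m, c in counts.items()}
    return {m: p for m, p in prevalence.items() if p > threshold_pct}


if __name__ == "__main__":
    # Toy data only: 200 made-up genomes.
    toy = (
        [["D614G", "L452R"]] * 60
        + [["D614G", "A1B"]] * 2   # 1.0 %, not strictly above the cut-off
        + [["D614G", "C2D"]] * 3   # 1.5 %, kept
        + [["D614G"]] * 135
    )
    for mutation, pct in sorted(prevalent_mutations(toy).items()):
        print(f"{mutation}: {pct:.2f}%")
```

The comparison is strict (`>`), matching the ">1%" wording, so a mutation at exactly 1% is excluded.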
The final statistical data and mutation mapping were formulated into graphical models using BioRender (https://biorender.com).

Figure 1 and Table 2). Furthermore, Pango lineages identified in the country whose prevalence was >1% were also recorded (Figure 2). Of the 24,260 mutations recorded in the S protein, mutations that pos- (Table S4) wave of COVID-19 in India. Of the 112 mutations recorded throughout the S protein with >1% prevalence in Indian populations, 15 (T19R, D63G, V70del, G142D, D377Y, L452R, T478K, E484Q, E484K, N501Y, A570D, D614G, Q677H, P681H, P681R) were known to be involved in increasing viral pathogenicity, 3 (E156G, F157del, R158del) were known to be vaccine breakthrough mutations, 1 mutation (Y144del) was known to be involved in the escape of the virus from neutralizing antibodies, and 1 mutation (R385K) was known to be involved in decreased virulence (Figure 3).

mutations with >1% prevalence, and most of these mutations were shown to be clinically significant. The P77L mutation is dominant in the delta variants (Patchett et al., 2021), but its clinical significance is unknown. The Y144del mutation was known to be involved in escaping neutralizing antibodies (R. . The E156G, F157del, and G142D mutations were known to be involved in vaccine breakthrough infections (Mishra et al., 2021; Shen et al., 2021). In our study, these mutations were found at 17% prevalence in the Indian population and are likely to be clinically significant as they are involved in reduced sensitivity to existing vaccines. The T19R mutation (found at 43.66% prevalence in the current study) is found in the Delta plus variant, which is classified as a VOC (Jordan et al., 2021). D63G and V70del were reported to increase viral pathogenicity (X.-J. Yang, 2021a, 2021b). The S26L, T77A, K81N, and V82A mutations observed in the Indian variants in our study are present at significant prevalence, but their role in the infectivity of SARS-CoV-2 has not been reported yet.
Furthermore, L42F, P46L, S97I, F108L, L116F, D119V, F120L, P65S, and T183I have not yet been reported as significant in terms of viral infectivity. (Santos & Passos, 2021). One mutation in the receptor-binding domain, R385K, was reported to decrease viral infectivity (Miljanovic et al., 2021).

FIGURE 2 Pango lineages in India during January-August 2021. Pango lineages whose prevalence was >1% are shown. The numbers in the outer circle refer to the percentages of Pango lineages.

Seven mutations were recorded in the CTD (with >1% prevalence), and five of these mutations, that is, A570D, D614G, Q677H, P681H, and P681R, are known to increase virus infectivity. A570D increases virus infectivity by enhancing host receptor binding, and can also act as a deleterious immune escape mutation (X. Yang et al., 2020). The D614G mutation was shown to increase virion spike density and infectivity, and Q677H was shown to enhance viral infectivity and confer neutralizing antibody resistance (Zeng et al., 2021). The P681H mutation was reported as a worldwide emerging mutation that increases the infectivity of SARS-CoV-2 (Maison et al., 2021). P681R was shown to have played a crucial role in the alpha to delta replacement and in the increase in infectivity and pathogenicity of the virus.

Three mutations with >1% prevalence, that is, T716I, T749I, and A706S, were recorded in the fusion peptide region; the role of these mutations in the pathogenicity of SARS-CoV-2 is yet unknown. Nine mutations were recorded in the HR-1 region of the spike protein (with >1% prevalence). The S982A mutation was reported in the antigenic sites of alpha, which might be involved in increasing viral infectivity (Y. . Only one mutation, that is, P1228L, was recorded in the transmembrane domain, and the clinical significance of this mutation is not reported yet. Four mutations were recorded in the signal peptide domain: S2P, D3L, D3Y, and H11Q. The biological significance of these mutations is not reported yet.
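The category shares quoted for the 112 mutations with >1% prevalence can be checked with a few lines of arithmetic. This is only a sketch using the counts stated in the text; the dictionary names are hypothetical. Note that direct division gives about 2.68% for the three vaccine breakthrough mutations, whereas the text reports this share as 2%.

```python
# Counts as stated in the text for the 112 spike mutations with >1% prevalence.
CATEGORY_COUNTS = {
    "increased infectivity/pathogenicity": 15,
    "vaccine breakthrough": 3,
    "escape from neutralizing antibodies": 1,
    "decreased infectivity": 1,
    "significance unknown": 92,
}

# Per-domain counts as stated in the abstract.
DOMAIN_COUNTS = {
    "N-terminal domain": 70,
    "receptor binding domain": 18,
    "C-terminal domain": 7,
    "fusion peptide region": 3,
    "HR-1": 9,
    "signal peptide": 4,
    "transmembrane": 1,
}


def shares(counts: dict) -> dict:
    """Convert raw counts to percentages of their total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    # Both breakdowns should describe the same 112 mutations.
    print("total by category:", sum(CATEGORY_COUNTS.values()))
    print("total by domain:  ", sum(DOMAIN_COUNTS.values()))
    for name, pct in shares(CATEGORY_COUNTS).items():
        print(f"{name}: {pct}%")
```

Checking that the two breakdowns sum to the same total is a cheap consistency test before interpreting the percentages.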
Among the 15 mutations that were reported to increase infectivity, 4 (T19R, D63G, V70del, and G142D) were located in the N-terminal domain of the spike protein, which is involved in host cell attachment through diverse polysaccharide moieties. Most of the VOC have mutations in the N-terminal domain that allow escape from neutralization (McCallum et al., 2021). The T19R mutation is found on the surface patch targeted by most NTD-neutralizing antibodies (Planas et al., 2021). The D63G mutation is important as it is directly involved in RNA binding (Dinesh et al., 2020). V70del is present in a prominent exterior loop of the spike, is required for efficient cell entry, and increases cleavage of S2, thus enhancing spike infectivity (Meng et al., 2021). SARS-CoV-2 variants with the G142D mutation showed resistance to mAbs (Suryadevara et al., 2021).

FIGURE 3 Spike mutations of >1% prevalence in India, January-August 2021. Percentage prevalence of mutations across the domains of the S protein whose prevalence was >1% is shown. A specific colour was assigned to each category of amino acid mutations: increased infectivity (pink), decreased infectivity (yellow), vaccine breakthrough mutations (grey), escape from neutralizing antibodies (black), and those whose role is yet unknown (turquoise). The X-axis refers to amino acid changes in the S protein; the Y-axis refers to the frequency of S protein mutations in percentages. SP: signal peptide sequence domain, RBD: receptor binding domain, CTD: C-terminal domain, FP: fusion peptide domain, HR-1: heptapeptide repeat domain, TM: transmembrane domain.

Six mutations that increase infectivity were found in the RBD, which interacts with the host receptor ACE2. The L452R mutation increases spike stability and viral infectivity, thereby increasing viral infection (Motozono et al., 2021). The T478K mutation is located at the interface with ACE2.
The amino acid change from threonine to lysine is known to enhance the electrostatic potential of the S protein and hence was predicted to affect the spike-ACE2 interaction (Giacomo et al., 2021). Mutation at E484 (to either Q or K) has shown higher binding affinity for the ACE2 cell receptor, resulting in enhanced transmissibility (Augusto et al., 2022). Variants with the N501Y substitution were found to have increased transmissibility; the tyrosine allows more interactions with the ACE2 receptor, which may lead to higher binding affinity to the host cell receptor.

The CTD harbours the A570D mutation, which was found to be more infectious in a pseudoviral assay. Structural analysis of A570D revealed that this mutation introduces a molecular switch for the opening and closing of the RBD (X.-J. Yang, 2021a, 2021b). The D614G mutation is more infectious, and several studies indicated that it is associated with increased viral load in patients infected with COVID-19, as D614G shifts the S protein conformation towards an ACE2-binding fusion state (Yurkovetskiy et al., 2020). It was demonstrated that the Q677H mutation increases viral infectivity and syncytium formation (Zeng et al., 2021). The P681R and P681H mutations are located in the polybasic S1/S2 furin cleavage site and were reported to enhance the fusogenic activity of the spike protein (Tao et al., 2021).

Three mutations in the N-terminal domain, E156G, F157del, and R158del, showed less flexibility compared to the Wuhan strain. Docking studies of these mutants with monoclonal antibodies reported low binding affinity compared to the Wuhan strain; this may be a possible case of immune escape (Chaudhari et al., 2021). Y144del has been found to abrogate binding to neutralizing antibodies (McCarthy et al., 2021).

The initial contact between the RBD of the S protein and ACE2 is through the peptidase domain (PD) (Adhikari et al., 2020).
The possible binding mechanism is of pivotal importance and has vital implications for vaccine design. The presence of an unexpected furin cleavage site at the S1/S2 boundary of SARS-CoV-2 S, which is cleaved during biosynthesis, is a novel feature setting this virus apart from SARS-CoV and SARSr-CoVs (Walls et al., 2020). The boundary of the (uncoupled) N-terminal region lies in physical proximity to the furin-targeted motif RRAR, which is essential for pre-activation of the SARS-CoV-2 spike protein through proteolysis (Serapian et al., 2020). A second proteolytic cleavage at site S2′ releases the fusion peptide, which penetrates the host cell membrane, preparing it for fusion (Apellániz et al., 2014).

S glycoproteins are densely decorated by heterogeneous N-linked glycans protruding from the trimer surface. These oligosaccharides participate in S folding, affect priming by host proteases, and might modulate antibody recognition (Walls et al., 2020). Glycans act as a shield that protects epitopes from antibody binding. Site-specific glycan analysis of the SARS-CoV-2 S protein was performed, in which glycopeptides were generated using proteases and analysed by liquid chromatography-mass spectrometry; as a result, 22 N-linked glycosylation sequons per protomer were determined (Watanabe et al., 2020). One study demonstrated, using simulated models, that the N-glycans at positions 165 and 234 act as modulators of RBD binding to ACE2. The RBD is stabilized on addition of the N-glycans, whereas their removal resulted in reduced ACE2 binding, revealing a conformational shift of the RBD (Casalino et al., 2020).

E484 is an immunodominant spike protein residue in the RBD region with various substitutions (E484A, E484D, E484G, and E484K). E484K introduces a residue with a charge opposite to that of the wild type, which would significantly alter the electrostatic complementarity of antibody binding to this region (Andreano et al., 2021).
Also, the same group reported in another study that the E484K mutation can lead to a fourfold decrease in the neutralization activity of convalescent plasma against SARS-CoV-2 (Andreano et al., 2021). The ΔY144 deletion was found to alter the N3 NTD loop (residues 140-156) and can abolish the neutralizing effect of various antibodies. An in-silico study predicted that the replacement of aspartic acid with glycine (D614G) leads to the loss of hydrogen bond interactions formed with valine at position 859, thereby eliminating the hydrogen bonding between the S1 and S2 domains. This leads to increased main-chain flexibility, enabling a more favourable orientation of Q613 and possibly facilitating cleavage by TMPRSS2 by perturbing its affinity with the S1-furin cleavage site (Raghav et al., 2020).

One previous study performed unbiased multi-microsecond molecular dynamics simulations of 7 glycosylated S-protein variants derived from the cryo-EM structure 6VSB and applied a matrix of low coupling energy (MLCE)-based approach to predict potential changes in immunogenic regions on each variant (Triveri et al., 2021). In that study, the reference S structure used was the dominant D614G variant. According to the MLCE study, the contiguous residues uncoupled from the S protein core shrink in number in the variants compared with D614G S; their experimental data also confirm that the variants evade antibodies binding to the shrunken or lost epitopes. The number of residues defining the epitope located in the long RBD loop (residues 417-503) is much lower in mutants 501Y.V2 (South African), B.1.1.28 (gamma), and N439K. They also predict that the N439K RBD forms a new interaction with the human ACE2 receptor (hACE2) and has an enhanced affinity for hACE2 (Triveri et al., 2021).

, and Y144del were identified as vaccine breakthrough mutations (Hacisuleyman et al., 2021; Wu et al., 2021). Variants including A475V, L452R, V483A, and F490L were known to be involved in putative antibody resistance (Q. Li et al., 2020).
It is alarming that these mutations were also identified in the present study with significant prevalence rates: E484K (2.03%), T95I (11.5%), Y144del (5.86%), L452R (30.25%).

To conclude, we have identified the total number of circulating SARS-CoV-2 lineages reported during the second wave of the COVID-19 pandemic in India. Further, the S mutations associated with the rise in COVID-19 cases were investigated in detail. Importantly, we have identified 15 mutations that were known to be involved in increasing viral pathogenicity (T19R, D63G, V70del, G142D, D377Y, L452R, T478K, E484Q, E484K, N501Y, A570D, D614G, Q677H, P681H, P681R); 3 that were known to be vaccine breakthrough mutations (E156G, F157del, R158del); and 1 mutation that was known to escape neutralizing antibodies (Y144del). Since new variants will continue to emerge, it is crucial to identify and monitor mutations in the S protein, and any functionally significant co-occurring modifications. Going forward, experimental assessment of SARS-CoV-2 S protein mutations can be beneficial from a public health point of view to understand vaccine breakthrough infections, and to devise better preventive strategies by individual nations.

The authors would like to acknowledge GISAID and researchers worldwide depositing their sequences in GISAID.

The authors declare no conflict of interest.

The authors confirm that the ethical policies of the journal, as noted on the journal's author guidelines page, have been adhered to. No ethical approval was required as this article only uses whole genome sequences deposited on the GISAID portal that are publicly available, with no original research data.

Radhakrishna Muttineni conceptualized the study. Binitha R.N., Jaslin Panyam, Aravind Vemula, Shashi Mohan Singh, Subin Balachandran, and Sandra K.P. collected data and did the preliminary analysis.
Radhakrishna Muttineni, Binitha R.N., Kalyani Putty, Kavitha Marapakala, Viroji Rao S.T., and Anand Kumar Kondapi analysed the data and wrote the manuscript.

The data that support the findings of this study are openly available in GISAID at https://www.gisaid.org.

Intra- and intermolecular atomic-scale interactions in the receptor binding domain of SARS-CoV-2 spike protein: Implication for ACE2 receptor binding
SARS-CoV-2 escape from a highly neutralizing COVID-19 convalescent plasma
The three lives of viral fusion peptides
In vitro data suggest that Indian delta variant B.1.617 of SARS-CoV-2 escapes neutralization by both receptor affinity and immune evasion
From one to many: The within-host rise of viral variants
Beyond shielding: The roles of glycans in the SARS-CoV-2 spike protein
E156/G and Arg158, Phe-157/del mutation in NTD of spike protein in B.1.167.2 lineage of SARS-CoV-2 leads to immune evasion through antibody escape
SARS-CoV-2 spike mutations, L452R, T478K, E484Q and P681R, in the second wave of COVID-19 in Maharashtra
COVID-19). (n.d
Breakthrough SARS-CoV-2 infections among BBV-152 (COVAXIN®) and AZD1222 (COVISHIELD™) recipients: Report from the eastern state of India
Structural basis of RNA recognition by the SARS-CoV-2 nucleocapsid phosphoprotein
Preliminary report on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike mutation T478K
Vaccine breakthrough infections with SARS-CoV-2 variants
SARS-CoV-2 variants, spike mutations and immune escape
T cell immune responses to SARS-CoV-2 and variants of concern (Alpha and Delta) in infected and vaccinated individuals
Synopsis of symptoms of COVID-19 during second wave of the pandemic in India. Hormone Molecular Biology and Clinical Investigation
Recurrent emergence of SARS-CoV-2 spike deletion H69/V70 and its role in the Alpha variant B.1.1.7
Molecular determinants and mechanism for antibody cocktail preventing SARS-CoV-2 escape
Structure, function, and evolution of coronavirus spike proteins
The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity
Delta spike P681R mutation enhances SARS-CoV-2 fitness over Alpha variant
Genetic characteristics and phylogeny of 969-bp S gene sequence of SARS-CoV-2 from Hawai'i reveals the worldwide emerging P681H mutation
N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2
Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape
The first molecular characterization of Serbian SARS-CoV-2 isolates from a unique early second wave in Europe
SARS-CoV-2 spike L452R variant evades cellular immunity and increases infectivity
A molecular sensor determines the ubiquitin substrate specificity of SARS-CoV-2 papain-like protease
Reduced sensitivity of SARS-CoV-2 variant Delta to antibody neutralization
Analysis of Indian SARS-CoV-2 genomes reveals prevalence of D614G mutation in spike protein predicting an increase in interaction with TMPRSS2 and virus infectivity
The high infectivity of SARS-CoV-2 B.1.1.7 is associated with increased interaction force between Spike-ACE2 caused by the viral N501Y mutation
The answer lies in the energy: How simple atomistic molecular dynamics simulations may hold the key to epitope prediction on the fully glycosylated SARS-CoV-2 spike protein
Spike protein NTD mutation G142D in SARS-CoV-2 Delta VOC lineages is associated with frequent back mutations, increased viral loads, and immune evasion
Neutralizing and protective human monoclonal antibodies recognizing the N-terminal domain of the SARS-CoV-2 spike protein
The biological and clinical significance of emerging SARS-CoV-2 variants
Tracking SARS-CoV-2 variants
SARS-CoV-2 spike protein mutations and escape from antibodies: A computational model of epitope loss in variants of concern
Antigenic minimalism of SARS-CoV-2 is linked to surges in COVID-19 community transmission and vaccine breakthrough infections
CoV-2 variant with L452R and E484Q neutralization resistance mutations
Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
Analysis of SARS-CoV-2 variant mutations reveals neutralization escape mechanisms and the ability to use ACE2 receptors from additional species
The infectivity and antigenicity of epidemic SARS-CoV-2 variants in the United Kingdom
Site-specific glycan analysis of the SARS-CoV-2 spike
The spike of concern-The novel variants of SARS-CoV-2
mRNA-1273 vaccine induces neutralizing antibodies against spike mutants from global SARS-CoV-2 variants
Domains and functions of spike protein in Sars-Cov-2 in the context of vaccine design
SARS-COV-2 δ variant drives the pandemic in the USA through two subvariants
Delta-1 variant of SARS-COV-2 acquires spike V1264L and drives the pandemic in Indonesia
Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: A single-centered, retrospective, observational study
Structural and functional analysis of the D614G SARS-CoV-2 spike protein variant
Neutralization of SARS-CoV-2 variants of concern harboring Q677H
SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity
Broad and differential animal angiotensin-converting enzyme 2 receptor usage by SARS-CoV-2
Convergent evolution of SARS-CoV-2 in human and animals