key: cord-0328230-0wq57z7b authors: Ferrareze, Patrícia Aline Gröhs; Zimerman, Ricardo Ariel; Franceschi, Vinícius Bonetti; Caldana, Gabriel Dickin; Netz, Paulo Augusto; Thompson, Claudia Elizabeth title: Molecular evolution and structural analyses of the spike glycoprotein from Brazilian SARS-CoV-2 genomes: the impact of the fixation of selected mutations date: 2021-07-19 journal: bioRxiv DOI: 10.1101/2021.07.16.452571 sha: f53409e9a77024d614c8f7a037e2be34f38ad5b8 doc_id: 328230 cord_uid: 0wq57z7b The COVID-19 pandemic caused by Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has reached by July 2021 almost 200 million cases and more than 4 million deaths worldwide since its beginning in late 2019, leading to enhanced concern in the scientific community and the general population. One of the most important pieces of this host-pathogen interaction is the spike protein, which binds to the human Angiotensin-converting enzyme 2 (hACE2) cell receptor, mediates the membrane fusion and is the major target of neutralizing antibodies against SARS-CoV-2. The multiple amino acid substitutions observed in this region, specially in the Receptor Binding Domain (RBD), mainly after almost one year of its emergence (late 2020), have enhanced the hACE2 binding affinity and led to several modifications in the mechanisms of SARS-CoV-2 pathogenesis, improving the viral fitness and/or promoting immune evasion, with potential impact in the vaccine development. In this way, the present work aimed to evaluate the effect of positively selected mutations fixed in the Brazilian SARS-CoV-2 lineages and to check for mutational evidence of coevolution. Additionally, we evaluated the impact of selected mutations identified in some of the VOC and VOI lineages (C.37, B.1.1.7, P.1, and P.2) of Brazilian samples on the structural stability of the spike protein, as well as their possible association with more aggressive infection profiles by estimating the binding affinity in the RBD-hACE2 complex. We identified 48 sites under selective pressure in Brazilian spike sequences, 17 of them with the strongest evidence by the HyPhy tests, including VOC related mutation sites 138, 142, 222, 262, 484, 681, and 845, among others. The coevolutionary analysis identified a number of 28 coevolving sites that were found not to be conditionally independent, such as the couple E484K - N501Y from P.1 and B.1.351 lineages. Finally, the molecular dynamics and free energy estimates showed the structural stabilizing effect and the higher impact of E484K for the improvement of the binding affinity between the spike RBD and the hACE2 in P.1 and P.2 lineages, as well as the stabilizing and destabilizing effects for the positively selected sites. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the etiological agent of the COVID-19 pandemic. The virus is composed of an enveloped positive-sense single-stranded RNA genome, which encodes 16 non-structural proteins (nsp1-16), four structural proteins (Spike, Membrane, Envelope, and Nucleocapsid) , and other accessory proteins (ORFs 3a, 6, 7a, 7b, 8, 10) (Fehr and Perlman, 2015; Liu et al., 2014) . The spike (S) glycoprotein is necessary for the viral binding to ACE2 host cell receptor, it is common to multiple coronaviruses (e. g., MERS-CoV and SARS-CoV) (Fehr and Perlman, 2015) , and is the major target of neutralizing antibodies against SARS-CoV-2 Yuan et al., 2020) . Spike has a homotrimeric form where each monomer (protomer) is composed by a S1 subunit that mediates the receptor binding and an alpha-helix rich S2 subunit that promotes the subsequent membrane fusion . Since the emergence of the first SARS-CoV-2 in the early Wuhan epidemic, several mutations have been identified, both occurring isolated or as a signature of a broader mutational complex. Although S protein represents only 13% of the viral genome, substitution at this site has been overrepresented suggesting immunodominance of this protein, specially at its Receptor Binding Domain (RBD) Weisblum et al., 2020) . These mutations can enhance receptor binding, either directly or allosterically, alter viral fitness or promote immune evasion, ensuing occasional reinfection and possible impacting the efficacy of developed vaccines Harvey et al., 2021; Korber et al., 2020; Weisblum et al., 2020) . The S protein mutation D614G (aspartic acid to glycine substitution at amino acid position 614) has been progressively dominant worldwide since March 2020, playing an essential role in infectivity and augmented viral load (Groves et al., 2021; C. B. Jackson et al., 2021; Korber et al., 2020; Yurkovetskiy et al., 2020) . This substitution may have emerged independently and promptly became dominant in several viral strains, outcompeting D614 harboring viruses, even where they originally appeared. D614G abolishes a hydrogen bond between this position and a threonine present in S2 of the neighbour protomer. The consequent conformational changes acts allosterically favouring the maintenance of the Receptor Binding Domain (RBD) in an activated "up" or "open" position, which exhibits its ACE2 binding site (Receptor Binding Motif) for longer periods. This allows a more efficient binding to ACE2 (Groves et al., 2021) , with consequent higher replication and viral loads. However, G614 mutants are similarly (or even more) susceptible to immune neutralization as the original D614 variant (Plante et al., 2020; Weissman et al., 2021) . Notably, this enhanced epitope exposure of "up" RBD may be a "weakness", since RBD is poor in predicted O and N-linked glycan sites required for immune shielding. In theory, this undesirable effect could be compensated by accumulation of other RBD mutations, leading to increased infectiousness, and rendering D614G unnecessary. Specifically, the "HMN 19B variant", recently described in France, is a direct descendent from the earlier 19B that have predominated before D614G harboring lineages took over (Fourati et al., 2021) . The codon 501 in RBD has been considered another major mutational hotspot. Substitutions at this position have been associated with increased binding affinity to ACE2 (Gu et al., 2020; Starr et al., 2020) . The non-synonymous mutation from an asparagine to a tyrosine at position 501 (N501Y) has first appeared in samples from Wales and other parts of the world. However, its actual emergence and dissemination have been related to coevolution with different mutations. The deletion of H69-V70 in the N-terminal-domain, co-occurring in B.1.1.7 lineage, has been particularly important for the N501Y emergence (Meng et al., 2021) . However, N501Y has increasingly been demonstrated in coevolution with many different mutational signatures, for instance, in the Variants of Concern (VOCs) B.1.351 and P.1, as well as in the more recent HMN 19B aforementioned. Data suggests that some N501Y harboring viruses could be 50% more transmissible and up to 61% more lethal, mainly due to major changes in the electrostatic interaction between Spike and ACE2 receptors (Washington et al., 2021; Davies et al., 2021) . This would be related to the substitution of a relatively small asparagine (N) for a larger aromatic tyrosine (Y), allowing for an extra contact site with ACE2 (Ali et al., 2021; Gu et al., 2020; Tang et al., 2021) . Interesantly, N501Y increases host diversity allowing direct infection of non-humanized mice (Rathnasinghe et al., 2021) . The E484K mutation in RBD has been drowning progressive attention. It is an amino acid replacement of glutamic acid to lysine at amino acid position 484, which could significantly alter the complementarity of antibodies to the RBD region leading to immune evasion Greaney et al., 2020; Weisblum et al., 2020) . In fact, both in a Deep Mutational Scanning (DMS) experiment and in a selective pressure study using 19 types of mAbs, different substitutions at this position (E484K/Q/P/A/D/G) were consistently demonstrated as being the most critical for decreasing antibody neutralization titers Harvey et al., 2021) . Since this mutation has been arising independently in multiple lineages , at the beginning it may represent a common evolutionary solution for viral maintenance. E484K emerged in a few selected lineages where it co-exists with other signature substitutions (e.g. in B.1.351 and in P1), similarly to the phenomenon previously described for N501Y. Importantly, the combination of E484K and N501Y seems to induce more conformational changes than the N501Y mutant alone, potentially altering antibodies binding to this region and resulting in the immune evasion phenomena . The emergence of three independently evolving lineages (B.1.1.7, B.1.351, and P.1) (Faria et al., 2021; Rambaut et al., 2020; Tegally et al., 2021) , all characterized by a constellation of mutations (including the aforementioned del 69-70, K417N/T, E484K, and N501Y convergent mutations) in RBD, has prompted concerns about the evolutionary forces leading to the SARS-CoV-2 adaptation. The presence of these sets of consistently present substitutions are highly suggestive of coevolutionary mutational processes . In addition to these convergent mutations, other substitutions of these VOCs are also associated with positive selection. For example, there is evidence that eight out of the 10 lineage-defining mutations in the spike protein of the P.1 lineage are under diversifying positive selection (Faria et al., 2021) . It is possible to detect selection along prespecified lineages that affect certain subsets of codons in a protein-coding gene. Sites 484 and 501 (located in the RBD) definitively show a pattern of nucleotide diversification that can be linked with positive selection (Tegally et al., 2021) . On the other hand, alternative evolutionary processes that could have led to this plethora of substitutions have been rarely described thus far. Recombination, a recurrent and well documented process of the molecular evolution of coronaviruses, has been described between B.1.1.7 and other lineages , but their real role in SARS-CoV-2 evolution remains elusive. The present work aimed to evaluate the effect of positively selected mutations fixed in the Brazilian SARS-CoV-2 lineages and to check for mutational coevolution evidence. Additionally, we evaluated the impact of some mutations identified in some VOC and VOI lineages (C.37, B.1.1.7, P.1 and P.2) of Brazilian samples on the structural stability of the spike protein, as well as their possible association with more aggressive infection profiles by the binding affinity estimation in the RBD-hACE2 complex. After the exclusion of incomplete or low coverage (Ns > 5%) sequences, 11,078, Brazilian genomes were recovered from the GISAID database between January 01 st , 2020 and June 06 th , 2021 (submission date up to June 06 th , 2021). The multiple sequence alignment with NC_045512.2 as reference was performed by the MAFFT v.7 web server (Katoh et al., 2019) with default parameters and 1PAM / κ=2' scoring matrix for closely related DNA sequences. For the spike sequence analysis, the region between the positions 21,562 and 25,384 (spike regions in the NC_045512.2 reference genome) were selected from the multiple sequence alignment previously performed. The deletion of sequence duplicates (identical spike sequences) kept 2,901 unique spike sequences. The clade assignment and variant calling was performed by Nextclade (https://clades.nextstrain.org/). The phylogenetic analysis was started by the inference of the best evolutionary model by ModelTest-NG (Darriba et al., 2020) , which identified GTR+R4 in all selection strategies. The phylogenetic tree reconstruction was performed by the Maximum Likelihood method in the IQ-TREE program (Nguyen et al., 2015) , using 1,000 replicates of ultrafast bootstrap (Hoang et al., 2018) and a Shimodaira-Hasegawa-like approximate likelihood ratio test (SH-aLRT) with 1,000 replicates (Guindon et al., 2010) , 2,000 iterations and the optimization of the UFBoot trees by NNI on bootstrap alignment. The tree visualization was performed by FigTree software (http://tree.bio.ed.ac.uk/software/figtree/). The multiple sequence alignment of the 2,901 unique spike sequences and the phylogenetic tree previously built were used as input in the HyPhy program. To perform different selection tests, the methods: (i) Fast Unconstrained Bayesian AppRoximation (FUBAR) (Murrell et al., 2013) , (ii) Fixed Effects Likelihood (FEL) (Kosakovsky Pond and Frost, 2005) , and (iii) Single-Likelihood Ancestor Counting (SLAC) (Kosakovsky Pond and Frost, 2005) were evaluated. The analysis of coevolution across sites in the spike sequences was performed by BGM (Bayesian Graphical Model) (Poon et al., 2007) , a tool for detecting coevolutionary interactions between amino acid positions in a protein by a Markov Chain Monte Carlo (MCMC) method. The analysis of the spike structural stability by changes in free energies upon mutation (ΔΔG kcal/mol) was performed using PDB ID 6XR8 (chain A) as the structure of the full-length prefusion conformation of the SARS-CoV-2 spike protein (Cai et al., 2020) using five methodologies: the web server DynaMut (Rodrigues et al., 2018) , FoldX Suite v5.0 (Schymkowitz et al., 2005) , and the web servers iMutant3.0 (Capriotti et al., 2005) , MAESTRO (Laimer et al., 2016) , and PremPS . The web-server DynaMut was selected to perform the analyses of vibrational entropy and total energy using the PDB ID 6XR8 (chain A). The DynaMut implements a Normal Mode Analysis (NMA) through Bio3D (Grant et al., 2006) and ENCoM (Frappier and Najmanovich, 2014) approaches, providing rapid and simplified access to insightful analyses about protein motions. Moreover, DynaMut also enables rapid analysis of the impact of mutations on dynamics and stability of proteins resulting from vibrational entropy changes. kcal/mol < ΔΔG ≤ +0.92 kcal/mol), destabilizing mutations (+0.92 kcal/mol < ΔΔG ≤ +1.84 kcal/mol), and highly destabilizing mutations (ΔΔG > +1.84 kcal/mol) (Studer et al., 2014) . iMutant3: The iMutant web server was selected to estimate the free energy changes by a support vector machine (SVM)-based tool. This system predicted the sign of the protein stability change upon mutation and as a regression estimator predicted the related ΔΔG values in the physiological pH (7.4) and temperature (36.5 ℃). The Multi AgEnt STability pRedictiOn web server was used to estimate the changes in unfolding free energy upon point mutation through a machine learning system. The estimation of the unfolding Gibbs free energies was performed by a random forest regression scoring function using the ProTherm database for parameterization. Negative values indicate stabilization by the decrease of the free energies. To perform an accurate estimation of the binding free energy associated with mutations in the spike protein belonging to different lineages, spike RBD -ACE2 protein complexes were modelled for the reference SARS-CoV-2 spike (YP_009724390.1) and lineages C.37, B.1.1.7, P.1, P.2, and P.2+452. The PDB file 6M0J was selected as the best template (2.45 Å and 194 amino acids) to the modeling using the MODELLER pipeline (Webb and Sali, 2016) . Five models were generated for each sequence. The thirty resulting structures were evaluated in relation to stereochemical parameters and structural quality by the programs PROCHECK (Laskowski et al., 1993) and VERIFY3D (Eisenberg et al., 1997) , available on SAVES v6.0 web server (https://saves.mbi.ucla.edu/). The analysis of the Ramachandran plot statistics, G-factors and residue properties (PROCHECK), as well as the evaluation of the 3D-1D score using VERIFY3D, allowed the selection of the best modelled RBD structures to be subsequently used for the energy calculations. The generation of the spike RBD -ACE2 complexes for each different lineage was performed by the PyMOL software using the structural alignment of the RBD models with the 6M0J template, which was the source of the ACE2 structural coordinates. The structures of the fragments comprising residues 333 until 526 of the wild-type (reference) spike protein, as well as of the variants P.1 (K417T, E484K, N501Y), P.2 (E484K), P.2+452 (E484K, L452V), C.37 (L452Q, F490S), and B.1.1.7 (N501Y) complexed with the human ACE2 protein (residues 19 until 615) in the pdb format were used as input for classical molecular dynamics simulations using GROMACS (Abraham et al., 2015) with the AMBER03 force field (Duan et al., 2003) . The spike-ACE2 complexes were simulated in cubic boxes with periodic boundary conditions, solvated with TIP3P water molecules (Jorgensen et al., 1983) and with sodium and chloride ions corresponding to physiological concentration. The van der Waals interactions were calculated using a cutoff radius of 1.2 nm and the electrostatic interactions were calculated with the Particle Mesh Ewald (PME) method (Darden et al., 1993) . All systems were initially energetically minimized using conjugate gradients and steepest descent algorithms. After minimization, they were submitted to a 500 ps where the coordinates of both proteins were restrained, allowing the solvent and ions to relax without disturbing the geometry of the complex. After the position restrained simulation, a thermalization phase consisting of a sequence of three unrestricted molecular dynamics simulations with 5 ns each in temperatures of 200 K, 240 K, and 280 K was carried out. The production phase consisted of 200 ns long simulation runs, in the NPT ensemble, using a Nosé-Hoover thermostat (Hoover, 1985; Nosé, 1984 ) and a Parrinello-Rahman barostat (Parrinello and Rahman, 1981) . After simulations, the trajectories were visually analyzed using VMD (Humphrey et al., 1996) and GROMACS tools to quantify the structural and thermodynamic stability of the complexes, the number of hydrogen bonds and the interface area. The interface area was computed taking SASA (Solvent Accessible Surface Area) (Eisenhaber et al., 1995) of ACE2 and spike minus SASA of the complex and dividing the result by two (because the contact was counted twice). The spike-ACE2 binding free energy was calculated using Molecular Mechanics Poisson-Boltzmann Surface Area (MM/PBSA) binding free energy calculations (Baker et al., 2001; Homeyer and Gohlke, 2012) , employing sets of 200 configurations (1ns spaced snapshots obtained from the molecular dynamics trajectories) for each system. The calculations were carried out using the program g_mmpbsa (Kumari et al., 2014) , which is compatible with GROMACS, using as settings a gridspace of 0.5 A, salt concentration of 0.150 M, solute dielectric constant of 2, and estimating the nonpolar solvation energy using the solvent accessible surface area (SASA). The contributions of the residues to the binding free energy were calculated with the same program. The spike phylogenetic analysis of 2,901 unique sequences showed the formation of a large monophyletic group containing two main subclades: one formed by P.1 and P. . Node tips are colored by PANGO lineages represented by ≥ 10 genomes. "Other" defines lineages representing < 10 genomes. The tree is rooted using the reference spike nucleotide sequence (NC_045512.2). (B) Average number of amino acid substitution events in the spike protein by lineage. The average values were calculated based on the spike set of 2,901 brazilian sequences between January, 2020 and June 06, 2021. This spike set represents the nucleotide sequence variability for 11,054 brazilian genomes. The selection analysis performed with HyPhy for site-to-site tests evaluated the presence of diversifying and purifying selection in the spike protein using a set of 2,901 sequences, which represents the genetic variability for 11,054 genomes (24 sequences were excluded due to the presence of >5% of ambiguous sites or truncating insertions in the final alignment). The FUBAR method identified 20 sites under adaptive pressure along the phylogeny (Table 1 The FEL method was mainly tested to detect negatively selected sites, since more powerful methods such as FUBAR do not evaluate purifying selection. With the initial assumption that the selective pressure for each site is constant along the entire phylogeny, FEL indicated sites that are evolving under positive and negative selective pressures in the spike protein sequence. As result, 239 sites were marked as targets of purifying selection (Supplementary File 3) along the phylogeny, while other 44 sites were suggested to be under adaptive selection. Interestingly, all positively selected sites indicated by the FUBAR method were found by FEL (Table 1 and Figure 2A ), including those with low frequency in the analysed genomes. Some of them were indeed related to the VOC lineages. Among the sites identified only by the FEL test are the residues 29, 49, 63, 76, 78, 367, 483, 553, 585, 689, 747, 1, 027, 1, 084, 1, 124, 1, 133, and 1, 167 (Supplementary File 2) . Finally, using a modified version of the Suzuki-Gojobori counting algorithm, the SLAC method assumes that the selective pressure for each site is constant. Despite being the weakest method of this group (Kosakovsky Pond and Frost, 2005) , the analysis of pervasive site-selection with SLAC indicated the presence of 130 sites under negative selection, of which 112 were previously found by FEL (Supplementary File 3 and Figure 2B ). In relation to the 29 positively selected sites, 17 of them were previously identified by the FUBAR and FEL methods (Table 1) , with eight also found by FEL (12, 26, 98, 570, 653, 684, 846, and 1, 078) and four only identified by SLAC (27, 701, 769, and 1, 228) . Considering the consensus sites identified by the three selection methods, the residues 5, 21, 67, 75, 138, 142, 222, 262, 484, 572, 614, 681, 688, 845, 1, 176, 1, 219 , and 1,264 presented the strongest evidences for positive selection pressure on the spike protein (Table 1 and In order to evaluate if the spike protein sites could be under a coevolutionary process, File 4) , of which 28 achieved a P ≥ 0.8 ( To analyze the impact of the spike sites under adaptive selection on the protein stability and host-pathogen interaction, two main categories of structural analyses were applied related to: (i) the folding/unfolding free energies and the vibrational entropy: a reference spike structure PDB ID 6XR8 (relative to the NC045512.2 genome) was used to estimate the energy changes caused by these positively selected single mutations in the prefusion conformation of the spike protein and (ii) protein molecular dynamics: SARS-CoV-2 spike RBD models belonging to different viral lineages in a complex with the human ACE2 were simulated using molecular dynamics and submitted to estimation of binding free energies to provide a better understanding of the effect of those sites under positive selection (single and combined) in the viral fitness. The evaluation of the structural stability of the spike protein considered the results of five different methodologies in order to find a majority consensus. The application of DynaMut, FoldX, iMutant, MAESTRO, and PremPS provided the estimate of the unfolding and total free energies, as well as the vibrational entropy of the mutated structures. Despite some sites presenting opposite trends in relation to the stabilizing/destabilizing impact, which denotes the specific differences in energy calculations among these algorithms ( Figure 3A) , a majority consensus result (found by 3 or more tests) was obtained ( Figure 3B and Table 3 ). Sites without resolution in the crystallographic structure, located at N/C-terminal regions, as those in the furin cleavage site (among others), were not analyzed for the energy estimation. The PremPS algorithm was used to calculate the changes in unfolding Gibbs free energy generated by each single mutation in the full-lenght prefusion conformation of the spike protein. As observed in Table 3 between the wild-type and the mutated 6XR8 spike structures. According to the analysis of the thermodynamic properties, the mutations marked as positively selected seem to destabilize the protein structure. However, the majority consensus evaluation did not confirm this trend, suggesting 51 substitution events (from 33 sites) as stabilizing mutations. Finally, the FoldX pipeline repaired the 6XR8 PDB file by the correction of residues with bad torsion angles or van der Waals clashes and performed the energy minimization testing different rotamer combinations. As result of the mutagenesis and total energy estimation, the mutations in sites T29A, G142D, and A570D were considered as highly destabilizing, increasing the total energy in 1.936, 2.848 and 2.939 kcal/mol, respectively. Despite some energy differences being categorized as neutral, the increase or decrease of the math signal suggests the potential effect of these mutations on the structural stability of spike, in the tested conditions. The comparison between the mutational effect predicted by the five methods and the majority consensus result are presented in Destabilizing mutations (highlighted in cyan and labeled) related to the positively selected sites, according to the majority consensus analysis. For visualization purposes, sites with multiple possible mutations were represented by the prevalent one and its respective effect (stabilizing/destabilizing). In total, 30 theoretical models with 194 amino acid length for the spike RBD -ACE2 protein complexes were obtained using PDB ID 6M0J as template, being five models generated for each lineage (reference, P.1, P. 89.9% of the residues in favoured regions and 10.1 up to 11.3% of the residues in allowed regions, which indicated the good model quality. There were no residues in generously allowed or disallowed regions ( Figure S1 ). The overall G-Factor values ranged between -0.06 and -0.12. For all selected models, the compatibility of the atomic model with the amino acid sequence was defined with 100% of the residues achieving an average 3D-1D score ≥ 0.2 in the VERIFY3D analysis. The analysis of the RMSD, center of mass and minimum distances indicated that all systems were structurally stable, in the considered time window, with overall conservation of the arrangement of the complex, as can also be seen by visual inspection of the snapshots (Figures 7 and 8) . The lowest changes were found in the reference and lineages P.1 and C.37, meaning higher structural stability. Some structural changes in spike were found in the P.2 and B.1.1.7 lineages, whereas for P.2+452 and B.1.1.7 the structural changes were also noticeable in relation to the ACE2 molecule. For all systems, the spike-hACE2 interaction was characterized by the presence of many close contacts, which is reflected in a substantial interface area, and many hydrogen bonds (Figure 9 and Table 4 ). Despite the distance plots with similar results, indicating that the relative position between the spike and ACE2 is not altered during the simulation time, the lineages P.2+452, C.37 and B.1.1.7 showed a small alteration in the distance, with a slight approximation between spike and ACE2 ( Figure S2 ). In the reference system, there were on average 9.51 ± 2.09 hydrogen bonds and an interface area 10.15 ± 0.55 nm 2 . The systems P.2, P.2+452, and C.37 displayed a higher number of hydrogen bonds and interface area than the reference, whereas the lineage P.1 showed larger interface area but less hydrogen bonds and the lineage B.1.1.7 showed smaller interface area and number of hydrogen bonds. About the number and temporal evolution of the hydrogen bonds, the order followed: P.2 > P.2+452 > C.37 > reference > P.1 > B.1.1.7. The intensity of the hydrogen bonds in the lineages P.2 and P.2+452 is reflected in the interaction intensity with ACE2. As for the contact area between spike and ACE2, proportional to the intensity of the van der Waals interactions (also reflected in the interaction intensity), the order followed with a slight change: P.2+452 > P.2 > P.1 > C.37 > reference > B.1.1.7 ( Figure 9 and Table 5 ). In all cases, the binding free energy of the mutant spikes was found to be more negative (stronger binding) than the native one and the order is roughly similar to the intensity of interface area or hydrogen bonds: P.2+452 > P.2 > P.1 > B.1.1.7 > C.37 > reference. Specifically, for P.2+452 and P.2, the binding intensity was significantly stronger than the reference, while for B.1.1.7, C.37 and reference the interaction intensity was almost the same (Table 5 ). In relation to the hotspots for the RBD-ACE2 interaction, the residue distances for the wild-type (reference) and the mutated sites were calculated (Supplementary Figures S3-S7 ). The spike K417 interacts with hACE2 residues T27, D30, K31, and H34 in the wild type, while the mutation for T417 in P.1 maintains the same contacts. In the P.2 lineage, K417 is affected by the E484K mutation, changing the residue interactions to the hACE2 D30, N33, H34, and P389, which is the same profile identified in B. In relation to the magnitude of interaction, all three top-ranked lineages bear the same E484K mutation, which yielded a particularly strong binding, remarkably stronger than in the native case. Indeed, the analysis of the contribution of the residues to the free energy of binding showed that the mutation of the negatively charged residue GLU 484 to the positive charged residue LYS 484 yielded a substantial negative free energy contribution to the stabilization of the complex. The mutation in the residue 501 (ASN to TYR, both polar non charged residues) presented in the lineages P.1 and B.1.1.7 can also contribute to the stabilization of the complex, but much less than E484K (Table 6 ). The spike protein is one of the most rapidly evolving regions in the SARS-CoV-2 genome, with slightly different evolution rates according to the clades, emerging in the beginning and middle 2020 (Pereson et al., 2020) . The emergence of the P.1 lineage, at the end 2020, with a possibly increased rate of the molecular evolution, represents an important change in the SARS-CoV-2 evolutionary history (Faria et al., 2021) . That is the main reason why lineage assignments should be carefully evaluated, since a considerable number of sequences Several studies Hark Gan et al., 2021) have demonstrated the increased binding affinity upon the amino acid changes E484K+N501Y+K417T (P.1), superposing the K417T effect that decreases the binding affinity. The threonine substitution at site 417 avoids the salt bridge formation with D30 in hACE2. Consequently, there is expected diminution in the binding affinity for the mutated structures (Winger and Caspari, 2021) . As previously elucidated, the binding free energy between RBD and ACE2 reflects the infectivity (Qu et al., 2005) . Previous analysis of the substitutions L452R, F490S, and N501Y presented relatively high binding free energy changes suggesting that these sites may generate more infectious lineages . Our findings corroborates the improved binding pattern found in P.1 models, showing the greater contribution of E484K mutation to the binding free energy reduction and the RBD-hACE complex stabilization in P.1 and P.2 lineages, especially in the presence of a mutated L452V site. Despite L452Q (C.37) suggesting similar properties to L452R (B.1.617.2), increasing viral infectivity (Acevedo et al., 2021) , nothing is known about the valine substitution and its effect on P.2 viruses. However, a significant interaction effect with E484K was observed in this study, also with indication of structural changes in the ACE2 molecule (as for B.1.1.7 model). Interestingly, the escape from neutralizing antibodies may be related to mutants with increased ACE2 binding affinities (Van Egeren et al., 2021) . As shown by our molecular dynamic simulations, the comparison of the combined binding energies for the evaluated lineages indicates a major reduction in the binding free energy of those containing E484K, with only a small difference between the reference and lineages such as C.37 and B.1.1.7. In our data, mutations such as P26S (B.1.1.7, P.1, P.1.1, P.1.2, P.2), T63N (P.2), R78S/T (P.1, P.2), K417T/N (P.1, P.1.1, P.1.2, P.2, B.1.351), A570D/V (B.1.1.7, P.1), L585F/V (P.1, P.2), T747A (P.1, P.2), A846S (P.1, P.2), and V1133F (P.1), among others, in positively selected sites, seem to impact the spike prefusion structure with a destabilizing effect. Except for P26S and K417T, which are found in 58.45% and 49.11% of the spike sequences from the genome set (n=11,078), respectively, the remaining mutations are observed in very low frequencies. The selection of theoretically "destabilizing" or "stabilizing" mutations still remains to be completely elucidated. One possible explanation for fixation of such mutations would be the occurrence of "copy choice" recombinations capable of selecting very quickly for a set of mutations that have coevolved for the aforementioned reasons. The other possibility would be rapid evolution mediated by lineages enhanced with greater replicative fitness, as a consequence of key RBD mutations and/or other allosteric interactions (e.g., D614G). The higher viral turnover would account for a tremendous selective pressure and fast viral selection, provided the viruses find an appropriate environment for transmissions. In this regard, it is interesting to note that both B.1.351 and P.1 appeared to have emerged in a very intense selective pressure scenario. In some Amazonas State cities in Brazil, where the P.1 lineage has emerged, up to 75% exposure were found for previous first-wave lineage. Similarly, B.1.351 has emerged in South Africa in a scenario with up to 30% of the population being previously exposed to other lineages. Taken together this evidence supports the role of consecutive accumulation of mutations rather than recombinations in order to explain the emergence of multiple lineages with specific sets of mutations, characteristic of the VOC viruses. Finally, it is possible that an immunosuppressed host would be critical for selecting these viruses, since the first occurrence of important, but intrinsically "destabilizing" substitution, would prevent further evolution if the virus encounters a robust immune response. In other hand, an immunocompromised patient would allow the progressively mutated virus to evolve, eventually acquiring fitness recovery and/or escape mutants permitting transmission to normal hosts. Although this hypothesis is merely speculative at this time, previous works involving multiple genotyping of a single chronically immunosuppressed patient (Choi et al., 2020; Kemp et al., 2021) suggested that this could be a way of rapidly selecting virus harboring mutations that, on an individual basis, are "destabilizing". The funders had no role in the study design, data generation and analysis Ferrareze: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Visualization, Writing -Original Draft Formal analysis, Investigation, Writing -Original Draft, Writing -Review & Editing. Vinicius B. Franceschi: Investigation, Visualization, Writing -Original Draft, Writing -Review & Editing. Gabriel D. Caldana: Writing -Original Draft, Writing -Review & Editing. Paulo A. Netz: Methodology, Software, Formal analysis, Investigation, Resources, Visualization, Writing -Original Draft Formal analysis, Investigation, Resources, Writing -Original Draft, Writing -Review & Editing, Supervision, Project administration. All authors have read and approved the manuscript GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers Infectivity and immune escape of the new SARS-CoV-2 variant of interest Lambda The new SARS-CoV-2 strain shows a stronger binding affinity to ACE2 due to N501Y mutant Electrostatics of nanosystems: Application to microtubules and the ribosome Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies An Exact Nonparametric Method for Inferring Mosaic Structure in Sequence Triplets Distinct conformational states of SARS-CoV-2 spike protein I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure Computational prediction of the effect of amino acid changes on the binding affinity between SARS-CoV-2 spike protein and the human ACE2 receptor PremPS: Predicting the impact of missense mutations on protein stability Persistence and Evolution of SARS-CoV-2 in an Immunocompromised Host Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems ModelTest-NG: A New and Scalable Tool for the Selection of DNA and Protein Evolutionary Models Increased mortality in community-tested cases of SARS-CoV-2 lineage B.1.1.7 A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations VERIFY3D: assessment of protein models with three-dimensional profiles The double cubic lattice method: Efficient approaches to numerical integration of surface area and volume and to dot surface contouring of molecular assemblies Coronaviruses: An Overview of Their Replication and Pathogenesis E484K as an innovative phylogenetic event for viral evolution: Genomic analysis of the E484K spike mutation in SARS-CoV-2 lineages from Brazil Novel SARS-CoV-2 Variant Derived from Clade 19B Mutation hotspots, geographical and temporal distribution of SARS-CoV-2 lineages in Brazil A coarse-grained elastic network atom contact model and its use in the simulation of protein dynamics and the prediction of the effect of mutations Sister-Scanning: a Monte Carlo procedure for assessing signals in recombinant sequences Bio3d: an R package for the comparative analysis of protein structures Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition The D614G mutations in the SARS-CoV-2 spike protein: Implications for viral infectivity, disease severity and vaccine design Adaptation of SARS-CoV-2 in BALB/c mice for testing vaccine efficacy New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0 Interferon Resistance of Emerging SARS-CoV-2 Variants Structural modeling of the SARS-CoV-2 Spike/human ACE2 complex interface can identify high-affinity variants associated with increased transmissibility SARS-CoV-2 variants, spike mutations and immune escape UFBoot2: Improving the Ultrafast Bootstrap Approximation SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor Phylogenetic evidence for recombination in dengue virus Free Energy Calculations by the Molecular Mechanics Poisson−Boltzmann Surface Area Method Canonical dynamics: Equilibrium phase-space distributions VMD: Visual molecular dynamics COG-UK) consortium, 2021. Recombinant SARS-CoV-2 genomes involving lineage B.1.1.7 in the Functional importance of the D614G mutation in the SARS-CoV-2 spike protein Comparison of simple potential functions for simulating liquid water MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization SARS-CoV-2 evolution during treatment of chronic infection Type I and III interferon responses in SARS-CoV-2 infection. Experimental & Not So Different After All: A Comparison of Methods for Detecting Amino Acid Sites Under Selection g_mmpbsa-A GROMACS Tool for High-Throughput MM-PBSA Calculations MAESTROweb: a web server for structure-based protein stability prediction PROCHECK: a program to check the stereochemical quality of protein structures Accessory proteins of SARS-CoV and other coronaviruses RDP: detection of recombination amongst aligned sequences RDP4: Detection and analysis of recombination patterns in virus genomes A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints The emergence and ongoing convergent evolution of the N501Y lineages coincides with a major global shift in the SARS-CoV-2 selective landscape Recurrent emergence of SARS-CoV-2 spike deletion H69/V70 and its role in the Alpha variant B.1.1.7 FUBAR: A Fast, Unconstrained Bayesian AppRoximation for Inferring Selection Molecular dynamic simulation reveals E484K mutation enhances spike RBD-ACE2 affinity and the combination of E484K, K417N and N501Y mutations (501Y.V2 variant) induces conformational change greater than N501Y mutant alone, potentially resulting in an escape mutant IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies A unified formulation of the constant temperature molecular dynamics methods Possible Emergence of New Geminiviruses by Frequent Recombination Polymorphic transitions in single crystals: A new molecular dynamics method Phylogenetic analysis of SARS-CoV-2 in the first few months since its emergence Spike mutation D614G alters SARS-CoV-2 fitness An Evolutionary-Network Model Reveals Stratified Interactions in the V3 Loop of the HIV-1 Envelope Evaluation of methods for detecting recombination from DNA sequences: Computer simulations Prediction and Evolution of the Molecular Fitness of SARS-CoV-2 Variants: Introducing SpikePro Identification of Two Critical Amino Acid Residues of the Severe Acute Respiratory Syndrome Coronavirus Spike Protein for Its Variation in Zoonotic Tropism Transition via a Double Substitution Strategy Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations The N501Y mutation in SARS-CoV-2 spike leads to morbidity in obese and aged mice and is neutralized by convalescent and post-vaccination human sera DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability The FoldX web server: an online force field Analyzing the mosaic structure of genes Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding Stability-activity tradeoffs constrain the adaptive evolution of RubisCO Emergence of a new SARS-CoV-2 variant in the UK Detection of a SARS-CoV-2 variant of concern in South Africa Risk of rapid evolutionary escape from biomedical interventions targeting SARS-CoV-2 spike protein Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Vaccine-escape and fast-growing mutations in the United Kingdom, the United States Emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States Comparative Protein Structure Modeling Using MODELLER Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants D614G Spike Mutation Increases SARS CoV-2 Susceptibility to Neutralization The Spike of Concern-The Novel Variants of SARS-CoV-2 A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV We thank Amanda M. Mayer for suggestions in the introduction section. We also thank the administrators of the GISAID database and research groups across the world for supporting the rapid and transparent sharing of genomic data during the COVID-19 pandemic. Full tables acknowledging the authors and corresponding labs submitting sequencing data used in this study can be found in Supplementary File S1. Additional information used and/or analysed during the current study are available from the corresponding author on reasonable request. The authors declare no competing interests.