key: cord-305742-wf6qxplf authors: Gomez, Santiago A.; Rojas-Valencia, Natalia; Gomez, Sara; Egidi, Franco; Cappelli, Chiara; Restrepo, Albeiro title: Binding of SARS–CoV–2 to cell receptors: a tale of molecular evolution date: 2020-09-28 journal: Chembiochem DOI: 10.1002/cbic.202000618 sha: doc_id: 305742 cord_uid: wf6qxplf The magnified infectious power of the SARS–CoV–2 virus compared to its precursor SARS–CoV is intimately linked to an enhanced ability in the mutated virus to find available hydrogen bond sites in the host cells. This characteristic is acquired during virus evolution because of the selective pressure exerted at the molecular level. We pinpoint the specific residue (in the virus) to residue (in the cell) contacts during the initial recognition and binding and show that the virus· · · cell interaction is mainly due to an extensive network of hydrogen bonds and to a large surface of non–covalent interactions. In addition to the formal quantum characterization of bonding interactions, computation of absorption spectra for the specific virus· · · cell interacting residues yields significant shifts of ∆λ max = 47 and 66 nm in the wavelength for maximum absorption in the complex with respect to the isolated host and virus, respectively. At the time of the writing of this manuscript, the situation regarding the global pandemic produced by the spread of the SARS-CoV-2 virus (over 17.7 million confirmed cases, over 675000 deaths, with no end in sight), with dire consequences in all aspects of life, from social interactions, to the overwhelming of health and economic systems, is changing fast. Because this is a critical problem, just as the rate of virus transmission on the early stages of dissemination, the number of scientific papers on the subject (mostly preprints) increases exponentially. SARS-CoV-2 is an enveloped virus of the Coronaviridae family with a single-stranded RNA genome. [1] Figure 1 highlights the most important structural features of the virus: besides the nucleocapsid (N) proteins, the only proteins in direct contact with the genetic material (the N-RNA core is embedded in a lipid environment), there are membrane (M), envelope (E), and spike (S) proteins. It is the spike proteins which lead to the now familiar external morphology of the virus, but more importantly, S proteins are responsible for the interactions with receptors in the host membrane (epithelial cells in humans). These S· · · Receptors contacts initiate the infectious cycle of the virus. [2] Each spike consists of a trimer of S proteins. Individual S proteins have been divided into two clear S1, S2 sections, [3] with S1 containing the N-terminal domain (NTD), and the receptor binding domain (RBD), the domain ultimately responsible for the interactions with the coupling factors present in cell membranes. [4] Coupling factors include a variety of proteins, carbohydrates, or other types of biomolecules expressed on the surface of the cell membrane and in charge of signaling and transport, among other functions. Viruses take advantage of these molecules during the infection process. It seems well established that initial virus↔host recognition and binding is driven by S1, and that further changes in the conformation of the S2 section mediate the viral envelope fusion to the host cell membrane. We also show a cell with internal organelles and with a few enzymes that act as a virus receptors. 4 10.1002/cbic.202000618 Accepted Manuscript ChemBioChem This article is protected by copyright. All rights reserved. The most commonly invoked culprit (with plenty of experimental evidence [5] ) for the reception of SARS-CoV-2 is the Angiotensin Converting Enzyme 2 (ACE2). This receptor is the subject of intensive studies aiming at finding effective therapies, and is the central focus of the ongoing race to find a vaccine. In this work, we are interested in two crucial aspects of the initial virus· · · cell interaction problem: to pinpoint the specific residue to residue binding sites between the structurally known spike proteins of the virus [6] and the structurally known ACE2 receptor in cell membranes, [5] and to understand, from a fundamental, quantum perspective, the molecular factors driving the virus· · · cell binding. We expect this knowledge to considerably better our understanding of the problem and to hopefully contribute to a rational design of drugs and vaccines to fight the virus. See the Computational Methods section for details of our calculations. Our data shows that the RBD(S)· · · ACE2 complex reached well defined persistent equilibrium states long before the 600 ns of the molecular dynamics (MD) simulation time are consumed in each of the three replicas. This stability is especially encouraging in the interaction region as clearly shown in the highlighted areas of the bottom panels in Figure 2 . We obtained an interaction energy ∆G int = −419.91 kcal/mol, which is in excellent agreement with calculations reported in closely related systems. [7] 2.1 Virus· · · cell contacts In all cases, only hydrogen bonds (HBs) were found as responsible for explicit virus· · · cell pair-wise interactions. Naturally, this does not mean that other weak, long range cumulative for the later steps of the simulations. Table 1 lists all individual binary contacts in the form of hydrogen bonds between residues in the RBD(S)· · · Receptor complex found in our MD simulations with an arbitrary threshold average of 15% occupancy on the triplicate runs. Notice that this procedure intends to extract a representative sample from the simulations, thus, there are considerably more contacts not explicitly shown because they have lower occupancies or, because while having high occupancies, are not persistent in the 3 replicas. These HB contacts, whose bonding interactions are dissected below, are responsible for the attachment of the virus to epithelial cells in humans, initiating the infection process. Number of fragment to fragment hydrogen bonds in the RBD(S)· · · ACE2 complex averaged over three independent replicas as the MD progresses. A summary of the quantum descriptors for the virus· · · cell interactions is listed in Table 1 , the corresponding pictures are shown in Figures 4, 5. Without exception, despite being weak organic acids, residue to residue hydrogen bonds are stronger than the archetypal HB in the water dimer, this is seen in the larger binding energies, smaller distances, orbital interaction energies, E d→a of comparable magnitudes, larger bond indices, and in the properties of the bond critical points. The LYS417→ASP30 is an exceptional case because it corresponds to Table 1 : Properties of virus· · · cell hydrogen bonds. Persistent hydrogen bonds in the RBD(S)· · · Receptor complex exceeding the 15% occupancy threshold during the entire 600 ns MD simulation of each replica. The arrows state the directionality of the donor → acceptor interaction in the corresponding hydrogen bond according to the classical electrostatic X δ− − H δ+ → Y δ− description. All occupancies averaged over the three MD replicas. WBI are the Wiberg bond indices. [8] The archetypal hydrogen bond in the water dimer is included for comparison purposes. See the specialized literature [9] [10] [11] [12] [13] [14] for the formalism on how NBO and QTAIM descriptors are related to bonding. bridge in a previous work. [15] All hydrogen bonds are well characterized long range interactions. This is clearly seen in (i) There is never a formal σ orbital between the fragments, on the contrary, orbital In other words, in the NBO picture ( Figure 4 ), all explicit virus· · · cell contacts are stabilized by charge transfer from one lone pair in an oxygen atom in the donor residue to an antibonding orbital in the acceptor residue. (ii) All properties of the bond critical points support the same picture: small ρ (r c ), small bond orders, virial ratios smaller than 1, and positive bond degree parameters. Again, is the exception, with all the calculated descriptors indicating a highly ionic contact. The non covalent interactions (NCI) calculation for the interaction region uncovers a large discontinuous non-covalent wall separating RBD(S) from the receptor. Therefore, we characterize the virus· · · cell binding as due to a large number of non-covalent contacts between the two proteins, enhanced by the water molecules, acting in conjunction with the specific residue to residue hydrogen bonds. the ACE2 receptor (in blue). The snapshot was randomly extracted from the late stages of one of the three MD replicas. [16] All persistent virus· · · cell hydrogen bonds listed in Table 1 are explicitly highlighted on the right frames. The non covalent [17, 18] virus· · · cell interaction surface is explicitly shown in green, including the water molecules. Notice that both fragments contain glycosylated glycoproteins and that all fine structure is accounted for during the calculations, however, the glycans are not shown in this picture for clarity. Bottom: NBO donor→acceptor interactions responsible for the persistent hydrogen bonds. We concentrate on simulating the spectra of the aminoacids involved to investigate whether measurable changes in their spectral response occur upon binding, while residues that are not involved in the interaction can safely be viewed as changing very little as the two structures connect. For this purpose we propose a QM/MM approach where the aminoacid pairs involved in the binding are treated quantum mechanically by means of Density Functional Theory (DFT), while the rest of the protein environment is modelled classically, through the use of the AMBER force field. [19] In this way, the electronic structure of the QM portion is influenced by its environment by means of an Electrostatic Embedding paradigm, [20] where fixed charges are assigned to the MM atoms and directly affect the QM density and computed electronic excitations. Figure 5 shows that in all six cases, binding events set off drastic changes in the spectral response of the system, explicitly seen in red-shifted absorptions. The red-shift of the absorption bands is clearly visible in the convoluted spectrum and therefore provides unequivocal evidence of virus· · · cell bonding. The results listed here are quite encouraging and constitute an initial step that will hopefully motivate the design of experimental protocols to detect virus infection. However, it is clear that a number of details need to be worked out before practical applications can be devised. In particular, the potential interference of signals arising from functional groups in the same region of λ max , and the ability of the dimer model to accurately mimic physiological environments, should be addressed. the most serious problems faced when developing vaccines and therapies, and is particularly true for SARS-CoV-2. [21] We argue that this view of evolution as driven by environment induced molecular responses at the virus/biomolecules scales helps explain many aspects of evolution that are difficult to rationalize otherwise, namely, (i) in most cases evolution is a highly localized process (ii) because of the large number of mutation possibilities, which occur no matter how small the individual probabilities, an increase in entropy of the universe is the ultimate factor driving evolution (iii) evolution is a deterministic process driven by cumulative random changes (iv) in that sense, life itself is a deterministic process that only requires a large increase in the entropy of the universe (in other words, a long time), such that it will emerge in local environments capable of sustaining it. See for example early arguments by Schrodinger [22] stating the the apparent macroscopic stability is due to the microscopic chaos resulting from random events, and invoking a net entropy gain by the universe due to the continual energy transformation despite the heavy entropy investment in maintaining highly organized living organisms. More recently, England has discussed the statistical physics of self-replication. [23] In the context of this work, the previous discussion of molecular and virus evolution leads us to hypothesize that one of the key factors in the molecular evolution problem faced by the precursor SARS-CoV virus on its way to mutating into SARS-CoV-2 was solved by favoring those changes in RBD(S) that lead to an improved ability to locate available sites for hydrogen bonding in the host cell, ability that is further enhanced by the slightly basic pH ≈ 7.4 found in physiological environments. This improved hydrogen bonding capabilities may be achieved in a number of ways, for example, incorporating aminoacids with more acidic protons, or incorporating larger aminoacids whose hydrogen bonding regions are simply closer to the receptor, among others. We support the need for improved hydrogen bonding as the selective pressure in virus mutation hypothesis in the following evidence: 1. Table 1 shows that fragment to fragment contacts are all in the form of hydrogen bonds. For the specific case of the SARS-CoV-2 virus, in five of the six identified contacts, including the most persistent HBs, the residues in RBD(S) act as donors to the corresponding hydrogen bond 2. Besides a small sheet and a small helix (Figure 4 ), there is no secondary structure in RBD(S), thus, the receptor binding domain of the spike protein has a high structural flexibility which allows the virus to probe for available hydrogen bonding sites in the receptor, which in contrast has well defined secondary and tertiary structures in the interaction region 3. We obtained from the GenBank the sequences of aminoacids for the precursor SARS-CoV (ID AFR58742.1) and for the mutated SARS-CoV-2 (ID QHD43416.1) viruses. [24] We compare below only the 196 aminoacids in the RBD(S) and highlight in red the receptor binding motif (RBM). We also underline aminoacid substitution in the mu- proteins. [25] [26] [27] [28] [29] Here, we take a pragmatic approach to determine relative acidities between the precursor SARS-CoV and the mutated SARS-CoV2 viruses. We took averages of the experimental isoelectric points [30] (IEP) for the 49 aminoacids involved in the mutation, that is, we calculated the average IPE in the replaced aminoacids in RBD(S) and found 6.50 and 6.44 for the precursor and for the mutated viruses, respectively. We also calculated the same averages for the interaction motifs only and obtained 6.44 and 5.98 over the 34 mutations. Thus the mutated virus is collectively considerably more acidic in the interaction region, which improves its ability to donate protons to hydrogen bonds The most important contributions from this work may be summarized as follows: 1. We pinpoint the specific residue (in the virus) to residue (in the cell) interactions during the initial virus· · · cell binding 2. We characterize the virus· · · cell molecular attachment as the result of a large number The starting point of our calculations was the complex between the receptor binding domain of the S protein, RBD(S), and the ACE2 receptor. Cartesian coordinates for the RBD(S)· · · ACE2 complex were taken from the protein data bank (PDB ID 6LZG [31] ) and then treated with CHARMM-GUI, the graphical user interface of CHARMM [32, 33] to include missing hydrogen atoms at pH = 7.4, to ensure that all glycans are included, and to construct the force field. The entire system was enclosed by a truncated octahedral box such that the smallest atom· · · wall distance was set to 9Å, then the available volume in the box was filled with TIP3P [34] water molecules. NaCl molecules were added until a physiological 0.154 M concentration was attained and, finally, counterions were added to restore charge neutrality. This procedure lead to a system comprising a total of 171161 atoms, with 52735 water molecules, 9585 atoms in the receptor, and 3034 atoms in RBD(S). The system was subjected to a steepest descent energy minimization in order to correct for potential inconsistencies in atom coordinates that may arise during the procedure of randomly filling the available space with water, NaCl, and counterions. Once minimized, we ran triplicate all-atom MD simulations under the conditions summarized in Table 2 and described next. First, there were three equilibration steps lasting a total of 0.625 nanoseconds (ns) with 1 femtosecond (fs) time intervals, during which the structural constraints were progressively relaxed until finally being totally lifted. These structural constraints were imposed by harmonic constants that prevent deformation of the backbone (k bb ), side chain (k sc ), and dihedrals (k d ). Then, the system underwent a production step lasting 600 ns with time intervals of 2 fs. For all MD runs, the Lennard-Jones potential was softened starting at 0.8 nm until eventually vanishing at 1.0 nm. Also, the cutoff radius for electrostatic interactions was set to 1.0 nm. All these simulations were conducted using the CHARMM36m force field [35, 36] as implemented in GROMACS 2019.4 [37] at 310.15 K and 1 bar. The RBD(S)· · · ACE2 interaction energy (∆G int ) was estimated via the MM/PBSA [38] as implemented in GROMACS. [37] Dielectric constants were set to solute = 1, solvent = 80 at the simulation temperature. In short, ∆G int = ∆E virus···cell + ∆G p + ∆G np − T ∆S. Here, ∆E virus···cell is the gas phase energy of the RBD(S)· · · ACE2 complex, ∆G p , ∆G np are the solvation energies due to the polar and non-polar interactions, respectively, and T ∆S is the entropy contribution. More precisely, ∆G p was computed under the Poisson-Boltzmann model, ∆G np was estimated using the solvent accessible surface area, and the entropy term was obtained from the model of Duan and coworkers. [39] To finally estimate ∆G int , we took 300 points in the 140-170 ns interval of one of the MD replicas. It has been recently shown [16] that randomly chosen configurations from late stages of MD simulations are adequate sources to obtain deep insight into interfragment bonding. Accordingly, aiming at understanding the fundamental forces driving the attachment of RBD(S) to host cells, virus· · · cell bonding interactions were dissected following these steps: 1. Persistent residue (in the virus) to residue (in the host cell) contacts during the 600 ns of the MD simulations were identified using the VMD program [40] with a cutoff radius of 3.5Å 2. One frame was randomly chosen from the late stages of one MD run 3. We extracted all extended interacting pairs in the chosen frame, kept them in the configurations they had in the interacting system (this is more accurate to understand the virus· · · cell bonding interactions than reoptimizing the isolated pairs), and (a) Computed accurate interaction energies using highly correlated domain based local pair-natural orbital coupled-cluster (DLPNO-CCSD(T)/aug-cc-pVDZ) single point energy calculations [41, 42] on the dimers and in the monomers. The ORCA suite of programs, version 4.0.1.2, was used to this end [43] (b) Dissected the intermolecular interactions using the tools provided by the natural bond orbitals (NBO [12] [13] [14] as implemented in NBO7.0 [44] ) and by the quantum theory of atoms in molecules (QTAIM [9] [10] [11] as implemented in AIMall [45] ) (c) Calculated QM/MM absorption spectra for the monomers and for the dimers. All TD-DFT calculations were carried out using the B3LYP/aug-cc-pVDZ model chemistry [46] [47] [48] [49] (tests using the dispersion corrected B3LYP-D3, ωB97xD, functionals yielded essentially identical results). QM/MM Electrostatic Embedding was exploited, [20] in which only the extended dimers were considered as the quantum region, and the rest of the system as the MM region, which was modelled by the Amber force field [19] and by assigning to atom types the same charges used in the MD runs. A large number of excited states are needed to guarantee that both the intensities and shapes of the absorption spectra are accurately reproduced. Therefore, in this work the first 20 excited states were computed at the TD-DFT/QM-MM level in each case. Vertical excitations were shifted by -0.7 eV to account for the systematic error due to the choice of functional. This value was chosen in order to match the experimental absorption maximum for Tyrosine. [50, 51] All Spectra were then convoluted with Gaussian lineshapes with full width half maximum (FWHM) of 0.6 eV. All QM/MM calculations were carried out with Gaussian16 [52] 4. We isolated the interaction region by including everything within a 3.0Å radius from the last atom at the end of each aminoacid (1325 atoms in total) and calculated the interfragment non covalent interaction (NCI as implemented in NCIPLOT [53] ) surface using the promolecular densities approximation. [17, 18] Atoms in Molecules: A Quantum Theory Discovering Chemistry with Natural Bond Orbitals Proceedings of the National Academy of Sciences 2020 What is Life? The Physical Aspect of the Living Cell Electrophoresis CRC Handbook of Chemistry and Physics WIREs Computational Molecular Science AIMALL (version 19.10.12) Gaussian 16 Revision B.01 Internal support from Universidad de Antioquia via "Estrategia para la sostenibilidad" is acknowledged. Partial funding for this project from H2020-MSCA-ITN-2017 European Training Network "Computational Spectroscopy In Natural sciences and Engineering" (CO-SINE), grant number 765739 is also acknowledged. N.R. thanks Colciencias for her doctoral scholarship. Cartesian coordinates for all dimer pairs listed in Table 1 and for the fragment taken for the NCI surfaces. A figure with the minimum interacting distances used to determine the interaction regions is included as well. A video of one of the trajectories is also provided.