key: cord-0713126-xsiytyhh authors: Williams, Jonathan K.; Wang, Baifan; Sam, Andrew; Hoop, Cody L.; Case, David A.; Baum, Jean title: Molecular dynamics analysis of a flexible loop at the binding interface of the SARS‐CoV‐2 spike protein receptor‐binding domain date: 2021-08-23 journal: Proteins DOI: 10.1002/prot.26208 sha: 492e7a9a8a8abe5e5ecdded7d0f7cc00b70b05b5 doc_id: 713126 cord_uid: xsiytyhh Since the identification of the SARS‐CoV‐2 virus as the causative agent of the current COVID‐19 pandemic, considerable effort has been spent characterizing the interaction between the Spike protein receptor‐binding domain (RBD) and the human angiotensin converting enzyme 2 (ACE2) receptor. This has provided a detailed picture of the end point structure of the RBD‐ACE2 binding event, but what remains to be elucidated is the conformation and dynamics of the RBD prior to its interaction with ACE2. In this work, we utilize molecular dynamics simulations to probe the flexibility and conformational ensemble of the unbound state of the receptor‐binding domain from SARS‐CoV‐2 and SARS‐CoV. We have found that the unbound RBD has a localized region of dynamic flexibility in Loop 3 and that mutations identified during the COVID‐19 pandemic in Loop 3 do not affect this flexibility. We use a loop‐modeling protocol to generate and simulate novel conformations of the CoV2‐RBD Loop 3 region that sample conformational space beyond the ACE2 bound crystal structure. This has allowed for the identification of interesting substates of the unbound RBD that are lower energy than the ACE2‐bound conformation, and that block key residues along the ACE2 binding interface. These novel unbound substates may represent new targets for therapeutic design. between the viral Spike RBD and ACE2, the nature of these residuelevel interactions, and the overall strength of the interaction. Both the CoV1-RBD and CoV2-RBD binding sites for ACE2 adopt a similar interface (Figure 1 ), consisting of long, unstructured stretches of 14 residues, which form a range of stabilizing hydrogen bonds, van der Waals contacts, and salt bridges with ACE2. [4] [5] [6] The RBD binding interface in general contains four loops (Figure 2A ) that have the potential to be dynamic and flexible, both in an unbound state as well F I G U R E 1 X-ray crystal structures of CoV1 and CoV2 receptor binding domains (RBD) used in MD simulations. X-ray crystal structures of the SARS-CoV RBD in complex with a neutralizing antibody (PDB ID: 2dd8) and the SARS-CoV2 RBD in complex with the ACE2 receptor (PDB ID: 6m0j). The RBDs from these structures are used as starting structures in this work. The RBD is shown in color and the binding partner is in gray. The loop regions are in blue, and the secondary structure elements are in purple, highlighting the large degree of unstructured regions in the RBD. Enlarged inset of the CoV2 RBD-ACE2 binding interface shown on the right. Residue sidechains on the RBD (blue) and ACE2 (gray) that participate in the binding interaction are shown in stick configuration 5, 7, 8 In the case of CoV2, three residues within the flexible Loop 3 of the RBD (F486, N487, and Y489) were identified to participate in stabilizing interactions with ACE2. 4 Interestingly, several antibodies developed to target the RBD have also been found to bind to the flexible Loop 3 ( Figure S1A ). When antibodies interact with Loop 3, the distribution of Loop 3 conformations is greater than when Loop 3 is not part of the binding interface ( Figure S1B ). This suggests that Loop 3 has an inherent conformational flexibility that is not observed from static structures of the RBD-ACE2 complex. The role that protein dynamics play in mediating protein-protein binding is not only of great importance to understanding the basic mechanisms of binding, but also plays a crucial role in the design of protein binding therapeutics. 9, 10 Although much progress has been made in understanding the interaction between the Spike RBD and ACE2, what remains to be elucidated is the flexibility and conformational dynamics of the RBD in an unbound state. The internal motions of proteins play a key role in their interactions and functionality, a fact that is often lost in static structures derived from electron microscopy and X-ray crystallography. Understanding the conformational ensemble of RBD states without a binding partner may reveal novel targets not observed in static structures of the RBDs, which will aid in the design of therapeutics targeting this important binding domain. In this work, we utilize MD simulations to probe the flexibility and conformational sampling of the SARS-CoV and SARS-CoV-2 RBDs in an unbound state. We focus on the Loop 3 region of the RBD, which contains several residues that participate in stabilizing interactions with ACE2 and is a hot-spot of several common single amino acid mutations that have been identified during the ongoing COVID-19 pandemic. We find that Loop 3 represents a localized area of dynamic flexibility in an unbound state, and our simulations suggest that this flexibility is resilient to perturbation by mutations. Finally, using loop-modeling to probe novel conformations of the Loop 3 region, we have identified interesting substates of the unbound RBD that block the binding interface and are lower energy than the conformation of the RBD bound to ACE2, and thus may represent enticing targets for therapeutic intervention. In the context of the full trimeric Spike protein complex, the unbound binding-competent state of the RBD corresponds to Spike protein structures with RBDs in an "up" configuration (e.g., PDB ID 6vsb). In this "up" state, the RBD is standing separate and free from the other domains of the protein complex, and in particular the ACE2-binding interface of the RBD is entirely solvent exposed. As such, multi-microsecond MD simulations of a single RBD in solution were recorded in order to explore the conformational flexibility and dynamics of the unbound wild-type spike protein RBDs from CoV1 and CoV2. The initial coordinates used in the MD trajectories were taken from the high-resolution structures determined by X-ray diffraction of CoV1-RBD in complex with a neutralizing antibody (PDB: 2dd8) and of CoV2-RBD in complex with the human ACE2 receptor (PDB: 6m0j; Figure 1 ). Glycosylation of the RBD was not included since the two glycosylation sites that have been identified are at the very N-terminal region of the RBD (N331 and N343), where they are far from the residues that make up the ACE2 binding domain. 11 While it contains a well ordered β-sheet core, much of the RBD is unstructured ( Figure 1 It is important to consider how mutations perturb the conformation and dynamics of the RBD, especially during an ongoing pandemic as mutations continue to accumulate. 13, 14 Continued identification and evaluation of mutants is crucial in order to better understand the evolving nature of the pandemic, and to ensure that the treatments and vaccines whose primary target is the Spike protein continue to be effective. As To better examine the conformational states of the RBD binding interface that were sampled during the MD simulations, and to identify binding and nonbinding conformational states, we performed a cluster analysis on each of the wild-type and mutant simulations using a hierarchical agglomerative (heiragglo) algorithm. 19 Using an epsilon (ε) cutoff of 1.9 Å, the 22 500 conformations of the CoV1-RBD and CoV2-RBD were separated into 51 and 14 clusters respectively, whereas the CoV2-RBD mutants clustered into fewer groups (G476S: four clusters, S477N: four clusters, T478I: two clusters, and V483A: seven clusters). The average RMSD of the residues in the RBD binding interface with respect to the starting crystal structure was then calculated for each cluster. Clusters with low RMSD then represent conformations that are very similar to the crystal structures of RBD bound to ACE2, while clusters with large RMSD correspond to conformations that are very different from these receptor bound states. Figure 2B ). However, it is interesting to note that the loops tend to have higher predicted disorder propensity than the rest of the RBD ( Figure 2B ), a propensity that may be evolutionarily conserved. 20 In addition, the missing residues of these loop regions in many cryo-EM structures of the spike protein suggest that these loops may be dynamic and sample a conformational ensemble distinct from the bound state. Figure S3 with Figure 3A ). This indicates that the bulk of the RBD structure is resilient to change even in the presence of large conformational flexibility of Loop 3. In addition, some of these loop models represent conformations that are more energetically favorable for the unbound RBD in solution (Table 1) face with ACE2. 13 The binding free energy between the RBD and ACE2 was also found to be consistent between the wild-type and the mutants with the exception of T478I, which had a binding free energy 6 kcal/mol higher than the wild-type. These results are consistent with experimental data from deep mutational scanning and flow cytometry, which found that all naturally occurring mutants have a similar degree of expression and a similar binding affinity for ACE2 as in the wild-type. 14 Our own analysis of the SARS-CoV-2 proteome evolution 16 suggesting a therapeutic designed to target this region may be broadly applicable to RBDs with mutations in this region. In addition, we have identified unique conformations of the unbound CoV2-RBD in solution that naturally block the binding interface with ACE2 and may be interesting targets for drug design to interfere with RBD-ACE2 binding. We hope that these results will help to catalyze future identification of therapies relevant to CoV2 or to future coronaviruses that may emerge. The structures used to model the wild-type RBDs were taken from the Coronavirus Structural Taskforce (https://github.com/thorn-lab/ coronavirus_structural_task_force), which further refined the highresolution structures determined by X-ray diffraction of CoV1-RBD in complex with a neutralizing antibody (PDB: 2dd8) and of CoV2-RBD in complex with the human ACE2 receptor (PDB: 6m0j). In order to isolate the RBD for subsequent MD simulations, the protein modeling platform PyRosetta 18 was employed to remove the ACE2 receptor residues and RBD glycans from the model, leaving only the clean RBD residues. The four selected mutants (G476S, S477N, T478I, and V483A) were then generated from the clean wild-type RBD structure by creating a decoy of the wild-type structure in PyRosetta and restricting for the selected mutation. These mutant decoys were then relaxed based on the ref2015_cst score function within PyRosetta. 28 One-hundred energy minimized decoys for each mutant were generated in this protocol, and the lowest energy decoy for each mutant RBD was selected as the starting structure for MD simulation. Loop 3 variant structures of the wild-type RBD were generated using the lowest energy decoy of the wild-type RBD, using the same proto- C480 and C488 were selected to output a decoy, and this protocol was run until 100 decoys had been generated. All 100 decoys were then relaxed based on the ref2015_cst score function using PyRosetta. 28 Once again, 100 energy minimized structures for each initial loop decoy were generated and five loop structures were chosen at random for MD simulations. The lowest energy decoy of each of the five loop structures was used for MD simulation. All of the water molecules in the initial X-ray structure were removed. Each protein was immersed in a truncated octahedral box of OPC water molecules 29 with the box border at least 20 Å away from any atoms of the RBD. Each system was neutralized by adding 2 Cl À counter ions. The protein was treated with the ff19SB force field. 30 The simulations were performed with the GPU-enabled CUDA version of the pmemd module in the AMBER 2018 package. 31 Prior to MD simulation, the systems were subjected to energy minimizations and equilibration. The minimization started with 1000 steps of steepest descent minimization followed by 1000 steps of conjugate gradient minimization. The system was heated from 0 to 300 K over 100 ps with protein position restraints of 10 kcal/mol A À2 . Then a series of equilibrations (each lasting 10 ns) were performed at constant temperature of 300 K and pressure of 1 atm with protein position restraints that were incrementally released (10.0, 1, 0.1, and 0 kcal/mol A À2 ). Periodic boundary conditions were used, and electrostatic interactions were calculated by the particle mesh Ewald method, 32, 33 with the nonbonded cutoff set to 9 Å. The SHAKE algorithm 34 was applied to bonds involving hydrogen, and a 2 fs integration step was used. Pressure was held constant at 1 atm with a relaxation time of 2.0 ps. The temperature was held at 300 K with Langevin dynamics and a collision frequency of 5.0 ps À1 . The production runs for wild-type CoV1-RBD and wild-type CoV2-RBD are 4 μs, for the CoV2-RBD mutants are 2 μs, and for the CoV1-RBD and CoV2-RBD loop models are 750 ns. All analysis of MD trajectories, including the RMSD, root-meansquare fluctuation (RMSF), hierarchical agglomerative clustering, and the extraction of representative structures from trajectories were performed using CPPTRAJ 35 respectively) using the average-linkage hierarchical agglomerative method. 19 Coordinate RMSD was used as the distance metric. The critical distance ε value was set to 1.9 Å and the sieve value was set to 10. Only the backbone C, CA, and N atoms were used in the clustering. Emerging coronaviruses: genome structure, replication, and pathogenesis Site-specific glycan analysis of the SARS-CoV-2 spike Origin and evolution of pathogenic coronaviruses Computational alanine scanning and structural analysis of the SARS-CoV-2 spike protein/angiotensin-converting enzyme 2 complex Effect of mutation on structure, function and dynamics of receptor binding domain of human SARS-CoV-2 with host cell receptor ACE2: a molecular dynamics simulations study Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus Does SARS-CoV-2 bind to human ACE2 more strongly than does SARS-CoV? Dynamics of the ACE2-SARS-CoV-2/SARS-CoV spike protein interface reveal unique mechanisms Importance of protein dynamics in the structure-based drug discovery of class a G protein-coupled receptors (GPCRs) Protein structure-based drug design: from docking to molecular dynamics Analysis of the SARS-CoV-2 spike protein glycan shield reveals implications for immune recognition PrDOS: prediction of disordered protein regions from amino acid sequence Critical sequence hotspots for binding of novel coronavirus to angiotensin converter enzyme as evaluated by molecular simulations Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding GISAID: global initiative on sharing all influenza data -from vision to reality Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the first six months of the COVID-19 pandemic Evolution of the SARS-CoV-2 Proteome in Three Dimensions (3D) during the First Six Months of the COVID-19 Pandemic-Supplementary tables and models PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta Clustering molecular dynamics trajectories: 1. Characterizing the performance of different clustering algorithms An exploration of the SARS-CoV-2 spike receptor binding domain (RBD) -a complex palette of evolutionary and structural features Flexibility and small pockets at protein-protein interfaces: new insights into druggability Expanding the number of 'Druggable' targets: non-enzymes and protein-protein interactions Cryptic binding sites on proteins: definition, detection, and druggability Exploring the structural origins of cryptic sites on proteins Structural and functional analysis of the D614G SARS-CoV-2 spike protein variant Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity. bioRxiv The Rosetta all-atom energy function for macromolecular modeling and design Building water models: a different approach ff19SB: amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution Particle mesh Ewald: an NÁlog (N) method for Ewald sums in large systems A smooth particle mesh Ewald method Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data Py: an efficient program for end-state free energy calculations Improved generalized born solvent model parameters for protein simulations Generalized born model with a simple, robust molecular volume correction UCSF chimera-a visualization system for exploratory research and analysis The authors declare no conflict of interest.