key: cord-0844850-q41ls15h
authors: Pawnikar, Shristi; Bhattarai, Apurba; Wang, Jinan; Miao, Yinglong
title: Binding Analysis Using Accelerated Molecular Dynamics Simulations and Future Perspectives
date: 2022-01-06
journal: Adv Appl Bioinform Chem
DOI: 10.2147/aabc.s247950
sha: 3e95f3660e370d5b974228863b06c688bcb7e537
doc_id: 844850
cord_uid: q41ls15h

Biomolecular recognition such as binding of small molecules, nucleic acids, peptides and proteins to their target receptors plays key roles in cellular function and has been targeted for therapeutic drug design. Molecular dynamics (MD) is a computational approach to analyze these binding processes at an atomistic level, which provides valuable understandings of the mechanisms of biomolecular recognition. However, the rather slow biomolecular binding events often present challenges for conventional MD (cMD), due to limited simulation timescales (typically over hundreds of nanoseconds to tens of microseconds). In this regard, enhanced sampling methods, particularly accelerated MD (aMD), have proven useful to bridge the gap and enable all-atom simulations of biomolecular binding events. Here, we will review the recent method developments of Gaussian aMD (GaMD), ligand GaMD (LiGaMD) and peptide GaMD (Pep-GaMD), which have greatly expanded our capabilities to simulate biomolecular binding processes. Spontaneous binding of various biomolecules to their receptors has been successfully simulated by GaMD. Microsecond LiGaMD and Pep-GaMD simulations have captured repetitive binding and dissociation of small-molecule ligands and highly flexible peptides, and thus enabled ligand/peptide binding thermodynamics and kinetics calculations. We will also present relevant application studies in simulations of important drug targets and future perspectives for rational computer-aided drug design.

Cellular functions such as gene translation, protein folding, ligand binding and signaling often require conformational transitions of biological macromolecules such as proteins, nucleic acids and lipids into various low-energy states. [1] [2] [3] [4] Free energy landscapes of biomolecules essentially determine their structure, dynamics and functions. 5, 6 It is important to comprehend the dynamic nature of biomolecules. Experimental techniques including nuclear magnetic resonance (NMR), X-ray crystallography and cryo-electron microscopy (cryo-EM) have greatly advanced structural characterizations of the biomolecular binding processes, but these efforts are time-consuming and very expensive. Moreover, these techniques often provide a rather static picture of the biomolecular interactions and conformations, whereas biomolecules are dynamic in vivo to perform the necessary cellular functions. It remains challenging and critical for us to understand the mechanisms, thermodynamics and kinetics of biomolecular recognition.

Molecular dynamics (MD) is a powerful "computational microscope" that helps us to visualize atomistic time evolution of biomolecules. 7 Remarkable advances have been made in computing hardware (eg, Anton supercomputers and GPUs) and algorithm developments over the past several decades for running longer and cheaper MD simulations. [8] [9] [10] [11] [12] With microsecond-timescale conventional MD (cMD) simulations, spontaneous ligand/drug binding was successfully captured to the Src protein kinase, 13 the β 1 -and β 2adrenergic receptors (β 1 AR and β 1 AR) 14 and the M 2 and M 3 muscarinic receptors 15 by the DE Shaw Research Group using the Anton supercomputer, for which the simulation predicted ligand binding poses closely matched those determined in the X-ray crystal structures. Moreover, hundreds-of-nanosecond cMD simulations were combined in Markov State Models to characterize the benzamidine ligand binding to the trypsin model protein by the De Fabritiis group. 16 In addition, microsecond-timescale cMD simulations have been successful to capture peptide binding in several peptide-protein complexes, including proline-rich motif (PRM)-SH3, 17 BAD-BclX L and p53-MDM2. 18 The "folding-upon-binding" mechanism of peptides was investigated using a measles virus nucleoprotein fragment and X domain of its phosphoprotein complex. 19 Anton supercomputer was used to run ~200 μs MD simulations at 400 K elevated temperature. The simulations could capture 70 binding and unbinding events of the peptide fragment to the protein domain. Therefore, cMD has been successful to capture binding of ligands and peptides on the microsecond timescales. However, cMD with limited simulation timescales still lacks the power to study slower biological processes that would occur over longer timescales (eg, milliseconds and beyond). 15 Metadynamics, 20,21 umbrella sampling, 22, 23 adaptive biasing force (ABF) calculations 24, 25 and conformational flooding 26, 27 represent more advanced simulation methods for enhanced sampling of biomolecules developed over the past few decades. Metadynamics, in particular, has enabled investigating ligand binding to receptors in terms of the kinetic rates of binding and unbinding, residence time 28, 29 and binding free energies. [30] [31] [32] [33] The binding rate constant of peptide p53 to MDM2 protein predicted by ~120 μs weighted ensemble simulations was in good agreement with the experimentally calculated values. 34 Enhanced sampling implemented with a combination of long microsecond cMD and Hamiltonian replica exchange method could simulate rare events such as binding and unbinding of the PMI peptide inhibitor-MDM2 oncoprotein fragment complex 35 and the resulting kinetic rates agreed well with experiments. 35 Moreover, infrequent metadynamics could predict peptide association and dissociation rates of the p53-MDM2 peptide-protein complex, whereas, biasexchange metadynamics could calculate binding free energy of the same system with 27 μs simulations. 36 Although these methods have shown remarkable improvements in sampling and capturing rare events that happen over exceedingly long timescales, they often pose a limitation for the requirement of predefined reaction coordinates and collective variables (CVs). 27, 37, 38 To address the above limitations, more enhanced sampling methods that do not require CVs have been developed, 39 including replica exchange MD, 40,41 selfguided MD, [42] [43] [44] [45] and accelerated MD (aMD), 46-55 etc. In particular, aMD adds a boost potential ΔV to smooth the system potential energy surface to sample biomolecular transitions across reduced energy barriers. 55, 56 The aMD simulations have been successful in capturing spontaneous ligand binding to a muscarinic G-protein-coupled receptor (GPCR). 57 However, aMD poses a drawback of large energetic noise during reweighting because of very high boost potential that is normally on the order of tens to hundreds of kcal/mol. 58 Recovering the original free energy landscapes represents a major limitation of aMD simulations. 57, 59, 60 To address this issue, Gaussian aMD (GaMD) 61, 62 has been developed using a harmonic function to construct the boost potential that exhibits a Gaussian distribution. Through cumulant expansion to the second order, accurate reweighting can be achieved from GaMD simulations. 63 Without the need of predefined CVs, GaMD provides "unconstrained" enhanced sampling and can be easier to use for simulations of complex biomolecules than CV-based enhanced sampling methods. GaMD has been also shown to clearly identify distinct low-energy intermediate states during large protein conformational changes, ligand binding and protein-protein interactions. These intermediate states may not be sufficiently sampled in the CV-biasing simulation methods. On the other hand, relatively longer GaMD simulations could be needed in order to achieve sufficient sampling. Nevertheless, hundreds-of-nanosecond to microsecond timescale GaMD simulations have been successfully applied to sample large conformational changes of various biomolecules, including fast-folding proteins, 61,62,64 the CRISPR-Cas9 gene-editing system, 65, 66 GPCRs, 62, [67] [68] [69] [70] [71] protein kinases, 72, 73 ligand binding 61, 62, [67] [68] [69] [70] [71] 74, 75 and protein-protein/membrane/nucleic acid interactions. 66, [76] [77] [78] [79] [80] The principles of GaMD and notably applications in understanding drug binding pathways were reviewed https://doi.org/10.2147/AABC.S247950

Advances and Applications in Bioinformatics and Chemistry 2022:15 2 recently. 74, 81 Here, we will review recent method developments of GaMD, Ligand GaMD (LiGaMD) 82 and Peptide GaMD (Pep-GaMD), 83 which have greatly expanded our capabilities to simulate biomolecular binding processes. More recently, spontaneous binding of various biomolecules to their receptors such as drug binding to membrane proteins like chemokine GPCRs, 71 human angiotensinconverting enzyme 2 (ACE2), 84 and RNA-protein binding 85, 86 has been demonstrated by GaMD simulations. Furthermore, microsecond LiGaMD and Pep-GaMD simulations were able to simulate repetitive binding and dissociation of small-molecule ligands 82 and highly flexible peptides, 83 and thus enabled ligand/peptide binding thermodynamics and kinetics calculations. We will also present relevant application studies in simulations of important drug targets and future perspectives for rational computer-aided drug design.

GaMD is an unconstrained enhanced sampling technique that reduces the system energy barriers by adding a harmonic boost potential to the system potential energy terms. 61 The added boost potential smooths the system potential energy surface and allows biomolecules to sample large conformational space. The method is briefly illustrated here. For a system with N atoms at positions r; r * 1 ; � � � ; r * N n o , when the system potential V r ð Þis lower than a threshold energy E, a boost potential is added as:

where k is the harmonic force constant. The two adjustable parameters E and k are automatically determined based on three enhanced sampling principles. 61 The reference energy needs to be set in the following range:

where V max and V min are the system minimum and maximum potential energies. To ensure that Eqn. (2) is valid, k has to satisfy: k � 1 V max À V min Let us define ;k 0 � 1 V max À V min , then 0<k 0 � 1. The standard deviation of ΔV needs to be small enough (ie, narrow distribution) to ensure proper energetic reweighting: 63 

where V avg and σ V are the average and standard deviation of the system potential energies, σ ΔV is the standard deviation of ΔV with σ 0 as a user-specified upper limit (eg, 10k B T) for proper reweighting. When E is set to the lower bound E=V max , k 0 can be calculated as:

Alternatively, when the threshold energy E is set to its upper bound E ¼ V min þ 1 k , k 0 is set to:

if k 0 is found to be between 0 and 1. Otherwise, k 0 is calculated using Eqn. (3) . Like aMD, GaMD also provides options to add only the total potential boost, only dihedral potential boost, or the dual potential boost. The dual-boost GaMD (GaMD_Dual) simulation generally provides higher acceleration than the other two types of simulations for enhanced sampling. 56 The simulation parameters comprise of the threshold energy values and the effective harmonic force constants for the total and dihedral potential boost, respectively.

For energetic reweighting to calculate the potential of mean force (PMF), the probability distribution along a reaction coordinate is written as p � A ð Þ. Given the boost potential ΔV r ð Þ of each frame, p � A ð Þ can be reweighted to recover the canonical ensemble distribution, p A ð Þ, as:

where M is the number of bins, β ¼ k B T and e βΔV r ð Þ j is the ensemble-averaged Boltzmann factor of ΔV r ð Þ for simulation frames found in the jth bin. The ensemble-averaged reweighting factor can be approximated using cumulant expansion:

where the first two cumulants are given by:

The boost potential obtained from GaMD simulations usually follows near-Gaussian distribution. 74 

expansion to the second order thus provides a good approximation for computing the reweighting factor. 61, 63 The reweighted free energy F A ð Þ ¼ À k B Tlnp A ð Þ is calculated as:

Þ is the modified free energy obtained from GaMD simulation and F c is a constant. A free Python scripts toolkit "PyReweighting" 63 is developed and distributed at http://miao.compbio.ku.edu/ PyReweighting/.

LiGaMD is developed based on GaMD 61 and efficiently sample ligand binding and unbinding by selectively boosting non-bonded interaction potential energy between the ligand-receptor and ligand/environment. A detailed description of the method has been provided in the previous LiGaMD report. 82 A brief summary is provided below. Let us assume a system containing protein P with ligand L and a biological environment E. The system has N atoms with their coordinates r; r 

where K p ð Þ and V r ð Þ are the system kinetic and total potential energies, respectively. Now, the potential energy can be expressed as:

where V P;b , V L;b and represent the bonded potential energies in protein P, ligand L and environment E, respectively. V PP;nb , V LL;nb and V EE;nb are the self non-bonded potential energies in protein P, ligand L and environment E, respectively. V PL;nb , V PE;nb and V LE;nb represent the non-bonded interaction energies between P-L, P-E and L-E, respectively. As we know, in the classical molecular mechanics force fields, 87-90 the non-bonded potential energies are usually calculated as:

Where V elec and V vdW represent the system electrostatic and van der Waals potential energies. Let us assume, while the ligand binds to the receptor, it mainly involves the non-bonded interaction energies,

In accordance with the previous GaMD algorithm, we will add boost potential to the essential ligand nonbonded potential energy in the LiGaMD method:

where k L,nb represents the harmonic constant and E L,nb represents the threshold energy for applying boost potential. These parameters can be based on three enhanced sampling principles as described in GaMD. 61 Next, one can add multiple ligand molecules in the solvent to facilitate ligand binding to proteins in MD simulations. 14, 70 This is based on the fact that the ligandbinding rate constant k on is inversely proportional to the ligand concentration. The higher the ligand concentration, the faster the ligand binds, provided that the ligand concentration is still within its solubility limit. In addition to selectively boosting the bound ligand, another boost potential could thus be applied on the unbound ligand molecules, protein and solvent to facilitate both ligand dissociation and rebinding. The second boost potential is calculated using the total system potential energy other than the nonbonded potential energy of the bound ligand in Eqn. (10) as:

where E D and k D are the corresponding threshold energy for applying the second boost potential and the harmonic constant, respectively. This leads to dual-boost LiGaMD (LiGaMD_Dual) with the total boost poten-

Peptides often undergo large conformational changes during binding to target proteins, being distinct from smallmolecule ligand binding or protein-protein interactions (PPIs). Another algorithm called peptide GaMD or "Pep-GaMD" has been developed to enhance sampling of peptide binding. 83 In Pep-GaMD, we consider a system of peptide L binding to a protein P in a biological https://doi.org/10.2147/AABC.S247950

Advances and Applications in Bioinformatics and Chemistry 2022:15

4 environment E. We decompose the potential energy into similar terms as in Eqn. (10) . Presumably, peptide binding mainly involves in both the bonded and non-bonded interaction energies of the peptide since peptides often undergo large conformational changes during binding to the target proteins. Thus, the essential peptide potential energy is

In Pep-GaMD, we add boost potential selectively to the essential peptide potential energy according to the GaMD algorithm:

where E L is the threshold energy for applying boost potential and k L is the harmonic constant. The Pep-GaMD simulation parameters are derived similarly as in GaMD.

A second boost is added to the interaction between the protein and the environment/solvent to accelerate the peptide binding, along with the selective boost added to the peptide. This boost represents the total system potential energy without the essential peptide potential energy included:

Where V D represents the total system potential energy without the essential peptide potential energy included, E D represents the second boost potential threshold energy and k D represents the harmonic constant. Hence, this contributes to the dual-boost Pep-GaMD as the total boost potential ΔV r

Chemokine receptors are key GPCRs with involvement in many biological functions including immune cell response, growth and development of bodily structure like cardiac tissues and central nervous system, and several cancers. 91, 92 In particular, CXCR4 facilitates the entry of HIV into host immune cells. [93] [94] [95] Plerixafor (PLX) was discovered as the first selective small-molecule inhibitor of CXCR4 (Mozobil ® , AMD3100), however, is suggested to have multiple binding modes to CXCR4. [96] [97] [98] Here, GaMD simulations were performed to study the pathway and binding mechanism of the PLX drug to the CXCR4 receptor. 71 The X-ray structure of CXCR4 (PDB: 3ODU) with the highest resolution (2.5 Å) was used to build the simulation system. The conformation of PLX with positively charged N1, N3, N4 and N7 of the bicyclam rings at physiological pH was also added to the system. GaMD simulations were performed for ligand in both unbound and bound state with ten ligand molecules >15 Å away from CXCR4 ( Figure 1A) , and with the lowest energy bound conformation of PLX obtained from rigid-body docking using Autodock. 99 GaMD_Dual_NB boost scheme was used in which system nonbonded and dihedral energy terms were boosted. GaMD_Dual_NB was applied to five independent 800-1000 ns GaMD production runs to capture PLX drug binding to CXCR4 from an unbound state.

The PLX bound simulations showed stable binding of PLX in the CXCR4 orthosteric site. The simulations were used for hierarchical agglomerative clustering to obtain five clusters; the top-ranked cluster was selected as a reference, termed the "bound state". Spontaneous binding of PLX was captured to the orthosteric site of the receptor during one of the five replicas. Complete binding of PLX was observed at ~480 ns with minimum RMSD of 2.76 Å relative to the X-ray structure of CXCR4 (PDB: 3ODU). The positively charged PLX formed stable salt bridge interactions with residues Asp97 2.63 , Asp262 6.58 and Glu288 7.39 occupying both major and minor sub-pockets of the receptor ( Figure 1B ). Other simulations revealed important drug-receptor interactions significant in the recognition and binding of PLX to the CXCR4 receptor.

Free energy profiles of PLX binding to the CXCR4 receptor were calculated using the PLX RMSD and the distance between the PLX and the receptor as reaction coordinates of all five GaMD_Dual_NB simulations combined. The 2D PMF profile helped identify four low-energy conformational states of PLX including "Bound (B)", "Intermediate 1 (I1)", "Intermediate 2 (I2)" and "Unbound (U)". In the I1 and I2 state, polar residues in the ECL2-TM5-TM6 region of the CXCR4, namely, D187 ECL2 , D193 5.32 and D262 6.58 formed favourable interactions with the nitrogen atoms of PLX which are positively charged ( Figure 1C This study reveals a novel allosteric site that is expected to facilitate drug design of CXCR4.

Human ACE2 receptor has been of critical importance in biology and medicine. It is highly expressed in different organ epithelial cells such as heart, kidney, pancreas and lungs. Human ACE2 receptor is an excellent therapeutic (C) Intermediate I1 state with interacting residues highlighted in green sticks, including D187 ECL2 and D262 6.58 that formed ionic interactions with PLX atoms N4 and N3, respectively; (D) Intermediate I2 state with interacting residues highlighted in green sticks, including D187 ECL2 , D193 5.32 and D262 6.58 that formed salt bridges with positively charged N7, N4 and N1 of PLX, respectively. target to severe acute respiratory syndrome coronavirus (SARS-Cov) as it serves as receptor for both SARS-CoV and SARS-CoV-2. It is critically important to study drug binding mechanism to this receptor, which remains elusive till today, for the development of effective treatment against SARS-CoV-2.

In order to study the molecular and structural basic of ligand binding to the human ACE2, LiGaMD simulations were performed using the X-ray structure of the inhibitor MLN-4760-bound ACE2 as model (PDB: 1R4L). 84 The AMBER ff19SB force field and GAFF-2 were used to model the protein and ligand, respectively. In addition to the MLN-4760 in the X-ray structure, 9 MLN-4760 molecules were added to the solvent to build the initial simulation systems. 64 ns LiGaMD equilibration followed by ten independent 700-2000 ns LiGaMD productions were performed. Dissociation of MLN-4760 from the ACE2 active site was captured during LiGaMD equilibration. Protein conformational change was observed upon ligand dissociation where the receptor subdomain I transited from "Closed" to "Open" state. Ligand binding and unbinding to the ACE2 active site was captured in three of the ten LiGaMD simulations ("Sim1", "Sim2" and "Sim3") with minimum ligand RMSD relative to the X-ray structure of ~0.99 Å (Figure 2A ). Large-scale conformational changes of the ACE2 were sampled in LiGaMD. In addition, LiGaMD identified four low-energy conformations of the ACE2 receptor, viz. "Open", "Partially Open", "Closed", and "Fully Closed". The "Fully Closed" conformation was newly discovered through LiGaMD simulations, whereas, the three "Open", "Partially Open" and "Closed" conformations were highly consistent with the experimentally resolved structures.

Particularly, in Sim2, within ~100 ns, one MLN-4760 initially interacted to the receptor interface between the α5 and 3 10 H4 helices. In ~100-160 ns, the ligand further entered the ACE2 active site through the cavity between the subdomain I and II, bound to the active site up to ~500 ns and then dissociated ( Figure 2B ). Ligand dissociation pathway was completely different than that of the ligand binding ( Figure 2C and D) . When the ligand dissociated, the conformation of receptor subdomain I was changed from "Closed" to "Open" through opening between the receptor helices α2 and α4.

Using the ligand RMSD relative to the X-ray structure and interdomain distance as reaction coordinates, free energy landscape was calculated by combining all LiGaMD simulation trajectories. The free energy landscape identified nine low-energy conformations of the receptor-ligand complex, namely, "Bound (B)", "Intermediate-1 (I-1)", "Intermediate-2 (I-2)", "Intermediate-3 (I-3)", "Intermediate (I-4)", "Unbound-1 (U-1)", "Unbound-2 (U-2)", "Unbound-3 (U-3)" and "Unbound-4 (U-4)" ( Figure 2E ). Ligand bound at the active site with minimum ligand RMSD of 0.99 Å was termed as the "Bound" conformation, in which the receptor interdomain distance was ~12-14 Å (Figure 2A ). In the I-1 state, ligand interacted with residues in the α5 helix, α11 helix, α14 helix, α18 helix and 3 10 H4 helix loop in the two protein subdomains. The ligand RMSD and interdomain distance in the I-1 state were ~9.8 Å and ~18-22 Å, respectively. The receptor adopted a "Partially-open" conformation in this state. In the I-2 state, ligand interacted with receptor subdomain II involving the α17 helix, α18 helix, and α19 helix. The ligand RMSD and interdomain distance was located at (~25.6 Å, ~5-7 Å). In the I-3 state, ligand interactions were observed with the α2 helix and α4 helix in the protein subdomain I and II, respectively, in which the ligand RMSD and interdomain distance was located at (~30.4 Å, ~13-20 Å). In the I-4 state, ligand interacted with the protein subdomain II involving the α8 helix, α14 helix, and 3 10 H4 with ligand RMSD and the interdomain distance located at (~31.7 Å, ~25-26 Å). In the unbound states U-1, U-2, U-3 and U-4 exhibited interdomain distances of ~5-7, ~10-12, ~20-21, and ~25 Å, respectively, at ligand RMSD ~80 Å.

LiGaMD has successfully captured the dissociation and rebinding pathways of MLN-4760 to the human ACE2 receptor. With detailed analysis of the receptor and protein conformations, LiGaMD simulations suggested that the binding of MLN-4760 to human ACE2 mainly followed the conformational selection mechanism as the ligand binding shifted the receptor conformation to the "Closed" state. These simulation findings provide important insights for design and development of drugs for therapeutic treatment of COVID-19 and other diseases associated with the ACE2 receptor.

In case of SARS-CoV-2 infection, RBD of the viral spike protein attaches to the top region of subdomain I in the ACE2 receptor, which is distant from the receptor peptidase active site. Thus, the MLN-4760 inhibitor does not compete directly with the RBD of SARS-CoV -2. However, there have been evidences of allosteric modulation in the ACE2 receptor, which could be uti- 8 identified two small-molecule compounds from in silico screening that bound to the hinge region of ACE2 and could activate the receptor ~2 fold in an in vitro ACE2 activity assay, with EC 50 values of approximately 20 μM. Huentelman et al 101 also discovered a novel ACE2 inhibitor from virtual screening, which bound the receptor active site but also allosterically blocked SARS-CoV spike protein-mediated receptor attachment. Here, we have characterized the mechanism of ligand binding/dissociation and further highlighted the dynamic nature of the ACE2 receptor in terms of the large-scale movement of subdomain I upon ligand binding. LiGaMD simulations also showed that MLN-4760 could form hydrophobic interactions with the RBD binding site in subdomain I of the receptor in the closed state. This information could be helpful in designing drugs that could block RBD binding to the receptor. Furthermore, since the human ACE2 receptor shows conformational selection for ligand binding as revealed from the LiGaMD simulations, virtual screening using ensemble docking [102] [103] [104] with receptor structural ensembles generated from the LiGaMD simulations will be a promising approach to designing potent drug molecules of the ACE2 receptor against coronavirus infection.

As one of non-segmented negative-sense (NNS) RNA viruses, human respiratory syncytial virus (HRSV) including rabies and Ebola virus could cause severe lower track respiratory diseases. [105] [106] [107] [108] The function of RNA-dependent RNA polymerase (RdRP) is regulated by a zinc-binding transcription terminator, M2-1 protein, in HRSV. Thus, M2-1 protein is an important therapeutic target for drug design. The structure of a short positive-sense gene-end RNA (SH7) binding to the HRSV M2-1 was resolved, suggesting multiple M2-1 residues were interacted with the RNA. 85 Five independent 300 ns dual-boost GaMD simulations were performed to investigate the binding mechanism between RNA and the M2-1 protein. 85 The initial simulation model was prepared by placing the seven-nucleotide SH7 RNA (AGUUAAU) at a distance >20 Å from the M2-1 protein (PDB: 6PZQ). Spontaneous binding of the SH7 RNA to the M2-1 protein was captured. Structural clustering of the SH7 RNA was performed to obtain the top five clusters using a hierarchical agglomerative algorithm. The first and second top-ranked cluster showed RNA binding to the ZBD and the CD of M2-1, respectively. In the ZBD of M2-1, protein residue R4 formed polar contact with the nucleotide U-4 of SH7 RNA ( Figure 3B and D) . In the CD of M2-1, protein residue K92 formed ionic interaction with the phosphodiester backbone of U-4 RNA nucleotide ( Figure 3A and C), being consistent with the crystal structure that a direct interaction of K92 and the phosphodiester backbone of RNA1ʹ was observed.

Free energy profiles were calculated from GaMD simulations by using the polar interactions and the radius of gyration (Rg) of the SH7 RNA as reaction coordinates. With the RNA molecule being very small, the end-to-end distance can also be used to characterize the molecular conformational changes although it often undergoes high fluctuations. Here, using Rg as a reaction coordinate to calculate free energy profiles allowed us to capture different conformational states of the system. For SH7 binding to the CD of M2-1 protein, the distance between M2-1 residue K92 and RNA base U4 was ~3 Å in the free energy minimum ( Figure 3E) ; While for RNA binding to the ZBD of M2-1, free energy minimum was located at M2-1 residue R4 and RNA base U4 distance ~5 Å ( Figure 3F ).

In agreement with the experimental results, GaMD simulations revealed that binding of RNA to the CD or ZBD of the M2-1 protein are independent events. Together, the results provided important insights into the recognition mechanism between RNA and HRSV M2-1 protein, which can be valuable for effective drug design and development for its related diseases.

The Musashi 1 (MSI1) protein serves as a therapeutic drug target for treating several cancers such as colorectal, ovarian, bladder and myeloid leukemia. It is known to bind and suppresses translation of the 3ʹ-UTR of Numb mRNA, 109 however, the molecular mechanism of this interaction remains elusive which is important for effective drug design.

We performed all-atom GaMD simulations to study the binding mechanism between Numb RNA and MSI1. 86 For system setup, the Numb RNA was placed ~30 Å away from the MSI1. The AMBER force fields were used with ff14SBonlysc for protein, RNA.LJbb [110] [111] [112] TIP3P model for water molecules. Spontaneous binding of Numb RNA to the MSI1 protein was successfully captured in 6 out of the 19 independent 1200 ns of GaMD simulations. In Sim1, RNA binding was observed at ~100 ns where the RMSD of RNA relative to the NMR structure reduced to ~2.50 Å ( Figure 4A ). In Sim2, RNA binding was observed at ~1010-1130 ns followed by RNA dissociation into the bulk solvent ( Figure 4A ). The RNA bound to MSI1 after ~800 ns in Sim3, Sim4 and Sim5 ( Figure 4A ). In Sim6, spontaneous binding of RNA wasobserved after ~1000 ns ( Figure 4A ).

Free energy profiles were calculated by using reaction coordinates including backbone RMSD of RNA core relative to its NMR bound conformation, radius of gyration (Rg) and the number of native contacts (N contact ). Five low-energy minima were characterized -"Bound", "Intermediate I1", "Intermediate I2", "Intermediate I3" and "Unbound". These states were identified at the backbone RMSD of the RNA core and N contacts being (2.0 Å, 1500), (5.2 Å, 480), (9.5 Å, 200), (25.0 Å, 10) and (40 Å, 0), respectively ( Figure 4B ). The "Unbound", "Intermediate I1", "Intermediate I2" and "Intermediate I3" states were identified at the backbone RMSD of the RNA core and R g of Numb being (40.0 Å, 6.2 Å), (5.0 Å, 7.2 Å), (6.9 Å, 6.2 Å), respectively ( Figure 4C ). From the GaMD simulations, the R g of the Numb RNA in the "Bound" state was observed to have a wider range as compared to the "Unbound" and "Intermediate" conformations ( Figure 4B and C) which suggested an induced fit mechanism of Numb RNA binding to MSI1. The I1 state showed interactions of Numb RNA with the β2-β3 loop and C terminus of MSI1 ( Figure 4D ). Three hydrogen bonds were formed between MS1 C terminus residue Arg99 and the RNA nucleotide A106 ( Figure 4D ). The I2 state showed flipping of the MS1 residue Arg61 sidechain towards the solvent leading to the formation of hydrogen bond and salt-bridge interactions with the phosphate oxygen of the sidechain and backbone of RNA nucleotide A106, respectively ( Figure 4E ). The I3 state showed large conformational changes in the MS1 C terminus where hydrogen bond and salt bridge interactions were observed between the C terminus residue Arg99 and the sidechain and backbone of RNA nucleotide A106, respectively ( Figure 4F ). These important understandings of the RNA binding mechanism to MSI1 provided by 

GaMD simulations can aid rational structure-based drug design against MSI1 and other related diseases.

LiGaMD simulations performed on model systems such as host-guest and protein-ligand enabled ligand binding thermodynamics and kinetics calculations through repetitive events of binding and dissociation observed of smallmolecule ligands. 82 Calculations were done using LiGaMD and LiGaMD_Dual schemes where a boost potential is applied to the non-bonded and bonded interactions between the ligand-protein/environment, respectively, along with GAFF and q4MD force field parameters used to simulate binding of aspirin and 1-butanol (two guest molecules) to cyclodextrin (CD) (host). PMF profiles were calculated regarding the distance between the guest and host molecules using multiple independent 300 ns LiGaMD simulations. For most of the systems, the bound state was located at the global energy minima. The PMF plot showed similar results as compared to 6500ns cMD simulations in terms of low energy states including "Bound", "Intermediate" and "Unbound". Further, to characterize the conformational changes in the host, PMF profiles were calculated using the radius of gyration (R g ) as the reaction coordinate. The apo system and bound guest 1-butanol biased the system to adopt "Compact" (lower R g ) low energy conformational state whereas aspirin bound system predominately adopted "Open" (higher R g ) state. LiGaMD PMF profiles showed that the q4MD force field preferred open conformation where the host predominantly adopted compact conformation using GAFF forcefield. Similarly, the binding free energies of the host-guest systems were calculated using LiGaMD 3D PMF profiles. The binding free energies calculated from hundreds of nanoseconds of LiGaMD simulations were in excellent agreement with that of the experiments and previous "converged" microsecondtimescale cMD simulations. In addition to cMD, LiGaMD simulations' errors in these values were less than 1.0 kcal/mol. Moreover, LiGaMD simulations were also used to calculate the kinetic binding and dissociation constants. Reweighting of kinetics of the LiGaMD simulations was done using the Transition state theory and Kramer's rate theory. The kinetic binding rate calculated from the LiGaMD simulations decreased to some degree as compared to the previous cMD simulations while guest dissociation rates increased significantly in LiGaMD_Dual simulations.

Similarly, repetitive binding and dissociation of benzamidine inhibitor in trypsin were observed during microsecond LiGaMD simulations ( Figure 5A , B and F). 2D reweighted PMF profile was plotted regarding the atom distances benzamidine:C-Asp189:CG and Trp215:NE-Asp189:CG to characterize conformational changes of protein during ligand binding/unbinding ( Figure 5C ). Five low-energy wells were characterized from the free energy profiles-Bound, Intermediate 1, Intermediate 2, Unbound 1, and Unbound 2. When the inhibitor was bound to the protein, benzamidine: C-Asp189:CG distance was ~4 Å whereas Trp215:NE-Asp189:CG distance was ~12.5 Å. Here, the inhibitor was bound to the Asp189 residue of the protein S1 pocket as seen in the X-ray crystal conformation. The Intermediate 1 showed inhibitor bound near Trp215 "gate" close to His57-Asp102-Ser214 catalytic whereas the Intermediate 2 showed ligand bound to the open S1* pocket ( Figure 5D and E). Rearrangement of Trp215 loop and sidechain flipping of Trp215 residue facilitated the opening of S1* pocket in the Intermediate 2 state. Additionally, the ligand binding free energy of trypsin-benzamidine complex calculated from the 3D PMF profile was −6.13 ± 0.35 kcal/mol which agreed excellently with the experimental value of −6.2 kcal/mol. 111 Similarly, with the recorded time periods of ligand binding and unbinding, the reweighted k on and ko ff values were calculated as 1.15 ± 0.79 × 10 7 M −1 ·s −1 and 3.53 ± 1.41 s −1 , respectively. Similar protocol was applied, as in the host-guest binding, to calculate these values using Kramer's rate theory. These data were comparable to the values calculated from experiments. 111 

Pep-GaMD simulations have allowed us to capture repetitive binding and dissociation of highly flexible peptides, 83 and thus enabled peptide binding thermodynamics and kinetics calculations. In Pep-GaMD, potential energy of the peptide is selectivity boosted allowing it to sample higher flexibility during the simulation. In the dual boost scheme, the entire system potential energy is boosted. Three model peptide systems including "PAMPAR" (PDB: 1SSH), "PPPALPPKK" (PDB: 1CKA) and "PPPVPPRR" (PDB: 1CKB) were used to capture the binding/dissociation to the SH3 domains. 113 

Pep-GaMD simulations, the 1SSH peptide bound 3-4 times and dissociated 3 times to the SH3 domains. The 1CKA peptide was observed having 3-5 binding events whereas 2-5 dissociation events. Similarly, in the 1CKB simulation, the peptide bound 3-4 times to the receptor whereas it dissociated 2-3 times ( Figure 6 ).

3D Pyreweighting scripts were used to calculate the peptide binding free energies. These values agreed well with the experimental findings. Pep-GaMD predicted the binding free energy value to be −7.72 ± 0.54 kcal/mol for the 1CKA peptide which agreed well with the experimental value of −7.84 kcal/mol. 112 The 1CKB peptide had −6.84 ± 0.14 kcal/mol binding free energy which is close to −7.24 kcal/mol 112 as calculated from the experiments. The differences of binding free energies between the simulation predictions and experiments were less than 1.0 kcal/mol. In addition, with efficient sampling of peptide binding/dissociation Pep-GaMD helped us to predict the peptide binding kinetics. For the 1CKA system, the k on and k off were calculated as 4.06 ± 2.26×10 10 M −1 ⋅s −1 and 1.45 ± 1.07×10 3 s −1 , respectively. This was in excellent agreement with the 14 experimental values 115 of k exp on = 1.5 × 10 9 M −1 ⋅s −1 and k exp off = 8.9 × 10 3 s −1 . Hence, the Pep-GaMD simulation has enabled us to study the mechanism of peptide binding/dissociation to proteins by accelerating the interactions between them. This is a significant contribution to the field as it is difficult to attain correct sampling of biomolecules that involves long-range electrostatic interactions and conformational selection.

GaMD presents to be an unconstrained enhanced sampling technique, which can sample the conformational space of biomolecules and their interactions without the need of predefined CVs. Using cumulant expansion to the second order, GaMD can achieve accurate reweighting of the simulations because the added boost potential exhibits a Gaussian distribution. GaMD can be implemented to a broad range of biological systems. In particular, recent GaMD simulations have revealed important binding mechanisms of smallmolecule ligands, 71, 84 peptides, 116,117 proteins 78 and nucleic acids 85, 86 to the target receptors. 300 ns of GaMD simulations has successfully captured complete folding of chignolin (10 residues) into its native conformation 61 with a minimum RMSD of 0.2 Å relative to the NMR native conformation (PDB: 1UAO). The reweighted free energy profiles calculated showed distinct low-energy conformational states of chignolin as "Folded", "Intermediate" and "Unfolded". With the same simulation time, however, aMD failed to distinguish between the unfolded and intermediate states. 118 GaMD has been further developed for more powerful techniques of improved sampling such as LiGaMD 82 and Pep-GaMD. 83 Altogether, novel GaMD methods and advancement in computer hardware will enable us to approach and simulate complex biomolecular interactions at atomic level.

GaMD has been a successful enhanced sampling method to characterize biomolecular interactions and binding processes. The novel selective GaMD algorithms including LiGaMD and Pep-GaMD can efficiently calculate the binding free energy of ligand/peptide and their kinetic rate constants by adding a selective boost to the essential ligand/peptide potential energy terms. Important mechanistic insights into drug binding to the ACE2 receptor have provided a significant perspective for rational drug development against the COVID-19 pandemic. RNA binding to the M2-1 and MSI1 model proteins depicted by GaMD simulations was in good agreement with the experimentally resolved structures. However, only very few binding events were captured in GaMD simulations of the CXCR4-drug ( Figure 1 ) and protein-RNA (Figures 3 and  4) systems and LiGaMD simulations of the ACE2-inhibitor system ( Figure 2 ). The simulations did not sample enough binding and unbinding events (eg, at least 3-5 or more) in order to estimate the ligand/RNA binding free energies and kinetics for comparison with experimental data. Therefore, these simulations still suffered from insufficient sampling. Further method developments and advances in supercomputing are still needed to obtain converged binding simulations and thus calculate accurate biomolecular binding thermodynamics and kinetics. In addition to binding of the smallmolecule ligands and flexible peptides/RNA, new platforms also need to be explored to study biomolecular interactions of larger complexes such as protein-protein interactions. The binding analysis using GaMD and further improved simulation methods shall provide significant contributions in rational computer-aided drug design.

Dynamic personalities of proteins

Impact of CRISPR immunity on the emergence and virulence of bacterial pathogens

The nature of protein folding pathways

Fine-tuning of GPCR activity by receptor-interacting proteins

Theory of protein folding: the energy landscape perspective

Energy landscapes as a tool to integrate GPCR structure, dynamics, and function

Molecular dynamics simulations of biomolecules

ACEMD: accelerating biomolecular dynamics in the microsecond time scale

Showcasing modern molecular dynamics simulations of membrane proteins through G protein-coupled receptors

Atomic-level characterization of the structural dynamics of proteins

To milliseconds and beyond: challenges in the simulation of protein folding

Molecular dynamics simulation for all

How does a drug molecule find its target binding site?

Pathway and mechanism of drug binding to G-protein-coupled receptors

Structure and dynamics of the M3 muscarinic acetylcholine receptor

Complete reconstruction of an enzyme-inhibitor binding process by molecular dynamics simulations

Mechanism of fast peptide recognition by SH3 domains

Exploring protein-peptide recognition pathways using a supervised molecular dynamics approach

Mechanism of coupled folding-upon-binding of an intrinsically disordered protein

Metadynamics: a method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science

Using metadynamics and path collective variables to study ligand binding and induced conformational transitions

Nonphysical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling

The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method

Adaptive biasing force method for scalar and vector free energy calculations

Calculating free energies using a scaled-force molecular dynamics algorithm

Predicting slow structural transitions in macromolecular systems: conformational flooding

A molecular dynamics study of slow base flipping in DNA using conformational flooding

Characterizing drug target residence time with metadynamics: how to achieve dissociation rate efficiently without losing accuracy against time-consuming approaches

Unbinding kinetics of a p38 MAP kinase Type II inhibitor from metadynamics simulations

Exploring molecular mechanisms of ligand recognition by opioid receptors with metadynamics

Directly binding rather than induced-fit dominated binding affinity difference in (S)-and (R)-crizotinib bound MTH1

P-loop conformation governed crizotinib resistance in G2032R-mutated ROS1 tyrosine kinase: clues from free energy landscape

An efficient metadynamics-based protocol to model the binding affinity and the transition state ensemble of G-protein-coupled receptor ligands

Efficient atomistic simulation of pathways and calculation of rate constants for a protein-peptide binding process: application to the MDM2 protein and an intrinsically disordered p53 peptide

Protein-peptide association kinetics beyond the seconds timescale from atomistic simulations

Free energy profile and kinetics of coupled folding and binding of the intrinsically disordered protein p53 with MDM2

Enhanced sampling in molecular dynamics using metadynamics, replica-exchange, and temperature-acceleration

Equilibrium sampling in biomolecular simulations

Unconstrained enhanced sampling for free energy calculations of biomolecules: a review

Replica-exchange molecular dynamics method for protein folding

Generalized-ensemble algorithms: enhanced sampling techniques for Monte Carlo and molecular dynamics simulations

Crossing the time scale of protein folding through self-guided molecular dynamics simulation

Self-guided molecular dynamics simulation for efficient conformational search

Self-guided Langevin dynamics simulation method

Self-guided Langevin dynamics via generalized Langevin equation

Routine access to millisecond time scale events with accelerated molecular dynamics

Activation and dynamic network of the M2 muscarinic receptor

Allosteric effects of sodium ion binding on activation of the M3 muscarinic G-proteincoupled receptor

Nucleotide-dependent mechanism of Get3 as elucidated from free energy calculations

Allosteric networks in thrombin distinguish procoagulant vs. anticoagulant activities

Accelerating chemical reactions: exploring reactive free-energy surfaces using accelerated ab initio molecular dynamics

Accessing a hidden conformation of the maltose binding protein using accelerated molecular dynamics

Enhanced lipid diffusion and mixing in accelerated molecular dynamics

Hyperdynamics: accelerated molecular dynamics of infrequent events

Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules

Sampling of slow diffusive conformational transitions with accelerated molecular dynamics

Accelerated molecular dynamics simulations of ligand binding to a muscarinic G-protein-coupled receptor

A statistical analysis of the precision of reweighting-based simulations

Free energy landscape of G-protein coupled receptors, explored by accelerated molecular dynamics

Reduced free energy perturbation/hamiltonian replica exchange molecular dynamics method with unbiased alchemical thermodynamic axis

Gaussian accelerated molecular dynamics: unconstrained enhanced sampling and free energy calculation

Gaussian accelerated molecular dynamics in NAMD

Improved reweighting of accelerated molecular dynamics simulations for free energy calculation

Accelerated molecular dynamics applied to the peptaibol folding problem

Allosteric motions of the CRISPR-Cas9 HNH nuclease probed by NMR and molecular dynamics

Deciphering off-target effects in CRISPR-Cas9 through accelerated molecular dynamics

Graded activation and free energy landscapes of a muscarinic G-protein-coupled receptor

Understanding the molecular basis of agonist/antagonist mechanism of human mu opioid receptor through gaussian accelerated molecular dynamics method

In silico studies of conformational dynamics of Mu opioid receptor performed using gaussian accelerated molecular dynamics

Structural basis for binding of allosteric drug leads in the adenosine A1 receptor

Pathway and mechanism of drug binding to chemokine receptors revealed by accelerated molecular simulations

Structural consequences of multisite phosphorylation in the BAK1 kinase domain

Conformation control of the histidine kinase BceS of Bacillus subtilis by its cognate ABC-transporter facilitates need-based activation of antibiotic resistance

Gaussian accelerated molecular dynamics: theory, implementation and applications

A molecular dynamics simulation study decodes the Zika virus NS5 methyltransferase bound to SAH and RNA analogue

Isolation of a structural mechanism for uncoupling T cell receptor signaling from peptide-MHC binding

Structural basis for arginine glycosylation of host substrates by bacterial effector proteins

Mechanism of the G-protein mimetic nanobody binding to a muscarinic G-protein-coupled receptor

CRISPR-Cas9 conformational activation as elucidated from enhanced molecular simulations

G-protein-coupled receptor-membrane interactions depend on the receptor activation state

Gaussian accelerated molecular dynamics for elucidation of drug pathways

Ligand Gaussian accelerated molecular dynamics (LiGaMD): characterization of ligand binding thermodynamics and kinetics

Peptide Gaussian accelerated molecular dynamics (Pep-GaMD): enhanced sampling and free energy and kinetics calculations of peptide binding

Mechanism of ligand recognition by human ACE2 receptor

Structure of the human respiratory syncytial virus M2-1 protein in complex with a short positive-sense gene-end RNA

Mechanism of RNA recognition by a Musashi RNA-binding protein

CHARMM additive and polarizable force fields for biophysics and computer-aided drug design

CHARMM general force field: a force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields

A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations

A second generation force field for the simulation of proteins

Balkwill F The significance of cancer cell expression of the chemokine receptor CXCR4. Paper presented at: Seminars in cancer biology

Targeting chemokine receptors in chronic inflammatory diseases: an extensive review

Crystal structure of the chemokine receptor CXCR4 in complex with a viral chemokine

Structures of the CXCR4 chemokine GPCR with small-molecule and cyclic peptide antagonists

Structure of CC chemokine receptor 5 with a potent chemokine antagonist reveals mechanisms of chemokine recognition and molecular mimicry by HIV

Physiology and pharmacology of plerixafor

Allosteric modulation of chemoattractant receptors

Molecular mechanism of amd3100 antagonism in the cxcr4 receptor transfer of binding site to the cxcr3 receptor

AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility

Structurebased identification of small-molecule angiotensin-converting enzyme 2 activators as novel antihypertensive agents

Structure-based discovery of a novel angiotensin-converting enzyme 2 inhibitor

Computational drug design accommodating receptor flexibility: the relaxed complex scheme

An improved relaxed complex scheme for receptor flexibility in computer-aided drug design

Ensemble docking in drug discovery

Social, economic, and health impact of the respiratory syncytial virus: a systematic search

Human respiratory syncytial virus: infection and pathology

Global burden of acute lower respiratory infections due to respiratory syncytial virus in young children: a systematic review and meta-analysis

Global, regional, and national disease burden estimates of acute lower respiratory infections due to respiratory syncytial virus in young children in 2015: a systematic review and modelling study

Musashi RNA-binding proteins as cancer drivers and novel therapeutic targets

Refinement of the Cornell et al. Nucleic acids force field based on reference quantum chemical calculations of glycosidic torsion profiles

Use of proflavine as an indicator in temperature-jump studies of the binding of a competitive inhibitor to trypsin

Structural basis for the specific interaction of lysine-containing proline-rich peptides with the N-terminal SH3 domain of c-Crk

How do proteins associate? A lesson from SH3 domain

Recognition of proline-rich motifs by protein-protein-interaction domains

Role of electrostatic interactions in binding of peptides and intrinsically disordered proteins to their folded targets. 1. NMR and MD characterization of the complex between the c-Crk N-SH3 domain and the peptide Sos

Improved modeling of peptide-protein binding through global docking and accelerated molecular dynamics simulations

Computational modeling of cyclic peptide inhibitor-MDM2/MDMX binding through global docking and Gaussian accelerated molecular dynamics simulations

Accelerated molecular dynamics simulations of protein folding

Advances and Applications in Bioinformatics and Chemistry Dovepress Publish your work in this journal Advances and Applications in Bioinformatics and Chemistry is an international, peer-reviewed open-access journal that publishes articles in the following fields: Computational biomodelling

Computational Biochemistry; Computational Biophysics; Chemoinformatics and Drug Design; In silico ADME/Tox prediction. The manuscript management system is completely online and includes a very quick and fair peerreview system, which is all easy to use

This work used supercomputing resources with allocation awards TG-MCB180049 through the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, project M2874 through the National Energy Research Scientific Computing Center, which is a US Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231, and computational resources provided by the Research Computing Cluster at the University of Kansas. This work was supported in part by the American Heart Association (Award 17SDG33370094), the National Institutes of Health (R01GM132572) and the startup funding in the College of Liberal Arts and Sciences at the University of Kansas.

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

The authors report no conflicts of interest in this work.