key: cord-0922141-3s7iiux8 authors: Tam, Nguyen Minh; Nguyen, Trung Hai; Ngan, Vu Thi; Tung, Nguyen Thanh; Ngo, Son Tung title: Unbinding ligands from SARS-CoV-2 Mpro via umbrella sampling simulations date: 2022-01-26 journal: R Soc Open Sci DOI: 10.1098/rsos.211480 sha: 2dfb0b61965854f3ca196990fc02b33637b541bf doc_id: 922141 cord_uid: 3s7iiux8 The umbrella sampling (US) simulation is demonstrated to be an efficient approach for determining the unbinding pathway and binding affinity to the SARS-CoV-2 Mpro of small molecule inhibitors. The accuracy of US is in the same range as the linear interaction energy (LIE) and fast pulling of ligand (FPL) methods. In detail, the correlation coefficient between US and experiments does not differ from FPL and is slightly smaller than LIE. The root mean square error of US simulations is smaller than that of LIE. Moreover, US is better than FPL and poorer than LIE in classifying SARS-CoV-2 Mpro inhibitors owing to the reciever operating characteristic–area under the curve analysis. Furthermore, the US simulations also provide detailed insights on unbinding pathways of ligands from the binding cleft of SARS-CoV-2 Mpro. The residues Cys44, Thr45, Ser46, Leu141, Asn142, Gly143, Glu166, Leu167, Pro168, Ala191, Gln192 and Ala193 probably play an important role in the ligand dissociation. Therefore, substitutions at these points may change the mechanism of binding of inhibitors to SARS-CoV-2 Mpro. royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211480 the difference in the potential of mean force (PMF) between bound and unbound states of the complexes. The PMF value is obtained by the weighted histogram analysis method (WHAM) [63] calculations. The unbinding pathway is able to estimate via the collective-variable free energy landscape (FEL) [64] . The representative structures of the complexes can be determined using the clustering method [65] . The obtained results revealed that the combination of FPL/US simulations was useful for accurately evaluating potential inhibitors for SARS-CoV-2 Mpro. Twenty four three-dimensional structures of the solvated SARS-CoV-2 Mpro + inhibitor were reported in the previous work [22] , in which the structure of SARS-CoV-2 Mpro was downloaded from the protein data bank with identification of 6Y2F [66] and structure of ligands downloaded from the PubChem database [67] . Details of the ligands are reported in the electronic supplementary material, table S1. GROMACS v. 5.1.5 [68] was used to carry out atomistic simulations. As mentioned above, the initial conformations of complexes (cf. figure 1b) were obtained from previous work [22] . Force field parameters for the protein + ions, water molecules and ligand were taken from the Amber99SB-ILDN force field [69] , TIP3P water model [70] and general Amber force field [71] , respectively. The MD parameters were chosen according to the previous work [21] . Particularly, the integration was carried out with a timestep of 2 fs. Cut-off for non-bonded interaction was set at a distance of 0.9 nm. The electrostatic interactions were computed using the Particle Mesh Ewald method [72] , whereas the van der Waals interactions were calculated with the cut-off scheme. The solvated complexes were relaxed using energy minimization, constant number of particles, volume and temperature (NVT) and constant number of particles, volume and pressure (NPT) simulations before the steered-MD was applied. The NPT-equilibrated complex conformation was used as a starting structure of FPL calculations. During an FPL simulation the centre of mass of the ligand was pulled along the reaction coordinate, ξ, via an external-harmonic force. In particular, the spring constant of pulling force is k = 600 kJ mol −1 nm −2 and the pulling velocity is v = 0.005 nm ps −1 . An FPL calculation was during an interval of 0.7 ns. The pulling parameters were referred to in the previous works [36, 73] . The systemic coordinate, pulling force and pulling work were recorded every 0.1 ps. The optimized FPL trajectory, whereas the deviation of the pulling work to the mean is the smallest value, was used to generate US windows. The displacement of ligands between neighbouring windows is ca 0.1 nm. The PMF was estimated from US simulations with 21 windows and length of 2 ns per window. It should be noted that NPT simulations were performed with length of 0.1 ns to evade any staring bias. Moreover, the C α atoms were positionally restrained via a weak harmonic force to prevent the enzymic reorientation and translation during both FPL and US simulations. It helped reduce the computational effort during US simulations because we didn't have to thoroughly sample the whole protein conformational space which may be huge. The approach is reasonable because the protein conformation does not change significantly upon ligand binding. The two-dimensional FEL was constructed using the number of contacts within 0.45 nm between protein-ligand and ligand displacement as two reaction coordinates [64, 65] . Representative structures of complexes localizing in the FEL minima were predicted using the clustering methods [65] . The WHAM [74] was used to estimate the PMF. The computed error was estimated using a bootstrapping method [75] . The Scikit-Learn library [76] was used to compute the receiver operating characteristic-royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211480 area under the curve (ROC-AUC). In particular, the set of ligands were sorted and split into two groups according to their experimental binding free energy, ΔG EXP . Namely, strong binders are ligands having ΔG EXP < −7.60 kcal mol −1 whereas weak binders are those with ΔG EXP ≥ −7.60. The Python code for estimating errors using the bootstraping method and for calculating ROC-AUC is included in the electronic supplementary material. We can roughly classify proteins into two groups based on the geomorphology of their active sites. The first group includes proteins having a 'close' active site which requires protein structural change to allow a ligand to bind/unbind. The second one consists of proteins having an 'open' active site permitting its ligand to move in/out without protein conformational change. SARS-CoV-2 Mpro belongs to the second group with an 'open' active site that we can use pathway-based binding affinity approaches, such as FPL [36] , US [62] and non-equilibrium molecular dynamics (NEMD) [39, 40, 77] simulations, to evaluate the ligand-binding affinity besides end-to-end free energy approaches. FPL was shown to be suitable for rapidly estimating the binding affinity of inhibitors to SARS-CoV-2 Mpro [22, 58, 73] . However, the obtained values cannot directly estimate the binding free energy, ΔG. The approach also cannot evaluate the unbinding pathway of ligands owing to the limited sampling that was generated. ΔG and details of the unbinding pathway can be calculated via NEMD, but the approach requires huge computing resources. US simulations, an enhanced sampling method, emerges as a potential technique to complete this task force. Initial conformations used for US simulations were generated by FPL simulations with a window spacing of ca 0.1 nm. During FPL, the ligand displacement was recorded as shown in figure 2. Inhibitors remain in the binding cavity until the pulling force reaches a maximum value, called rupture force. Inhibitors then constantly dissociate from the binding cleft. In particular, more details about FPL results are described in the electronic supplementary material, tables S2-S4. The results obtained in this study are consistent with the recent works [21, 22] . Coordinates of solvated complexes were recorded during FPL simulations every 0.1 ps that the US windows were then extracted over the optimized FPL trajectory. The PMF values along the reaction coordinate, ξ, (cf. figure 3 ; electronic supplementary material, table S5) were calculated using the WHAM [74] , whereas the US histograms are reported in the electronic supplementary material, table S6. The shape of the free energy profile is very consistent with previous works [48, 78] . In particular, the PMF features a deep minimum at short distance and a plateau region at large distance where the non-bonded interactions between protein and ligand vanish. The observation is in good agreement with the FEL analysis as mentioned below. A free energy profile is given in figure 3 and the values of calculated binding free energy DG US are given in table 1. The calculated binding free energy ranges from −8.63 to −1.80 kcal mol −1 , with the mean value of DG US ¼ À4:59 + 0:41 kcal mol À1 . The calculated mean underestimates the experimental mean, which royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211480 is of DG EXP ¼ À7:81 + 0:21 kcal mol À1 . The computed error bars were estimated by the standard error of the mean. The underestimation of US calculations was similar to the results when the US approach was applied to the cathepsin K (CTSK) system [62] . The inaccuracy of the force field in describing the interaction between biomolecules [83, 84] , which included the SARS-CoV-2 Mpro, inhibitor, water molecule, and counter ion, may cause the computed binding free energies to deviate from experimental values. The US calculations adopted an appropriate Pearson correlation with R US ¼ 0:66 + 0:13 (figure 4), which is similar to the FPL results, R W ¼ À0:65, but is lower than that of the LIE, R LIE ¼ 0:73 + 0:09 [21, 22] . Moreover, the accuracy of computational approaches was also investigated via root mean square error (RMSE) calculation. The obtained result of US is of RMSE US ¼ 3:58 + 0:28 kcal mol À1 . The RMSE US is slightly smaller than that from LIE calculation, RMSE LIE ¼ 4:12 + 0:40 kcal mol À1 and is significantly smaller than that from MM/PBSA, RMSE LIE ¼ 10:15 + 1:92 kcal mol À1 [21] . Furthermore, because the initial complexes obtained via the docking approach is probably far from the native structure, the US accuracy is not very high. Therefore, the systems need to relax in the MD simulations for long enough for the system to overcome local minima and reach equilibrium states [21] . The accuracy of the FPL approach was increased according to the strategy [21] . Otherwise, increasing the US window interval would probably resolve the issues since the complex in the bound state can reach equilibrium states. The issues should be carefully investigated in further work. In order to characterize the insights into the unbinding pathway of ligand dissociation, the collectivevariable FEL was constructed. Two coordinates are the number of protein-ligand contacts at a cut-off distance of 0.45 nm and the ligand displacement. The two-dimensional FEL is given in figure 5 and the electronic supplementary material, table S7. Consistent with the PMF curve (figure 3a), the number of contacts between protein-ligand rapidly decreases when the ligand is displaced from 1.1 to 2.5 nm. Along the dissociation pathway, numerous local minima were observed. The representative structures of SARS-CoV-2 Mpro + inhibitor, which corresponded to these minima, were characterized using a data clustering algorithm with all-atom root mean square deviation chosen as pair distance and a cut-off of 0.2 nm. A representation of these conformations is described in figure 5 . In particular, the SARS-CoV-2 Mpro + 11b formed six minima, which were denoted as B, D1, D2, D3, D4 and D5. Structure B responds to the bound state of the complex, while structures D1-4 correspond to the transition states during the dissociated process of the ligand. Shape D5 responds to the completed dissociation state of the protein-ligand complex. Besides, the two-dimensional interaction diagram of SARS-CoV-2 Mpro + 11b is described in the electronic supplementary material, figure S1 mentioning more details of critical residues controlling the binding process of the compound 11b. The residues Cys44, Thr45, Ser46, Leu141, Asn142, Gly143, Glu166, Leu167, Pro168, Ala191, Gln192 and Ala193 Figure 4 . Linear correlation between DG US and DG EXP . The computing error was estimated using the bootstrapping method. probably are important elements over the ligand dissociation. These residues contribute a large number of contacts to the inhibitor over the dissociated process. The potential mutations at these points would probably alter the ligand binding pathway to SARS-CoV-2 Mpro, especially, the change in binding kinetics appeared. In addition, ROC-AUC is commonly used to assess the predictive power of a binary classifier in separating a dataset into two classes. If we separate the set of ligands into two sets according to their experimental binding free energy as being strong binders (low binding free energy) and weak binders (high binding free energy) using some arbitrary cut-off, we can use ROC-AUC, to assess the ability of our computed free energy in distinguishing strong binders against weak binders. More specifically, if we randomly select a ligand from the strong binder set and a ligand from the weak binder set, ROC-AUC gives us the probability that the selected strong binder will have lower calculated binding free energy than the weak binder. Therefore, ROC-AUC for SARS-CoV-2 Mpro was calculated. The obtained result is ROC À AUC US ¼ 0:83 + 0:09, indicating that US is weaker than LIE, ROC À AUC LIE ¼ 0:86 + 0:08, and better than FPL, ROC À AUC FPL ¼ 0:76 + 0:11, in classifying SARS-CoV-2 Mpro inhibitors [21] . We have demonstrated that the US approach provides a reliable estimate of binding free energies DG US when compared with experimental data [19, 20, 66, [79] [80] [81] [82] . The correlation between US binding free energy and experiments is rather high with a value of R ¼ 0:66 + 0:13, which is in the same range of correlation with FPL [22] . Moreover, the obtained RMSE of US simulations, RMSE US ¼ 3:58 + 0:28 kcal mol À1 , is slightly smaller than that of LIE [21] . Furthermore, US is better than FPL and poorer than LIE in classifying SARS-CoV-2 Mpro inhibitors owing to ROC-AUC analysis, ROC À AUC US ¼ 0:83 + 0:09. Clustering conformations found on local minima of the two-dimensional FEL revealed important insights into the unbinding pathway of the ligand from SARS-CoV-2 Mpro binding cavity. The obtained results imply that the residues Cys44, Thr45, Ser46, Leu141, Asn142, Gly143, Glu166, Leu167, Pro168, Ala191, Gln192 and Ala193 probably play an important role in the ligand dissociation. Therefore, substitutions appearing at these points possibly change the ligand binding mechanism of SARS-CoV-2 Mpro. royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211480 Data accessibility. All relevant data and software necessary to reproduce all results in the paper are described in the material and methods of the main text and in the electronic supplementary material. Information on software is provided in the material and methods. Python code for bootstrap analysis and ROC-AUC calculations is included in the electronic supplementary material. Authors' contributions. N.M.T.: formal analysis, investigation, methodology, visualization, writing-original draft, writing-review and editing; T.H.N.: conceptualization, formal analysis, methodology, writing-original draft, writing-review and editing; V.T.N.: investigation, visualization, writing-original draft; N.T.T.: investigation, visualization, writing-original draft; S.T.N.: conceptualization, formal analysis, funding acquisition, investigation, project administration, resources, visualization, writing-original draft, writing-review and editing. All authors gave final approval for publication and agreed to be held accountable for the work performed therein. Coronavirus disease 2019 (COVID-19) Situation Report -52 Clinical features of patients infected with 2019 novel coronavirus in Wuhan 2020 A novel coronavirus outbreak of global health concern Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CLpro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster Aerosol and surface stability of SARS-CoV-2 as compared with SARS-CoV-1 FDA Approves First Treatment for The 'very, very bad look' of remdesivir, the first FDA-approved COVID-19 drug First case of 2019 novel coronavirus in the United States Remdesivir for the treatment of COVID-19-final report 2021 Five reasons why COVID herd immunity is probably impossible Antibody resistance of SARS-CoV-2 variants B.1.351 and B.1.1.7 SARS-CoV-2 variants B.1.351 and P.1 escape from neutralizing antibodies International committee on taxonomy of viruses and the 3,142 unassigned species. Virology 2, 64 Potential COVID-2019 3C-like protease inhibitors designed using generative deep learning approaches 2020 Characterization and noncovalent inhibition of the deubiquitinase and deISGylase activity of SARS-CoV-2 papain-like protease Structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra α-helical domain Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors Structure-based design of antiviral drug candidates targeting the SARS-CoV-2 main protease 2021 Benchmark of popular free energy approaches revealing the inhibitors binding to SARS Binding of Inhibitors to the monomeric and dimeric SARS-CoV-2 Mpro Computer-aided drug design An improved relaxed complex scheme for receptor flexibility in computer-aided drug design Computational methods in drug discovery A fast algorithm for selecting sets of dissimilar molecules from large chemical databases AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility Computational proteinligand docking and virtual drug screening with the AutoDock suite Searching for anthranilic acidbased thumb pocket 2 HCV NS5B polymerase inhibitors through a combination of molecular docking, 3D-QSAR and virtual screening Using the fast Fourier transform in binding free energy calculations A new method for predicting binding affinity in computer-aided drug design Improving the LIE method for binding free energy calculations of protein-ligand complexes Continuum solvent studies of the stability of DNA, RNA, and phosphoramidate-DNA helices Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models Assessing the performance of the MM/PBSA and MM/GBSA methods. 6. Capability to predict protein-protein binding free energies and re-rank binding poses generated by protein-protein docking Fast and accurate determination of the relative binding affinities of small compounds to HIV-1 protease using non-equilibrium work Estimation of protein-ligand unbinding kinetics using non-equilibrium targeted molecular dynamics simulations Determination of the absolute binding free energies of HIV-1 protease inhibitors using nonequilibrium molecular dynamics simulations Nonequilibrium equality for free energy differences Absolute FKBP binding affinities obtained via non-equilibrium unbinding simulations High-temperature equation of state by a perturbation method. I. Nonpolar gases Calculation of absolute protein-ligand binding free energy from computer simulations Statistical mechanics of fluid mixtures A contribution to the drug resistance mechanism of darunavir, amprenavir, indinavir, and saquinavir complexes with HIV-1 protease due to flap mutation I50 V: a systematic MM-PBSA and thermodynamic integration study Free energy perturbation Hamiltonian replica-exchange molecular dynamics (FEP/H-REMD) for absolute ligand binding free energy calculations Oversampling free energy perturbation simulation in determination of the ligand-binding free energy Adequate prediction for inhibitor affinity of Aβ40 protofibril using the linear interaction energy method Estimating the ligand-binding affinity via λ-dependent umbrella sampling simulations Relationship between the inhibition constant (KI) and the concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction Evaluation of predicted protein-protein complexes by binding free energy simulations Accurate binding free energy predictions in fragment optimization Computer-aided drug design methods Computational chemistry in drug lead discovery and design Predicting binding free energies: frontiers and benchmarks Computational modeling of β-Secretase 1 (BACE-1) inhibitors using ligand based approaches Molecular dynamics-driven drug discovery: leaping forward with confidence Large scale relative protein ligand binding affinities using non-equilibrium alchemy Computational determination of potential inhibitors of SARS-CoV-2 main protease Ligand binding affinity prediction by linear interaction energy methods Free energy calculations: applications to chemical and biochemical phenomena Nonphysical sampling distributions in Monte Carlo freeenergy estimation: umbrella sampling Effective estimation of ligand-binding affinity using biased sampling method The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method Molecular recognition of platinated DNA from chromosomal HMGB1 Free-energy landscape, principal component analysis, and structural clustering to identify representative conformations from molecular dynamics simulations: the myoglobin case 2020 Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors PubChem substance and compound databases GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1-2, 19-25 Motional timescale predictions by molecular dynamics simulations: case study using proline and hydroxyproline sidechain dynamics Comparison of simple potential functions for simulating liquid water Development and testing of a general amber force field Particle mesh Ewald: an N⋅log(N) method for Ewald sums in large systems Rapid prediction of possible inhibitors for SARS-CoV-2 main protease using docking and FPL simulations g_wham-a free weighted histogram analysis implementation including robust error and autocorrelation estimates Bootstrap methods: another look at the jackknife Scikit-learn: machine learning in python Equilibrium free-energy differences from nonequilibrium measurements: a master-equation approach Prediction of AChEligand affinity using the umbrella sampling simulation 3C-like protease inhibitors block coronavirus replication in vitro and improve survival in MERS-CoV-infected mice GC-376, and calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main protease Identify potent SARS-CoV-2 main protease inhibitors via accelerated free energy perturbation-based virtual screening of existing drugs Feline coronavirus drug inhibits the main protease of SARS-CoV-2 and blocks virus replication Force field benchmark of amino acids: I. Hydration and diffusion in different water models Force field benchmark of amino acids. 2. Partition coefficients between water and organic solvents