key: cord-104122-klvx927g
authors: Tayfuroglu, Omer; Yildiz, Muslum; Pearson, Lee-Wright; Kocak, Abdulkadir
title: An Accurate Free Energy Method for Solvation of Organic Compounds and Binding to Proteins
date: 2020-05-28
journal: bioRxiv
DOI: 10.1101/2020.05.26.116459
sha: 
doc_id: 104122
cord_uid: klvx927g

Here, we introduce a new strategy to estimate free energies using single end-state molecular dynamics simulation trajectories. The method is adopted from ANI-1ccx neural network potentials (Machine Learning) for the Atomic Simulation Environment (ASE) and predicts the single point energies at the accuracy of CCSD(T)/CBS level for the entire configurational space that is sampled by Molecular Dynamics (MD) simulations. Our preliminary results show that the method can be as accurate as Bennet-Acceptance-Ration (BAR) with much reduced computational cost. Not only does it enable to calculate solvation free energies of small organic compounds, but it is also possible to predict absolute and relative binding free energies in ligand-protein complex systems. Rapid calculation also enables to screen small organic molecules from databases as potent inhibitors to any drug targets.

The life cycle of the virus as many other viruses has an essential proteolytic auto-processing step. 2 3-4 The 3CL-protease takes an important role in freeing individual functional proteins from single-chain polyprotein that has been translated by the host cell translation machinery. 5 Interfering to this step has been shown to have capability of inhibiting virus replication. 6 Therefore, 3CL-protease is one of the valid and most attractive drug targets for Covid-19

treatment.

Developing a drug molecule from scratch for curing a disease is quite expensive and time consuming. Instead, repurposing an FDA approved drug is a much faster and less expensive strategy since the time is very critical parameter in controlling such invasive viruses. Even testing all FDA approved drugs experimentally one by one may cause intolerable waste of the time. To help to ease the problem, computational calculations can be very handy. There have already been several computational studies about predicting an efficient FDA approved drugs that can be repurposed for COVID-19. [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] In searching for drug candidates as inhibitors, there are two core questions that can be addressed: (a) what is the binding mode? (i.e. where does it bind from the enzyme?) (b) what is the binding affinity? (binding free energy). 18 Inhibition of an enzyme can be involved in various mechanisms. For instance, an inhibitor can bind to the enzyme from the active site in the absence of substrate (competitive inhibition) forming an additional or it can bind in the presence of the substrate (i.e. enzyme-substrate complexed) from the allosteric site (uncompetitive inhibition).

Alternatively, it can reduce the activity of the enzyme by binding from allosteric site of the enzyme either free or complex (non-competitive inhibition). 19 In all these mechanisms the equilibrium constant, K s belonging to the reaction between the enzyme and substrate is shifted towards the reactants by existing inhibitor in the media. Kinetic studies to understand the activity loss of the enzyme on substrate by the inhibitor report values such as inhibition equilibrium constant, K i or half maximal inhibitory concentration, IC50. Both these experimental parameters are proportional (they are equal in special cases). Thermodynamic equilibrium constant K i is related to the standard binding Gibbs free energy of the system by:

The ΔG o can also be calculated from the thermodynamic potentials. Thus, it is possible to compare experimental ΔG o determined from K i and theoretical ΔG o predicted by thermodynamic potentials using computational methods. However, logarithmic relation between the K i and ΔG o complicates the reliability of the calculations. An error of 3-4 kcal is acceptable for the most computations, but this makes thousands of times less efficient inhibitor. Therefore, most computational methods fail to estimate the experimental binding affinity trend (relative binding free energies) among the inhibitors. Reproducing the absolute standard free energies of binding is even more difficult for the computational methods. Moreover, a trade-off between the computational accuracy and speed must be made.

However, almost all of the studies screen databases listing the FDA approved drugs at the molecular docking level, which is mostly based on geometrical alignment of a ligand into a binding pocket, namely the active site of 3C-like protease (3CL pro ). Despite the speed of docking approach, they are very coarse methods and almost never predict the correct experimental binding affinity trend among the inhibitors. [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] More sophisticated methods to calculate the potential binding free energy of inhibitor candidate to the protein ranges from post molecular dynamics simulations such as Molecular

Mechanics Poisson-Boltzmann Surface Area (MMPBSA) [20] [21] [22] [23] to perturbation methods such as Bennett acceptance ratio (BAR), 24-31 the latter being much more accurate. [42] [43] [44] are also commonly used and proven to be sufficiently accurate in estimating solvation free energies along with relative and absolute binding free energies. However, all these methods require very high computational cost. Screening all the drug candidates using these methods may not be very practical. For a typical free energy of binding, these methods require at least 30 λ-states for the decoupling of L from PL complex and another that many for decoupling L from water (i.e. solvation free energy of ligand), which increases the computational cost by 60-folds.

Machine Learning (ML) techniques have attracted a great interest for the past decade due to their successful algorithms to scientific questions including but not limited to chemical reactions, [45] [46] potential energy surfaces, [47] [48] [49] [50] forces, 51-53 atomization energies, 54-55 and proteinligand complex scorings. 56 . One of the most promising aspects of the ML techniques is that the trained models can be applied to new systems (transferrable). 57 Recently several models using active learning such as ANI-1, 58 ANI-1x 59 and ANI-1cxx 60 Here, we introduce a new strategy to estimate free energies of solvation of small organic compounds and binding to proteins in explicit solvent using single end-state MD simulations.

The method is adopted from ANI-1ccx neural network potentials (Machine Learning) for the The insertion of the ligand to an environment of solvent (solvation free energy) or receptor (binding free energy) can be defined by a coupling parameter, λ. 61 At each λ-state, the potential energy of the system would be:

where V is the interaction energy between the ligand and environment.

ܷ is the reference potential energy, which includes all energy terms except for the ligand-environment interaction.

The partition function is;

where F is the free energy depending on λ-state. Re-organizing the above equation,

The equation above is only valid when the ligand-environment interaction is proportional to coupling term λ. This results in;

In the limiting case, this equation yields as;

The equation implies that the free energy change is nearly half-the potential between the ligand-environment interaction in the complex configurational ensemble. 20 For the binding studies, the protein-ligand complex (~4700 atoms) was placed in the center of a dodecahedron box. For the solvation studies, the ligand was placed in the center of a cubic box.

Each system was solvated in the TIP3P model type water 64 Energy minimization was carried out to a maximum 100 kJ.mol -1 .nm -1 force using Verlet cutoff scheme. For both long range electrostatic and Van der Waals interactions, a cutoff length of 12 Å was used with the Particle Mesh Ewald method (PME) (6 th order interpolation). 65 The neighbor list update frequency was set to 1 ps -1 . As with our earlier studies, [66] [67] a stepwise energy minimization and equilibration schemes were used. Each minimization step consisted of a 5000 cycle of Steepest Descent and a subsequent 5000 cycles of l-bfgs integrators.

After minimization, each system was equilibrated within three steps using Langevin In order to test the ML-QM method, we run 10-ns MD simulations for the molecules whose solvation free energies have been extensively studied. We first built a system in which the ligand is replaced in the center of box and the box is solvated with just single water molecule. The ligands studied are given in Figure 1 . We prepared other independent simulations in which we increased the number water molecules (nH 2 O) in the box. Each time the average ML-QM energies were almost 0. This has been the case until nH 2 O=100. Figure 3a shows the interaction energy between the ligand and water molecules calculated by ML-QM energy difference:

is the total energy of the system including ligand and water.

are the energies of free ligand and free water molecules, respectively, which have been calculated by extracting from the trajectory of complex structure. When we kept adding more water in the system and running other MD simulations, we found that starting from nH 2 O=100, the system undergoes from the unbound state to bound state by a drastic energy jump after ~9.5 ns. The degree of freedom for the water molecules in the bound state must be It is apparent that the ML-QM interaction energy is constant for the simulations with more than sufficient waters exist. The empirically determined linearity constant is α =1/2, thus the free energy change is approximately half of the interaction energy between the ligand and environment. However, considering the experimental conditions are more or less similar, the energy profile (i.e. the trend among the inhibitors) should at least be reproduced. Here, we see that ML-QM is the most accurate method to yield the correct trend among the compared methods ( Figure 5 ). 

A Novel Coronavirus from Patients with Pneumonia in China

Coronavirus Main Proteinase (3CL pro ) Structure: Basis for Design of Anti-SARS Drugs

Virus-encoded proteinases and proteolytic processing in the Nidovirales

Conservation of substrate specificities among coronavirus main proteases

Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α -ketoamide inhibitors

In silico study the inhibition of angiotensin converting enzyme 2 receptor of COVID-19 by Ammoides verticillata components harvested from Western Algeria

State-of-the-art tools to identify druggable protein ligand of SARS-CoV-2

Molecular docking analysis of N-substituted oseltamivir derivatives with the SARS-Cov-2 main protease

Drug repurposing for coronavirus (COVID-19): in silico screening of known drugs against coronavirus 3CL hydrolase and protease enzymes

Reverse vaccinology approach to design a novel multi-epitope vaccine candidate against COVID-19: an in silico study

Repurposing of known anti-virals as potential inhibitors for SARS-CoV-2 main protease using molecular docking analysis

COVID-19 spike-host cell receptor GRP78 binding site prediction

Virtual screening and repurposing of FDA approved drugs against COVID-19 main protease

Molecular docking analysis of selected natural products from plants for inhibition of SARS-CoV-2 main protease

Molecular docking analysis of Withaferin A from Withania somnifera with the Glucose regulated protein 78 (GRP78) in comparison with the COVID-19 main protease

Ligand binding free energy and kinetics calculation in 2020

An intuitive look at the relationship of K-i and IC50: A more general use for the Dixon plot

The Mm/Pbsa and Mm/Gbsa Methods to Estimate Ligand-Binding Affinities

A. g_mmpbsa-A GROMACS Tool for High-Throughput MM-PBSA Calculations

A Comparative Linear Interaction Energy and MM/PBSA Study on SIRT1-Ligand Binding Free Energy Calculation

Revisiting Free Energy Calculations: A Theoretical Connection to MM/PBSA and Direct Calculation of the Association Free Energy

Efficient Estimation of Free Energy Differences from Monte Carlo Data

Efficiency of Alchemical Free Energy Simulations I: Practical Comparison of the Exponential Formula, Thermodynamic Integration and Bennett's Acceptance Ratio Method

Comparison of thermodynamic integration and Bennett's acceptance ratio for calculating relative protein-ligand binding free energies

Bayesian Estimation of Free Energies From Equilibrium Simulations

Unorthodox Uses of Bennett's Acceptance Ratio Method

Multiscale Free Energy Simulations: An Efficient Method for Connecting Classical MD Simulations to QM or QM/MM Free Energies Using Non-Boltzmann Bennett Reweighting Schemes

On Relation between the Free-Energy Perturbation and Bennett's

Acceptance Ratio Methods: Tracing the Influence of the Energy Gap

Comparison of efficiency and bias of free energies computed by exponential averaging, the Bennett acceptance ratio, and thermodynamic integration

Binding Thermodynamic Cycles: Hysteresis, the Locally Weighted Histogram Analysis Method, and the Overlapping States Matrix

Calculations of Solvation Free Energy through Energy Reweighting from Molecular Mechanics to Quantum Mechanics

Convergence of single-step free energy perturbation

Absolute Hydration Free Energies of Blocked Amino Acids: Implications for Protein Solvation and Stability

Bruckner, S.; Boresch, S. Efficiency of Alchemical Free Energy Simulations II: Improvements for Thermodynamic Integration

QM/MM free-energy perturbation compared to thermodynamic integration and umbrella sampling: Application to an enzymatic reaction

Separation-Shifted Scaling, a New Scaling Method for Lennard-Jones Interactions in Thermodynamic Integration

Efficient Determination of Protein-protein Standard Binding Free Energies from First Principles

Free-Energy Cost for Translocon-Assisted Insertion of Membrane Proteins

Scalable Molecular Dynamics with NAMD

Efficient Syntheses of Diverse, Medicinally Relevant Targets Planned by Computer and Executed in the

Predicting reaction performance in C-N cross-coupling using machine learning

The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics

Machine learning of accurate energy-conserving molecular force fields

Quantumchemical insights from deep tensor neural networks

First Principles Neural Network Potentials for Reactive Simulations of Large Molecular and Condensed Systems

Molecular Dynamics with On-the-Fly Machine Learning of Quantum-Mechanical Forces

Accurate interatomic force fields via machine learning with covariant kernels

Energy-free machine learning force field for aluminum

Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning

Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error

Protein-Ligand Scoring with Convolutional Neural Networks

ANI-1, A data set of 20 million calculated offequilibrium conformations for organic molecules

Less is more: Sampling chemical space with active learning

Linear interaction energy (LIE) models for ligand binding in implicit solvent: Theory and application to the binding of NNRTIs to HIV-1 reverse transcriptase

GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

Improved Side-Chain Torsion Potentials for the Amber ff99SB Protein Force Field

Molecular-dynamics study of atomic motions in water

Computational insights into the protonation states of catalytic dyad in BACE1-acyl guanidine based inhibitor complex

Docking, molecular dynamics and free energy studies on aspartoacylase mutations involved in Canavan disease

Electrostatics of nanosystems: application to microtubules and the ribosome