key: cord-0933166-ms4nde59
authors: Sulimov, A. V.; Ilin, I. S.; Kutov, D. C.; Stolpovskaya, N. V.; Shikhaliev, Kh. S.; Sulimov, V. B.
title: Supercomputing, Docking and Quantum Mechanics in Quest for Inhibitors of Papain-like Protease of SARS-CoV-2
date: 2021-08-09
journal: Lobachevskii J Math
DOI: 10.1134/s1995080221070222
sha: dc82890f53ac2f5e5c18ec6b07e5152c8341947b
doc_id: 933166
cord_uid: ms4nde59

Lomonosov-2 supercomputer is used to search for new organic compounds that can suppress the replication of the SARS-CoV-2 coronavirus. The latter is responsible for the COVID-19 pandemic. Docking and a quantum-chemical semiempirical atomistic modeling method are used to find inhibitors of the SARS-CoV-2 papain-like protease, which is one of the key coronavirus enzymes responsible for its replication. The atomistic model of the papain-like protease of this coronavirus is based on the high-resolution structure deposited in the Protein Data Bank. The SOL docking program has been used for virtual screening of more than [Formula: see text] low molecular weight molecules (ligands). Ligands with the highest protein-ligand binding energy, selected using the docking results, were subjected to quantum-chemical calculations. The latters are performed by the PM7 semiempirical method with the COSMO implicit solvent model using the MOPAC program. The enthalpy of protein-ligand binding is calculated for the best position of the ligand in the protein. [Formula: see text] ligands were selected for experimental in vitro testing as candidates for papain-like protease inhibitors base on docking and quantum-chemical results. In case of experimental confirmation, these compounds may become the basis for direct-acting antiviral drugs for the SARS-CoV-2 coronavirus.

The COVID-19 pandemic caused by the SARS-CoV-2 coronavirus makes the development of drugs that directly suppress viral replication extremely urgent. Molecules of such drugs should bind to the active sites of the proteins (therapeutic targets) that are responsible for replication of the virus, block the functioning of these proteins, and thereby suppress virus replication. Computer-aided structure based drug design is the most efficient way to design and discover such molecules, called inhibitors [1, 2] . To use this technology, you need to know the therapeutic targets as well as their 3D structures. Currently, the molecular mechanisms of the SARS-CoV-2 replication life cycle are largely understood, the therapeutic targets are identified [3] [4] [5] , and 3D structures of corresponding protein-ligand complexes are determined with high resolution and stores in Protein Data Bank (PDB) [6] . The PDB structures of the protein-ligand complexes are used to construct a full-atomic model of the corresponding target protein, which, in turn, is used to search for inhibitors of the target protein. The search is performed using docking programs that determine the position of the ligand (a molecule assumed to be an inhibitor) in the active site of the target protein and estimate the energy of the protein-ligand binding [7, 8] . The higher the binding energy, the more effective a drug can be created using such an inhibitor because a lower drug concentration should lead to the desired therapeutic effect.

Virtual screening of large databases of ligands using docking helps identify ligands with high proteinligand binding energy [9] . The inhibitory activity of best binding ligands should be tested experimentally using an in vitro protein-binding assay where the so-called IC 50 concentration is determined. The latter is the concentration of the inhibitor capable of suppressing the target protein by 50%. The experimentally confirmed inhibitors, the so-called "hits", are subject to further optimization using the chemical synthesis of new compounds to increase their inhibitory activity, aqueous solubility and other properties that a medicine should have. The discovery of new inhibitors launches the entire drug development pipeline, consisting of testing inhibitor's ability to suppress viral replication in cell cultures, preclinical studies in animal models, and several phases of clinical studies in humans. Thus, the development of inhibitors is a key step in the entire drug development process that determines the success of new drug discovery.

Currently, there are no drugs that inhibit druggable targets of the SARS-CoV-2 coronavirus, but many computational and experimental efforts were made in the past year to find such inhibitors among the existing and approved drug [10, 11] . Several reversible and irreversible (covalent) low molecular weight inhibitors of various targets have been found, mainly RNA-dependent RNA protease and main protease (M pro ). However, SAR-CoV-2 papain-like polymerase (P L pro ) is also a wide-recognized druggable target, but there are only a few publications on its inhibitors with experimental confirmation [4, 10] , and most of them have been identified among existing drugs.

We present here the results of virtual screening a database containing more than 16 thousand structures of low molecular weight organic molecules using our own SOL docking program. The target protein is the SARS-CoV-2 papain-like protease. For the ligands with best docking scores the binding enthalpy is calculated using a quantum-chemical semiempirical method and an implicit solvent model. 19 compounds have been selected for the further experimental in vitro validation on the base of best docking scores as well as best binding enthalpy values. The same approach and database were recently used to search for SARS-CoV-2 M pro inhibitors [12] , and later the inhibitory activity of some of the identified compounds was experimentally confirmed.

To predict the activity of ligands against SARS-CoV-2 P L pro , we use structure-based virtual screening combined with quantum-chemical refinement of docking results. For this, a P L pro model is prepared using available structures of this protein in the PDB. We focus on the P L pro structures that contain co-crystallized inhibitors, since the P L pro active site has a flexible BL2 loop containing Tyr268, which changes its conformation upon binding of the inhibitor [13] . In terms of biochemistry, P L pro is a cysteine endoproteinase which cleaves three viral proteins: nsp1, nsp2, and nsp3-from polyproteins of SARS-CoV-2. Being cleaved, these proteins start performing their function. Besides proteolysis of polyproteins P L pro also inhibits host innate immune responses helping the virus evade immune recognition and elimination. The catalytic triad of SARS-CoV-2 P L pro consists of Cys111, His278, and Asp293.

Validation of the model should be performed using docking of known non-covalent PLpro inhibitors. Low-molecular-weight organic molecules subject to docking are prepared with considering different protomers at pH 7.4 and conformers of macrocycles. The virtual screening consists of two main stages: docking with subsequent quantum-chemical calculations of the protein-ligand binding enthalpy. The final selection of possible actives relies upon three criteria: sufficiently negative docking score and binding enthalpy, and visual observation of ligand positions in the protein active site. 

The success of computer screening largely depends on the initial experimental structure of the protein used to prepare its complete atomic spatial model suitable for modeling. To date, there are only twelve SARS-CoV-2 P L pro structures in the PDB. Of these, five crystal structures are related to apo-form, i.e. without ligands. Other 5 complexes contain covalent inhibitors. The remaining two complexes, 7D7T and 7JRN, crystallized with non-covalent SARS-CoV-2 P L pro inhibitors, have many missed atoms and poor 3.15Å (7D7T) or satisfactory 2.5Å resolution (7JRN). Only three PDB structures of SARS-CoV-2 P L pro have resolution better than 2.0Å, and only one of them is crystallized with an inhibitor (PDB ID: 6WX4) [14] . This complex (see Fig. 1 ) contains a peptide inhibitor covalently bound to the catalytic cysteine (Cys111) of P L pro . For clarity, the P L pro is represented in Fig. 1 by the solvent accessible surface covering this protein. Because this complex has good resolution (1.66Å) and only few terminus residues missed, we choose the 6WX4 for our design efforts.

Preparing P L pro extracted from 6WX4 includes the following steps. First, the covalent inhibitor, the VIR251 tetrapeptide bound to the Cys111 sulfur atom, is removed. Next, protonation of the protein atoms at pH 7.4 is performed using the APLITE program [15] . The addition of hydrogen atoms to the protein is required because PDB structures usually have no hydrogen atoms at all. After that, the broken covalent bond of the sulfur atom is saturated by a hydrogen atom, and Cys111 is restored to the state before the covalent inhibitor is bound. Finally, the positions of the Cys111 atoms are optimized using the MMFF94 force field to eliminate possible changes caused by the binding of the covalent inhibitor. In this study, we are not targeting the pocket next to Cys111, as it is very narrow and appears to only be appropriate for covalent inhibitors. Instead, we are looking for inhibitors that can bind next to the catalytic site, including Cys111, which is responsible for the cleavage of a substrate peptide bond, namely in the S3 and S4 subpockets located at the entrance to the catalytic pocket. The non-covalent SARS-CoV-2 P L pro inhibitors bind to these particular pockets in the 7D7T and 7JRN complexes.

Due to the high similarity between the SARS-CoV-2 and SARS-CoV-1 coronaviruses (82% gene sequence identity), we perform cross-docking validation of non-covalent SARS-CoV-1 P L pro inhibitors to verify the reproducibility of experimental conformations of the inhibitors with our P L pro model. The obtained values of the scoring function of the docking program, corresponding to the docked positions of such inhibitors, estimate the threshold separating inhibitors from inactive ligands during virtual screening. To assess accuracy of cross-docking, we superimpose atoms of the 6WX4 complex onto atoms of two complexes, 3MJ5 [16] and 4OW0 [17] , of SARS-CoV-1 P L pro with non-covalent inhibitors. The superimposed inhibitors become "quasi-native" as if they are crystallized with the protein of the 6WX4 complex. This procedure allows one to estimate cross-docking accuracy in terms of RMSD values calculated between the "quasi-native" ligand position and a predicted docked position. RMSD values lower than 2.0Å are indicators of successful cross-docking and confirm applicability of the model for screening. The results of cross-docking for our P L pro model are presented in Table 1 .

As shown in Table 1 , docking with the prepared SARS-CoV-2 P L pro model reproduces the bioactive conformation reasonably well for one of the two known non-covalent inhibitors of SARS-CoV-1 P L pro . In the case of the 4OW0 complex, the reason for the failed docking may be that this inhibitor is bound to the SARS-CoV-1 P L pro and its activity against SARS-CoV-2 P L pro is unknown. Based on this validation and limited available data on known inhibitors, we used the model based on 6WX4 for our further virtual screening. Taking into account extremely high worldwide scientific activity in the search for inhibitors of SARS-CoV-2 target proteins, new structures of SARS-CoV-2 P L pro complexes with non-covalent inhibitors will appear soon in the Protein Data Bank, and they should be used for further validation of the constructed here model and possibly its modification.

In this study, the chemical library of Voronezh State University (VGU) is selected for in silico screening. This library consists of around 16 thousand off-the-shelf compounds having drug-like and lead-like properties. The library has been successfully exploited in several drug design projects in which kinase inhibitors, thrombin inhibitors, factor Xa/XIa inhibitors, plant growth stimulants were predicted and experimentally confirmed.

Preparation of the VGU library for docking includes two steps: protonation at pH 7.4 and conversion into 3D format. The first step is performed with Chemaxon's pKa module [18] capable of predicting distribution of different charged microspecies for an organic molecule at given pH. More technically, it calculates pKa values of the molecule based on its partial charge distribution, polarizability and inner H-bonds. In this study, a few charged species for a ligand molecule are considered in our virtual screening with keeping those which are at least 10% present at a physiological pH. The second step, computation of 3D geometry, is conducted by OpenBabel [19] with additional generation of conformers for macrocycles (limit-no more than three conformers for each charge state of molecule). The total size of the VGU library prepared for screening is 42829 three-dimensional molecular structures.

SOL [2, 7, [20] [21] [22] is a classic docking program using a genetic algorithm for global optimization, and a pre-calculated grid of potentials describing the interactions of ligand atoms with a protein within the MMFF94 force field [23] . The target energy function undergoing global optimization is the sum of the energy of the ligand in the field of the protein and the ligand internal stress energy, the latter is also calculated within the MMFF94 force field. The desolvation effect upon ligand binding is taken into account in a simplified form of the generalized Born implicit solvent model. The best ligand position in the protein active site corresponds to the lowest value of the target energy function. This ligand position is found using a genetic algorithm that transforms an initial random population of ligand positions in a protein (a population of individuals) through a series of successive generations of the population, keeping the population size constant. The transition from the previous generation to the next is carried out by selecting individuals (ligand positions) with lowest values of the target energy function using the operations of crossover, mutations, elitism and niching [2, [20] [21] [22] . The position of the ligand in the protein active site with the lowest target energy function in the last generation is the solution to the problem of global optimization.

The default values for the most important parameters of the genetic algorithm, population size and number of generations, are 30000 and 1000, respectively. Each independent run of this algorithm requires from 1 to a couple of dozens of minutes on one computing core using the default parameters. The running time of the algorithm depends both on the size of the ligand and on the complexity of the active center of the protein. The Fig. 2 shows two graphs of the dependence of the average time of SOL docking on the number of ligand atoms for two different SARS-CoV-2 target proteins P L pro and M pro [12] .

The complexity of the ligand (the number of rotating groups) does not affect the duration of the SOL calculations. The heuristic nature of the genetic algorithm requires confirmation of reliability of the 2.4. Protein-ligand Binding Enthalpy After virtual screening using docking, best selected ligands are subjected to quantum-chemical calculations. The enthalpy of protein-ligand binding H bind is calculated using following equation:

where H(P L),H(P ),H(L) are values of the heat of formation of the protein-ligand complex, protein and ligand, respectively. These values are calculated by the MOPAC program [25] using the PM7 semiempirical quantum-chemical method [26] and the COSMO solvent model [27] . The PM7 method is a state-of-the-art semiempirical method with dispersion and hydrogen and halogen bond corrections, parameterized on an unprecedentedly broad molecular data calculated using rigorous ab initio quantum-chemical methods. Dispersion and hydrogen bonds are extremely important for the correct description of intermolecular as well as intramolecular interactions. The MOZYME module [28, 29] implemented in MOPAC makes it possible to calculate such large molecular systems as protein-ligand complexes containing many thousands of atoms. The choice of the PM7 method with the COSMO solvent model for calculating energies is also made because this energy function demonstrates high docking accuracy [30] [31] [32] [33] .

The H(P L) is calculated as follows. The position of the ligand in the active site of the protein, found during docking, is used as a starting point for local optimization of the PM7 energy of the proteinligand complex in vacuum using the limited-memory Broyden-Fletcher-Goldfarb-Shanno method in MOPAC. The Cartesian coordinates of all ligand atoms are varied while the positions of all protein atoms are unchanged. In the optimized geometry, the energy is calculated using PM7 again, but with COSMO solvent. The heat of formation of the unbound ligand H(L) is found using the following procedure. Several low energy ligand conformations are generated using the Open Babel program [19] , and then the energies of these conformations are optimized by the PM7 method in vacuum by changing the Cartesian coordinates of all ligand atoms, the heats of formation of the optimized conformations are recalculated using the COSMO solvent model, and the most negative of them is used in the equation (1) as H(L). The heat of formation of the unbound protein is calculated by PM7+COSMO without energy optimization for the same protein conformation used in docking. 

After virtual screening of all molecular structures of the VGU library only 38 ligands demonstrated reasonable values of the SOL docking score < −5.0 kcal/mol. Only for these ligands the enthalpy of protein-ligand binding is calculated. As a result, 19 compounds have been selected for the experimental testing. They are presented in Table 2 . Two-dimensional structures of representatives of these compounds are shown in Fig. 2 , and their analogues are listed in Table 2 . SOL score is the estimation of the free energy of protein-ligand binding calculated by the SOL docking program.

H bind is the enthalpy of protein-ligand binding calculated by the equation (1).

As can be seen in Fig. 3 , the selected molecules do not contain warhead groups. These are reactive groups of atoms that are capable of forming covalent bonds with some protein residues. This highlights that we have mainly found non-covalent inhibitors of SARS-CoV-2 P L pro here. One exception is 17513 which is a derivative of dithiocarboxylic acid and its mercapto part can serve as a reactive group.

The known non-covalent inhibitors of SARS-CoV P L pro and SARS-CoV-2 P L pro , for which crystal complexes are resolved, are related to derivatives of naphthalenylmethylpiperidine. This moiety is not found among our top compounds. However, the common pharmacophore pattern resembling naphthalenylmethylpiperidine "-a positively charged nitrogen in proximity with an aromatic fragment can be found for 15108, 16189, 39544, 40858, 40863, 214132, 214136, 225718 . This finding underpins the importance of negatively charged Asp164 and a few aromatic residues (T yr264, T yr268, T yr273) for ligand binding. We can also note that the benzodioxole fragment found for one of known inhibitors of SARS-CoV P L pro (the native ligand from 3MJ5) is also present in 15108. Score 5.78 , 'H 53.41 Fig. 3 . Structures of compounds selected as P L pro inhibitors for in vitro experimental test (see Table 2 ).

The virtual screening of a drug-like database of organic compounds is performed using a two-step procedure. First, the SOL program is used to select most promising ligands on the base of their SOL score. Second, the enthalpy of protein-ligand binding is computed for these selected ligand using the quantum-chemical semiempirical PM7 method with the COSMO implicit solvent model. The most promising candidates 19 compounds have been submitted for experimental determination of their inhibitory activity against the SARS-CoV-2 papain-like protease.

The work is supported by Russian Ministry of Science and Higher Education, agreement no. 075-15-2019-1621. The research is carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University.

Supercomputing technologies in medicine

Docking: Molecular Modeling for Drug Discovery

COVID-19: Drug targets and potential treatments

Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods

Druggable targets from coronaviruses for designing new antiviral drugs

The protein data bank

Advances in docking

Docking paradigm in drug design

Supercomputer docking

Therapeutic targets and potential agents for the treatment of COVID-19

Chinese therapeutic strategy for fighting COVID-19 and potential small-molecule inhibitors against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)

In search of non-covalent inhibitors of SARS-CoV-2 main protease: Computer aided drug design using docking and quantum chemistry

Targeting SARS-CoV-2 proteases and polymerase for COVID-19 treatment: State of the art and future opportunities

Activity profiling and crystal structures of inhibitor-bound SARS-CoV-2 papainlike protease: A framework for anti-COVID-19 drug design

Influence of the method of hydrogen atoms incorporation into the target protein on the protein-ligand binding energy

Severe acute respiratory syndrome coronavirus papain-like novel protease inhibitors: Design, synthesis, protein-ligand X-ray structure and biological evaluation

X-ray structural and biological evaluation of a series of potent and highly selective inhibitors of human coronavirus papain-like proteases

ChemAxonsoftware

Open Babel: An open chemical toolbox

The SOL docking package for computer-aided drug design

Application of the docking program SOL for CSAR benchmark

Development of docking programs for Lomonosov supercomputer

Merck molecular force field

Supercomputer Lomonosov-2: Large scale, deep monitoring and fine analytics for the user community

Stewart Computational Chemistry. MOPAC2016

Optimization of parameters for semiempirical methods V I: More modifications to the NDDO approximations and re-optimization of parameters

COSMO: A new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient

Application of localized molecular orbitals to the solution of semiempirical self-consistent field equations

Application of the PM6 semi-empirical method to modeling proteins enhances docking accuracy of AutoDock

Evaluation of docking target functions by the comprehensive investigation of protein-ligand energy minima

The SQM/COSMO filter: Reliable native pose identification based on the quantum-mechanical description of protein-ligand interactions and implicit COSMO solvation

New generation of docking programs: Supercomputer validation of force fields and quantum-chemical methods for docking

Search for approaches to supercomputer quantum-chemical docking