key: cord-0866297-cxouk3vv
authors: Arshia, Amir Hossein; Shadravan, Shayan; Solhjoo, Aida; Sakhteman, Amirhossein; Sami, Ashkan
title: De Novo design of Novel protease inhibitor candidates in the treatment of SARS-CoV-2 using deep learning, docking, and molecular dynamic simulations
date: 2021-10-25
journal: Comput Biol Med
DOI: 10.1016/j.compbiomed.2021.104967
sha: 1919f6461658001df68ea78690bf6b8ffd2c6be9
doc_id: 866297
cord_uid: cxouk3vv

The main protease of SARS-CoV-2 is a critical target for the design and development of antiviral drugs. In this study, 2.5 M compounds were used to train an LSTM generative network via transfer learning in order to identify the four best candidates capable of inhibiting the SARS-CoV-2 main protease. The network was fine-tuned over ten generations, with each generation resulting in higher binding affinity scores. The binding affinities and interactions between the selected candidates and the SARS-CoV-2 main protease were predicted by molecular docking with AutoDock Vina. The selected compounds interact strongly with the key Met165 and Cys145 residues. Molecular dynamics (MD) simulations were run for 150 ns on the top four ligands to validate the docking results. Additionally, root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), and hydrogen bond analysis strongly support these findings. Furthermore, the MM-PBSA free energy calculations revealed that these molecules have stable and favorable energies, resulting in strong binding to the Mpro binding site. The extensive computational and statistical analyses in this study indicate that the selected candidates are potential inhibitors of SARS-CoV-2 in an in-silico setting. However, additional in-vitro, in-vivo, and clinical trials are required to demonstrate their true efficacy.

The recent COVID-19 pandemic, caused by the Severe Acute Respiratory Syndrome benefit in patients with severe symptoms (17).
Thus, it is necessary to develop more effective and capable new chemical entities (NCEs) that specifically target the 3CL protease of the virus. Thanks to recent advances in AI, scientists can now extract existing knowledge and use it to explore the virtually limitless chemical space and create new small molecules with the biological and physicochemical properties needed to treat various diseases (18, 19). It is worth noting that artificial intelligence-

diagram depicting the generation, fine-tuning, and evaluation sessions

Afterward, 30 SMILES strings were randomly selected from the validated set, and their Tanimoto similarity to the other validated SMILES strings was calculated. The threshold was initially set to 0.05. If too few similar compounds were found, the threshold was increased incrementally until 1,000 candidate SMILES strings were obtained. Additionally, a new list of HIV inhibitor SMILES strings was added to the existing list. Ultimately, the list was docked with the 6LU7 protein.

Molecular docking

The crystal structure of the SARS-CoV-2 main protease (SARS-CoV-2 Mpro) in complex with the inhibitor N3 was downloaded from the RCSB Protein Data Bank (PDB ID: 6LU7) (10). AutoDockTools-1.5.6 was used to prepare the protein by removing water molecules and the native ligand from the active site, adding polar hydrogen atoms and charges, and converting the protein and ligand PDB files to PDBQT format. Molecular docking was performed using the AutoDock Vina virtual screening program (version 1.1.2) to determine which residues of the protein interact with specific ligands (23). To validate the docking protocol, 6LU7 was re-docked with its crystallographic inhibitor.
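The similarity-based candidate selection described before the docking section can be sketched as follows. This is a minimal illustration, not the authors' code. Fingerprints are represented as plain Python sets of on-bit indices (in practice they would come from a cheminformatics toolkit), and the names `tanimoto` and `select_candidates`, the step size of 0.05, and the toy data are all hypothetical. The source does not state the direction of the threshold; because raising it is said to yield more candidates, the sketch assumes it is a maximum Tanimoto distance (1 minus similarity) to any seed compound.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_candidates(seeds, pool, target, start=0.05, step=0.05):
    """Relax the cutoff until at least `target` pool members qualify.

    ASSUMPTION: the cutoff is a maximum Tanimoto distance (1 - similarity)
    to any seed, so a larger cutoff admits more compounds.
    """
    threshold = start
    while True:
        picked = [
            name
            for name, fp in pool.items()
            if any(1.0 - tanimoto(fp, seed) <= threshold for seed in seeds)
        ]
        # Stop when enough candidates are found or the cutoff cannot grow further.
        if len(picked) >= target or threshold >= 1.0:
            return picked[:target], threshold
        # Rounding avoids floating-point drift from repeated addition.
        threshold = min(1.0, round(threshold + step, 10))


# Toy data (hypothetical): one seed and three pool compounds.
seeds = [{1, 2, 3, 4}]
pool = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {7, 8, 9}}
picked, used_threshold = select_candidates(seeds, pool, target=2)
```

In the study the target would be 1,000 candidates and the seeds would be the 30 randomly chosen SMILES strings. The loop structure is the point here; the exact similarity criterion should be taken from the original implementation.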
The grid box's center points and dimensions were set to target the active site of the main protease protein (24), with the center at X: -11.493, Y: 10 possess the slightest resemblance to the new list were added. Moreover, the weight of each molecule was determined. A weight-adjusted score was defined and calculated for each SMILES string in order to prioritize the molecules with the smallest mass. The obtained list was sorted by weight-adjusted score; the first five entries were selected and added to the new list. Inspired by the way basic genetic algorithms reinforce random mutations, another five SMILES strings were randomly selected from the base generation (generation 0) and added to the list. In total, 50 SMILES strings were chosen based on these criteria.

The 50 SMILES strings were used to fine-tune the LSTM network, which was trained on this list for ten iterations. After the weights were updated, 10,000 new SMILES strings were generated for the subsequent generation; their validity, uniqueness, and originality were calculated; and the docking and fine-tuning steps were repeated for a total of ten generations, as illustrated in Figure 1 (b).

In this step, all ten generations were combined and sorted by binding affinity score. Of the 20,866 ligands generated in total, only those with a binding affinity score of less than ten were selected. Following that, the list was expanded to include HIV inhibitors and Remdesivir. Then, all of the SMILES strings and their associated properties were saved in a file, along with the ligands used to calculate binding affinity. was continued until all clusters were merged as one. According to Figure

The protein complexes were solvated in a cubic box with TIP3P water molecules (29) and then neutralized by adding 0.15 mol/L Na+/Cl- ions.
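To give a sense of scale for the 0.15 mol/L salt concentration, the number of Na+/Cl- pairs can be estimated from the box volume. The sketch below is plain arithmetic under stated assumptions: the 9 nm box edge is hypothetical (the source does not give the box size), the whole box volume is treated as solvent, and the extra counter-ions needed to cancel the net charge of the protein are ignored. The function name `ion_pairs` is illustrative only.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def ion_pairs(conc_mol_per_l: float, box_edge_nm: float) -> int:
    """Estimate the number of salt ion pairs in a cubic box.

    ASSUMPTIONS: the full box volume counts as solvent, and the ions
    needed to neutralize the solute's net charge are not included.
    """
    volume_m3 = (box_edge_nm * 1e-9) ** 3  # nm -> m, then cube
    volume_l = volume_m3 * 1e3             # 1 m^3 = 1000 L
    return round(conc_mol_per_l * AVOGADRO * volume_l)


# Hypothetical 9 nm cubic box at 0.15 mol/L.
n_pairs = ion_pairs(0.15, 9.0)
```

For this hypothetical box the estimate is on the order of tens of ion pairs, which is small compared with the number of water molecules in the system.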
The system was minimized using the steepest descent algorithm. The systems were then equilibrated under the NVT and NPT ensembles for 100 ps each, using a V-rescale (modified Berendsen) thermostat and a Parrinello-Rahman barostat. The systems were gradually heated from 0 to 300 K during NVT equilibration, and the pressure was set to 1 atm during NPT equilibration.

The "gmx covar" tool was used to generate the eigenvalues and eigenvectors by computing

. To this end, following validation, each SMILES string was docked with the SARS-CoV-2 main protease to identify those with the highest binding energy, which were then used to improve the network by fine-tuning it after each generation.

A recent study chose the binding affinity score between the novel generated ligands and the SARS-CoV-2 3CL protease as the primary measurement criterion. Additionally, a previous study on novel generated ligands achieved a maximum binding affinity score of -38.07 kJ/mol, whereas the maximum binding affinity score of the ligands selected here was -54 kJ/mol (the complete table is available in Supplementary Table 1).

In-silico studies

Computer-aided drug design (CADD) has emerged as a valuable approach in modern drug discovery due to its ability to significantly reduce the cost and labor associated with the process. Therefore, due to the reliability of their predictions, molecular docking,

The binding free energy between the selected ligands in the active site and the main protease was calculated using MM-PBSA. Table 2

In addition, the total binding free energies between the protein and the four selected ligands and N3 were decomposed per residue to identify the key residues involved in the

The radius of gyration (Rg) of each simulation system was calculated to determine the compactness of the structure. A lower degree of fluctuation that remains consistent throughout the simulation indicates that the system is more compact and rigid.
Thus, a stably folded protein is likely to maintain a relatively constant radius of gyration (50). The radius of gyration for each system is plotted against simulation time in Figure 10

Concise but Updated Comprehensive Review
Centers for Disease Control and Prevention. Novel coronavirus. Wuhan China: Information for Healthcare Professionals
Clinical features of patients
Novel 2019 coronavirus structure, mechanism of action, antiviral drug promises and rule out against its treatment
Coronaviruses - drug discovery and therapeutic options
Structural bioinformatics and its impact to biomedical science
Molecular modeling and chemical modification for finding peptide inhibitor against severe acute respiratory syndrome coronavirus main proteinase
Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors
Advancing drug discovery via artificial intelligence
Potential Drugs Targeting Early Innate Immune Evasion of SARS-Coronavirus 2 via 2'-O-Methylation of Viral RNA. Viruses
Knowledge-based structural models of SARS-CoV-2 proteins and their complexes with potential drugs
Screening of plant-based natural compounds as a potential COVID-19 main protease inhibitor: an in silico docking and molecular dynamics simulation approach
Identification of bioactive molecules from tea plant as SARS-CoV-2 main protease inhibitors
Rapid Identification of Potential Inhibitors of SARS-CoV-2 Main Protease by Deep Docking of 1
Ritonavir in Adults Hospitalized with Severe Covid-19
Deep reinforcement learning for de novo drug design
Deep learning enables rapid identification of potent DDR1 kinase inhibitors
A deep learning approach to antibiotic discovery
ChEMBL: a large-scale bioactivity database for drug discovery
Molecular sets (MOSES): A benchmarking platform for molecular generation models
AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading
Only one protomer is active in the dimer of SARS 3C-like proteinase
Virtual screening, molecular dynamics and structure-activity relationship studies to identify potent approved drugs for Covid-19 treatment
GROMACS: fast, flexible, and free
Binding mechanism of inhibitors to p38α MAP kinase deciphered by using multiple replica Gaussian accelerated molecular dynamics and calculations of binding free energies
Effect of mutations on binding of ligands to guanine riboswitch probed by free energy perturbation and molecular dynamics simulations
Optimized intermolecular potential functions for liquid hydrocarbons
A fast SHAKE algorithm to solve distance constraint equations for small molecules in molecular dynamics simulations
VMD: visual molecular dynamics
Pymol: An open-source molecular graphics tool. CCP4 Newsletter on protein crystallography
An Efficient Program for End-State Free Energy Calculations
Open Source Drug Discovery Consortium, Lynn A. g_mmpbsa--a GROMACS tool for high-throughput MM-PBSA calculations
Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models
Free Energy Calculations by the Molecular Mechanics
Target SARS-CoV-2: computation of binding energies with drugs of dexamethasone/umifenovir by molecular dynamics using OPLS-AA force field
Comprehensive in silico screening and molecular dynamics studies of missense mutations in Sjogren-Larsson syndrome associated with the ALDH3A2 gene
Farnoosh G. Disulfide bridge formation to increase thermostability of DFPase enzyme: A computational study
Deep Learning for Predicting Drug-Target Interactions: A Case Study
Learning-Based Potential Ligand Prediction Framework for COVID-19 with Drug-
De novo design of new chemical entities for SARS-CoV-2 using artificial intelligence
De novo design and bioactivity prediction of SARS-CoV-2 main protease inhibitors using recurrent neural network-based transfer learning. BMC Chemistry
Successful applications of computer aided drug discovery: moving drugs from concept to the clinic
Discovery of Ganoderma lucidum triterpenoids as potential inhibitors against Dengue virus NS2B-NS3 protease. Sci Rep
Virtual screening and molecular dynamics simulation study of plant-derived compounds to identify potential inhibitors of main protease from SARS-CoV-2. Brief Bioinformatics
Structure-based screening and validation of bioactive compounds as Zika virus methyltransferase (MTase) inhibitors through first-principle density functional theory, classical molecular simulation and QM/MM affinity estimation
Pharmacoinformatics and molecular dynamics simulation studies reveal potential covalent and FDA-approved inhibitors of SARS-CoV-2 main protease 3CLpro
A molecular modeling approach to identify effective antiviral phytochemicals against the main protease of SARS-CoV-2

This research has been partially funded by Shiraz University.