key: cord-317227-zb434ve3 authors: Beck, Bo Ram; Shin, Bonggun; Choi, Yoonjung; Park, Sungsoo; Kang, Keunsoo title: Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target interaction deep learning model date: 2020-03-30 journal: Comput Struct Biotechnol J DOI: 10.1016/j.csbj.2020.03.025 sha: doc_id: 317227 cord_uid: zb434ve3 Abstract The infection of a novel coronavirus found in Wuhan of China (SARS-CoV-2) is rapidly spreading, and the incidence rate is increasing worldwide. Due to the lack of effective treatment options for SARS-CoV-2, various strategies are being tested in China, including drug repurposing. In this study, we used our pre-trained deep learning-based drug-target interaction model called Molecule Transformer-Drug Target Interaction (MT-DTI) to identify commercially available drugs that could act on viral proteins of SARS-CoV-2. The result showed that atazanavir, an antiretroviral medication used to treat and prevent the human immunodeficiency virus (HIV), is the best chemical compound, showing an inhibitory potency with Kd of 94.94 nM against the SARS-CoV-2 3C-like proteinase, followed by remdesivir (113.13 nM), efavirenz (199.17 nM), ritonavir (204.05 nM), and dolutegravir (336.91 nM). Interestingly, lopinavir, ritonavir, and darunavir are all designed to target viral proteinases. However, in our prediction, they may also bind to the replication complex components of SARS-CoV-2 with an inhibitory potency with Kd < 1,000 nM. In addition, we also found that several antiviral agents, such as Kaletra (lopinavir/ritonavir), could be used for the treatment of SARS-CoV-2. Overall, we suggest that the list of antiviral drugs identified by the MT-DTI model should be considered, when establishing effective treatment strategies for SARS-CoV-2. 
Coronaviruses (CoVs), belonging to the family Coronaviridae, are positive-sense enveloped RNA viruses that cause infections in birds, mammals, and humans (1) (2) (3). The family includes four genera: Alphacoronavirus, Betacoronavirus, Deltacoronavirus, and Gammacoronavirus (4). Two infamous infectious coronaviruses in the genus Betacoronavirus are severe acute respiratory syndrome coronavirus (SARS-CoV) (5) and Middle East respiratory syndrome coronavirus (MERS-CoV) (6), which have infected more than 10,000 people around the world in the past two decades. Unfortunately, these outbreaks were accompanied by high mortality rates (9.6% for SARS-CoV and 34.4% for MERS-CoV), indicating that effective treatment is urgently needed at the beginning of an outbreak to prevent its spread (7, 8). However, this cannot be achieved with the current drug development and approval system, under which newly developed drugs take several years to come to the market. The world is now facing the same situation as in the previous outbreaks due to a recent epidemic of atypical pneumonia (designated as coronavirus disease 2019; COVID-19) caused by a novel coronavirus (severe acute respiratory syndrome coronavirus 2; SARS-CoV-2) in Wuhan, China (5, 9). SARS-CoV-2, which belongs to Betacoronavirus, has a positive-sense single-stranded RNA [(+)ssRNA] genome (29,903 bp) that contains genes encoding 3C-like proteinase, RNA-dependent RNA polymerase (RdRp), 2'-O-ribose methyltransferase, spike protein, envelope protein, nucleocapsid phosphoprotein, and several unknown proteins, according to the genome sequencing data of SARS-CoV-2 (https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/). Typical clinical symptoms of COVID-19 are fever, dry cough, and fatigue, which appear after a latency of 3-7 days on average following infection. This is relatively slower than severe acute respiratory syndrome (SARS), which was caused by SARS-CoV (10).
During the life cycle of coronaviruses, the virus replicates via the following processes after entering the host cell: 1) translation of genomic RNA (gRNA), 2) proteolysis of the translated polyprotein with viral 3C-like proteinase, 3) replication of gRNA with the viral replication complex that consists of RNA-dependent RNA polymerase (RdRp), helicase, 3'-to-5' exonuclease, endoRNAse, and 2'-O-ribose methyltransferase, and 4) assembly of viral components (11) . These replication-associated proteins are the primary targets of post-entry treatment drugs to suppress viral replication. Although much intensive effort is being made worldwide to develop drugs or vaccines for SARS-CoV-2, patients currently suffering from COVID-19 cannot expect benefits from them due to the slow development process of novel drugs or vaccines. Thus, a rapid drug application strategy that can be immediately applied to the patient is necessary. Currently, the only way to address this matter is to repurpose commercially available drugs for the pathogen in so-called "drug-repurposing". However, in theory, artificial intelligence (AI)-based architectures must be taken into account in order to accurately predict drug-target interactions (DTIs). This is because of the enormous amount of complex information (e.g. hydrophobic interactions, ionic interactions, hydrogen bonding, and/or van der Waals forces) between molecules. To this end, we previously developed a deep learning-based drug-target interaction prediction model, called Molecule Transformer-Drug Target Interaction (MT-DTI) (12) . In this study, we applied our pre-trained MT-DTI model to identify commercially available antiviral drugs that could potentially disrupt SARS-CoV-2's viral components, such as proteinase, RNA-dependent RNA polymerase, and/or helicase. 
Since the model utilizes simplified molecular-input line-entry system (SMILES) strings and amino acid (AA) sequences, which are 1D string inputs, it can be quickly applied to target proteins that do not have experimentally confirmed 3D crystal structures, such as viral proteins of SARS-CoV-2. We share a list of top commercially available antiviral drugs that could potentially hinder the multiplication cycle of SARS-CoV-2 with the hope that effective drugs can be developed based on these AI-proposed drug candidates and act against SARS-CoV-2.
Amino acid sequences of 3C-like proteinase (accession YP_009725301.1), RNA-dependent RNA polymerase (accession YP_009725307.1), helicase (accession YP_009725308.1), 3'-to-5' exonuclease (accession YP_009725309.1), endoRNAse (accession YP_009725310.1), and 2'-O-ribose methyltransferase (accession YP_009725311.1) of the SARS-CoV-2 replication complex were extracted from the SARS-CoV-2 whole genome sequence (accession NC_045512.2) in the National Center for Biotechnology Information (NCBI) database.
Molecule transformer-drug target interaction (MT-DTI) was used to predict binding affinity values between commercially available antiviral drugs and target proteins. MT-DTI is based on the self-attention mechanism, which has shown remarkable success in the natural language processing (NLP) literature. MT-DTI is inspired by the idea that, for a chemist, understanding a molecule sequence is analogous to understanding a language. To apply the NLP model to drug-target interaction (DTI) tasks, MT-DTI is pre-trained with the 'chemical language' (represented as SMILES) of approximately 1,000,000,000 compounds. Similar to the NLP model, which successfully extracts complex patterns from word sequences, MT-DTI successfully finds useful information in DTI tasks. According to a previous study, it therefore shows the best performance and the most robust results on diverse DTI datasets (12).
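To illustrate what "1D string inputs" means in practice, the following minimal sketch splits a SMILES string and an amino acid sequence into tokens and maps them to integer ids, the usual first step before a self-attention model. The regular expression, the function names, and the example strings are assumptions made for illustration; this is not MT-DTI's actual tokenizer or vocabulary.

```python
import re

# Assumed token pattern (illustrative only, not MT-DTI's tokenizer):
# bracket atoms such as [C@H], two-letter halogens, two-digit ring
# closures such as %12, and otherwise any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN.findall(smiles)


def tokenize_protein(sequence):
    """Split an amino acid sequence into one-letter residue tokens."""
    return list(sequence.strip().upper())


def build_vocab(token_lists):
    """Assign an integer id to every distinct token; id 0 is padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Map tokens to their integer ids."""
    return [vocab[t] for t in tokens]


if __name__ == "__main__":
    drug_tokens = tokenize_smiles("C[C@H](N)C(=O)O")  # alanine, as a toy ligand
    protein_tokens = tokenize_protein("MKTAYIAK")     # hypothetical peptide
    vocab = build_vocab([drug_tokens, protein_tokens])
    print(encode(drug_tokens, vocab))
    print(encode(protein_tokens, vocab))
```

Because both inputs reduce to integer sequences, no 3D coordinates are needed at any point, which is what makes the approach applicable to proteins without solved structures.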
To train the model, the Drug Target Commons (DTC) database (13) and the BindingDB database (14) were manually curated and combined. Three types of efficacy values (Ki, Kd, and IC50) were integrated by a consistency-score-based averaging algorithm (15) to bring the Pearson correlation score above 0.9 in terms of Ki, Kd, and IC50. Since the BindingDB database includes a wide variety of species and target proteins, the MT-DTI model has the potential to predict interactions between antiviral drugs and SARS-CoV-2 proteins. After the MT-DTI prediction, the raw prediction results were screened for antiviral drugs that are FDA approved, target viral proteins, and have a Kd value less than 1,000 nM. SMILES containing salt forms were excluded from the final results, as the prediction focuses on pairs of a single molecule and the target protein. In addition, remdesivir was incorporated in the analysis, as its therapeutic potential against COVID-19 was recently suggested by Wang et al. (16) and by Gilead Sciences announcements (https://www.gilead.com/purpose/advancing-globalhealth/covid-19).
AutoDock Vina (version 1.1.2), a molecular docking and virtual screening application (17), was used to predict binding affinities (kcal/mol) between the 3C-like proteinase of SARS-CoV-2 and 3,410 FDA-approved drugs. SMILES of the 3,410 FDA-approved drugs were converted to the PDBQT format using Open Babel (version 2.3.2) (18) with the following options: --gen3d -p 7.4. Hydrogens were added to the 3C-like proteinase model using MGLTools (version 1.5.6) (19). Then, binding affinities between the protein and the FDA-approved drugs were calculated using AutoDock Vina with the exhaustiveness parameter set to 10.
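The screening criteria applied to the raw MT-DTI predictions (FDA approved, targeting viral proteins, Kd below 1,000 nM, no salt-form SMILES, with remdesivir included as an exception to the approval criterion) can be sketched as a simple filter. The record layout, the function names, and the use of a "." in the SMILES string as the salt-form test are assumptions for illustration, and the records in the usage example are toy data rather than results from the study.

```python
from dataclasses import dataclass

KD_CUTOFF_NM = 1000.0  # threshold stated in the text


@dataclass
class Prediction:
    drug: str
    smiles: str
    kd_nm: float
    fda_approved: bool
    targets_viral_protein: bool


def is_salt_form(smiles):
    # Assumption: a '.' in SMILES separates disconnected components,
    # which is how salts and counter-ions are usually written.
    return "." in smiles


def screen(predictions, always_include=("remdesivir",)):
    """Keep predictions that pass the screening criteria, best Kd first."""
    kept = []
    for p in predictions:
        if is_salt_form(p.smiles):
            continue
        if p.kd_nm >= KD_CUTOFF_NM:
            continue
        exempt = p.drug.lower() in always_include
        if exempt or (p.fda_approved and p.targets_viral_protein):
            kept.append(p)
    return sorted(kept, key=lambda p: p.kd_nm)


if __name__ == "__main__":
    # Toy records with placeholder SMILES; not data from the study.
    toy = [
        Prediction("drug_a", "CCO", 250.0, True, True),
        Prediction("drug_b_salt", "CC(=O)[O-].[Na+]", 50.0, True, True),
        Prediction("drug_c", "CCN", 5000.0, True, True),
        Prediction("remdesivir", "CCC", 113.13, False, True),
    ]
    for p in screen(toy):
        print(p.drug, p.kd_nm)
```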
To identify potent FDA-approved drugs that may inhibit the functions of SARS-CoV-2's core proteins, we used the MT-DTI deep learning-based model, which can accurately predict binding affinities based on chemical sequences (SMILES) and amino acid sequences (FASTA) of a target protein, without their structural information (12). This deep learning-based approach is particularly useful, since it does not require protein structural information, which can be a bottleneck for identifying drugs targeted at uncharacterized proteins with traditional three-dimensional (3D) structure-based docking approaches (20). Nevertheless, MT-DTI showed the best performance (12) when compared to a deep learning-based approach, DeepDTA (21), and two traditional machine learning-based algorithms, SimBoost (22) and KronRLS (23), on the KIBA (24) and DAVIS (25) data sets. Taking advantage of this sequence-based drug-target affinity prediction approach, binding affinities of 3,410 FDA-approved drugs against 3C-like proteinase, RdRp, helicase, 3'-to-5' exonuclease, endoRNAse, and 2'-O-ribose methyltransferase of SARS-CoV-2 were predicted. To confirm the performance of MT-DTI at least in silico, we compared the binding affinities of the 3,410 FDA-approved drugs predicted by MT-DTI to those estimated by AutoDock Vina (a widely used 3D structure-based docking algorithm). This was possible because the 3D structure of the 3C-like proteinase was recently determined by X-ray crystallography (PDBID 6LU7) (26). Significant negative correlations were observed in both the antiviral drug dataset (R = -0.34, p-value = 0.0071) and the FDA-approved drug dataset (R = -0.32, p-value < 2.2e-16) (Fig. 1). Because a higher score is better for MT-DTI whereas a lower score is better for AutoDock Vina, a negative correlation means that the results of the two algorithms are moderately similar.
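The agreement check between the two methods comes down to a Pearson correlation between per-drug scores. The sketch below computes R from two paired lists in plain Python; the function name and the four toy score pairs are assumptions for illustration and do not reproduce the study's data or its p-values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values: affinity scores (higher is better) and docking
    # energies in kcal/mol (lower is better) for four hypothetical drugs.
    affinity_scores = [7.0, 6.5, 6.0, 5.5]
    docking_energies = [-8.0, -7.5, -7.8, -6.0]
    r = pearson_r(affinity_scores, docking_energies)
    # A negative R means drugs ranked high by one method tend to be
    # ranked favorably (more negative energy) by the other.
    print(round(r, 3))
```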
While it is not possible to determine which algorithm is more reliable without various experimental evaluations, a previous study showed that the MT-DTI model is one of the best deep learning-based models for predicting the binding affinity between a given protein and compound (12). Therefore, we further applied the MT-DTI model to repurpose those FDA-approved drugs that have the potential to inhibit key proteins of SARS-CoV-2. The SARS-CoV-2 3C-like proteinase was predicted to bind with atazanavir (Kd 94.94 nM), followed by remdesivir, efavirenz, ritonavir, and other antiviral drugs with predicted Kd values above 100 nM (Table 1). No other protease inhibitor antiviral drug was found in the Kd < 1,000 nM range. Although there is no real-world evidence yet about whether these drugs will act as predicted against COVID-19, some case studies have been identified. For example, a docking study of lopinavir along with other HIV proteinase inhibitors against the CoV proteinase (PDBID 1UK3) suggests that atazanavir and ritonavir, which are listed in the present prediction results, may inhibit the CoV proteinase in line with the inhibitory potency of lopinavir (27). According to the DTI model, viral proteinase-targeting drugs were predicted to act more favorably on the viral replication process than on the viral proteinase (Tables 2-6). The results include antiviral drugs other than proteinase inhibitors, such as guanosine analogues (e.g., acyclovir, ganciclovir, and penciclovir), reverse transcriptase inhibitors, and integrase inhibitors. Among the prediction results, atazanavir was predicted to bind to RNA-dependent RNA polymerase (Kd 21.83 nM), helicase (Kd 25.92 nM), 3'-to-5' exonuclease (Kd 82.36 nM), 2'-O-ribose methyltransferase (Kd 390.67 nM), and endoRNAse (Kd 50.32 nM), which suggests that all subunits of the SARS-CoV-2 replication complex may be inhibited simultaneously by atazanavir (Tables 2-6).
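Because AutoDock Vina reports binding energies in kcal/mol while the MT-DTI predictions are given as Kd values in nM, it can help to place a Kd on an energy scale. The sketch below uses the standard thermodynamic relation ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state, together with the usual definition pKd = -log10(Kd in M). The temperature of 298.15 K and the function names are assumptions for illustration; this conversion is not part of the study's method.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)


def kd_nm_to_delta_g(kd_nm, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in nM."""
    kd_molar = kd_nm * 1e-9
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_molar)


def kd_nm_to_pkd(kd_nm):
    """pKd = -log10(Kd in mol/L), written for a Kd given in nM."""
    return 9.0 - math.log10(kd_nm)


if __name__ == "__main__":
    # Kd values quoted in the text for atazanavir, plus the 1,000 nM cutoff.
    for label, kd in [("3C-like proteinase", 94.94),
                      ("RdRp", 21.83),
                      ("screening cutoff", 1000.0)]:
        print(label, round(kd_nm_to_delta_g(kd), 2), round(kd_nm_to_pkd(kd), 2))
```

Under these assumptions, a Kd of 94.94 nM corresponds to roughly -9.6 kcal/mol, and the 1,000 nM cutoff corresponds to a pKd of 6.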
Also, ganciclovir was predicted to bind to three subunits of the SARS-CoV-2 replication complex: RNA-dependent RNA polymerase (Kd 11.91 nM), 3'-to-5' exonuclease (Kd 56.29 nM), and RNA helicase (Kd 108.21 nM). Lopinavir and ritonavir, the active ingredients of AbbVie's Kaletra, were both predicted to have a potential affinity to the SARS-CoV-2 helicase (Table 3) and have been suggested as potential MERS therapeutics (28). Recently, approximately $2 million worth of Kaletra doses were donated to China (29), and a previous clinical study of SARS by Chu et al. may support this decision (30). Another anti-HIV drug, Prezcobix of Johnson & Johnson, which consists of darunavir and cobicistat, was to be sent to China (29), and darunavir is also predicted to have a Kd of 90.38 nM against the SARS-CoV-2 helicase (Table 3). However, no current literature was found to support the use of darunavir as a CoV therapeutic. Although remdesivir is not an FDA-approved drug, its predicted potency against SARS-CoV-2 was as follows: against RNA-dependent RNA polymerase (Kd 20.
In many cases, DTI prediction models serve as a tool to repurpose drugs, that is, to develop novel usages of existing drugs. The application of DTI prediction in the present study may be useful for controlling unexpected and rapidly spreading infections such as SARS-CoV, Middle East respiratory syndrome (MERS-CoV), and SARS-CoV-2 at the frontline of disease control until better therapeutic measures are developed. Several recent studies have identified promising drug candidates that may help reduce symptoms of COVID-19 by inhibiting some aspects of SARS-CoV-2. For example, remdesivir and chloroquine showed inhibitory effects against SARS-CoV-2 in vitro (31). Another in vitro study found hydroxychloroquine to be more potent than chloroquine in inhibiting SARS-CoV-2 (32). Remdesivir and lopinavir/ritonavir (Kaletra) also reduced pneumonia-associated symptoms in some COVID-19 patients (33, 34).
However, these studies are based on previous knowledge that these drugs showed some inhibitory effects on similar coronaviruses such as SARS-CoV and/or MERS-CoV. In contrast, our approach was based on a pre-trained MT-DTI deep-learning model that understands drug-target interactions without domain knowledge (12). In fact, in a previous study, MT-DTI successfully identified the epidermal growth factor receptor (EGFR)-targeted drugs that are used in clinics (within the top-30 predicted candidates) among 1,794 chemical compounds registered in the DrugBank database (12), suggesting that 3D structural information of proteins and/or molecules is not necessarily required to predict drug-target interactions. Our results showed the following intriguing findings that need to be tested experimentally and clinically in the near future. First, MT-DTI showed broadly similar results to the conventional 3D structure-based prediction model, AutoDock Vina, but some differences were observed. For example, atazanavir, remdesivir, and efavirenz were the top three drugs predicted by MT-DTI to bind to the 3C-like proteinase of SARS-CoV-2, whereas saquinavir, nelfinavir, and grazoprevir were the top three drugs identified by AutoDock Vina (Fig. 1). Secondly, when the search space was expanded to all FDA-approved drugs, some immunosuppressant drugs (rapamycin and everolimus) and a drug for asthma and chronic obstructive pulmonary disease (COPD) (tiotropium bromide) were identified as promising candidates by MT-DTI. In contrast, AutoDock Vina predicted that purmorphamine, lumacaftor, and verrucarin A were the top three drugs that could bind to the 3C-like proteinase of SARS-CoV-2. However, there is currently no supporting evidence that these drugs may be effective in inhibiting SARS-CoV-2.
Lastly, atazanavir appears to be a promising candidate for the treatment of COVID-19, showing overall high predicted binding affinities among the tested antivirals for six proteins of SARS-CoV-2, including the 3C-like proteinase and the replication complex components (Tables 1-6 and S1-6). However, this prediction also needs to be validated in vitro, in vivo, and in a wide range of clinical trials for efficacy and safety. We hope our prediction results may support experimental therapeutic options for China and other countries suffering from the SARS-CoV-2 pandemic and align with recent clinical trials (35).
Beck B.R., Choi Y., and Park S. are employed by Deargen Inc. Shin B. is employed by Deargen Inc. as a part-time advisor. Kang K. is one of the co-founders of, and a shareholder in, Deargen Inc.
Fig. 1. Comparison of MT-DTI and AutoDock Vina results. 60 known FDA-approved antiviral drugs (left) and 3,410 FDA-approved drugs (right) were evaluated by means of the MT-DTI deep learning-based affinity score (higher is better) and the AutoDock Vina docking score (lower is better). Remdesivir, which is not an FDA-approved drug but is regarded as a promising antiviral drug for SARS-CoV-2, was included in this analysis.
Table 2. Drug-target interaction (DTI) prediction results of antiviral drugs available on the market against a novel coronavirus (SARS-CoV-2, NCBI reference sequence NC_045512.2) RNA-dependent RNA polymerase (accession YP_009725307.1). * indicates isomeric form SMILES.
[Table residue: SMILES strings for ganciclovir, penciclovir, ritonavir, dolutegravir, nelfinavir, and abacavir; truncated in extraction.]
This study was supported by a grant from the National R&D Program for Cancer Control, Ministry of Health & Welfare, Republic of Korea (1720100).
Coronavirus avian infectious bronchitis virus. Veterinary research
Human coronaviruses: a review of virus-host interactions
Coronavirus pathogenesis
Coronavirus genomics and bioinformatics analysis
Newly discovered coronavirus as the primary cause of severe acute respiratory syndrome
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
Middle East respiratory syndrome coronavirus (MERS-CoV)
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Systematic Comparison of Two Animal-to-Human Transmitted Human Coronaviruses: SARS-CoV-2 and SARS-CoV
Human Coronavirus: Host-Pathogen Interaction
Self-attention based molecule representation for predicting drug-target interaction
Drug Target Commons 2.0: a community platform for systematic analysis of drug-target interaction profiles
BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities
Global mapping of pharmacological space
Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro
AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading
Open Babel: An open chemical toolbox
AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility
Software for molecular docking: a review
DeepDTA: deep drug-target binding affinity prediction
SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines
Toward more realistic drug-target interaction predictions
Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis
Comprehensive analysis of kinase inhibitor selectivity
Structure of M pro from COVID-19 virus and discovery of its inhibitors
Lopinavir; a potent drug against coronavirus Infection: insight from molecular docking study. Archives of Clinical Infectious Diseases
drugmakers ship therapies to China, seeking to treat coronavirus. The Wall Street Journal
Role of lopinavir/ritonavir in the treatment of SARS: initial virological and clinical findings
Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro
In Vitro Antiviral Activity and Projection of Optimized Dosing Design of Hydroxychloroquine for the Treatment of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)
First Case of 2019 Novel Coronavirus in the United States
Clinical characteristics and therapeutic procedure for four cases with 2019 novel coronavirus pneumonia receiving combined Chinese and Western medicine treatment
More than 80 clinical trials launch to test coronavirus treatments