key: cord-0978035-k6jpq930
authors: Yang, Fan; Zhang, Qi; Ji, Xiaokang; Zhang, Yanchun; Li, Wentao; Peng, Shaoliang; Xue, Fuzhong
title: Machine Learning Applications in Drug Repurposing
date: 2022-01-23
journal: Interdiscip Sci
DOI: 10.1007/s12539-021-00487-8
sha: 021298624510d5b0b1e9044dff21dad76946435a
doc_id: 978035
cord_uid: k6jpq930
The coronavirus disease (COVID-19) has led to a rush to repurpose existing drugs, although the underlying evidence base is of variable quality. Drug repurposing is a technique that takes existing known drugs or drug combinations and explores their use in an unexpected medical scenario. It therefore plays a vital role in accelerating the pre-clinical process of designing novel drugs, saving time and cost compared with traditional de novo drug discovery. Since drug repurposing depends on massive observed data from existing drugs and diseases, the rapid growth of publicly available large-scale machine learning methods provides state-of-the-art applications of data science for identifying diseases, medicines, therapeutics, and targets with the least error. In this article, we introduce guidelines on strategies and options for using machine learning approaches to accelerate drug repurposing. We discuss how to employ machine learning methods in studying precision medicine and, as an example, how machine learning approaches can accelerate COVID-19 drug repurposing by developing traditional Chinese medicine therapy. This article provides a strong rationale for employing machine learning methods for drug repurposing, including in the fight against the COVID-19 pandemic.
As of January 27, the cumulative number of patients infected with SARS-CoV-2 worldwide has exceeded 100 million, and the total number of deaths has reached 2 million. There are still 30 million confirmed patients who have not received specific drugs for treatment.
There are two main strategies for finding effective treatments for SARS-CoV-2 quickly. One is de novo drug design, and the other is drug repurposing. De novo drug design is the process of identifying new drugs or drug combinations starting from the structure of the receptor protein. Drug repurposing is the process of identifying new effects of drugs or drug combinations based on existing drugs. The following sections describe how machine learning assists drug discovery and drug repurposing, and the progress made in discovering therapeutic drug combinations against COVID-19.
De novo drug discovery currently faces grand challenges, such as long research and development cycles, high cost, and a limited experimental success rate. In recent years, drug research and development in the pharmaceutical industry has been slowing year by year. However, employing machine learning methods to mine the properties and activities of compounds can save time and cost efficiently. The characteristics of drug compounds can be represented by molecular fingerprints. Molecular fingerprints include static fingerprints and dynamically generated fingerprints, the latter of which can be inferred automatically during training by machine learning approaches. Specifically, representation learning employs neural network-based approaches to train embeddings of compound features directly. Lusci et al. [1] employed the UG-RNN (undirected graph recursive neural network) model to learn a vector representation of the molecular structure, which was then passed to a fully connected neural network. Duvenaud et al. [2] converted molecular structures into molecular fingerprints via a GCN (graph convolutional network) model.
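The idea of a learned, graph-based fingerprint can be sketched in a few lines. The example below is a minimal illustration in the spirit of graph-convolution fingerprints, not the implementation of any cited study: the atom features, the three-atom toy molecule, the layer sizes, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def neural_fingerprint(atom_features, neighbors, hidden_weights, output_weights):
    """Return a fixed-length fingerprint for one molecular graph.

    atom_features: one feature vector per atom
    neighbors: neighbors[i] lists the atoms bonded to atom i
    hidden_weights, output_weights: one matrix per layer (hypothetical)
    """
    fingerprint = [0.0] * len(output_weights[0])
    h = [list(f) for f in atom_features]
    for W_h, W_o in zip(hidden_weights, output_weights):
        new_h = []
        for i, own in enumerate(h):
            # Message passing: add each neighbor's features to the atom's own.
            pooled = list(own)
            for j in neighbors[i]:
                pooled = [p + q for p, q in zip(pooled, h[j])]
            new_h.append([math.tanh(v) for v in matvec(W_h, pooled)])
        h = new_h
        # Each atom adds a softmax distribution over fingerprint positions.
        for atom in h:
            for k, v in enumerate(softmax(matvec(W_o, atom))):
                fingerprint[k] += v
    return fingerprint


# Toy molecule: heavy atoms of ethanol, C-C-O.
# Features per atom (hypothetical): [is_carbon, is_oxygen, degree].
rng = random.Random(0)
atoms = [[1.0, 0.0, 1.0], [1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
bonds = [[1], [0, 2], [1]]
hidden = [random_matrix(3, 3, rng) for _ in range(2)]
output = [random_matrix(8, 3, rng) for _ in range(2)]
fp = neural_fingerprint(atoms, bonds, hidden, output)
```

Because the fingerprint is a sum over atoms, it does not depend on the order in which atoms are listed, and its length stays fixed regardless of molecule size. In practice the weights would be trained end to end against a property such as activity or solubility.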
A widely used algorithm for generating new compounds is the variational autoencoder (VAE) [3], in which an encoder maps the compounds' feature space to a latent space and a decoder maps latent points back to a representation of the characteristics of the original compounds. Combining VAE with GAN (generative adversarial network) has gained rising attention for generating new compounds. Compared with VAE, the RNN (recurrent neural network) is another feasible way to design compounds, and it is better at learning the probability distribution of the feature space. Jin et al. proposed the GCPN model [4], which constructs the molecular graph structure using a graph convolutional neural network within a reinforcement learning framework. In addition, GCPN integrated GAN to minimize the bias between the generated distribution and the original distribution. GraphAF [5] is a compound generation model that combines an autoregressive model with flow-based generation. The experimental results showed that GraphAF can generate 68% chemically valid molecules without any prior chemical knowledge. MoFlow [6] is a flow-based model that can generate molecular graphs with a certain validity guarantee by linking two probability distributions of the adjacency matrix. Jin et al. [7] proposed a generative model that borrows the molecular pair approach to generate a set of molecular rationales (molecular substructures). A neural network-based approach was then employed to combine the rationales and design molecules that simultaneously conform to multiple objectives. Bung et al. [8] proposed a generative model, used as a pre-trained model, to learn the distribution of the physical and chemical characteristics of compounds; the model can be used to identify chemical frameworks for SARS-CoV-2 3CLpro. Zhavoronkov et al. [9] employed deep learning-based methods consisting of autoencoders (AE), GAN, and reinforcement learning to identify small molecules that can inhibit the SARS-CoV-2 3CL protease.
Computational method-assisted drug design mainly employs molecular docking and network pharmacology. To date, a considerable volume of work has focused on network pharmacology and molecular docking methods based on traditional Chinese medicine (TCM) to fight COVID-19 [10, 11]. The results indicate that TCM compounds can exert a therapeutic effect by acting directly on the new coronavirus or indirectly through anti-inflammatory and immune regulation. Ren et al. [12] employed data-driven approaches to obtain TCM prescriptions as potential treatments for pestilence by analyzing classical prescriptions. They targeted Mpro (3CL hydrolase) and ACE2 (angiotensin-converting enzyme 2) as the key docking targets. Their analysis also found that Gancao (licorice), Huangqin (Scutellaria), Dahuang (rhubarb), and Chaihu (Bupleurum) contain more compounds that potentially act on these targets. Yan et al. [13] employed network pharmacology and molecular docking to explore the potential targets, signaling pathways, and biological functions of Lianhua Qingwen capsule in treating COVID-19, drawing on six widely used medicine databases combined.
As described above, de novo strategies have significantly enriched drug discovery, though the cost and time of drug design are usually unaffordable. By contrast, drugs with a known mechanism of action and known pharmacokinetics provide prior knowledge of the specific domain. When a new potential effect of a known drug is discovered, the drug can be used more effectively and safely without starting from scratch. In such cases, the time and economic cost of developing "old drugs for a new use" are much smaller.
Drug repurposing is a plausible and highly promising strategy that has attracted growing attention from governments and pharmaceutical companies for its savings in time and cost. AI-assisted drug repositioning further reduces time and economic costs. The workflow of drug repurposing is described in Fig. 1. Drug repurposing can benefit from new computational methods for detecting relationships among various types of biological entities such as genes, proteins, diseases, and drugs. Such methods are well suited to identifying alternative therapeutic indications for existing drugs.
Since the outbreak of the COVID-19 epidemic, drug repurposing has become the most used method for identifying therapeutic drugs or potential drug combinations, owing to the long cycle of de novo drug design. Several drugs, such as chloroquine phosphate and remdesivir, have been evaluated for their therapeutic effect on COVID-19. Compared with the traditional drug discovery strategy, drug repurposing offers high efficiency and low cost, which is a particular advantage in a pandemic such as COVID-19. Baricitinib is a drug for treating rheumatoid arthritis. After the outbreak of COVID-19, the company BenevolentAI used a drug knowledge graph to identify AAK1 as a regulatory factor in the process of SARS-CoV-2 infection, and baricitinib can inhibit the activity of AAK1 without obvious side effects. Guney et al. [14] quantified the therapeutic effects of drugs and predicted new drug-disease associations by using biological information to measure the interplay between drug targets and diseases. Zeng et al. [15] proposed an integrative deep network that builds a large network containing multiple relationship types, collecting a large volume of data from PubMed and the DrugBank database and embedding each entity as a vector representation.
The results showed that 41 repurposable drugs (including dexamethasone, indomethacin, niclosamide, and toremifene) were predicted as potential therapeutic drugs for treating SARS-CoV-2. The big picture of using machine learning methods for drug repurposing is shown in Fig. 2. Wu et al. [16] constructed a large computational biology network consisting of drugs, genes, and diseases to measure the interaction between targets and biomolecules, and treated drug combination as another form of drug repurposing. Beck et al. [17] developed the MT-DTI (molecule transformer-drug target interaction) model based on pre-trained drug-target interactions. MT-DTI can be used to predict the affinity between compounds and target proteins in order to identify commercially available drugs that could act on viral proteins of SARS-CoV-2. The results highlighted atazanavir, a drug used to treat and prevent human immunodeficiency virus (HIV) infection, as a candidate. Kim et al. [18] identified potential associations between drugs and diseases by applying several machine learning methods (logistic regression, random forest, and SVM) to self-defined similarity metrics (drug-drug similarities and disease-disease similarities). Hooshmand et al. [19] mined possible therapeutic drugs for SARS-CoV-2 by analyzing the chemical structures of small-molecule drugs. DeepPurpose [20] is a deep learning toolkit for drug-target interaction (DTI) prediction that integrates encoders for drug molecules and protein amino acid sequences. Belyaeva et al. [21] proposed a causal framework that uses multiple data modalities to generate a causal network whose nodes represent COVID-19 and aging. The method integrated transcriptomics, proteomics, and other omics data.
Research based on real-world data can take advantage of massive data to reflect the actual process of diagnosis and treatment and the health status of patients in real situations.
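The network-based studies above rest on the idea that a drug is more likely to be relevant when its targets lie close to disease genes in an interaction network. The sketch below illustrates that idea with a simple closest-distance score on a toy graph. The node names, drug names, and function names are hypothetical, and the statistical normalization against randomized node sets used in published network-proximity work is omitted.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closest_proximity(graph, drug_targets, disease_genes):
    """Average, over drug targets, of the distance to the nearest disease gene."""
    total = 0
    for target in drug_targets:
        dist = shortest_path_lengths(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if not reachable:
            # No path from this target to any disease gene.
            return float("inf")
        total += min(reachable)
    return total / len(drug_targets)


# Toy undirected interaction network (hypothetical nodes).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "F"), ("F", "G")]
interactome = {}
for u, v in edges:
    interactome.setdefault(u, set()).add(v)
    interactome.setdefault(v, set()).add(u)

disease_genes = {"C", "D"}
drug_targets = {"drug_x": {"B", "C"}, "drug_y": {"G"}}

# Smaller scores mean the drug's targets sit closer to the disease module.
scores = {name: closest_proximity(interactome, targets, disease_genes)
          for name, targets in drug_targets.items()}
ranking = sorted(scores, key=scores.get)
```

Ranking candidate drugs by such a score is only a first filter; a real analysis would use a curated interactome and compare each score with a random baseline before drawing conclusions.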
Since traditional statistical methods are limited in handling large volumes of data, deep learning can be employed to mine massive real-world data, owing to its outstanding power. Liu et al. [22] developed a framework for drug repurposing that estimates the effect of a single drug while taking into account the feature space of existing known drugs. Given a cohort of patients, candidate drugs were extracted, and for each drug the patients were divided into an intervention group and a control group. Confounders and disease progression in the two cohorts were estimated, LSTM combined with an attention mechanism was used to correct the bias, and finally drugs with a therapeutic effect were obtained.
Compared with the aforementioned studies on mining small-molecule repurposing drugs, there are only a few studies on using machine learning to find TCM treatments for COVID-19 [23]. Wang et al. [24] used an ontology-based side-effect prediction framework (OSPF) integrating neural network-based methods to evaluate the TCM prescriptions officially recommended by the Chinese health ministry for the treatment of COVID-19. The results showed that QFPD-T, HSBD-F, PMSP, GCT-CJ, SF-ZSY, and HSYF-F can be regarded as potential therapeutic treatments. Liao et al. [25] used deep learning approaches to mine the relationship between patients' facial features and prescriptions, and proposed convolutional neural networks that generate TCM prescriptions from a patient's face image. Guo et al. [26] conducted hierarchical clustering of TCM compounds using unsupervised methods to classify the compounds into several modules with similar therapeutic functions. The method investigates the polypharmacological effects of TCM, helps clarify its mechanism of action, and provides new possibilities for disease treatment. Weng et al. [27] proposed a framework for automated medical knowledge graph construction based on semantic analysis, which automatically extracted semantic reasoning through the graph. The computed TCM prescription can be introduced to diagnosis based on clinical symptoms.
Table 1 Overview of computational drug repurposing studies
Study | Year | Method | Strategy | Drug type
Duvenaud et al. [2] | 2015 | GCN | De novo drug design | General drug
[3] | 2018 | VAE | De novo drug design | General drug
Jin et al. [4] | 2019 | Junction tree encoder-decoder, VAE | De novo drug design | General drug
Shi et al. [5] | 2020 | Flow-based autoregressive model | De novo drug design | General drug
Zang et al. [6] | 2020 | Flow-based graph generative model | De novo drug design | General drug
Jin et al. [7] | 2020 | Graph generative models | De novo drug design | General drug
Bung et al. [8] | 2021 | Transfer learning, reinforcement learning | De novo drug design | Drug for SARS-CoV-2
Zhavoronkov et al. [9] | 2021 | Autoencoder, GAN, reinforcement learning | De novo drug design | Drug for SARS-CoV-2
Guney et al. [14] | 2016 | Network-based | Drug repurposing | General drug
Zeng et al. [15] | 2020 | DGL-KE, RotatE | Drug repurposing | Drug for SARS-CoV-2
Wu et al. [16] | 2013 | Network-based | Drug repurposing | General drug
Beck et al. [17] | 2020 | MT-DTI, Molecule Transformer | Drug repurposing | Drug for SARS-CoV-2
Kim et al. [18] | 2019 | Logistic regression, random forest, SVM | Drug repurposing | Traditional Chinese medicine
Hooshmand et al. [19] | 2020 | MM-RBM | Drug repurposing | Drug for SARS-CoV-2
Huang et al. [20] | 2020 | Reinforcement learning | Drug repurposing | General drug
Belyaeva et al. [21] | 2021 | Causal network models | Drug repurposing | Drug for SARS-CoV-2
Liu et al. [22] | 2021 | Causal inference, LSTM | Drug repurposing | General drug
Wang et al. [24] | 2021 | ANN | Drug repurposing | Traditional drug
Liao et al. [25] | 2018 | CNN | Drug repurposing | Traditional Chinese medicine
Guo et al. [26] | 2019 | Unsupervised clustering | Drug repurposing | Traditional Chinese medicine
Ruan et al. [28] | 2019 | Graph embedding based framework | Drug repurposing | Traditional Chinese medicine
Wang et al. [29] | 2019 | SVM, DT, KNN | Drug repurposing | Traditional Chinese medicine
Liu et al. [30] | 2019 | Attention, LSTM | Drug repurposing | Traditional Chinese medicine
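The bias-correction step described above for real-world cohorts can be illustrated with a much simpler stand-in. The sketch below uses inverse probability weighting, with the treatment probability estimated by counting within one binary confounder, in place of the LSTM-with-attention model. The cohort, the variable names, and all numbers are hypothetical and are not taken from any cited study.

```python
from collections import defaultdict


def make_cohort():
    """Build a toy cohort of (severe, treated, improved) records."""
    counts = [
        # severe, treated, n_patients, n_improved (all hypothetical)
        (0, 1, 10, 8),
        (0, 0, 30, 21),
        (1, 1, 30, 12),
        (1, 0, 10, 3),
    ]
    cohort = []
    for severe, treated, n, n_improved in counts:
        for k in range(n):
            cohort.append((severe, treated, 1 if k < n_improved else 0))
    return cohort


def naive_effect(cohort):
    """Difference in improvement rates, ignoring the confounder."""
    treated = [y for _, t, y in cohort if t == 1]
    control = [y for _, t, y in cohort if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def propensity_by_stratum(cohort):
    """Estimate P(treated | severe) by counting within each stratum."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for severe, t, _ in cohort:
        n[severe] += 1
        n_treated[severe] += t
    return {s: n_treated[s] / n[s] for s in n}


def ipw_effect(cohort):
    """Difference in weighted improvement rates (normalized weights)."""
    e = propensity_by_stratum(cohort)
    num = {1: 0.0, 0: 0.0}
    den = {1: 0.0, 0: 0.0}
    for severe, t, y in cohort:
        # Weight each patient by the inverse probability of the group they are in.
        w = 1.0 / e[severe] if t == 1 else 1.0 / (1.0 - e[severe])
        num[t] += w * y
        den[t] += w
    return num[1] / den[1] - num[0] / den[0]


cohort = make_cohort()
naive = naive_effect(cohort)
adjusted = ipw_effect(cohort)
```

In this toy cohort the unadjusted comparison makes the drug look harmful, because severe patients are treated more often and improve less often, while the weighted comparison recovers the benefit seen within each severity group. A sequence model such as an LSTM plays the role of the counting step when confounders are high-dimensional and time-varying.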
Table 1 gives an overview of computational drug repurposing studies, including the adopted strategies, computational approaches, and main techniques. Machine learning methods play a vital role in drug repurposing. Traditional machine learning methods, such as logistic regression, random forest, support vector machine, KNN, and RotatE [15, 18, 29], were mainly used in the early stage. In recent years, deep learning methods have shown greater power in identifying repurposable drugs, including RNN [1], GCN [2], CNN [25], GNN [7, 28], LSTM [22, 30], VAE [4], and Transformer methods [17]. In addition, deep learning approaches can extract more informative features of molecules and map molecular structures to latent spaces. Flow-based models, which can transform the distribution of features, have also received growing attention [5, 6].
Machine learning plays an important role in drug repurposing; especially since the emergence of COVID-19, scientists around the world have used machine learning-based approaches to identify effective drugs. Some problems remain. One is the black-box nature of deep learning, which makes it hard to explain the rationale behind predicted repurposable drugs. It is therefore necessary to develop interpretable deep learning and causal learning alongside traditional drug discovery experiments. Another is how to integrate general machine learning methods into drug development, in particular how to better characterize molecules and their conformational changes and better extract molecular features. By developing machine learning methods, we can accelerate drug discovery and improve human health in ways that were not possible before.
[1] Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules
[2] Convolutional networks on graphs for learning molecular fingerprints
[3] Automatic chemical design using a data-driven continuous representation of molecules
[4] Learning multimodal graph-to-graph translation for molecular optimization
[5] GraphAF: a flow-based autoregressive model for molecular graph generation
[6] MoFlow: an invertible flow model for generating molecular graphs
[7] Multi-objective molecule generation using interpretable substructures
[8] De novo design of new chemical entities for SARS-CoV-2 using artificial intelligence
[9] Potential non-covalent SARS-CoV-2 3C-like protease inhibitors designed using generative deep learning approaches and reviewed by human medicinal chemist in virtual reality
[10] The pharmacological mechanism of Huashi Baidu formula for the treatment of COVID-19 by combined network pharmacology and molecular docking
[11] Network pharmacology and molecular docking analyses on Lianhua Qingwen capsule indicate AKT1 is a potential target to treat and prevent COVID-19
[12] Identifying potential treatments of COVID-19 from traditional Chinese medicine (TCM) by using a data-driven approach
[13] Mechanism and material basis of Lianhua Qingwen capsule for improving clinical cure rate of COVID-19: a study based on network pharmacology and molecular docking technology
[14] Network-based in silico drug efficacy screening
[15] Repurpose open data to discover therapeutics for COVID-19 using deep learning
[16] Network-based drug repositioning
[17] Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target interaction deep learning model
[18] Drug repositioning of herbal compounds via a machine-learning approach
[19] A multimodal deep learning-based drug repurposing approach for treatment of COVID-19
[20] DeepPurpose: a deep learning based drug repurposing toolkit
[21] Causal network models of SARS-CoV-2 expression and aging to identify candidates for drug repurposing
[22] A deep learning framework for drug repurposing via emulating clinical trials on real-world patient data
[23] Traditional Chinese herbal medicine-potential therapeutic application for the treatment of COVID-19
[24] Evaluating the traditional Chinese medicine (TCM) officially recommended in China for COVID-19 using ontology-based side-effect prediction framework (OSPF) and deep learning
[25] Convolutional herbal prescription building method from multi-scale facial features
[26] Exploration of the mechanism of traditional Chinese medicine by AI approach using unsupervised machine learning for cellular functional similarity of compounds in heterogeneous networks, Xiaoerfupi granules as an example
[27] Framework for automated knowledge graph construction towards traditional Chinese medicine
[28] Discovering regularities from traditional Chinese medicine prescriptions via bipartite embedding model
[29] Predicting meridian in Chinese traditional medicine using machine learning approaches
[30] AttentiveHerb: a novel method for traditional medicine prescription generation
On behalf of all authors, the corresponding author states that there is no conflict of interest.