key: cord-0565570-peo6gywq
authors: Doshi, Siddhant; Chepuri, Sundeep Prabhakar
title: Dr-COVID: Graph Neural Networks for SARS-CoV-2 Drug Repurposing
date: 2020-12-03
journal: nan
DOI: nan
sha: 1a51c27821240bf155428f705d6db0c19b41b4ce
doc_id: 565570
cord_uid: peo6gywq

The 2019 novel coronavirus (SARS-CoV-2) pandemic has resulted in more than a million deaths, high morbidities, and economic distress worldwide. There is an urgent need to identify medications that would treat and prevent novel diseases like the 2019 coronavirus disease (COVID-19). Drug repurposing is a promising strategy to discover new medical indications of the existing approved drugs due to several advantages in terms of the costs, safety factors, and quick results compared to new drug design and discovery. In this work, we explore computational data-driven methods for drug repurposing and propose a dedicated graph neural network (GNN) based drug repurposing model, called Dr-COVID. Although we analyze the predicted drugs in detail for COVID-19, the model is generic and can be used for any novel diseases. We construct a four-layered heterogeneous graph to model the complex interactions between drugs, diseases, genes, and anatomies. We pose drug repurposing as a link prediction problem. Specifically, we design an encoder based on the scalable inceptive graph neural network (SIGN) to generate embeddings for all the nodes in the four-layered graph and propose a quadratic norm scorer as a decoder to predict treatment for a disease. We provide a detailed analysis of the 150 potential drugs (such as Dexamethasone, Ivermectin) predicted by Dr-COVID for COVID-19 from different pharmacological classes (e.g., corticosteroids, antivirals, antiparasitic). Out of these 150 drugs, 46 drugs are currently in clinical trials. Dr-COVID is evaluated in terms of its prediction performance and its ability to rank the known treatment drugs for diseases as high as possible. For a majority of the diseases, Dr-COVID ranks the actual treatment drug in the top 15.

The pandemic outbreak of the coronavirus disease 2019 (COVID-19) has affected about 56 million people, with more than a million deaths worldwide as of November 2020. The June 2020 Global Economic Prospects [1] estimated a 5.2% contraction in the global gross domestic product (GDP) in 2020, which would lead to the worst economic slowdown since the Second World War. The disease affects mammals' respiratory tract and shows symptoms similar to pneumonia, causing mild to severe respiratory tract infections [2]. The pathogen that causes COVID-19 belongs to the Coronaviridae family, which is a family of enveloped positive-strand RNA viruses that affect mammals, birds, and amphibians. The name coronavirus (CoV) derives from the crown-shaped spikes that project from the virus surface. Coronaviruses are grouped into four main genera: alphacoronavirus, betacoronavirus, deltacoronavirus, and gammacoronavirus. While deltacoronaviruses and gammacoronaviruses infect birds, alphacoronaviruses and betacoronaviruses infect mammals [3]. Out of the seven known strains of human CoVs (HCoVs), the three betacoronaviruses, namely, Middle East respiratory syndrome coronavirus (MERS-CoV), severe acute respiratory syndrome coronavirus (SARS-CoV), and the novel severe acute respiratory syndrome coronavirus (SARS-CoV-2) produce severe symptoms. In the past two decades, the world witnessed highly fatal MERS-CoV and SARS-CoV that led to global epidemics with high mortality. Although the 2003 SARS-CoV outbreak was controlled, it infected 8098 individuals and resulted in 774 deaths. As of November 2019, 2494 cases and 855 deaths were reported due to MERS-CoV, with the majority in Saudi Arabia [3]. In December 2019, similar cases were again reported in Wuhan City, China [4], wherein investigations confirmed it to be the third novel CoV, i.e., SARS-CoV-2, which is also referred to as HCoV-2019, 2019-nCoV, or colloquially simply as coronavirus [5]. Because SARS-CoV-2 is highly contagious, on 30 January 2020 the World Health Organization (WHO) declared it a public health emergency of international concern, warning all the countries with vulnerable health care systems [6].

The current treatment for COVID-19 is completely supportive and symptomatic as there are no specific known medicines. Several research groups around the world are trying to develop a vaccine that would prevent and treat SARS-CoV-2. Looking at the current unpredictable trajectory of how the disease spreads and the life cycle of the virus, there is an urgent need to develop preventive strategies against it. Given this strict timeline, a more realistic solution lies in drug repurposing or drug repositioning, which aims to identify new medical indications of approved drugs. Drug repurposing offers several advantages. It has a low risk of failure because the drug has already been approved and has fewer unknown adverse effects. It reduces the time frame for drug development as the drugs have passed all the pre-clinical trials and safety norms. Finally, compared to the discovery of a new drug, drug repurposing requires less economic investment and puts fewer lives of volunteers (particularly kids) involved in clinical trials at risk [7]. Examples of repurposed drugs include Sildenafil, which was initially developed as an antihypertensive drug and was later proved effective in treating erectile dysfunction by Pfizer [7], and Rituximab, which was originally used against cancer and was later proved effective against rheumatoid arthritis [7]. Even for COVID-19, drugs like Remdesivir (a drug for treating Ebola virus disease), Chloroquine/Hydroxychloroquine (antimalarial drugs), and Dexamethasone (an anti-inflammatory drug) are being repurposed and are under clinical trials as per the International Clinical Trials Registry Platform (ICTRP), which is a common platform maintained by WHO to track clinical trial studies across the world.

Drug repurposing involves identifying potential drugs and monitoring their in vivo efficacy and potency against the disease. The most critical step in this pipeline is identifying the right candidate drugs, for which experimental and computational approaches are usually considered. To identify potential drugs experimentally, a variety of chromatographic and spectroscopic techniques are available for target-based drug discovery. Phenotype screening is used as an alternative to target-based drug discovery when the identity of the specific drug target and its role in the disease are not known [7]. Recently, computational approaches have been receiving attention due to the availability of large biological data. Efficient ways to handle big data have opened up many opportunities in the field of pharmacology. Zitnik et al. [8] elaborate on several data-driven computational tools that integrate large volumes of heterogeneous data to solve problems in pharmacology such as drug-target interaction prediction (identifying interactions between a drug and its target genes), drug repurposing, and drug-drug interaction or side effect prediction, to list a few. This field is known as computational pharmacology. Many standard machine learning (ML) and deep learning (DL) techniques have been applied in computational pharmacology. Drug-drug interaction was formulated as a binary classification problem and solved using ML techniques like random forest, support vector machines (SVM), and naive Bayes [9], and using DL models like deep multi-layer perceptrons and recurrent neural networks, to name a few. DL techniques often outperform standard ML techniques [10, 11]. However, these methods lack the ability to capture the structural information in the data, specifically the connections between different biological entities (e.g., interactions between drugs and genes or between drugs and diseases). A natural and efficient way to represent such structural information is to construct a graph with nodes representing entities like drugs, genes, diseases, etc., and edges representing the complex interactions between these entities. Graph neural networks (GNNs) capture the structural information by accounting for the underlying graph structure while processing the data. Decagon, a GNN-based model designed for predicting the side effects of a pair of drugs, has proved its capability by outperforming the non-graph-based machine learning models in terms of its prediction performance [12]. Similarly, drug repurposing has been studied using computational methods such as signature matching methods, molecular docking, and network-based approaches. Recently, network-based and machine learning approaches [13] [14] [15] [16] [17], and GNN-based approaches [18, 19] have been proposed for drug repurposing.

In this work, we propose a GNN architecture for COVID-19 drug repurposing called Dr-COVID, which is a dedicated model for drug repurposing. We formulate our problem by constructing a four-layered heterogeneous graph comprising drugs, genes, diseases, and anatomies. We then build a deep learning model to predict the links between the drug and disease entities, where a link between a drug-disease entity suggests that the drug treats the disease. Specifically, Dr-COVID is based on the scalable inceptive graph neural network (SIGN) architecture [20] for generating the node embeddings of the entities. We propose a quadratic norm scoring function that rank orders the predicted drugs. All the network information and node features are derived from the drug repurposing knowledge graph (DRKG) [21] . DRKG is a biological knowledge graph compiled using several databases, and comprises entities like drugs, diseases, anatomies, etc., and their connections. We leverage their generic set of low-dimensional embeddings that represent the graph nodes and edges in the Euclidean space for training. We validate Dr-COVID's performance on the known drug-disease pairs. Although we present the results and analysis for COVID-19, Dr-COVID is generic and is useful for any novel human diseases. From a list of 150 drugs predicted by Dr-COVID for SARS-CoV-2, 46 drugs are currently in clinical trials. For a majority of diseases with known treatment, the proposed Dr-COVID model ranks the approved treatment drugs in the top 15, which suggests the efficacy of the proposed drug repurposing model. As we use the SIGN architecture that does many computations beforehand, Dr-COVID is computationally efficient as compared to the other GNN-based methods [18, 19] .

Specifically, in contrast to [18] we include additional entities such as anatomies as the side information in our graph. This additional information provides indirect interactions between the disease and gene entities.

The norm scorer we design captures correlations between the drug and disease pairs, and as a consequence, the model predicts many more drugs (e.g., Brexanolone) that are in clinical trials as compared to the existing GNN-based and network-based drug repurposing models.

In this section, we present the drugs predicted by Dr-COVID for COVID-19 according to their pharmacological classifications, and elaborate on their roles in treating the disease. We individually predict drugs for the 27 entities that specify the SARS-CoV-2 genome structure as identified by Gordon et al. [22]. This genome structure includes structural proteins, namely, envelope (SARS-CoV2-E), membrane (SARS-CoV2- [...]

[...] the better is the rank, as indicated by the rank bar on the right side. As can be seen, a major portion of the heatmap is covered with dark patches as we only consider the top 10 ranked drugs. We can infer from the heatmap that cardiovascular drugs (e.g., Captopril, Atenolol) and anti-inflammatory drugs (e.g., Celecoxib, Prednisone) are ranked high for the alphacoronaviruses, and a combination of antiparasitic (e.g., Ivermectin), corticosteroids (e.g., Prednisolone, Dexamethasone), antivirals (e.g., Cidofovir), and antineoplastic drugs (e.g., Methotrexate, Sirolimus) in the case of betacoronaviruses.

[...] drugs for the COVID-19 nodes. The majority of the corticosteroids we predict belong to the respiratory system (R) class, which has been the primary target for the coronaviruses, as reflected by the symptoms.

However, COVID-19 has a multi-organ impact on the human body and is not limited to the respiratory system [23] . Complications due to the cytokine storm with the effects of angiotensin converting enzyme (ACE) have led to cardiac arrest, kidney failure, and liver damage resulting in many deaths. For these reasons, we see drugs from various ATC classes are being considered for clinical trials. Next, we discuss in detail these pharmacological classifications of some of the predicted drugs.

Anti-inflammatory (AI) agents: Inflammatory cytokine storms are prominently evident in COVID-19 positive patients and timely anti-inflammation treatment is required [24]. Pneumonia caused by the coronavirus results in a huge amount of inflammatory cell infiltration leading to acute respiratory distress syndrome (ARDS), causing many deaths [25, 26]. A wide range of anti-inflammatory treatments including glucocorticoids, non-steroidal anti-inflammatory drugs (NSAIDs), immunosuppressants, inflam- [...]

Antiviral and anti-parasitic agents: Dr-COVID predicts nucleotide analogue antivirals like Acyclovir, Valaciclovir, Cidofovir, and Entecavir that have shown positive results in terminating the RNA synthesis catalyzed by polymerases of coronaviruses [27]. Ivermectin and Nitazoxanide are used against many parasite infestations and are also known to have antiviral properties. Mebendazole is another similar antiparasitic drug that Dr-COVID ranked high. One of the recent reports shows that Ivermectin is an effective inhibitor of SARS-CoV-2 and many other positive single-stranded RNA viruses. A 5000-fold reduction in the virus titer within 48 hours in cell culture was obtained with a single treatment (5 µM) of Ivermectin [28].

Statins and ACE inhibitors/beta-blockers/calcium channel blockers: Statins are lipid-lowering drugs that inhibit the cholesterol synthesis enzyme (also known as HMG-CoA reductase) and also have anti-inflammatory properties. Lipid metabolism has been implicated in the SARS-CoV-2 pathogenesis [29], due to which there are reports on including statins in the line of treatment for COVID-19. Dr-COVID predicts Atorvastatin, Simvastatin, and Rosuvastatin, all three of which are currently in clinical trials. On the contrary, some studies show that statins tend to increase the cellular expression of angiotensin-converting enzyme 2 (ACE2) [30], the receptor to which the SARS-CoV-2 spike protein binds to enter human cells [31]. Analyzing this issue, an observational study by Zang et al. [32] reported a reduced mortality rate in the patients treated with statins, and no adverse effect was observed when an ACE inhibitor drug was also added to the line of treatment. ACE inhibitors are cardiovascular drugs that relax the blood vessels and are primarily used to treat high blood pressure and heart failure. Beta-adrenergic and calcium channel blockers are similarly functioning drugs that lower blood pressure and are also currently being considered to treat COVID-19. Dr-COVID predicts Captopril (ACE inhibitor), Atenolol (beta-blocker), and Nifedipine (calcium channel blocker), which are currently in clinical trials. Additionally, the list of predicted drugs includes Spironolactone and Hydrochlorothiazide, which help prevent the body from absorbing too much salt, thereby lowering blood pressure and reducing the risk of cardiac failure.

Miscellaneous: Dr-COVID also predicts some of the pre-discovered vaccines such as the Rubella virus vaccine, which is mainly considered for healthcare workers, the Yellow fever vaccine [33], and the Ebola zaire virus vaccine (rVSV-ZEBOV). Further, the list of predicted drugs includes Mercaptopurine, an antineoplastic agent that has been considered as a selective inhibitor of SARS-CoV [34]. Other top-ranked drugs are the antidepressant Brexanolone, which is currently considered for patients on ventilator support due to ARDS, the vasodilators Nitroglycerine and Alprostadil, and nutritional supplements like Riboflavin (Vitamin B2) [35], Niacin, Cholecalciferol (Vitamin D3), and Iron. Interestingly, our list also includes Ephedra sinica root, a herb generally used to treat asthma and lung congestion and an ingredient of lung cleansing and detoxifying decoction (LCDD), which is a widely used traditional Chinese medicine [36].

In essence, Dr-COVID predicts drugs for COVID-19 from different pharmacological classes like the corticosteroids, antivirals, antiparasitics, NSAIDs, and cardiovascular drugs, as the disease does not target a particular anatomy and impacts multiple organs in the human body.

In this section, we describe the dataset that we use to train and test Dr-COVID for COVID-19 drug repurposing. We also describe how we model the data as a multilayer graph to capture the underlying complex interactions between different biological entities. We derive the required information from DRKG, which is a comprehensive biological knowledge graph relating genes, drugs, diseases, biological processes, side effects, and eight other entity types useful for computational pharmacological tasks like drug repurposing, drug discovery, and drug adverse effect prediction, to list a few. DRKG gathers all this information from six databases, namely, Drugbank [37], Hetionet [38], GNBR [39], STRING [40], IntAct [41], and DGIdb [42]. From DRKG, we consider four entities that are relevant to the drug repurposing task. The four entities are drugs (e.g., Dexamethasone, Sirolimus), diseases (e.g., Scabies, Asthma), anatomies (e.g., Bronchus, Trachea), and genes (e.g., Gene ID: 8446, Gene ID: 5529). All the genes are referred to by their respective Entrez IDs throughout the paper. We extract the details about these entities specifically from the Drugbank, Hetionet, and GNBR databases. We form a four-layered heterogeneous graph with one of these four entities in each layer, as illustrated in Fig 3a. The four-layered graph is composed of 8070 drugs, 4166 diseases, 29848 genes, 400 anatomies, and a total of 1,417,624 links, which include all the inter-layer and intra-layer connections. Next, we discuss the interactome that we consider for drug repurposing.

Interactome: There are inter-layered connections between the four layers and some have intra-layered connections. The inter-layered connections are of different types. The drug-disease links indicate treatment or palliation, i.e., a drug treats or has a relieving effect on a disease. For example, interaction between Ivermectin-Scabies (as seen in Fig 3b) and Simvastatin-Hyperlipidemia (as seen in Fig 3d) [...] Fig 3 (b, c, and d). These common gene targets between a drug and a disease are one of the reasons for the drug to be a potential repurposing candidate against the disease.

[...] CoV-NL63, and two non-human CoVs namely MHV, and IBV. We consider interactions of these disease nodes with human genes. There are 129 links between these six disease nodes and the gene nodes [21]. In addition, we consider all the 27 SARS-CoV-2 proteins (including the structured proteins, nsp, and orf) and their 332 links connecting the target human genes as given by Gordon et al. [22]. In other words, there are only disease-gene interactions available for these COVID-19 nodes. With this available information, we train Dr-COVID to predict possible drug connections for these COVID-19 nodes.
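As a rough illustration of the layered graph described above, the following sketch assembles a tiny heterogeneous graph from (head, relation, tail) triples and lists the 2-hop neighbors of a drug. The node names come from examples in the text, but the `Type::name` prefixes, the relation labels, and the helper names `build_graph` and `k_hop` are assumptions made for this example and need not match how DRKG stores its data.

```python
from collections import defaultdict

# Toy triples (head, relation, tail). Node names follow examples in the text;
# the "Type::name" prefixes and relation labels are illustrative assumptions.
TRIPLES = [
    ("Drug::Ivermectin", "treats", "Disease::Scabies"),
    ("Drug::Ivermectin", "targets", "Gene::8614"),
    ("Gene::8614", "expressed_in", "Anatomy::Lung"),
    ("Drug::Dexamethasone", "treats", "Disease::Asthma"),
]

def build_graph(triples):
    """Return (node -> index, undirected adjacency sets, nodes grouped by layer)."""
    index = {}
    layers = defaultdict(list)
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        for node in (head, tail):
            if node not in index:
                index[node] = len(index)
                layers[node.split("::")[0]].append(node)
        # The graph is treated as undirected, so store both directions.
        adj[index[head]].add(index[tail])
        adj[index[tail]].add(index[head])
    return index, adj, layers

def k_hop(adj, source, k):
    """Nodes whose shortest distance from `source` is exactly k hops."""
    seen, frontier = {source}, {source}
    for _ in range(k):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return frontier

index, adj, layers = build_graph(TRIPLES)
two_hop = k_hop(adj, index["Drug::Ivermectin"], 2)
```

In this toy graph the only 2-hop neighbor of Ivermectin is the anatomy Lung, reached through Gene ID: 8614, which mirrors the example used later for K-hop neighborhoods.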

In the last few years, deep learning has gained significant attention from a variety of scientific disciplines due to its extraordinary successes in solving many challenging tasks like data cleansing, mining, and classification, mainly for images, speech, or text datasets. However, in many applications, the structure underlying data is not always Euclidean. Some examples include social networks, transportation networks, brain networks, sensor networks, chemical molecules, protein-protein interactions, meshed surfaces in computer graphics, and the drug repurposing network, as discussed above, to list a few. For these applications, more recently, deep learning for graph-structured data, also known as geometric deep learning (GDL) [43], is receiving steady research attention. GDL aims at building neural network architectures known as graph neural networks (GNNs) to learn from graph-structured data. GDL models are used to learn low-dimensional graph representations or node embeddings by taking into account the nodal connectivity information. These embeddings are then used to solve many graph analysis tasks like node classification, graph classification, and link prediction, to list a few. GNN architectures are developed using concepts from spectral graph theory and generalize the traditional convolution operation in the convolutional neural network (CNN) to the graph setting. In this section, we describe the proposed Dr-COVID architecture for COVID-19 drug repurposing and describe numerical experiments performed to evaluate our model.

Consider an undirected graph $G = (V, E)$ with a set of vertices $V = \{v_1, v_2, \cdots, v_N\}$ and edges $e_{ij} \in E$ denoting a connection between nodes $v_i$ and $v_j$. We represent the graph $G$ using the adjacency matrix $A \in \mathbb{R}^{N \times N}$, where the $(i, j)$th entry of $A$, denoted by $a_{ij}$, is 1 if there exists an edge between nodes $v_i$ and $v_j$, and zero otherwise. To account for the non-uniformity in the degrees of the nodes, we use the normalized adjacency matrix $\tilde{A}$ (e.g., the symmetric normalization $\tilde{A} = D^{-1/2} A D^{-1/2}$, where $D$ is the diagonal degree matrix). Each node in the graph is associated with its own feature vector (referred to as the input feature). Let us denote the input feature of node $i$ by $x_i^{(0)} \in \mathbb{R}^{d}$, which contains key information or attributes of that node (e.g., individual drug side effects). Let $X^{(0)} \in \mathbb{R}^{N \times d}$ be the input feature matrix associated with the $N$ nodes in the graph $G$, obtained by stacking the input features of all the nodes in $G$. The new embeddings for a node are generated by combining information from its neighboring nodes (e.g., diseases or genes) to account for the local interactions. This process of combining information and generating new representations for a node is done by a single GNN block. If we stack $K$ such blocks, we can incorporate information for a node from its $K$-hop neighbors (e.g., in Fig 3c, the drug Ivermectin is a 2-hop neighbor of the anatomy Lung and is connected via Gene ID: 8614). Mathematically, this operation can be represented as

$$X^{(k+1)} = g_k\left(\bar{A} X^{(k)} W_k\right), \qquad (1)$$

where $X^{(k)} \in \mathbb{R}^{N \times d_k}$ represents the $k$th layer embedding matrix and $d_k$ is the embedding dimension in the $k$th layer. Here, $\bar{A} = I + \tilde{A}$, where the identity matrix $I \in \mathbb{R}^{N \times N}$ is added to account for the self-node embeddings, $W_k \in \mathbb{R}^{d_k \times d_{k+1}}$ is the learnable transformation matrix, and $g_k(\cdot)$ is the activation function in the $k$th layer. There exist several GNN variants such as graph convolutional networks (GCN) [44], GraphSAGE [45], graph attention networks (GAT) [46], and scalable inception graph neural network (SIGN) [20], to name a few. GCN is a vanilla flavored GNN based on Eq (1). GAT gives individual attention to the neighboring nodes instead of treating every node equally. To address the issue of scalability, GraphSAGE uses a neighbor sampling method, wherein instead of taking the entire neighborhood, we randomly sample a subset of neighbor nodes. SIGN takes a different approach to solve the scalability issue and introduces a parallel architecture.
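To make the single GNN block concrete, here is a minimal NumPy sketch of one propagation step of the form "aggregate over neighbors, transform, activate". It assumes a symmetric degree normalization of the adjacency matrix and a ReLU activation; both are choices made for this example, and the function names `normalized_adjacency` and `gnn_block` are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (an assumed choice)."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5  # isolated nodes keep a zero row
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def gnn_block(A_bar, X, W):
    """One block: X_next = g(A_bar X W), with g = ReLU as an assumption."""
    return np.maximum(A_bar @ X @ W, 0.0)

# Toy path graph 0 - 1 - 2, so nodes 0 and 2 are 2-hop neighbors.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_bar = np.eye(3) + normalized_adjacency(A)  # identity keeps each node's own features

rng = np.random.default_rng(0)
X0 = rng.normal(size=(3, 4))   # N = 3 nodes, d = 4 input features
W0 = rng.normal(size=(4, 2))
W1 = rng.normal(size=(2, 2))

X1 = gnn_block(A_bar, X0, W0)  # mixes 1-hop information
X2 = gnn_block(A_bar, X1, W1)  # stacking two blocks reaches 2-hop neighbors
```

The entry of `A_bar` linking nodes 0 and 2 is zero, while the same entry of `A_bar @ A_bar` is positive, which is the matrix view of the statement that K stacked blocks collect information from K-hop neighbors.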

The proposed Dr-COVID architecture is based on the SIGN approach due to its computational advantages.

The lists of drugs predicted by the other GNNs are available in our repository. Next, we describe the proposed Dr-COVID architecture.

A. Dr-COVID architecture

The proposed GNN architecture for SARS-CoV-2 drug repurposing has two main components, namely, the encoder and decoder. The encoder based on the SIGN architecture generates the node embeddings of all the nodes in the four-layer graph. The decoder scores a drug-disease pair based on the embeddings.

The encoder and decoder networks are trained in an end-to-end manner. Next, we describe these two components of the Dr-COVID architecture, which is illustrated in Fig 4.

Encoder: The Dr-COVID encoder is based on the SIGN architecture [20], which provides low-dimensional node embeddings based on the input features and nodal connectivity information. Recall that the matrix $A$ is the adjacency matrix of the four-layered graph $G$ and $\tilde{A}$ is the normalized adjacency. SIGN uses linear diffusion operators represented using matrices $F_r$, $r = 1, 2, \cdots$, to perform message passing and aggregate local information in the graph. By choosing $F_r = \tilde{A}^r$ we can incorporate information for node $v$ from its $r$-hop neighbors. Here, $\tilde{A}^r$ denotes the $r$th matrix power. To start the information exchange between the nodes, we assume that each node has its own $d$-dimensional feature, which we collect in the matrix $X \in \mathbb{R}^{N \times d}$ to obtain the complete input feature matrix associated with the nodes of $G$. Following SIGN [20], we can then represent the encoder as

$$Z = \sigma_1\left\{\left[X\Theta_0 \,\|\, F_1 X \Theta_1 \,\|\, \cdots \,\|\, F_r X \Theta_r\right]\right\}, \qquad Y = \sigma_2\left\{Z W\right\}, \qquad (2)$$

where $Y$ is the final node embedding matrix for the nodes in the graph $G$, $\|$ denotes concatenation, $\sigma_1\{\cdot\}$ and $\sigma_2\{\cdot\}$ are nonlinear activation functions, and $\{\Theta_0, \cdots, \Theta_r, W\}$ are the learnable parameters. The main benefit of using SIGN over other sequential models (e.g., GCN, GAT, GraphSAGE) is that the matrix product $F_r X$ is independent of the learnable parameters $\Theta_r$. Thus, this matrix product can be pre-computed before training the neural network model. Doing so reduces the computational complexity without compromising the performance.
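The pre-computation argument can be seen in a few lines of NumPy. The sketch below is a SIGN-style encoder under stated assumptions: the diffusion operators are powers of a normalized adjacency matrix, the branch outputs are concatenated, and tanh and ReLU are used as the two activations. The activations, the hidden width, and the function names are illustrative choices for this example, not details confirmed by the text.

```python
import numpy as np

def precompute_diffusion(A_norm, X, r):
    """Return [X, A X, ..., A^r X]. No learnable parameter is involved,
    so this can be computed once before training starts."""
    feats = [X]
    for _ in range(r):
        feats.append(A_norm @ feats[-1])
    return feats

def sign_encoder(feats, thetas, W):
    """Parallel branches F_k X Theta_k are concatenated and mixed by W.
    tanh and ReLU are assumed activations for this sketch."""
    Z = np.tanh(np.concatenate([F @ Th for F, Th in zip(feats, thetas)], axis=1))
    return np.maximum(Z @ W, 0.0)

rng = np.random.default_rng(0)
N, d, hidden, l, r = 5, 8, 6, 4, 2  # toy sizes (the text uses d = 400, l = 250, r = 2)

# Toy graph: a 5-node ring, where every node has degree 2.
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
A_norm = A / 2.0  # equals D^{-1/2} A D^{-1/2} because D = 2I here

X = rng.normal(size=(N, d))
feats = precompute_diffusion(A_norm, X, r)              # done once, outside training
thetas = [rng.normal(size=(d, hidden)) for _ in range(r + 1)]
W = rng.normal(size=(hidden * (r + 1), l))
Y = sign_encoder(feats, thetas, W)                      # one l-dimensional row per node
```

During training only `thetas` and `W` change, so each step touches dense matrices of fixed size instead of propagating over the graph again, which is where the computational saving comes from.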

In our setting, we choose r = 2, i.e., the low-dimensional node embeddings have information from 2-hop neighbors. Choosing r ≥ 3 is not useful for drug repurposing, as we aim to capture the local information of the drug targets such that a drug node embedding should retain information about its target genes and the shared genes in its vicinity. For example, the 1-hop neighbors of Dexamethasone, as shown in Fig 3b, are the diseases it treats (e.g., Asthma), the drugs similar to Dexamethasone (e.g., Methylprednisolone), and its target genes (e.g., Gene ID: 8446, Gene ID: 387). The 2-hop neighbors are the anatomies of the target genes (e.g., Bronchus) of Dexamethasone, and the drugs that have similar effects on the diseases (e.g., Hydrocortisone and Dexamethasone have similar effects on Asthma). It is essential for the embedding related to Dexamethasone to retain this local information for the drug repurposing task, and not much benefit is obtained by propagating deeper in the network.

Decoder: For drug repurposing, we propose a score function that takes as input the embeddings of the drugs and diseases and outputs a score based on which we decide if a certain drug treats the disease. Fig 4 illustrates the proposed decoder. The embedding matrix $Y$ contains the embeddings of all the nodes in the four-layer graph, including the embeddings of the disease and drug nodes. Let us denote the embedding of the $i$th drug as $y_{c_i} \in \mathbb{R}^{l}$ and the embedding of the $j$th disease as $y_{d_j} \in \mathbb{R}^{l}$. The proposed scoring function $f(\cdot)$ to infer whether drug $c_i$ is a promising treatment for disease $d_j$ is defined as

$$s_{ij} = f(y_{c_i}, y_{d_j}) = \sigma\left\{ y_{c_i}^{T} \Phi \, y_{d_j} \right\}, \qquad (3)$$

where $\sigma\{\cdot\}$ is the nonlinear sigmoid activation function and $\Phi \in \mathbb{R}^{l \times l}$ is a learnable coefficient matrix.

We interpret $s_{ij}$ as the probability that a link exists between drug $c_i$ and disease $d_j$. The term $y_{c_i}^{T} \Phi \, y_{d_j}$ can be interpreted as a measure of correlation (induced by $\Phi$) between the disease and drug node embeddings.
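The decoder amounts to a sigmoid applied to a bilinear form of the two embeddings. The following dependency-free sketch scores drug-disease pairs this way and rank orders candidate drugs for one disease. The two-dimensional embeddings, the identity coefficient matrix, and the drug names are toy values invented for the example, and `score` and `rank_drugs` are hypothetical helper names.

```python
import math

def score(y_drug, y_disease, Phi):
    """sigmoid(y_drug^T Phi y_disease): a probability-like link score."""
    bilinear = sum(
        y_drug[a] * Phi[a][b] * y_disease[b]
        for a in range(len(y_drug))
        for b in range(len(y_disease))
    )
    return 1.0 / (1.0 + math.exp(-bilinear))

def rank_drugs(drug_embeddings, y_disease, Phi):
    """Drug names sorted by decreasing score for a single disease."""
    scored = {name: score(y, y_disease, Phi) for name, y in drug_embeddings.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Toy values (assumptions): l = 2 and Phi = I, so the bilinear form is a dot product.
Phi = [[1.0, 0.0],
       [0.0, 1.0]]
y_disease = [1.0, 0.0]
drug_embeddings = {
    "drug_a": [2.0, 0.0],    # aligned with the disease embedding
    "drug_b": [-1.0, 3.0],   # negatively correlated
    "drug_c": [0.0, 1.0],    # orthogonal, so the score is exactly 0.5
}
ranking = rank_drugs(drug_embeddings, y_disease, Phi)
```

With a learned, non-identity coefficient matrix the same code weighs every pair of embedding coordinates, which is what lets the scorer capture correlations that a plain dot product would miss.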

We use d = 400 and l = 250 in our implementation. The model is trained in a mini-batch setting in an end-to-end fashion using stochastic gradient descent to minimize the weighted cross entropy loss, where the loss function for the sample corresponding to the drug-disease pair $(i, j)$ is given by

$$\ell(s_{ij}, z_{ij}) = -\left[ w \, z_{ij} \log(s_{ij}) + (1 - z_{ij}) \log(1 - s_{ij}) \right], \qquad (4)$$

where $z_{ij}$ is the known training label associated with score $s_{ij}$ for the drug-disease pair, with $z_{ij} = 1$ indicating that drug $i$ treats disease $j$ and $z_{ij} = 0$ otherwise. Here, $w$ is the weight on the positive samples that we choose to account for the class imbalance. As discussed in the Dataset Section, we include the no-drug- [...] The weight $w$ (in Eq (4)) is also chosen to be the class imbalance ratio of each batch, i.e., we fix $w$ to be 1.5.
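A small numerical sketch may help show what the positive-sample weight does. It assumes the standard weighted binary cross-entropy form, with the weight w = 1.5 from the text applied only to the positive term; the clipping constant `eps` and the function names are additions made for numerical safety and illustration.

```python
import math

def weighted_bce(s, z, w=1.5, eps=1e-12):
    """Weighted cross entropy for one drug-disease pair.
    s: predicted score in (0, 1); z: label (1 = drug treats disease, 0 = no link)."""
    s = min(max(s, eps), 1.0 - eps)  # avoid log(0); eps is an assumption
    return -(w * z * math.log(s) + (1 - z) * math.log(1.0 - s))

def batch_loss(scores, labels, w=1.5):
    """Mean loss over a mini-batch of (score, label) pairs."""
    return sum(weighted_bce(s, z, w) for s, z in zip(scores, labels)) / len(scores)

# An equally uncertain prediction costs 1.5 times more on a positive pair.
loss_pos = weighted_bce(0.5, 1)
loss_neg = weighted_bce(0.5, 0)
toy_batch = batch_loss([0.9, 0.2, 0.5], [1, 0, 1])
```

Because missed positives are penalized more heavily than missed negatives, the optimizer is discouraged from trivially predicting "no link" when negative pairs outnumber positive ones in a batch.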

We perform experiments on three sequential GNN encoder architectures, namely, GCN [44] , GraphSAGE [45] , and GAT [46] for the drug repurposing task, which we treat as a link prediction problem, and compare with the proposed Dr-COVID architecture. Specifically, the SIGN encoder in Dr-COVID is replaced with GCN, GraphSAGE, and GAT to evaluate the model performance. Two blocks of these sequential models are stacked to maintain the consistency with r = 2 of the Dr-COVID architecture. We evaluate these models on the test set, which are known treatments for diseases that are not shown to the model while training.

The model is evaluated based on two performance measures. First, we report the ability to classify the links correctly, i.e., to correctly predict the known treatments for diseases in the test set. This is measured through the receiver operating characteristic (ROC) curve of the true positive rate (TPR) versus the false positive rate (FPR). Second, using the list of predicted drugs for the diseases in the test set, we report the model's ability to rank the actual treatment drug as high as possible, where the ranking is obtained by ordering the scores in Eq (3). In addition, we compute the network proximity scores [13] and rank order the drugs based on these scores to compare with the GNN encoder models. These network proximity scores are a measure of the shortest distance between drugs and diseases. They are computed as

where $P_{ij}$ is the proximity score of drug $c_i$ and disease $d_j$. Here, $C$ is the set of target genes of $c_i$, $T$ is the set of target genes of $d_j$, and $d(p, q)$ is the shortest distance between a gene $p \in C$ and a gene $q \in T$ in the gene interactome. We convert these into Z-scores using the permutation test as

$$Z_{ij} = \frac{P_{ij} - \mu}{\omega}, \qquad (6)$$

where $\mu$ is the mean proximity score of $c_i$ and $d_j$, which we compute by randomly selecting subsets of genes with the same degree distribution as that of $C$ and $T$ from the gene interactome, and $\omega$ is the standard deviation of the scores generated in the permutation test. Table I gives the rankings, which clearly show that Dr-COVID results in better ranks on the unseen diseases than the other GNN variants. Dr-COVID also performs better than the network proximity measure, which is solely based on the gene interactome. We choose these drug-disease pairs for evaluation as these links are not shown during the training. The diseases on which we evaluate are not confined to a single anatomy (e.g., rectal neoplasms are associated with the rectum anatomy, whereas pulmonary fibrosis is a lung disease), nor do they require a similar family of drugs for their treatment (e.g., Fluorouracil is an antineoplastic drug, and Prednisone is an anti-inflammatory corticosteroid). This showcases our model's unbiased nature. For a majority of the diseases in the test set, Dr-COVID ranks the treatment drug in the top 10 (as seen in Table I).
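For readers who want to reproduce a proximity baseline of this kind, the sketch below computes a proximity score between two gene sets and its Z-score from a permutation test. Two points are assumptions rather than details taken from the text: the proximity is taken to be the "closest" measure (the average, over both sets, of the shortest distance to the nearest gene in the other set), and the random gene sets are only size-matched, whereas the text describes degree-matched sampling. The toy interactome and all function names are hypothetical.

```python
import random
import statistics
from collections import deque

def shortest_paths(adj, source):
    """Breadth-first hop distances from `source` in an unweighted interactome."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def proximity(adj, C, T):
    """Assumed 'closest' proximity between drug targets C and disease genes T."""
    def closest_sum(src, dst):
        total = 0.0
        for p in src:
            dist = shortest_paths(adj, p)
            total += min(dist.get(q, float("inf")) for q in dst)
        return total
    return (closest_sum(C, T) + closest_sum(T, C)) / (len(C) + len(T))

def z_score(adj, C, T, n_perm=200, seed=0):
    """Z = (P - mu) / omega, with mu and omega estimated from random gene sets.
    Simplification: random sets match only the sizes of C and T, not their degrees."""
    rng = random.Random(seed)
    genes = sorted(adj)
    null = [
        proximity(adj, rng.sample(genes, len(C)), rng.sample(genes, len(T)))
        for _ in range(n_perm)
    ]
    mu, omega = statistics.mean(null), statistics.stdev(null)
    return (proximity(adj, C, T) - mu) / omega

# Toy interactome: a path of six genes 0 - 1 - 2 - 3 - 4 - 5.
adj = {i: set() for i in range(6)}
for i in range(5):
    adj[i].add(i + 1)
    adj[i + 1].add(i)

z_near = z_score(adj, [0], [1])  # drug target adjacent to the disease gene
z_far = z_score(adj, [0], [5])   # drug target far from the disease gene
```

A lower (more negative) Z-score indicates that the drug targets sit closer to the disease genes than expected by chance, so drugs would be ranked by increasing Z-score. A disease with no associated genes leaves `T` empty, and the score cannot be computed.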

In the case of Leukemia (blood cancer), other antineoplastic drugs like Hydroxyurea and Methotrexate are ranked high (in the top 10) and its known treatment drug Azacitidine is ranked 17. We give more importance to the ranking parameter as any drug predictor requires classifying and ranking the correct drugs as high as possible. Considering this AUROC-ranking trade-off, we can see that Dr-COVID with the SIGN encoder performs the best.

Table I. The table gives the ranking performance of Dr-COVID compared with other GNN variants and the network proximity measures. There are no associated genes for some of the diseases in our database, which makes it impossible to compute the Z-scores. These are indicated as "Not computable". The best results are highlighted in bold font.
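The two evaluation quantities discussed above, the rank of the known treatment drug and the area under the ROC curve (AUROC), can be computed from raw scores as in the following sketch. The AUROC is evaluated through its pairwise interpretation (the probability that a randomly chosen positive pair outscores a randomly chosen negative pair), which is equivalent to the area under the TPR-versus-FPR curve. All scores and drug names here are made-up toy values, not results from the model.

```python
def rank_of(true_drug, scores):
    """1-based rank of `true_drug` when drugs are sorted by decreasing score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_drug) + 1

def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Toy scores for one disease (assumed values for illustration only).
scores = {"drug_a": 0.9, "drug_b": 0.6, "drug_c": 0.8}
true_rank = rank_of("drug_b", scores)

# Toy link scores: known treatments (positives) versus non-links (negatives).
area = auroc([0.9, 0.6], [0.8, 0.1])
```

The two measures can disagree: a model may separate links from non-links well on average while still placing the known treatment of a particular disease outside the top of the list, which is why both are reported.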

CoV, IBV, MERS-CoV, CoV-229E, CoV-NL63, and MHV, we individually predict the drugs for all these 33 entities. Each protein in SARS-CoV-2 targets a different set of genes in humans, so we give individual predictions. We then pick the top 10 drugs from all the predicted drugs and list 150 candidate repurposed drugs for COVID-19. Out of these 150 drugs, 46 are currently in clinical trials. Our predictions include antivirals, antineoplastics, corticosteroids, monoclonal antibodies (mAb), non-steroidal anti-inflammatory drugs (NSAIDs), ACE inhibitors, and statins, as well as some vaccines developed previously for other diseases. Refer to the Results section for a detailed analysis of the predicted drugs for COVID-19.
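One way to read the aggregation step above is to take the top 10 drugs for each entity and merge them into a single deduplicated candidate list. The sketch below illustrates that reading only. The function names, entity names, drug names, and scores are hypothetical, and the tie-breaking rule (alphabetical order) is an assumption.

```python
def top_k(scores, k):
    """Drugs with the k highest scores; ties are broken alphabetically."""
    return sorted(scores, key=lambda drug: (-scores[drug], drug))[:k]


def candidate_list(per_entity_scores, k=10):
    """Merge the per-entity top-k lists into one deduplicated candidate list.

    Returns a dict mapping each candidate drug to the entities that
    placed it in their top k.
    """
    candidates = {}
    for entity, scores in per_entity_scores.items():
        for drug in top_k(scores, k):
            candidates.setdefault(drug, []).append(entity)
    return candidates


# Hypothetical scores for two entities and four drugs.
per_entity = {
    "entity_1": {"drug_a": 0.9, "drug_b": 0.8, "drug_c": 0.1},
    "entity_2": {"drug_b": 0.95, "drug_d": 0.5, "drug_a": 0.2},
}
merged = candidate_list(per_entity, k=2)
print(merged)
```

Because the same drug can appear in the top 10 of several entities, the merged list is shorter than 10 times the number of entities.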

In this work, we presented a generalized drug repurposing model, called Dr-COVID, for novel human diseases. We constructed a biological network of drugs, diseases, genes, and anatomies and formulated the drug repurposing task as a link prediction problem. We proposed a graph neural network model, which was then trained to predict drugs for new diseases. Dr-COVID predicted 150 potential drugs for COVID-19, of which 46 are currently in clinical trials. The considered GNN model is computationally efficient and ranks known treatment drugs for diseases higher than the other GNN variants and non-deep methods such as the network proximity approaches. This work can be extended in several directions. Given the availability of substantial biological data, including information such as the individual side effects and the molecular structure of the drugs may further improve the predictions. Considering the comorbidities of a patient would help us analyze the biological processes and gene interactions specific to an individual and prescribe the line of treatment accordingly. Predicting a synergistic combination of drugs for a disease is another area of interest where graph neural networks can be beneficial.


The authors thank the Deep Graph Learning team for compiling DRKG and making the data public at https://github.com/gnn4dr/DRKG.

The implementation and the data required to reproduce the results in the paper are available at https://github.com/siddhant-doshi/Dr-COVID.