key: cord-0563806-f38g1nqw authors: Wu, Yifan; Gao, Min; Zeng, Min; Chen, Feiyang; Li, Min; Zhang, Jie title: BridgeDPI: A Novel Graph Neural Network for Predicting Drug-Protein Interactions date: 2021-01-29 journal: nan DOI: nan sha: e9cacacd4db04c2aea8298cd67451f4deb93c71b doc_id: 563806 cord_uid: f38g1nqw

Motivation: Exploring drug-protein interactions (DPIs) is a pivotal step in drug discovery. The rapid growth of available biological data enables computational methods to assist experimental methods effectively. Among them, some deep learning methods extract features only from basic characteristics, such as protein sequences and molecular structures. Others achieve significant improvements by learning not only from sequences/molecules but also from protein-protein and drug-drug associations (PPAs and DDAs). The PPAs and DDAs are generally obtained by computational methods. However, existing computational methods have limitations that result in low-quality PPAs and DDAs, which hamper prediction performance. Therefore, we aim to develop a novel supervised learning method that learns the PPAs and DDAs effectively and thereby improves prediction performance on the specific task of DPI.

Results: In this research, we propose a novel deep learning framework, namely BridgeDPI. BridgeDPI introduces a class of nodes named hyper-nodes, which bridge different proteins/drugs to work as PPAs and DDAs. The hyper-nodes can be learned in a supervised manner for the specific task of DPI because the whole process is trained end to end. Consequently, such a model improves the prediction performance of DPI. On three real-world datasets, we further demonstrate that BridgeDPI outperforms state-of-the-art methods. Moreover, ablation studies verify the effectiveness of the hyper-nodes. Finally, in an independent verification, BridgeDPI explores the candidate bindings between COVID-19's proteins and various antiviral drugs.
The predictive results accord with the statements of the World Health Organization and the Food and Drug Administration, showing the validity and reliability of BridgeDPI.

Drug discovery and drug screening are complex. The typical timeline is 10-20 years, at a cost of US$0.5-2.6 billion (Avorn et al., 2015; Paul et al., 2010). Within this process, exploring possible drug-protein interactions (DPIs) is a crucial step. Although experimental assays remain the most reliable approach for determining DPIs, they are time-consuming and cost-intensive. Therefore, efficient computational methods for predicting DPIs are important and urgently needed.

Figure 1 (caption, partially recovered): Then, through the multilayer perceptron layers, the node embeddings are calculated and fed to the GNN; finally, the outputs of the GNN are multiplied and a fully connected layer with sigmoid activation is applied.

Current DPI prediction methods fall into three groups: docking-based methods, machine learning-based methods, and deep learning-based methods. Docking-based methods look for the best binding position for a drug molecule inside the binding pocket of a protein (Gschwend et al., 2015; Led and Caflisch, 2017). However, docking is slow, and 3D protein structures are not available for large-scale datasets. Machine learning-based methods (Bleakley and Yamanishi, 2009; Ballester and Mitchell, 2010; Faulon et al., 2008) usually rely on handcrafted features, which must be chosen, combined and compared carefully before modeling; this requires considerable expertise and experience. Recently, with the plentiful accumulation of data, deep learning-based methods have been successfully applied to various bioinformatics tasks (A et al., 2020; Min et al., 2017; Zhang et al., 2019, 2020).
Deep learning has further improved the performance of DPI prediction through deep architectures and large numbers of learnable parameters. Deep learning models such as DeepDTA (Öztürk et al., 2018), WideDTA (Öztürk et al., 2019), DeepConv-DTI (Lee et al., 2019), PADME (Feng et al., 2018), GraphDTA (Nguyen et al., 2019), E2E (Gao et al., 2018) and DrugVQA (Zheng et al., 2020) follow similar steps: 1) encode proteins and drugs; 2) design feature extractor modules to capture high-level features of proteins and drugs; 3) fuse the high-level features of proteins and drugs, and perform prediction through fully connected layers. The disadvantage of these methods is that they do not leverage protein-protein associations (PPAs) and drug-drug associations (DDAs). The value of PPAs and DDAs comes from the fact that similar proteins usually interact with similar drugs (Chi and Hou, 2011; Baudot et al., 2017). Therefore, the information in DDAs and PPAs can improve DPI prediction. Many conventional methods that use PPAs and DDAs have demonstrated their effectiveness in DPI prediction (Chen et al., 2012).

PPAs and DDAs are mainly generated by three types of methods: structure-based methods, sequence-based methods, and methods based on known DPIs. Each has limitations. Structure-based methods can measure PPAs accurately, but they are limited by the availability of protein structural data. In sequence-based methods, PPAs are generally computed with BLAST, which relies on sequence alignment and homology information. The results of BLAST depend on the scale of the protein data. When a study includes only hundreds or thousands of protein sequences, running BLAST on such a small set fails to find homologous proteins for most of the sequences, so these sequences cannot provide useful information to represent PPAs appropriately.
Methods based on known DPIs use negative sampling to generate negative samples for PPAs and DDAs without any reliable biological evidence. Negative sampling treats unknown drug-protein pairs as negative, although such pairs could in fact be positive.

To tackle the above limitations and further improve prediction performance, we develop BridgeDPI, a deep learning framework for predicting DPIs. The novelty of BridgeDPI is the introduction of hyper-nodes that connect different proteins/drugs. The role of the hyper-nodes is to construct bridges between proteins/drugs. The bridges implicitly measure the associations among proteins/drugs and can therefore serve as the networks of PPAs and DDAs. On the one hand, the hyper-nodes are learned automatically for DPI prediction, and the quality of the learned PPAs and DDAs is ensured by the back-propagation of the end-to-end learning of BridgeDPI. On the other hand, the hyper-nodes may also uncover some unknown biological connections. In brief, the hyper-nodes are the essential elements for improving prediction performance. We demonstrate the superior performance of BridgeDPI: AUC scores of 0.989 (for seen proteins) and 0.952 (for unseen proteins) on the customized BindingDB dataset, 0.995 on the C.elegans dataset, and 0.990 on the human dataset.

The overview of BridgeDPI is shown in Figure 1. BridgeDPI takes two inputs: the protein sequence and the drug SMILES. For protein sequences, a three-layer feed-forward network processes the k-mer features, and a convolutional neural network (CNN) extracts the sequence features of proteins. We combine the two features to obtain the final protein embeddings. For drug SMILES, a two-layer feed-forward network processes the molecular fingerprint features, and a CNN extracts the sequence features of drugs. We likewise combine the two features to obtain the final drug embeddings.
After that, we introduce hyper-nodes to construct bridges between proteins/drugs and use a graph neural network (GNN) to learn the associations between proteins/drugs. Finally, we take the element-wise product of the protein and drug outputs of the GNN and use a linear layer with sigmoid activation to predict the interaction.

Before protein sequences and drug molecules are fed to BridgeDPI, they need to be encoded as numeric vectors. The k-mer and fingerprint (FP) features have long been effective representations of proteins and drugs. Since the advent of deep learning techniques, researchers have begun to characterize proteins and drugs at the amino-acid level and the atomic level by using CNNs, GNNs, etc. However, k-mer and FP features cannot be completely replaced by sequence or graph features. Thus, we reintroduce k-mer and FP features, which are effective but currently ignored by many researchers.

For proteins, k-mer counting is a classical and effective embedding method. k-mer features describe the type and number of amino acid functional groups, which are very important in the prediction of DPIs. As a result, k-mer features are used to represent the proteins. In our study, we set k to 1, 2 and 3, which gives $20^1 + 20^2 + 20^3 = 8420$ k-mer features:

$$A_i = \left[\, x^i_{1\ldots20},\; x^i_{21\ldots420},\; x^i_{421\ldots8420} \,\right] \tag{1}$$

where $x^i_{1\ldots20}$, $x^i_{21\ldots420}$ and $x^i_{421\ldots8420}$ represent the 1-mer, 2-mer and 3-mer features of protein $i$, respectively. We normalize each part separately to eliminate the effect of protein length, as shown in Formula (2):

$$\hat{x}^i_{1\ldots20} = \frac{x^i_{1\ldots20} - \mathrm{mean}\left(x^i_{1\ldots20}\right)}{\mathrm{std}\left(x^i_{1\ldots20}\right)} \tag{2}$$

where $x^i_{1\ldots20}$ represents the 1-mer features of protein $i$, and $\mathrm{mean}(\cdot)$ and $\mathrm{std}(\cdot)$ compute the mean and standard deviation. We do the same for the 2-mer part $x^i_{21\ldots420}$ and the 3-mer part $x^i_{421\ldots8420}$.

For drug molecules, FP is a very efficient technique in drug discovery and virtual screening. A molecular FP encodes the topological information of the molecular structure as a vector, which represents a drug molecule well.
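The k-mer featurization and per-block normalization described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `kmer_features` and `zscore`, the use of the population standard deviation, the small `eps` added to the denominator, and the choice to skip k-mers containing non-standard residues are all assumptions introduced here.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def zscore(values, eps=1e-8):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / (std + eps) for v in values]  # eps is an assumption


def kmer_features(seq, ks=(1, 2, 3)):
    """Concatenate separately normalized 1-, 2- and 3-mer count blocks."""
    features = []
    for k in ks:
        # Column index for every possible k-mer: 20**k entries per block.
        index = {"".join(p): i
                 for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
        counts = [0.0] * len(index)
        for start in range(len(seq) - k + 1):
            col = index.get(seq[start:start + k])
            if col is not None:  # assumption: ignore non-standard residues
                counts[col] += 1.0
        features.extend(zscore(counts))  # normalize each block on its own
    return features
```

For any protein sequence, `kmer_features` returns a vector of length 20 + 400 + 8000 = 8420, matching the feature count given in the text; each of the three blocks has zero mean after normalization.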
Consequently, we use the Morgan FP (Cereto-Massagué et al., 2015) to represent a molecule. It encodes a drug as a 1024-dimensional vector:

$$F_j = \left[\, b_1, b_2, \ldots, b_{1024} \,\right]$$

where $F_j$ represents the FP features of drug $j$ and $b_{1\ldots1024}$ are the binary values in the fingerprint.

After obtaining the representations of proteins $A_i$ and drugs $F_j$, we cannot construct their relationship directly, because the two vectors have different dimensions and do not belong to the same space. Therefore, they need to be mapped into the same space through several layers of neural networks. At the same time, we use CNNs with max-pooling to extract the sequence features of proteins and drugs, respectively, and then fuse them with the mapped features. Here $f_p(\cdot)$ and $f_d(\cdot)$ are the nonlinear transformation layers for proteins and drugs, respectively, and $p_i$ and $d_j$ are the sequence features extracted by the CNNs. Finally, $u_i$ is used as the embedding of protein $i$, and $v_j$ is used as the embedding of drug $j$.

Given the embeddings of proteins and drugs, how can we predict the interactions between proteins and drugs, especially for candidate protein-drug pairs with no known connection? This is the main challenge in DPI prediction, owing to the lack of sufficient neighborhood information. As shown in Figure 2, given the known interacting pairs, we need to infer the unknown relationship between protein $p_2$ and drug $d_2$. Traditional methods usually compute DDAs and PPAs to construct a heterogeneous network and then, according to the associations between proteins/drugs, construct a path $p_2 - p_1 - d_1 - d_2$ to infer the relationship. However, as mentioned above, it is difficult to acquire high-quality PPAs and DDAs. Therefore, we introduce a new kind of virtual node, namely the hyper-node. Hyper-nodes connect to all proteins and drugs; they are randomly initialized and updated during the training process. With these hyper-nodes, we can construct bridges between proteins/drugs and capture the relationships between them.
This lets our model learn higher-quality PPA and DDA information from the data and further improves the predictive performance of DPI. As illustrated in Figure 2, although we do not know the relationship between $p_1$ and $p_2$, by introducing a hyper-node $h_1$ we can obtain the relationships between $p_1$ and $h_1$ and between $p_2$ and $h_1$. Therefore, the relationship between $p_1$ and $p_2$ can be derived from $p_2 - h_1 - p_1$. In the same way, the relationship between $d_1$ and $d_2$ can be derived from $d_2 - h_2 - d_1$. Finally, the relationship between $p_2$ and $d_2$ can be inferred through the path $p_2 - h_1 - p_1 - d_1 - h_2 - d_2$.

Assume that the set of hyper-nodes is $\{n_1, n_2, \ldots, n_m\}$, where $m$ is the number of hyper-nodes, and that the hyper-nodes have the same embedding size as $u_i$ and $v_j$. For each protein-drug pair in the dataset, we obtain a graph with nodes $u_i, v_j, n_1, n_2, \ldots, n_m$ and edges computed by cosine similarity:

$$A_{i,j} = \frac{n_i \cdot n_j}{\|n_i\|_2 \, \|n_j\|_2}$$

where $n_i$ and $n_j$ represent any two nodes in the graph (including $u$ and $v$) and $\|\cdot\|_2$ computes the 2-norm of a vector. $A_{i,j}$ can be negative, which is not suitable for the GNN computation. Thus, we filter out the negative edges with $\mathrm{ReLU}(\cdot)$, which means there is no edge if the cosine similarity is less than zero. Each GNN layer $i$ has parameters $W_i$ and $b_i$ and produces the output of the $i$-th layer from the output of the previous one. Additionally, a residual structure (He et al., 2016) is applied in each layer of the GNN, as shown in Equation (9). It is a very effective structure in deep networks because it guarantees that a shallower network is contained in the model. The output of the GNN belongs to $\mathbb{R}^{(m+2)\times d}$, and its first two rows are the embeddings of the protein node and the drug node after fusing graph information. These two embeddings are multiplied element-wise and then fed to a fully connected layer. Finally, $\mathrm{sigmoid}(\cdot)$ is applied to classify whether the protein-drug pair interacts.
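The graph construction and message passing described above can be sketched as follows, in plain Python on lists of floats. This is only an illustration under assumptions: the exact layer update is not recoverable from the text, so the form ReLU(A H W + b) + H, the absence of any degree normalization, and the reuse of one adjacency matrix across layers are assumptions made here, and the function names are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def build_adjacency(nodes):
    """A[i][j] = ReLU(cos(n_i, n_j)); negative similarity means no edge."""
    return [[max(0.0, cosine(a, b)) for b in nodes] for a in nodes]


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def gnn_layer(adj, h, weight, bias):
    """Assumed update: H_next = ReLU(A H W + b) + H (residual connection)."""
    out = matmul(matmul(adj, h), weight)
    return [[max(0.0, v + b) + r for v, b, r in zip(row, bias, res)]
            for row, res in zip(out, h)]


def pair_representation(u, v, hyper_nodes, layers):
    """Run the GNN on [u, v, n_1..n_m] and multiply the first two rows."""
    nodes = [u, v] + hyper_nodes  # row 0: protein, row 1: drug
    adj = build_adjacency(nodes)  # assumption: computed once, reused per layer
    h = nodes
    for weight, bias in layers:
        h = gnn_layer(adj, h, weight, bias)
    return [a * b for a, b in zip(h[0], h[1])]  # element-wise product
```

With all-zero weights and biases, every layer reduces to the identity because of the residual term, so `pair_representation` returns the element-wise product of the input protein and drug embeddings; this mirrors the remark in the text that a shallower network is contained in the deeper one.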
$$h_{ij} = \hat{u}_i \otimes \hat{v}_j \tag{10}$$

$$\tilde{y}_{ij} = \mathrm{sigmoid}\left(f_o(h_{ij})\right) \tag{11}$$

where $\hat{u}_i$ and $\hat{v}_j$ are the first two rows of the final output matrix of the GNN, $f_o(\cdot)$ is the fully connected layer, $\tilde{y}_{ij}$ represents the probability of an interaction between protein $i$ and drug $j$, and $\otimes$ denotes the element-wise product.

Batch normalization (Ioffe and Szegedy, 2015), which standardizes each batch of data, is a very effective method to accelerate training and improve the performance of deep networks. Dropout (Srivastava et al., 2014), which randomly drops a fraction of the neuron units, can greatly improve the generalization of the model. We add batch normalization and dropout after each layer in the model. Finally, the model is trained by minimizing the cross-entropy loss function:

$$L(y_{ij}, \tilde{y}_{ij} \mid \Theta) = -\left(y_{ij}\log\tilde{y}_{ij} + (1-y_{ij})\log(1-\tilde{y}_{ij})\right) + \lambda \sum_{\theta\in\Theta} \|\theta\|_2^2 \tag{12}$$

where $\Theta$ is the set of all model parameters and $\lambda$ is the regularization coefficient.

BindingDB dataset. BindingDB is a public, web-accessible database of measured binding affinities, focusing on interactions between proteins and small drug-like molecules. It contains a total of 2,067,981 binding data for 8,161 protein targets and 910,476 small molecules (Gilson et al., 2016). Based on the raw BindingDB dataset, Gao et al. constructed a binary classification dataset containing 39,747 positive samples and 31,218 negative samples (Gao et al., 2018). A protein-drug sample is positive if its IC50 is less than 100 nM and negative if its IC50 is greater than 10,000 nM. They divided the data into a training set (28,240 positive samples, 21,915 negative samples), a validation set (2,831 positive samples, 2,776 negative samples) and a test set (2,706 positive samples, 2,802 negative samples) with a guaranteed ratio of positive to negative samples. We choose this customized BindingDB as one of our benchmark datasets for head-to-head comparisons.

C.elegans and human datasets. Liu et al. obtained a set of highly credible negative DPI samples via an in silico screening method (Liu et al., 2015).
Combined with the known positive samples, they constructed two datasets, C.elegans and human. Following Tsubaki et al. (2019), we choose the balanced versions of these datasets. The C.elegans dataset has 7,786 binding samples (3,893 positive samples, 3,893 negative samples), with 1,876 protein targets and 1,767 drug molecules. The human dataset has 6,728 binding samples, with 2,001 protein targets and 2,726 drug molecules. Each of the two datasets is randomly divided into 5 folds, and each fold is used in turn as the validation set in 5-fold cross-validation.

DUD-E dataset. DUD-E is a widely used dataset covering 102 proteins and 22,886 clustered ligands (Mysinger et al., 2012). There are 50 decoys for each active, with similar physicochemical properties but dissimilar 2D topology. It contains 1,429,790 protein-ligand samples in total (22,645 positive samples, 1,407,145 negative samples). Following Zheng et al. (2020), we perform 3-fold cross-validation on this dataset and report the average evaluation metrics. In addition, DUD-E is used as an independent test set to evaluate the performance of models in the real world.

The proposed model is implemented with PyTorch 1.6.0 (Paszke et al., 2019) and the parameters are initialized with the default settings. We use the Adam optimizer (Kingma and Ba, 2019) with a learning rate of 0.001 to adjust the parameters during training. To prevent over-fitting, L2 regularization is added to the loss function. At each step, a batch of 512 randomly selected protein-drug pairs is used to run the gradient descent algorithm. Other hyperparameters, such as the number of layers, the number of neurons and the dropout ratio, are chosen through many experiments according to performance on the validation set.
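The L2-regularized cross-entropy objective used for training can be sketched as below. This is a minimal, framework-free illustration and not the authors' PyTorch code: the function name `bce_with_l2`, the averaging over the batch, the flat list of scalar parameters, and the clipping constant `eps` are assumptions; the default coefficient 0.001 is the value reported in the text for the L2 regularization.

```python
from math import log


def bce_with_l2(y_true, y_prob, params, lam=0.001, eps=1e-12):
    """Mean binary cross-entropy plus lam times the sum of squared parameters."""
    ce = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        ce -= y * log(p) + (1 - y) * log(1 - p)
    ce /= len(y_true)                    # assumption: average over the batch
    l2 = sum(w * w for w in params)      # params: flat list of scalar weights
    return ce + lam * l2
```

For example, two pairs predicted at probability 0.5 with no parameters give a loss of log 2, and adding parameters `[1.0, 2.0]` increases it by 0.001 × 5 = 0.005. In a framework such as PyTorch, the same penalty is often applied through the optimizer's weight-decay setting rather than written into the loss explicitly.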
The maximum number of epochs is set to 100, and the models with the best area under the ROC curve (AUC) on the validation set are saved. Finally, the saved models are evaluated on the test set by accuracy (ACC) and AUC. In addition, our model has low computational complexity, high parallelism and fast training speed: training on the customized BindingDB dataset finishes in about 15 minutes on a GTX 1080 Ti GPU.

For proteins, we use a 2-layer network for the nonlinear transformation, containing 1024 and 128 neurons, respectively. For drug molecules, we use a 3-layer network for the nonlinear transformation, containing 1024, 256 and 128 neurons, respectively. The outputs are 128-dimensional vectors that serve as the input node embeddings of the complete DPI graph. Then, we use a 3-layer GNN, which means that each node aggregates three layers of neighbor information. In the end, the interaction scores between proteins and drugs are obtained through a 2-layer feed-forward network. Moreover, we apply dropout to 50% of the neurons after each layer, and the L2 regularization coefficient is set to 0.001 to limit the model capacity and prevent over-fitting.

First, we conduct experiments on the customized BindingDB dataset, which was extracted from BindingDB by Gao et al. Following their settings, we use the same division of the dataset, which ensures that the validation set and test set contain some unseen proteins (proteins that do not appear in the training set) and is therefore closer to the real world. Based on the prediction results on the test set, we calculate the AUC and ACC (with a threshold of 0.5, here and below) of BridgeDPI with the best parameters. To compare performance with other methods, we choose Tiresias (Fokoue et al., 2016), DBN (Wen et al., 2017), GNN (Tsubaki et al., 2019), E2E (Gao et al., 2018) and DrugVQA (Zheng et al., 2020) as baselines.
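The two evaluation metrics mentioned above, ACC at a threshold of 0.5 and AUC, can be computed as in the following sketch. It is an illustration only: the function names are hypothetical, scores equal to the threshold are counted as positive (the text does not say how ties are handled), and the AUC is computed by direct pairwise comparison, which is simple but quadratic in the number of samples.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of pairs whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(y_true, y_score))
    return hits / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, `accuracy` returns 0.5 and `roc_auc` returns 0.75: the two metrics can disagree because AUC depends only on the ranking of the scores, not on the threshold.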
There are a large number of unknown proteins in nature, which is why we should focus on predictions for new proteins, i.e., the cold-start problem. Therefore, the test set is divided into an unseen protein set (proteins that do not appear in the training set) and a seen protein set (proteins that appear in the training set). As shown in Figure 3, the models generally achieve good performance on the seen protein set (AUC exceeds 0.9, ACC exceeds 0.85) but differ greatly on the unseen protein set. Based on a traditional relational network, Tiresias performs poorly on the unseen proteins, with an AUC of only 0.68. Tiresias obtains the features of proteins and drugs from computed similarity information and predicts DPIs through a logistic regression model. We hypothesize that a linear model using only similarity information cannot represent the unseen proteins sufficiently, so that its ACC on unseen proteins is even below 0.5. DBN, E2E, DrugVQA and BridgeDPI have AUC over 0.9 and ACC over 0.8 on unseen proteins, indicating the effectiveness of deep learning techniques in handling unseen proteins. Among them, BridgeDPI outperforms the other baselines and achieves state-of-the-art performance, with AUC and ACC reaching 0.987 and 0.954 on seen proteins and 0.951 and 0.887 on unseen proteins. This indicates that the introduction of hyper-nodes indeed improves the representation of protein/drug features, and that the deep graph neural network also enables BridgeDPI to learn deeper interaction rules between proteins and drugs.

Furthermore, we conduct experiments on the C.elegans and human datasets (Liu et al., 2015), which are widely used in many studies. The results are shown in Table 1. Since Gao et al. (2018) do not provide the code of E2E, we reproduced the model and obtained experimental results on the two datasets without using the Gene Ontology (GO) features.
The other baseline results are taken from their original papers (Tsubaki et al., 2019; Zheng et al., 2020). As Table 1 shows, for the randomly divided C.elegans and human datasets, almost all proteins in the test set are seen, which means that models can learn the protein information well from the training set, resulting in very good results. In this case, the unsupervised k-NN is slightly worse than the other models, with AUC 0.858 and F1 0.814 on the C.elegans dataset and AUC 0.860 and F1 0.858 on the human dataset. The supervised machine learning methods (RF, L2, SVM) are slightly better, with AUC around 0.9 on the C.elegans dataset and above 0.9 on the human dataset. In contrast, the deep learning-based methods GNN, E2E/GO, DrugVQA and BridgeDPI perform very well, with AUCs over 0.97 and F1s over 0.9. Among them, BridgeDPI achieves the best performance, with AUC, precision, recall and F1 of 0.995, 0.980, 0.965 and 0.972 on the C.elegans dataset and 0.990, 0.963, 0.949 and 0.956 on the human dataset, respectively. The results are in line with our expectations: without high-quality features, models such as k-NN, RF, L2 and SVM have difficulty learning the complex nonlinear relationships of protein-drug interaction, while deep learning models have strong feature extraction abilities for learning the interaction rules. On this basis, BridgeDPI integrates PPA and DDA information, further improving the results.

Although we have achieved excellent results on these benchmark datasets, such datasets have serious data bias, which leads to inflated performance (Chen et al., 2019; Yang et al., 2020). To verify the realistic performance of our model, we conduct the following experiments: train models on the customized BindingDB dataset and test them on the DUD-E dataset.
We conduct 5-fold cross-validation experiments on the customized BindingDB dataset to obtain 5 models, then predict on the DUD-E dataset to obtain 5 results, and finally average the 5 results to evaluate performance. The results are shown in Figure 4. For reasons similar to those given above, results for DrugVQA and E2E are not obtained. Not surprisingly, the performance of all models is greatly reduced, with the AUC of the SVM even below 0.5. Compared with the other models, the AUC of BridgeDPI (0.709) is the best; it is 9.41%, 8.58%, 29.14%, 32.03% and 46.79% higher than that of E2E/GO, k-NN, RF, L2 and SVM, respectively. Moreover, if the whole BindingDB dataset is used for training, the AUCs of BridgeDPI and E2E/GO reach 0.772 and 0.748, respectively. The results show the effectiveness of hyper-nodes and that BridgeDPI performs better even under more realistic conditions.

By introducing the hyper-nodes, BridgeDPI builds many bridges between all proteins/drugs to conduct the prediction of DPI. To understand the role of the hyper-nodes, we carry out an ablation study on the customized BindingDB dataset on the influence of the number of hyper-nodes in our model. We set the number of hyper-nodes to -1, 1, 2, 4, 8, 16, 32, 64, 128 and 256 in turn and observe the results of our model on the test set. Here, -1 means that the hyper-nodes are removed from BridgeDPI, and the extracted feature vectors of proteins and drugs are multiplied directly and fed to the final fully connected layer. We focus mainly on the overall AUC/ACC and the unseen proteins' AUC/ACC, as shown in Figure 5. The introduction of the hyper-nodes indeed improves the prediction performance of DPI, in both the overall AUC and ACC and the unseen proteins' AUC and ACC. As the number of hyper-nodes increases, the performance improves further.
When the number is about 64, the overall AUC and the unseen proteins' AUC reach their best values, 0.973 and 0.951, respectively. We speculate that more hyper-nodes mean more bridges between proteins/drugs: multiple hyper-nodes measure the relationships between proteins/drugs jointly and play a role similar to voting. However, too many hyper-nodes bring excessive cost and the risk of over-fitting. Therefore, we settle on 64 hyper-nodes in BridgeDPI.

The outbreak of COVID-19 has caused untold damage to human society, and scientists are working hard on drug discovery. The gene sequence of COVID-19 has already been determined. To verify the effectiveness of BridgeDPI on a practical problem, we test a variety of interactions between currently considered antiviral drugs and the protein targets translated from COVID-19's viral gene fragments. First, we obtain the protein sequences translated from COVID-19's gene fragments via the NCBI database (Pruitt et al., 2007). Then, these viral protein sequences and drugs such as hydroxychloroquine and chloroquine are fed into BridgeDPI. Finally, the prediction results are visualized as a heat map. As can be seen from Figure 6, the main potential targets of these drugs are concentrated in the protein products translated from the COVID-19 gene fragment 25393 to 29533, including protein ORF3a, envelope protein, protein ORF6, protein ORF7a, protein ORF7b, protein ORF8 and nucleocapsid phosphoprotein. In our results, Dexamethasone and Remdesivir have the most significant effects, with the predicted probability of their interactions with the viral protein products ORF3a, envelope protein, ORF7b and nucleocapsid phosphoprotein exceeding 60%.
In fact, many studies and clinical trials have shown that the two drugs are very effective in treating COVID-19: Dexamethasone can significantly reduce mortality in COVID-19 patients (Group and Tso, 2020; Mahase, 2020; Horby et al., 2020); Remdesivir can block the replication of COVID-19 (Gordon et al., 2020), reduce recovery time (Beigel et al., 2020) and improve the survival rate (Grein et al., 2020) in COVID-19 patients. In contrast, unrelated drugs such as Radix Isatidis Granule have little interaction potential with the viral protein products. These experimental results support the validity and reliability of our model in predicting new drugs, indicating that BridgeDPI can play a guiding role in practical research and drug discovery.

BridgeDPI introduces hyper-nodes to build bridges between proteins/drugs so that the information of PPAs and DDAs can be captured and then used to predict DPIs. The experiments show that our approach outperforms other competing methods on the customized BindingDB, C.elegans, human and DUD-E datasets and achieves state-of-the-art performance. To verify the realistic performance of our model, we perform cross-dataset validation experiments (training on the BindingDB dataset, testing on the DUD-E dataset) and achieve a strong result. Finally, the case study with concrete examples reaffirms the usefulness of our model.
KAICD: A knowledge attention-based deep learning framework for automatic ICD coding
The $2.6 billion pill: methodologic and policy considerations
A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking
Network analysis and protein function prediction with the PRODISTIN web site
Remdesivir for the treatment of COVID-19: preliminary report
Supervised prediction of drug-target interactions using bipartite local models
Molecular fingerprint similarity search in virtual screening
Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening
Drug-target interaction prediction by random walk on the heterogeneous network
An iterative approach of protein function prediction
Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor
PADME: A deep learning-based framework for drug-target interaction prediction
Predicting drug-drug interactions through large-scale similarity-based link prediction
Interpretable drug target prediction using deep neural representation
BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology
Remdesivir is a direct-acting antiviral that inhibits RNA-dependent RNA polymerase from severe acute respiratory syndrome coronavirus 2 with high potency
Compassionate use of remdesivir for patients with severe COVID-19
Dexamethasone in hospitalized patients with COVID-19: preliminary report
Molecular docking towards drug discovery
Deep residual learning for image recognition
Effect of dexamethasone in hospitalized patients with COVID-19: preliminary report
Batch normalization: Accelerating deep network training by reducing internal covariate shift
Adam: A method for stochastic optimization
Protein structure-based drug design: from docking to molecular dynamics
DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
Improving compound-protein interaction prediction by building up highly credible negative samples
COVID-19: Low dose steroid cuts death in ventilated patients by one third, trial finds
Deep learning in bioinformatics
Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking
GraphDTA: prediction of drug-target binding affinity using graph convolutional networks
DeepDTA: deep drug-target binding affinity prediction
WideDTA: prediction of drug-target binding affinity
PyTorch: An imperative style, high-performance deep learning library
How to improve R&D productivity: the pharmaceutical industry's grand challenge
NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research
Compound-protein interaction prediction with end-to-end learning of neural networks for graphs and sequences
Deep-learning-based drug-target interaction prediction
Predicting or pretending: artificial intelligence for protein-ligand interactions lack of sufficiently large and unbiased datasets
Protein-protein interaction site prediction through combining local and global features with deep neural networks
DeepFunc: a deep learning framework for accurate prediction of protein functions from protein sequences and interactions
A deep learning framework for gene ontology annotations with sequence- and network-based information
Predicting drug-protein interaction using quasi-visual question answering system

This work is supported in part by the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization under Grant No. U1909208, and by the Hunan Provincial Science and Technology Program 2019CB1007.