key: cord-0927099-qbw2pjnc
authors: El-Behery, Heba; Attia, Abdel-Fattah; El-Feshawy, Nawal; Torkey, Hanaa
title: Efficient Machine Learning Model for Predicting Drug-Target Interactions with Case Study for Covid-19
date: 2021-07-05
journal: Comput Biol Chem
DOI: 10.1016/j.compbiolchem.2021.107536
sha: 69fa4c1859989d2535b0e7e086f890507593d248
doc_id: 927099
cord_uid: qbw2pjnc

BACKGROUND: Discovering possible Drug-Target Interactions (DTIs) is a decisive step in detecting the effects of drugs as well as in drug repositioning. There is a strong incentive to develop computational methods that can effectively predict potential DTIs, as traditional DTI laboratory experiments are expensive, time-consuming, and labor-intensive. Some technologies have been developed for this purpose; however, large numbers of interactions have not yet been detected, prediction accuracy is still low, and protein sequences and structured data are rarely used together in the prediction process. METHODS: This paper presents a DTI prediction model that takes advantage of the structured form of proteins and drugs. Our model obtains features from protein amino-acid sequences using physical and chemical properties, and from drug SMILES (Simplified Molecular Input Line Entry System) strings using encoding techniques. Comparing the proposed model with different existing methods under K-fold cross-validation, empirical results show that our model, based on ensemble learning algorithms, provides more accurate DTI predictions from both structure and feature data. RESULTS: The proposed model is applied to two datasets: Benchmark (feature only) datasets and DrugBank (structure data) datasets. Experimental results obtained by Light-Boost and ExtraTree using structure and feature data reach 98% accuracy and a 0.97 F-score, compared to 94% and 0.92 achieved by existing methods. Moreover, our model can successfully predict more as yet undiscovered interactions, and hence can be used as a practical tool for drug repositioning. A case study applies our prediction model to the proteins known to be affected by coronaviruses in order to predict possible interactions between these proteins and existing drugs. Our model is also applied to the Covid-19 related drugs announced on DrugBank.
The results show that some drugs, such as DB00691 and DB05203, are predicted to interact with the ACE2 protein with a predicted probability of 100%. ACE2 is a cell-membrane protein that enables Covid-19 infection. Hence, our model can be used as an effective drug repositioning tool to predict possible drug treatments for Covid-19.

Interactions between drugs and targets indicate that the drug binds to a target site and causes a change in its behavior. Drugs are chemical compounds that may cause a physiological change in the human body when ingested, injected, or absorbed. Targets are any part of the organism to which a drug binds to produce physiological changes. Prediction of drug-target interactions plays a vital role in drug discovery, which aims to identify new drug compounds for biological targets [19].

The most common targets of current drugs are:

• G protein-coupled receptors
• Enzymes
• Ion channels

Figure 1 shows the different biological interactions that can occur between a drug and its target. In a DTI, the chemical compound of the drug binds to the target molecule by forming temporary bonds. The bound drug then interacts with the biological target to produce a positive or negative change and then leaves the target. The goal of these drugs is to prevent certain reactions in the human body in order to treat diseases, which is achieved by preventing certain enzymes from binding their substrates [19].

DTIs can occur in different ways. In drugs known as competitive inhibitors, the drug binds to the active site of the target. Another drug type, called allosteric inhibitors, binds to the allosteric site of the target. It changes the shape and structure of the target so that the substrate is no longer recognized, and hence prevents the reaction of the target. Preventing the reaction of the target can correct a metabolic imbalance or kill pathogens to treat diseases [25]. Drugs can also target cell receptors, which are the proteins that recognize and respond to the body's own chemical messengers such as hormones and neurotransmitters [6]. Receptor proteins are located either on the surface of the cell or inside the effector cell. Receptors perform two essential functions: recognition of the messenger molecule and transduction of the signal into a response. There is a huge number of different receptors in the body, which interact with different chemical messengers [17]. Predicting drug-target interactions has different applications: it facilitates drug discovery, drug repositioning, and drug side-effect prediction [40].

Drug-target interactions can be inferred through wet-laboratory experiments using various traditional and reverse pharmacology techniques. However, laboratory experiments to identify these interactions are time-consuming and costly [1]. Computational prediction of DTIs is an open problem, in which machine-learning methods are utilized and new data representations are introduced. Different factors determine which in silico method is used to confirm an interaction.

Biological steps during drug-target interaction [6]

Computational methods for predicting drug-target interactions can be broadly grouped into three categories: ligand-based methods, docking methods, and chemogenomic methods. Figure 2 categorizes the different methods for DTI prediction [17]. Ligand-based methods build on the idea that similar molecules usually bind similar protein targets. Thus, these approaches predict interactions based on the similarity between the ligands of proteins. The disadvantage of this approach is that the computation time increases with the square of the size of the training set. Docking methods use the 3D structures of drugs and proteins to predict whether they will interact. Docking approaches have some flaws. For instance, some proteins, such as membrane proteins, have no known three-dimensional structure because predicting their structures is a difficult task [34]. Chemogenomic methods use drug and protein omics data for prediction. They can overcome the disadvantages of the ligand-based and docking techniques, and they can use large-scale biological data that is readily available in public databases on the Internet.

Chemogenomic approaches can be divided into different categories, such as machine learning-based methods, graph methods, and network methods [11]. Among all the chemogenomic approaches, machine learning methods have gained the greatest attention because of their reliable predictive results. Machine learning methods can be further divided into two categories: feature-based methods and similarity-based methods [17]. Feature-based methods represent drug-target pairs with a vector of descriptors. The different properties of drugs and proteins are encoded as corresponding features. In feature-based methods, the interactions of drug-target pairs are predicted by discovering the most discriminative features. The inputs of these methods are therefore feature vectors that result from combining drug and target characteristics [9].

A feature vector is calculated for each drug (D1, D2, ... Dn) and for each target (F1, F2, ... Fm) independently. These vectors can be calculated with explicit encoding schemes or with bioinformatics software packages that automatically calculate chemical or biological features. As these vectors often have a very high dimension, some methods use dimensionality reduction techniques to reduce the number of features and thus improve the performance and efficiency of the prediction [11]. Similarity-based methods calculate the similarity between drug compounds and between target proteins. They generate a similarity matrix using different strategies to measure similarity.
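The feature-based representation described above can be illustrated with a small sketch. It assumes that a drug-target pair is represented by simply concatenating the drug vector (D1..Dn) and the target vector (F1..Fm); the identifiers `drug_a`, `prot_x`, and the numeric values are hypothetical toy data, not descriptors from the paper.

```python
from itertools import product


def pair_vector(drug_features, target_features):
    """Concatenate a drug vector (D1..Dn) and a target vector (F1..Fm)."""
    return list(drug_features) + list(target_features)


# Hypothetical toy descriptors; real ones would come from an encoder
# or from descriptor-calculation software.
drugs = {"drug_a": [0.1, 0.7, 0.0], "drug_b": [0.9, 0.2, 0.4]}
targets = {"prot_x": [1.0, 0.0], "prot_y": [0.3, 0.8]}

# One input vector of length n + m for every candidate drug-target pair.
pairs = {
    (d, t): pair_vector(drugs[d], targets[t])
    for d, t in product(drugs, targets)
}
```

Each value in `pairs` is one row of the input matrix that a classifier would receive, with the interaction label supplied separately.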

In this paper, we propose a DTI prediction model using a heterogeneous dataset (feature and sequence information) of drugs and target proteins. The model obtains features from protein amino-acid sequences using physical and chemical properties, and from SMILES (Simplified Molecular Input Line Entry System) strings using encoding techniques. Our goal in this work is to evaluate different machine learning techniques and determine which provide more accurate predictions on this dataset. In our model, data extraction and preprocessing of the drug and protein sequences is performed first. Next, we adopt a methodical prediction scheme and introduce several machine learning approaches: ensemble learning techniques (LightBoost, XGBoost, and ExtraTree), deep learning (DBN), and traditional machine learning methods (random forest (RF) and support vector machine (SVM)). Our proposed model is compared with various recent approaches for validation. The empirical results show that our DTI prediction model provides more accurate results from both structure and feature data, which indicates that the proposed model can predict drug-target interactions effectively. Covid-19 is a viral infection that causes pneumonia-like symptoms. It was first reported in Wuhan, China, in December 2019. Its outbreak on 30 January 2020 is the third major outbreak of a severe virus in the population. Covid-19 is believed to be less lethal but more transmissible than SARS-CoV [22]. Drug discovery organizations worldwide are working tirelessly to assess compounds that can prevent the spread of SARS-CoV-2 in humans. To achieve this, it is necessary to identify drug targets and then identify and evaluate compounds and biological agents that can effectively engage them and inhibit the spread of the virus [3]. However, the development process for a novel drug is time-consuming and usually requires several years of work before clinical approval.
Drug repurposing is an effective strategy to tackle new diseases. Computational methods can be used for drug repositioning. Because they require little time, these methods are well suited to high-throughput screening of available drugs. In this paper, we use our trained DTI prediction model to identify available drugs that could influence Covid-19 viral proteins.

The main contributions of this paper can be summarized as follows:
I. Extract combined structure data (from DrugBank) and feature data (from the benchmark data). Preprocess the protein sequences and the drug SMILES strings into a set of descriptors to convert sequence data into features.
II. Apply these data to different machine learning, deep learning, and ensemble learning techniques to predict interactions between drugs and their target proteins in the human cell.
III. An experimental comparison among these learning techniques on the extracted data reveals that ensemble techniques such as LightBoost and ExtraTree achieve 98% accuracy and an F1-score of 0.97, compared to 94% and 0.92 achieved by current methods on either structure-only or feature-only datasets. Moreover, our model can predict more undetected interactions and thus can be used as a practical tool for drug repositioning.
IV. Our model is applied to the proteins known to be affected by Covid-19 to predict possible interactions between these proteins and the existing drugs announced in DrugBank. This supports drug repositioning for Covid-19 infection, using the proteins affected by Covid-19 in the human cell. For example, the ACE2 protein is predicted to interact with DB00691 and DB05203 with a predicted probability of 100%, and the GBRA3_HUMAN protein is predicted to interact with umifenovir with a predicted probability of 100%.

The rest of the paper is organized as follows. Section 2 presents the existing prediction methods. Section 3 explains our proposed model, with a detailed description of the techniques used and the workflow. Section 4 provides the results and discussion, together with the case study for Covid-19. Finally, the conclusion and future directions are presented in Section 5.

Various machine learning techniques have been proposed for DTI prediction in the last decade. Wei Wang et al. (2020) [31] note that studies on drug-protein interactions are of great importance for drug repositioning, and suggest a link prediction method based on drug-protein interactions (DPI) and local structural similarity (DLS). The DLS approach combines link prediction with the bipartite network structure of the DPI network. Ten-fold cross-validation is applied in the experiments. Compared with other network-based link prediction methods, DLS gives much better results on the test set. In addition, several candidate proteins for three approved drugs (captopril, desferrioxamine, and losartan) were predicted, and the predictions were validated through the literature. Furthermore, the combination of the CN and DLS models provided a new idea for the integrated application of link prediction methods.

In [24], the authors introduced a deep learning model that uses convolutional neural networks (CNNs) to obtain 1D representations of protein amino-acid sequences and SMILES (Simplified Molecular Input Line Entry System) strings. The results show that using CNNs to obtain representations of the data, rather than traditional descriptors, improves performance across all models. The improvement is clearest when deep representations obtained from protein sequences and SMILES strings are compared with global descriptors. The results also highlight the difference from applying traditional machine learning techniques.

In [15], the authors used drug descriptors, protein sequence data from DrugBank, and domain information from NCBI. DTI data verified by DrugBank were downloaded, and a new similarity-based approach was developed to build negative DTIs. Multiple LASSO models were proposed to combine different sets of features and explore their power for DTI prediction. The performance of the proposed LASSO-DNN is compared to LASSO, standard logistic regression (SLG), support vector machine (SVM), and standard DNN models, showing that effective representations of drug and target features are necessary to build learning models for DTI prediction. Genes responsible for disease-related risks have been identified from a wide range.

In [30], an algorithm called NRLMFβ is presented, which extends the NRLMF (neighborhood regularized logistic matrix factorization) score for drug-protein pairs with little interaction information. The shape of the beta distribution is determined by parameters that represent the amount of interaction information. The NRLMF score is recalculated as the expected value of the beta distribution, which defines NRLMFβ. In the evaluation experiment, to compare the generalization performance of NRLMF and NRLMFβ, cross-validation was performed on four datasets (nuclear receptor, GPCR, ion channel, and enzyme), and the average AUC and AUPR values were calculated [22].

A machine learning method called LRF-DTIs was proposed in [11] to predict drug-target interactions. First, the pseudo position-specific scoring matrix (PsePSSM) and molecular fingerprints are used to extract the features of the drug-target pairs. Second, Lasso is used for feature selection, and the synthetic minority oversampling technique (SMOTE) is then used to handle imbalanced data. Finally, the processed feature vectors are fed into a random forest (RF) classifier to predict drug-target interactions.

In [33], the authors present PDTI-ESSB, a new computational model for identifying DTIs using protein sequences and the molecular structure of drugs. More specifically, each drug molecule is represented as a molecular substructure fingerprint. For protein sequences, different descriptors are used to represent their evolutionary, sequence, and structural information. In addition, data-balancing techniques are used to address the class imbalance problem, and feature selection methods are applied to extract the important features. They used the four target classes of the standard benchmark dataset. In [21], a deep learning framework called DeepDTIs was developed. First, it extracts representations from the raw input descriptors using unsupervised pre-training, and it then uses known interaction pairs as labels to build a classification model. Table 1 summarizes and compares these DTI prediction methods, which are related to our proposed model.

Protein sequences and SMILES strings were extracted from the DrugBank dataset (https://www.drugbank.ca/releases/latest#structures) in their canonical form. We use the protein sequences and SMILES strings directly as input to the descriptor stage.

The second dataset consists of four types of benchmark datasets (nuclear receptor, GPCR, ion channel, and enzyme), which were previously published by Yamanishi et al. [38] (see Table 2).

Our DTI prediction model is based on the assumption that similar drugs often target similar proteins. The conventional approach of treating unknown drug-target pairs as negative examples may introduce bias, because the unknown pairs may contain interactions that have not yet been detected. All known interaction pairs in the datasets are defined as positive samples. For negative samples, drug-target pairs are derived from the known interaction pairs by randomly re-pairing drugs and targets and selecting pairs that differ from the known ones.
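One way to read the negative-sample construction above is as random re-pairing of the drugs and targets that occur in the positive set, while rejecting any pair that is a known interaction. The sketch below follows that reading; it is an assumption about the procedure rather than the authors' exact implementation, and the function name `sample_negatives` and the IDs `D1`, `T1`, and so on are hypothetical.

```python
import random


def sample_negatives(positive_pairs, n_negatives, seed=0):
    """Draw random (drug, target) pairs that are not known interactions."""
    rng = random.Random(seed)
    positives = set(positive_pairs)
    drugs = sorted({d for d, _ in positives})
    targets = sorted({t for _, t in positives})

    # Guard against asking for more negatives than can exist.
    max_negatives = len(drugs) * len(targets) - len(positives)
    if n_negatives > max_negatives:
        raise ValueError("not enough non-interacting pairs to sample from")

    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(drugs), rng.choice(targets))
        if pair not in positives:  # keep only pairs with no known interaction
            negatives.add(pair)
    return sorted(negatives)


# Hypothetical known interactions (positive samples).
positives = [("D1", "T1"), ("D2", "T2"), ("D3", "T1"), ("D3", "T3")]
negatives = sample_negatives(positives, n_negatives=len(positives))
```

Sampling as many negatives as positives gives a balanced training set; note that a sampled pair is only "not known to interact", so some label noise remains.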

Our DTI prediction model, which directly uses protein sequences, features, and SMILES strings (1D raw data), is shown in Figure 3. Our model consists of:

i. Descriptor stage
ii. Data preprocessing
iii. Machine learning prediction techniques

In this stage, data are converted into numerical values using the protein and drug descriptors. For proteins, the amino acids are divided into 7 groups according to their physicochemical properties. Each amino acid is encoded as an integer based on its group in the reference protein substitution table [6]. The method then scans triads (three consecutive residues) one by one along the sequence of amino-acid groups. For SMILES strings, a simple integer-encoding technique is used to convert each character of the string to an integer, and the resulting sequence of values is used as the drug features. In the conversion stage, which uses a standard dictionary for each of the protein sequences and SMILES strings to extract features, it is necessary to set a threshold on the basis of 
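The descriptor stage can be sketched as follows. The 7-group partition used here is the widely used conjoint-triad grouping and is an assumption, since the paper's substitution table is not reproduced in the text; the function names and the on-the-fly SMILES dictionary are likewise illustrative only.

```python
from collections import Counter
from itertools import product

# Assumed 7-group partition of the 20 amino acids (conjoint-triad style);
# the paper's own reference table may differ.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: g for g, members in enumerate(GROUPS, start=1) for aa in members}


def encode_protein(sequence):
    """Map each amino acid to its group index (1..7), skipping unknown symbols."""
    return [AA_TO_GROUP[aa] for aa in sequence if aa in AA_TO_GROUP]


def triad_counts(sequence):
    """Slide a window of three residues and count each of the 7^3 = 343 triads."""
    groups = encode_protein(sequence)
    counts = Counter(zip(groups, groups[1:], groups[2:]))
    return [counts[triad] for triad in product(range(1, 8), repeat=3)]


def encode_smiles(smiles, vocab):
    """Integer-encode a SMILES string, growing the character dictionary as needed."""
    return [vocab.setdefault(ch, len(vocab) + 1) for ch in smiles]


protein_codes = encode_protein("AGVC")
protein_vector = triad_counts("AGVC")
vocab = {}
drug_vector = encode_smiles("CCO", vocab)
```

The protein vector always has 343 entries regardless of sequence length, whereas the encoded SMILES sequence has one integer per character and would still need padding or truncation to a fixed length before being fed to a classifier.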

Data preprocessing converts raw data into a usable format. Real-world data are often incomplete, inconsistent, and likely to contain errors, and preprocessing is an established way to address these problems. The main preprocessing step in our model is feature scaling, a method used to normalize the range of the independent variables or features. Feature scaling restricts the range of the features so that they can be compared on a common scale. It often helps to speed up computation in machine learning techniques [17]. In our model we used MinMaxScaler() to transform the training and testing data into scaled form.
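A minimal sketch of this scaling step with scikit-learn's `MinMaxScaler` is shown below. The arrays are toy values, and fitting the scaler on the training split only (then reusing it for the test split) is an assumption about the workflow, chosen because it avoids leaking test statistics into training.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrices; the two columns deliberately have very different ranges.
X_train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
X_test = np.array([[2.0, 500.0]])

scaler = MinMaxScaler()                         # default feature_range is (0, 1)
X_train_scaled = scaler.fit_transform(X_train)  # learn per-column min/max on training data
X_test_scaled = scaler.transform(X_test)        # reuse the training min/max for test data
```

After scaling, each training column spans exactly [0, 1]; test values can fall outside that interval if they lie beyond the training minimum or maximum.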

In bioinformatics research, machine learning plays an important role in distilling large amounts of data into patterns. The overall learning workflow in drug-target interaction (DTI) prediction can be divided into two steps: first, training a base model according to a set of learning rules; second, using the trained model to predict a test dataset. In our model, different ML algorithms have been tested, and their results are evaluated and compared with the most recent methods. These algorithms are support vector machine (SVM), Random Forest (RF), ensemble learning techniques (LightBoost, XGBoost, and ExtraTree), and deep learning techniques (DBN, CNN, and ANN) [32].

SVM is a supervised machine-learning algorithm that can be used for both classification and regression problems. In the SVM algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Prediction is then performed by finding the hyperplane that best separates the classes. In our model, the SVM parameters used are {C=1.0, kernel='rbf', degree=3, gamma='scale'}. The parameters are as follows:

• C: The regularization parameter of the error term.

• kernel: Specifies the kernel type to be used in the algorithm. It can be 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', or a callable. The default value is 'rbf'.

• degree: The degree of the polynomial kernel function ('poly'); it is ignored by all other kernels. The default value is 3.

• gamma: The kernel coefficient for 'rbf', 'poly', and 'sigmoid'. If gamma is 'scale', then 1 / (n_features * X.var()) is used as the value of gamma; if gamma is 'auto', then 1 / n_features is used.
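The configuration listed above maps directly onto scikit-learn's `SVC`. The sketch below uses a synthetic dataset from `make_classification` as a stand-in for the drug-target feature matrix, which is an assumption made only to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the drug-target feature matrix (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Parameters as stated in the text; degree is ignored by the RBF kernel.
clf = SVC(C=1.0, kernel="rbf", degree=3, gamma="scale")
clf.fit(X, y)

# With gamma='scale', the kernel coefficient is derived from the data.
scale_gamma = 1.0 / (X.shape[1] * X.var())
train_accuracy = clf.score(X, y)
```

Because `gamma='scale'` depends on the variance of the inputs, applying feature scaling before fitting changes the effective kernel width.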

The Random Forest algorithm consists of many individual decision trees that operate as an ensemble. Each tree produces a class prediction, and the class with the most votes becomes the model's prediction. Random forests typically generalize better than a single decision tree across a wide range of data. In addition, the RF algorithm can maintain good accuracy even when a large proportion of the data is missing.

XGBoost is an improved ensemble model based on Gradient Tree Boosting (GTB) [39], widely used by researchers in classification tasks. It follows the same procedure as the GTB algorithm, with a slight change in the regularized objective to improve the model's efficiency. In our model, we use the parameters {max_depth=5, learning_rate=0.2612, n_estimators=int(75.5942), reg_alpha=0.9925, nthread=-1, objective='binary:logistic'}.

LightBoost is a fast, high-performance, distributed gradient boosting framework based on decision tree algorithms. It is often used for ranking, classification, and many other machine learning tasks. It performs well on large datasets, with a significant decrease in training time compared to XGBoost [10].

The ExtraTree algorithm is another extension of the bagged decision tree ensemble method. In this method, randomized trees are constructed from the features of the training dataset. It aggregates the results of many de-correlated decision trees collected in a 'forest' to produce its classification result. In our model, its parameters are {n_estimators=100, max_depth=None, min_samples_split=2}.
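As an illustration, the ExtraTree settings above can be evaluated under K-fold cross-validation with scikit-learn. The synthetic data, the choice of 5 stratified folds, and the fixed `random_state` values are assumptions for the sake of a reproducible sketch; the paper does not state them here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the drug-target feature matrix (assumption).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Parameters as stated in the text; random_state is added for reproducibility.
model = ExtraTreesClassifier(
    n_estimators=100, max_depth=None, min_samples_split=2, random_state=0
)

# Stratified folds keep the positive/negative ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()
```

Reporting the mean and spread of `scores` across folds gives a more stable estimate than a single train/test split.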

Artificial neural networks (ANN)

Deep learning methods are a branch of machine learning and have been used in many areas of biology and chemistry. A main advantage of the abstract representations learned in deep learning is that they can be invariant to local changes in the input data. An ANN is a multilayer fully connected neural network. After trying different ANN architectures, our ANN model consists of an input layer, multiple hidden layers, and an output layer. Every node in one layer is connected to every node in the next layer. There are five hidden layers, each with 10 neurons; the full architecture of the ANN is shown in Table 3. The number of iterations is 100 and the batch size is 32; the activation function is ReLU in the output layer and sigmoid in the hidden layers.

A DBN is a neural network made by stacking restricted Boltzmann machines (RBMs) and trained in a greedy manner. Training a DBN can be divided into two successive operations: unsupervised greedy pre-training and supervised fine-tuning. The architecture of the network is shown in Table 3, where each of the two hidden layers has 256 nodes, learning-rate-rbm=0.05, number of iterations=100, batch-size=32, and the activation function is ReLU. The pre-training process of the proposed model is as follows:

• Initialize the parameters W, b, and c using a random generator, where W represents the weights that connect the hidden and visible units, and b and c are the offsets of the visible and hidden layers, respectively.
• Train the first and second layers as an RBM, using the raw input vector x as its visible layer.
• Train the second and third layers as an RBM, taking the second layer as the visible layer, and obtain the representation of the third layer.

The supervised fine-tuning process is as follows:

• Use the output of the last hidden layer of the DBN as the input of the logistic regression (LR) classifier.
• Fine-tune all the RBM and LR parameters via supervised stochastic gradient descent (SGD) on the DBN log-likelihood cost.
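The greedy layer-wise part of this procedure can be approximated with scikit-learn by stacking two `BernoulliRBM` transformers in front of a logistic regression classifier. This is only a rough sketch under stated assumptions: the pipeline trains each RBM in turn without labels and then fits the classifier on the top-layer representation, but it does not perform the joint SGD fine-tuning of all RBM weights described above, and the synthetic data are a stand-in for the real features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the drug-target feature matrix (assumption).
X, y = make_classification(n_samples=120, n_features=20, random_state=0)

# Hyperparameters taken from the text: 256 hidden units, learning rate 0.05,
# 100 iterations, batch size 32.
rbm_params = dict(n_components=256, learning_rate=0.05, n_iter=100,
                  batch_size=32, random_state=0)

dbn_like = Pipeline([
    ("scale", MinMaxScaler()),             # BernoulliRBM expects inputs in [0, 1]
    ("rbm1", BernoulliRBM(**rbm_params)),  # trained first, on the scaled input
    ("rbm2", BernoulliRBM(**rbm_params)),  # trained next, on rbm1's hidden activations
    ("lr", LogisticRegression(max_iter=1000)),  # supervised classifier on top layer
])
dbn_like.fit(X, y)
predictions = dbn_like.predict(X)
```

A full DBN implementation would additionally backpropagate the classification loss through both RBM layers, which usually improves on this frozen-feature variant.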

A convolutional neural network (CNN) is a network architecture for deep learning. A CNN comprises one or more convolutional layers, usually combined with pooling layers, followed by one or more fully connected layers as in a standard multilayer neural network.

• Convolutional layer: The main task of the convolutional layer is to detect local conjunctions of features from the previous layer and map their appearance to a feature map.
• Fully connected layer: The fully connected layers in a convolutional network are practically a multilayer perceptron.
• Pooling layer: A pooling layer provides a typical downsampling operation, which reduces the in-plane dimensionality of the feature maps in order to introduce translation invariance to small shifts and distortions and to decrease the number of subsequent learnable parameters. Note that there are no learnable parameters in any of the pooling layers, whereas filter size, stride, and padding are hyperparameters of pooling operations, as they are of convolution operations.
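The pooling behaviour described above can be shown with a few lines of plain Python. This is a generic 1D max-pooling sketch without padding, not code from the paper; `max_pool_1d` and the toy feature maps are illustrative names and values.

```python
def max_pool_1d(feature_map, pool_size=2, stride=2):
    """Max pooling over a 1D feature map (no padding, no learnable parameters)."""
    n_out = (len(feature_map) - pool_size) // stride + 1
    return [
        max(feature_map[i * stride : i * stride + pool_size])
        for i in range(n_out)
    ]


# A toy feature map and a copy whose activations are shifted within pooling windows.
feature_map = [1, 3, 2, 5, 4, 0]
shifted_map = [3, 1, 5, 2, 0, 4]

pooled = max_pool_1d(feature_map)
pooled_shifted = max_pool_1d(shifted_map)
```

Both inputs pool to the same output, which illustrates the invariance to small shifts, and the output is half as long as the input, which is the dimensionality reduction mentioned in the text.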

The following measures are used to evaluate and compare the different techniques for predicting drug-target interactions:

The accuracy of a test is its ability to distinguish positive from negative cases correctly.

To estimate the accuracy, we calculate the proportion of true positives and true negatives among all evaluated cases. If TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives, the accuracy can be stated as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

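Precision is the proportion of predicted positive interactions that are correct. Recall is the proportion of actual positive interactions that have been correctly identified. They can be calculated as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$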

The F1-score evaluates the balance between precision (p) and recall (r) and is estimated as:

$$F_1 = \frac{2\,p\,r}{p + r}$$

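The value of the Matthews correlation coefficient (MCC) ranges from −1 to 1, where −1 indicates a completely wrong binary classifier and 1 indicates a completely correct binary classifier [8]. The Matthews correlation coefficient can be calculated as:

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$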

Finally, the time taken for training and for prediction is also used as a metric to assess and compare the different techniques.

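MSE measures the average of the squares of the errors, that is, the average squared difference between the actual values and the estimated values. As a measure of estimator quality, MSE is always non-negative, and values closer to zero are better. For n samples with actual values $y_i$ and estimated values $\hat{y}_i$, MSE is estimated with the following equation:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$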
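The evaluation measures in this section can be computed directly from confusion-matrix counts. The sketch below uses the standard definitions of accuracy, precision, recall, F1-score, MCC, and MSE; the counts and label vectors are made-up toy values, and returning 0.0 for an undefined ratio is a convention chosen here, not something stated in the paper.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the ratio is undefined."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1 and MCC from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and estimated values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy counts: 45 true positives, 40 true negatives, 10 false positives, 5 false negatives.
metrics = classification_metrics(tp=45, tn=40, fp=10, fn=5)
mse = mean_squared_error([1, 0, 1, 1], [1, 1, 1, 0])
```

For hard 0/1 predictions, MSE equals the error rate (one minus accuracy), so it carries extra information mainly when predicted probabilities are used instead of class labels.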

All the model evaluation experiments are run on a computer with the following characteristics: Windows 10 operating system, 2.50 GHz Intel Core i5 processor, and 4 GB RAM.

In this section, we highlight the empirical results of our proposed DTI prediction model on two datasets: protein sequences and drug SMILES strings (1D raw data), and feature data. Each technique is implemented using scikit-learn (including its ensemble package), the Keras library, the TensorFlow library, and the XGBoost package in Python (version 3.6).

The results in Table 4 report the accuracy, mean square error, MCC, and F-score achieved by the different algorithms. On the benchmark dataset, the best accuracy of 0.98 was obtained by the LightBoost and ExtraTree ensemble learners, while RF achieved the second-best value of 0.97. On the DrugBank dataset, the best precision of 0.966 was obtained by the ExtraTree ensemble learner, and Random Forest achieved the second-highest value of 0.96. The ExtraTree algorithm also provides the highest F1-score.

Table 4: Results of the deep, machine, and ensemble learning techniques according to accuracy, mean square error, MCC score, and F1-score.

The area under the curve (AUC) is calculated from the ROC curve of each model to describe the quality of its predictions, which gives a clearer visual interpretation of drug-target interaction prediction. Figure 4 shows the ROC curves and the AUC values for the learning methods. For the DrugBank dataset, the random forest and ANN methods achieve the maximum AUC of 0.937; for the benchmark dataset, the extra tree method achieves the maximum AUC of 0.982. The Precision-Recall (PR) curve, presented in Figure 5, is a graph with precision values on the y-axis and recall values on the x-axis. Note that precision is also called the positive predictive value (PPV), and recall is also called sensitivity, hit rate, or true positive rate (TPR) [14]. The method with the highest precision-recall curve is random forest for sequence data and DBN for feature data.

Figure 4: ROC curves and AUC values for the learning methods. The random forest and ANN methods achieve the maximum AUC of 0.937 on the DrugBank dataset, and the extra tree method achieves the maximum AUC of 0.982 on the benchmark dataset.

Figure 5 shows the precision-recall curves for sequence and feature data. The curve shows the trade-off between precision and recall at different thresholds. A large area under the curve represents both high recall and high precision, where high precision corresponds to a low false positive rate and high recall corresponds to a low false negative rate. The precision-recall curve is a better option for evaluating model performance.

From Figure 5, the random forest technique has the highest area under the precision-recall curve, which means it is the best technique for sequence data. The DBN technique has the highest area under the precision-recall curve, which means it is the best technique for feature data (Benchmark).
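Both summary statistics discussed here can be computed from predicted scores without plotting. The sketch below is a plain-Python illustration on toy data: ROC AUC is computed through its equivalent pairwise-ranking form, and the area under the precision-recall curve is approximated by average precision, assuming there are no tied scores. The function names and the four example scores are illustrative only.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each true positive (no tied scores assumed)."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall level
    return total / sum(y_true)


# Toy labels (1 = interaction) and predicted interaction probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc_value = roc_auc(y_true, scores)
ap_value = average_precision(y_true, scores)
```

On this toy input the two measures differ (0.75 versus about 0.83), which shows why a method can rank first by ROC AUC on one dataset and another method first by precision-recall area.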

Here we compare the drug-target interaction predictions of our model with two recent state-of-the-art methods. The first method introduced a deep learning model that uses convolutional neural networks (CNNs) to obtain 1D representations of protein amino-acid sequences and SMILES (Simplified Molecular Input Line Entry System) strings. Its results show that using CNNs to obtain representations of the data, rather than traditional descriptors, improves performance in all models, most clearly when deep representations obtained from protein sequences and SMILES strings are compared with global descriptors [13]. The second method developed a new similarity-based approach to build negative DTIs. Multiple LASSO models were suggested to combine different sets of features and explore their power for DTI prediction, and a LASSO-DNN model was introduced to predict DTIs. The performance of the proposed LASSO-DNN is compared to LASSO, standard logistic regression (SLG), support vector machine (SVM), and standard DNN models, showing that effective representations of drug and target features are necessary to build learning models for DTI prediction [15]. Our method outperforms the other methods, achieving the best performance on the DrugBank dataset. As shown in Table 6, our work achieves the highest average accuracy (0.93), which is 2% higher than the first method [24] and 12% higher than the second method [15].

Developing a drug from scratch takes at least several years. Repositioning existing effective antiretroviral drugs for Covid-19 may be the only practical solution to the current pandemic of a sudden infectious disease. The human immune system plays a crucial role in the recurrence of coronaviruses, and the use of interferon may enhance the immune response to Covid-19. Many studies have used data-driven methods to identify potential treatments for the virus, and many of them draw on resources such as DrugBank. Antiviral drugs targeting SARS-CoV-2 can be classified into two main categories. The first targets virus-host interactions or inhibits viral assembly. The second includes drugs that modulate the broad-spectrum innate immune responses of the host [7, 9].

In [5], the authors used MT-DTI, a pre-trained deep learning model that learns drug-target interactions without domain knowledge. In a previous study, MT-DTI successfully identified EGFR-targeting drugs used in clinics among its top 30 predicted candidates out of 1,094 chemical compounds registered in the DrugBank database. This indicates that 3D structural information from proteins and/or molecules is not necessarily required to predict drug-target interactions. There is currently no evidence that these drugs are effective against Covid-19 [8]. In addition, atazanavir, an antiretroviral drug, appears promising for Covid-19 because it was predicted to bind with high affinity to six Covid-19 proteins, including the 3C-like proteinase and components of the replication complex. The review in [36] provides guidelines on how to use artificial intelligence (AI) to accelerate drug repurposing or repositioning, for which AI approaches are not only formidable but also necessary. It discusses how AI models can be used in precision medicine and, as an example, how they can accelerate drug repurposing for Covid-19. Rapidly developing, powerful, and innovative AI and network medicine technologies can expedite therapeutic development. The review provides a strong rationale for using AI-assisted tools to repurpose drugs for human diseases, including during the Covid-19 pandemic.

In this case study, we apply our DTI prediction model to data released by DrugBank and published in previous research [28], in order to identify the proteins that potentially interact with this set of drugs, as shown in Table 6. We use the trained models with the best accuracy to support drug discovery for Covid-19: the random forest, Light-Boost, and Extra-Tree techniques each predict drug-target interactions, and we then select among them based on the highest prediction probability. First, we download the sequences of the target proteins from UniProt (https://www.uniprot.org/) and the drugs from DrugBank, and then use our model to predict the drug-target interactions in Covid-19. The experimental results in Table 5 indicate that the drugs used to treat Covid-19 and influenza do not act on the proteins affected by coronavirus within the human cell. From Table 7, the drug umifenovir interacts with the GBRA3_HUMAN protein with a prediction probability of 1.0; this result was obtained with the random forest technique. Ritonavir interacts with the cytochrome c oxidase polypeptide II protein (a component of the respiratory chain that catalyzes the reduction of oxygen to water), also with a prediction probability of 1.0. Darunavir interacts with the DNA topoisomerase II protein with a prediction probability of 1.0.
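The selection step described above, in which several classifiers score each drug-protein pair and the most confident prediction is kept, can be sketched as follows. This is an illustrative sketch only: the function names, the placeholder drug and protein names, the probabilities, and the 0.5 cutoff are all assumptions, not values or code from the paper.

```python
def pick_most_confident(model_probabilities):
    """Return (model_name, probability) for the highest interaction probability."""
    if not model_probabilities:
        raise ValueError("at least one model prediction is required")
    best_model = max(model_probabilities, key=model_probabilities.get)
    return best_model, model_probabilities[best_model]


def rank_pairs(pair_predictions, min_probability=0.5):
    """Keep the most confident model per pair and sort pairs by probability.

    min_probability is an assumed cutoff; the paper does not state one.
    """
    ranked = []
    for (drug, protein), probabilities in pair_predictions.items():
        model, probability = pick_most_confident(probabilities)
        if probability >= min_probability:
            ranked.append((drug, protein, model, probability))
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


# Hypothetical per-model probabilities for placeholder drug-protein pairs.
example = {
    ("drug_A", "protein_X"): {"random_forest": 0.95, "light_boost": 0.80, "extra_tree": 0.70},
    ("drug_B", "protein_Y"): {"random_forest": 0.30, "light_boost": 0.45, "extra_tree": 0.40},
    ("drug_C", "protein_X"): {"random_forest": 0.60, "light_boost": 0.85, "extra_tree": 0.99},
}

for drug, protein, model, probability in rank_pairs(example):
    print(drug, protein, model, probability)
```

In practice the per-model probabilities would come from the trained classifiers applied to the encoded protein sequences and drug SMILES strings; recording which model produced the retained probability keeps results traceable to a technique, as in the reported results.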

A number of proteins have been found to affect the Covid-19 virus [26]. In this study, we use the proteins found to be affected by coronavirus within the human cell in [2]. The outer surface of the SARS-CoV-2 virus is made of the spike protein (S), envelope protein (E), membrane protein (M), and nucleocapsid protein (N). The M and E proteins take part in virus morphogenesis and assembly. ACE2 is an endogenous membrane protein that enables Covid-19 infection. During infection, the extracellular peptidase domain of ACE2 binds to the receptor-binding domain of the spike protein, which is a surface protein on SARS-CoV-2. The spike protein present in both SARS-CoV and SARS-CoV-2 binds to the host cell through the receptor protein angiotensin-converting enzyme 2 (ACE2) [29], which is located on the surface of the host cell membrane. While both SARS-CoV and SARS-CoV-2 bind to the same host cell receptor, ACE2, the binding affinity of SARS-CoV-2 to ACE2 is significantly higher than that of SARS-CoV. The viral protein responsible for SARS-CoV-2 host cell entry and replication is closely similar in structure to that of SARS-CoV [20, 4]. We identify the drugs related to these proteins for use in Covid-19 drug repositioning.

Drugs predicted for the proteins that influence Covid-19, mentioned in [23], listing the drug name, the predicted interacting drugs, and the prediction probability.

The experimental results in Table 8 report the predicted drugs that interact with the proteins affected by coronavirus within the human cell. The results are taken from the Extra-Tree and Light-Boost techniques, where the N protein and S protein interact with ZINC DRUGS with a prediction probability of 0.8. The ACE2 protein (angiotensin-converting enzyme 2, Q9BYF1) interacts with DB0069 and DB05203 with a prediction probability of 1.0. Moexipril (DB00691) is a non-sulfhydryl-containing precursor of the active angiotensin-converting enzyme (ACE) inhibitor moexiprilat. It is used to treat high blood pressure (hypertension). The prediction probability between the ACE2 protein and moexipril equals 1.0.

In this paper, we present a workflow for identifying DTIs by incorporating various datasets. The proposed model is capable of successfully predicting drug-target pairs based on both sequence and protein structural features. Whereas most previous methods considered evolutionary features from proteins, our model derives features from amino acid sequences using physicochemical properties and from drug SMILES strings. Our proposed data preprocessing makes the prediction task more tractable in terms of space complexity and running time. Experimental results show that our predictive models can successfully identify more interactions between drugs and proteins in the human cell.

Computational Prediction of Drug-Target Interactions via Ensemble Learning

Computational Methods for Drug Repurposing

Automatic Feature Scaling and Selection for Support Vector Machine Classification with Functional Data

Entry mediated by the viral spike protein

Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target interaction deep learning model

Adenosine receptors as drug targets - what are the challenges? Nature Reviews Drug Discovery

COVID-19: Finding the Right Fit) Identifying Potential Treatments Using a Data-Driven Approach

Potentially highly potent drugs for 2019-nCoV

A new coronavirus associated with human respiratory disease in China

LightGBM: A Highly Efficient Gradient Boosting Decision Tree

Predicting drug-target interactions using Lasso with random forest based on evolutionary information and chemical structure

Similarity-based machine learning methods for predicting drug target interactions: a brief review

Coronaviruses: Methods and Protocols

The Relationship Between Precision-Recall and ROC Curves

Predicting drug-target interaction network using deep learning model

Mapping the landscape of artificial intelligence applications against COVID-19

A comprehensive review of feature based methods for drug target interaction prediction

A comprehensive review of feature based methods for drug target interaction prediction

Drug-protein binding: a critical review of analytical tools

Receptor recognition mechanisms of coronaviruses: a decade of structural studies

Deep-Learning-Based Drug−Target Interaction Prediction

Target identification and mechanism of action in chemical biology and drug discovery

Enzyme annotation in UniProtKB using Rhea

Deep neural network architecture for drug-target interaction prediction. In International Conference on Artificial Neural Networks

GPCRomics: An Approach to Discover GPCR Drug Targets

DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank

Receiver Operating Characteristic (ROC) Curve for Medical Researchers

Machine Learning for Drug-Target Interaction Prediction

COVID-19: A drug repurposing and biomarker identification by using comprehensive gene-disease associations through protein-protein interaction network analysis

NRLMFβ: Beta-distribution-rescored neighborhood regularized logistic matrix factorization for improving the performance of drug-target interaction prediction

DLS: A Link Prediction Method Based on Network Local Structure for Predicting Drug-Protein Interactions

A survey of deep neural network architectures and their applications

Prediction of Drug-Target Interaction Based on Protein Features Using Undersampling and Feature Selection Techniques with Boosting

Application of text mining in the biomedical domain

DrugBank 5.0: a major update to the DrugBank database

Artificial intelligence in COVID-19 drug repurposing

G-protein coupled receptors: advances in simulation and drug discovery

Prediction of drug-target interaction networks from the integration of chemical and genomic spaces

Deep learning in bioinformatics: Introduction, application, and perspective in the big data era

Computer-aided drug discovery and development
