key: cord-0031178-78rl00re
authors: You, Yujie; Lai, Xin; Pan, Yi; Zheng, Huiru; Vera, Julio; Liu, Suran; Deng, Senyi; Zhang, Le
title: Artificial intelligence in cancer target identification and drug discovery
date: 2022-05-10
journal: Signal Transduct Target Ther
DOI: 10.1038/s41392-022-00994-0
sha: 8e08f45826991850f3077dc09206ae4a4f94d194
doc_id: 31178
cord_uid: 78rl00re

Artificial intelligence is an advanced method to identify novel anticancer targets and discover novel drugs from biology networks because the networks can effectively preserve and quantify the interaction between components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence biology analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, the artificial intelligence models have provided us with a quantitative framework to study the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.

As one of the cutting-edge cancer treatments, targeted drug therapy has the advantages of high efficiency, few side effects, and low drug resistance for patients 1 . However, there are several drawbacks to the existing targeted therapies, such as the small number of druggable targets 2 , ineffective coverage of the patient population, and the lack of alternative options when patients develop drug resistance 1 . Therefore, identifying novel therapeutic targets and evaluating their druggability 3, 4 has become the current focus of cancer research on targeted drug therapy.

Since the complexity of cancer makes it difficult to comprehensively understand its pathogenesis 5 , most current targeted drugs are developed from experimentally validated hypotheses that explain one possible mechanism underlying carcinogenesis but ignore other facets of the disease 6 . As a result, these therapies could have undesired impacts on normal tissues and even provoke serious side effects for patients 7, 8 .

To elucidate the molecular mechanisms underlying cancer genesis, interactome data can be compiled and modelled as network structures in which nodes are biological entities (e.g., genes, proteins, mRNAs, and metabolites) and edges are associations/interactions between them (e.g., gene co-expression, signalling transduction, gene regulation, and physical interaction between proteins) [9] [10] [11] [12] [13] [14] . Artificial intelligence biology analysis algorithms, which build machines or programs to simulate human intelligence, are an effective method to process biological network data and to implement classification, clustering, and prediction tasks in biological networks 15 . Therefore, artificial intelligence algorithms can effectively tackle the complexity of cancer that arises from interactions between genes and their products 16, 17 in biological network structures, so as to improve our understanding of carcinogenesis 11, 12, [18] [19] [20] [21] [22] and explore novel anticancer targets [23] [24] [25] [26] [27] [28] [29] .

Over the past few decades, we have seen a fast development of artificial intelligence biology analysis algorithms. To make this study easy to understand, we divide these artificial intelligence algorithms into network-based biology analysis algorithms and machine learning-based (ML-based) biology analysis algorithms according to the data of the biological network structure, and we use Fig. 1 to describe the historical milestones of these algorithms.

On the one hand, network-based biology analysis algorithms provide a variety of alternative network approaches to identify cancer targets. More importantly, various network-based biology analysis algorithms can investigate network data from different perspectives; therefore, they can complement each other to provide accurate biological explanations 30 .

On the other hand, ML-based biology analysis [31] [32] [33] can not only efficiently handle high-throughput, heterogeneous, and complex molecular data but also mine features and relationships in biological networks. Thus, we should develop more ML-based biology analysis algorithms to provide advanced biology analyses that allow precise target identification and drug discovery for cancer.

Although artificial intelligence biology analysis has been widely used to improve our understanding of carcinogenesis, to the best of our knowledge, there is no systematic review that introduces the scope of related research and explains the network-based and the ML-based biology analysis algorithms to identify novel anticancer targets and discover drugs. Therefore, in the next section, we will describe the scope of artificial intelligence biology analysis for novel anticancer targets investigation. In the third section, we will introduce the basic principles and theory of commonly used artificial intelligence biology analysis algorithms. Then, we will briefly review and discuss studies that utilize network-based and ML-based biology analysis for cancer target identification and drug discovery. Finally, we will summarize the content of the article, discuss the limitations and challenges faced by the community, and point out the potential of artificial intelligence biology analysis to identify the therapeutic targets and discover drugs for cancer.

Recently, the rapid development of cancer-related multiomics technologies [34] [35] [36] has been one of the most important factors for artificial intelligence biology analysis to explore novel anticancer targets [37] [38] [39] . Figure 2 classifies these technologies into five aspects: epigenetics, genomics, proteomics, metabolomics, and multiomics integration analysis. Furthermore, Table 1 lists the related major diseases, drug targets, genomics, and network databases commonly used in multiomics integration analysis for these five aspects. Next, we will detail these five aspects.

Epigenetics analyses the reversible modifications of DNA or DNA-related proteins 54 . These modifications affect gene expression without changing the DNA sequence 54 . Investigating epigenetic data through artificial intelligence is not only important for elucidating fundamental mechanisms of cancer but also necessary for the design of targeted therapeutics. For example, Wilson et al. 55 took advantage of information-rich transcriptomic and epigenetic data to study regulatory networks surrounding histone lysine demethylation and highlighted the importance of epigenetic regulators in mitogenic control and their potential as therapeutic targets. Their results showed that epigenetic regulators such as KDM1A, KDM3A, EZH2, and DOT1L 56 are critical in oncogenesis and drug resistance.

Genomics aims to characterize the function of every genomic element of an organism by using genome-scale assays such as genome sequencing 57 . Applications of genomics include finding associations between genotype and phenotype 58 , discovering biomarkers for patient stratification 59 , predicting the function of genes 60 and charting biochemically active genomic regions such as transcriptional enhancers 49 . Recent developments in network-based biology analysis methods, such as sequence-similarity networks, genome networks, and gene family networks, have significantly improved the usability of molecular datasets in comparative genomics analysis 61 . These network methods collect expression and interaction data in the beginning and then transform them into interpretable biological processes 62, 63 , leading to the identification of tumour subtypes and the discovery of drug targets 64 .

For example, Medi et al. 65 integrated gene expression profiles into genome-scale molecular networks to identify novel therapeutic targets for cervical cancer, including receptors, microRNAs (miRNAs), transcription factors (TFs), proteins (e.g., CRYAB, CDK1, PARP1, WNK1, GSK3B, and KAT2B), and metabolites (arachidonic acids). Laura et al. 66 developed a network-based biology analysis workflow that integrates different layers of genomic information, including transcription factor cotargeting, miRNA cotargeting, protein-protein interaction and gene coexpression, into a biological network. Then, the authors applied a consensus clustering algorithm (an ML-based biology analysis algorithm that divides the network into sub-modules with different functions) 67-73 to the identified network communities to discover cancer driver genes, which demonstrated that F11R, HDGF, PRCC, ATF3, BTG2, and CD46 could be oncogenes and promising markers for pancreatic cancer.

For proteomics, proteomic experiments are performed for annotation and correlation of genome sequences, quantitation of protein abundance, detection of posttranslational modifications, and identification of protein-protein interactions (PPIs) 74 . PPIs not only play fundamental roles in structuring and mediating biological processes but also have been widely used for proteomics data analysis 75 . For example, Vinayagam et al. 37 analysed the human PPI network with control theory 76 to identify indispensable proteins that affect the controllability of the network. In control theory, a system is controllable if it can be driven from any initial state to any desired final state in finite time with a suitable choice of inputs. According to how the number of driver nodes in the network changes when a protein is removed, the protein is classified as "indispensable", "neutral", or "dispensable", which corresponds to an increase, no change, or a decrease in the number of driver nodes, respectively. The evidence shows that these indispensable proteins are primary targets of disease-causing mutations, viruses, and drugs. Furthermore, analysing data from 1,547 cancer patients revealed 56 indispensable genes in nine cancers. Of these genes, 46 were associated with cancer for the first time, demonstrating the ability of intelligent network controllability analysis to identify novel disease genes and potential drug targets 77 . Moreover, Valle et al. 78 developed a network-based biology analysis framework to compute the proximity between polyphenol targets and disease proteins. The calculated results indicated that the diseases whose proteins are proximal to polyphenol targets have significant gene expression changes, while the diseases whose proteins are distal to polyphenol targets have no such change.
The network relationship between disease proteins and polyphenol targets provides not only a computing method to reveal the effect of polyphenols on diseases but also a basis to identify novel anticancer targets.

Metabolomics is routinely applied for biomarker discovery by profiling metabolites in biofluids, cells and tissues 34 . Because of the inherent sensitivity of biotechnology, subtle alterations in metabolic pathways can be detected to provide insights into the mechanisms that underlie various physiological conditions and cancer processing 34 . Owing to innovative developments in network biology, researchers employ biological networks to perform metabolomic analyses and provide us with a systems-level understanding of the role that metabolites play in cancer.

For example, Basler et al. 79 proposed an effective network-based biology analysis framework for the systematic study of flow control and identification of driver reactions in large-scale metabolic networks. They found that the driver reactions were under complex cellular regulation in Escherichia coli, suggesting their preeminent role in facilitating cellular control. Correlation statistics indicate that the driven response plays an important role in inhibiting tumour growth and represents a potential therapeutic target.

For multiomics integration analysis, addressing the complexity of tumour-host interactions requires an approach to handle integrative omics data 80 . Compared to single omics studies, multiomics data provide researchers with various and interconnected molecular profiles to study carcinogenesis 80 . Thus, integrating multiomics datasets in a network structure for artificial intelligence biology analysis has emerged as a powerful tool to fully appreciate the complex interlayer regulatory interactions in cancer progression. Such an approach allows us to benefit from prior information that can be summarized and presented in networks, thereby providing us with insights into carcinogenesis from an overall perspective 81 .

Fig. 2 Artificial intelligence to integrate multiomics data (e.g., epigenetics, genomics, proteomics, and metabolomics) for cancer therapeutic targets identification. (Created with BioRender.com)

For example, Gov et al. 82 first performed comparative analyses of transcriptome data, and then identified common and tissue-specific reporter biomolecules such as genes, receptors, membrane proteins, TFs, and miRNAs. Second, they used the interactions among receptors, TFs, miRNAs, and their targeted DEGs to reconstruct a tissue-specific network for ovarian cancer and used network-based biology methods to identify interaction hubs. Finally, GATA2 and miR-124-3p were identified as hub nodes, suggesting that they are potential biomarkers for ovarian cancer.

This study divides the commonly used artificial intelligence biology analysis algorithms into two categories. One is network-based biology analysis algorithms, including shortest path 83 , module detection 84 , and network centrality 85 ; the other is ML-based biology analysis algorithms, including decision tree [86] [87] [88] and deep learning models [89] [90] [91] .

The principles and theory of network-based biology analysis algorithms

Biological networks are efficient in integrating complicated biological data, because they can capture the properties of biological entities and their relationships 92 . Mathematically, a network can be represented as a graph G = (V, E), where V and E are the sets of nodes (vertices) and edges, respectively. Nodes in biological networks can represent proteins, genes, diseases, and drugs, and edges represent various biochemical, physical, or functional interactions between nodes. Therefore, network-based biology analysis algorithms focus on identifying therapeutic targets and discovering novel drugs for cancer from molecular networks such as protein-protein interaction networks 75 , gene regulatory networks 93 , metabolic networks 94 , and drug-drug interaction networks 95 .

Computational biologists have developed several network-based biology analysis algorithms to effectively process and analyze non-ordered or non-Euclidean data in biological networks, which can perform tasks such as link prediction 96 , node ranking 85 , network propagation 97 , network modularization 98 , and network control 99 . Here, we briefly review and discuss the shortest path algorithm, module detection algorithm, and node prioritization methods using node centrality in identifying cancer therapeutic targets and discovering drugs.

The shortest path algorithm. The shortest path algorithm, a type of network link algorithm, is used to intelligently identify the shortest connection between two genes or proteins in a graphical model that represents a cellular network 100, 101 . The algorithm is illustrated in Fig. 3 and Algorithm 1. The shortest distance for a given network is calculated by Eq. (1):

Here, $S$ and $T$ stand for the source and target node, respectively. $d(S,T)$ is the length of the shortest path from node $S$ to $T$. $V$ is the set of network nodes. $K$ stands for a node in the network, and $d(K,T)$ represents the lengths of possible paths connecting nodes $K$ and $T$. The shortest path algorithm has been widely used to determine regulatory paths in cancer networks 103, 104 and then discover the key targets on the paths 105 . For example, Li et al. 106 first identified a set of six genes that can distinguish colorectal tumours from normal adjacent tissues using the maximum relevance minimum redundancy approach 107 . The method ranks genes according to their relevance to the class of samples concerned while considering the redundancy of genes. Those genes that had the best trade-off between the maximum relevance to the sample class and the minimum redundancy were considered "good" biomarkers. Then, the authors applied the shortest path algorithm among the six genes in a PPI network underlying cancer and identified 15 shortest paths between any two genes of the gene set. Last, they found 35 genes on the identified shortest paths and ranked them according to their betweenness 108 . The results showed that androgen receptor (AR), a ligand-dependent transcription factor, is ranked as the top gene, suggesting its involvement in colon carcinogenesis through regulating the proliferation and differentiation of tumour cells 109 .
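The workflow described above (find shortest paths between every pair of seed genes, then collect the genes lying on those paths) can be sketched as follows. This is a minimal illustration on an unweighted toy network using breadth-first search, not the implementation used in the cited studies; the gene names, the function names `shortest_path` and `genes_on_paths`, and the choice to keep only one shortest path per pair are all assumptions made for the example.

```python
from collections import deque
from itertools import combinations

def shortest_path(graph, source, target):
    """Return one shortest path (list of nodes) by breadth-first search, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:      # walk back through the parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in sorted(graph[node]):   # sorted for a reproducible result
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None

def genes_on_paths(graph, seeds):
    """Count how often each non-seed gene lies on a shortest path between two seeds."""
    counts = {}
    for s, t in combinations(seeds, 2):
        path = shortest_path(graph, s, t)
        if path is None:
            continue
        for gene in path[1:-1]:          # interior nodes only
            if gene not in seeds:
                counts[gene] = counts.get(gene, 0) + 1
    return counts

# Hypothetical undirected toy network (letters stand in for genes).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("E", "D"), ("D", "F")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

path = shortest_path(graph, "A", "D")
candidates = genes_on_paths(graph, ["A", "D", "F"])
```

Genes with high counts in `candidates` are the ones a ranking step (for example, by betweenness) would then prioritise. On weighted networks, a weighted method such as Dijkstra's algorithm would be needed instead of breadth-first search.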

Additionally, Chen et al. 105 used a network-based biology analysis method, SAM (Significance Analysis of Microarrays) 110 , to analyse omics data and identified 153 differentially methylated CpG sites and differentially expressed molecules, including 42 miRNAs and 1,373 protein-coding genes. The authors first used the differentially expressed genes from the STRING database 111 to construct a PPI network. Then, they searched all the shortest paths connecting dysfunctional genes to identify potential cancer driver genes. Next, they ranked the genes by a permutation test and their network properties, such as betweenness and interaction scores. The top-ranking genes at different levels (i.e., methylation level, miRNA level, mutation level, and mRNA level) were regarded as driver genes of lung adenocarcinoma. Among these cancer driver genes, some appeared to be top candidates at different levels, suggesting their multifaceted contribution to lung carcinogenesis.

In summary, the shortest path algorithms 100,101 can help us efficiently identify regulatory paths in networks, allowing us to identify potential genes that are proximate to known cancer genes and thereby important for tumorigenesis. However, due to the complexity of the disease, potential cancer genes are not always on the identified shortest paths 106 , revealing the limitations of such algorithms. To resolve this issue, Lu et al. 112 proposed a random walk with restart method and identified 298 potential CRC-associated genes, an approach reported to be more effective and accurate than the shortest path algorithm used by Li et al. 106 . In addition, the computing efficiency of the shortest path algorithm could be compromised by large networks and their search strategies 112 .

The module detection algorithm. Cancers usually result from disruption of interactions of key regulatory genes with their partners 81, 113 . Module detection algorithms 114 , a type of network propagation algorithm, identify communities of cancer genes in complex networks 115 by analysing their topological structures ( Fig. 4 and Algorithm 2). Here, we explain and illustrate the commonly used modularity maximization algorithm 116 , which identifies network modules with the maximum modularity coefficients by Eq. 2.

where $Q$ represents the modularity coefficient of an identified module, $M$ is the total number of edges in the network, $A_{ij}$ is the adjacency matrix, and $P_{ij}$ represents the expected number of edges between nodes $i$ and $j$. $C_i$ or $C_j$ represents the module to which node $i$ or node $j$ belongs. If $i$ and $j$ belong to the same module, $\delta_{C_i, C_j} = 1$; otherwise, $\delta_{C_i, C_j} = 0$. The identified modules are a group of genes that are supposed to have a similar biological function, such as promoting or inhibiting tumourigenesis.
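To make the symbols above concrete, the sketch below evaluates the widely used Newman form of modularity, Q = (1/2M) Σ_ij (A_ij − P_ij) δ(C_i, C_j), for a given partition. Two points are assumptions rather than statements from the text: the normalisation 1/(2M), and the null model P_ij = k_i k_j / (2M), where k_i is the degree of node i. The function name `modularity` and the toy network are hypothetical.

```python
def modularity(edges, membership):
    """Modularity Q of a partition of an undirected, unweighted network.

    edges      : list of (u, v) pairs, each undirected edge listed once
    membership : dict mapping node -> module label
    Assumes the null model P_ij = k_i * k_j / (2M).
    """
    m = len(edges)
    degree = {}
    adjacent = set()
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    q = 0.0
    for i in degree:
        for j in degree:
            if membership[i] != membership[j]:
                continue                      # delta(C_i, C_j) = 0
            a_ij = 1.0 if (i, j) in adjacent else 0.0
            p_ij = degree[i] * degree[j] / (2.0 * m)
            q += a_ij - p_ij
    return q / (2.0 * m)

# Toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {n: "a" for n in range(6)}

q_two = modularity(edges, two_modules)   # 5/14, about 0.357
q_one = modularity(edges, one_module)    # 0.0
```

A modularity maximization algorithm searches over partitions for the one with the largest Q; here the two-triangle partition scores higher than lumping all nodes together, as expected.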

Algorithm 2. Module detection algorithm.

Currently, many researchers employ module detection algorithms to intelligently identify potential therapeutic targets for cancer [117] [118] [119] . For example, Ghiassian et al. 120 used the DIseAse MOdule Detection (DIAMOnD) method 121 to identify the local modules within the interconnected map of molecular components. They found that disease-related genes were significantly enriched in highly overlapping modules, which indicated that the predicted modules may help identify new anticancer targets. Of note, since the results of module detection algorithms depend mainly on network structures, the identified modules may vary for the same disease network with slightly different topology 85, 117 .

Since potential drug targets may exist in different network modules, we can make use of the correlation between modules to identify reliable cancer treatment targets 81 . Therefore, Wang et al. 122 proposed the seed connector algorithm, which links disease proteins by adding as few extra hidden nodes as possible, by considering the interactions among cancer-associated proteins. First, this algorithm starts with known seed proteins and induces a loosely connected subnetwork consisting of only seed proteins. Second, the algorithm sequentially selects as seed connectors those proteins that maximally increase the size of the largest connected component of the subnetwork, until no additional protein can be selected as a seed connector. Finally, the cancer modules are pinpointed.
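The greedy selection described above can be sketched as follows. This is an illustrative reading of the procedure, not the published implementation: the function names are hypothetical, and the stopping rule is an assumption (a candidate is accepted only if it enlarges the largest connected component by more than its own membership, that is, only if it joins previously separate parts of the subnetwork).

```python
def components(graph, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            n = stack.pop()
            comp.add(n)
            for nb in graph[n]:
                if nb in nodes and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

def lcc_size(graph, nodes):
    """Size of the largest connected component of the induced subgraph."""
    return max((len(c) for c in components(graph, nodes)), default=0)

def seed_connectors(graph, seeds):
    """Greedily add connectors that most enlarge the largest connected component."""
    module, connectors = set(seeds), []
    while True:
        base = lcc_size(graph, module)
        best, best_gain = None, 0
        for cand in sorted(set(graph) - module):
            # Assumed rule: gain counted beyond the candidate itself.
            gain = lcc_size(graph, module | {cand}) - base - 1
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:
            return connectors, module
        module.add(best)
        connectors.append(best)

# Hypothetical toy network: s* are seed proteins, h*/x are hidden proteins.
edges = [("s1", "h1"), ("h1", "s2"), ("s2", "s3"), ("s3", "h2"),
         ("h2", "s4"), ("s4", "x"), ("h3", "s1")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

connectors, module = seed_connectors(graph, ["s1", "s2", "s3", "s4"])
```

In this toy case `h1` and `h2` bridge otherwise separate seeds and are selected, while `h3` and `x` only dangle from a single seed and are not.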

While these aforementioned algorithms [122] [123] [124] can intelligently identify meaningful functional modules from network topologies, it may be difficult to capture disease modules 125 . One possible reason is that disease proteins do not constitute particularly densely connected subgraphs but agglomerate in specific large regions of the network. For this reason, Tripathi et al. 126 considered analysing the patterns of connectivity in a disease module to be an effective way to understand the properties of disease modules.

The node centrality. Node centrality measures the importance of nodes and is suitable to intelligently locate key nodes with important biological functions for network biology 127 .

We list four commonly used types of node centrality as follows: (1) As the simplest form of network centrality, degree centrality is the number of nodes directly connected to a given node 127,128 ; (2) Coreness centrality considers both the degree of nodes and their positions in a network 129 ; (3) Betweenness centrality of a node is the probability for the shortest path between two randomly chosen nodes to go through that node, and it identifies the nodes that control information flow between other nodes by connecting paths 130 ; (4) Eigenvector centrality 131 considers not only the number of edges and the position of nodes but also the impact of adjacent nodes on the interactive network. Table 2 shows the formulas for node centrality computing. Figure 5 (a-d) illustrates the above four types of node centrality, and Algorithm 3 presents the pseudocode to compute four types of node centrality.

Table 2. Formulas for node centrality computing.
Degree centrality: $C_D(i) = d_i$, where $d_i$ is the number of ties that vertex $i$ has.
Coreness centrality: $C_C(i) = \sum_{j \in N(i)} ks(j)$, where vertex $j$ belongs to the neighbours $N(i)$ of vertex $i$ and $ks(j)$ is the k-shell index of vertex $j$.
Betweenness centrality: $C_B(i) = \sum_{j<k} g_{j,k}(i) / g_{j,k}$, where $g_{j,k}$ is the number of all shortest paths between $j$ and $k$ and $g_{j,k}(i)$ is the number of shortest paths between $j$ and $k$ containing $i$.
Eigenvector centrality: $C_E(i) = \sum_{j} a_{i,j} x_j$, where $a_{i,j} = 1$ if vertex $i$ is linked to vertex $j$ and $a_{i,j} = 0$ otherwise, and $x_j$ is the degree of vertex $j$.


Algorithm 3. The algorithm of degree centrality, coreness centrality, betweenness centrality and eigenvector centrality.

function1 Degree centrality:
    Input: Network G
    for each vertex i in Network:
        d_i ← the number of ties that vertex i has
        C_D(i) = d_i
    Output: C_D(i)

function2 Coreness centrality:
    Input: Network G
    for each vertex i in Network:
        C_C(i) ← 0
        N(i) ← the set of the neighbours adjacent to vertex i
        for each vertex j in N(i):
            ks(j) ← the k-shell index of vertex j
            C_C(i) ← C_C(i) + ks(j)
    Output: C_C(i)

function3 Betweenness centrality:
    Input: Network G
    for each vertex i in Network:
        C_B(i) ← 0
        for each vertex j in Network:
            for each vertex k in Network:
                if j < k:
                    g_{j,k} ← number of all shortest paths between j and k
                    g_{j,k}(i) ← number of shortest paths between j and k containing i
                    C_B(i) ← C_B(i) + g_{j,k}(i) / g_{j,k}
    Output: C_B(i)

function4 Eigenvector centrality:
    Input: Network G
    for each vertex i in Network:
        C_E(i) ← 0
        for each vertex j in Network:
            if vertex i is linked to vertex j:
                a_{i,j} = 1
            else:
                a_{i,j} = 0
            x_j ← the degree of vertex j
            C_E(i) ← C_E(i) + a_{i,j} · x_j
    Output: C_E(i)
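As a runnable counterpart to the pseudocode, the sketch below computes degree, coreness, and betweenness centrality with the definitions used in this section (coreness as the sum of the neighbours' k-shell indices, betweenness as the sum over node pairs of the fraction of shortest paths passing through a node). It is a minimal illustration for small unweighted networks, not an optimised implementation; the function names and the five-node toy graph are assumptions, and the toy graph is not claimed to be the network drawn in Fig. 5.

```python
from collections import deque

def k_shell(graph):
    """k-core decomposition: peel nodes whose remaining degree is <= k."""
    degree = {n: len(nbs) for n, nbs in graph.items()}
    remaining, shell, k = set(graph), {}, 0
    while remaining:
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1                       # nothing left to peel at this level
            continue
        for n in peel:
            shell[n] = k
            remaining.discard(n)
            for nb in graph[n]:
                if nb in remaining:
                    degree[nb] -= 1
    return shell

def coreness(graph):
    """C_C(i): sum of the k-shell indices of the neighbours of i."""
    ks = k_shell(graph)
    return {i: sum(ks[j] for j in graph[i]) for i in graph}

def bfs_counts(graph, source):
    """Distances from source and number of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        n = queue.popleft()
        for nb in graph[n]:
            if nb not in dist:
                dist[nb], sigma[nb] = dist[n] + 1, 0
                queue.append(nb)
            if dist[nb] == dist[n] + 1:
                sigma[nb] += sigma[n]
    return dist, sigma

def betweenness(graph):
    """C_B(i): sum over pairs j < k of g_jk(i) / g_jk."""
    info = {n: bfs_counts(graph, n) for n in graph}
    nodes = sorted(graph)
    score = {n: 0.0 for n in graph}
    for a, j in enumerate(nodes):
        dist_j, sigma_j = info[j]
        for k in nodes[a + 1:]:
            if k not in dist_j:
                continue                 # j and k are disconnected
            dist_k, sigma_k = info[k]
            for i in nodes:
                if i in (j, k) or i not in dist_j:
                    continue
                if dist_j[i] + dist_k[i] == dist_j[k]:   # i lies on a shortest path
                    score[i] += sigma_j[i] * sigma_k[i] / sigma_j[k]
    return score

# Hypothetical toy graph: a pendant node 0 attached to a 4-cycle 1-2-4-3.
graph = {0: {1}, 1: {0, 2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
deg = {n: len(nbs) for n, nbs in graph.items()}
core = coreness(graph)
btw = betweenness(graph)
```

In this toy graph node 1 has the largest value of all three measures (degree 3, coreness 5, betweenness 3.5), which is the kind of agreement that makes such a node a hub candidate.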

As described in Fig. 5 (a) and Eq. 3, the degree centrality of node 2 is 3 ($C_D(2) = 3$) because node 2 interacts with nodes 0, 1, and 3. It has been demonstrated that highly connected nodes or hubs are more likely to be essential 127 . Because the more direct connections a node has, the greater the impact that the node can exert on the network 132 , we can utilize the degree centrality of nodes to identify cancer therapeutic targets.

For example, Zhang et al. 133 predicted that hypoxia inducible factor-1α (HIF-1α) and prolyl 4-hydroxylase beta polypeptide (P4HB) may be considered potential biomarkers of gastric cancer by constructing a PPI network. Nevertheless, Jalili et al. 130 suggested that the high connectivity of a node does not necessarily imply its essentiality, and Kitsak et al. 129 argued that the location of a node is more significant than its immediate neighbours for evaluating its spreading influence. Because degree centrality considers only the direct interactions of a node but not its impact on other nodes, it yields lower accuracy for target prediction than other methods such as coreness centrality 134 .

As shown in Fig. 5 (b) and Eq. 4, the coreness centrality of node 3 is 8 ($C_C(3) = 8$) because the neighbours adjacent to the labelled vertex (3) are vertex (1), vertex (2), vertex (4) and vertex (5), and these four nodes belong to a 2-shell. Coreness centrality is an advanced form of node centrality because it considers both the degree of nodes and their positions in a network to quantify the importance of nodes in a network 129 . A node with a greater coreness is located in a more central place and is much more influential in network propagation than nodes with a high degree but lower coreness 129 . The classic method to calculate the coreness centrality of network nodes is the k-core decomposition method 135 , which decomposes the network iteratively according to the remaining degree of the nodes.

For instance, Li et al. 136 employed the k-core decomposition method to obtain the coreness of the PPI network. Subsequently, the targets were screened for topological importance. Then, the major hubs in the hub interaction network were determined, and a total of 62 major hubs were identified, including 11 indirubin (EGFR, JAK2, ERBB2, CHUK, CDK5, KIF11, DRD2, CDK3, HTR1A, JAK3 and TYK2) and derivative targets and 51 differentially expressed genes (DEGs) for imatinib resistance. These 11 major hubs were closely related to DEGs that were resistant to imatinib. Indirubin and its derivatives may inhibit imatinib resistance through the regulation of these genes to treat chronic myeloid leukaemia (CML).

As described in Fig. 5 (c) and Eq. 5, the betweenness centrality of node 1 is 3.5 ($C_B(1) = 3.5$) because four node pairs contribute to node 1 ($g_{0,2}(1)/g_{0,2} = 1$, $g_{0,3}(1)/g_{0,3} = 1$, $g_{0,4}(1)/g_{0,4} = 1$, and $g_{2,3}(1)/g_{2,3} = 0.5$). Betweenness centrality is based upon the frequency with which a node lies on the shortest paths between all other possible pairs of nodes within a network and identifies the gatekeepers that control communication of nodes in the network 130 .

For example, Taylor et al. 137 used betweenness centrality analysis to identify intermodular hub proteins and intramodular hub proteins in the breast cancer network. The identified proteins may serve as an indicator of breast cancer prognosis. Moreover, Raman et al. 138 computed degree, betweenness, and closeness indices in PPI networks for 20 organisms and showed that the degree and betweenness centralities of nodes correlate with their lethality in many organisms.

As described in Fig. 5 (d) and Eq. 6, the eigenvector centrality of node 1 is 3 ($C_E(1) = 3$) because node 1 is connected to nodes 0, 2 and 3 ($a_{1,0}$, $a_{1,2}$ and $a_{1,3}$ each equal 1), and the degrees $x_0$, $x_2$ and $x_3$ each equal 1. Eigenvector centrality considers not only the number of edges and the position of nodes but also the impact of adjacent nodes on a network.

For example, Mallik et al. 139 first identified differentially expressed and methylated genes in uterine leiomyoma tumours and then found TFs and miRNAs that regulate the expression of these genes. Subsequently, they reconstructed a network that comprised the genes, TFs, and miRNAs and then used eigenvector centrality to identify potential biomarkers. They specified that PTGS2 and TACSTD2 are potential novel biomarkers, since both genes are downregulated and hypermethylated in the tumour.

Moreover, several researchers have attempted to integrate more than one centrality index to increase the efficiency of the node centrality algorithm. For instance, Chen et al. 140 used the differentially expressed proteins of prostate cancer (PC) to construct a PPI network. Then, they integrated the connectivity degree, betweenness centrality, and closeness centrality of nodes to evaluate critical nodes and identify the core module of the PPI network. Finally, they identified SLC2A4 and TUBB2C as important proteins regulating the pathogenesis of cancer, suggesting the proteins involved in biological processes and pathways as potential targets for PC diagnosis and treatment. In addition, Aamri et al. 141 constructed a gene-gene interaction network for the entire human genome and then applied betweenness, closeness, eigenvector, and degree centrality metrics to rank the central genes of the network to identify possible cancer-related genes. The results showed that the average precision for identifying breast, prostate, and lung cancer genes ranged from 80% to 100%.

Although highly connected nodes in the network architecture are essential, recent studies point out that integrating prior knowledge of cancer into centrality indices can identify anticancer targets more accurately 130 . For this reason, Jiang et al. 142 developed a network-based biology analysis method, named NEST, which predicts essential proteins according to the expression levels of their interacting partners in a network. The results showed that NEST significantly outperformed the classic centralities on gene essentiality prediction and functional screen result enhancement.

Machine learning-based biology analysis algorithms

Machine learning (ML) algorithms are a subset of AI algorithms that can learn from data, therefore removing the need for explicit instructions on how to do certain tasks 15 . The key to identifying therapeutic targets and discovering drugs using ML-based biology analysis is to make use of network features in biological networks. The network features include the topological features (such as node centrality, interaction, local structure, subgraph, network propagation results, and network-based structure similarities) and the biological information that is embedded in network nodes (such as the gene expression profile, gene mutation frequency, and gene functional annotation).

Here, we introduce two classical ML-based algorithms: one is the decision tree algorithm, which selects significant topological features for cancer; the other is deep learning, which uses the network features to identify cancer targets and discover drugs.

The decision tree algorithm. A decision tree is a supervised classification algorithm 143 with three steps: feature selection, decision tree generation, and decision tree pruning [86] [87] [88] . Figure 6 shows how to classify a set of samples into two groups using the decision tree algorithm.

In network-based biology analysis, network topology features 88 are usually fed into a decision tree that classifies gene-phenotype associations for cancers [144] [145] [146] , which in turn reveals the topological features that are most significant for cancer.

For instance, Ramadan et al. 147 extracted thirteen network topological features (Table 3 ) from a publicly available gene coexpression network and a PPI network of breast cancer. Then, to assess the significance of topological measurements associated with breast cancer, they used Decision Tree Bagger 156 to classify breast cancer gene-phenotype associations. The importance of each topological measure was then evaluated using a score that combines the accuracy of breast cancer classification and the Gini index 148 (Table 3 ). The computed scores of the top five identified features (i.e., structural holes, node degree, node coreness, k-Step Markov and subgraph) outperformed the others, and they were selected as key features for the classification of breast cancer phenotype-gene associations.
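As a rough illustration of how a Gini-based criterion can rank topological features, the sketch below computes the Gini impurity of a label set and the largest impurity decrease obtainable by a single threshold split on each feature. It is a simplified stand-in and does not reproduce the combined accuracy and Gini score of Ramadan et al.; the feature names and values are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 - sum over classes of p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split_gain(values, labels):
    """Largest Gini decrease over all thresholds on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:   # the largest value leaves one side empty
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - child)
    return best

def rank_features(table, labels):
    """Sort features by their best single-split Gini decrease, best first."""
    gains = {name: best_split_gain(col, labels) for name, col in table.items()}
    return sorted(gains.items(), key=lambda kv: -kv[1])

# Hypothetical per-gene topological features; 1 = cancer-associated gene.
features = {
    "node_degree": [9, 8, 7, 2, 1, 3],
    "clustering_coefficient": [0.2, 0.9, 0.4, 0.3, 0.8, 0.5],
}
labels = [1, 1, 1, 0, 0, 0]
print(rank_features(features, labels))
```

Here node degree separates the two classes perfectly (Gini decrease of 0.5), so it is ranked above the clustering coefficient.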

Although the decision tree algorithm can help us select key network features, it is prone to overfitting when the network contains too many features 157 , which significantly decreases classification and prediction performance on independent test data 157 . At present, there are two commonly used methods to resolve the overfitting of decision trees. One is to use dimension reduction 157 and a pruning strategy 86 to improve classification accuracy by reducing the number of features; the other is to employ the random forest algorithm 158 , an ensemble of multiple decision trees. The random forest algorithm adopts a bagging strategy, which gives it higher accuracy and reliability than the classical decision tree algorithm 159 .
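The bagging strategy can be sketched as follows: each tree is trained on a bootstrap sample of the data and the ensemble predicts by majority vote. For brevity the trees here are one-level decision stumps and no random feature subsampling is performed, so this is only a simplified illustration of bagging rather than a full random forest; all names and data are hypothetical.

```python
import random
from collections import Counter

def majority(labels, default=0):
    """Most frequent label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(X, y):
    """One-level tree: the (feature, threshold) pair with the fewest training errors."""
    fallback = majority(y)
    best = (len(y) + 1, 0, 0.0, fallback, fallback)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            lo = majority([c for row, c in zip(X, y) if row[f] <= t], fallback)
            hi = majority([c for row, c in zip(X, y) if row[f] > t], fallback)
            errors = sum((hi if row[f] > t else lo) != c for row, c in zip(X, y))
            if errors < best[0]:
                best = (errors, f, t, lo, hi)
    return best[1:]

def predict_stump(stump, row):
    f, t, lo, hi = stump
    return hi if row[f] > t else lo

def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap: sample with replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    return majority([predict_stump(s, row) for s in forest])

# Hypothetical data: feature 0 is informative, feature 1 is noise.
X = [[1, 5], [2, 1], [3, 4], [4, 2], [6, 3], [7, 5], [8, 1], [9, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_bagged_stumps(X, y)
print(predict_forest(forest, [0.5, 3]), predict_forest(forest, [9.5, 3]))
```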

For example, Toth et al. 160 used the random forest algorithm to predict the aggressive behaviour of prostate cancer. Their methylation-based classifier demonstrated excellent performance in discriminating prognosis subgroups of the test set (Kaplan-Meier survival analyses with log-rank p value < 0.0001) with an AUC value of 0.95 161 for the sensitivity analysis. Finally, the experimental verification showed that the loss of ZIC2 protein expression was associated with poor prognosis and correlated with a significantly shorter time to biochemical recurrence.

In addition to the overfitting problem, the complicated classification procedure of a decision tree is difficult to visualize 146 . The alternating decision tree (ADTree) 162 makes the classification procedure intuitive and easy to understand by adding a graphical model. The algorithm builds decision trees over a user-defined number of iterations using confidence-rated boosting, so it returns both a class label and a score that measures confidence in the classification, as shown in Fig. 7 and Algorithm 4.

For example, Carson et al. 146 used ADTree to classify proteins in a breast cancer network. As indicated in Fig. 7 , the most effective attributes to distinguish disease and non-disease proteins are node degree, disease neighbour ratio, eccentricity, and neighbourhood connectivity, which was proven by Hao et al. 163 and Zhang et al. 164 .

Table 3 (continued)

| Feature | Description | Score |
|---|---|---|
| Node coreness | Considers both the degree of nodes and their positions in a network | 12.05 |
| k-Step Markov 150 | The probability that a random walk of length k makes the system reach a certain vertex | 10.47 |
| | The number of times a given vertex participates in different connected subgraphs of a network | 10.36 |
| Within-module z-score 152 | Measures how nodes are related | 8.88 |
| Katz status index 153 | Ranks a vertex as highly important if many nodes are connected to it | 8.64 |
| | The average length of the shortest path between nodes | 8.18 |
| Proximity prestige | The average shortest path length of a node | 8.12 |
| Eigenvector centrality | The influence of directly adjacent nodes on the central node | 8.09 |
| Betweenness | A node acts as a bridge along the shortest path between two other nodes | 7.93 |
| Bary centre score 154 | Ranks the nodes by the total shortest path of the vertex | 5.70 |
| Clustering coefficient 155 | Measures the degree of cohesiveness | 0.15 |

Although the decision tree, random forest and ADTree 86-88,158 tend to identify proteins that are already well annotated and studied for cancer, these methods are prone to producing locally optimal solutions. Therefore, Chen et al. 143 proposed a decision tree classifier based on particle swarm optimization 166 , which adds randomness to avoid the trap of local minima while optimizing the number of features and the detection accuracy of cancer treatment targets. Furthermore, the gradient boosting decision tree 167 is a very flexible and scalable method to classify network nodes in future studies.

The deep learning algorithms. Deep learning is a subfield of machine learning, and the development of neural networks set the stage for the emergence of deep learning models 168 . A deep learning model is a neural network composed of complex structures and nonlinear transformations 90,91 that attempts to model high-level abstractions of data using multilayer neurons. Through training, which iteratively updates the weights of the model (Eq. 7), the initial low-level feature representation (such as topological features and biological information) of samples is transformed into a high-level representation that captures the distinction between samples. The strength of deep learning is its ability to detect complex patterns in data, making it suitable to interrogate the biological networks that consist of complex, interdependent relationships among genes.

$$W_{k+1} = W_k - \eta \frac{\partial C}{\partial W_k} \qquad (7)$$

where W, k, η, and C are the weight, iteration, learning rate, and loss function, respectively. Currently, there are many neural network models and complex functions for ML-based biology analysis. In this paper, we only present several commonly used neural networks (Table 4) . Benefiting from the strong ability of neural networks to mine complex information on links or nodes, deep learning is a suitable method to identify potential cancer targets and discover drugs for cancer treatment in complex biological networks 175 . For example, Selvaraj et al. 176 searched for therapeutic targets for lung adenocarcinoma in a network of protein-protein and protein-drug interactions and employed a neural network to identify candidate drugs; molecular dynamics simulations then predicted that phosphothreonine targets the hub node MAPK1 in the network.
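The iterative weight update referred to as Eq. 7 can be illustrated with plain gradient descent on a one-weight model. The sketch below assumes a mean squared error loss and writes the learning rate as `eta`; both choices, as well as the toy data, are assumptions made for illustration.

```python
def loss(w, data):
    """C(W): mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """dC/dW for the loss above."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def train(data, w0=0.0, eta=0.05, steps=200):
    w = w0
    for _ in range(steps):               # iteration index k
        w = w - eta * grad(w, data)      # W_{k+1} = W_k - eta * dC/dW
    return w

# Toy data generated by y = 2x, so the weight should approach 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(data)
print(round(w, 4), round(loss(w, data), 8))
```

If `eta` is chosen too large for the data, the update overshoots and diverges, which is why the learning rate is usually tuned.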

Currently, artificial intelligence biology analysis has benefited from the utilization of graph-based neural networks instead of commonly used non-graph neural networks such as CNN 170 or DNN 169 , because graph-based neural networks can take the biological network structure as the input directly, learn an embedding that contains information about the neighbourhood of a target node in a graph, and analyse the biological network with neural networks technology. Figure 8 illustrates the basic flowchart of graph-based neural networks for the investigation of different properties of biological networks.
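To show how a graph-based neural network takes the network structure as direct input, here is a minimal sketch of one graph convolution step in the commonly used form H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), written with plain Python lists. The specific normalization, the fixed weights, and the toy network are assumptions; in practice the weights would be learned.

```python
import math

def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """One graph convolution: average neighbour features, then transform."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]   # ReLU

# Two interacting genes with one input feature each (hypothetical values).
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0], [3.0]]
weight = [[1.0, -1.0]]   # maps 1 input feature to 2 embedding dimensions
print(gcn_layer(adj, feats, weight))
```

Each output row is the embedding of one node and already mixes information from its neighbourhood; stacking further layers widens that neighbourhood.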

There are two advantages in using graph-based neural networks to identify cancer targets or discover drugs from biological networks.

1. Feature representation. Graph embedding 177 is the core method for extracting features in graph-based neural networks; it represents each network node as a low-dimensional vector that preserves both network topology and node content information 178 . For example, Li et al. 174 proposed a similarity-based miRNA-disease prediction method that used DeepWalk, a graph embedding algorithm, to compute the topological similarities between two disease nodes. The model extracts the disease node features in the disease-disease network based on the random walk algorithm, and significantly enhances the prediction performance by utilizing global network association information. For disease nodes with similar features, if one of the diseases is associated with a miRNA, the other is predicted to be associated with that miRNA. In addition, Zheng et al. 179 proposed an attention-based graph neural network (the attention mechanism assigns different weight parameters to different targets through learning, so as to consider the importance of key targets locally and globally 180 ) to learn graph embedding features (association scores) from a piRNA-disease association network. The results showed that the predicted scores of piRNA-disease associations are positively correlated with the association probability between a piRNA and a disease, suggesting that piRNAs with closer distances to tumour genes in the network are more likely to be therapeutic targets of cancer.

2. Feature integration. Graph-based neural networks integrate heterogeneous, noisy, and nonlinearly related biological network information (such as node similarity, node interactions, upstream and downstream relationships) across multiple views (such as drug molecular structures and drug indications) 181 . For example, Ma et al. 172 proposed a novel graph autoencoder (GAE) model to learn accurate and interpretable drug similarity measures from multiple types of drug properties. The GAE uses an attention mechanism 180 to integrate multiple views (multiple types of drug properties) from a drug-drug interaction network and determines the weight of each view with respect to the similarity measure task, which helps explain the contribution of drug properties to drug similarity. Owing to its autoencoder structure and its ability to integrate network data from multiple views, the GAE can resist noise interference in the data. Thus, graph-based neural networks are more robust and reliable in most application scenarios 182 .

Table 4 Commonly used neural networks

| Category | Model | Description | Ref. |
|---|---|---|---|
| | CNN | Convolutional neural network (CNN) obtains local information between input data by convolution. | 170 |
| Graph-based Neural Network | GCN | Graph convolutional network (GCN) applies convolution in networks to obtain local information between nodes and neighbour nodes. | 171 |
| | GAE | Graph autoencoder (GAE) uses an autoencoder to extract the embedded features of the network. | 172 |
| | GAN | Graph attention network (GAN) uses an attention mechanism instead of convolution to obtain local or global information between nodes. | 173 |
| | DeepWalk | DeepWalk is a network embedding model, which can represent the attributes of graph nodes as low-dimensional and dense eigenvectors. | 174 |
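The random-walk stage of a DeepWalk-style embedding can be sketched as below: truncated random walks are generated from every node, and nodes that co-occur within a short window are counted. DeepWalk itself feeds such walks to a skip-gram model to learn vectors; this sketch stops at co-occurrence counts, and the disease names, walk length, and window size are hypothetical.

```python
import random

def random_walks(graph, walk_length=5, walks_per_node=10, seed=0):
    """Truncated uniform random walks started from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in graph:
            walk = [start]
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

def cooccurrence(walks, window=2):
    """Count how often two different nodes appear within `window` steps of each other."""
    counts = {}
    for walk in walks:
        for i, u in enumerate(walk):
            for v in walk[i + 1:i + 1 + window]:
                if u != v:
                    counts[(u, v)] = counts.get((u, v), 0) + 1
                    counts[(v, u)] = counts.get((v, u), 0) + 1
    return counts

# Hypothetical disease-disease similarity network.
disease_net = {
    "breast_cancer": ["ovarian_cancer", "lung_cancer"],
    "ovarian_cancer": ["breast_cancer"],
    "lung_cancer": ["breast_cancer", "melanoma"],
    "melanoma": ["lung_cancer"],
}
walks = random_walks(disease_net)
counts = cooccurrence(walks)
print(len(walks), walks[0])
```

Nodes that share many walk contexts end up with similar embeddings once a skip-gram model is trained on the walks, which is the basis for transferring miRNA associations between similar diseases.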

Overall, deep learning can comprehensively explore features such as node degree, edge length, and module in biological networks [83] [84] [85] 183 and thereby provide accurate predictions of cancer drug targets from multiomics data in complex biological networks 184 . However, there are still two key issues to be addressed. One is the interpretability of the models, which is critical for clinical adoption 185 . The other is how to demonstrate the generalizability of the approaches 185 and validate them on multi-institutional datasets. These issues are actively being tackled through work on model interpretation, extraction of biological insights 186 , and model reproducibility 187 .

Because the wide and easy accessibility of high-throughput data in oncology has provided the basis for developing novel artificial intelligence methods and validating their capability to identify therapeutic targets, this section will focus on reviewing the biomedical applications from four perspectives. First, we present the artificial intelligence applications to identify novel anticancer targets. Second, we present the artificial intelligence applications to evaluate the druggability of potential target genes. Third, we show the artificial intelligence applications for drug discovery. Fourth, we show the artificial intelligence applications for drug property prediction.

Artificial intelligence biology analysis applications 188 usually use omics data to build networks and identify co-expression modules of genes, proteins, metabolites, critical pathways between molecules, and key molecules in biological networks 189 . This study will introduce these applications from two perspectives: one is network-based biology analysis applications, and the other is ML-based biology analysis applications.

Network-based artificial intelligence for identifying novel anticancer targets. Network-based biology analysis applications first reconstruct networks by computing the differential expression of molecules and their correlations [190] [191] [192] [193] . Then, gene set enrichment analysis is performed to identify network modules with different biological functions 194 . Finally, the identified network modules are used to discover key genes that are potential therapeutic targets (or biomarkers) for cancer. Here, we illustrate the key target identification procedure with WGCNA 195 , a commonly used network-based biology analysis application that takes gene expression matrices as input and outputs gene network modules and the core genes of the biological network. For example, Zhou et al. 196 used WGCNA to analyse colorectal cancer data from TCGA (Fig. 9 ) and demonstrated, through the following steps, that 11 hub genes and 5 hub miRNAs have predictive power for the prognosis of colorectal cancer patients.

In Step 1, the correlation between all pairs of genes and miRNAs identified by differential gene expression analysis was calculated, and two similarity matrices were constructed. In Step 2, the adjacency matrix derived from each similarity matrix was transformed into a topological overlap matrix (TOM) by using TOM similarity, and then the coexpressed gene and miRNA modules were identified by using dynamic tree cutting 197 . In Step 3, after module preservation analysis, six gene modules were found to have strong stability, and one miRNA module was found to have low stability. In Step 4, they performed module-trait relationship analysis to further validate the module-clinical trait relationships, and two pathological stage-related gene modules and one pathological stage-related miRNA module were identified. In Step 5, hub genes and hub miRNAs were identified by calculating the module membership and gene significance.

Fig. 9 The workflow to identify novel anticancer targets by network-based. (Created with BioRender.com)
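Steps 1 and 2 of such a WGCNA-style workflow can be sketched in a few functions: pairwise Pearson correlation, a soft-thresholded adjacency a_ij = |cor(i, j)|^beta, and the topological overlap TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij). The soft-threshold power `beta=6`, the unsigned network type, and the toy expression values are assumptions; module detection by dynamic tree cutting is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency with a zero diagonal."""
    genes = list(expr)
    adj = [[0.0 if i == j else abs(pearson(expr[gi], expr[gj])) ** beta
            for j, gj in enumerate(genes)] for i, gi in enumerate(genes)]
    return genes, adj

def topological_overlap(a):
    """TOM similarity: high when two genes are linked and share neighbours."""
    n = len(a)
    k = [sum(row) for row in a]   # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(a[i][u] * a[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + a[i][j]) / (min(k[i], k[j]) + 1.0 - a[i][j])
    return tom

# Hypothetical expression profiles across four samples.
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
genes, adj = adjacency(expr)
tom = topological_overlap(adj)
print(genes, [[round(v, 3) for v in row] for row in tom])
```

In this toy example g1 and g2 are perfectly correlated and obtain a topological overlap close to 1, whereas g3 is only weakly connected to both. Hierarchical clustering of 1 - TOM would then yield the coexpression modules.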

Though network-based biology analysis methods are useful in identifying anticancer targets, they have some limitations. For example, they cannot effectively handle multiomics data, which leads to high false-positive rates among the identified targets 42 . Developing comprehensive network-based biology analysis applications may resolve these problems and increase the precision of cancer biomarker prediction 198 .

For example, Lai et al. 199 deployed an integrated approach that combined network-based algorithms and RNA sequencing data to delineate miRNA-based strategies that enhance dendritic cell (DC)-elicited immune responses. First, the authors performed RNA sequencing to obtain the protein-coding genes and miRNAs in relation to standard DCs. Then, they analysed miRNA-gene interactions at the pathway level and reconstructed regulatory networks underlying the immunological functions of DCs. Finally, they performed network-based prioritization of miRNAs by combining their expression profiles and strength of association with other protein-coding genes. Their analysis identified dozens of promising miRNA candidates, of which miR-15a and miR-16 are the most promising ones for increasing the immunogenic potency of DCs and therefore improving DC-based immunotherapy against cancer.

In summary, we consider that an increasing number of network-based biology analysis applications will be developed for the identification of novel anticancer targets in the distant future.

ML-based artificial intelligence for identifying novel anticancer targets. ML-based biology network analysis applications interrogate large, complex data and thus identify reliable potential targets for the effective treatment of human diseases 200 . The ML-based biology analysis applications for novel anticancer target identification include classification 201 , clustering 202 , neural networks 203, 204 , and so on 205 . Here, owing to the limited space of this review, we focus only on the ML-based biology network analysis applications for classification and on graph-based neural networks.

ML-based biology network analysis applications for classification identify key targets by determining the key factors of the classification 206 . They consider the specific biomarkers (such as gene or protein nodes) of the defined classes as key targets 206 . Recently, classification-based applications combined with molecular profiling 207 have used genome-wide gene transcription profiles, protein expression profiles and/or mutational landscapes to classify tumor subtypes more accurately and to identify biomarkers for specific tumor types.

For example, Sinkala et al. 208 applied classification analysis on networks to reveal subtypes of pancreatic cancer and their molecular characteristics. Firstly, the authors applied K-means clustering to reverse phase protein array (RPPA)-determined proteomics data from 45 high-purity pancreatic cancer samples and identified two clusters of samples.

Secondly, they compared their clustering results to other subtypes that have been reported in the literature for various other molecular data types (such as DNA methylation status, protein expression levels and expression levels of mRNAs and miRNAs), and then applied similarity network fusion (SNF) to identify two-cluster and three-cluster solutions comprising 25 and 20 tumors. The SNF method solves the disparate clustering problem by constructing a similarity network of samples for each available molecular data type and then efficiently fusing these into one network that represents clustering based on all the underlying data.

Thirdly, they applied proteomics-based signaling pathway analysis to distinguish disease subtypes and found that, for tumors of the two major pancreatic cancer subtypes, oncogenesis may be primarily driven by perturbation in either SMAD4 or mTOR signaling pathways. Furthermore, they performed gene set enrichment analysis using the Gene Ontology database 52 and found that pancreatic cancer subtypes classified by mRNA expression levels and DNA methylation statuses show differences in molecular functions in terms of mRNA.

Finally, given that different types of molecular data yield different patterns of tumor clustering, they attempted to identify a list of biomarkers that can differentiate the two tumor subtypes. Using neighborhood component analysis, they identified biomarker sets comprising 50 mRNAs, 49 methylated genes, 14 proteins, and 20 miRNAs. Subsequently, they separately applied hierarchical clustering using each type of the molecular data and successfully reproduced the two pancreatic cancer subtypes.

Graph-based neural networks improve the accuracy of target identification not only by using the correlations among samples described by similarity networks, but also by passing messages between targets and their neighbors 209 .

For example, to the best of our knowledge, the MOGONET proposed by Wang et al. 203 is the first to make use of both graph convolution networks (GCNs) and cross-omics relationships in the label space for effective multiomics integration in biomedical data classification tasks. The specific process is as follows:

Firstly, they constructed a weighted sample similarity network for each type of omics data using cosine similarity. Taking both the omics features and the corresponding similarity network as the input, a GCN is trained for each type of omics data to predict class labels.
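The first step, building a weighted sample similarity network from cosine similarity, can be sketched as follows. Edges are kept only when the similarity reaches a cut-off; the fixed `threshold=0.9` and the toy feature vectors are hypothetical, and the way MOGONET itself chooses its cut-off is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_network(samples, threshold=0.9):
    """Weighted adjacency matrix: keep cosine similarities >= threshold, no self-loops."""
    n = len(samples)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = cosine(samples[i], samples[j])
                if s >= threshold:
                    adj[i][j] = s
    return adj

# Hypothetical omics feature vectors for three samples.
samples = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0]]
for row in similarity_network(samples):
    print([round(v, 3) for v in row])
```

The resulting adjacency matrix, together with the omics features, would be the input to a graph convolutional network trained for that omics type.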

Secondly, the predictions generated by each omics data-specific GCN are further utilized to construct a new tensor, named cross-omics discovery tensor, which can reflect the cross-omics label correlations.

Finally, the cross-omics discovery tensor is forwarded to VCDN (view correlation discovery network) to explore the latent correlations across different omics data for final label prediction. Because the importance of a feature to the classification task can be measured by the performance decrease after removing that feature, they applied this procedure to the test data set to quantify and rank the contribution of each feature of the different omics data to the prediction. In this way, they identified the top-ranking features as biomarkers for breast cancer.
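Ranking features by the performance decrease observed after removing them can be illustrated with a deliberately simple classifier. In the sketch below a nearest-centroid model stands in for the trained network, and "removing" a feature is implemented by setting it to zero on the evaluation data; both choices and all data values are assumptions for illustration.

```python
def nearest_centroid_fit(X, y):
    """Per-class mean vector of the training samples."""
    centroids = {}
    for label in set(y):
        rows = [x for x, c in zip(X, y) if c == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == c for x, c in zip(X, y)) / len(y)

def ablation_importance(centroids, X, y):
    """Accuracy drop when each feature in turn is zeroed out."""
    base = accuracy(centroids, X, y)
    drops = []
    for f in range(len(X[0])):
        masked = [[0.0 if i == f else v for i, v in enumerate(x)] for x in X]
        drops.append(base - accuracy(centroids, masked, y))
    return drops

# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 0.5], [1.2, 0.4], [5.0, 0.5], [5.2, 0.6]]
y = [0, 0, 1, 1]
model = nearest_centroid_fit(X, y)
print(ablation_importance(model, X, y))
```

The feature with the largest accuracy drop is the strongest biomarker candidate under this criterion.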

In addition, Xuan et al. 204 proposed a novel method based on the graph convolutional network and convolutional neural network (GCNLDA) to infer disease-related lncRNA candidates. First, they constructed a network comprising lncRNA, disease, and miRNA nodes. Second, they built an embedding matrix of lncRNA-disease node pairs with respect to the biological premises. Third, they employed a convolutional neural network to explore various lncRNA-disease connections on the node pair embedding. Finally, they learned the local network representations of lncRNA-disease pairs by deeply integrating the graph convolution autoencoder into topological lncRNA-disease-miRNA heterogeneous networks. Cross-validation confirmed that GCNLDA outperforms other state-of-the-art methods in terms of both AUC and AUPR 161 . Case studies 204 on stomach cancer, osteosarcoma and lung cancer confirmed that GCNLDA effectively discovered potential lncRNA-disease associations. Therefore, GCNLDA is becoming an effective tool to screen reliable candidates for lncRNA-disease association validation with the help of biological experiments.

In summary, we consider that an increasing number of ML-based biology analysis applications will be developed to identify novel anticancer targets with the development of deep learning in the future.

Evaluation of the druggability of potential targets

Druggability is a concept that assesses whether a drug can bind to a protein to alter its activity 3, 4 . The human proteome has approximately 6,000 to 8,000 potential pharmacological targets, but only a small fraction can be targeted by drugs 7, 210 . Therefore, it is important for us to evaluate druggability after finding novel anticancer targets. This study will introduce these applications from two perspectives: one is network-based biology analysis applications, and the other is ML-based biology analysis applications.

Network-based artificial intelligence for evaluating the druggability of potential targets. Conventional druggability evaluation requires a long development cycle and high financial cost because it relies on analysing the 3D structures of proteins 211 , whereas network-based biology analysis applications provide an alternative way to accelerate the evaluation of the druggability of potential targets 212 .

As shown in Fig. 10 , PockDrug is a web server that predicts pocket druggability on proteins and can be queried with a single protein or a set of proteins 213 . For example, Yang et al. 214 constructed a protein-protein interaction network for thyroid cancer and identified three key targets, HEY2, TNIK, and LRP4. Then, they used PockDrug to predict whether HEY2, TNIK, or LRP4 have targetable pockets for drugs in the following three steps.

In Step 1, they inputted the potential target and located pocket estimation methods. In Step 2, they predicted the druggability of the pockets by computing the physicochemical properties of the target pockets. In Step 3, they screened three hub genes, HEY2, TNIK, and LRP4. Based on the predictions, TNIK, which has 8 out of 538 residues, has an average druggability probability greater than 0.5 and thus was considered a druggable pocket for thyroid cancer.

In short, with the in-depth study of protein pocket, an increasing number of network-based biology analysis applications are developed to accurately evaluate the druggability of anticancer targets, providing reliable druggable targets for cancer treatment.

ML-based artificial intelligence for evaluating the druggability of potential targets. The ML-based biology analysis applications for evaluating the druggability of potential targets comprise protein structure modeling and drug-target affinity analysis. Traditional protein structure modeling required considerable time and financial cost 211 , which greatly limited the application of PockDrug because it depends heavily on an accurate 3D protein structure. Recent ML-based biology analysis applications have focused on developing methods to predict the 3D structure of a protein from its genetic sequence, also known as the protein folding problem. Cutting-edge ML-based modelling methods [215] [216] [217] can generate 3D protein structures with high accuracy and efficiency, which makes it possible for PockDrug to be widely used.

For example, Yang et al. 218 developed the trRosetta algorithm, which predicts protein structures quickly and accurately through energy minimization under restraints. A deep residual neural network predicts these restraints, which consist of inter-residue distance and orientation distributions. trRosetta outperformed all previous protein modelling methods in benchmark tests on CASP13- 219 and CAMEO- 220 derived sets, indicating that it can accurately predict protein structure. Furthermore, Senior et al. 221 developed AlphaFold to predict protein structures from amino acid sequences. First, AlphaFold predicts the distances between pairs of residues by training a neural network to analyse the covariation of homologous sequences. Then, AlphaFold constructs a potential of mean force that accurately describes the shape of a protein. Finally, AlphaFold optimizes the protein structure by a gradient descent algorithm. Because AlphaFold can predict protein structure with high accuracy even for sequences with few homologous sequences, we consider that AlphaFold represents great progress in protein-structure prediction.

ML-based biology analysis applications for drug-target affinity (DTA) analysis estimate the strength of the interaction between a drug and its target. Compared with other methods, such as molecular docking 223 and collaborative filtering 224 , graph-based neural networks are more effective in DTA prediction because they learn from both the drug structure and drug-target interaction information instead of representing drugs as strings; string sequences may lose the structural information of the molecule and thereby impair the predictive power of models 225 .

For example, Nguyen et al. 225 were the first to use a graph neural network (GNN) to predict DTA. The authors proposed GraphDTA, a new neural network model for regression tasks, which takes a drug-target pair as the input and outputs a continuous measurement of the binding affinity of the pair.

In detail, for the input drug-target pair, the protein target is represented by its sequence rather than by a molecular graph of its tertiary structure, while the drug compound is represented as a graph of atomic interactions, in which each node carries a feature vector encoding five kinds of information: the atom symbol, the number of adjacent atoms, the number of adjacent hydrogens, the implicit valence of the atom, and whether the atom is in an aromatic structure. For the output, GraphDTA combines the feature information of the drug-target pair to predict a continuous measurement of the binding affinity of the pair.
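The drug graph described above can be sketched as a list of per-atom feature vectors plus an adjacency list. In this minimal example the atom properties are entered by hand for the heavy atoms of ethanol, the symbol vocabulary is restricted to three elements, and the numeric properties are stored as plain numbers; a real pipeline would derive them with a cheminformatics toolkit, and GraphDTA's exact encoding is not reproduced.

```python
ATOM_SYMBOLS = ["C", "N", "O"]   # toy vocabulary (assumption)

def one_hot(value, choices):
    return [1.0 if value == c else 0.0 for c in choices]

def atom_features(atom, degree):
    """Symbol one-hot + [adjacent atoms, adjacent hydrogens, implicit valence, aromatic]."""
    return (one_hot(atom["symbol"], ATOM_SYMBOLS)
            + [float(degree),
               float(atom["hydrogens"]),
               float(atom["implicit_valence"]),
               1.0 if atom["aromatic"] else 0.0])

def molecule_graph(atoms, bonds):
    """Return (node feature vectors, adjacency list) for a molecule."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    feats = [atom_features(atom, len(neighbours[i])) for i, atom in enumerate(atoms)]
    return feats, neighbours

# Ethanol heavy atoms (CH3-CH2-OH); property values are hand-entered for illustration.
atoms = [
    {"symbol": "C", "hydrogens": 3, "implicit_valence": 3, "aromatic": False},
    {"symbol": "C", "hydrogens": 2, "implicit_valence": 2, "aromatic": False},
    {"symbol": "O", "hydrogens": 1, "implicit_valence": 1, "aromatic": False},
]
bonds = [(0, 1), (1, 2)]
feats, neighbours = molecule_graph(atoms, bonds)
print(feats[0], neighbours)
```

These node features and the adjacency list are what a graph neural network layer would consume on the drug side, while the protein side remains a sequence.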

Through a multivariable statistical analysis of the outputs of GraphDTA's hidden layers, the authors drew two conclusions. One is that hidden node activations correlate with domain-specific drug annotations, such as the number of aliphatic hydroxyl groups, which suggests that the graph neural network can automatically assign importance to well-defined chemical features without any prior knowledge. The other is that the model extracts features more easily from drugs with obvious molecular structure patterns and therefore predicts them with high precision, whereas drugs without an obvious molecular structure pattern are more difficult to predict.

In short, with the development of deep learning, an increasing number of ML-based biology analysis applications can quickly and accurately evaluate the druggability of anticancer targets, providing reliable druggable targets for cancer treatment and reducing the time and financial costs of experiments.

Drug discovery

After evaluating the druggability of potential targets, it is essential to discover the drugs that interact with the potential therapeutic targets. Complex or concomitant diseases often require treatment with multiple drugs, but the use of multiple drugs increases the risk of side effects 200 . Predicting drug-target and drug-drug interactions is therefore essential for drug discovery.

As in the section above, this study will introduce these applications from two perspectives: one is network-based biology analysis applications, and the other is ML-based biology analysis applications.

Network-based artificial intelligence for drug discovery. The network-based analysis applications for drug discovery comprise drug screening and drug repurposing. Drug screening is a process in which potential drugs are identified and optimized before a candidate drug is selected to progress to clinical trials 226 . Since screening drugs through biological experiments is laborious, expensive, and time-consuming 226 , network-based biology analysis applications have become an efficient alternative for drug screening.

Identifying drug-target interactions (DTIs) is crucial for drug screening. In particular, novel DTIs can be used to find new anticancer drugs for known targets 227 .

The network-based biology analysis applications for DTI prediction are usually based on guilt-by-association principle that a protein may be a target for a drug if many of the protein's neighbors in the interaction network are targets of the drug 228 . Based on this principle, we classify the network-based biology analysis applications for predicting DTI into two categories.
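The guilt-by-association principle can be written down in a few lines: a candidate protein is scored by the fraction of its network neighbours that are already known targets of the drug. The scoring rule as a plain fraction, the function names, and the toy network are illustrative assumptions.

```python
def gba_score(protein, drug_targets, ppi):
    """Fraction of the protein's interaction partners that are known targets of the drug."""
    nbrs = ppi.get(protein, [])
    if not nbrs:
        return 0.0
    return sum(n in drug_targets for n in nbrs) / len(nbrs)

def rank_candidates(drug_targets, ppi):
    """Score every protein that is not yet a known target; best candidates first."""
    candidates = [p for p in ppi if p not in drug_targets]
    scored = [(p, gba_score(p, drug_targets, ppi)) for p in candidates]
    return sorted(scored, key=lambda kv: -kv[1])

# Hypothetical PPI network and known targets of one drug.
ppi = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3", "P4"],
    "P3": ["P1", "P2"],
    "P4": ["P2", "P5"],
    "P5": ["P4"],
}
known_targets = {"P1", "P2"}
print(rank_candidates(known_targets, ppi))
```

Here P3 interacts only with known targets and is therefore the top-ranked candidate, while P5 has no known target among its neighbours.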

One is 'top-down', which is from observable characteristics, such as side-effects or the diseases treated by a drug, to the interaction. For example, Campillos et al. 229 used the physiological effect information from side effect similarity networks between entities for DTI prediction to predict whether two molecules could interact.

The other is 'bottom-up', which is from molecular features, such as protein structure, to interactions. For example, Feng et al. 230 and Lee et al. 231 predicted DTIs on the basis that proteins with similar property features in protein-protein interaction networks may interact with the same drug.

Drug repurposing, also known as drug repositioning, is another drug discovery application. It refers to a method that identifies new indications for approved drugs or for drug candidates that have failed in the development phase 232 . Because drug repurposing can significantly reduce the drug development period and costs compared with the drug screening process 233 , it is a better application to discover anticancer drugs.

The network-based biology analysis applications are efficient to carry out drug repurposing analysis, because the constructed drug similarity networks contain the similarity, interaction or linkages between drugs, diseases, and targets. Here, we introduce four major network-based biology analysis applications of drug repurposing 234-241 as follows.

The first network-based biology analysis application of drug repurposing quantifies the similarities or relationships for known drug-disease associations, and then uses regression models or statistical models to predict novel drug-disease associations 234, 235 . For example, Cheng et al. 242 presented a network-based drug repurposing tool that accurately predicts drug responses in cancer cell lines by integrating the human protein-protein interactome with transcriptome profiles, whole-exome sequencing, drug-target interactions, and drug-induced microarray data.

The second network-based biology analysis application of drug repurposing infers new indications of drugs by analyzing information flow or performing random walks on drug-disease association networks [236] [237] [238] . For example, Luo et al. 243 proposed a novel random walk method that measures the similarity of drugs and of diseases from drug properties and disease properties, respectively, so as to predict potential indications of drugs.
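As a rough illustration of random walks on a drug-disease network, the sketch below implements a generic random walk with restart. It is not the specific method of Luo et al. 243 ; the network, the node names (D for drugs, C for diseases), the restart probability, and the function names are assumptions made for the example.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (node, node) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=1000):
    """Return the stationary visiting probability of a walker restarting at `seeds`."""
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in adj}
        for u, neighbours in adj.items():
            # Spread the walker's probability mass evenly over the neighbours of u.
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical network: drug-disease associations plus one drug-drug similarity edge.
edges = [("D1", "C1"), ("D1", "D2"), ("D2", "C1"),
         ("D2", "C2"), ("D3", "C2"), ("D3", "C3")]
adj = build_adjacency(edges)
scores = random_walk_with_restart(adj, seeds={"D1"})

# Rank the diseases that are not yet linked to D1 as candidate new indications.
candidates = [n for n in adj if n.startswith("C") and n not in adj["D1"]]
ranked = sorted(candidates, key=scores.get, reverse=True)
print(ranked)
```

Diseases that are reachable from the seed drug through short, well-connected paths receive higher scores, which is the intuition behind using such walks to propose new indications.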

The third network-based biology analysis application of drug repurposing, named individualized Network-based Co-Mutation, quantifies putative genetic interactions in cancer and can be used to identify candidate therapeutic pathways for cancer 239 . For example, Cheng et al. 244 used this approach to identify potential targets or new indications of existing cancer drugs that directly target significantly mutated genes or their neighbor genes in the human PPI network.

The fourth network-based biology analysis application of drug repurposing is realized directly by computing on the adjacency matrix of the drug-disease network 240, 241 . Based on this method, Luo et al. 245 used a matrix completion algorithm to fill in the unknown entries of the drug-disease matrix by constructing a low-rank matrix approximation. New drug-disease associations are then screened according to the predicted values.
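The idea of filling unknown entries through a low-rank approximation can be sketched as follows. This is a deliberately simple stand-in (a rank-1 alternating least squares fit to the observed entries), not the matrix completion algorithm used by Luo et al. 245 ; the association scores, the matrix size, and the function name `rank1_complete` are hypothetical.

```python
def rank1_complete(matrix, n_iter=200):
    """Fill None entries of `matrix` with u[i] * v[j] fitted to the observed entries."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        # Update row factors with column factors fixed (least squares per row).
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / den
        # Update column factors with row factors fixed (least squares per column).
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / den
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical drug (rows) by disease (columns) association scores; None is unknown.
associations = [
    [1.0, 0.5],
    [0.8, 0.4],
    [0.6, None],
]
completed = rank1_complete(associations)
print(round(completed[2][1], 3))
```

Because the observed entries here are consistent with a rank-1 structure, the unknown entry is recovered from the pattern shared by the other drugs. Real drug-disease matrices are sparse and noisy, so practical methods use higher ranks and regularization.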

Taken together, network-based drug screening and repurposing applications provide researchers with many alternative approaches for rapid anticancer drug discovery.

ML-based artificial intelligence for drug discovery. Currently, ML-based biology analysis applications have been employed to carry out drug screening and drug repurposing. For drug screening, previous studies have shown that network-based biology analysis applications can only screen the neighbour proteins of known targets. To address this limitation, the authors of ref. 227 developed DTI-Voodoo, which combines molecular features and phenotype information with an interaction network using graph neural networks to predict drug-protein interactions (Fig. 11) . Firstly, the model takes two kinds of features, phenotype features and molecular features, as input. To extract phenotype features, they used DL2Vec 246 to obtain ontology-based representations. DL2Vec constructs a graph by introducing nodes for each ontology class and edges for ontology axioms, and then performs random walks starting from each node in the graph to generate representations that encode drug effects or protein functions while preserving their semantic neighborhood within that graph. To extract molecular features, they used SmilesTransformer 247 to capture the molecular organization of each drug from its molecular structure and DeepGOPlus 248 to capture protein molecular features from protein amino acid sequences.

Secondly, they used two learnable feature transformer models to investigate the latent relationship between phenotype features and molecular features. Based on this relationship information, the transformer model that takes the phenotype features as input outputs the protein embedding for PPI networks (the top-down approach), and the other transformer model, which takes the molecular features as input, outputs the drug embedding (the bottom-up approach).

Finally, a DNN is used to extract protein-related information from the drug embedding, while a GCN is used to update the node embeddings in the PPI network. The protein features and drug features are then combined, and their similarity is computed by cosine similarity. The good performance of DTI-Voodoo suggests that graph-based neural networks are well suited to identifying novel drug-protein interactions.
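The final step, updating node embeddings over the PPI network and scoring drug-protein pairs by cosine similarity, can be sketched in a heavily simplified form. The code below is not the DTI-Voodoo implementation: it replaces the trained GCN with a single unweighted neighbor-averaging step, and the embeddings, network, and function names are hypothetical.

```python
import math


def propagate(features, adj):
    """One simplified graph-convolution step: average a node with its neighbours."""
    updated = {}
    for node, feat in features.items():
        group = [node] + sorted(adj.get(node, ()))
        updated[node] = [
            sum(features[n][k] for n in group) / len(group)
            for k in range(len(feat))
        ]
    return updated


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Hypothetical 2-dimensional protein embeddings and a tiny PPI network.
protein_features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 1.0]}
ppi = {"A": {"B"}, "B": {"A"}}  # protein C is isolated
drug_embedding = [1.0, 0.0]     # hypothetical drug embedding

before = {p: cosine(drug_embedding, f) for p, f in protein_features.items()}
after = {p: cosine(drug_embedding, f)
         for p, f in propagate(protein_features, ppi).items()}
print(before["B"], round(after["B"], 3), after["C"])
```

Protein B has no similarity to the drug on its own features, but gains a nonzero score after one propagation step because it interacts with protein A. The isolated protein C is unaffected, which illustrates how network context changes the predictions.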

For drug repurposing, graph-based neural networks benefit from their feature representation ability, which can exploit not only the drug-drug link information but also the features of drug-cancer pairs. For example, Cui et al. 249 proposed GraphRepur, a model for drug repurposing prediction based on graph neural networks. Firstly, the authors collected the drug-induced gene expression data from the LINCS project 250 as well as the drug-drug link information from the STITCH database 251 . Secondly, to obtain the signatures of drugs, they identified differentially expressed genes for breast cancer and used the drug-induced genes from LINCS as drug signatures. Thirdly, based on the drug-drug link information from the STITCH database and the drug signatures, they constructed a drug-drug link graph with drug signatures as node features. Fourthly, they input the drug signatures and drug-drug link information into GraphRepur, which then computes scores for drugs that can be repurposed for treating breast cancer. Finally, the authors validated some of the predicted drugs for breast cancer using experimental data from the literature and showed that the model performs significantly better than other methods, such as GCN, DNN, and random forest, in drug repurposing.

Furthermore, the authors draw three conclusions. The first is that drug-drug link information plays an important role in drug repurposing. The second is that a network with fewer isolated nodes provides more network topology information and thereby significantly improves the prediction performance of graph neural networks. The third is that drug-induced genetic features help to improve the DTI prediction accuracy of graph neural networks.

Taken together, with the development of graph-based neural networks, an increasing number of ML-based drug screening and repurposing applications can quickly and accurately discover anticancer drugs, reducing the time and financial costs of experiments.

Drug properties prediction

ADMET properties prediction. As discussed in section 4.3 (drug discovery step), once we have a list of drug molecules showing high affinity with the therapeutic target, it is necessary to investigate the properties of these candidate drugs [252] [253] [254] [255] . Since the prediction of drug properties usually adopts ML-based methods, this study mainly reviews the ML-based biology analysis applications for predicting drug properties such as the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of chemical compounds 256 . Table 5 briefly describes the ADMET properties. ADMET properties prediction can be treated as a classification or regression problem. Because of their strong feature representation ability 177 , graph-based neural networks can capture the drug descriptors (the physicochemical properties, molecular representations, and drug-like properties of molecules) from the drug fingerprints (the substructure features of a molecule) 257 , and then predict ADMET properties with a classification or regression algorithm (Fig. 12) 258 .

For example, Duvenaud et al. 259 proposed a graph convolution network to learn drug molecular fingerprints, which shows better performance than the state-of-the-art circular fingerprint method for ADMET properties prediction. Since then, an increasing number of researchers have used graph-based neural networks to predict the ADMET properties of drug molecules.

For example, Liu et al. 171 proposed Chemi-Net, which uses a GCN for ADMET properties prediction. The input of Chemi-Net is the characterization of the atoms of the drug molecule and the relationships between atoms, while its output is the predicted ADMET properties of the drug molecule. The predictive process of Chemi-Net is as follows.

Firstly, the model projects the assembled atom descriptors and atom-pair descriptors (features between pairs of atoms) 257 onto a 3D space to obtain a graph structure shaped like the drug molecule. Secondly, Chemi-Net carries out a series of graph convolution operations to output a single fixed-size molecule embedding. Finally, the molecule embedding is passed through fully connected layers to obtain the ADMET properties predictions of drugs.
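The three steps above (molecular graph, graph convolutions, fixed-size embedding followed by a readout) can be illustrated with a minimal sketch. It is not the Chemi-Net architecture: the convolution has no learned weights, atom-pair descriptors are reduced to plain bonds, and the atom features, readout weights, and function names are hypothetical.

```python
def graph_conv(atom_feats, bonds):
    """One message-passing step: each atom adds its bonded neighbours' features."""
    neighbours = [[] for _ in atom_feats]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    out = []
    for i, feat in enumerate(atom_feats):
        agg = list(feat)
        for j in neighbours[i]:
            agg = [x + y for x, y in zip(agg, atom_feats[j])]
        out.append([max(0.0, x) for x in agg])  # ReLU non-linearity
    return out


def molecule_embedding(atom_feats, bonds, n_layers=2):
    """Stack graph convolutions, then sum over atoms to get a fixed-size vector."""
    h = atom_feats
    for _ in range(n_layers):
        h = graph_conv(h, bonds)
    return [sum(column) for column in zip(*h)]


def readout(embedding, weights, bias):
    """A single linear output unit standing in for the fully connected layers."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias


# Hypothetical one-hot atom features: [is_carbon, is_oxygen]; heavy atoms only.
methanol = ([[1.0, 0.0], [0.0, 1.0]], [(0, 1)])                       # C-O
ethanol = ([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [(0, 1), (1, 2)])    # C-C-O

emb_methanol = molecule_embedding(*methanol)
emb_ethanol = molecule_embedding(*ethanol)
# Untrained, made-up readout weights; a real model would learn them from data.
prediction = readout(emb_ethanol, weights=[0.1, -0.2], bias=0.0)
print(emb_methanol, emb_ethanol)
```

Although the two molecules have different numbers of atoms, both are mapped to embeddings of the same length, which is what allows a shared set of fully connected layers to be applied to every molecule.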

In summary, we expect that more artificial intelligence models for drug properties prediction will be developed in the future.

The drug properties application in clinical trial. Although a large number of artificial intelligence applications have been developed to study the properties of drugs, it still takes on average 10-15 years and USD 1.5-2.0 billion to bring a new drug to market 260 . One of the main stumbling blocks is the high failure rate of clinical trials. Therefore, some studies apply artificial intelligence to clinical trial design.

For example, Shah et al. 261 constructed an artificial intelligence system that uses 'self-learning' deep reinforcement learning to examine treatment regimens currently in use and iteratively adjust the doses. The system can thus determine the fewest and smallest doses that still shrink brain tumors, reduce toxicity, and eventually find an optimal treatment plan with the lowest possible potency and frequency of doses that still reduces tumor sizes to a degree comparable to that of traditional regimens. In simulated trials of 50 patients, the system designed treatment cycles that reduced the potency of all the doses to less than half while maintaining the same tumor-shrinking potential.

In conclusion, we believe that as artificial intelligence applications for drug property prediction develop, they will better support clinical trials.

Modelling of cellular networks underlying cancer has provided us with a quantitative framework to investigate the link between network properties and the disease by artificial intelligence biology analysis, thereby leading to the discovery of potential novel anticancer targets and drugs [23] [24] [25] [26] [27] [28] [29] . However, there is no systematic review that introduces artificial intelligence biology analysis in cancer target identification and drug discovery. For this reason, this study briefly reviewed the scope of artificial intelligence biology analysis to explore new anticancer targets 34, 54, 57, 74, 80 , the principles and theory of commonly used artificial intelligence biology analysis algorithms [83] [84] [85] [86] [87] [88] [89] [90] [91] , and the artificial intelligence applications for artificial intelligence biology analysis 42, 195, 213 .

Table 5 ADMET properties

Absorption: The ability of a drug to cross the membranes of many cells to reach its site of action when the drug is administered via oral ingestion.

Distribution: After absorption or systemic administration into the bloodstream, a drug is distributed to its site of action through the circulatory system.

Metabolism: The process of chemically converting a drug to a metabolite, also called biotransformation.

Excretion: The collective term for irreversibly removing a drug from the body.

Toxicity: The extent to which a drug damages an entire organism, an organism's substructure, or an organ.

The scope of artificial intelligence analysis to explore novel anticancer targets consists of epigenetics 54 , genomics 57 , proteomics 74 , metabolomics 34 , etc. Since single-omics studies cannot accurately identify anticancer targets, we have to employ artificial intelligence biology analysis to effectively integrate multiple omics data, tackle the complexity of cancer that arises from interactions between genes and their products 16, 17 , and improve our understanding of carcinogenesis [23] [24] [25] [26] [27] [28] [29] . Therefore, how to employ artificial intelligence biology analysis algorithms to integrate multiomics data and identify novel anticancer targets will be an important direction for future study.

Next, we introduced two categories of commonly used artificial intelligence algorithms. One is network-based biology analysis algorithms and the other is ML-based biology analysis algorithms. We here discuss their limitations and advantages.

The network-based biology analysis algorithms usually comprise shortest path 83 , module detection 84 , and network centrality 85 algorithms, which have three major advantages. First, they provide a variety of alternative approaches to identify cancer targets, and different algorithms can complement each other to identify targets from various perspectives, thereby providing new biological explanations 30 . Second, since they are not limited by the scale of the network, they can handle networks built from small samples. Third, prior biological knowledge and experience can be conveniently integrated into network-based biology analysis algorithms to make them interpretable.

However, previous studies also show two major shortcomings of the network-based algorithms. First, the current biological network data are biased toward much-studied targets 262 . Since previous studies have paid much attention to these targets, the network-based algorithms are more likely to identify these well-studied targets than others because of the data bias 262 . Second, most algorithms only use the topological information of the biological network and neglect the association between cell function or phenotypes and topological features (such as the centrality-based algorithms discussed in Section 3.1.2).

ML-based biology analysis algorithms are usually comprised of decision trees [86] [87] [88] and deep learning [89] [90] [91] , which have two major advantages.

One is feature learning and detection 177, 181 , which employs sophisticated neural network architectures to link features of biological networks and characterize their relationships. The model is then trained iteratively to detect features that are hard to detect with network-based biology analysis algorithms.

The other is their ability to effectively integrate large and diverse data. ML-based biology analysis algorithms can integrate multiomics biological network data and identify novel targets 263 , because of the fast development of deep learning models and the easy access to high-throughput biological data.

Although employing ML-based algorithms greatly benefits the target identification and drug discovery for cancer treatment 174 , we still have three major challenges to overcome.

The first challenge is the lack of consistent data for validation 33 . Although recent advances in biotechnologies have enabled the fast generation of massive biomedical data, such data often suffer from inconsistent production and missing annotation information, resulting in a lack of reliable and consistent data for validating deep learning models 264 .

The second challenge is the integration of heterogeneous information 103 . Although deep learning models facilitate the integration of multimodal biological data, it is still difficult to build up a universal deep learning model due to the lack of biological domain knowledge 200 .

The third challenge is the difficulty of interpreting deep learning models 185 . However, a recent study sheds light on this issue by combining a disease network with a neural network to characterize the mechanism of melanoma 263 . In addition, graph-based neural networks can improve the interpretability of deep learning models 265 .

In the last section of the study, we have reviewed the applications of artificial intelligence biology analysis for cancer therapy from four perspectives: novel anticancer targets identification 189 , evaluating the druggability of potential targets 3,4 , drug discovery 200 , and drug properties prediction [252] [253] [254] [255] .

First, we presented several widely used applications to identify novel anticancer targets. However, as exemplified by WGCNA 195 , network-based biology analysis applications not only require high computing costs to reconstruct gene coexpression networks 42 but also have difficulty in accurately locating effective network nodes. Although ML-based biology analysis applications employ collaborative modelling with neighbourhood node information to reduce the computational cost and improve the predictive accuracy for anticancer targets, biological networks still suffer from data bias 262 , so that most of the targets identified by current applications have already been reported in previous studies. Therefore, developing an efficient feature selection application that can solve the data bias problem will be appealing for novel therapeutic anticancer target identification [266] [267] [268] in the future.

Second, we introduced several widely used applications to evaluate the druggability of potential targets. For example, PockDrug is usually used to predict druggable pockets on proteins 213 . Although trRosetta 218 and AlphaFold 221 offer opportunities for PockDrug to evaluate the druggability of potential targets, PockDrug can neither accurately predict druggability, owing to the complexity of protein structure [269] [270] [271] , nor be validated through biological experiments at low cost 272, 273 . Nevertheless, since DTA prediction can quickly provide reliable druggable targets for cancer care at low financial cost 211 , there is potential to develop related efficient artificial intelligence biology analysis applications for DTA prediction in the future.

Fig. 12 The graph-based neural network captures the features related to drug properties from the drug molecular structure to predict the ADMET properties of drugs. (Created with BioRender.com)

Third, we investigated several widely used applications for drug discovery, which consists of drug screening and drug repurposing.

For drug screening, identifying drug-target interactions (DTIs) is a crucial step. Since network-based biology analysis applications for DTI prediction are usually based on the guilt-by-association principle 228 , they can only predict the interacting neighbors of known cancer targets. Currently, ML-based biology analysis applications can extend the predictions to downstream consequences 227 , thereby screening out more possible anticancer drugs.

For drug repurposing 232 , there are four commonly used network-based biology analysis applications 234-241 that integrate the similarities among various drugs but ignore prior knowledge. In contrast, ML-based biology analysis applications can not only take advantage of the similarity among drugs but also integrate drug properties to improve the accuracy of drug repurposing.

Fourth, we introduced widely used applications for drug properties prediction. For example, graph convolution networks, which have a strong feature representation ability 177 , can capture the features related to the ADMET properties of drugs from their molecular structures. Therefore, predicting drug properties with graph convolution networks that integrate drug molecular structures and drug clinical phenotypes is becoming a popular method 274 . We hope that more artificial intelligence biology analysis models will be developed to capture the features related to ADMET properties from the drug molecular structure, thereby improving the success rate of clinical trials.

In summary, although we have reviewed and discussed many artificial intelligence algorithms and corresponding applications for novel anticancer target identification and drug discovery, this review is still too brief to cover the entire research area. However, because artificial intelligence algorithms are effective in exploring new anticancer targets and discovering drugs, we hope this review offers valuable insights for interested researchers to develop an understanding of the principles behind artificial intelligence biology analysis in cancer target identification and drug discovery. Moreover, we hope that our perspective on artificial intelligence and related applications will provide a pathway for further advancement in the field.

Y.Y. and X.L. contributed equally to this work. Y.Y., X.L., Y.P., H.Z., J.V., S.L., S.D. and L.Z. contributed to writing and revising the paper. X.L., S.D., and L.Z. supervised the research. All authors have read and approved the article.

Competing interests: The authors declare no competing interests.

Targeting receptor tyrosine kinases using monoclonal antibodies: the most specific tools for targeted-based cancer therapy

An omics perspective on drug target discovery platforms

Opinion: The druggable genome

Targeting transcription factors in cancer-from undruggable to reality

Interpreting pathways to discover cancer driver genes with Moonlight

Drug development in the era of precision medicine

Targeted drug delivery strategies for precision medicines

Progress and challenges towards targeted delivery of cancer therapeutics

Denoising of MR and CT images using cascaded multi-supervision convolutional neural networks with progressive training

MCDB: a comprehensive curated mitotic catastrophe database for retrieval, protein sequence alignment, and target prediction

Robust needle localization and enhancement algorithm for ultrasound by deep learning and beam steering methods

A brief review of artificial intelligence applications and algorithms for psychiatric disorders

Comprehensively benchmarking applications for detecting copy number variation

Using game theory to investigate the epigenetic control mechanisms of embryo development: Comment on

Artificial intelligence in COVID-19 drug repurposing

Systems biology of cancer metastasis

Network biology: understanding the cell's functional organization

A review of artificial intelligence applications for antimicrobial resistance

Exploring the dynamics and interplay of human papillomavirus and cervical tumorigenesis by integrating biological data into a mathematical model

2019nCoVAS: developing the web service for epidemic transmission prediction, genome analysis, and psychological stress assessment for 2019-nCoV

CGIDLA: developing the web server for CpG island related density and LAUPs (Lineage-Associated Underrepresented Permutations) study

Exploring the computational methods for protein-ligand binding site prediction

Variations in DNA elucidate molecular networks that cause disease

Network biology concepts in complex disease comorbidities

Network approaches and applications in biology

MiR-205-5p and miR-342-3p cooperate in the repression of the E2F1 transcription factor in the context of anticancer chemotherapy resistance

Systems biology-based investigation of cooperating microRNAs as monotherapy or adjuvant therapy in cancer

A multi-network approach identifies protein-specific coexpression in asymptomatic and symptomatic Alzheimer's disease

Interactome networks and human disease

Bio-network medicine

Machine learning-assisted network inference approach to identify a new class of genes that coordinate the functionality of cancer networks

Biological network analysis with deep learning

Next-generation machine learning for biological networks

Metabolomics: beyond biomarkers and towards mechanisms

Multi-omics approaches to disease

Pan-cancer analysis of somatic mutations and transcriptomes reveals common functional gene clusters shared by multiple cancer types

Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets

Network integration of multi-tumour omics data suggests novel targeting strategies

A comprehensive analysis of metabolomics and transcriptomics in cervical cancer

Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders

Pathology databanking and biobanking in The Netherlands, a central role for PALGA, the nationwide histopathology and cytopathology data network and archive

DrugBank 5.0: a major update to the DrugBank database for 2018

Therapeutic target database 2020: enriched resource for facilitating research and early development of targeted therapeutics

PubChem: a public information system for analyzing bioactivities of small molecules

ChEMBL: a large-scale bioactivity database for drug discovery

Gene Expression Omnibus: NCBI gene expression and hybridization array data repository

Comprehensive genomic characterization defines human glioblastoma genes and core pathways

The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity

An integrated encyclopedia of DNA elements in the human genome

COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer

The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets

The Gene Ontology (GO) database and informatics resource

Kyoto encyclopedia of genes and genomes

Omics, big data and machine learning as tools to propel understanding of biological mechanisms and to discover novel diagnostics and therapeutics

A network of epigenomic and transcriptional cooperation encompassing an epigenomic master regulator in cancer

Crosstalk between epigenetics and metabolism-Yin and Yang of histone demethylases and methyltransferases in cancer

Integrating genomics with biomarkers and therapeutic targets to invigorate cardiovascular drug development

Functional SNPs in the lymphotoxin-α gene that are associated with susceptibility to myocardial infarction

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring

Proteomics: guilt-by-association goes global

Accessory Genome Constellation Network): comparative genomics software for accessory genome analysis using bipartite networks

Leveraging models of cell regulation and GWAS data in integrative network-based association studies

A network analysis to identify mediators of germline-driven differences in breast cancer prognosis

Computational analysis of fused coexpression networks for the identification of candidate cancer gene biomarkers

Potential biomarkers and therapeutic targets in cervical cancer: Insights from the meta-analysis of transcriptomics data within network biomedicine perspective

Detection of gene communities in multi-networks reveals cancer drivers

CpG-island-based annotation and analysis of human housekeeping genes

Computed tomography angiography-based analysis of high-risk intracerebral haemorrhage patients by employing a mathematical model

EZH2-, CHD4-, and IDH-linked epigenetic perturbation and its association with survival in glioma patients

Investigation of mechanism of bone regeneration in a porous biodegradable calcium phosphate (CaP) scaffold by a combination of a multiscale agent-based model and experimental optimization/validation

Lineage-associated underrepresented permutations (LAUPs) of mammalian genomic sequences based on a Jellyfish-based LAUPs analysis application (JBLA)

Building up a robust risk mathematical platform to predict colorectal cancer

Bioinformatic analysis of chromatin organization and biased expression of duplicated genes between two poplars with a common whole-genome duplication

Mass spectrometry-based proteomics turns quantitative

OncoPPi network of cancer-focused protein-protein interactions to inform biological insights and therapeutic strategies

Mathematical description of linear dynamical systems

Identification of critical regulatory genes in cancer signaling network using controllability analysis

Network medicine framework shows that proximity of polyphenol targets and disease proteins predicts therapeutic effects of polyphenols

Control of fluxes in metabolic networks

Onco-Multi-OMICS approach: a new frontier in cancer research

The identification of key genes and pathways in hepatocellular carcinoma by bioinformatics analysis of high-throughput data

Multiomics analysis of tumor microenvironment reveals Gata2 and miRNA-124-3p as potential novel biomarkers in ovarian cancer

Network-based in silico drug efficacy screening

Community detection in graphs

Vital nodes identification in complex networks

Drug discovery with explainable artificial intelligence

Classification and regression trees

Ensemble deep learning in bioinformatics

Nordhausen & Klaus An introduction to statistical learning-with applications in R by

Deep learning

Deep learning

Network medicine: a network-based approach to human disease

Modelling and analysis of gene regulatory networks

Metabolic network structure determines key aspects of functionality and regulation

Architecture of the drug-drug interaction network

A survey of link prediction in complex networks

Spatiotemporal signal propagation in complex networks

On the limits of active module identification

Controllability of complex networks

Netpredictor: R and Shiny package to perform drug-target network analysis and prediction of missing links

The shortest path is not the one you know: application of biological network resources in precision oncology research

Solving uncapacitated multiple allocation p-hub center problem by Dijkstra's algorithm-based genetic algorithm and simulated annealing

Identifying novel genes and chemicals related to nasopharyngeal cancer in a heterogeneous network

Identification of disease treatment mechanisms through the multiscale interactome
