key: cord-0546953-orcx4uw6
authors: Rozemberczki, Benedek; Bonner, Stephen; Nikolov, Andriy; Ughetto, Michael; Nilsson, Sebastian; Papa, Eliseo
title: A Unified View of Relational Deep Learning for Drug Pair Scoring
date: 2021-11-04
journal: nan
DOI: nan
sha: 92f7ee9af1e53e0c11c22855cc22d42441edba01
doc_id: 546953
cord_uid: orcx4uw6

In recent years, numerous machine learning models which attempt to solve polypharmacy side effect identification, drug-drug interaction prediction and combination therapy design tasks have been proposed. Here, we present a unified theoretical view of relational machine learning models which can address these tasks. We provide fundamental definitions, compare existing model architectures and discuss performance metrics, datasets and evaluation protocols. In addition, we emphasize possible high impact applications and important future research directions in this domain.

Relational deep learning has an unprecedented potential for revolutionizing the drug discovery process and pharmaceutical industry [Gaudelet et al., 2021]. A number of high value use cases for relational deep learning in the pharmaceutical domain involve answering questions about what happens when two drugs are administered at the same time. These potential applications might want to answer questions such as: Will a combination of two drugs be more effective at destroying a specific type of lung cancer cells [Preuer et al., 2018]? Is there an unexpected (polypharmacy) side effect of using these two drugs together? Is there an unwanted chemical interaction [Sunyoung et al., 2017] that these drug molecules can have? All of these questions can be answered by what we call drug pair scoring: a machine learning task defined on a set of drugs, in which the behaviour of drug pairs is predicted in a specific context of interest.
Given an incomplete database of drug pairs, drug administration contexts and outcomes, the goal is to train a model to accurately make probabilistic predictions for unseen entries. The reasons for answering these questions via algorithmic methods are manifold. Firstly, testing all drug pairs in all of the contexts is not feasible due to time and financial constraints such as drug prices and labour costs [Preuer et al., 2018]. Secondly, certain pair scoring tasks such as polypharmacy side effect prediction can only be validated in human-based trials. Finally, laboratory testing of drug pairs is prone to human errors [Liu et al., 2020].

Traditional supervised machine learning methods which solve the drug pair scoring task use handcrafted molecular features to predict the outcome of administering the drugs together in a specific context [Sidorov et al., 2019; Chiang et al., 2020]. Another group of techniques uses an unsupervised approach which diffuses the profile of the drug pair on a heterogeneous biological graph [Zhang et al., 2017; Huang et al., 2019] in order to find potential polypharmacy, synergy or interaction indications.

Deep learning techniques which solve the drug pair scoring task can be seen as a fusion and extension of these traditional methods. Such models first generate drug representations based either on molecular structure or on the heterogeneous graph based neighbourhood context. In the second, optional step, these representations are propagated in the biological graph and aggregated. Finally, drug pair representations are formed and probability scores are output in the specific drug administration contexts. We present a high level summary of the drug pair scoring task in Figure 1.

Figure 1: Drug-drug interaction, polypharmacy side effect, and pair combination therapy design prediction tasks follow the same template.
Given a pair of drugs with optional biological context, the task is to predict an outcome in a specific application domain. Relational machine learning models which solve these tasks can exploit molecular features, knowledge graph based neighbourhoods or both.

Our main contributions can be summarized as:
1. We provide a unification of drug-drug interaction, polypharmacy side effect and synergistic drug combination prediction tasks.
2. We present an overview of the design of relational machine learning models which can address these predictive tasks.
3. We highlight the publicly available datasets used to train and test the models on these tasks and survey the literature for the most commonly used evaluation metrics.
4. We review the most important applications of these techniques and discuss directions for future research in the domain.

The remainder of this survey is structured as follows. In Section 2 we establish the foundations of a unified view of discriminative machine learning tasks defined on pairs of drugs. Section 3 discusses the architectural details of models that can solve these tasks. The evaluation metrics, protocols and datasets used in the literature are detailed in Section 4. Several key application areas are highlighted in Section 5. We discuss the limitations of current approaches and future research directions in Section 6. The paper concludes with Section 7. The survey is supported by a collection of relevant works under the https://github.com/AstraZeneca/polypharmacy-ddi-synergy-survey repository.

Our discussion of drug pair scoring models requires the introduction of a drug set $D = \{d_1, \dots, d_n\}$ that describes compounds of interest and a context set $C = \{c_1, \dots, c_k\}$ that contains contexts where two drugs are used in a pair combination.

Definition 1. Labeled drug pair.
A labeled drug pair defined on drug set $D$ and context set $C$ is the tuple $(d, d', c, y^{d,d',c})$, where the binary indicator $y^{d,d',c} \in \{0, 1\}$ is the outcome for the drug pair $d, d' \in D$ in context $c \in C$. A labeled drug pair is a known fact about the drug pair having an effect in a context, such as a specific polypharmacy side effect, an interaction or a synergistic relationship in treating a disease. The purpose of pair scoring models is to learn from these tuples to predict the labels for unlabeled drug pairs and contexts.

Definition 2. Database of labeled drug pairs. A database of labeled drug pairs defined on drug and context sets $D$ and $C$ is the set $S$ containing labeled drug pairs $(d, d', c, y^{d,d',c})$ where $d, d' \in D$ and $c \in C$. Pair scoring models are trained on databases of labeled drug pairs and the trained models are used to predict the label of pairs for which we do not know the outcome in certain contexts.

Definition 3. Heterogeneous interaction graph with drug entities. We denote with $G(V, R, E)$ the heterogeneous interaction graph with drug entities, where $V$ and $R$ are the entity and relation sets, the drug set satisfies $D \subset V$, and $E$ is formed by typed edges of the form $(v, r, u) \in V \times R \times V$. We consider a heterogeneous graph where the drug set is a subset of the vertex set. This definition of a heterogeneous (biological) knowledge graph helps to create knowledge graph based representations for the compounds of interest.

Definition 4. Neighbourhood encoder. A neighbourhood encoder is the function

$$h_d = \text{AGGREGATE}(\Theta_u, \forall u \in N(d)) \tag{1}$$

In Equation (1), $\Theta_u$ is a parametric vector representation of $u \in V$ and $N(\cdot)$ is a neighbourhood set. The neighbourhood encoder function [Hamilton et al., 2017] creates a vector representation of the drug vertices of the graph based on the aggregation of trainable parameter vectors in the neighbourhood of the source node. Neighbourhoods of a drug can be defined based on arbitrary notions of proximity and the aggregation itself could be a parametric transformation.
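As an illustration of Definitions 1 to 4, the following minimal Python sketch stores labeled drug pairs and computes a neighbourhood encoding. The arithmetic mean used as AGGREGATE, the one-hop definition of the neighbourhood set, and all entity and relation names (drug_a, protein_1, targets and so on) are assumptions made for this example; the definitions themselves do not prescribe them.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class LabeledDrugPair:
    """One entry (d, d', c, y) of a database S (Definitions 1 and 2)."""
    drug_a: str
    drug_b: str
    context: str
    label: int  # binary outcome, 0 or 1


# A typed edge (v, r, u) of a heterogeneous graph G(V, R, E) (Definition 3).
Edge = Tuple[str, str, str]


def neighbourhood(drug: str, edges: List[Edge]) -> Set[str]:
    """N(d): assumed here to be every entity sharing a typed edge with d."""
    found: Set[str] = set()
    for v, _relation, u in edges:
        if v == drug:
            found.add(u)
        elif u == drug:
            found.add(v)
    return found


def neighbourhood_encoder(
    drug: str, edges: List[Edge], theta: Dict[str, List[float]]
) -> List[float]:
    """h_d = AGGREGATE(theta_u for u in N(d)), with the mean as AGGREGATE."""
    neighbours = sorted(neighbourhood(drug, edges))
    if not neighbours:
        raise ValueError(f"{drug} has no neighbours to aggregate")
    dim = len(theta[neighbours[0]])
    return [
        sum(theta[u][i] for u in neighbours) / len(neighbours)
        for i in range(dim)
    ]


# Hypothetical toy graph: two drugs and the proteins they target.
edges: List[Edge] = [
    ("drug_a", "targets", "protein_1"),
    ("drug_a", "targets", "protein_2"),
    ("drug_b", "targets", "protein_2"),
]
# Trainable parameter vectors theta_u, fixed here for readability.
theta = {"protein_1": [1.0, 0.0], "protein_2": [0.0, 1.0]}

database = [LabeledDrugPair("drug_a", "drug_b", "side_effect_x", 1)]
h_a = neighbourhood_encoder("drug_a", edges, theta)
h_b = neighbourhood_encoder("drug_b", edges, theta)
```

In a trained model the vectors in theta would be updated by gradient descent; they are constants here only so that the aggregation step is easy to follow.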
Definition 5. Molecular encoder. A molecular encoder is the function $h_d = h_\Theta(M_d)$, parametrized by $\Theta$, where $h_d$ is the learned vector representation and $M_d$ is a generic notation for the molecular features describing the drug $d$. A molecular encoder is a neural network which generates a vector representation from the features of the molecule; these molecular features can be derived from generic features (e.g. hydrophilicity), a string representation, the molecular graph or the molecular geometry.

Definition 6. Neighbourhood informed molecular encoder. This encoder is the function

$$h_d = \text{AGGREGATE}(h_\Theta(M_u), \forall u \in N(d))$$

where $h_\Theta(M_u)$ and $\text{AGGREGATE}(\cdot, \forall u \in N(d))$ are molecular and neighbourhood encoders respectively. This encoder combines the layers described in Definitions 4 and 5. It is essentially a neighbourhood encoder parametrized by the representations output by a molecular encoder: molecular representations learned by the molecular encoder are aggregated in the neighbourhood of source drug nodes in the knowledge graph which has drug entities.

Definition 7. Molecular representation combiner. Given the drugs $d, d' \in D$ with vector representations $h_d, h_{d'}$, the molecular representation combiner is a function which maps $(h_d, h_{d'})$ to a drug pair representation $h_{d,d'}$. The representation output by this combiner function can depend on the drug orchestration order. This way the temporal order of drug orchestration can be expressed by the pair scoring model. For example, the concatenation of drug vectors results in order dependent representations of pairs, while a bilinear transformation of drug representations with a diagonal matrix does not.

Definition 8. Scoring head layer. The scoring head layer is a function which maps a drug pair representation $h_{d,d'}$ and a context $c \in C$ to the predicted score $\hat{y}^{d,d',c}$. Given a drug pair representation and a context, the scoring head layer outputs a probability score for the outcome.

Definition 9. Drug pair scoring loss and cost functions. Given the drug pair $d, d' \in D$, context $c \in C$, ground-truth label $y^{d,d',c}$ and predicted score $\hat{y}^{d,d',c}$, the loss is defined as the function $\ell(y^{d,d',c}; \hat{y}^{d,d',c})$.
The cost on the whole drug pair database $S$ is defined by Equation (2):

$$\mathcal{L} = \sum_{(d, d', c, y^{d,d',c}) \in S} \ell(y^{d,d',c}; \hat{y}^{d,d',c}) \tag{2}$$

In practical settings, drug pair scoring models are trained by the minimization of the binary cross-entropy summed over the labeled drug pair, context triples.

Table 1: A comparison of drug pair scoring machine learning models by machine learning task, model view level, induction, interaction graph node type (entity) and drug features. Machine learning models that solve a specific pair scoring task are ordered chronologically.

Our discussion of the drug pair scoring models introduces our unified view of the general architecture of these models and compares state-of-the-art architecture designs. Based on the definitions outlined in Section 2 we propose a unified view of drug pair scoring models. We postulate that the abstract design of drug pair scoring models, irrespective of the specific subtask solved, always has the following architecture:
1. An encoder to generate drug representations. This can be one of the functions described by Definitions 4, 5 and 6.
2. A molecular representation combiner function to generate a drug pair representation (see Definition 7).
3. The scoring head layer of Definition 8 to predict the probability of a context dependent outcome.
4. The loss function of Definition 9, which depends on the ground-truth labels and the probabilities output by the head layer.

This architecture and design allows for the joint end-to-end training of the individual model components, that is, gradient descent based updates of the layer weights.

We compare state-of-the-art model architectures that can solve pair scoring tasks in Table 1. Our comparison considers the model level, induction capabilities, specific subtask, node types of the heterogeneous graph and the molecular features exploited by the model.
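The four-component architecture postulated above (encoder, combiner, scoring head, loss) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than any published model: the encoder is replaced by a fixed lookup table of drug vectors, the scoring head is a logistic layer with one hypothetical weight vector per context, and all drug and context names are invented for the example. With the elementwise product as combiner, the head computes a sigmoid of a diagonal bilinear form, which is the order independent case mentioned in Definition 7.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def combine_concat(h_d: Vector, h_d2: Vector) -> Vector:
    """Order dependent combiner: concatenation of the two drug vectors."""
    return list(h_d) + list(h_d2)


def combine_hadamard(h_d: Vector, h_d2: Vector) -> Vector:
    """Order independent combiner: elementwise product of the drug vectors."""
    return [a * b for a, b in zip(h_d, h_d2)]


def scoring_head(h_pair: Vector, context_weights: Vector, bias: float = 0.0) -> float:
    """Logistic head: probability of the outcome in one context."""
    z = sum(w * x for w, x in zip(context_weights, h_pair)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def bce(y: int, y_hat: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy loss for one labeled drug pair and context."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


def cost(
    database: List[Tuple[str, str, str, int]],
    embeddings: Dict[str, Vector],
    head_weights: Dict[str, Vector],
) -> float:
    """Cost of Equation (2): the loss summed over every entry of S."""
    total = 0.0
    for d, d2, c, y in database:
        h_pair = combine_hadamard(embeddings[d], embeddings[d2])
        total += bce(y, scoring_head(h_pair, head_weights[c]))
    return total


# Hypothetical toy inputs; `embeddings` stands in for the encoder output.
embeddings = {"drug_a": [1.0, 2.0], "drug_b": [3.0, 4.0]}
head_weights = {"context_x": [0.0, 0.0]}  # untrained head
database = [("drug_a", "drug_b", "context_x", 1),
            ("drug_b", "drug_a", "context_x", 0)]
print(cost(database, embeddings, head_weights))
```

In an actual model the embeddings and head weights would be trainable parameters updated jointly by gradient descent on this cost; swapping `combine_hadamard` for `combine_concat` (with head weights of twice the length) gives an order dependent variant.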
Model attributes used for comparison were the following:
• Model level: A model operates at one of the following levels based on the encoder architecture used for generating the drug representations: (a) higher-view: neighbourhood encoder, (b) lower-view: molecular encoder, (c) hierarchical-view: neighbourhood informed molecular encoder.
• Machine learning task: The drug pair scoring task of interest solved by the dedicated model architecture proposed in the research paper. It has to be one of interaction, polypharmacy or synergy prediction.
• Induction: A model is inductive if it can predict the label of drug pairs where at least one of the drugs was not in the training set drug pairs.
• Entities: The types of heterogeneous graph entities (drugs, proteins, diseases) used by the model to solve the task.
• Drug features: Molecular features and information about the compound encoded by the molecular encoder function.

Our comparison highlights that there is a hard trade-off between induction and the exclusion of compound features. It is also evident that there is a connection between the machine learning subtask and the model architecture design: for example, polypharmacy side effect prediction models are mostly high level transductive neighbourhood encoders with a scoring layer on top. Synergy scoring models are mostly inductive techniques which exploit the molecular information about the drugs. Currently, there is no single pair scoring model which includes all of the considered biological modalities.

The evaluation of machine learning models requires performance metrics, train-test split strategies and publicly accessible datasets. The predictive performance of drug pair scoring models is evaluated by metrics tailored to binary classification tasks. We summarise how these metrics are used for the evaluation of state-of-the-art drug pair scoring architectures in Table 2.
Looking at Table 2 it is evident that the evaluation metrics used in the literature can be grouped into two categories:
• Score based metrics: These quantify predictive performance over the whole domain of discrimination thresholds. The area under the precision-recall curve (AUPRC) considers the precision-recall trade-off over the whole domain of discrimination thresholds, while the area under the receiver operating characteristic curve (AUROC) considers false and true positive rates.
• Hard cut off evaluation metrics: These performance metrics (accuracy, $F_1$ score, precision, recall) apply a hard discrimination threshold to assign a label to the data points based on the scores output by the pair scoring model. Calculating them therefore requires choosing a discrimination threshold.

Our findings demonstrate that pair scoring models are predominantly evaluated by score based metrics (AUPRC and AUROC) which do not require manual setting of a discrimination threshold. It is also evident that the seminal research works which defined the key pair scoring tasks influenced later evaluation metric choices; for example, polypharmacy prediction models adopted the evaluation metrics of the seminal work on that task.

The evaluation of drug pair scoring tasks allows for the use of various train-test split strategies [Preuer et al., 2018] to test the performance of the model under cold-start and inductive scenarios [Dewulf et al., 2021]. Given a labeled drug pair-context database $S$, defined on the drug and context sets $D$ and $C$, we assume that one can create the randomized splits $S_{Train}$ and $S_{Test}$. We summarize these splitting strategies in Figure 2. Using the formalism established to describe the pair scoring models, the splitting strategies are defined as:
• Random split: Labeled drug pair-context entries of $S$ are randomly split between $S_{Train}$ and $S_{Test}$.
• Drug pair stratified split: A drug pair $d, d' \in D$ that appears in entries of $S_{Train}$ does not appear in entries of $S_{Test}$.
This split requires a pair scoring model which is inductive with respect to drugs.
• Drug stratified split: A drug $d \in D$ that appears in entries of $S_{Train}$ does not appear in entries of $S_{Test}$. Like the drug pair stratified split, this requires the model to be inductive with respect to new drugs.
• Context stratified split: A context $c \in C$ that appears in entries of $S_{Train}$ does not appear in entries of $S_{Test}$. This requires that the pair scoring model is inductive with respect to the set of contexts.

We detail public sources for drug pair data which have been used by the approaches in this review in Table 3. Datasets are listed chronologically by subtask, and the licence and any restrictions for commercial use are detailed where available. It can be seen that the majority of datasets contain a small number of drugs, indicating that most focus on approved drugs rather than all possible compounds, with the interactions captured in drug pairs being much more numerous. It should be noted that established resources such as TWOSIDES and DrugBank are frequently filtered, cleaned and split into new datasets. For example, the Therapeutics Data Commons (TDC) resource contains filtered versions of both of these datasets designed for benchmark use. It is also common for datasets to be named differently in publications; for example, the split of TWOSIDES contained in TDC is also called ChChSe-Decagon in some works [Zitnik et al., 2018].

In this section we introduce three key, yet currently largely unexplored, applications for the methods detailed in this review. One topical application of these methods is in relation to the COVID-19 pandemic. Patients affected by polypharmacy of certain drug types (anti-psychotics and opiates being prominent examples) had a significantly higher chance of a negative clinical outcome from COVID-19 [Iloanusi et al., 2021; Jin et al., 2021].
Using the methods covered in this review to predict which combinations may have a negative effect for COVID-19 patients could enable high risk groups to seek alternative treatments, reducing the risk of a negative outcome.

The prevalent use of antibiotics has resulted in microbes evolving resistance to the drugs, reducing efficacy and potentially eliminating cost effective ways of treating severe bacterial diseases such as tuberculosis. Interestingly, it has been shown that the combination of different antibiotics can slow, and even reverse, this evolutionary resistance [Singh et al., 2017]. However, discovering these suppressive interactions using traditional methods is a complex and slow process, and one that is currently unexplored using the methods covered in this review.

Although drug combinations can result in an increase of unwanted side effects, one promising application is that the combination of two or more drugs can actually lead to a reduced level of toxicity for patients. This is because synergistic drugs together possess a higher level of efficacy at targeting a certain condition, which means that the levels of each individual compound can be lowered, reducing the toxicity issues associated with higher doses [Ianevski et al., 2020]. Thus, accurate prediction of synergistic drug combinations can reduce the impact of toxicity resulting from the individual compounds.

The body of work regarding relational machine learning for drug pair scoring primarily focuses on the design of novel architectures and applications. Our unification survey identified a number of potential shortcomings of existing approaches and avenues for novel research in the domain. Our summary of the design of relational machine learning architectures for drug pair scoring tasks in Table 1 highlighted that the molecular geometry and spatial structure of the molecules are rarely encoded by existing models.
Recent advances in geometric deep learning applied to chemistry [Qi et al., 2017; Xie et al., 2018; Fey et al., 2018] would allow the inclusion of geometric information, which could lead to better predictive performance on the pair scoring tasks. The modularity of existing architectures makes it possible to replace the molecular encoders with state-of-the-art geometric encoder layers.

Existing research about the interactions, unwanted side effects and synergy of drugs is primarily focused on the evaluation of binary pair combinations. This is driven by the lack of datasets focused on the outcomes of using higher order drug combinations and the lack of architectures designed specifically for these higher order combinations. By using set based representation aggregation layers [Vinyals et al., 2016; Baek et al., 2020], the existing pair scoring models could be adapted to generate drug subset representations.

Self-supervised and unsupervised pretraining of molecular encoders is already widely used for single molecule machine learning tasks [Dewulf et al., 2021]. This provides an opportunity for pretraining the molecular encoders on single molecule tasks and fine-tuning them on the data scarce pair scoring tasks. Another opportunity for transfer learning comes from the fact that certain pair scoring tasks have a greater quantity of labeled data available. The summary of drug pair scoring datasets in Table 3 demonstrated that the drug-drug interaction prediction task has datasets such as STITCH-CCI-5 which cover a large number of pair combinations, while the polypharmacy side effect and synergy prediction tasks have smaller databases. Pretraining models by performing drug-drug interaction prediction and fine-tuning these models for other tasks seems to be an important future research direction for training accurate, and therefore useful, models.

A heterogeneous graph based representation of drugs allows for the fusion of multiple data modalities.
Our survey of existing models in Table 1 has demonstrated that only a handful of existing architectures integrate multimodal data effectively [Rozemberczki et al., 2021a] without losing induction. Integrating multi-omics data such as proteomics, molecular structure and biological pathway information could be an important avenue for designing novel pair scoring architectures.

Currently there is no dedicated open-source machine learning library which was specifically designed for solving the drug pair scoring task. Developing a dedicated relational machine learning framework on top of existing geometric deep learning [Fey et al., 2019; Rozemberczki et al., 2021b] and deep chemistry frameworks [Ramsundar et al., 2019; Korshunova et al., 2021] could be an important contribution to the domain.

Bi-Level Graph Neural Networks for Drug-Drug Interaction Prediction. ICML 2020 Graph Representation Learning and Beyond (GRL+) Workshop, 2020
Large-Scale Analysis of Drug Combinations by Integrating Multiple Heterogeneous Information Networks
GCN-BMP: Investigating Graph Representation Learning for DDI Prediction Task
Drug-Drug Interaction Prediction Based on Co-Medication Patterns and Graph Matching
Drug-Drug Interaction Prediction with Wasserstein Adversarial Autoencoder-Based Knowledge Graph Embeddings
Relation Matters in Sampling: A Scalable Multi-Relational Graph Neural Network for Drug-Drug Interaction Prediction
DPDDI: A Deep Predictor for Drug-Drug Interactions
MTDDI: A Graph Convolutional Network Framework for Predicting Multi-Type Drug-Drug Interactions
The National Cancer Institute ALMANAC: A Comprehensive Screening Resource for the Detection of Anticancer Drug Pairs with Enhanced Therapeutic Activity
Driver Network as a Biomarker: Systematic Integration and Network Modeling of Multi-Omics Data to Derive Driver Signaling Pathways for Drug Combination Prediction
Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development
Anticancer Drug Synergy Prediction in Understudied Tissues Using Transfer Learning
Network Propagation Predicts Drug Synergy in Cancers
KGNN: Knowledge Graph Neural Network for Drug-Drug Interaction Prediction
DCDB 2.0: A Major Update of the Drug Combination Database
DrugCombDB: A Comprehensive Database of Drug Combinations Toward the Discovery of Combinatorial Therapy
TranSynergy: Mechanism-Driven Interpretable Deep Neural Network for the Synergistic Prediction and Pathway Deconvolution of Drug Combinations
Enhancing Drug-Drug Interaction Prediction Using Deep Attention Neural Networks
Sagar Maheshwari
Marinka Zitnik
SSI-DDI: Substructure-Substructure Interactions for Drug-Drug Interaction Prediction
An Unbiased Oncology Compound Screen to Identify Novel Combination Strategies. Molecular Cancer Therapeutics
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Jae Yong Ryu, Hyun Uk Kim, and Sang Yup Lee. Deep Learning Improves Prediction of Drug-Drug and Drug-Food Interactions
SYNERGxDB: An Integrative Pharmacogenomic Portal to Identify Synergistic Drug Combinations for Precision Oncology
Predicting Synergism of Cancer Drug Combinations Using NCI-ALMANAC Data
Structure-Based Drug-Drug Interaction Detection via Expressive Graph Convolutional Networks and Deep Sets
Deep-CCI: End-to-End Deep Learning for Chemical-Chemical Interaction Prediction
STITCH 5: Augmenting Protein-Chemical Interaction Networks with Tissue and Affinity Data
GoGNN: Graph of Graphs Neural Network for Predicting Structured Entity Interactions
Multi-view Graph Contrastive Representation Learning for Drug-Drug Interaction Prediction
Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties
GraphSynergy: Network Inspired Deep Learning Model for Anti-Cancer Drug Combination Prediction
Mining Signaling Flow to Interpret Mechanisms of Synergy of Drug Combinations Using Deep Graph Neural Networks. bioRxiv

The authors would like to thank Peizhen Bai, Piotr Grabowski, Haiping Lu, Rocío Mercado, and Paul Scherer for help and feedback throughout the preparation of this manuscript. Stephen Bonner is a fellow of the AstraZeneca postdoctoral program.

We have provided an exhaustive overview of relational machine learning models designed to solve drug pair scoring tasks. We outlined a general theoretical framework which unifies the drug-drug interaction, polypharmacy side effect and drug synergy prediction tasks and created a taxonomy of models which address these. By surveying the literature and considering the architecture and evaluation of existing models, we identified key real world application areas and important directions for future research.