key: cord-0979473-j6wf8f0b
authors: Tomazou, M; Bourdakou, M M; Minadakis, G; Zachariou, M; Oulas, A; Karatzas, E; Loizidou, E; Kakouri, A; Christodoulou, C; Savva, K; Zanti, M; Onisiforou, A; Afxenti, S; Richter, J; Christodoulou, C G; Kyprianou, T; Kolios, G; Dietis, N; Spyrou, G M
title: Multi-omics data integration and network-based analysis drives a multiplex drug repurposing approach to a shortlist of candidate drugs against COVID-19
date: 2021-05-01
journal: Brief Bioinform
DOI: 10.1093/bib/bbab114
sha: 2e10c7523deb62fa95b25a910ed4462105adb6ea
doc_id: 979473
cord_uid: j6wf8f0b

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic is undeniably the most severe global health emergency since the 1918 Influenza outbreak. Depending on its evolutionary trajectory, the virus is expected to establish itself as an endemic infectious respiratory disease exhibiting seasonal flare-ups. Therefore, despite the unprecedented rally to reach a vaccine that can offer widespread immunization, it is equally important to reach effective prevention and treatment regimens for coronavirus disease 2019 (COVID-19). Contributing to this effort, we have curated and analyzed multi-source and multi-omics publicly available data from patients, cell lines and databases in order to fuel a multiplex computational drug repurposing approach. We devised a network-based integration of multi-omic data to prioritize the most important genes related to COVID-19 and subsequently re-rank the identified candidate drugs. Our approach resulted in a highly informed integrated drug shortlist by combining structural diversity filtering along with experts’ curation and drug–target mapping on the depicted molecular pathways. In addition to the recently proposed drugs that are already generating promising results such as dexamethasone and remdesivir, our list includes inhibitors of Src tyrosine kinase (bosutinib, dasatinib, cytarabine and saracatinib), which appear to be involved in multiple COVID-19 pathophysiological mechanisms. In addition, we highlight specific immunomodulators and anti-inflammatory drugs like dactolisib and methotrexate and inhibitors of histone deacetylase like hydroquinone and vorinostat with potential beneficial effects in their mechanisms of action. Overall, this multiplex drug repurposing approach, developed and utilized herein specifically for SARS-CoV-2, can offer a rapid mapping and drug prioritization against any pathogen-related disease.

most important genes related to COVID-19 and subsequently re-rank the identified candidate drugs. Our approach resulted

The coronavirus disease 2019 has claimed, at the time of writing, 1.83 million lives with >83.3 million cases worldwide [World Health Organization-Weekly Situation Report 5 January 2021-https://www.who.int/]. Despite the unprecedented number of ongoing independent efforts, this global health emergency remains unanswered in terms of effective treatment(s) against COVID-19 and the responsible virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

The short-term challenge that needs to be addressed by the scientific community is the realization of therapeutics that can withhold the onslaught of SARS-CoV-2, until vaccines enable mass immunization. The time scale required for identifying, testing and approving new therapeutic compounds is prohibiting for the urgency of the unfolding situation, albeit more relevant as a medium-to long-term response. Thus, the focus is on drug repurposing (DR) of existing therapeutics. Yet, even for the case of existing compounds, the experimental preclinical screening followed by human safety and efficacy trials is a laborious and time-consuming process. Therefore, prioritization of the candidate drugs entering the testing pipeline is necessary. To this effect, computational approaches can expedite the candidate drug prioritization process through a multitude of methodologies [1] that can exhaustively analyze and integrate the available information on a given drug.

Therapeutics can be categorized roughly into (a) those that target the virus itself by blocking its replication and cell entry, leading to the reduction of the viral load of the infected patient or reducing transmission, (b) those aiming to reverse the effects of the disease such as the acute immune response of the host, leading to the well-documented cytokine storm and organ damage [2] [3] [4] and (c) both of the above. Therefore, equally important is the analysis of the disease itself towards deciphering the mechanisms through which COVID-19 develops and progresses.

As the amount of available data accumulates, novel avenues of analysis and informed decisions on drug prioritization emerge. The arsenal of high-throughput sequencing (HTS) methodologies has already been utilized to produce a diverse space of omics data, originating from clinical and preclinical setups from samples derived from human, animal hosts and cell lines. Several computational DR strategies were developed for COVID-19. Most of these were focused on network-based approaches by combining the viral and the human interactome and through network-based strategies prioritized the existing drugs [5, 6] . On the other hand, several studies adopted transcriptomic-based in silico DR approaches, utilizing publicly available gene expression data from SARS-CoV-2 and other infectious viruses, to propose candidate repurposed drugs against COVID-19 [7] [8] [9] . As over 500 structures of SARS-CoV-2 proteins or protein complexes are available [10] , more and more structure-based DR approaches are being developed using molecular docking and dynamic simulations to identify antiviral compounds against COVID-19 [11] [12] [13] [14] [15] [16] [17] .

The remaining challenge is on how to integrate the heterogeneous data and extract the critical information needed for selecting potentially repurposable drugs against COVID- 19. In this work, we present a multiplex DR scheme against COVID-19 via three discrete approaches, stepping on the analysis of multi-source and multi-omics publicly available data from patients, cell lines and databases. Following the DR approaches, we used a network-based multi-omic data integration approach to prioritize the most important genes related to COVID-19 and subsequently re-rank the identified candidate drugs. By combining drug structural diversity filtering along with experts' curation and drug-target mapping on the depicted molecular pathways, we compiled an informed integrated shortlist of candidate drugs against COVID-19. The proposed candidates include compounds that were recently found to generate promising results in clinical trials, but also compounds that are presented for the first time in the literature as potential COVID-19 therapeutics.

The workflow of the proposed pipeline in this paper can be described in the following main steps (as illustrated in detail in Figure 1 ):

(A) Multi-omics and protein-protein interaction (PPI) data selection and preprocessing: multi-omics datasets were selected from various available sources in order to identify differentially expressed genes (DEGs), differentially abundant metabolites and proteins along with PPIs between SARS-CoV-2 and other pathogens with the human host.

(B) DR pipelines: multiplex DR approaches were implemented based on (i) transcriptomics analysis, (ii) GWAS-phenotype association analysis and (iii) pathogen-host interaction network analysis in order to generate an initial list of repurposed drugs for COVID-19.

(C) Multi-omic data integration: we developed a 'networkbased multi-source data integration' methodology to integrate multi-omic data from COVID-19 patients.

(D) Drug re-ranking: the re-ranking of drug candidates was driven by a disease association score of the calculated from the synthetic integration network.

(E) Drug filtering: the structural similarity of candidate drugs was calculated to cluster the top scoring compounds. Finally, we proposed an integrated drug list that comprises 65 drugs out of which 16 were manually curated by experts for further annotation. Overview of the workflow. (A) Data sources: multi-omics datasets were selected from various sources in order to identify differentially expressed genes, differentially abundant metabolites and proteins along with protein-protein interactions (PPIs) between SARS-CoV-2 and other pathogens with the human host. This was followed by (B) drug repurposing approaches based on (i) transcriptomics, (ii) genomics-GWAS-phenotype association analysis and (iii) pathogen-host interaction network analysis. (C) Multiplexing of repurposed drug lists: (i) integration of the multi-omic data from patients, (ii) functional analysis, (iii) drug re-ranking based on the calculated target-disease association of the integration map and (iv) structural clustering and shortlisting of drug candidates. Finally, the integrated drug list comprises 65 drugs out of which 16 were manually curated by experts for further annotation.

We selected datasets from multiple sources available at the time of conducting this research. Their description including the number of DEGs, proteins and metabolites with their corresponding selection criteria are shown in Table 1 and their analysis is presented in following sections.

Transcriptomic (expression profiling by HTS) data were obtained from Gene Expression Omnibus (GEO) accessed on 9 April 2020 with ID GSE147507 [18] . This dataset consists of transcriptional profiling of different cell lines with SARS-CoV-2 infection and of transcriptional profiling of COVID-19 lung biopsies. In our study, we used independent biological triplicates of primary human lung epithelium Normal Human Bronchial Epithelial (NHBE) cell (Series 1), transformed lung alveolar A549 cells (Series 2-5) and transformed lung-derived Calu-3 cells (Series-7), which were mock treated or infected with SARS-CoV-2. We also used independent biological copies of the transformed alveolar lung A549 cells (Series 6) that were transported with a carrier expressing human angiotensin I converting enzyme 2 (ACE2), which were also mock treated or were infected with SARS-CoV-2. Finally, we included uninfected human lung biopsies from two patients that were used as biological replicates and lung samples derived from a single COVID-19 deceased patient that was treated in technical replicates (Series 15). Differential expression analysis of the RNA sequencing (RNA-Seq) data was performed in R programming language using the EdgeR package [19] . We removed low-expressed genes by retaining only those genes that are represented at least in two samples of each group. The messenger RNA counts were normalized using the trimmed mean of M-values method. Finally, we used EdgeR's negative binomial model to perform the differential expression analysis of infected compared with control samples. From the comparisons, we selected the DEGs by applying selection thresholds of adjusted P-value < 0.05 and log fold-change (logFC) ≥ 1.

In addition, we included RNA-Seq data of RNAs isolated from bronchoalveolar lavage fluid (BALF) and peripheral blood mononuclear cells (PBMC) specimens of COVID-19 patients [20] . Specifically, this dataset included two BALF samples from Zhongnan Hospital of Wuhan University and three BALF samples from healthy controls, downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database with accession numbers SRR10571724, SRR10571730 and SRR10571732. PBMC samples from three COVID-19 patients and three healthy donors were obtained from Zhongnan Hospital of Wuhan University (Table 1) . From these sets, DEGs were selected based on the original publication analysis as described by Xiong et al. [20] . Specifically, DEGs had been identified using the DESeq2 package (v1.26.0) [21] . Selection thresholds on BALF-derived DEGs were set at FC ≥ 4, adjusted P-value < 1e-10 and at least 10 read counts on average across all samples. For the PBMC data, due to lower sequencing depth, the DEGs were selected using FC ≥ 2, adjusted P-value < 0.01 and over 100 read counts on average across all samples.

Due to the heterogeneity of experimental methodologies and samples, the selection thresholds used were different between BALF, PBMC and Series 1, 2-5, 6, 7, 15. For BALF and PBMC, the thresholds were chosen according to the selection criteria applied in the corresponding published studies. For Series 1, 2-5, 6, 7, 15, we analyzed the available raw data and we applied uniform selection criteria.

To further enrich our study, we included proteomic and metabolomic data from Shen et al. [22] . In this study, the authors performed proteomic (SerumP) and metabolomic (SerumM) profiling from sera of COVID-19 and healthy individuals. These samples were procured from 65 patients who visited Taizhou Hospital from January to March 2020. They were diagnosed as COVID-19 according to the Chinese Government Diagnosis and Treatment Guideline. From the analyzed proteomic and metabolomic profiles, 28 severe COVID-19 patients and 28 healthy controls were used in our analysis, and more specifically, the differentially abundant proteins and metabolites, based on adjusted P-value < 0.05 and absolute logFC > 0.25, reported by the authors using a two-sided unpaired Welch's t-test.

Mapping between metabolites and genes was performed using the Ingenuity Pathway Analysis (IPA) software by Qiagen [23] (http://www.ingenuity.com). Human Metabolome Database (HMDB) Identifiers (IDs), logFC and P-value were used as input into the IPA software, which identified the upstream regulators (Supplementary File S1) that may upregulate or downregulate the input metabolites. The right-tailed Fisher's exact test was used to obtain the P-values of the identified regulatory genes.

A set of 336 SARS-CoV-2 human proteins, available in the Human Protein Atlas (HPA) repository, have been further employed in the study. Specifically, the set involves 332 high-confidence SARS-CoV-2-human PPIs, identified by Gordon et al. [24] , who had cloned, tagged and expressed 26 SARS-CoV-2 proteins in human cells by means of affinity purification mass spectrometry. Four additional proteins were included: the ACE2 receptor used by SARS-CoVs for host cell entry [25, 26] , the serine protease TMPRSS2 involved in the S protein priming [25] and the endosomal cysteine proteases cathepsins CTSB and CTSL needed for viral cell entry, which have been widely expressed in human tissues [25, 26] .

PPIs between host and pathogens other than SARS-CoV-2 were downloaded from IntAct [27] , PHISTO [28] and VirHostNet [29] , amounting to a total of 42 684 unique PPIs from a total of 625 organisms.

The transcriptomic-based DR was performed based on human cell lines and ex vivo (rat liver) DR tools. The 150 over-and underexpressed genes (based on their FC value) from the BALF, PBMC, Series 15, 1, 2, 5, 6 and 7 were used as transcriptomic signatures. The expression data from Series 2 and 5 were pooled and analyzed prior to the DR rans in a single set since they were generated from the same cell lines and treatments. Next, each set was used as an input in three different transcriptomicbased computational DR tools: Connectivity Map [30] , L1000CDS2 [31] and L1000FWD [32] . These tools use transcriptional expression data from multiple human cell lines to probe relationships between diseases and therapeutic agents. Drugs are sorted according to their 'enrichment' score, which characterizes if a drug can enhance or reverse the expression levels of a disease based on a given set of genes. Drugs with high negative scores are those that can reverse the gene expression profile towards the normal state.

For each input set, we obtained a candidate list of drugs predicted by each of the three tools, ranked based on their respective reverse enrichment score (inhibition score). Since the output of L1000CDS2 is limited to 50 drugs, we applied the same cutoff for all the other repurposed drug lists. Hence, the top 50 drugs from each of the three tools were combined as their union of unique drugs and ranked by calculating the weighted sum of normalized average rankings and the normalized number of appearances according to Equation (1):

where R i is the average ranking score from each of the three tools, A i the number of appearances of each drug in the three DR tools, and w 1 and w 2 are 0.7 and 0.3, respectively. The drug lists obtained from Series 1, 2 and 5, 6 and 7 sets were combined and re-ranked again using Equation (1) in order to conclude to a single drug list from all cell line-derived RNA-Seq data.

The same gene expression sets were used as input to the CRowd Extracted Expression of Differential Signatures (CREEDS) gene and drug perturbation database [33] . The CREEDS database consists of a list of 4295 single drug perturbations and 8620 single gene perturbations obtained using gene expression data from different tissue types of rat, collected from GEO. The CREEDS database was used herein to extract drugs from the DrugMatrix [33] that are predicted to be able to reverse the disease expression profile of the gene sets of interest. The drugs are ranked by the tool based on Fisher's exact test-derived P-value. Like above, the top 50 drugs based on the enrichment score returned by CREEDS. Again, the candidate drugs obtained from Series 1, 2 and 5, 6 and 7 were combined and ranked according to Equation (1) . Table 2 shows a summary of the eight drug lists (T1-4, TC1-4) obtained from transcriptomics data.

The 40 strong SARS-CoV-2 interactors according reported by Gordon et al. [24] were used as an input to PhenoScanner [34, 35] . PhenoScanner returns genotype-phenotype associations across traits and proxies that are collected from various other online databases such as GWAS catalog [36] and CHARGE [37] .

Phenotype-associated genes were extracted automatically from PhenoScanner and were used to search for drug interactors. More specifically, the genes were used as input to the Drug-Gene Interaction Database (DGIdb) (www.dgidb.org) [38] , which consolidates, organizes and presents drug-gene interactions and gene druggability information from papers, databases and web resources.

Pathogen identification and taxonomy-based distances. From the polypeptide targets data file of DrugBank, we extracted the NCBI taxonomy [39] IDs of all organisms other than Homo sapiens contained in DrugBank [40, 41] and have at least one protein that is targeted by a drug. This set was filtered to identify and remove nonpathogens like Metazoa and plants using a custom written script in R in combination with the taxize package to retrieve the taxonomy classification of the organisms. The final In-house methods list of pathogens included viruses, bacteria and known human parasites from eukaryotes. From the resulting pathogen tax ID list, we constructed their taxonomy tree using R's taxize package [42] . The distance between each node was calculated using a variable step size between taxonomy ranks proportional to the loss of information at each level as previously proposed [42] [43] [44] .

DR using the taxonomy distance matrix. Our working assumption was that drugs with a direct inhibitory effect against a given pathogen are more likely to have a similar effect to closely related pathogens in terms of taxonomic distance. Based on this assumption, drugs with protein targets across a broad and diverse range of taxonomy distances are expected to have a less taxon-specific effect, raising the prospect that they are repurposable against SARS-CoV-2, as opposed to drugs known to target only a specific group of distant pathogens. This broad spectrum antipathogenic activity can be captured by the maximum distance of organisms affected by the same drug, whereas the diversity in the inter-taxon distances can be captured using the Shannon Index H, i.e. entropy. Using the above, we have scored drugs based on function (2):

where R x is the set of n pathogens with known protein targets for drug x, d i is the taxonomy distance of the ith pathogen targeted by drug x and α is the maximum difference of distances d across the pathogens in set R x , which can be written as:

Finally, the Shannon diversity [45, 46] of distances H can be expressed as:

where D x is the set of discrete distance values observed in R x , whereas p z is the ratio of the distance d z frequency over the total number of measured distances in R x (equal to n-1). S x is maximized as d i approaches 0, but for larger distances, a drug is favored over other equidistant drugs if it is found to have protein targets over a broader and diverse range of pathogens.

Pathogen-host PPI network-based DR. The approach in this section draws from the assumption that similarities between SARS-CoV-2 and other pathogens at host-protein level may provide the appropriate framework to identify and rank candidate drugs to be used against COVID-19. Specifically, a database repository was initially developed by collecting and combining all the protein and host-protein interactions from all organisms found in IntAct [27] , PHISTO [28] and VirHostNet [29] data repositories. The pathogens were identified from the combined unique list of NCBI taxonomy [39] IDs from the aforementioned databases and filtered for nonpathogenic organisms like above. For the case of SARS-CoV-2 pathogen, the 366 host-proteins found in HPA [47] were used, where a pairwise pathogen-to-pathogen network was developed based on common host-proteins inbetween pathogens. Herein, the node size represents the number of host-proteins related to a specific pathogen, and the edge weight represents the number of common host-proteins inbetween two pathogens. The underlying network revealed 189 pathogens that share common host-proteins with SARS-CoV-2, forming a network of first-neighbors. Based on the assumption that the relation between two pathogens cannot be approximated only by their observed commonality at host-protein level but also by the previously mentioned taxonomy distance D inbetween them, the latter network was further enriched with an additional edge score (S edge ), according to the following equation:

where N 1 , N 2 are the number of host-proteins interacting with the two pathogens forming a single edge, N com is the number of common host-proteins that two pathogens share, N min , N max are the maximum and minimum estimations of v = {N 1 , N 2 }vector, and D is the taxonomy distance between two pathogens.

Herein, the final ranked drug list comes from a specific methodology that scans the edges of the underlying SARS-CoV-2 subnetwork, one by one. Specifically, for each pair of pathogens, drugs were obtained from DrugBank, which target the common host-proteins that form the specific edge. Each drug in the list was ranked according to the following generalized 4-fold equation:

where N vtar is the number of drug targets included in SARS-CoV-2 host-proteins, N xtar is the number of drug targets included in the x-pathogen that forms an edge with SARS-CoV-2, N dtar is the total number of drug targets that derive from DrugBank repository [40, 41] , and N 1 , N 2 are the total number of hostproteins included in SARS-CoV-2 and the x-pathogen accordingly. Combining Equations (5) and (6) yields the overall drug score (S d ) as follows:

where N app is the number of times a specific drug appeared through the scanning process, UN tar is the number of unique targets appeared in the scan process, N 1 is the number of SARS-CoV-2 host-proteins used and S edge , S drug are the estimated scores per edge obtained from Equations (5) and (6) accordingly.

Towards the integration of multi-source data from patient samples, we developed a 'network-based multi-source data integration' methodology based on the integration scheme presented by Zachariou et al. [48] .

We integrated data from multiple sources in the form of gene lists with two columns, corresponding to gene identity and gene score from the following sources and shown in Supplementary File S1:

1. Ranked genes lists in terms of absolute logFC from three serum transcriptomic (T) datasets (Series 15, BALF, PBMC) [18] ;

2. Ranked genes lists in terms of absolute log fold from one proteomics (P) dataset [22] ; 3. Ranked genes lists in terms of P-value from one metabolomics (M) dataset [22] ; 4. Unranked list of host proteins (PPI), which interact with SARS-CoV-2 from Gordon et al. [24] ; 5. Unranked unique gene list from HPA, excluding the genes identified by Gordon et al.

The gene symbols in all the gene lists were converted to entrez ID for consistent merging to a single gene ID using the R package org.Hs.eg.db [49] for genome-wide annotation for human. For the three transcriptomics and the proteomics dataset, the gene score GS x per list was calculated based on

where L x is the total number of genes per list and R The three transcriptomic gene lists were merged and the final score per gene was assigned to be the average GS x , x ∈ {Series15, BALF, PBMC} across the three datasets.

We calculated a characteristic score per gene, known as the Multi-source Information Gain (MIG) comprised by two parts:

where MIG n represents the normalized integrated nth genespecific information (i.e. node characteristics) and MIG e represents the normalised integrated gene-gene information (based on the topology of the multi-integrated super network) and corresponds to the weighted degree of the multi-integrated super network. We considered equal contribution to the score of the gene-specific information and of the topology of the integrated gene-gene super network (w = 0.5).

The gene-specific information is given by

where V T is a vector corresponding to the average gene score of the three transcriptomic datasets (Series 15, BALF, PBMC), V P is a vector corresponding to the scored gene list from the proteomics, V M is a vector corresponding to the ranked genes from the dataset, V PPI is a list of unranked genes corresponding to the set of proteins as identified from Gordon et al. [24] and V HPA is a list of unranked genes corresponding to the unique proteins annotated to be key in the HPA, excluding the V PPI ones. The weights w n x for the respective sources were set to ensure that at least 20% of each gene list is included in the top 500 genes. In addition, the weights were set so that their sum satisfies the condition:

The integrated gene list contained in total 1351 genes. We filtered to retain only the genes which were recognizable by the GeneMANIA tool, resulting in a list of 1118 genes. From that, we selected the top 1000 top-scored genes based on their integration score to build networks using the GeneMANIA tool [50] in Cytoscape [51] . Note that 1001 genes were selected as three genes shared the same score at the bottom of the list. We selected four networks in GeneMANIA based on co-expression, co-localization, genetic interaction and physical interaction.

The Multi-source Information (MI) super network was constructed based on the weighted sum of the pairwise weighted edge vectors (for each pair of genes) for these four types of networks (the edge weight automatically calculated by GeneMA-NIA). The total number of connected nodes out of the 1001 genes was 995, and the total number of edges was 45 486.

We calculated the final gene score MIG based on the combination of nodal score and topological information for each gene using equation (10) , and ranked the genes with respect to their importance and involvement in COVID-19 (Supplementary File S4).

The igraph package [52] in R [53] was used to generate and analyze the MI super network and to compile the MIG score.

PathWalks [54] , a map-driven random walk-based methodology on a pathway-to-pathway network, was applied to reveal communities of connected pathways. PathWalks exploits a map that we construct in the form of a synthetic gene network, containing integrated information regarding a disease of interest. For our calculations, we used the multi-thread version of PathWalks with 15 walkers, 10 000 steps per walker and restart every 50 steps run in the infrastructure provided by the National Initiatives for Open Science in Europe-NI4OS Europe (https:// ni4os.eu/ni4os-europe-vs-covid19/). Using the resulting pathway frequencies, i.e. number of visits per pathway, we performed an odds ratio (OR) analysis with respect to a random walk of the pathway network using only the topology of the network without any gene guidance. Specifically, we defined the OR between guided to nonguided walks as:

where index G and T denote the variables corresponding to the guided and topology-only runs, respectively. P G/T i is the visiting probability of the ith pathway calculated as the frequency F i ratio over the total visits F t T on pathways. According to this analysis pathways, with OR values >1 are the ones with higher relative visiting frequency compared with the topology-only runs and are thus more likely to be involved in the disease of interest. On the other hand, OR values <1 correspond to pathways that are less relevant. Pathways with OR values >1 were visualized as a network using R's igraph package [52] , highlighting specific pathways of interest.

The top 50 host-targeting drugs from the 10 lists (Supplementary File S3) were used as input to the CoDReS (Computational Drug Repositioning Score) tool [55] . CoDReS combines an initial drug ranking that may be the repurposing score or an a priori score (aS), with a functional score (FS) of each drug results from the analysis of the disease of interest, as well as with a structural score (StS) derived from drugability violations. Specifically, the initial ranking score of each host-targeted drug from the 10 lists divided with the absolute maximum ranking score was used as the normalized a priori repurposing score for CoDReS. Furthermore, for the case of FS and StS, the 1001 genes from the integration analysis were used with their disease association scores and the structures (SMILES format) of each drug, respectively. A composite score (CoDRes score) was calculated, for each drug, as the normalized-weighted sum of the initial aS with an FS and an StS. The weights that determine the desired influence of each part to the final score were defined as w aS = 0.45, w FS = 0.45 and w StS = 0.1. These parameters were chosen such that the contribution of the initial ranking from the individual approaches (aS) and the integrated disease-gene network (FS) are equally weighted, but the influence of the StS score, which pertains to chemical structural violations, is significantly reduced. Although the latter score is relevant for libraries of novel chemical compounds, the drugs of interest in this work are approved drugs with characterized chemical behavior.

Finally, the top 20 drugs from each re-ranked list were selected for chemical structure diversity analysis. The top 20 cutoff was chosen arbitrarily in order to limit and focus the downstream analyses to approximately one-third of the 600 drugs selected from all individual DR approaches.

We searched and downloaded the structures of the 240 drugs from PubChem [56] , CLUE-The Drug Repurposing Hub [57] (https://clue.io/repurposing#download-data) and from the literature. We removed (i) duplicates entries (drugs found in more than one list), (ii) drugs for which we did not find a structure and (iii) elemental entries (i.e. copper). We then used the OpenBabel software [58] to convert the structures of the remaining 210 drugs to a single Structure data file (SDF) library file, which was then used as input in the ChemBioServer 2.0 tool [59] for calculating the distance matrix of their chemical and structural similarity. Drugs were clustered using a minimum Tanimoto similarity of 80%, which corresponds to a 0.2 distance cutoff.

The rank of each drug was normalized according to Equation (14):

where r N ij is the normalized rank of the ith drug of the jth drug list, whereas n j is the total number of drugs selected. n j was set to 20 since we selected the top 20 drugs from all lists following the CoDReS re-ranking step.

The highest normalized rank value of a given drug across all lists was assigned as its maximum normalized value (Max Rank). The Max Rank of drugs introduced exclusively from one list was set equal to the corresponding normalized rank value.

Finally, from the list of 185 unique drugs obtained after removing duplicates and structurally redundant drugs, we shortlisted the top ∼30% (65 drugs) with respect to their Max Rank.

All listed clinical studies related to the COVID-19 were collected from the ClinicalTrials.gov. Small-molecule drugs curated through the reported clinical studies and the 2D structures of the drugs (SDF files) were obtained from PubChem (https://pu bchem.ncbi.nlm.nih.gov/), where available.

The SDF file containing the chemical structures of the integrated list of 65 drugs and all the available structures of the drugs reported in currently running clinical trials was used as an input to Chembioserver 2.0 in order to obtain the corresponding Tanimoto distance matrix. The latter was analyzed in R to identify which proposed drugs have the same or similar compounds in clinical trials using a distance threshold of ≤0.2.

The workflow adopted in this study outlines five mains steps described in detail below. The overall process entails the analysis of COVID-19-related omics datasets to identify significant genes and PPIs of interest with the subsequent identification and shortlisting of candidate repurposed drugs ( Figure 1 ).

During the first part of the analysis, we collected publicly available multi-omics datasets released from February to May 2020. These included proteomics and metabolomics datasets from blood serum [22] and seven transcriptomic datasets-four from cultured cell lines (Series 1-NHBE, Series 2-5-A549, Series 6-A549 with a transduced ACE2 expressing vector and Series 7-Calu-3) [18] and three patient-derived tissue samples (Series 15lung Biopsy, BALF and PBMC) [20] . A summary of the data along with the source study is available in Table 1 (see Methods) .

Using the logFC value and P-value thresholds (see Methods) from the proteomic dataset, we identified an approximately equal number of over-and underexpressed proteins, 62 and 58, respectively. The selected metabolites from the metabolomics dataset included 97 overabundant compounds, whereas the underabundant were ∼3-fold higher-317. The analysis of the RNA-Seq datasets yielded four sets of DEGs. The PBMC and BALF sets included approximately equal numbers of ∼700 overand ∼300 underexpressed DEGs. On the contrary, from the Series 15 set, we obtained a relatively large number of 3838 DEGs that were under-against 515 overexpressed DEGs. Finally, the DEG sets from cell lines (Series 1, 2 and 5, 6 and 7) included 2588 overand 1036 underexpressed DEGs in total (Table 1 ).

In addition, we collected the available information on PPIs between the SARS-CoV-2 and host proteins as reported by Gordon et al. [24] and HPA database [47] . This comprised a set of 336 human proteins found to interact with SARS-CoV-2 proteins. For the host-pathogen-based approaches (see Methods-PPI Data Selection), we collected 42.7 K PPIs across a total of 625 pathogens and human proteins. From the latter, 189 pathogens were found to interact with at least one common host protein with SARS-CoV-2.

We applied three discrete DR approaches based on (i) transcriptomics, generating eight lists of candidate drugs, (ii) GWASphenotype association (one list) and (iii) pathogen-host PPIs network analysis, generating three lists. A summary of all 12 generated lists of candidate drugs is shown in Table 2 , whereas all drugs per list are reported in Supplementary File S3.

Using the RNA-Seq-derived DEG sets, we performed a series of in silico DR analyses with existing computational tools. By selecting the top 150 over-and underexpressed genes with respect to their logFC value (300 in total, shown in Supplementary File S1), we obtained two lists of candidate drugs using each DEG set as an input to:

(1) An ensemble of DR tools: connectivity map [30] , L1000CDS2 [31] and L1000FWD [32] , followed by a scoring process yielding four lists with 50 drugs each, namely, Tr1 (from Series 1, 2 and 5, 6, 7), Tr2 (from Series 15), Tr3 (from BALF) and Tr4 (from PBMC). (2) CREEDS [33] using the DrugMatrix-based repurposing feature of the tool, which maps the expression profile of the DEGs of interest to the effect of various drugs as measured in rats. By applying the same cutoff of 50 drugs per run, we obtained four additional lists-TrC1 (from Series 1, 2 and 5, 6, 7), TrC2 (from Series 15), TrC3 (from BALF) and TrC4 (from PBMC).

We used 44 genes as input to the PhenoScanner database [34, 35] . These genes correspond to 40 strong SARS-CoV-2 interactors according to the evolutionary analysis as reported by Gordon et al. [24] and four highlighted proteins, ACE2, TMPRSS2, CTSB and CTSL, from the HPA database [47] . PhenoScanner identified 480 genetic associations between the input COVID-19-related genes and genes-or proxy genespreviously associated with various phenotypes with a default Pvalue ≤ 1e-05. Out of these, 186 associations were of genomewide significance (P-value ≤ 5e-08), with cardiovascular diseases (CVD) being the predominant associated phenotype (Supplementary File S2). Overall, the identified Single-nucleotide polymorphism (SNP) associations corresponded to a set of 83 genes, which we used to search for potential drugs in the DGIdb [38] . Following this approach, we compiled a list of 58 drugs (list GW) able to target the 83 genes of interest.

We worked on the assumption that genetic, functional and morphological similarities between SARS-CoV-2 and other pathogens can be approximated by their taxonomic classification. Drugs with a direct inhibitory effect against a given pathogen are more likely to have a similar effect to closely related pathogens in terms of taxonomic distance. To repurpose drugs based on the above assumption, we derived a taxonomy distance matrix (see Methods), which was used to identify antiviral compounds able to target directly a pathogen proteins according to DrugBank's polypeptide target file [40, 41] . This process yielded a list of DrugBank compounds scored as a function of (a) the taxonomic proximity of known target-pathogens to SARS-CoV-2 and (b) a metric of the broad spectrum activity of a given drug. This approach resulted in a ranked list (TaxAV) of 841 antiviral drugs (Supplementary File S5) that were found in Drug-Bank [40, 41] to target a total of 345 unique pathogen NCBI taxonomy IDs. The taxonomy tree for all the pathogens considered in this work and their associated drugs is illustrated in Figure 2 .

Building on the above approach, we included the identified PPIs between pathogen and host proteins in an effort to obtain a more informative scoring scheme regarding the functional interactions between pathogens and human. To this effect, we constructed a pathogen-to-pathogen network where edges describe the commonality of each pathogen to SARS-CoV-2 with respect to (a) their common interactions with host proteins and (b) the taxonomy distance (see Methods). Herein, the antiviral drugs were scored based on the proximity of target pathogens to SARS-CoV-2, resulting in 1178 (Supplementary File S5) scored drugs (list HPAV).

For both TaxAV and HPAV list, we selected the top 20 drugs for forwarding to the structural similarity analysis step (Supplementary File S3).

We used the underlying network to screen further for drugs targeting host proteins. This was based on the assumption that drugs targeting a host protein set with wide interaction overlap between SARS-CoV-2 and other pathogens are deemed more likely to exhibit similar antiviral effects. Herein, drugs were scored by means of a generalized 4-fold equation that accounts for both the host and pathogen protein targets, resulting in a list of 301 drugs. The top 50 drugs (list HPH) shown in Supplementary File S3 were selected and forwarded to the drug re-ranking step, similarly with the other host protein targeting drug lists.

We integrated the available multi-omic data from patient samples based on a previously proposed scheme [48] . Specifically, we integrated the following data: (1) ranked DEGs lists in terms of absolute logFC resulting from the transcriptomics data analysis, (2) a ranked gene list in terms of absolute logFC from the proteomics data analysis, (3) a ranked gene list in terms of P-value derived from the metabolomics data analysis, (4) an unranked list of host-proteins relevant to viral cell entry highlighted in HPA [47] and (5) an unranked list of host-proteins, which interact with SARS-CoV-2 from Gordon et al. [24] .

We calculated the MIG score, a characteristic score per gene comprising the integrated gene-specific information, and the local-weighted degree from a synthetic gene-to-gene network based on co-expression, genetic interactions, physical interactions and co-localization extracted from GeneMANIA [50] , as described in Methods. We then derived the integrated MI network illustrated in Figure 3 along with the relevant score distributions. The network comprises the top 1000 genes as nodes (specifically 1001 genes due to score ties) originating from all sources, with a total of 45 486 edges prorating the gene-to-gene relationships.

Finally, for all downstream functional analysis and drug reranking, we used the genes ranked based on their MIG score, which represents the integrated gene-disease association.

We used the top 300 MIG-ranked genes (as described above) to create a map of significant gene-disease associations. Under the guidance of this integrated map, we used PathWalks [54] to allow walkers to cross a pathway-to-pathway network derived from Kyoto Encyclopedia of Genes and Genomes (KEGG). The most frequent trajectories highlighted communities of pathways predicted to be widely involved in COVID-19. Using the resulting pathway frequencies, i.e. number of visits per pathway, we performed an OR analysis with respect to a random walk using only the topology of the pathway network without any gene map guidance. OR values >1 correspond to higher relative visiting frequency compared with the nonguided runs and are thus more likely to be involved in COVID-19. The network of highlighted pathway communities is shown in Figure 4 .

The CoDReS tool [55] was used to re-rank the candidate drugs based on (a) an FS combining the drug targets' relevance to the disease (as captured by the MIG score above), and the binding affinity to its target genes, (b) an aS defined as the normalized initial drug ranking from each list and (c) an StS representing drugability violations.

We performed the CoDReS re-ranking on the top 50 drugs with respect to their weighted normalized score, from each of the 10 lists of drugs that target host proteins: eight from RNA-Seq data, one from GWAS and one from host-pathogen interactions. The re-ranked lists are given in Supplementary File S3. Next, we selected the top 20 CoDReS drugs from each list (200 in total) for further analysis.

The 200 re-ranked drugs together with the top 20 candidate antiviral drugs originating from the two pathogen-host interaction-based approaches, 240 in total, were screened for chemical structure similarity. Using the ChemBioServer 2.0, we calculated the structural distance matrix based on the Tanimoto index [59] for all pairwise combinations of candidate drugs.

A hierarchical clustering analysis revealed that the input drugs spanned a broad range of chemical structure diversity. Specifically, by applying a distance threshold of <0.2 for highly similar compounds, we obtained 185 clusters, from which 25 comprise more than one drug. We eliminated the structural redundancy within our drug list by selecting the top scoring drug from each cluster, yielding a list of 185 drugs. Alcohol was manually excluded as it was deemed inappropriate for pharmacological use against COVID-19.

Furthermore, in order to assess the contribution of each list to the final nonredundant set of 185 drugs, we performed a redundancy analysis given in Supplementary File S7. The analysis suggests that there is a limited overlap between lists in terms of their member drugs, but all lists are contributing a unique set of drugs. The highest overlap was observed between TaxAV and HPAV, while TaxAV had the lowest contribution of unique drugs (seven drugs) and Tr4 and TrC4 the highest (18 drugs each).

Finally, we shortlisted the top one-third candidate drugs based on their maximum normalized rank (Max Rank), amounting to 65 candidate drugs shown in Table 3 .

We evaluated further the list of 65 drugs by comparing it against the drugs that are currently in clinical trials against COVID-19 as obtained from clinicaltrials.gov. Specifically, 5 out of the 11 top-scoring drugs (dexamethasone, beta-estradiol, atorvastatin, cyclosporin A and remdesivir) are already in clinical trials. Also, eight drugs with a lower normalized ranking score (imatinib, hydroxychloroquine, dactolisib, ofloxacin, leflunomide, simvastatin, pioglitazone and methotrexate) were also found in ongoing clinical trials.

From the remaining drugs, we identified, through structural similarity analysis, two more drugs, which have similar compounds (Tanimoto distance <0.2) in clinical trials. In particular, fluocinoloneacetonide and testosterone were found structurally similar to budesonide and hydrocortisone, which are currently in clinical trials, respectively. The integrated list is shown in Table 3 . To assess potential biases towards particular drugs in our lists, we repeated the transcriptomic-based DR pipeline using as an input 10 random gene lists. We found 3 out of the 65 drugs could potentially be randomly selected when keeping the same number of drugs from each list as in the followed shortlisting procedure. The three drugs are vorinostat, cyclosporin A and cisplatin, and are marked with ( * ) in Table 3 . Further details on the null models used and the corresponding results can be found in the Supplementary File S7.

The 65 drugs included in the produced integrated list (Table 3 ) present a diverse group of compounds from a chemical, pharmacological and clinical perspective. We used three major classification systems to categorize these compounds: the Medical Subject Headings classification (MeSH), the Target-Based Classification (TBC) and the general Chemical classification system (Chem). Although the integrated list contains a versatile group of drugs from a wide range of therapeutic areas (such as anticoagulants, antihistamines and hypolipidemics), the majority of these are antineoplastic agents, followed by immunosuppressive drugs, antivirals and antibacterials. Nevertheless, 16 drugs from the integrated list are experimental drugs, which either are in clinical trials for a number of conditions or are still under preclinical investigation. The majority of the drugs in the integrated list are enzyme or protein inhibitors, in terms of their established mechanism of action. Regarding their chemistry, the list is inclusive of important chemical classes of drug molecules such as triazoles, pyrimidines, platinum-containing compounds and benzimidazoles. However, the most prominent general chemical groups in the integrated list were aromatic amines (nine drugs), drugs with at least one piperazine ring (seven drugs), steroids (five drugs), thiazoles (four drugs) and quinolines (four drugs).

A characteristic representation of the drug candidates' classification can be found within the top 5% of their maximum ranking score (between 0.95 and 1), from which 11 (65%) are protein inhibitors, seven (41%) are antineoplastic agents, five (30%) are aromatic amines and four (24%) include a piperazine ring.

From the integrated list of 65 drugs, a thorough expert curation has highlighted a set of 16 drugs (Table 3 and Figure 5 ). The curation was based on following three main criteria: (a) drugs that have exhibited evidence of efficacy against COVID-19 in Phase 3 clinical trials, (b) drugs that bear pharmacological evidence of direct targeting of SARS-CoVs molecular components and (c) clinically approved drugs that have activity in molecular pathways that have been shown in the literature to be implicated in SARS-CoV-2 biology.

The first criterion was fulfilled by two drugs from the integrated list, dexamethasone and remdesivir, which are the only drugs that have shown to be effective against COVID-19. Nevertheless, from these two, only remdesivir has shown a direct effect against SARS-CoV-2, whereas dexamethasone showed an effect in the reduction of the associated inflammation of the disease rather than the biology of the virus.

The second criterion of the curation was fulfilled by six experimental drugs from the integrated list, which have shown a direct effect on SARS-CoVs in various assays (mostly in vitro). Benzyl (2-oxopropyl) carbamate, D3F, F3F, GRL0617 and WR1 have shown direct inhibition of the coronavirus replicase polyprotein 1ab [60] [61] [62] [63] [64] 977610), which is a major viral protein for the viral replication machinery [24] . N-(2-aminoethyl)-1-aziridineethanamine has shown direct inhibition of the human ACE2 receptor [65, 66] .

The third criterion was fulfilled by nine clinical drugs from the integrated list. These drugs were shown to be effective in targeting SARS-CoV-2 replication cycle and more specifically processes implicated in the generation of virally encoded nonstructural proteins (NSPs), which are essential for the assembly of the viral replicase complex. Vorinostat and hydroquinone are histone deacetylase (HDAC) inhibitors; it has been proposed that the main viral protease (Nsp5) may inhibit HDAC2 transport into the nucleus, and therefore HDAC2 inhibitors may be able to disrupt this interaction and suppress the HDAC2-mediated inflammation and interferon activation [24] . Some HDAC inhibitors have been shown to have antiviral activity (such as HDAC6 inhibitors against Influenza A Virus) [67] , and there is considerable literature that links HDAC inhibitors with T-cell biology and the immune response [68, 69] . Bosutinib, dasatinib, imatinib and saracatinib are tyrosine kinase inhibitors; tyrosine kinase-linked pathways have been implicated in SARS-CoV-2 biology and activation [70] , with various inhibitors having a potential anti-SARS-CoV-2 efficacy. Interestingly, dasatinib and imatinib have previously shown inhibitory effects against the Middle East respiratory syndrome (MERS) and SARS-CoV viruses in vitro at micromolar concentrations [71] , as well as against SARS-CoV-2 [72, 73] . Their wide antiviral efficacy has also been showcased against human immunodeficiency virus (HIV) [74] . Methotrexate is a drug with a strong effect in nucleic acid synthesis and a multi-facet role as antineoplastic, antimetabolite and antirheumatic drug. Interestingly, it was recently shown to present a submicromolar activity against SARS-CoV-2, in vitro [73] . Finally, dactolisib is a drug with a dual activity as a PI3K/mTOR inhibitor, a pathway that we know is important in SARS-CoV biology [75, 76] . Although there is evidence of the antiviral activity of the above drugs against HIV [77] , their potential efficacy against SARS-CoVs is yet to be determined.

Mounting evidence indicate that the clinical manifestations of COVID-19 are systemic, affecting mainly the respiratory and digestive tract, the cardiovascular but also the central nervous system (CNS) [78] [79] [80] [81] [82] . Thus, we pursued an approach that integrates multi-omics data from different types of tissue (BALF, PBMC, alveolar, NHBE, Calu-2 cell lines and blood serum) aimed to build a comprehensive molecular profile of the disease to obtain a more informative basis for DR.

Across the most significant DEG sets selected for the integration step, we observed minimal overlap (24 out of 1 K selected-2.4%), indicating a complex perturbation scheme involving a broad range of biological pathways across different tissues and further supporting the need of a multi-source approach.

Coming from patients with severe COVID-19, the identified DEG sets were as expected enriched with genes that are involved in acute immunoresponse and inflammation. Specifically, the top MIG scoring inflammation factors SAA1/2 and CRP were previously proposed as severity biomarkers for the disease [83] and as indicators of inflammatory tissue damage. Cytokine releaserelated factors, such as CCL3/4, CXCL10, CLC, IL2R and IL10, were also highly ranked along with p53 apoptosis pathway-related genes NTRK1, CTSL, CTSB and IGFB [20] . Genes involved in complement activation and coagulation cascades such as C4a and C5 were also highly perturbed likewise with past Betacoronavirusesrelated outbreaks (SARS and MERS) [84] .The FDCSP, KCNJ2 and ST20 reported to be highly dysregulated from lung biopsies [85] along with FLNA and EGFR point to significant effects of COVID-19 in lung and gastrointestinal epithelia cell proliferation signaling. Specifically the latter (EGFR) has sparked a wider discussion on the involvement of growth factor receptors in viral infections and potential DR avenues of related antineoplastic compounds [86, 87] .

The composition of the identified DEG sets and the functional analysis performed in PathWalks highlighted several pathways related to immune and inflammatory response pathways. Namely, in the top 30 scoring KEGG pathways (Figure 4 ) with respect to their OR values, we highlighted the 'Complement and coagulation cascades, IL-17 signaling pathway, Chemokine signaling pathway, Cytokine-cytokine receptor interaction' and 'Fc epsilon RI signaling pathway'. The 'Renin-angiotensin system' and 'Renin secretion' pathways were also highlighted as a result of the central role of ACE2 in viral entry along with the genes involved in the coagulation cascades [25] . In addition, we found a number of mechanisms involved in more generic KEGG disease terms that are related to infections and respiratory conditions like in 'Pertussis, Asthma, Staphylococcus aureus infection, Prion diseases, Malaria, Systemic lupus erythematosus and African trypanosomiasis'.

In line with the above COVID-19 molecular profile, nine candidate drugs in our proposed integrated list are classified (MeSH) as anti-inflammatory and immunosuppressors such as dexamethasone, cyclosporin A, mercaptopurine, cytarabine, fluocinoloneacetonide, triptolide, LY-255283, tegafur and methotrexate. These were complemented by the nonsteroid anti-inflamatory drugs, bromfenac, zileuton, rofecoxib and choline salicylate. The anticoagulants, renin-angiotensin system targeting drugs in the list were Y-27632, calcium citrate and the ACE2 targeting N-(2-aminoethyl)-1-aziridineethanamine and hydroxychloroquine. Several drugs from the integrated shortlist are classified as antineoplastic/antiproliferative agents (20 in total) including vorinostat, bosutinib, dasatinib and others. Although the antineoplastic drugs are expected to arise from such repurposing approaches, which inevitably identify key genes in proliferation signaling (e.g. EGFR) and apoptosis (p53 signaling pathway), there is a growing interest in the use of this class against viral infections [86, 87] .

Focusing on the experts' curated shortlist from a pathophysiological perspective, a group of four drugs, bosutinib, dasatinib, cytarabine and saracatinibare, are of particular interest as their primary mechanism of action is mediated by the inhibition of Src tyrosine kinase. Inhibiting the latter could help COVID-19 patients who enter a severe clinical trajectory through a number of pathophysiological mechanisms recognized to be involved from preclinical and clinical studies either directly involving COVID-19 or based on previous observations and experiments:

(a) Bruton's tyrosine kinase signaling has been associated with the production of pro-inflammatory cytokines that can contribute to COVID-19 immunopathology, T-cell differentiation, function and survival; hence, it might be beneficial in treating COVID-19-related immunopathology and lymphopenia [88] . This has been already tested in clinical trials [89] , and for dasatinib, it has been confirmed from seminal studies in chronic myelogenous leukemia [90] .

(b) Tyrosine kinase inhibitors have been used to inhibit platelet function (antithrombotic activity) through several mechanisms including novel mechanisms like GAS6/TAM signal inhibition, targeting the MERTK tyrosine kinase active site with a highly potent and bioavailable MERTK inhibitor, UNC2025 [91] .

(c) Mechanical ventilation (MV) has been extensively used to support patients with COVID-19 pneumonia who are developing respiratory failure; however, mortality is unexpectedly high, higher than current Acute respiratory distress syndrome (ARDS) mortality from other etiologies. It has been confirmed from in vitro, ex vivo and in vivo studies that MV itself can induce lung injury through inflammatory Src tyrosine kinase signaling, ICAM-1 expression, leukocyte infiltration and vascular hyperpermeability [92] . Pulmonary vascular endothelial barrier function is indeed partially regulated by Src kinasedependent phosphorylation of caveolin-1 and intercellular adhesion molecule-1. Blocking these mechanisms with Src tyrosine inhibition (i.e. nintedanib [93] ) has proved effective in preventing acute lung injury (ALI) in animal studies including the potentiation of bleomycin-induced ALI [94] .

(d) Pulmonary ischemia-reperfusion, which is associated with a wide range of clinical events [95] , including lung transplantation, cardiopulmonary bypass, trauma, resuscitation for circulatory arrest, atherosclerosis and pulmonary embolism, the latter being widely recognized as a detrimental complication of severe COVID-19 [96] .

(e) A final mechanism by which tyrosine kinase inhibitors could possibly exert anti-COVID-19 action is by eliminating excess soluble fms-like tyrosine kinase (in blood), which has been correlated with endothelial dysfunction and organ failure in critically ill COVID-19 patients [97, 98] .

Complementary to the Src tyrosine kinase inhibitors, the interest on hydroquinone and vorinostat stems from the fact that melanin synthesis inhibition (e.g. by hydroquinone) involves similar pathways with tyrosine kinase. Both drugs are inhibitors of HDAC and have been used to treat pediatric brain cancers [99] and might exert a beneficial effect through inhibition of the pro-inflammatory cytokine storm progression. Although vorinostat has been used against HPV [100], we have not found any specific references in the current literature linking this drug category to COVID-19.

Inhibitors of class IA phosphatidyl inositol-3 kinases, such as dactolisib, are targeting immune responses, particularly through co-stimulation by CD28 and ICOS and have been reported to suppress clinical symptoms in ongoing experimental autoimmune encephalomyelitis and inhibited MOG-specific responses in vitro [101] . Hence, the latter class of drugs can be beneficial in treating COVID-19 effects to the CNS, albeit the underlying mechanisms and the pathology are not well understood. On the other hand, activation of PI3K/Akt pathway has been reported to cause attenuation of Endoplasmic reticulum (ER) stress-induced myocardial apoptosis, facilitating the NGF-induced heart protection [102] . Given that myocardial involvement in COVID-19 was observed in a subset of patients and has detrimental consequences, there might be a reason to be cautious how we interpret this finding.

Despite being less studied, the identified compounds that target directly the viral entry and replication mechanisms and related proteins are of interest. Through our pathogen networkbased approaches, the integrated list of antivirals comprises remdesivir, the first drug to obtain Food and Drug Administration (FDA) approval for emergency use in patients with severe COVID-19 [103] , along with a number of experimental compounds developed during the SARS and MERS outbreaks with promising in vitro results such as benzyl (2-oxopropyl) carbamate [104, 105] , D3F, F3F, GRL0617 and WR1. This list is also complemented by N-(2-aminoethyl)-1-aziridineethanamine that can be effective via its inhibition of ACE2, which is being used by SARS-CoVs for cell entry in order to initiate membrane fusion and cell infection [106] . As expected, the top scoring repurposed antivirals based on commonality and available information from other pathogens were the ones known to be effective against the closest taxa, e.g. members of the SARS-CoV species. However, our full lists of antiviral drug rankings (HPAV and TaxAV available in Supplementary File S5) not only identified and re-ranked drugs nearly from all main phyla of viruses, but also from bacteria that may share interactions with the same host proteins or share genetic similarity. In fact, this approach that was developed and utilized herein specifically for SARS-CoV-2 introduces a rapid mapping and prioritization method of all the known antiviral compounds against any pathogen of interest.

Naturally a multi-source approach like the one proposed in this work comes with certain limitations such as (a) the lack of harmonization across selection criteria applied for the DEGs across several datasets. Ideally data selection should be based on identical criteria/thresholds. This was possible for the cases where raw data were available and the data were produced from the same or similar experimental methodology (e.g. in Series 1, [2] [3] [4] [5] 6, 7, 15) as opposed to BALF, PBMC, proteomics and metabolomics datasets; (b) another limitation was the fact that selection biases among drugs might exist as observed in our null model analysis. Although we detected this possible bias for a limited number of drugs, we deemed more appropriate to keep them and highlight them as potential biased results, rather than excluding them from the final list; (c) finally, the redundancy analysis revealed some overlap between specific drug lists, mainly between HPAV and TaxAV. The small size of the initial pool of this category of drugs (compared with the other approaches) and common components (i.e. the taxonomy distance) in the scoring functions of these approaches expectantly yielded a higher degree of overlap. However, since both lists introduced unique drugs, we deemed appropriate to keep both methods in our workflow. It is also worth noting that as the number of known viral-host protein interactions in the literature increases, in a future application of this workflow (for SARS-CoV-2 or other pathogens), the lists are likely to diverge even more since TaxAV is insensitive to these interactions while the opposite applies for HPAV.

Concluding, we have identified a shortlist of drugs generated from multi-source, multi-omics data utilizing a multiplex DR approach. Note that, we do not anticipate that a single drug can lead to a sufficiently effective treatment of severe COVID-19 and especially in patients with comorbidities. Based on the multifaceted nature of COVID-19, an effective treatment will require combination(s) of drugs, allowing to target multiple pathways to provide (a) anti-inflammatory effects to prevent collateral tissue damage, (b) inhibition of the viral replication mechanisms and (c) where required, protective effects, which account for other underlying conditions like CVD. We anticipate that the proposed integrated list of drugs will enhance the underlying rational and provide insights from a molecular point of view for the compounds that have already entered clinical trials and more importantly will point towards novel therapeutic avenues involving the newly proposed ones.

• We proposed a protocol for multiplex DR approach against COVID-19 based on the molecular profile of patients, extracted from a multi-source multi-omics integrative analysis.

• We have focused on utilizing the publicly available multi-source and multi-omics datasets towards devising a multiplex DR approach yielding a highly informed shortlist of drug candidates against COVID-19 and its causative virus.

• We initially extracted all the available drugs relevant to the molecular profile observed in COVID-19 through an ensemble of existing computational tools as well as newly tailored methods. Through a network-based integration approach, we have generated a genedisease association map, which we exploited further for (a) re-ranking the initial identified therapeutics and (b) identifying the biological mechanisms and pathway communities that are involved in the onset and progression of the disease.

• The proposed drug list not only comprises both drugs aiming to reverse COVID-19-induced perturbations (such as immunomodulatory and anti-inflammatory drugs), but also compounds with a direct antiviral activity. Several of these drugs are already in clinical trials.

• Through the functional analysis of the identified molecular profiles, we found the Src tyrosine kinase to be involved in multiple COVID-19-related pathophysiological mechanisms. Hence, Src tyrosine kinase inhibitors such as bosutinib, dasatinib, cytarabine and saracatinib could hold a promising potential against COVID-19.

Supplementary data are available online at Briefings in Bioinformatics.

All the R scripts developed and used in this work along with their input files and details on the use of other computational tools are available for academic use at https://github. com/CING-BIG/DAVID_vs_COVID-Project. A part of the computational work of this project was supported by the European Union's Horizon 2020 European research infrastructures, 'National Initiatives for Open Science in Europe -NI4OS Europe' project (grant agreement no. 857645). Specifically, the PathWalks parallel version was run on the PARADOX-IV supercomputing facility at the Scientific Computing Laboratory, National Center of Excellence for the Study of Complex Systems, Institute of Physics Belgrade, supported in part by the Ministry of Education, Science, and Technological Development of the Republic of Serbia under project No. ON171017.

Drug repurposing: progress, challenges and recommendations

COVID-19 cytokine storm: the interplay between inflammation and coagulation

CT features of coronavirus disease 2019 (COVID-19) pneumonia in 62 patients in Wuhan, China

Cardiovascular implications of fatal outcomes of patients with coronavirus disease 2019 (COVID-19)

Network medicine framework for identifying drug repurposing opportunities for COVID-19

Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2

In silico drug repurposing to combat COVID-19 based on pharmacogenomics of patient transcriptomic data

Host transcriptome-guided drug repurposing for COVID-19 treatment: a meta-analysis based approach

Repurposing FDA-approved drugs for COVID-19 using a data-driven approach

A review on drug repurposing applicable to COVID-19

Drug repurposing for coronavirus (COVID-19): in silico screening of known drugs against coronavirus 3CL hydrolase and protease enzymes

Molecular docking and dynamic simulations for antiviral compounds against SARS-CoV-2: a computational study

A molecular modeling approach to identify effective antiviral phytochemicals against the main protease of SARS-CoV-2

Repurposing of FDA-approved antivirals, antibiotics, anthelmintics, antioxidants, and cell protectives against SARS-CoV-2 papain-like protease

FDA-approved thiol-reacting drugs that potentially bind into the SARS-CoV-2 main protease, essential for viral replication

In silico studies on therapeutic agents for COVID-19: drug repurposing approach

Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods

SARS-CoV-2 launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems

edgeR: differential expression analysis of digital gene expression data. Bioconductor Fhcrc Org

Transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in COVID-19 patients

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Proteomic and metabolomic characterization of COVID-19 patient sera

Causal analysis approaches in Ingenuity Pathway Analysis

A SARS-CoV-2 protein interaction map reveals targets for drug repurposing

SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor

Expression profiling meta-analysis of ACE2 and TMPRSS2, the putative antiinflammatory receptor and priming protease of SARS-CoV-2 in human cells, and identification of putative modulators

The IntAct molecular interaction database in 2012

PHISTO: pathogenhost interaction search tool

VirHostNet 2.0: surfing on the web of virus/host molecular interactions data

A next generation connectivity map: L1000 platform and the first 1,000,000 profiles

L1000CDS2: LINCS L1000 characteristic direction signatures search engine

L1000FWD: fireworks visualization of drug-induced transcriptomic signatures

Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd

PhenoScanner: a database of human genotype-phenotype associations

PhenoScanner V2: an expanded tool for searching human genotypephenotype associations

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019

Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium design of prospective meta-analyses of genome-wide association studies from 5 cohorts

DGIdb 3.0: a redesign and expansion of the drug-gene interaction database

The NCBI taxonomy database

DrugBank: a comprehensive resource for in silico drug discovery and exploration

DrugBank 5.0: a major update to the DrugBank database

Taxonomic search and retrieval in R

A taxonomic distinctness index and its statistical properties

A further biodiversity index applicable to species lists: variation in taxonomic distinctness

A tribute to Claude-Shannon (1916-2001) and a plea for more rigorous use of species richness, species diversity and the 'Shannon-Wiener' Index

The mathematical theory of communication

A subcellular map of the human proteome

Integrating multisource information on a single network to detect diseaserelated clusters of molecular mechanisms

Hs. eg. db: genome wide annotation for human

GeneMANIA prediction server 2013 update: biological network integration for gene prioritization and predicting gene function

Cytoscape: a software environment for integrated models of biomolecular interaction networks

R: A Language and Environment for Statistical Computing

PathWalks: identifying pathway communities using a disease-related map of integrated information

A web tool for ranking candidate drugs against a selected disease based on a combination of functional and structural criteria

Pub-Chem: integrated platform of small molecules and biological activities

The drug repurposing hub: a next-generation drug library and information resource

Open Babel: an open chemical toolbox

ChemBioServer 2.0: an advanced web server for filtering, clustering and networking of chemical compounds facilitating both drug discovery and repurposing

SARS coronavirus main proteinase 3.4.22.69. BT -Class 3.4-6 Hydrolases, Lyases, Isomerases

Structure-based drug design and structural biology study of novel nonpeptide inhibitors of severe acute respiratory syndrome coronavirus main protease

Structure-based design, synthesis, and biological evaluation of a series of novel and reversible inhibitors for the severe acute respiratory syndrome -coronavirus papain-like protease

Comparison of SARS and NL63 papain-like protease binding sites and binding site dynamics: inhibitor design implications

Discovery, synthesis, and structure-based optimization of a series of N-(tertbutyl)-2-(N-arylamido)-2-(pyridin-3-yl) acetamides (ML188) as potent noncovalent small molecule inhibitors of the severe acute respiratory syndrome coronavirus (SARS-CoV) 3CL pr

Thiol-based angiotensin-converting enzyme 2 inhibitors: P1 modifications for the exploration of the S1 subsite

Structure-based discovery of a novel angiotensin-converting enzyme 2 inhibitor

Histone deacetylase 6 inhibits influenza A virus release by downregulating the trafficking of viral components to the plasma membrane via its substrate, acetylated microtubules

Elucidating the role of HDACs in T cell biology and comparing distinct HDAC inhibitors in augmenting responses to cancer immunotherapy

Histone/protein deacetylases and T-cell immune responses

Repurposing of kinase inhibitors for treatment of COVID-19

Repurposing of clinically developed drugs for treatment of Middle East respiratory syndrome coronavirus infection

Bcr-Abl tyrosine kinase inhibitor imatinib as a potential drug for COVID-19

Reversal of infected host gene expression identifies repurposed drug candidates for COVID-19

Dasatinib inhibits HIV-1 replication through the interference of SAMHD1 phosphorylation in CD4+ T cells

mTOR inhibition and p53 activation, microR-NAs: the possible therapy against pandemic COVID-19

Targeting T-cell senescence and cytokine storm with rapamycin to prevent severe progression in COVID-19

Induction of autophagy by PI3K/MTOR and PI3K/MTOR/BRD4 inhibitors suppresses HIV-1 replication

Neurologic manifestations of hospitalized patients with coronavirus disease 2019 in Wuhan

Clinical insights into the gastrointestinal manifestations of COVID-19

Coronavirus disease-2019 (COVID-19) and cardiovascular complications

COVID-19 and the cardiovascular system

Chest CT manifestations of new coronavirus disease 2019 (COVID-19): a pictorial review

Serum amyloid A is a biomarker of severe coronavirus disease and poor prognosis

COVID-19: complement, coagulation, and collateral damage

A lung transcriptomic analysis for exploring host response in COVID-19

The role of growth factor receptors in viral infections: an opportunity for drug repurposing against emerging viral diseases such as COVID-19?

The landscape of human cancer proteins targeted by SARS-CoV-2

BTK/ITK dual inhibitors: modulating immunopathology and lymphopenia for COVID-19 therapy

Inhibition of Bruton tyrosine kinase in patients with severe COVID-19

Src tyrosine kinase inhibitors: new perspectives on their immune, antiviral, and senotherapeutic potential

The small-molecule MERTK inhibitor UNC 2025 decreases platelet activation and prevents thrombosis

Ropivacaine attenuates endotoxin plus hyperinflation-mediated acute lung injury via inhibition of early-onset Src-dependent signaling

Nintedanib reduces ventilationaugmented bleomycin-induced epithelial-mesenchymal transition and lung fibrosis through suppression of the Src pathway

Mechanical ventilation augments bleomycin-induced epithelial-mesenchymal transition through the Src pathway

Src tyrosine kinase inhibition prevents pulmonary ischemia-reperfusion-induced acute lung injury

Acute pulmonary embolism associated with COVID-19 pneumonia detected with pulmonary CT angiography

Excess soluble fms-like tyrosine kinase 1 correlates with endothelial dysfunction and organ failure in critically ill COVID-19 patients

COVID-19-driven endothelial damage: complement, HIF-1, and ABL2 are potential pathways of damage and targets for cure

Histone deacetylase inhibitors in pediatric brain cancers: biological activities and therapeutic potential

Vorinostat, a pan-HDAC inhibitor, abrogates productive HPV-18 DNA amplification

Erratum: suppression of CD4+ T lymphocyte activation 'in vitro' and experimental encephalomyelitis 'in vivo' by the phosphatidyl inositol 3-kinase inhibitor PIK-75

Nerve growth factor protects the ischemic heart via attenuation of the endoplasmic reticulum stress induced apoptosis by activation of phosphatidylinositol 3-kinase

Remdesivir: a review of its discovery and development leading to emergency use authorization for treatment of COVID-19

A structural view of the inactivation of the SARS coronavirus main proteinase by benzotriazole esters

Development of broadspectrum halomethyl ketone inhibitors against coronavirus main protease 3CLpro

Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor: molecular mechanisms and potential therapeutic target