key: cord-349794-mhviub6e authors: Le, Brian L.; Andreoletti, Gaia; Oskotsky, Tomiko; Vallejo-Gracia, Albert; Rosales, Romel; Yu, Katharine; Kosti, Idit; Leon, Kristoffer E.; Bunis, Daniel G.; Li, Christine; Kumar, G. Renuka; White, Kris M.; García-Sastre, Adolfo; Ott, Melanie; Sirota, Marina title: Transcriptomics-based drug repositioning pipeline identifies therapeutic candidates for COVID-19 date: 2020-10-23 journal: bioRxiv DOI: 10.1101/2020.10.23.352666 sha: doc_id: 349794 cord_uid: mhviub6e The novel SARS-CoV-2 virus emerged in December 2019 and has few effective treatments. We applied a computational drug repositioning pipeline to SARS-CoV-2 differential gene expression signatures derived from publicly available data. We utilized three independent published studies to acquire or generate lists of differentially expressed genes between control and SARS-CoV-2-infected samples. Using a rank-based pattern matching strategy based on the Kolmogorov-Smirnov Statistic, the signatures were queried against drug profiles from Connectivity Map (CMap). We validated sixteen of our top predicted hits in live SARS-CoV-2 antiviral assays in either Calu-3 or 293T-ACE2 cells. Validation experiments in human cell lines showed that 11 of the 16 compounds tested to date (including clofazimine, haloperidol and others) had measurable antiviral activity against SARS-CoV-2. These initial results are encouraging as we continue to work towards a further analysis of these predicted drugs as potential therapeutics for the treatment of COVID-19. SARS-CoV-2 has already claimed at least a million lives, has been detected in at least 40 million people, and has likely infected at least another 200 million. The spectrum of disease caused by the virus can be broad ranging from silent infection to lethal disease, with an estimated infection-fatality ratio around 1% 1 . SARS-CoV-2 infection has been shown to affect many organs of the body in addition to the lungs 2 . Three epidemiological factors increase the risk of disease severity: increasing age, decade-by-decade, after the age of 50 years; being male; and various underlying medical conditions 1 . However, even taking these factors into account, there is immense interindividual clinical variability in each demographic category considered 3 . Recently, researchers found that more than 10% of people who develop severe COVID-19 have misguided antibodies-autoantibodies-that attack the innate immune system. Another 3.5% or more of people who develop severe COVID-19 carry specific genetic mutations that impact innate immunity. Consequently, both groups lack effective innate immune responses that depend on type interferon, demonstrating a crucial role for type interferon in protecting cells and the body from COVID-19. Whether the type interferon has been neutralized by autoantibodies or-because of a faulty gene-is produced in insufficient amounts or induced an inadequate antiviral response, the absence of type IFN-mediated immune response appears to be a commonality among a subgroup of people who suffer from life-threatening COVID-19 pneumonia 3 . While numerous efforts are underway to identify potential therapies targeting various aspects of the disease, there is a paucity of clinically proven treatments for COVID- 19 . There have been efforts to therapeutically target the hyperinflammation 4 associated with severe COVID-19 4 , as well as to utilize previously identified antiviral medications 5, 6 . One of these antivirals, remdesivir, an intravenously administered RNAdependent RNA polymerase inhibitor, showed positive preliminary results in patients with severe COVID-19 7 . In October 2020, the FDA approved remdesivir for the treatment of COVID-19 8 . Dexamethasone has also been shown to reduce the mortality rate in cases of severe COVID-19 9 . Nevertheless, the lack of treatments and the severity of the current health pandemic warrant the exploration of rapid identification methods of preventive and therapeutic strategies from every angle. The traditional paradigm of drug discovery is generally regarded as protracted and costly, taking approximately 15 years and over $1 billion to develop and bring a novel drug to market 10 . The repositioning of drugs already approved for human use mitigates the costs and risks associated with early stages of drug development, and offers shorter routes to approval for therapeutic indications. Successful examples of drug repositioning include the indication of thalidomide for severe erythema nodosum leprosum and retinoic acid for acute promyelocytic leukemia 11 . The development and availability of large-scale genomic, transcriptomic, and other molecular profiling technologies and publicly available databases, in combination with the deployment of the network concept of drug targets and the power of phenotypic screening, provide an unprecedented opportunity to advance rational drug design. Drug repositioning is being extensively explored for COVID-19. High-throughput screening pipelines have been implemented in order to quickly test drug candidates as they are identified [12] [13] [14] [15] . In the past, our group has successfully applied a transcriptomics-5 based computational drug repositioning pipeline to identify novel therapeutic uses for existing drugs 16 . This pipeline leverages transcriptomic data to perform a patternmatching search between diseases and drugs. The underlying hypothesis is that for a given disease signature consisting of a set of up and down-regulated genes, if there is a drug profile where those same sets of genes are instead down-regulated and upregulated, respectively, then that drug could be therapeutic for the disease. This method has shown promising results for a variety of different indications, including inflammatory bowel disease 17 , dermatomyositis 18 , cancer [19] [20] [21] , and preterm birth 22 . In existing work from Xing et al. 23 , this pipeline has been used to identify potential drug hits from multiple input disease signatures derived from SARS-CoV or MERS-CoV data. The results were aggregated to obtain a consensus ranking, with 10 drugs selected for in vitro testing against SARS-CoV-2 in Vero E6 cell lines, with four drugs (bortezomib, dactolisib, alvocidib and methotrexate) showing viral inhibition 23 . However, this pipeline has not yet been applied specifically to SARS-CoV-2 infection. A variety of different transcriptomic datasets related to SARS-CoV-2 were published in the spring of 2020. In May 2020, Blanco-Melo et al. studied the transcriptomic signature of SARS-CoV-2 in a variety of different systems, including human cell lines and a ferret model 24 . By infecting human adenocarcinomic alveolar basal epithelial cells with SARS-CoV-2 and comparing to controls, the authors generated a list of 120 differentially expressed genes. They observed two enriched pathways: one composed primarily of type-I interferon-stimulated genes (ISGs) involved in the cellular response to viral infection; and a second composed of chemokines, cytokines, and complement proteins involved in the humoral response. After infecting 6 the cell lines, Blanco-Melo et al. did not detect either ACE2 or TMPRSS2, which are the SARS-CoV-2 receptor and SARS-CoV-2 protease, respectively 25 . However, supported viral replication was observed, thereby allowing the capture of some of the biological responses to SARS-CoV-2. In May 2020, another study by Lamers et al. examined SARS-CoV-2 infection in human small intestinal organoids grown from primary gut epithelial stem cells. The organoids were exposed to SARS-CoV-2 and grown in various conditions, including Wnt-high expansion media. Enterocytes were readily infected by the virus, and RNA sequencing revealed upregulation of cytokines and genes related to type I and III interferon responses 26 . A limited amount of transcriptomic data from human samples has also been published. One study detailed the transcriptional signature of bronchoalveolar lavage fluid (of which responding immune cells are often a primary component) of COVID-19 patients compared to controls 27 . Despite a limited number of samples, the results were striking enough to reveal inflammatory cytokine profiles in the COVID-19 cases, along with enrichments in the activation of apoptosis and the P53 signaling pathways. On the drug side, data are available in the form of differential gene expression profiles from testing on human cells. Publicly-available versions include the Connectivity Map (CMap) 28 , which contains genome-wide testing on approximately 1,300 drugs, wherein the differential profile for a drug was generated by comparing cultured cells treated with the drug to untreated control cultures. Here, we applied our existing computational drug repositioning pipeline to identify drug profiles with significantly reversed differential gene expression compared to predicted inhibitors (including one tested in Calu-3) were incubated with SARS-CoV-2 infected human embryonic kidney 293T cells overexpressing ACE2 (293T-ACE2) with viral replication determined using an immunofluorescence-based assay. 8 In this study, we applied our drug repositioning pipeline to SARS-CoV-2 differential gene expression signatures derived from publicly available RNA-seq data ( Figure 1 ). The transcriptomic data were generated from distinct types of tissues, so rather than aggregating them together, we predicted therapeutics for each signature and then combined the results. We utilized three independent gene expression signatures (labelled "ALV", "EXP", and "BALF"), each of which consisted of lists of differentially expressed genes between SARS-CoV-2 samples and their respective controls. The ALV signature was generated from human adenocarcinomic alveolar basal epithelial cells by comparing SARS-CoV-2 infection to mock-infection conditions 24 . The EXP signature originated from a study where organoids, grown from human intestinal cells expanded in Wnt-high expansion media, were infected with SARS-CoV-2 and then compared to controls 26 . The BALF signature was from a contrast of primary human BALF samples from two COVID-19 patients versus three controls 27 . Each of these signatures was contrasted with drug profiles of differential gene expression from CMap. For each of the input signatures, we applied a significance threshold false discovery rate (FDR) < 0.05. We further applied minimum fold change thresholds in order to identify the driving genes. The ALV signature had only 120 genes, with 109 genes shared with the drug profiles; in order to maintain at least 100 genes for the pattern-matching algorithm to work with, we applied no fold-change threshold. For the EXP signature, we applied a |log 2 FC| > 2 cutoff, resulting in 125 genes for the expansion signature (108 shared with the drug profiles). For the BALF signature, we 9 processed the raw read count data to calculate differential gene expression values. We applied a |log 2 FC| > 4 cutoff, with the BALF data yielding 1,349 protein-coding genes for the lavage fluid signature (941 shared with the drug profiles). The gene lists for each of these signatures can be found in the supplement (Tables S1, S2, S3). We used GSEA (Gene Set Enrichment Analysis) 29 We used the publicly available single-cell RNAseq dataset GSE128033 31 composed of 13 patients (4 healthy, 3 presenting with mild COVID-19 symptoms, and 6 presenting with severe COVID-19 symptoms) to further characterise the expression of DKK1 ( Figure S1 ). Data were re-analyzed following the standard Seurat pipeline. From the analyses of the single-cell data, DKK1 is highly expressed in COVID-19 patients compared to controls, specifically in severe patients and it is expressed by epithelial cells. 10 After analyzing the input SARS-CoV-2 signatures, we utilized our repositioning pipeline to identify drugs with reversed profiles from CMap ( Figure 1 ). Significantly reversed drug profiles were identified for each of the signatures using a permutation approach: 30 hits from the ALV signature (Table S4 ), 15 hits from the EXP signature (Table S5) , and 86 hits from the BALF signature (Table S6 ). When visualizing the gene regulation of the input signatures and their respective top 15 drug hits, the overall reversal pattern can be observed ( Figure 2C -E). In total, our analysis identified 102 unique drug hits (Table S7) . Twenty-five common drug hits were shared by at least two of the signatures (p = 0.0334), with four consensus drug hits (bacampicillin, clofazimine, haloperidol, valproic acid) across all three signatures (p = 0.0599) (Table 1, Figure 3A ). We further characterized the common hits by examining their interactions with proteins in humans. We used known drug targets from DrugBank 32 and predicted additional targets using the similarity ensemble approach (SEA) 33 . We visualized the known interactions from DrugBank in a network ( Figure 3B ). We also aggregated the list of proteins which were found in DrugBank for at least 2 of the common hits (Table S9) To confirm the validity of our approach, the inhibitory effects of 16 of our drug hits which significantly reversed multiple SARS-CoV-2 profiles were assessed in live antiviral assays. The inhibitory effects of haloperidol, clofazimine, valproic acid, and fluticasone were evaluated in SARS-CoV-2 infected Calu-3 cells (human lung epithelial cell line), with remdesivir also tested as a positive control. From these five, remdesivir and haloperidol inhibited viral replication ( Figure 4A ), and the inhibitory effect was also observed by microscopy ( Figure 4B ). Additionally, 13 drugs (bacampicillin, ciclopirox, ciclosporin, clofazimine, dicycloverine, fludrocortisone, isoxicam, lansoprazole, metixene, myricetin, pentoxifylline, sirolimus, tretinoin) were assessed in a live SARS-CoV-2 antiviral assay. Remdesivir was again used as a positive control. This testing involved six serial dilutions of each drug to inhibit the replication of SARS-CoV-2 in 293T-ACE2 cells using an immunofluorescence-based antiviral assay 34 . All antiviral assays were paired with cytotoxicity assays using identical drug concentrations in uninfected human 293T-ACE2 13 cells. Positive control remdesivir and 10 of our predicted drugs (bacampicillin, ciclopirox, ciclosporin, clofazimine, dicycloverine, isoxicam, metixene, pentoxifylline, sirolimus, and tretinoin) showed antiviral efficacy against SARS-CoV-2, reducing viral infection by at least 50%, that was distinguishable from their cytotoxicity profile when tested in this cell line ( Figure 5) . Several inhibitors showed micromolar to sub-micromolar antiviral efficacy, including clofazimine, ciclosporin, ciclopirox, and metixene. These results not only confirm our predictive methods, but have also identified several clinically-approved drugs with potential for repurposing for the treatment of COVID-19. Here, we used a transcriptomics-based drug repositioning pipeline to predict therapeutic drug hits for three different input SARS-CoV-2 signatures, each of which came from distinct human cell or tissue origins. We found significant overlap of the therapeutic predictions for these signatures. Twenty-five of our drug hits reversed at least two of the three input signatures (Table 3) . Notably, 14 of the 15 hits from the EXP signature were also hits for the BALF signature, despite being generated from different types of tissue. The EXP signature was generated from intestinal tissue, whereas the BALF signature was generated from constituents of the respiratory tract. Among the common hits reversing at least two of the signatures were two immunosuppressants (ciclosporin and sirolimus), an antiinflammatory medication (isoxicam), and two steroids (fludrocortisone and fluticasone). Our testing of clofazimine demonstrated submicromolar antiviral effects of this drug in SARS-Co-V-2 infected 293T-ACE2 and Vero E6 cells (Figures 4 and S3) . Clofazimine is an orally administered antimycobacterial drug used in the treatment of leprosy. By preferentially binding to mycobacterial DNA, clofazimine disrupts the cell cycle and eventually kills the bacterium 36 . In addition to being an antimycobacterial agent, clofazimine also possesses anti-inflammatory properties primarily by inhibiting T 16 lymphocyte activation and proliferation 37 Validation experiments revealed antiviral activity for 11 of 16 drug hits. Further clinical investigation into these drug hits as well as potential combination therapies is warranted. 18 We have previously developed and used a transcriptomics based bioinformatics approach for drug repositioning in various contexts including inflammatory bowel disease, dermatomyositis, and spontaneous preterm birth. For a list of differentially expressed genes, the computational pipeline compares the ranked differential expression of a disease signature with that of a profile 16, 19, 28 . A reversal score based on the Kolmogorov-Smirnov statistic is generated for each disease-drug pair, with the idea that if the drug profile significantly reverses the disease signature, then the drug could be potentially therapeutic for the disease. Blanco-Melo et al. generated a differential gene expression signature using RNAseq on human adenocarcinomic alveolar basal epithelial cells infected with SARS-CoV-2 propagated from Vero E6 cells (GSE147507) 24 . Due to the fast-moving nature of the research topic, we opted to use this cell line data in lieu of waiting for substantial patient-level data. This work identified 120 differentially expressed genes (DEGs) -100 upregulated and 20 downregulated. We used these 120 genes as the ALV signature for our computational pipeline (Table S1) (v.1.24.0 R package) to call differentially expressed genes (DEGs). After adjusting for the sequencing platform, the default settings of DESeq2 were used. Principal components were generated using the DESeq2 function ( Figure S2 ), and heat maps were generated using the Bioconductor package pheatmap (v.1.0.12) using the rlogtransformed counts ( Figure S3 ). Values shown are rlog-transformed and rownormalized. Volcano plots were generated using the Bioconductor package EnhancedVolcano (v.1.2.0) ( Figure S4 ). Retaining only protein-coding genes and applying both a significance threshold and a fold-change cutoff (FDR < 0.05, |log 2 FC| > 4), we obtained 1,349 genes to be used as the BALF signature (Table S3 ). 20 Functional enrichment gene-set analysis for GSEA (Gene Set Enrichment Analysis) was performed using fgsea (v.1.12.0 R package) and the input gene lists were ranked by log2 fold change. The 50 Hallmark Gene Sets used in the GSEA analysis were downloaded from MSigDB Signatures database 29, 47 . For GO (Gene Ontology) terms, identification of enriched biological themes was performed using the DAVID database 48 . Drug gene expression profiles were sourced from Connectivity Map (CMap), a publicly-available database of drugs tested on cancer cell lines 28 Computational gene expression reversal scoring 21 To compute reversal scores, we used a non-parametric rank-based method similar to the Kolmogorov-Smirnov test statistic. This analysis was originally suggested by the creators of the CMap database and has since been implemented in a variety of different settings [16] [17] [18] [19] 22, 28 . Similar to past works, we applied a pre-filtering step to the CMap profiles to maintain only drug profiles which were significantly correlated with another profile of the same drug. Drugs were assigned reversal scores based on their ranked differential gene expression profile relative to the SARS-CoV-2 ranked differential gene expression signature. A negative reversal score indicated that the drug had a profile which reversed the SARS-CoV-2 signature; that is, up-regulated genes in the SARS-CoV-2 signature were down-regulated in the drug profile and vice versa. Single-cell data analysis 22 Quantification files were downloaded from GEO GSE145926. An individual Seurat object for each sample was generated using Seurat v.3. While the data has been filtered by 10x's algorithm, we still needed to ensure the remaining cells are clean and devoid of artifacts. We calculated three confounders for the dataset: mitochondrial percentage, ribosomal percentage, and cell cycle state information. For each sample, cells were normalized for genes expressed per cell and per total expression, then multiplied by a scale factor of 10,000 and log-transformed. Low quality cells were excluded from our analyses-this was achieved by filtering out cells with greater than 5,000 and fewer than 300 genes and cells with high percentage of mitochondrial and ribosomal genes (greater than 10% for mitochondrial genes, and 50% for ribosomal genes). SCTransform is a relatively new technique that uses "Pearson Residuals" (PR) to normalize the data. PR's are independent of sequencing depththanks 49 . We "regress out" the effects of mitochondrial and ribosomal genes, and the cell cycling state of each cell, so they do not dominate the downstream signal used for clustering and differential expression. We then performed a lineage auto-update disabled r dimensional reduction (RunPCA function). Then, each sample was merged together into one Seurat object. Data were then re-normalized and dimensionality reduction and significant principal For studies at the Gladstone Institutes, Calu-3 cells, a human lung epithelial cell line For studies carried out at Mount Sinai, SARS-CoV-2 was propagated in Vero E6 cells and 293T-ACE2 cells, as previously described in 12, 34 . Two thousand cells were seeded into 96-well plates in DMEM (10% FBS) and incubated for 24 h at 37 °C,5% CO2. Then, 2 h before infection, the medium was replaced with 100 Predicted COVID-19 fatality rates based on age, sex, comorbidities and health system capacity Extrapulmonary manifestations of COVID-19 Inborn errors of type I IFN immunity in patients with life-threatening COVID-19 COVID-19: consider cytokine storm syndromes and immunosuppression A protein interaction map identifies existing drugs targeting SARS-CoV-2 Silico Discovery of Candidate Drugs against Covid-19 Remdesivir for the Treatment of Covid-19 -Preliminary Report Food and Drug Administration. FDA Approves First Treatment for COVID-19 Dexamethasone in Hospitalized Patients with Covid-19 -Preliminary Report The price of innovation: new estimates of drug development costs Old drugs -new uses A SARS-CoV-2 protein interaction map reveals targets for drug repurposing Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2 Comparative host-coronavirus protein interaction networks reveal panviral disease mechanisms In silico studies on therapeutic agents for COVID-19: Drug repurposing approach Discovery and Preclinical Validation of Drug Indications Using Compendia of Public Gene Expression Data Computational Repositioning of the Anticonvulsant Topiramate for Inflammatory Bowel Disease Identification of alphaadrenergic agonists as potential therapeutic agents for dermatomyositis through drugrepurposing using public expression datasets Computational Discovery of Niclosamide Ethanolamine, a Repurposed Drug Candidate That Reduces Growth of Hepatocellular Carcinoma Cells In Vitro and in Mice by Inhibiting Cell Division Cycle 37 Signaling Reversal of cancer gene expression correlates with drug efficacy and reveals therapeutic targets A Drug Repositioning Approach Identifies Tricyclic Antidepressants as Inhibitors of Small Cell Lung Cancer and Other Neuroendocrine Tumors Computational discovery of therapeutic candidates for preventing preterm birth Reversal of Infected Host Gene Expression Identifies Repurposed Drug Candidates for Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19 SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor SARS-CoV-2 productively infects human gut enterocytes Transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in COVID-19 patients The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19 DrugBank 5.0: a major update to the DrugBank database for 2018 Relating protein pharmacology by ligand chemistry An In Vitro Microneutralization Assay for SARS-CoV-2 Serology and Drug Screening Schizophrenia: more dopamine, more D2 receptors Clofazimine: current status and future prospects Clofazimine is a broad-spectrum coronavirus inhibitor that antagonizes SARS-CoV-2 replication in primary human cell culture and hamsters Systematic integration of biomedical knowledge prioritizes drugs for repurposing A Next Generation Connectivity Map: L1000 platform and the first 1,000,000 profiles Entrez Gene: gene-centered information at NCBI Circulating levels of GDF-15 and calprotectin for prediction of in-hospital mortality in COVID-19 patients: A case series WHO | WHO Model Lists of Essential Medicines Molecular signatures database (MSigDB) 3.0 Database for Annotation, Visualization, and Integrated Discovery Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression INTERFERON_ALPHA_RESPONSE INTERFERON_GAMMA_RESPONSE E2F_TARGETS MYC_TARGETS_V1 G2M_CHECKPOINT IL6_JAK_STAT3_SIGNALING KRAS_SIGNALING_DN OXIDATIVE_PHOSPHORYLATION ESTROGEN_RESPONSE_LATE ANDROGEN_RESPONSE SPERMATOGENESIS TNFA_SIGNALING_VIA_NFKB KRAS_SIGNALING_UP PANCREAS_BETA_CELLS INFLAMMATORY_RESPONSE MYC_TARGETS_V2 DNA_REPAIR EPITHELIAL_MESENCHYMAL_TRANSITION ALLOGRAFT_REJECTION UNFOLDED_PROTEIN_RESPONSE COMPLEMENT ESTROGEN_RESPONSE_EARLY PROTEIN_SECRETION UV_RESPONSE_UP FATTY_ACID_METABOLISM HEDGEHOG_SIGNALING APICAL_SURFACE UV_RESPONSE_DN NOTCH_SIGNALING REACTIVE_OXIGEN_SPECIES_PATHWAY TGF_BETA_SIGNALING IL2_STAT5_SIGNALING MITOTIC_SPINDLE PEROXISOME PI3K_AKT_MTOR_SIGNALING XENOBIOTIC_METABOLISM P53_PATHWAY GLYCOLYSIS COAGULATION APICAL_JUNCTION MTORC1_SIGNALING CHOLESTEROL_HOMEOSTASIS ADIPOGENESIS