key: cord-350655-04dq9b4r authors: Ulas, T.; Seep, L.; Schulte-Schrepping, J.; De Domenico, E.; Mengiste, S.; Theis, H.; Kraut, M.; Becker, M.; Gierlich, J.; Lenkeit, L.; Drews, A.; van Uelft, M.; Dahm, K.; Agrawal, S.; Gemuend, I. D.; Horne, A.; Holsten, L.; Herbert, M.; Kroeger, C.; Kapellos, T. S.; Pecht, T.; Knoll, R.; Bassler, K.; Reusch, N.; Bonaguro, L.; Nuesch-Germano, M.; Oestreich, M.; Aschenbrenner, A. C.; Schultze, J. L.; Kox, M.; Bruse, N.; Pickkers, P.; Gerretsen, J.; Netea, M. G.; van de Veerdonk, F.; Nattermann, J.; Kraemer, B.; Raabe, J.; ToVinh, M.; Hoffmeister, C.; Rieke, G. J.; Keitel, V.; Breteler, M. M. title: Disease severity-specific neutrophil signatures in blood transcriptomes stratify COVID-19 patients date: 2020-07-08 journal: nan DOI: 10.1101/2020.07.07.20148395 sha: doc_id: 350655 cord_uid: 04dq9b4r The SARS-CoV-2 pandemic is currently leading to increasing numbers of COVID-19 patients all over the world. Clinical presentations range from asymptomatic, mild respiratory tract infection, to severe cases with acute respiratory distress syndrome, respiratory failure, and death. Reports on a dysregulated immune system in the severe cases calls for a better characterization and understanding of the changes in the immune system. Here, we profiled whole blood transcriptomes of 39 COVID-19 patients and 10 control donors enabling a data-driven stratification based on molecular phenotype. Neutrophil activation-associated signatures were prominently enriched in severe patient groups, which was corroborated in whole blood transcriptomes from an independent second cohort of 30 as well as in granulocyte samples from a third cohort of 11 COVID-19 patients. Comparison of COVID-19 blood transcriptomes with those of a collection of over 2,600 samples derived from 11 different viral infections, inflammatory diseases and independent control samples revealed highly specific transcriptome signatures for COVID-19. Further, stratified transcriptomes predicted patient subgroup-specific drug candidates targeting the dysregulated systemic immune response of the host. The SARS-CoV-2 pandemic is currently leading to increasing numbers of COVID-19 patients all over the 46 world. Clinical presentations range from asymptomatic, mild respiratory tract infection, to severe cases with 47 acute respiratory distress syndrome, respiratory failure, and death. Reports on a dysregulated immune 48 system in the severe cases calls for a better characterization and understanding of the changes in the 49 immune system. Here, we profiled whole blood transcriptomes of 39 COVID-19 patients and 10 control 50 donors enabling a data-driven stratification based on molecular phenotype. Neutrophil activation-51 associated signatures were prominently enriched in severe patient groups, which was corroborated in whole is great variety in disease manifestation, ranging from asymptomatic cases, to flu-like symptoms, to severe 69 cases needing mechanical ventilation, to those who do not survive (4-8). Increasing evidence suggests 70 that the immune system plays a pivotal role in determining the severity of the disease course and it has 71 been suggested that different molecular phenotypes might be responsible for the heterogenous outcome 72 of . Identifying these molecular phenotypes might not only be important for a better 73 understanding of the pathophysiology of the disease, but also to better define patient subgroups that are 74 more likely to benefit from specific therapies (12) (13) (14) (15) (16) (17) . Indeed, while vaccines are still under development, 75 finding an effective and patient-tailored therapeutic management for COVID-19 patients including targeting 76 derailed immune mechanisms (18, 19) is key to mitigate the clinical burden as well as to prevent further 77 disease fatalities (15, 16) . The analysis of peripheral blood-derived immune parameters in inflammatory and infectious diseases either 79 by classical testing, including flow cytometry and serum protein measurements, or omics technologies, 80 including transcriptomics, has been proven very valuable in the past (20-28). In COVID-19 patients, 81 monitoring peripheral blood as a proxy for the ongoing changes within the circulating cells of the immune 82 system, has revealed lymphopenia to correlate with disease severity (29). Similarly to SARS-CoV and 83 MERS-CoV infections, hyperinflammation due to excessive release of proinflammatory cytokines is often 84 observed in severe COVID-19 patients as increased serum IL-6 levels correlate with respiratory failure and 85 adverse clinical outcomes (9, 30, 31) . While one can envision mild and/or early cases to benefit from antiviral drug treatments currently under 87 clinical investigation, more severe cases may benefit from treatment to mitigate the excessive systemic 88 immune reactions resulting in progressing pneumonia and even respiratory failure associated with severe 89 . The detrimental role of the systemic inflammation in the late phase of the disease has 90 become clear, as the cytokine storm has been associated with disease morbidity (6, 9, (30) (31) (32) (33) . Thus, a 91 better understanding of the dysregulation of the host response to the infection leading to immunopathology 92 is urgently needed to dissect and comprehend the immune parameters accompanying the heterogeneous 93 disease severity seen upon SARS-CoV-2 infection. Based on previous experience with other infectious diseases (20-26), we hypothesized that whole blood 95 transcriptomes should allow us to 1) determine immune cellular characteristics and functions in COVID-19 96 patients, 2) reveal heterogeneous molecular phenotypes of patients with similar clinical presentation, 3) 97 define commonalities and differences of COVID-19 in comparison to other inflammatory conditions and 4) 98 predict potential drug repurposing that might counteract observed immune dysregulations. Here, by using blood transcriptomes, we provide evidence for molecular subtypes within the immune 100 response of COVID-19 patients beyond distinguishing mild and severe cases only. In addition, molecular 101 changes in blood of severely affected patients are strikingly associated with changes in the granulocyte 102 compartment. Furthermore, blood transcriptomes of molecular subtypes of COVID-19 patients seem to be 103 unique in comparison to more than 2,600 samples derived from other infections, inflammatory conditions 104 and controls. Finally, by reverse drug target prediction using patients' blood transcriptomes revealed known 105 as well as additional new potential targets for further evaluation. Our data might also serve as a starting 106 point for a large-scale assembly of molecular data collected during currently ongoing and future therapy 107 trials for COVID-19 patients based on whole blood transcriptomes. Resistin (RETN), matrix metalloproteinases (MMP8, MMP9), and alarmins (S100A8, S100A9, S100A12), 123 as well as for cell cycle progression-associated genes (G0S2, CDC6, CDC25A), type I interferon (IFN)-124 induced genes (IFI27, IFITM3, CD169 (SIGLEC1)), but also genes with immunosuppressive functions 125 (IL10, SOCS3, Arginase (ARG1)). Downregulated genes included many lymphocyte-associated factors, 126 such as NELL2, RORC, KLRB1, TCF1 (TCF7), Calcipressin-3 (RCAN3), BACH2, or LEF1 (Fig. 1D , Table 127 S1). Functional analysis of the differentially expressed genes (DEGs) by gene ontology enrichment analysis 128 (GOEA) revealed granulocyte and complement activation-associated terms enriched in the upregulated 129 DEGs and lymphocyte differentiation and T cell activation for the downregulated DEGs (Fig. 1E ). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020 . . https://doi.org/10.1101 Interestingly, the T cell activation-associated genes accounting for the enrichment of this term for the 131 upregulated DEGs included IL10 and CD274 (PD-L1) pointing at suppressive T cell functionality (Table 132 S1 ). In the current cohort, 51% of COVID-19 patients required intubation (Table S2) . Given the heterogeneous 134 nature of clinical manifestation of COVID-19, we sought to stratify the transcriptomic profiles by disease 135 severity based on intubation status. Indeed, samples from patients with mild disease (requiring no 136 intubation) clustered more closely to the control samples, while those of severe cases scattered away in 137 the PCA (Fig. 1F) . Consequently, there was a greater number of DEGs in blood samples from severe 138 COVID-19 patients than in mild patients when compared to controls (Fig. 1G) . Many of the DEGs found in 139 the COVID-19 vs control comparison (Fig. 1D) were also found when separating the COVID-19 samples 140 by severity (Fig S1A,B) . Both, severe and mild COVID-19 in comparison to controls shared neutrophil-141 specific CD177 and HP expression among the most upregulated DEGs, as well as lymphocyte-associated 142 genes such as ABLIM1, NELL2, RCAN3, RORC, KLRB1, among the downregulated genes ( Fig. S1A,B ). GOEA reflected these findings ( Resistin (RETN), matrix metalloproteinase MMP8, and alarmins (S100A8, S100A12). Whereas the type I 149 IFN-response genes, such as IFI27 or IFITM3, were not differentially regulated in severe vs. mild samples, 150 expression of immunosuppression-associated factors was more pronounced in severe COVID-19 patients 151 (IL10, SOCS3, Arginase (ARG1)) ( Fig. 1H , Table S1 ). Moreover, blood transcriptomes from severe cases 152 showed decreased expression of lymphocyte-associated genes, such as the T cell receptor chains (TRAC, 153 TRBC1), CD3 zeta chain (CD247), CD4, CD2, IL2RB, TBET (TBX21), IL7R, as well as monocyte-154 associated genes, such as MHC class II molecules (HLA-DPA1, HLA-DRB5), fractalkine receptor 155 (CX3CR1), Macrophage scavenger receptor (MSR1), or CCL2 (Fig. 1H , Table S1 ). Differences in gene 156 expression were not restricted to granulocyte and T cell functions only: assessing the changes in defined ( Fig. 1I) . Strikingly, neither disease, disease severity, nor the inclusion of outcome or immune classification 166 (31) , sufficiently explained the structure in the data. In order to get a better clinical understanding of the 167 transcriptional data, we included further clinical parameters and grouped the COVID-19 patients 168 accordingly (Fig. 1I) . We therefore performed agglomerative hierarchical clustering using the clinical 169 parameters that contributed most to the transcriptional differences observed across the first principal 170 component of the dataset (r-adjusted square ≥0.1, Fig. S1E ). The COVID-19 patients were clustered into 171 five clinical groups, which was the optimal number of clusters at which the intra-group variance was low 172 and the 'clusters distance' remained high (Fig. S1F,G) . However, comparison of this clinical parameter-173 based grouping of the COVID-19 patients did not match the transcriptional variability observed in the data 174 either (Fig. 1I) , arguing that additional molecular parameters must exist that better define the blood 175 transcriptome structure and thereby more accurately dissect heterogeneity of the clinical manifestation of 176 177 178 Co-expression analysis discloses COVID-19 subgroups with distinct molecular signatures 179 Classical approaches to analyze the transcriptome data by using differential gene expression analysis 180 based on sample groups defined by a selection of clinical parameters precluded dissection of the 181 heterogeneity of the host immune response towards SARS-CoV-2 infection, which is evident in the high-182 parameter space of the transcriptome (Fig. 1) . Co-expression analysis on the other hand identifies similarly 183 regulated genes across samples, groups these genes into modules, which can then be explored for each 184 patient sample individually or for entire patient groups. Applying such an approach using our established 185 CoCena² pipeline [https://github.com/Ulas-lab/CoCena2] ( Fig. 2A) for all 49 samples (39 COVID-19, 10 186 control) independent of their clinical annotation disclosed 10 co-expression modules, designated by color 187 indianred to darkgrey, across a total of 6,085 genes included in the analysis (Fig. S2A) . Hierarchical 188 clustering of the samples based on their group fold changes (GFCs) for each module revealed a data-driven 189 patient stratification assorting the samples into six groups (Fig. S2B) , which were subsequently used in all 190 following analyses: five different COVID-19 sample-containing groups, which only partially grouped by 191 disease severity and illustrated heterogeneity of the immune response in COVID-19 patients, plus one 192 group containing all control as well as four COVID-19 samples (Fig. 2B+S2C) . Overlaying this information 193 onto the original PCA reflected structured sample stratification as the newly defined groups clustered 194 together (Fig. S2D) . GFC analysis of the newly generated groups revealed group-specific enrichment of 195 co-expressed gene modules (Fig. 2C) . GOEA on each of the modules identified associated gene signatures 196 displaying distinct functional characteristics, which distinguish the different sample groups G1-G6 ( Fig. 197 2D+S3, Table S3 ). For example, 'inflammatory response' was enriched in modules maroon, lightgreen, 198 pink, and darkgrey, all characteristic for sample groups G1 and G2 to different extents, indicating these to 199 possibly undergoing a more vigorous inflammatory immune reaction (Fig. 2C+D) . Of note, G1 and G2 200 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint harbour a great fraction of samples from patients with severe COVID-19 (Fig. 2B) . Only a slight increase 201 in the inflammation-associated module maroon, an increase in expression in the genes of darkorange 202 (enriched in oxidative phosphorylation, mTORC1 signaling and cell cycle-associated genes), as well as a 203 loss of expression in the gold module (connected to estrogen response genes and IL2 signaling) was 204 indicative of the G4 sample group. G6, encompassing all control samples, was not associated with any 205 modules connected to inflammatory processes, but showed higher expression of indianred, steelblue and 206 gold, all functionally enriched basic cellular and metabolic processes. Extended analysis of the lightgreen 207 module, containing 987 genes, revealed a prominent enrichment of granulocyte/neutrophil activation-208 related signatures (Fig. 2E , Table S3 ). To further explore this neutrophil activation signature association, 209 we investigated possible co-expression patterns of long non-coding RNAs (lncRNA) that were reported as 210 regulators of granulocyte function (36). CYTOR (also known as Morrbid) is a lncRNA that mediates survival 211 of neutrophils, eosinophils, and classical monocytes in response to pro-survival cytokines (36), and 212 interacts with the protein-coding RNAs for the catalytic PI3K isoform Phosphatidylinositol-4,5-bisphosphate 213 3-kinase catalytic subunit beta (PIK3CB) and the filament Vimentin (VIM) (37). Interestingly, expression of 214 CYTOR was significantly increased in severe COVID-19 patient group G1 (p<0.001) and correlated with 215 both PIK3CB (r= 0.53, p<0.001) and VIM (r= 0.55, p<0.001) (Fig. 2F ). Next, we asked whether the enrichment for neutrophil activation-associated signatures in G1 and G2 is 217 attributed to an increased relative number of granulocytes within the whole blood sample. Deconvolution of 218 the expression values using linear support vector regression (38) showed increased relative percentages 219 of neutrophils especially in G1 and G2 (Fig. S2E) . G5, on the other hand, clearly displayed an increased 220 percentage of monocytes. At the same time, lymphocyte enrichment was found to be reduced in the COVID-221 19 sample groups, most prominently in G1 and G2 (Fig. S2E) . The linear deconvolution results were then 222 validated by flow cytometry. Blood composition of COVID-19 donors confirmed an increased number of 223 neutrophils and a decreased number of lymphocytes especially in G1 and G2 (Fig. S2F) . As a result, the 224 neutrophil-lymphocyte ratio (NLR), a clinical marker proposed for disease severity as it has been associated 225 with an increased systemic inflammation (39, 40), was markedly elevated in G1 and G2 compared to the 226 control sample-containing G6, both in the computationally deconvoluted results (Fig. 2G ) as well as 227 measured by flow cytometry (Fig. 2H) . Interestingly, in context of the observation that men more often 228 progress to severe COVID-19 than women (41), G1 encompasses samples from solely male patients ( We confirmed our findings of distinct molecular phenotypes in the blood of COVID-19 patients in a second 235 independent cohort. Thirty patients, severely affected by SARS-CoV-2 infection, were sampled upon 236 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10. 1101 admission to the ICU. We stratified the obtained blood transcriptomes based on the module signatures from 237 the co-expression analysis (Fig. 2C) . The samples of the second cohort were filtered for the genes present 238 in the COVID-19 co-expression network, group fold changes were calculated across all patients individually, 239 and sample groups G1-G6 assigned according to their combinatorial module expression (Fig. S4A ). Controls from the first cohort were included for comparison. Interestingly, in these ICU patients, we noted 241 the transcriptome profiles from the second cohort to show greatest similarity to G1 and G2, which is in line 242 with their severe phenotypes and our findings from the first cohort. Hierarchical clustering of the samples 243 based on their group fold changes for each module stratified the samples of the second cohort into four 244 groups (Fig. S4B) . The control samples from the first cohort built one separate group, which we designated 245 again as G6. To allow for group-specific comparison to the stratification within the first cohort ( . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. 3C, Table S4 ). Whole blood transcriptome analysis showed enrichment of neutrophil activation-associated 274 signatures (Fig. 2) . Excluding the bias of alterations in neutrophil population size across conditions, gene 275 set enrichment analysis on granulocyte samples now uncovered that differentially expressed genes 276 between severe and mild COVID-19 patients are indeed characterized by an increase in granulocyte 277 activation-associated factors (Fig. S5B) . CD177 is part of the granulocyte activation gene set and was 278 indeed markedly increased in severe (day 1-14) compared to mild (day 1-14) COVID-19 samples (Fig. 3D ). Also, the alarmin S100A6 exhibited heightened expression in granulocytes from severe COVID-19 patients 280 ( Fig. 3D ). Next, we used the CoCena 2 modules from the whole blood analysis ( Recently, heterogeneity of neutrophils with distinct subsets associated with disease severity and phase 295 was revealed by single cell RNA-seq analysis in blood of COVID-19 patients (42). Enrichment of the three 296 signatures that related to severe COVID-19, in our granulocyte samples demonstrated that the findings 297 obtained in the single-cell study were also discernible in bulk data and the results in accordance to the 298 reported phenotypes: premature/immature, severe inflammatory, as well as severe suppressive subset 299 marker genes were markedly enriched in granulocytes from severe COVID-19 patients in the present study 300 (Fig. S5D ). Further analysis of this observation on the gene level displayed the heightened expression of 301 pre-/immature neutrophil-associated markers in severe COVID-19 patients' granulocytes, such as CD15 302 (FUT4), metalloproteinase MMP8, alarmins (S100A8/9), NET formation-involved PADI4, or NLRC4, for 303 which activating mutations have been reported to overtly trigger the inflammasome and thereby increase 304 the risk to develop autoinflammatory syndrome (43, 44) (Fig. 3F) . Marker genes attributed to the "mild 305 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint mature activated" neutrophil subset (42), such as ITGA4, or SLC38A1, were indeed elevated as well in the 306 mild COVID-19 patients' granulocytes of this study. In line with the single cell study, signs of an interferon 307 response were observed irrespective of disease severity (IFIT1, IFIT3, ISG15), while only severe COVID-308 19 patients' granulocytes featured expression of genes with suppressive functionality, such as ARG1 or 309 PD-L1 (CD274) (Fig. 3F ), We next stratified the granulocyte samples based on the module signatures from the whole blood analysis. The granulocyte samples were filtered for the genes present in the COVID-19 co-expression network ( CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint arthritis (48), Ebola vaccination (25), neonatal-onset multisystem inflammatory disease (NOMID) and 339 macrophage activation syndrome (NLRC4-MAS) (44), n=326) as well as control samples from eight 340 different studies (n=408). To investigate how the COVID-19-specific co-expression modules can be linked 341 to other diseases, the combined dataset was filtered for the genes present in the COVID-19 co-expression 342 network (Fig. 2C ) and the group fold changes were calculated across all samples (Fig. 4B) . Additionally, 343 cell type-specific signatures (38) and single cell-derived neutrophil subset signatures (42) ( Table S6) 'Inflammatory response', 'IL6 and TNFA signaling' is an attribute of both, G1 and G2, to a lesser degree of 374 G5, also Tuberculosis/HIV, and to some extent of sepsis. More prominently enriched in sepsis was 375 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint 'complement', 'coagulation', 'heme metabolism' and 'glycolysis' -shared by COVID-19 G1+G3; whereas 376 'oxidative phosphorylation' and 'mTORC1 signalling' were seen for Chikungunya and Zika virus infections 377 -shared to some extent with COVID-19 G3+G4. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint disease settings in recent years (55). Two interesting target genes discussed in this context and already 411 addressed in clinical trials are CXCR2 and C5AR1. Consistent with the increased NLR in G1 and G2, we 412 observed significant upregulation of CXCR2 and C5AR1 in both groups (Fig. S7E ). Using patient cluster-specific DEGs as input (Fig. S7C, Table S8 ), we searched for compounds that evoke Interleukin-1β (IL1B) (Fig. 5E ). The majority of these target genes cluster in the G1-specific lightgreen and 443 pink, as well as in the maroon CoCena² modules. Drugs predicted to be effective for each module are 444 presented as a resource as supplementary information for further inspection (Table S9) . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. (Fig. 1, S1 ), CD177 has also been introduced as a hallmark for Kawasaki syndrome (71) In addition to many other infectious and non-infectious diseases (20-28), whole blood transcriptomics 514 revealed important insights into the patient structure in COVID-19 and comparative analysis provides first 515 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint evidence for the unique changes elicited by this disease within the host in comparison to other infections 516 (Fig. 4) . While cases in G2-4 shared changes with other viral infections such as Chikungunya or Zika, 517 mainly including interferon signature genes (IFI16, IFI35, IFIT1, maroon module), partial overlap to bacterial 518 sepsis was observed for G1-G3, albeit the major sepsis module (pink) was not prominently enriched in 519 COVID-19 patients indicating that there are distinct differences in pathology of these two diseases. Although 520 we could establish an integrative model using historical and publicly available blood transcriptome data, we 521 also realized that limited standardization of the experimental procedures (sample processing, library 522 production, sequencing) between different whole blood transcriptomics studies led to the exclusion of 523 several additional important studies. In this context, it will be of great interest whether blood transcriptomics, 524 as it was shown for tuberculosis (20, 21), can be utilized in large enough cohorts and clinical trials for 525 disease risk or outcome prediction in COVID-19. We propose to collect whole blood transcriptomics data 526 in a central registry for direct inspection by the research community and provide a prototype model for such 527 a registry on FASTGenomics. Transcriptome data have been successfully used to predict a role for specific 528 gene networks in the drug response to certain cancer types (79-83). Considering the strong influence of 529 the systemic immune response on severity and outcome of COVID-19, we wanted to establish, whether 530 the global assessment of molecular subgroups of COVID-19 patients could be utilized to predict novel drug 531 targets for this disease addressing the dysregulated peripheral immune response of the host (Fig. 5) . Using 532 two major databases providing transcriptome signatures to many known drugs, CLUE (83) and iLINCS (82) intubation, while no effect on mortality was seen for those patients who did not require respiratory support 542 (61). Of note, drugs predicted to potentially reverse the transcriptome signatures of the severely affected 543 G1 group may have adverse effects in milder COVID-19 cases from G4 as observed in the contrasting 544 regulation patterns in many of the clusters (Fig. 5C ). Interestingly and in line with the reports on sexual 545 dimorphism in COVID-19 severity and mortality (85), G1 included only male patients and many of the drugs 546 predicted to reverse the G1-specific signatures were related to female hormones (Fig. 5D) . However, we 547 also predicted drugs for all COVID-19 patients already in clinical trials such as immunoglobulins (>80 trials, 548 clinicaltrials.gov), or methylprednisolone (>20 trials), findings further supporting the value of our prediction 549 approach. Despite these promising results, strongly suggesting that reverse transcriptomics is not only of 550 value in cancer (79-81) but might also be used to identify drugs targeting the immune pathophysiology in 551 COVID-19, we would also like to point out current limitations of our findings that need to be addressed in 552 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint future studies. Predictions will further benefit from and focused by validation studies in independent COVID-553 19 patient cohorts, which is to be fostered by a central database for COVID-19 patients' blood transcriptome 554 data. Nevertheless, we used samples from different countries, illustrating the generalizability. Furthermore, 555 the molecularly derived and prioritized drug candidates presented here might be tested in very recently 556 introduced pre-clinical models (86) prior to starting clinical trials. Irrespective of the current shortcomings, 557 we favor such drug candidate identification, since it is based on interrogation of molecular data directly 558 derived from patients' immune cells involved in the ongoing processes in the disease and therefore may 559 increase the likelihood of a beneficial effect in patients. Collectively, we provide first evidence for whole blood transcriptomics to potentially become a valuable tool 561 for distinguishing COVID-19 from other infections in cases for which pathogen detection might be difficult, 562 for monitoring and potentially predicting outcome of the disease, to further dissect molecular phenotypes 563 of COVID-19, particularly of the host's immune system, also along the disease course over time, and to 564 support drug target prediction for subgroups of patients. Clearly, in contrast to more sophisticated higher 565 resolution methods, whole blood transcriptomes can be easily obtained in large clinical cohort studies and 566 large clinical treatment trials yet providing an enormous information content about the molecular reactions 567 of the host's immune system. We therefore propose a blood transcriptome registry following the model we 568 introduce here on the FASTGenomics platform that would allow the scientific community to utilize the 569 information for new clinical studies and to address further large-scale studies into pathophysiological 570 mechanisms of the disease and enhance the chances of trials to demonstrate a clinical benefit in patients. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. Ribo-Zero Globin kit (Illumina). In brief, ribosomal and globin mRNA were depleted from 750ng purified total 693 RNA using biotinylated, target-specific oligos combined with Ribo-Zero rRNA removal beads, remaining 694 RNA was fragmented using divalent cations under elevated temperature. First-strand was generated using 695 SuperScript2 RT (Invitrogen) supplemented with Actinomycin D, followed by second-strand synthesis with 696 dUTP replacing dTTP. 3' ends were adenylated and index adapters were ligated before subsequent PCR 697 amplification to yield the final library. Remaining overhangs were converted into blunt ends via 698 exonuclease/polymerase activities and enzymes were removed. Selective enrichment of DNA fragments 699 with ligated adaptor molecules was performed using Illumina PCR primers in a 15 cycles PCR reaction, 700 followed by purification cDNA using SPRIBeads (Beckman-Coulter). Libraries were quantified by Qubit 701 dsDNA HS Assay (Thermo Fisher Scientific) and fragment size distribution was determined using the HS 702 D1000 assay on a Tapestation 4200 system (Agilent). High-throughput sequencing was carried out with a 703 NovaSeq™ 6000 Sequencing System S2 (50bp paired-end reads), and data was converted into fastq files 704 using bcl2fastq2 v2.20. RNA-sequencing analysis 706 Sequenced reads were aligned and quantified using STAR: ultrafast universal RNA-seq aligner (v2.7.3a) 707 (87) and the human reference genome, GRCh38p13, from the Genome Reference Consortium. Raw counts 708 were imported using DESeqDataSetFromHTSeqCount function from DEseq2 (v1.26.0) (88) and rlog 709 transformed according to DEseq2 pipeline. DESeq2 was used for the calculation of normalized counts for 710 each transcript using default parameters. All normalized transcripts with a maximum over all row mean 711 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. Linear support vector regression 742 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10. 1101 Linear support vector regression (38) was employed to computationally deconvolute the study's whole 743 blood samples. Gene expression tables were normalized with DESeq2 and were utilized as the input 744 mixture file. LM22-subsetted signatures for B cells, T cells, NK cells, monocytes, dendritic cells, eosinophils 745 and neutrophils were generated as described on https://cibersort.stanford.edu/tutorial.php. The algorithm 746 was subsequently run with 1,000 permutations and the proportions of cell types were visualized with ggplot2 747 (v3.2.1) (93). CoCena²: Construction of Co-expression networks, analysis -automated 749 To define differences and similarities in transcript expression patterns among the different groups, CoCena 2 750 (Construction of co-expression Network Analysis -automated) was performed based on Pearson 751 correlation. CoCena² is a network-based approach to identify clusters of genes that are co-expressed in a 752 series of observed conditions based on data retrieved from RNA-sequencing. The tool offers a variety of 753 functions that allow subsequent in-depth analysis of the biological context associated with the found 754 clusters. As input for the analysis the 10,000 most variable genes were used. To identify genes whose expression patterns are highly similar across all tested samples, pairwise Pearson 756 correlation coefficients are calculated using the R package Hmisc (v4.1-1). The underlying assumption of 757 the Pearson correlation to the data is that it is normally distributed, which is a valid assumption to make in 758 the context of gene expression when looking at expression patterns within different experimental conditions. The correlation between each pair of genes is the basis for the subsequent network construction. Therefore, 760 the tool focuses mainly on positively correlated gene pairs, since the rate of confirmation of an edge 761 representing an association of genes is higher than that of a non-existing association. In order to refine the structure of the upcoming network and to unravel the condition specific signatures, a 763 correlation cut-off is proposed to mark the minimal correlation a pair of genes must exhibit for their co-764 expression to be taken into account. The cut-off is determined based on different criteria: . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint A graph component is a subset of nodes, such that there is a path from every node within the component 774 to any other node in that same component but none connecting the nodes to any outside of that component. Even though there exist functional collections of genes that cooperate to fulfil a common task, these 776 collections are not expected to be operating independently within the cell. Thus, the cut-off proposal favors 777 graphs with a small number of components. Number of edges: To avoid a highly connected graph with great lack of structure -"hairball"-, the cut-off is chosen such that 780 the number of edges is minimized while respecting the above-mentioned criteria. A Pearson correlation coefficient cut-off of 0.857 (6,085 nodes and 252,584 edges) was chosen to construct 782 scale-free networks. The undirected co-expression network is constructed based on the gene pairs which show a higher 784 correlation in their expression pattern than the set cut-off. A series of network-based clustering algorithms 785 is available to then identify clusters of strong co-expression within the network. An option "auto" is provided, 786 which tests the different clustering algorithms and picks the one that achieves the highest modularity score. Unbiased clustering was performed using the "label propagation" algorithm in igraph ( . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. To investigate the interactions between protein-coding and long-non-coding RNAs, we utilized the enricher 808 function from the clusterProfiler package. We performed an enrichment analysis for lncRNA species, using 809 the protein-coding genes that belong to the lightgreen cluster as the input gene list and all the network CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020 . . https://doi.org/10.1101 as autoimmune disease) and III) library preparation and sequencing technology differ as little as possible 840 from our COVID-19 protocol. The fastq files of 18 additional studies (588242, GSE101705, GSE107104, 841 GSE112087, GSE127792, GSE128078, GSE129882, GSE133378, GSE143507, GSE57253, GSE63042, 842 GSE66573, GSE79362, GSE84076, GSE89403, GSE90081, GSE97590, GSE99992 and the Rhineland 843 study) were downloaded and aligned with STAR. The counts were imported into R (v3.6.2) and were 844 modelled for each gene using DESeq2. Merged raw counts were filtered for the genes present in the 845 COVID-19 co-expression network, ribosomal protein-coding genes and mitochondrial genes were removed, 846 yielding a total of 5,770 genes and 3,176 samples. To account for differences in sequencing depth across 847 studies, a quantile normalization was performed on the filtered data. Group fold changes were calculated, 848 where the grouping variable was set to be the disease status. To explore COVID-19 associated expression of genes within the integrated dataset, the data was 850 intersected with the gene modules previously retrieved from the COVID-19 CoCena 2 network, the mean 851 group-fold-changes were determined per cluster and condition and visualized in a heat map. The modules were analyzed for enriched immune cell markers as provided by CIBERSORT and BD 853 Rhapsody and those that showed neutrophil enrichment were screened for genes representative of different 854 neutrophil subtypes as recently described (42 To get a more fine-grained differentiation of the specific neutrophil states for figure 3, the authors kindly 859 provided additional signatures from the scRNA dataset using a Wilcoxon rank sum test for differential gene 860 expression implemented in Seurat. Genes had to be expressed in >10% of the cells of a cluster, exceed a 861 logarithmic threshold >0.1 and to have >5% difference in the minimum detection between two clusters. The 862 following additional comparisons were performed: 8 and 9 (pre-and immature neutrophils combined) VS 863 the rest, 1,3,4,6 (neutrophil states from control patients) VS the rest. To get unique signature genes for 864 clusters 0, 2 and 5 (COVID-19-specific clusters) we took the following approach for each cluster: 1) . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020 . . https://doi.org/10.1101 Overview of drugs CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint FIGURES 1072 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint 1073 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint Figure 1 -Whole blood transcriptomes reveal diversity of COVID-19 patients not explained by . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint 1130 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint 1168 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint 1229 1230 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint 1249 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 8, 2020. . https://doi.org/10.1101/2020.07.07.20148395 doi: medRxiv preprint We shouldn't worry when a virus mutates during 936 disease outbreaks Drug development in the era of precision medicine Towards host-directed therapies for tuberculosis Coordinating the COVID-19 pipeline Should we stimulate or suppress immune responses in COVID-19? Cytokine 957 and anti-cytokine interventions Pathological inflammation in patients with COVID-19: a key role for 972 monocytes and macrophages Neutrophil heterogeneity as therapeutic opportunity in 975 immune-mediated disease COVID-19: the gendered impacts of the outbreak Activation of PAD4 in NET formation Damps and nets in sepsis Neutrophils as emerging therapeutic targets The Drug Repurposing Hub: A next-generation drug library and information 1003 resource The hidden side of SERPINB1/Leukocyte Elastase Inhibitor Low-cost dexamethasone reduces death by up to one third in hospitalised patients with severe 1009 respiratory complications of COVID-19 The starting line for COVID-19 vaccine development Developing covid-19 vaccines at pandemic speed Kawasaki-like disease: emerging complication during the COVID-19 1026 pandemic Acute cor pulmonale in critically ill patients with covid-19 The Library of Integrated Network-Based Cellular Signatures NIH Program System-Level Cataloging of Human Cells Response to Perturbations Considering how biological 1043 sex impacts immune responses and COVID-19 outcomes STAR: Ultrafast universal RNA-seq aligner Methods in Molecular Biology Ggplot2 : elegant graphics for data analysis unikn: Graphical elements of the University of Konstanz's corporate 1068 design Figure S2 -related to Figure 2 Group fold change (GFC) heat map and hierarchical clustering for each sample and the gene modules 1132 identified by CoCena 2 analysis Top plots present the 1134 clustering statistics (within cluster sum of squares and high average silhouette width scores) used for the 1135 generation of the six data-driven CoCena 2 sample groups G1-G6, which are plotted in the dendrogram plot. 1136 (C) Heat map presenting summary statistics of clinical parameters for COVID-19 patients grouped 1137 according to the CoCena 2 sample groups G1-G5 Presented are scaled values of the mean value of the 1138 parameters age/blood cell counts/SOFA score/Pneumonia Index/Charlson score Statistical differences were estimated among the groups via the one-sided Anova test or Fisher test, for 1141 numeric or categorical values respectively PCA plot depicting relationship of all samples based on dynamic gene expression of all genes. Coloring 1143 based on the six data-driven CoCena 2 sample groups Cibersort cell type deconvolution at cell subset level. Grouping based on the six data-driven CoCena 2 1145 sample groups Flow cytometry analysis, number of lymphocytes (upper) and neutrophils (lower) per µl of blood Grouping based on the six data-driven CoCena 2 sample groups Heat map of DE and top 20 most variable transcription factors, epigenetic regulators, surface and & 1149 secreted proteins PCA of all genes within the dataset mapped by COVID-19 severity status Bar plot of DEGs between severe and mild COVID-19 patients at day 1-14 (left) and day 15-28 (right) 1173 (FC>|2|, FDR-adj Boxplot of CD177 (left) and S100A6 (right) in mild and severe COVID-19 patients at day 1-14 and 15-1175 28 Mean of group fold changes (GFCs) of the modules maroon, pink and lightgreen in the granulocyte 1177 samples over time. Patients are grouped according to severity mild (top) and severe (bottom) Heat map of mean expression of 24 markers in mild (top) and severe (bottom) patients ordered by days 1181 after disease onset bins Heat map of mean GFCs of the CoCena 2 whole blood modules in the granulocyte samples from each 1183 individual patient. Patients are clusters by the mean GFC module expression. Severity patterns found in 1184 the whole blood CoCena 2 network were identified and patients groups were assigned accordingly Patients with a distinct GFC expression pattern were labeled as G7 Box plot of CD177 expression in granulocytes grouped by (I) Box plot of CD177 expression in whole blood grouped by