key: cord-319519-mb9ofh12 authors: Ding, J.; Hostallero, D. E.; El Khili, M. R.; Fonseca, G. J.; Milette, S.; Noorah, N.; Guay-Belzile, M.; Spicer, J.; Daneshtalab, N.; Sirois, M.; Tremblay, K.; Emad, A.; Rousseau, S. title: A network-informed analysis of SARS-CoV-2 and hemophagocytic lymphohistiocytosis genes' interactions points to Neutrophil Extracellular Traps as mediators of thrombosis in COVID-19 date: 2020-07-02 journal: nan DOI: 10.1101/2020.07.01.20144121 sha: doc_id: 319519 cord_uid: mb9ofh12

Abnormal coagulation and an increased risk of thrombosis are features of severe COVID-19, with parallels proposed with hemophagocytic lymphohistiocytosis (HLH), a life-threatening condition associated with hyperinflammation. The presence of HLH was described in severely ill patients during the H1N1 influenza epidemic, presenting with pulmonary vascular thrombosis. We tested the hypothesis that genes causing primary HLH regulate pathways linking pulmonary thromboembolism to the presence of SARS-CoV-2 using novel network-informed computational algorithms. This approach led to the identification of Neutrophil Extracellular Traps (NETs) as plausible mediators of vascular thrombosis in severe COVID-19 in children and adults. Taken together, the network-informed analysis led us to propose the following model: the release of NETs in response to inflammatory signals, acting in concert with SARS-CoV-2, damages the endothelium and directly activates platelets, promoting abnormal coagulation that leads to serious complications of COVID-19. The underlying hypothesis is that genetic and/or environmental conditions that favor the release of NETs may predispose individuals to thrombotic complications of COVID-19 due to an increased risk of abnormal coagulation. This would be a common pathogenic mechanism in conditions including autoimmune/infectious diseases, hematologic and metabolic disorders.
HLH genes are significantly enriched within the SARS-CoV-2 host protein interactome

The SARS-CoV-2 pandemic, with its widespread impact across the world, creates an urgency that requires adapting different strategies to understand COVID-19. In this paper, we exploited the knowledge existing within protein interaction networks to identify the molecular pathways underpinning thrombotic complications of COVID-19 using advanced computational algorithms. As described in the introduction, a subset of patients suffering from severe complications of COVID-19 present clinically with symptoms similar to HLH. Therefore, we assembled a list of candidate genes responsible for primary HLH and associated syndromes to explore their relationships with COVID-19 [24,25] (Supplementary Table S1).

The first question asked was whether these HLH genes had potential interactions with SARS-CoV-2. We assembled a protein interaction network between the recently published SARS-CoV-2 host interaction protein network [23] and the HLH genes using an algorithm that we created for this purpose, GeneList2COVID19. The algorithm establishes the shortest path between the candidate genes and the host proteins known to interact with SARS-CoV-2 and calculates an overall connectivity score for the network (a smaller value represents greater connectivity) (Fig. 1 and Supplementary Table S1). We computationally validated the predictions of GeneList2COVID19 to identify significant interactions. To demonstrate that the method can assign significant connectivity scores to genes associated with COVID-19, we obtained a list of 10 confirmed COVID-19 related genes [26], which are differentially expressed in severe COVID-19 patients (Supplementary Table S1). We then calculated the "COVID-19" connectivity score for those 10 genes (SA) as well as for all genes (SB) using GeneList2COVID19.
We found that SA is significantly (p-value=0.017) smaller than SB, which indicates that those 10 COVID-19 related genes are indeed "significantly connected" to SARS-CoV-2 proteins (Fig. 2). To show the specificity of the method, we also calculated the "COVID-19" connectivity score for 100 randomly selected genes (SC) and compared it to the connectivity score of all genes (SB). We found that SC is not significantly smaller than SB (the background) (p-value=0.106) (Fig. 2). In other words, those 100 random genes are not "significantly connected" to SARS-CoV-2 proteins, which reflects the fact that those genes were randomly picked. As an additional control, we repeated the analysis using genes linked to male infertility [27], a condition that has not been associated with COVID-19 (Supplementary Table S1). The connectivity score was not significantly different from that of all other genes (p-value=0.872), further demonstrating the specificity of GeneList2COVID19, which is not restricted to random genes but can also discriminate gene lists associated with other conditions (Fig. 2). After the method was validated, we compared the connectivity score for the HLH genes listed above with that of all genes that connect to SARS-CoV-2 proteins through our assembled protein-protein interaction network (Fig. 2). We found that the score for the HLH marker genes is significantly smaller compared to all other genes (p-value=0.0082, one-sided rank sum test) (Fig. 2). As an additional control, we compared the HLH genes to a list of vascular angiogenesis genes linked to both H1N1 and SARS-CoV-2 pulmonary infections [17] (Supplementary Table S1 & Fig. 2). The HLH genes' connectivity score was smaller, which means that those genes had closer theoretical interactions to SARS-CoV-2. This suggests that HLH genes and their associated pathways are of high interest in the study of SARS-CoV-2 infections.
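The comparisons above all ask the same question: are the connectivity scores of a gene list shifted toward smaller values than those of the background? A minimal sketch of such a one-sided rank-sum (Mann-Whitney) test is given below, assuming a normal approximation without tie or continuity correction. The function names and the toy scores are hypothetical and are not taken from GeneList2COVID19 or from the study data.

```python
import math


def midranks(values):
    """Return 1-based ranks for `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_less(sample, background):
    """One-sided test: do `sample` scores tend to be smaller than `background`?

    Returns (U statistic of the sample, approximate p-value).
    Assumption: normal approximation, no tie or continuity correction.
    """
    n1, n2 = len(sample), len(background)
    ranks = midranks(list(sample) + list(background))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    return u1, p_value


# Hypothetical connectivity scores (smaller = stronger connectivity).
gene_list_scores = [1.0, 2.0, 3.0]
background_scores = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p = rank_sum_less(gene_list_scores, background_scores)
print(f"U = {u_stat}, one-sided p = {p:.4f}")
```

For small gene lists an exact test would be preferable to the normal approximation; the approximation is used here only to keep the sketch self-contained.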
Differential expression of HLH genes in health conditions related to COVID-19.

We next investigated whether the expression of HLH genes in lung tissue was highly regulated (up [...] condition.

We hypothesized that HLH genes that may play an active role in thrombotic complications of COVID-19 are more likely to be regulated in co-morbid conditions. RAB27A expression was found altered in all studied conditions, while AP3B1 expression was found altered in all conditions except one, lung cancer (Fig. 3). [...] Table S1). The majority of the genes were expressed in neutrophils and 7 of them (UNC13D, LYST, AP3B1, MAGT1, RAB8A, GOLGA2 and G3BP1) were significantly elevated in either inactive or active sJIA compared to control neutrophils (Fig. 4). It is worth noting that the gene most connected to SARS-CoV-2, AP3B1, which can directly interact with the SARS-CoV-2 E protein, is among the genes up-regulated in neutrophils derived from sJIA. An important message stemming from our discovery that NETs may be drivers of coagulopathies in reactive HLH is the potential susceptibility of a subset of the pediatric population, identifying them as at risk of severe complications of COVID-19. This led us to the next step, predicting populations potentially vulnerable to thrombotic complications of COVID-19 based on their susceptibility to release NETs.

Identifying potentially vulnerable populations to COVID-19 based on NETs release.

The World Health Organization has established that identifying vulnerable populations is an urgent public health priority in the context of the COVID-19 pandemic [48]. Based on the analyses above, NETs may play an important role in promoting thrombosis in COVID-19. The role of neutrophils in coagulopathies, and particularly that of NETs, is becoming increasingly recognised [41].
Therefore, we hypothesized that health conditions associated with increased release of NETs could be a predictive factor for thrombotic complications of COVID-19. Based on this hypothesis and in order to identify vulnerable populations, we developed a method called foRWaRD (informative Random Walk for Ranking Diseases) to rank different diseases associated with NETs based on their relevance to a gene set of interest (here, genes in the HLH-SARS-CoV-2 network) (see Methods for details).

This preprint (version posted July 2, 2020) was not certified by peer review. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

We obtained two NET gene signatures from previous studies [49,50] and combined them to obtain a list of NET-associated genes (Supplementary Table S1). Then, we obtained the list of diseases associated with these genes from the DisGeNET database [51]. Entries with a gene-disease association (GDA) score <0.4 were filtered out to arrive at a set of 99 diseases. We used the set of 24 genes in the HLH-SARS-CoV-2 interaction network (Fig. 1) as the query set and used the HumanNet Integrated network [35] as the gene interaction network in foRWaRD. The full list of 99 diseases, ranked using foRWaRD, is provided in Supplementary Table S3. Table 2 lists the diseases with a Normalized Disease Score (NDS) greater than 0.5, meaning that these diseases are enriched above the background probabilities. Most of the 10 top-ranked diseases associated with genes linked to NETs can be sub-grouped into 4 major categories: 1) immune/infectious (Alzheimer's disease, Immunodeficiency 8); 2) cardiovascular (Myocardial reperfusion injury, Hemolytic anemia due to glucose phosphate isomerase deficiency, Bleeding disorder, platelet-type, 15); 3) metabolic (Diabetes); and 4) cancer (Liver carcinoma).
It is important to note that some of these diseases, such as Immunodeficiency 8, would be putatively associated with NET deficiency and more often present clinically with bleeding. Whether patients suffering from these diseases are protected from thrombotic complications of COVID-19 remains to be determined. However, diseases associated with increased NET release are expected to carry a greater risk of thrombosis and may identify populations vulnerable to severe thrombotic complications of COVID-19.

Discussion

Based on recent literature, we hypothesized that severe pulmonary thrombotic complications of COVID-19 are associated with a hematologic cytokine storm that could be, in part, defined using genes causing HLH. The network-informed analysis presented in this paper revealed that 1) the top GO biological function associated with HLH genes is neutrophil degranulation, consistent with a recent report highlighting the undervalued role of neutrophils in HLH [36]; 2) HLH genes are significantly enriched within the SARS-CoV-2 human interactome; 3) the top-ranked HLH gene, AP3B1, has roles in cargo loading of type II pneumocytes, where it may interact with SARS-CoV-2 to disturb surfactant physiological functions and promote inflammation/pro-coagulation activities; 4) diseases/syndromes associated with increased release of Neutrophil Extracellular Traps (NETs) may predict vulnerable populations, including those affecting children.

Taken together, the network-informed analysis led us to propose the following model: the release of NETs in response to inflammatory signals, acting in concert with SARS-CoV-2, damages the endothelium and directly activates platelets, promoting abnormal coagulation that leads to serious complications of COVID-19 in susceptible individuals (Fig. 5).
The underlying hypothesis is that genetic and/or environmental conditions that favor the release of NETs may predispose individuals to thrombotic complications of COVID-19 due to an increased risk of abnormal coagulation. This would be a common pathogenic mechanism amongst numerous conditions including autoimmune/infectious diseases, hematologic and metabolic disorders.

The role of neutrophils in coagulopathies, and particularly that of NETs, is becoming increasingly recognised [41]. Interestingly, an elevated neutrophil count is the best single leukocyte predictor of cardiovascular risk [52], bettered only by a high neutrophil-to-lymphocyte ratio [52], a clinical feature of COVID-19 [10]. NET release can be triggered by various inflammatory mediators found elevated in severe COVID-19, including CRP, IL-1β, IL-6 and IL-8 [43]. There is also a positive correlation between circulating serum levels of IL-6, IL-8 and CRP and NET levels [53]. NETs are found in a variety of conditions such as infection, malignancy, atherosclerosis, and autoimmune diseases, with reports now emerging that describe their presence in COVID-19 [9,54-56]. Amongst the known diseases associated with NETs, several affect children, including Cystic Fibrosis [57], Meningococcal Sepsis [58], Lyme Neuroborreliosis [59], Juvenile Dermatomyositis [60] and pediatric inflammatory bowel diseases [61]. In pediatric sepsis, NET levels were elevated and correlated with disease severity, mirroring results in mice, where higher NET levels in response to lipopolysaccharides are found in infant mice compared to adults [62].
One of the quickest ways to decrease the burden of COVID-19 on health care systems throughout the world is to identify at-risk populations and emphasize the importance of infection prevention measures for those individuals. Since these measures incur high personal, social and economic costs, precise knowledge is essential. We presented a novel computational algorithm that enabled us to identify potential diseases linked with NETs (Table 2). Interestingly, amongst the identified diseases, Diabetes, a well-established comorbidity of COVID-19 [63], is ranked 4th and 7th. Our study provides additional insight into the potential mechanisms involved, with increased NET formation resulting from the underlying chronic inflammation as a key factor promoting coagulopathies in diabetics suffering from COVID-19. As for the top-ranked disease, Alzheimer's disease, it seems less likely that NETs in the brain lead to an increased risk of systemic thrombosis than the reverse: SARS-CoV-2 infection may increase NET release in the brain, which could exacerbate Alzheimer's disease-driven pathology, including a greater risk of stroke. This may be an important question for future studies given the susceptibility of the elderly to COVID-19 and its severity in this group, notably the extreme mortality seen in long-term care homes around the world, where cognitive impairment is highly prevalent [64,65].

It has been suggested that COVID-19 should be added to the list of hyperferritinemic syndromes, which includes adult-onset Still's disease, septic shock, catastrophic anti-phospholipid syndrome, and MAS (reactive HLH) [66]. Collectively, these diseases may share similar underlying factors of complications, including an underappreciated role of NETs leading to coagulopathies.
It is possible that individuals contract SARS-CoV-2 infection in addition to other factors that underlie any of these conditions (other viruses, for example), which may lead to a further amplification loop. PCR-negative SARS-CoV-2 patients presenting with clinical symptoms of hyperferritinemic syndrome should be considered highly vulnerable and appropriate infection control measures should be put in place.

Disorders associated with bleeding should decrease the risk of thrombotic complications of COVID-19 (however, they may still lead to severe COVID-19 via other mechanisms). Nevertheless, they can be informative on pathophysiology. The protein with the strongest connectivity to the SARS-CoV-2 E protein was AP3B1 (Fig. 1). Loss of function of AP3B1 leads to Hermansky-Pudlak syndrome type 2, which is associated with bleeding and coagulation defects [67,68]. The SARS-CoV ( [...] Therefore, both proteins have a coherent subcellular localization supporting their potential interaction. Moreover, in post-mortem immunohistochemical analysis of lung tissue, the SARS-CoV-2 S (spike) and E proteins were found to localize with the respiratory epithelia, the interalveolar, and the septal capillaries [5]. In addition, septal and intra-alveolar neutrophilia was observed [5], colocalizing some of the key players of a neutrophil-driven SARS-CoV-2 enhanced coagulation cascade in COVID-19 (Fig. 5). Whether the SARS-CoV-2 E protein can directly or indirectly penetrate neutrophils and/or platelets remains unknown, as these cells are not reported to highly express ACE2 and TMPRSS2, the two key host proteins for viral entry. However, both of these proteins are highly expressed on type II pneumocytes [71], where AP3B1 is important for cargo loading of lamellar bodies [72]. A postmortem examination in a COVID-19 patient who succumbed
to a sudden cardiovascular accident revealed SARS-CoV-2 viruses present in pneumocytes despite PCR-negative nasal swabs [73], indicating a prolonged risk in the lower airways for complications.

Immunodeficiency 8, resulting from the loss of function of coronin-1, also leads to bleeding. Coronin-1 plays key functions in PMN trafficking [74], in part via its interaction with the integrin β2. β2 integrin-mediated systemic NET release is a viral mechanism of immunopathology in hantavirus-associated disease such as kidney and lung damage [75], similar to the immunopathology in severe COVID-19. Overall, diseases associated with a putative loss-of-function of NETs suggest mechanistic roles for AP3B1, coronin-1 and integrin β2 in regulating NET-mediated coagulopathies in the lung alveolar and peri-alveolar areas (Fig. 5).

The analysis in this study is based on a new algorithm that we developed (freely available at https://github.com/phoenixding/genelist2covid19). GeneList2COVID19 can systematically evaluate the connection of any given gene list to SARS-CoV-2 proteins, both within host proteins and between host and viral proteins. Therefore, it can be used to study a wide variety of biological problems associated with COVID-19, especially in circumstances where experimental data on COVID-19 (e.g. transcriptomics or genomics) are not yet available for the problems of interest. The algorithm was found effective on positive (proven to be associated with COVID-19) and negative (irrelevant to COVID-19) gene lists. In terms of limitations, GeneList2COVID19 is dependent on prior knowledge of the protein interactome within the host, and between the host and virus.
Currently, we have a well-established protein-protein interactome for the human species. However, the interactome between the SARS-CoV-2 proteins and human proteins is relatively limited [23] and far from complete. For example, there are no reported interactions for ACE2 and TMPRSS2, which are critical to SARS-CoV-2 infection. We provided an option (-v) in GeneList2COVID19 to utilize any new host-viral protein interactome data when they become available. While GeneList2COVID19 is good at telling whether an input gene list is associated with COVID-19, it cannot test the mechanistic hypothesis generated. It can provide the network that connects the genes in the list to the SARS-CoV-2 proteins, but it cannot determine which nodes/edges in the network are more critical (and when they are activated). At this stage, the most useful information is derived from considering the entire network. As the availability of COVID-19 related "-omics" data increases, we will extend the method into a joint model that integrates all those omics data for a more comprehensive, high-definition network model that can provide additional and more precise insights into the role of genes in COVID-19. The second computational algorithm provided, foRWaRD (https://github.com/ddhostallero/foRWaRD), is also limited by the requirement of known gene-disease associations. [...] conditions. Further studies in well-defined cohorts of COVID-19 patients are mandatory to confirm the relevance of the observations highlighted in the present study. Such knowledge may
be of importance in identifying novel COVID-19 severity biomarkers that will be needed in the management of individuals at risk of complications.

Methods

Datasets

For this study we used the following datasets: the interactions between the SARS-CoV-2 proteins and host proteins [23], reporting 332 interactions that involve 26 SARS-CoV-2 proteins and 332 human host proteins. Each interaction in the map was assigned an interaction score that represents the strength of the interaction (a score between 0 and 1). We also collected the protein-protein interactions (with interaction scores) between all human host proteins from the HIPPIE database [78]. We obtained a list of Highly/Lowly (H/L) expressed genes under different health conditions that are potentially associated with COVID-19 [10,28,29]. To identify the vulnerable populations using foRWaRD, the full list of genes associated with these diseases was downloaded from the DisGeNET [...] For the analysis, we did not use the bootstrapping option, selected homo sapiens as 'species', and used default values for all other parameters. We obtained GO terms with a "difference score" above 0.5. This score represents the normalized difference between the query probabilities and the baseline probabilities in the RWR algorithm, with the best score observed as 1 (Table 1).

Building the HLH-SARS-CoV-2 interaction network

We first built a network that connects all the SARS-CoV-2 proteins and the human host proteins based on the collected protein interaction data. The edges connecting different proteins are weighted based on the interaction scores obtained from the original datasets above. Next, we inferred the signaling paths from SARS-CoV-2 proteins down to a list of proteins (genes) of interest.

A few key assumptions must be made before we can make such inference.
First, since the collected protein interactions within the host (and between the host and the SARS-CoV-2 virus) do not have directions, the reconstructed network graph is undirected. Here, we assumed that the information (i.e., infection) flows from the SARS-CoV-2 proteins to the proteins that directly interact with SARS-CoV-2 proteins, next to other intermediate signaling proteins, and finally to the target genes (proteins) of interest. There might be multiple intermediate proteins residing between the direct SARS-CoV-2 interacting proteins and the target proteins of interest. Second, we did not allow loops in our path from SARS-CoV-2 proteins to the target proteins, to reduce the computational complexity. Although feedback loops have been reported in previous studies [80], they are still relatively rarely observed in the human protein-protein network [81]. Last, we assumed that the interaction score between two proteins is proportional to the strength (or the likelihood) of their interaction. A larger interaction score represents either a stronger or more likely interaction, which results in a "stronger" connection edge in both cases.

The objective of the analysis was to find the strongest (or most likely) "connecting" path from SARS-CoV-2 proteins to the target proteins (genes) of interest in the constructed network, where the connection strength was quantified by a "connectivity score". We formulated the above problem as: [...] problem denotes the "shortest path problem", which we solved using Dijkstra's algorithm (with a quadratic time complexity in the number of vertices).
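The search for the strongest connecting path can be sketched as a multi-source Dijkstra search. The sketch below is illustrative only: the edge cost is assumed here to be -log(score), so that a smaller summed cost corresponds to a larger product of interaction scores, and this transform is an assumption rather than the formulation used in GeneList2COVID19. The function name, node names and scores are hypothetical. A binary heap is used for brevity instead of the quadratic-time variant mentioned above.

```python
import heapq
import math


def strongest_path(edges, sources, target):
    """Multi-source Dijkstra on an undirected, score-weighted graph.

    edges:   iterable of (node_u, node_v, score) with 0 < score <= 1
    sources: nodes where the search starts (e.g. viral proteins)
    Returns (connectivity_score, path); smaller score = stronger connection.
    Assumption: edge cost = -log(score); this is not taken from the paper.
    """
    graph = {}
    for u, v, score in edges:
        cost = -math.log(score)
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))

    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, []):
            new_dist = d + cost
            if new_dist < dist.get(v, math.inf):
                dist[v] = new_dist
                prev[v] = u
                heapq.heappush(heap, (new_dist, v))

    if target not in dist:
        return math.inf, []
    path = [target]
    while path[-1] in prev:  # walk back until a source is reached
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical toy network: one viral protein, two host intermediates.
toy_edges = [
    ("viral_E", "hostA", 0.9),
    ("hostA", "target_gene", 0.8),
    ("viral_E", "hostB", 0.5),
    ("hostB", "target_gene", 0.5),
]
score, path = strongest_path(toy_edges, ["viral_E"], "target_gene")
print(path, round(math.exp(-score), 2))
```

In this toy case the route through hostA is preferred because the product of its scores (0.9 × 0.8) exceeds that of the route through hostB (0.5 × 0.5).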
The above optimization strategy relies heavily on the strong edges (interactions with high scores). The preference for high-score edges may lead to over-sized paths composed of only high-score edges. To avoid oversized paths, we penalized/constrained the length of the path (number of edges in the path) while minimizing the connectivity score (a smaller connectivity score represents stronger connectivity). Here, we revised the aforementioned optimization problem into a 2-pass strategy. In the first pass, we find all the shortest paths X(s,t) (with the same path length) that connect SARS-CoV-2 proteins to the target proteins of interest, without considering the edge weights (interaction scores) in the graph. In the second pass, we find the path x(s,t) in X(s,t) that produces the minimal connectivity score $\mathrm{Connectivity}_{\mathrm{constrained}}(s,t)$ by taking the weight scores into consideration for only the candidate paths selected in the first pass.

[...]

We have packaged all the code into a tool named GeneList2COVID19, which is freely available for academic use.

Ranking diseases using foRWaRD

We developed foRWaRD to rank a set of diseases (with known associated genes) based on their relevance to a set of genes (here HLH genes). This method works on the principles of Random Walk with Restarts (RWR) for ranking genes and gene sets on heterogeneous networks [33,79,82], and enables integration of gene-level interactions to rank a set of diseases, with known associated genes, based on their relevance.

foRWaRD requires three types of inputs (Sup. Fig.
1): 1) a set of diseases along with the genes associated with each disease and the score of the gene-disease associations (optional), 2) a gene interaction network (e.g. co-expression, protein-protein interaction, etc.), and 3) a set of query genes. Using these inputs, foRWaRD first generates a heterogeneous network comprising gene-gene edges and disease-gene edges, with normalized edge weights representing the strength of the gene-gene interaction and the strength of evidence for the gene-disease interaction (e.g. from the DisGeNET database). Then, the query set is superimposed on this network and is used as the restart set in an RWR algorithm. Using RWR in this algorithm allows us to capture topological information within the network both locally (the neighborhood surrounding the query set) and globally. After the convergence of the RWR, the steady-state probabilities of the disease nodes represent their relevance to the query set. In order to correct for the network bias (i.e. to avoid diseases with a large number of associated genes being ranked highly independent of their relevance to the query set), we run the RWR one more time with all the genes in the network as the restart set, providing a background steady-state probability for each disease node. The difference between the steady-state probabilities of these two RWRs is then normalized between 0 and 1. More specifically, let $d_i$ represent the difference between the steady-state probabilities of the two RWRs for disease $i$, where $i = 1, 2, \ldots, n$ and $n$ is the total number of diseases to be ranked (note that $-1 \le d_i \le 1$). Also, let $d_{\max} = \max_i |d_i|$. The normalized disease score (NDS) for the $i$-th disease is:

$$\mathrm{NDS}_i = \frac{1}{2}\left(\frac{d_i}{d_{\max}} + 1\right)$$

It is important to note that an NDS above 0.5 reflects a disease whose similarity score with respect to the query set is larger than its similarity score with respect to all genes (i.e. background).
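The two RWR runs and the normalization described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the foRWaRD implementation: the network is a small hypothetical graph, the function names are invented for this example, and the linear rescaling (d_i / d_max + 1) / 2 is assumed because it maps the differences onto [0, 1] with 0.5 as the point where query and background relevance are equal.

```python
def rwr(adj, restart_nodes, restart_prob=0.5, tol=1e-12, max_iter=1000):
    """Random walk with restart by power iteration.

    adj: dict mapping node -> {neighbor: edge_weight} (undirected graph,
         every node is assumed to have at least one neighbor).
    Returns the steady-state probability of each node.
    """
    nodes = list(adj)
    restart_nodes = set(restart_nodes)
    restart = {
        n: (1.0 / len(restart_nodes) if n in restart_nodes else 0.0)
        for n in nodes
    }
    prob = dict(restart)
    for _ in range(max_iter):
        nxt = {n: restart_prob * restart[n] for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                # Move to a neighbor with probability proportional to weight.
                nxt[v] += (1 - restart_prob) * prob[u] * w / total
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


def normalized_disease_scores(adj, diseases, query_genes, all_genes):
    """Assumed rescaling: NDS_i = (d_i / d_max + 1) / 2."""
    query = rwr(adj, query_genes)
    background = rwr(adj, all_genes)
    diff = {d: query[d] - background[d] for d in diseases}
    d_max = max(abs(v) for v in diff.values())
    if d_max == 0:
        return {d: 0.5 for d in diseases}
    return {d: 0.5 * (diff[d] / d_max + 1) for d in diseases}


# Hypothetical chain: D1 - g1 - g2 - g3 - g4 - D2 (all edge weights 1).
toy = {
    "D1": {"g1": 1.0},
    "g1": {"D1": 1.0, "g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0, "D2": 1.0},
    "D2": {"g4": 1.0},
}
nds = normalized_disease_scores(
    toy, ["D1", "D2"], query_genes=["g1", "g2"],
    all_genes=["g1", "g2", "g3", "g4"],
)
print(nds)
```

In this toy graph the disease adjacent to the query genes (D1) scores above 0.5 and the distant one (D2) scores below 0.5, which is the behavior the background correction is meant to produce.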
The RWR (which we used in foRWaRD) is an algorithm for scoring the similarity between any given node of a weighted network and a query set of nodes. Starting from some initial node, at each step the random walker moves to an adjacent node with a probability proportional to the edge weight connecting the two nodes, and with some probability (known as the probability of restart) it jumps to one of the nodes in the query set (also known as the restart set). The restart probability controls the balance between the influence of the local topology of the network (surrounding the query set) and that of its global topology. We used a restart probability of 0.5 to balance the influence of these two factors.

Software Availability

The software GeneList2COVID19 is written in Python and is available as an open source tool on GitHub (https://github.com/phoenixding/genelist2covid19). An implementation of foRWaRD in Python is freely available on GitHub (https://github.com/ddhostallero/foRWaRD). These GitHub repositories include the source code as well as detailed instructions on how to install and use the methods.

[...] contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.
[...] | Hemolytic anemia, nonspherocytic, due to glucose phosphate isomerase deficiency | 0.500 | (-) | None
10 | Bleeding disorder, platelet-type, 15 | 0.500 | (-) | None

Abbreviations used: NETs = Neutrophil Extracellular Traps.
a A "+" sign indicates demonstrated NET release in the disease, a "-" sign a deficiency in NETs. Brackets "( )" around the "+" or "-" signs indicate prediction without published data.

[...] network shows all the paths connecting the SARS-CoV-2 proteins to the HLH proteins (genes). The red nodes represent the SARS-CoV-2 proteins, the yellow nodes are the human host proteins that directly interact with SARS-CoV-2 proteins, the green nodes are the intermediate interacting host proteins, and the blue nodes denote the target HLH proteins (genes). The edge weights in the network represent the interaction strength (or probability).

Figure 2. HLH genes are significantly enriched within the SARS-CoV-2 host protein interactome. A connectivity score was calculated for each of the genes of interest (e.g.
HLH genes in this work). We further analyzed the network connectivity of all genes to the SARS-CoV-2 proteins (or randomly picked genes). With these two analyses, we obtained two lists of connectivity scores: SA (for HLH genes) and SB (for all background genes). Then, we calculated the statistical significance (p-value) using a one-sided Mann-Whitney rank test to determine whether SA is significantly smaller than SB (stronger connectivity). A significant p-value implies that the list of proteins (genes) of interest is "significantly connected" to the SARS-CoV-2 proteins. A) A list of 10 known COVID-19 related genes (differentially expressed genes in severe COVID-19 patients) has statistically stronger connections to the SARS-CoV-2 proteins compared with all background genes (p-value=0.017). B) A list of 100 random genes does not "significantly" connect to the SARS-CoV-2 proteins. C) The 23 male infertility genes do not "significantly" connect to the SARS-CoV-2 proteins. D) The 11 HLH genes have statistically stronger connections (p-value=0.00821) to the SARS-CoV-2 proteins compared with all background genes. E) A list of 45 vascular angiogenesis genes linked to both H1N1 and SARS-CoV-2 pulmonary infections connects significantly (p-value=4.89e-5) to the SARS-CoV-2 proteins. The 11 HLH genes have the smallest mean/median connectivity score of all the gene lists analyzed. Please note that the p-values here only indicate whether the input gene lists have significantly smaller connectivity scores than all the background genes, and they could be affected by the size of the gene list. To compare the strength of the "connectivity" of input gene lists to SARS-CoV-2 proteins, we should also look at the mean (represented by a green triangle) and the median (represented by a vertical line) connectivity scores.

Figure 3.
Differential expression of HLH genes in COVID-19 associated health conditions. Gene ... on the log fold change (condition vs control). Genes whose fold change was among the top 25% were classified as high (H) and those whose fold change was among the bottom 25% were classified as low (L). Finally, the 11 HLH genes were assessed to determine whether they are among the H or L genes in each condition. The red blocks represent HLH genes that were highly expressed (H, top 25%) in the condition (vs. control), while the blue blocks represent HLH genes that were lowly expressed (L, bottom 25%) in the condition (vs. control). We compared the HLH genes and all background genes in terms of H/L expression under various conditions. We first counted the number of H/L conditions (differentially expressed between the condition and control) for each of the HLH genes, and then for each of the background genes. Next, we used a one-sided Mann-Whitney rank test to determine whether the HLH genes have significantly (p-value<0.05) larger absolute fold changes (i.e., are differentially expressed) in COVID-19 associated conditions than all the background genes. The average number of H/L conditions for HLH genes (red or blue blocks) is 3.82, which is significantly larger (one-sided rank-sum test p-value=1.01e-4) than the average number of H/L conditions for all genes (1.70).

Figure 4. Expression of HLH genes in Control, inactive sJIA and active sJIA Neutrophils. Gene expression of HLH-SARS-CoV-2 and positive COVID-19 genes (Supplementary Table 1) in sJIA was calculated from GEO series GSE122552 47. Data were mapped to the hg38 genome and normalized by reads per kilobase per million (RPKM).
Values for HLH genes were displayed for 615 control and sJIA patients that were either in remission (inactive sJIA) or had active symptoms 616 (active sJIA). 617 618 Figure 5 . Model of NET-mediated endothelial damage contributing to pulmonary vascular 619 thrombosis in severe COVID-19. 620 Infection by SARS-CoV-2 in vulnerable population will lead to hyperinflammation either from 621 underlying genetic mutations, specific epigenetic landscapes or external factors, that will result in 622 the increase circulation of acute phase reactants such as CRP and pro-inflammatory cytokines 623 associated with neutrophilia like IL-6, IL-17A/F and CXCL8 (IL-8). IL-17A activates the 624 endothelium to induce neutrophil adhesion 93 , where the increase in CRP can trigger the release of 625 NETs, resulting in damage to the endothelium as well as aggregation and activation of platelets. 626 Additionally, the presence of SARS-CoV-2 E protein in type II pneumocytes could disturb the 627 surfactant cargo via its interaction with AP3B1, leading to impaired secretion of SP-D and greater 628 NET formation by septal and intra-alveolar neutrophils increasing the risk of thrombosis in the 629 pulmonary microvasculature. In some predisposed patients the combinations of these mechanisms 630 will lead to severe COVID-19 complications. The identification of mediators of this pro-631 coagulation cascade is essential in achieving the two-fold task of identifying vulnerable 632 populations and developing a personalized medicine approach. 633 634 635 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 2, 2020. . https://doi.org/10.1101/2020.07.01.20144121 doi: medRxiv preprint Figure 1 . 
References
Abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
High incidence of venous thromboembolic events in anticoagulated severe COVID-19 patients
High risk of thrombosis in patients with severe SARS-CoV-2 infection: a multicenter prospective cohort study
Complement associated microvascular injury and thrombosis in the pathogenesis of severe COVID-19 infection: a report of five cases
Large-Vessel Stroke as a Presenting Feature of Covid-19 in the Young
Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19
COVID-19: consider cytokine storm syndromes and immunosuppression
Neutrophil extracellular traps in COVID-19
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China. Intensive Care
Weathering the COVID-19 storm: Lessons from hematologic cytokine syndromes
Hemophagocytic Lymphohistiocytosis Induced by Severe Pandemic Influenza A (H1N1) 2009 Virus Infection: A Case Report. Case Rep. Med. 2011, 951910
Analysis of fatal cases of pandemic influenza A (H1N1) virus infections in pediatric patients with leukemia
Hemophagocytic lymphohistiocytosis associated with 2009 pandemic influenza A (H1N1) virus infection
Report of a Fatal Pediatric Case of
Pulmonary Vascular Endothelialitis, Thrombosis, and Angiogenesis in
Adult haemophagocytic syndrome
An outbreak of severe Kawasaki-like disease at the Italian epicentre of the SARS-CoV-2 epidemic: an observational cohort study
Outbreak of Kawasaki disease in children during COVID-19 pandemic: a prospective observational study in
Kawasaki-like disease: emerging complication during the COVID-19 pandemic
Hyperinflammation, rather than hemophagocytosis, is the common link between macrophage activation syndrome and hemophagocytic lymphohistiocytosis
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
Advances in the pathogenesis of primary and secondary haemophagocytic lymphohistiocytosis: differences and similarities
Host-Viral Infection Maps Reveal Signatures of Severe COVID-19 Patients
Key functional genes of spermatogenesis identified by microarray analysis
Incidence, clinical characteristics and prognostic factor of patients with COVID-19: a systematic review and meta-analysis
SARS-CoV-2 response signaling and regulatory networks
The role of Rab27a in the regulation of neutrophil function
Rab27a and Rab27b regulate neutrophil azurophilic granule exocytosis and NADPH oxidase activity by independent mechanisms
Localization of the AP-3 adaptor complex defines a novel endosomal exit site for lysosomal membrane proteins
Knowledge-guided analysis of 'omics' data using the KnowEnG cloud platform
STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
HumanNet v2: human gene networks for disease research
Mechanisms of action of ruxolitinib in murine models of hemophagocytic lymphohistiocytosis
Neutrophil extracellular traps in immunity and disease
COVID-19 and Kawasaki Disease: Novel Virus and Novel Case
Enhanced formation of neutrophil extracellular traps in Kawasaki disease
Neutrophil extracellular traps induce aggregation of washed human platelets independently of extracellular DNA and histones
An emerging role for neutrophil extracellular traps in noninfectious disease
Neutrophil cytoplasts induce TH17 differentiation and skew inflammation toward neutrophilia in severe asthma
Role of C-Reactive Protein at Sites of Inflammation and Infection
Secondary hemophagocytic lymphohistiocytosis in pediatric patients: a single center experience and factors that influenced patient prognosis
Neutrophils in pediatric autoimmune disease
The role of extracellular histones in systemic-onset juvenile idiopathic arthritis
Neutrophils From Children With Systemic Juvenile Idiopathic Arthritis Exhibit Persistent Proinflammatory Activation Despite Long-Standing Clinically Inactive
Blueprint and COVID-19
Global substrate profiling of proteases in human neutrophil extracellular traps reveals consensus motif predominantly contributed by elastase
Neutrophil extracellular traps contain calprotectin, a cytosolic protein complex involved in host defense against Candida albicans
The DisGeNET knowledge platform for disease genomics
Which White Blood Cell Subtypes Predict Increased Cardiovascular Risk?
CRP Induces NETosis in Heart Failure Patients with or without Diabetes
Targeting potential drivers of COVID-19: Neutrophil extracellular traps
Primary tumors induce neutrophil extracellular traps with targetable metastasis promoting effects
Neutrophil extracellular traps sequester circulating tumor cells and promote metastasis
Fibrosis Lung Disease from Childhood to Adulthood: Neutrophils Trap (NET) Formation, and NET Degradation
Neutrophil Extracellular Traps in Tissue and Periphery in Juvenile
Neutrophil extracellular traps in pediatric inflammatory bowel disease
Neutrophil extracellular traps (NETs) exacerbate severity of infant sepsis
Association of Blood Glucose Control and Outcomes in Patients with COVID-19 and Pre-existing Type 2 Diabetes
Dementia care during COVID-19
Epidemiology of Covid-19 in a Long-Term Care Facility in King
Storm, typhoon, cyclone or hurricane in patients with COVID-19? Beware of the same storm that has a different origin
Identification of a homozygous deletion in the AP3B1 gene causing
Altered Trafficking of Lysosomal Proteins in Hermansky-Pudlak Syndrome Due to Mutations in the β3A Subunit of the AP-3 Adaptor
Subcellular location and topology of severe acute respiratory syndrome coronavirus envelope protein
Coronavirus envelope protein: current knowledge
SARS-CoV-2 Receptor ACE2 is an Interferon-Stimulated Gene in Human
The alveolar epithelium determines susceptibility to lung fibrosis in
Pathological evidence for residual SARS-CoV-2 in pulmonary tissues of a ready-for-discharge patient
Coronin 1A, a novel player in integrin biology, controls neutrophil trafficking in innate immunity
β2 integrin mediates hantavirus-induced release of neutrophil extracellular traps
Complement Activation Contributes to Severe Acute Respiratory
High Level of Neutrophil Extracellular Traps Correlates With Poor Prognosis