key: cord-352700-8ic7gu5y authors: Hernandez Cordero, A. I.; Li, X.; Milne, S.; Yang, C. X.; Bosse, Y.; Joubert, P.; Timens, W.; Berge, M. v. d.; Nickle, D.; Hao, K.; Sin, D. D. title: Multi-omics highlights ABO plasma protein as a causal risk factor for COVID-19 date: 2020-10-06 journal: nan DOI: 10.1101/2020.10.05.20207118 sha: doc_id: 352700 cord_uid: 8ic7gu5y

SARS-CoV-2 is responsible for the coronavirus disease 2019 (COVID-19) and the current health crisis. Despite intensive research efforts, the genes and pathways that contribute to COVID-19 remain poorly understood. We therefore used an integrative genomics (IG) approach to identify candidate genes responsible for COVID-19 and its severity. We used Bayesian colocalization (COLOC) and summary-based Mendelian randomization to combine gene expression quantitative trait loci (eQTLs) from the Lung eQTL (n=1,038) and eQTLGen (n=31,684) studies with published COVID-19 genome-wide association study (GWAS) data from the COVID-19 Host Genetics Initiative. Additionally, we used COLOC to integrate plasma protein quantitative trait loci (pQTL) from the INTERVAL study (n=3,301) with COVID-19-associated loci. Finally, we determined any causal associations between plasma proteins and COVID-19 using multi-variable two-sample Mendelian randomization (MR). We found that the expression of 20 genes in lung and 31 genes in blood was associated with COVID-19. Of these genes, only three (LZTFL1, SLC6A20 and ABO) had been previously linked with COVID-19 in GWAS. The novel loci included genes involved in interferon pathways (IL10RB, IFNAR2 and OAS1). Plasma ABO protein, which is associated with blood type in humans, demonstrated a significant causal relationship with COVID-19 in MR analysis; increased plasma levels were associated with an increased risk of having COVID-19 and risk of severe COVID-19. In summary, our study identified genes associated with COVID-19 that may be prioritized for future investigation.
Importantly, this is the first study to demonstrate a causal association between plasma ABO protein and COVID-19.

Introduction

threshold for statistical significance and relevant associations below this threshold may be missed. Second, disease-associated loci are typically thousands of base pairs wide and thus contain multiple genes, which may obscure the causal gene contributing to the disease trait. One emerging approach to identify genes within susceptibility loci is integrative genomics. By combining genomic information with transcriptomic, proteomic and/or methylation data, integrative genomics is able to fine-map genetic susceptibility loci and identify genes and proteins most likely to have a causal association with disease. This method has previously elucidated specific protein-coding genes and mechanisms that contribute to complex traits (Cano-Gamez and Trynka 2020). In the present study, we harnessed the power of integrative genomics to identify several susceptibility genes for COVID-19 and investigate the causal relationship between the plasma protein levels of candidate genes and COVID-19. Notably, we demonstrate a causal association of the ABO protein (which is responsible for the ABO blood groups) with both the risk for COVID-19 and its severity.

For our study we obtained publicly available summary statistics from the COVID-19 HG GWAS meta-analysis V3 (https://www.covid19hg.org/) (The COVID-19 Host Genetics Initiative 2020). We obtained the summary statistics for two case-control analyses: 1) "susceptibility to COVID-19" (where cases were all individuals with a diagnosis of COVID-19

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted October 6, 2020. medRxiv preprint doi: https://doi.org/10.1101/2020.10.05.20207118
Dutch national ethical and professional guidelines ("Code of conduct; Dutch federation of biomedical scientific societies", http://www.federa.org). This study determined the gene expression of non-tumour lung tissue samples using 43,466 non-control probe sets (see GEO platform GPL10379). The participants were also genotyped using the Illumina Human 1M Duo BeadChip, and after imputation a total of 7,640,142 single nucleotide polymorphisms (SNPs) for Laval, 7,610,179 for UBC, and 7,741,505 for Groningen were kept for an eQTL analysis. Data for each site were evaluated separately using a linear regression model that assumed an additive genotype effect. Site-specific results were then combined by meta-analysis using a fixed-effects model with inverse variance weighting. cis expression quantitative trait loci (cis-eQTLs) were defined by a 2 Mb window (± 1 Mb probe to SNP distance). Full details of the cohort, genotyping quality control and eQTL analysis are provided by Hao and colleagues (Hao et al. 2012).

eQTLGen

We obtained blood cis-eQTL summary statistics from eQTLGen. The eQTLGen cohort used to estimate the cis-eQTLs consisted of 31,684 whole blood (85%) and peripheral blood mononuclear cell (15%) samples from 37 datasets. Gene expression profiles and genotypes were obtained for the eQTLGen cohort. The full details on the participants, gene expression processing and genotyping for each dataset are described by the eQTLGen consortium (Võsa et al. 2018).
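The fixed-effects, inverse-variance-weighted meta-analysis used to combine the site-specific lung eQTL results can be illustrated with a minimal Python sketch. The function name and the per-site numbers below are hypothetical and are not taken from the Lung eQTL Study; the sketch only shows the general pooling formula for one SNP-probe pair.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-site effect estimates with inverse-variance weights.

    betas: per-site regression coefficients for one SNP-probe pair
    ses:   the matching standard errors
    Returns (pooled beta, pooled SE, Z score, two-sided P value).
    """
    if len(betas) != len(ses) or len(betas) == 0:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]  # weight = 1 / variance
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value under a standard normal null distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical estimates from three sites (illustrative values only)
pooled_beta, pooled_se, pooled_z, pooled_p = fixed_effect_meta(
    [0.20, 0.25, 0.18], [0.05, 0.06, 0.04]
)
print(pooled_beta, pooled_se, pooled_z, pooled_p)
```

Sites with smaller standard errors contribute more to the pooled estimate, which is the defining property of inverse variance weighting.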
The cis-eQTL analysis was performed separately in each dataset, and cis-eQTLs were estimated within a 2 Mb window (± 1 Mb probe to SNP distance) as previously described by Westra and colleagues (Westra et al. 2013); the results were later combined by meta-analysis using a weighted Z-score method (Westra et

Biobank array was used for the genotyping of 830,000 SNPs and genotypes were later imputed using the 1000 Genomes phase 3 UK10K reference panel. After quality control, 10,572,788 SNPs were retained. A randomly selected subset of 3,301 participants was used for the plasma pQTL analyses of 3,622 proteins (Sun et al. 2018). Plasma protein levels were measured using an expanded version of an aptamer-based multiplex protein assay (SOMAscan) previously described by Sun and colleagues (Sun et al. 2018). The protein levels were adjusted for confounding variables (age, sex, waiting period between blood collection and processing, and the first three genetic principal components) and the residuals were extracted and rank-inverse normalized. The pQTL analysis consisted of testing genetic associations by linear regression using an additive genetic model. The results from each donor center were combined using fixed-effect inverse-variance meta-analysis. Further details on the study cohort, genotyping protocol and quality control are described by Sun and colleagues (Sun et al. 2018).

We first conducted Coloc tests to determine the probability that SNPs associated with COVID-19 phenotypes and gene expression (eQTLs) were consistent with shared genetic causal variants (colocalization).
This integrative genomic method estimates the posterior probabilities (PP) of five hypotheses, described as follows: a genetic locus has no association with either of the two traits investigated (i.e. gene expression and a complex trait) (H0); the locus is associated only with gene expression (H1); the locus is associated only with the complex trait (H2); the locus is associated with both traits via independent SNPs (H3); the locus is associated with both traits through shared SNPs (H4) (i.e. a SNP is associated with COVID-19 and is also a cis-eQTL). Colocalization is therefore indicated by a high PP of H4 being true. For these analyses we used the coloc package (Giambartolomei et al. 2014) implemented in R. We tested only cis-eQTL regions (± 1 Mb probe to SNP distance). As required by the method and recommended by Giambartolomei and colleagues (Giambartolomei et al. 2014), we set the prior probabilities of the various configurations (H1, H2 and H4). For the eQTL dataset we used a prior probability of 1 × 10^-4 for a cis-eQTL (H1). We also used a prior probability of 1 × 10^-4 for COVID-19 associations (H2). Finally, we set the prior probability that a single variant affects both traits (H4) to 1 × 10^-6. We defined significant colocalization as PP H4 > 0.80. We executed coloc between the loci associated with COVID-19 (susceptibility and disease severity) and cis-eQTLs associated with gene expression in both lung and blood tissues, retaining genes whose expression colocalized with COVID-19 in both compartments as 'candidate genes'.
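To make the five-hypothesis framework above concrete, the following Python sketch combines per-SNP approximate Bayes factors into posterior probabilities for H0 to H4 using the priors stated in the text (1 × 10^-4, 1 × 10^-4 and 1 × 10^-6). It is a simplified illustration and not the coloc R package itself: it assumes at most one causal variant per trait in the region, the function names are hypothetical, the prior effect-size standard deviations (0.15 for expression, 0.2 for the case-control trait) are assumed values, and the summary statistics are invented.

```python
import math


def log_abf(beta, se, prior_sd):
    """Approximate (Wakefield-style) log Bayes factor for one SNP."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)  # shrinkage factor
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-6):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one region.

    labf1, labf2: per-SNP log Bayes factors for trait 1 and trait 2,
    given in the same SNP order.
    """
    if len(labf1) != len(labf2) or len(labf1) < 2:
        raise ValueError("need the same SNPs (at least two) for both traits")
    s1 = logsumexp(labf1)
    s2 = logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    # H3 sums over pairs of distinct SNPs: all pairs minus same-SNP pairs
    frac_distinct = -math.expm1(s12 - (s1 + s2))
    if frac_distinct > 0.0:
        h3 = math.log(p1) + math.log(p2) + s1 + s2 + math.log(frac_distinct)
    else:
        h3 = float("-inf")
    log_terms = [
        0.0,                  # H0: no association
        math.log(p1) + s1,    # H1: trait 1 only
        math.log(p2) + s2,    # H2: trait 2 only
        h3,                   # H3: both traits, different SNPs
        math.log(p12) + s12,  # H4: both traits, shared SNP
    ]
    norm = logsumexp(log_terms)
    return [math.exp(t - norm) for t in log_terms]


# Invented summary statistics for three SNPs; the second SNP carries both signals
eqtl = [log_abf(b, 0.05, prior_sd=0.15) for b in (0.02, 0.50, 0.01)]
gwas = [log_abf(b, 0.05, prior_sd=0.20) for b in (0.01, 0.30, 0.02)]
pp = coloc_posteriors(eqtl, gwas)
print("PP H4 =", pp[4], "colocalized:", pp[4] > 0.80)
```

Working in log space avoids overflow when a single SNP has a very large Bayes factor, which is common at strong eQTLs.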
If the corresponding proteins of the candidate genes were present in the plasma protein dataset (INTERVAL study), we executed Coloc analysis for the plasma protein levels and COVID-19 phenotypes.

The SMR method was specifically built to test the association between gene expression and a complex trait using a SNP as the instrument. Therefore, we employed the SMR method to identify genes whose expression in lung and blood may mediate the effect of genetic variants on COVID-19. For the SMR, the lung and blood cis-eQTLs and the COVID-19 HG GWAS meta-analysis summary statistics were used. We selected the 1000G phase 3 EUR (1000 Genomes Project Consortium et al. 2015) as the reference panel for linkage disequilibrium (LD) estimation. Significant SMR was defined at P SMR < 0.001. A significant SMR result does not by itself necessarily mean that the same variants are associated with the gene expression and the phenotype; the association could be a consequence of LD between independent causal variants, rather than pleiotropy of a single causal variant or causality. To determine whether the associations were related to LD we used the heterogeneity in dependent instruments (HEIDI) test (Zhu et
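As a rough illustration of the SMR test described above, the Python sketch below computes the single-instrument effect estimate and the approximate SMR test statistic from eQTL and GWAS summary statistics for one top cis-eQTL SNP. The function name and input values are hypothetical, the statistic follows the commonly used approximation T = z_zx^2 · z_zy^2 / (z_zx^2 + z_zy^2) compared against a chi-squared distribution with one degree of freedom, and the HEIDI test is not implemented here.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Single-SNP SMR estimate and approximate test.

    b_zx, se_zx: SNP effect on gene expression (eQTL) and its SE
    b_zy, se_zy: SNP effect on the trait (GWAS) and its SE
    Returns (b_xy, T_SMR, P_SMR).
    """
    if b_zx == 0.0:
        raise ValueError("the instrument must have a non-zero eQTL effect")
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # estimated effect of expression on the trait
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_smr


# Hypothetical summary statistics for one gene (illustrative values only)
b_xy, t_smr, p_smr = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02)
print(b_xy, t_smr, p_smr, "significant:", p_smr < 0.001)
```

Because the statistic is bounded by the weaker of the two Z scores, a gene can only pass the P SMR < 0.001 threshold when both the eQTL and the GWAS signal at the instrument are reasonably strong.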
The COVID-19 HG GWAS identified one genetic locus associated with susceptibility to COVID-19 (defined as a positive COVID-19 diagnosis versus the general population) and one genetic locus associated with severe COVID-19 (defined as hospitalization for COVID-19 versus the general population) at a genome-wide significance threshold of P GWAS < 5 × 10^-8. In our analysis, in addition to these loci, we also explored regions below this stringent threshold. We integrated these GWAS results with gene expression data from both lung tissue (Lung eQTL Study) and blood (eQTLGen) using two statistical methodologies (described in detail in the Methods) (Fig. 1). Bayesian colocalization assesses whether two genetic association signals are consistent with a shared causal variant. We defined colocalization as the posterior probability of this hypothesis (PP H4) being > 0.80. SMR integrates summary data from GWAS and eQTLs in order to identify genes whose expression levels are associated with a trait due to the effects of a common genetic variant (either by direct causal or pleiotropic effects) rather than due to genetic linkage. Significance of the SMR estimate was set at P SMR < 0.001 with no significant heterogeneity (P HEIDI > 0.05).

severe COVID-19 associated loci (Fig. 2a) and 3 with the COVID-19 diagnosis associated loci (Supplementary Table S1). The majority of the gene associations identified in lung tissue were novel (i.e.
not identified by previous GWAS), and in most cases (10 out of 14), the genes met suggestive statistical significance (P GWAS < 5 × 10^-5) rather than genome-wide significance (P GWAS < 5 × 10^-8) (Supplementary Table S1). For example, IL10RB and IFNAR2 (Fig. 2a) are interferon (IFN) receptor genes that co-localized with COVID-19 (PP H4 > 0.80); these genes were located in the same suggestive locus on chromosome 21 (sentinel SNP rs9976829, P GWAS = 7.2 × 10^-7 [COVID-19 hospitalization vs population] (The COVID-19 Host Genetics Initiative 2020)). In addition, results from the SMR show that increased expression of IFNAR2 in lung tissue was associated with decreased risk of severe COVID-19 (Supplementary Table S2) and decreased susceptibility to COVID-19 (Fig. 2b) (P SMR < 0.001). SMR also identified a first-time association for the gene OAS1 (Fig. 2b and Supplementary Table S2), which is an interferon-stimulated gene involved in the cellular response to viral infection; increased expression of this gene was associated with decreased susceptibility to COVID-19.

In addition to these novel gene associations, three co-localized genes (LZTFL1, SLC6A20 and ABO) were within loci that have been previously associated with severe COVID-19 by (Fig. 2a). SMR showed that increased SLC6A20 expression in lung tissue was associated with increased risk of severe COVID-19 (Supplementary Table S2) and increased susceptibility to COVID-19 (Fig. 2b). ABO gene expression co-localized with severe COVID-19 (Fig. 2c).
The colocalization between ABO gene expression and the COVID-19 susceptibility associated loci was not tested, since the variants associated with this gene in the Lung eQTL and eQTLGen studies were not present in the COVID-19 HG meta-analysis for this phenotype.

In blood, the expression of 18 and 8 unique genes co-localized with COVID-19 severity (Fig. 2a) and susceptibility associated loci (Supplementary Table S1), respectively. The expression of ABO in blood co-localized with severe COVID-19 associated loci. In addition, IFNAR2 expression in blood co-localized with both COVID-19 phenotypes, although its strongest colocalization (PP H4 = 0.96) was found with the COVID-19 hospitalization phenotype. Other first-time associations within COVID-19 suggestive loci included HNRNPU-AS1, ATP11A and CTD-2555A7.3. SMR identified 22 genes whose expression in blood was associated with COVID-19 phenotypes (Supplementary Table S2). Increased expression levels of OAS1 and CPOX in blood were associated with decreased susceptibility to COVID-19 and increased risk of severe COVID-19, respectively.

The functional effects of genes are generally imparted through their translation into proteins.
In order to strengthen the mechanistic association between the identified genes and COVID-19, we determined which of these genes were associated with both blood protein levels and COVID-19 phenotypes. We integrated the COVID-19 HG GWAS and the blood protein GWAS from the INTERVAL study (which includes genome-wide associations between genetic variants and 3,622 blood proteins) using Coloc. Additionally, we applied MR to determine causal associations between protein levels and COVID-19.

Of the three protein-coding candidate genes (ABO, IFNAR2, and ATP11A) whose expression in both tissues co-localized with COVID-19 phenotypes, only ABO was present in the INTERVAL study. We found that ABO plasma protein levels co-localized with susceptibility to COVID-19 and its severity (Fig. 3a and Supplementary Table S3).

SNP effect on the plasma protein levels, and the y-axis the SNP effect on severe COVID-19. The variants used for the MR are shown inside the plot with error bars that represent the 95% confidence intervals.

Table S4). The results indicated a significant causal association between ABO plasma protein levels and COVID-19 phenotypes (P < 0.05) (Fig. 3b and Supplementary Figure S1). Each ABO-increasing allele increased the odds of COVID-19 by 7% and the odds of severe COVID-19 by 11% (Fig. 3b). In addition, we performed a sensitivity analysis to assess if the MR results were driven by a single SNP.
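The MR estimate and the single-SNP sensitivity analysis described above can be sketched in Python as follows. This is a minimal illustration of a fixed-effect inverse-variance-weighted (IVW) estimator with a leave-one-out check, assuming uncorrelated instruments; it is not the MendelianRandomization package used in the study, the function names are hypothetical, and the SNP effects are invented. The last helper shows how a log odds ratio maps to a percentage change in odds, so that a 7% increase corresponds to a log odds ratio of ln(1.07).

```python
import math


def ivw_estimate(bx, by, se_by):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    bx:    SNP effects on the exposure (plasma protein level)
    by:    SNP effects on the outcome (log odds of COVID-19)
    se_by: standard errors of the outcome effects
    Returns (causal log odds ratio per unit of exposure, its SE).
    """
    bx, by, se_by = list(bx), list(by), list(se_by)
    if not (len(bx) == len(by) == len(se_by)) or len(bx) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    denom = sum(x * x / (s * s) for x, s in zip(bx, se_by))
    if denom == 0.0:
        raise ValueError("at least one SNP must affect the exposure")
    beta = sum(x * y / (s * s) for x, y, s in zip(bx, by, se_by)) / denom
    return beta, math.sqrt(1.0 / denom)


def leave_one_out(bx, by, se_by):
    """Re-estimate the IVW effect with each SNP removed in turn."""
    results = []
    for i in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != i]
        results.append(ivw_estimate([bx[j] for j in keep],
                                    [by[j] for j in keep],
                                    [se_by[j] for j in keep]))
    return results


def percent_change_in_odds(log_odds_ratio):
    """Convert a log odds ratio into a percentage change in odds."""
    return (math.exp(log_odds_ratio) - 1.0) * 100.0


# Invented instruments (illustrative values only)
bx = [0.20, 0.40, 0.30]
by = [0.014, 0.027, 0.021]
se_by = [0.005, 0.006, 0.005]
beta, se = ivw_estimate(bx, by, se_by)
print(beta, se, percent_change_in_odds(beta))
for i, (b, s) in enumerate(leave_one_out(bx, by, se_by)):
    print("without SNP", i, b, s)
```

If the leave-one-out estimates stay close to the full estimate, no single variant is driving the result, which is the question the sensitivity analysis addresses.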
Consistent with these findings, another recent study has shown that a risk variant (rs11385942) for SLC6A20 expression in human lung cells conferred an increased risk for severe COVID-19 (Ellinghaus et al. 2020). LZTFL1 encodes leucine zipper transcription factor-like protein 1, a protein involved in intracellular cargo trafficking that is linked to congenital ciliopathies; the mechanism of its association with COVID-19 is unclear.

Of the newly described gene associations with susceptibility to COVID-19 and severe COVID-19, IL10RB, IFNAR2 and OAS1 are notable given their role in the IFN pathways.

sample MR analysis, we showed that the ABO plasma protein is likely a causal risk factor for susceptibility to COVID-19 and severe COVID-19. The mechanism by which ABO protein modifies COVID-19 risk is unclear. In SARS-CoV, naturally occurring anti-A antibodies can inhibit spike protein-mediated cellular entry via the ACE2 receptor (Guillon et al. 2008), which is also the putative entry mechanism for SARS-CoV-2; it has been speculated that this effect may also be found in SARS-CoV-2 (Zhao et al. 2020). Another possible explanation is that the A blood type is associated with an increased risk of cardiovascular disease (Wu et al. 2008), which is a known risk factor for severe COVID-19, while those with the O blood type are less likely to develop cardiovascular diseases.
Furthermore, previous reports found an incidence of venous thromboembolism (VTE) of 27% in critically ill COVID-19 patients (Klok et al. 2020). ABO blood type has been previously associated with risk of VTE (Wang et al. 2017); interestingly, the protein-increasing allele (C) of the top SNP (rs505922) used in the MR tests is also a risk variant for VTE (Trégouët et al. 2009). Further investigation is needed to understand the physiological role of ABO in the pathophysiology of COVID-19.

Our study had several limitations. Firstly, our analyses were limited by the number of genetic variants that overlapped between the COVID-19 HG meta-analysis and the datasets that were used for this study. Thus, we could not test some genes that may be of importance. Secondly, replication of our results in an independent dataset was not possible due to the lack of COVID-19 GWAS data available at the present time; therefore, first-time associations identified by our analyses should be considered with caution. Lastly, the lung and blood -omics data used reflect the transcriptomic and proteomic profile under normal conditions; however, these associations may change under stimulation by viral infection or acute inflammation.

Conclusions

We used a multi-omics approach to identify several candidate genes that may be involved in the pathogenesis of COVID-19. The analyses presented here linked COVID-19 genomics to gene expression in lung and blood tissues. This approach revealed specific genes within
previously implicated COVID-19 loci, and also identified new genes whose biology is consistent with COVID-19. Importantly, our analysis suggests that the ABO protein is a causal risk factor for severe COVID-19 and COVID-19 susceptibility.

Table S1: Genes associated with COVID-19 through colocalization.
Table S2: Genes associated with COVID-19 through Summary-based Mendelian Randomization.
Table S3: COVID-19 phenotypes and ABO plasma protein colocalization.
Table S4: Inverse variance weighting Mendelian randomization (IVW-MR) of ABO plasma protein and COVID-19.

References

Dysregulation of type I interferon responses in COVID-19
The transmembrane adaptor protein SIT inhibits TCR-mediated signaling
Inhaled interferon beta therapy shows promise in COVID-19 trial
Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression
Nasal Gene Expression of Angiotensin-Converting Enzyme 2 in Children and Adults
The UK Biobank resource with deep phenotyping and genomic data
From GWAS to Function: Using Functional Genomics to Identify the Mechanisms Underlying Complex Diseases
Second-generation PLINK: rising to the challenge of larger and richer datasets
Efficiency and safety of varying the frequency of whole blood donation (INTERVAL): a randomised trial of 45 000 donors
An interactive web-based dashboard to track COVID-19 in real time
Genomewide Association Study of Severe Covid-19 with Respiratory Failure
Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics
CoV Spike protein and its cellular receptor by anti-histo-blood group antibodies
Lung
eQTLs to help reveal the molecular underpinnings of asthma
Covid-19: risk factors for severe disease and death
Incidence of thrombotic complications in critically ill ICU patients with COVID-19
Presence of Genetic Variants Among Young Men With Severe COVID-19
Intestinal IMINO transporter SIT1 is not expressed in human newborns
The transmembrane adapter protein SIT regulates thymic development and peripheral T-cell functions
MendelianRandomization: Mendelian Randomization Package
Genomic atlas of the human plasma proteome
Selecting instruments for Mendelian randomization in the wake of genome-wide association studies
The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic
Common susceptibility alleles are unlikely to contribute as strongly as the FV and ABO loci to VTE risk: results from a GWAS approach
Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis
Human intestine luminal ACE2 and amino acid transporter expression increased by ACE-inhibitors
SARS-CoV-2 receptor ACE2 gene expression in small intestine correlates with age
Influences of ABO blood group, age and gender on plasma coagulation factor VIII, fibrinogen, von Willebrand factor and ADAMTS13 levels in a Chinese population
Systematic identification of trans eQTLs as putative drivers of known disease associations
ABO(H) blood groups and vascular disease: a systematic review and meta-analysis
GCTA: a tool for genome-wide complex trait analysis
Relationship between the ABO Blood Group and the COVID-19 Susceptibility
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
A pneumonia outbreak associated with a new coronavirus of probable bat origin
A Novel Coronavirus from Patients with Pneumonia in China
Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets