key: cord-0907205-0xeue1rm
authors: Thibord, Florian; Chan, Melissa V.; Chen, Ming-Huei; Johnson, Andrew D.
title: A year of Covid-19 GWAS results from the GRASP portal reveals potential genetic risk factors
date: 2022-02-22
journal: HGG Adv
DOI: 10.1016/j.xhgg.2022.100095
sha: bc5b14db0f4fe9bd42aba99d26bd85382dd49be7
doc_id: 907205
cord_uid: 0xeue1rm

Host genetic variants influence the susceptibility and severity of several infectious diseases, and the discovery of genetic associations with Covid-19 phenotypes could help develop new therapeutic strategies to reduce its burden. Between May 2020 and June 2021, we used Covid-19 data released periodically by UK Biobank and performed 65 Genome-Wide Association Studies (GWAS) in up to 18 releases of Covid-19 susceptibility (N=18,481 cases in June 2021), hospitalization (N=3,260), severe outcomes (N=1,244) and death (N=1,104), stratified by sex and ancestry. Consistent with previous studies, we observed 2 independent signals at the chr3p21.31 locus (rs73062389-A, OR=1.21, P=4.26×10⁻¹⁵ and rs71325088-C, OR=1.62, P=2.25×10⁻⁹) modulating susceptibility and severity, respectively, and a signal influencing susceptibility at the ABO locus (rs9411378-A, OR=1.10, P=3.30×10⁻¹²), suggesting an increased risk of infection in carriers of non-O blood groups. Additional signals at the APOE (associated with severity and death), LRMDA (susceptibility in non-European) and chr2q32.3 (susceptibility in women) loci were also identified but did not replicate in independent datasets. We then devised an approach to extract suggestively associated variants (P<10⁻⁵) exhibiting an increase in significance over time. When applied to the susceptibility, hospitalization and severity analyses, this approach revealed the known RPL24, DPP9, and MAPT loci, respectively, amongst hundreds of other signals. These results, freely available on the GRASP portal, provide insights into the genetic mechanisms involved in Covid-19 phenotypes.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for coronavirus disease 2019 (Covid-19), which affects individuals with variable severity, ranging from asymptomatic infection to mild respiratory symptoms, hypercytokinemia, pneumonia, thrombosis and even death. 1, 2 Understanding the mechanisms leading to heterogeneous symptoms and susceptibility is essential to develop efficient treatments and improve patient care. Host genetic diversity has been shown to influence the effects of infection by several viruses. 3 Notably, these findings implicate common and uncommon variants, while studies trying to identify associations of rare variants have been unsuccessful so far. 9 The largest effort is currently led by the Covid-19 host genetics initiative (Covid-19hgi), 10 which completed meta-analyses of results shared by 61 studies as of June 15th, 2021, has identified 23 loci associated with Covid-19 phenotypes so far, and plans to release new results as additional data are made available. A major contributor to this group is the UK Biobank (UKB), 11 which periodically releases the results of Covid-19 tests and related deaths, as well as health care data for its nearly 500,000 consented participants, to approved researchers. Figure S1. For analyses prior to the June 18th, 2020 release, we included participants of European ancestry only, and started adding new analyses stratified by sex and ancestry from June 18th, 2020 onward. GWAS were stratified for EUR, AFR, SAS, and OTHERS ancestries, and an additional trans-ancestry GWAS combining all participants (labelled ALL) as well as a GWAS combining non-European (nEUR) participants were performed.

Associations are reported either as odds ratios (OR) with 95% confidence intervals or as beta coefficients (β) with associated standard errors (SE). Linkage disequilibrium (LD) was estimated by squared correlation (r²) using UKB EUR imputed data. Given the large number of variants that may be significantly (or suggestively) associated at a locus, we assigned significantly (or suggestively) associated variants to a common locus if they were separated by less than 1 Mb. Differences in effect between two analyses were assessed with the Z-test for the equality of regression coefficients, $Z = (b_1 - b_2)/\sqrt{SE_{b_1}^2 + SE_{b_2}^2}$, with b1 and b2 corresponding to the observed effects and SEb1 and SEb2 the associated standard errors. In addition, haplotype analyses were performed with the LDlink LDhap tool, 14 while regional association plots were generated with LocusZoom. 15 Significant associations (P < 5 × 10⁻⁸) were further investigated to control for underlying health conditions. To reflect the cardiovascular health of the participants, the following traits were used as covariates: BMI, type 2 diabetes (ICD-10 code E11) and ischemic heart disease (ICD-10 code I25). Results were then compared before and after adjustment using logistic regression with a likelihood ratio test. In addition, significant associations were also controlled for Alzheimer's disease (ICD-10 code G30) and asthma (ICD-10 code J45).
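The comparison of two effect estimates described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the standard form of the Z-test for independent samples, and the function name and example values are hypothetical.

```python
import math

def z_test_equal_effects(b1, se1, b2, se2):
    """Z-test for the equality of two regression coefficients.

    Returns (z, two_sided_p). Assumes the two estimates come from
    independent samples (e.g. men vs. women), so their covariance is zero.
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided P-value from the standard normal: P = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical effect sizes (log odds ratios) and standard errors,
# chosen only for illustration; they are not taken from the study.
z, p = z_test_equal_effects(b1=0.20, se1=0.04, b2=0.15, se2=0.05)
print(f"Z = {z:.3f}, P = {p:.3f}")
```

A small P-value would indicate that the two effects differ more than expected from sampling error alone; two variants can differ in significance between strata without differing in effect.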

For each analysis hosted on the portal, we provide comprehensive annotation for top results (P < 1 × 10⁻⁵) using ANNOVAR 16 and the RESTful API service provided by CADD v1.6. 17 We also retrieve known phenotype associations extracted from the GRASP 18 and EBI GWAS catalogs, 19 and known eQTLs extracted from GTEx v8 20 and other eQTL resources compiled from nearly 150 datasets (built upon the work of Zhang et al., 21 detailed in Table S3).

All 65 analyses were updated as soon as new data were released by UKB. As a result, we obtained results for these analyses at different time points, which differed by the addition of new cases, thus increasing power in the most recent analyses. With an increase in power, bona fide signals associated with Covid-19 phenotypes should increase in significance in each consecutive data release analyzed. We designed a workflow to extract these signals with a positive significance trajectory in each analysis. For each variant: (i) P < 10⁻⁵ in the most recent data release analyzed; (ii) P cannot decrease in significance by more than one order of magnitude between 2 consecutive releases; (iii) P must have increased in significance by more than one order of magnitude at least once between 2 consecutive releases. In addition, we made sure that the direction of effect did not change over time for each variant selected. This set of rules should extract variants that increased in significance since we started performing these analyses, while allowing some stagnation, which might happen due to random sampling and/or a low increase in cases between 2 consecutive releases.
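The selection rules above can be expressed as a short filter. The following is a minimal sketch under stated assumptions, not the authors' workflow: the function name and input layout (one list of P-values and one list of effect sizes per variant, ordered from oldest to most recent release) are hypothetical, and P-values are assumed to be strictly positive.

```python
import math

def has_positive_trajectory(pvalues, betas, suggestive=1e-5):
    """Apply rules (i)-(iii) plus the direction-of-effect check.

    pvalues, betas: values for one variant, ordered from the oldest to
    the most recent data release (hypothetical input layout).
    """
    if len(pvalues) < 2 or len(pvalues) != len(betas):
        return False
    # (i) suggestive association in the most recent release
    if not pvalues[-1] < suggestive:
        return False
    # Work on the -log10 scale: +1 means one order of magnitude more significant.
    logp = [-math.log10(p) for p in pvalues]
    steps = [b - a for a, b in zip(logp, logp[1:])]
    # (ii) never loses more than one order of magnitude between releases
    if any(s < -1 for s in steps):
        return False
    # (iii) gains more than one order of magnitude at least once
    if not any(s > 1 for s in steps):
        return False
    # Direction of effect must be constant across releases
    return all(b > 0 for b in betas) or all(b < 0 for b in betas)

# Hypothetical trajectories, for illustration only.
rising = has_positive_trajectory([1e-3, 5e-4, 2e-6, 8e-7], [0.1, 0.1, 0.12, 0.11])
unstable = has_positive_trajectory([1e-7, 1e-4, 1e-6], [0.1, 0.1, 0.1])
print(rising, unstable)
```

Working on the -log10 scale turns "one order of magnitude" into a difference of 1 between consecutive releases, which keeps rules (ii) and (iii) symmetric and easy to read.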

The Covid-19hgi datasets

For replication purposes, we used the publicly available Covid-19hgi meta-analysis summary statistics (freeze 6) for Covid-19 susceptibility (labelled C2, N = 112,612 cases), hospitalization (B2, N = 24,274) and severity (A2, N = 8,779 cases). These datasets are currently the largest analyses of Covid-19 phenotypes available. Publicly available summary statistics do not include the 23andMe dataset, and a version of the summary statistics without UKB was also made available for the B2 and C2 analyses. As these versions did not overlap in samples with our analyses, we used them to replicate our findings from the susceptibility and hospitalization analyses, as well as our suggestive findings. To replicate findings from our severity analyses, we used the A2 summary statistics without UKB from freeze 5 (the A2 results without UKB from freeze 6 were not available).

Permission was obtained to post UKB summary statistics under an approved application (ID 28525). The association results are available on the portal, as well as annotated top results.

In addition to UKB summary statistics, results from other efforts are also hosted on the portal. (Table S4). There was also a decrease in the mean age of positive cases, from 57.02 to 53.57, with a significant drop after summer 2020 (Table 2), with younger individuals more likely to be infected (β = -0.04, P < 10⁻³⁰⁰), while increasing age was associated with hospitalization (β = 0.05, P = 4.54 × 10⁻⁹⁰), severity (β = 0.10, P = 5.63 × 10⁻¹⁰⁸) and death (β = 0.12, P = 1.11 × 10⁻¹²²) (Table S5).

The workflow of genetic analyses is presented in Figure 1. Between May 2020 and June 2021, we observed several genome-wide significant (P < 5 × 10⁻⁸) signals across the 65 analyses.

However, many signals were only transiently significant, and only a handful of signals remained Table 3). All associations with P < 10⁻⁵, from all 65 analyses, are available in Table S6. In addition, all signals reaching genome-wide significance in any data release analyzed are presented in Figures S2-S47. The signals reported in the following correspond to results using Population controls, except when specified otherwise. Overall, the effect of significant associations was similar when using different sets of controls (Figure S48). susceptibility. 24 This previous report was based on UKB data but this signal was not replicated in an independent dataset, and this association was greatly attenuated after the summer, when the number of Covid-19 cases started to rise significantly and the mean age of infected participants decreased (Figure 2.A). The interaction between the age of participants and the APOE variant was significant (P = 1.78 × 10⁻⁹), which was further confirmed using a 2df test. 25 Using the association from the death analysis, we further controlled the association for Alzheimer's disease, for which APOE-ε4 is a known risk factor, but the variant remained associated (OR = 1.33, P = 4.04 × 10⁻⁷). Adjusting for cardiovascular health modestly increased the strength of association of the APOE allele with COVID-related death (OR = 1.39, P = 6.45 × 10⁻⁹) (Table S7).

and OTHERS), no signal was found genome-wide significant in the last data release analyzed. In sex-stratified analyses, using Population controls, the chr3p21.31 susceptibility signal was significant in women (rs73062389, P = 1.06 × 10⁻⁸ in ALL) and moderately associated in men (P = 2.10 × 10⁻⁶ in ALL), whereas the ABO signal was significant in men (rs9411378, P = 5.10 × 10⁻¹⁰ in ALL) and moderately associated in women (P = 3.30 × 10⁻⁵ in ALL). The chr3p21.31 lead variant in the hospitalization analysis (rs72893671) was more significant in men (P = 6.80 × 10⁻¹¹ in ALL) than in women (P = 3.68 × 10⁻⁴ in ALL). Despite these differences in significance between men and women for these 3 main signals, we did not observe a significant difference in effect when using the Z-test for the equality of regression coefficients (P > 0.05 for all 3 signals). Additionally, a variant reached genome-wide significance in the analysis of Covid-19 susceptibility in women of ALL ancestry at the chr2q32.3 locus, while no association was observed for this variant in men (P = 0.47). However, this association was not supported by the Covid-19hgi C2 analysis (P = 0.58), even though the Covid-19hgi meta-analyses were not sex-stratified.

Given the low number of significant associations, we also investigated suggestive associations (P < 10⁻⁵), and kept track of how much they increased or decreased in significance at each new data release analyzed. More specifically, we recorded if these signals increased or decreased in significance by more than one order of magnitude. After collecting this information across all data releases, we obtained these significance trajectories for each variant. We noted that the significance trajectory of the most robust signals, at the chr3p21.31 and ABO loci, increased at least once by more than one order of magnitude between 2 consecutive releases, and sometimes slightly decreased, but not by more than one order of magnitude (Figure 2). Thus, we were interested in suggestive associations displaying a similar positive trajectory in significance over time, and reaching a suggestive P-value of at least P < 10⁻⁵ in the last data release analyzed.

Across all 65 analyses, 11,639 variants reached a suggestive P-value in the last release, of which 8,291 exhibited a positive significance trajectory (28.8% of the variants with suggestive associations were discarded). After extracting the lead variant at each locus, and removing the lead variants that were already genome-wide significant, we obtained a list of 1,411 lead variants (Table S8), with some duplicates, as lead variants from the same locus can appear in several analyses.
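The extraction of one lead variant per locus can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that neighbouring variants on the same chromosome separated by less than 1 Mb are chained into one locus, and that the lead variant is the one with the lowest P-value. The function name, tuple layout and example positions are hypothetical.

```python
from itertools import groupby

def lead_variants(variants, max_gap=1_000_000):
    """Group variants into loci and return the lead (lowest P) of each.

    variants: iterable of (chrom, pos, pvalue) tuples (hypothetical layout).
    Assumption: sorted neighbours on a chromosome closer than max_gap
    base pairs belong to the same locus (single-linkage chaining).
    """
    leads = []
    ordered = sorted(variants, key=lambda v: (v[0], v[1]))
    for _, chrom_variants in groupby(ordered, key=lambda v: v[0]):
        locus = []
        for v in chrom_variants:
            # A gap of max_gap or more closes the current locus.
            if locus and v[1] - locus[-1][1] >= max_gap:
                leads.append(min(locus, key=lambda x: x[2]))
                locus = []
            locus.append(v)
        leads.append(min(locus, key=lambda x: x[2]))
    return leads

# Hypothetical variants, for illustration only.
example = [
    ("3", 45_800_000, 2e-7),
    ("3", 45_900_000, 4e-9),
    ("3", 49_000_000, 3e-6),
    ("9", 136_100_000, 1e-6),
]
for chrom, pos, p in lead_variants(example):
    print(chrom, pos, p)
```

Chaining neighbours means a long run of closely spaced variants forms a single locus even if its two ends are more than 1 Mb apart; a fixed window around each lead variant would be an alternative reading of the same rule.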

Notably, some of these loci previously reached genome-wide significance in the Covid-19hgi meta-analyses 10 infections occurred.

The most robust findings from our study are the association of the chr3p21.31 Covid-19hgi to confirm the validity of a signal, is still challenging. Moreover, this approach relies on the assumption that external variables changing over time, such as SARS-CoV-2 strains, have no effect on these associations. For instance, if UKB participants were exposed to a fast-spreading novel strain with distinct genetic mechanisms, our approach might be able to detect these new genetic risk factors, but the independent risk factors of previous strains might no longer be detectable.

The associations we observed changing through the pandemic could reflect random effects or changes in statistical power, but some of the results suggest changes due to potential

The authors declare no competing interests.

The design of our study includes 2 phases. First, GWAS were performed for all 65 analyses, and genome-wide significant associations were identified. Second, we focused on suggestive associations that increased in significance over time.

COVID-19 and its implications for thrombosis and anticoagulation

The hallmarks of COVID-19 disease

Host genetics and infectious disease: new tools, insights and translational opportunities

HIV-1 entry into CD4+ cells is mediated by the chemokine receptor CC-CKR-5

Life-threatening influenza and impaired interferon amplification in human IRF7 deficiency

Genomewide Association Study of Severe Covid-19 with Respiratory Failure

Trans-ancestry analysis reveals genetic and nongenetic associations with COVID-19 susceptibility and severity

Pan-ancestry exome-wide association analyses of COVID-19 outcomes in 586,157 individuals

Mapping the human genetic architecture of COVID-19: an update

UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age

The UK Biobank resource with deep phenotyping and genomic data

Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies

LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants

LocusZoom.js: Interactive and embeddable visualization of genetic association study results

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

CADD: predicting the deleteriousness of variants throughout the human genome

GRASP v2.0: an update on the Genome-Wide Repository of Associations between SNPs and phenotypes

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019

Genetic effects on gene expression across human tissues

Synthesis of 53 tissue and cell line expression QTL datasets reveals master eQTLs

Genetics of symptom remission in outpatients with COVID-19

Association between ABO haplotypes and the risk of venous thrombosis: impact on disease risk estimation

ApoE e4e4 Genotype and Mortality With COVID-19 in UK Biobank

Exploiting gene-environment interaction to detect genetic associations

The mutational constraint spectrum quantified from variation in 141,456 humans

Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function

Association Between Single-Nucleotide Polymorphisms in HLA Alleles and Human Immunodeficiency Virus Type 1 Viral Load in Demographically Diverse, Antiretroviral Therapy-Naive Participants From the Strategic Timing of AntiRetroviral Treatment Trial

A dynamic mucin mRNA signature associates with COVID-19 disease presentation and severity

The effects of the COVID-19 pandemic on people with dementia

Nursing home staff networks and COVID

Ancestry-specific results: Susceptibility:nEUR:Population 10:78250184:T:C

Sex-specific results: Susceptibility:ALL:Population:F

a Indicates the ancestry by label, the control set, and, where applicable, the corresponding sex-stratified analysis (F: Females, M: Males)

For the replication of the ABO variant, rs9411378 was not available and the best available proxy, rs635634 (LD r² = 0.53), was used instead

For the replication of the LRMDA variant, the C2 analysis restricted to African ancestry participants was used. CHR: Chromosome; POS: Position (hg19 genome build); NEA: Non Effect Allele

OR: Odds-ratio; ±95CI: 95% confidence interval

21 for susceptibility and hospitalization, and 05.09.21 for severe and death phenotypes) genetic analyses of Covid-19 phenotypes using data periodically released by the UK Biobank between