key: cord-0996498-5myn2jay authors: The COVID-19 Host Genetics Initiative,; ganna, a. title: Mapping the human genetic architecture of COVID-19 by worldwide meta-analysis date: 2021-03-12 journal: nan DOI: 10.1101/2021.03.10.21252820 sha: f6c8dc069dd107ff46c1973017f63e099c922398 doc_id: 996498 cord_uid: 5myn2jay
The genetic makeup of an individual contributes to susceptibility and response to viral infection. While environmental, clinical and social factors play a role in exposure to SARS-CoV-2 and COVID-19 disease severity, host genetics may also be important. Identifying host-specific genetic factors may reveal biological mechanisms of therapeutic relevance and clarify causal relationships of modifiable environmental risk factors for SARS-CoV-2 infection and outcomes. We formed a global network of researchers to investigate the role of human genetics in SARS-CoV-2 infection and COVID-19 severity. We describe the results of three genome-wide association meta-analyses comprising 49,562 COVID-19 patients from 46 studies across 19 countries worldwide. We report 15 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19. Several of these loci correspond to previously documented associations with lung or autoimmune and inflammatory diseases. They also represent potentially actionable mechanisms in response to infection. We further identified smoking and body mass index as causal risk factors for severe COVID-19. The identification of novel host genetic factors associated with COVID-19, with unprecedented speed, was enabled by prioritization of shared resources and analytical frameworks. This working model of international collaboration provides a blueprint for future genetic discoveries in the event of pandemics or for any complex human disease.
The coronavirus disease 2019 (COVID-19) pandemic, caused by infections with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has resulted in enormous health and economic burden worldwide. One of the most remarkable features of SARS-CoV-2 infection is that a large proportion of individuals 1 are asymptomatic while others experience progressive, even life-threatening, viral pneumonia and acute respiratory distress syndrome. While established host factors contribute to disease severity (e.g., increasing age, male gender, and higher body mass index 2 ), these risk factors alone do not explain all variability in disease severity observed among individuals. The contribution of host genetics to susceptibility and severity of infectious disease is well-documented, and encompasses rare inborn errors of immunity 3, 4 as well as common genetic variation [5] [6] [7] [8] [9] [10] . Characterizing which genetic factors contribute to COVID-19 susceptibility and severity may uncover novel biological insights into disease pathogenesis and identify mechanistic targets for therapeutic development or drug repurposing, as treating the disease remains a highly important goal despite the recent development of vaccines. For example, rare loss-of-function variants in genes involved in type I interferon (IFN) response may be involved in severe forms of COVID-19 [11] [12] [13] [14] . At the same time, several genome-wide association studies (GWAS) that investigate the contribution of common genetic variation [15] [16] [17] [18] to COVID-19 have provided support for the involvement of several genomic loci associated with COVID-19 severity and susceptibility, with the strongest and most robust finding at locus 3p21.31. However, much remains unknown about the genetic basis of susceptibility to SARS-CoV-2 and severity of COVID-19. 
The COVID-19 Host Genetics Initiative (COVID-19 HGI) (https://www.covid19hg.org/) 19 is an international, open-science collaboration to share scientific methods and resources with research groups across the world, with the goal of robustly mapping the host genetic determinants of SARS-CoV-2 infection and severity of the resulting COVID-19 disease. We have carefully aligned phenotype definitions and incorporated variable ascertainment strategies to achieve greater statistical confidence in our results. We openly and continuously share updated results with the research community. Here, we report the latest results of meta-analyses of 46 studies from 19 countries (Fig. 1) for COVID-19 host genetic effects.
[...] 15 loci, we compared the lead variant (strongest association P-value) odds ratios (ORs) for the risk-increasing allele across our different COVID-19 phenotype definitions. We first noted that four loci had consistent ORs between the two larger and better-powered analyses: all cases with reported infection and all cases hospitalized due to COVID-19 (Methods) (Table 1, Supplementary Table 2). Such consistency implied that these four loci were likely associated with overall susceptibility to SARS-CoV-2 infection, but not with the progression to more severe COVID-19 phenotypes. Notably, these susceptibility loci included the previously reported ABO locus 15, 16, 18, 20. In contrast, 11 out of the 15 loci were associated with increased risk of severe symptoms, with significantly larger ORs for hospitalized COVID-19 compared to the mildest phenotype of reported infection (P < 0.003 (0.05/15), test for effect size difference) (Table 1, Supplementary Table 2). We further compared the ORs for these 11 loci for critical illness due to COVID-19 vs. hospitalized due to COVID-19, and found that these loci exhibited a general increase in effect size for critical illness (Methods) (Extended Data Fig. 2A, Supplementary Table 3).
These results indicated that these 11 loci were more likely associated with progression of the disease and worse outcome from SARS-CoV-2 infection than with susceptibility to SARS-CoV-2 infection. We noted that two loci, tagged by lead variants rs1886814 and rs72711165, were identified primarily from East Asian genetic ancestry samples (n = 1,414 cases hospitalized due to COVID-19), with minor allele frequencies in European populations being < 3%. This highlights the value of including data from diverse populations for genetic discovery. Another locus at 3p21.31, which is the strongest, most replicated signal for COVID-19 severity [15] [16] [17] [18] 20, showed substantial differences in allele frequency across ancestry groups, probably explained by its recent introgression 21. We explored the effect of this locus in the Bangladeshi population, which carries the highest frequency for this haplotype in 1000 Genomes. Using data from the East London Genes & Health study 22 for a proxy variant rs34288077 in the locus (r² = 0.99 to our lead variant rs10490770), we found that in British-Bangladeshi individuals, the variant frequency was 34.6% in hospitalized COVID-19 positive patients (n = 76) compared to 23.8% in the non-hospitalized population (n = 22,215) (OR [95% CI] = 2.11 [1.39, 3.21]; P = 4.7 × 10⁻⁴).
Our phenotype definitions include population controls without known SARS-CoV-2 infection. This is not an optimal control group because some individuals, if exposed to SARS-CoV-2, could develop a severe form of COVID-19 disease and should be classified as cases. To better understand the effect of such potential misclassification, we conducted a new meta-analysis, including only the studies that compared hospitalized COVID-19 cases with controls who had laboratory-confirmed SARS-CoV-2 infection but had mild symptoms or were asymptomatic (n = 5,773 cases and n = 15,497 controls).
We then compared the effect sizes obtained from this analysis with those from the main phenotype definition (hospitalized cases vs. controls without known SARS-CoV-2 infection, if that information was available) using only studies that reported results for both analyses (Methods). We found that across the 11 loci that had reached genome-wide significance in our main hospitalized COVID-19 analysis, the ORs were not significantly different in the analysis with better refined controls (Extended Data Fig. 2B, Supplementary Table 3). These results indicate that using population controls can be a valid and powerful strategy for host genetic discovery of infectious disease.
medRxiv preprint (which was not certified by peer review), this version posted March 12, 2021; https://doi.org/10.1101/2021.03.10.21252820. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
To better understand the potential biological mechanism of each locus, we applied several approaches to prioritize candidate causal genes and explore additional associations with other complex diseases and traits.
For gene prioritization, we first identified genes within each COVID-19 associated region by distance or linkage disequilibrium (LD) to a lead variant, and then prioritized those with protein-altering variants, lung eQTLs, or the highest prioritization score in the OpenTargets V2G (Variant-to-Gene) algorithm 23 [...] Table 5), we only considered phenotypes for which the lead variants were in high LD (r² > 0.8) with the 15 genome-wide significant lead variants from our main COVID-19 meta-analysis. This conservative approach allowed spurious signals primarily driven by proximity rather than actual colocalization to be removed (see Methods).
Of the 15 genome-wide significant loci, we found nine loci to have a distinct candidate gene(s), including biologically plausible genes ( [...] ; P = 5.05 × 10⁻¹⁰) due to COVID-19, is correlated with the missense variant rs34536443:G>C (p.Pro1104Ala; r² = 0.82). This is consistent with the primary immunodeficiency described with complete TYK2 loss of function 24. In contrast, this missense variant was previously reported to be protective against autoimmune diseases, including rheumatoid arthritis (OR = 0.74; P = 3.0 × 10⁻⁸; UKB SAIGE) and hypothyroidism (OR = 0.84; P = 1.8 × 10⁻¹⁰; UK Biobank) (Fig. 3). An additional independent missense variant rs2304256:C>A (p.Val362Phe; r² = 0.08 with rs34536443) in TYK2 was also associated with critical illness ( [...]
Lung-specific cis-eQTLs from GTEx v8 26 (n = 515) and the Lung eQTL Consortium 27 (n = 1,103) provided further support for a subset of loci, including FOXP4 (6p21.1), ABO (9q34.2), OAS1/OAS3/OAS2 (12q24.13), and IFNAR2/IL10RB (21q22.11), where the COVID-19 associated variants modify gene expression in lung. Furthermore, our PheWAS analysis implicated three additional loci related to lung function, with modest lung eQTL evidence, i.e. the lead variant was not fine-mapped but was significantly associated.
An intronic variant rs2109069:G>A in DPP9 (19p13.3), positively associated with critical illness, was previously reported to be risk-increasing for interstitial lung disease (tag lead variant rs12610495:A>G [p.Leu8Pro], OR = 1.29, P = 2.0 × 10⁻¹²) 28. The COVID-19 lead variant rs1886814:A>C in the FOXP4 locus is in moderate LD (r² = 0.64) with a lead variant for lung adenocarcinoma (tag variant = rs7741164; OR = 1.2, P = 6.0 × 10⁻¹³) 29. We also found that intronic variants rs67579710:A>T in THBS3 (1q22) and rs1819040:T>A in KANSL1 (17q21.31), both associated with protection against hospitalization due to COVID-19, were previously reported to be associated with reduced lung function (e.g. tag lead variant rs141942982:G>T, beta = -3.6 × 10⁻², P = 1.00 × 10⁻²⁰) 30. Notably, the 17q21.31 locus is a well-known locus for structural variants containing a megabase inversion polymorphism (H1 and inverted H2 forms) and complex copy-number variations, where the inverted H2 forms were shown to be positively selected in Europeans 31, 32.
Lastly, the remaining six loci have varying evidence for candidate causal genes. For example, the 3p21.31 locus has a complex structure with different genes prioritized by different methods: we prioritized CXCR6 with the Variant-to-Gene (V2G) algorithm 23, while LZTFL1 is the closest gene. CXCR6 plays a role in chemokine signaling 33, and LZTFL1 has been implicated in lung cancer 34. Nonetheless, these results provide supporting in silico evidence for candidate causal gene prioritization, although further functional characterization is strongly needed.
Detailed locus descriptions and LocusZoom plots are provided in Extended Data Fig. 3.
Figure legend (partial): [...] Genes in LD region: genes that overlap with a genomic range that contains any variants in LD (r² > 0.6) with each lead variant. Genes with coding variants: genes with a loss-of-function or missense variant in LD with a lead variant (r² > 0.6). eGenes: genes with a fine-mapped cis-eQTL variant (PIP > 0.1) in GTEx Lung that is in LD with a lead variant (r² > 0.6) (see Supplementary Table 4 for a complete list). V2G: gene with the highest Open Targets Genetics V2G score. B) Selected phenotypes associated with genome-wide significant COVID-19 variants (see Supplementary Table 5 for a complete list). We report those associations for which a lead variant from a prior GWAS result was in high LD (r² > 0.8 [...]
To further investigate the genetic architecture of COVID-19, we used results from meta-analyses including only European ancestry samples (sample sizes described in Methods and Supplementary Table 1). We applied linkage disequilibrium (LD) score regression 35 to the summary statistics to estimate SNP heritability, i.e. the proportion of variation in the two phenotypes that was attributable to common genetic variants, and to determine whether heritability for COVID-19 phenotypes was enriched in genes specifically expressed in certain tissues 36 from the GTEx dataset 37.
We detected a low but significant heritability across all three analyses (< 1% on the observed scale, all P-values < 0.0001; Supplementary Table 6). Despite these low values, whose interpretation is complicated by the use of population controls and variation in the disease prevalence estimates, we found that heritability for reported infection was significantly enriched in genes specifically expressed in the lung (P = 5.0 × 10⁻⁴) (Supplementary Table 7). These findings, together with the genome-wide significant loci identified in the meta-analyses, illustrate that there is a significant polygenic or oligogenic architecture that can be better leveraged with future, larger sample sizes. Genetic correlations (rg) between the three COVID-19 phenotypes were high, though lower correlations were observed between hospitalized COVID-19 [...]
To better understand which traits are genetically correlated and/or potentially causally associated with COVID-19 severity and SARS-CoV-2 reported infection, we chose a set of 38 disease, health and neuropsychiatric phenotypes as potential COVID-19 risk factors based on their putative relevance to disease susceptibility, severity, or mortality (Supplementary Table 8). We found evidence (FDR < 0.05) of significant genetic correlations between 9 traits and hospitalized COVID-19 and SARS-CoV-2 reported infection (Fig. 4; Supplementary Table 9). Genetic correlation results for COVID-19 severity partially overlap with those for reported SARS-CoV-2 infection, with genetic liability to BMI, type 2 diabetes, smoking, and attention deficit hyperactivity disorder showing significant positive correlations (rg range 0.16-0.26).
However, some results were significantly different between COVID-19 severity and reported infection. For example, genetic liability to ischemic stroke was significantly positively correlated only with critical illness or hospitalization due to COVID-19, but not with a higher likelihood of reported SARS-CoV-2 infection (infection rg = 0.019 vs. hospitalization rg = 0.41, z = 2.7, P = 0.006; infection rg = 0.019 vs. critical illness rg = 0.40, z = 2.49, P = 0.013). In addition, coronary artery disease and systemic lupus erythematosus showed positive genetic correlations with critical illness or hospitalization due to COVID-19. Genetic liability to risk tolerance, on the other hand, was the only trait specifically associated with SARS-CoV-2 infection. This potentially reflects that risk-taking behavior could be associated with a higher chance of infection but does not, per se, affect the chances of developing a severe form of COVID-19. With improved phenotyping of cases and controls, methods to deconvolute the effects specific to SARS-CoV-2 infection (a proxy for disease susceptibility) and those specific to progression to severe disease can be applied to better interpret these results.
We next used two-sample Mendelian randomization (MR) to infer potentially causal relationships between these traits. Fixed-effects inverse-variance weighted (IVW) analysis was used as the primary analysis 38, with weighted median estimator (WME) 39, weighted mode based estimator (WMBE) 40 [...]
We noted that there was sample overlap between some datasets used to generate the exposures in the previous analysis and the samples contributing to our meta-analysis of hospitalized COVID-19, as a result of the inclusion of samples from the UK Biobank. We therefore conducted an additional sensitivity analysis, using new hospitalized COVID-19 summary statistics from which the UK Biobank study had been removed (Supplementary Table 10b).
In this analysis, genetically predicted BMI, height, and red blood cell counts remained significantly associated with COVID-19 outcomes (p < 0.05).
The COVID-19 Host Genetics Initiative has brought together investigators from across the world to advance genetic discovery for SARS-CoV-2 infection and severe COVID-19 disease. We report 15 genome-wide significant loci associated with some aspect of SARS-CoV-2 infection or COVID-19. Many of these loci overlap with previously reported associations with lung-related phenotypes or autoimmune/inflammatory diseases, but some loci have no obvious candidate gene as yet. Four out of the 15 genome-wide significant loci showed similar effects in the reported infection analysis (a proxy for disease susceptibility) and all-hospitalized COVID-19 (a proxy for disease severity). This supports the notion that some genetic variants, most notably at the ABO and PPP1R15A loci, might indeed impact susceptibility to infection rather than progression to a severe form of the disease once infected. Whilst our ability to draw definitive conclusions is impaired by incomplete capture of who has been infected with SARS-CoV-2, a recent study based on self-reported exposure to a COVID-19 positive housemate and consequent development of the disease supports our findings 43.
Several of the loci reported here, as noted in previous publications, intersect with well-known genetic variants that have established genetic associations. Variants at DPP9 show prior evidence of increasing risk for interstitial lung disease. Missense variants within TYK2 show a protective effect on several autoimmune-related diseases. Variants overlapping the well-known structural variant-rich 17q21.31 locus have been previously associated with pulmonary function.
Together with the heritability enrichment observed in genes expressed in lung tissues, these results highlight the involvement of lung-related biological pathways in developing severe COVID-19. Several other loci show no prior documented genome-wide significant associations, despite their high significance and attractive candidate genes for COVID-19 (e.g., CXCR6, LZTFL1, IFNAR2 and OAS1/2/3 loci). The previously reported associations of the strongest signal for COVID-19 severity at 3p21.31 with monocyte count are likely due to proximity rather than true colocalization.
Increasing the global representation in genetic studies enhances the ability to detect novel associations. Two of the loci affecting disease severity were only discovered by including the four studies of individuals with East Asian ancestry. One of these loci, close to FOXP4, is particularly common in East Asian (40%) as well as Middle Eastern and Admixed American samples in the Americas, but has a low frequency in most European populations (2-3%). Previous studies have reported association between this locus and lung cancer 29, 44 and interstitial lung disease 45. Although we cannot be certain of the mechanism of action of this association, FOXP4 is an attractive biological target, as it is expressed in the proximal and distal airway epithelium 46 and has been shown to play a role in controlling epithelial cell fate during lung development 47.
A central challenge for the COVID-19 HGI was the harmonization of phenotype definitions, analytic pipelines and cohorts with extremely heterogeneous designs, sample ascertainment and control populations.
Large-scale biobanks with existing genotype resources and connections to medical systems, newly enrolled hospital-based studies (particularly well-powered to study the extremes of severity through the recruitment of individuals from intensive care units), and direct-to-consumer genetics studies with customer surveys each contributed different aspects to understanding the genetic basis of susceptibility and severity traits. Indeed, working together through aligning phenotype definitions and sharing results accelerated progress and has enhanced the robustness of the reported findings.
Nevertheless, differences in study sample size, ascertainment and phenotyping of COVID-19 cases are unavoidable, and care should be taken when interpreting the results from a meta-analysis. First, studies enriched with severe cases or studies with antibody-tested controls may disproportionately contribute to genetic discovery despite potentially smaller sample sizes. Second, differences in genomic profiling technology, imputation, and sample size across the constituent studies can have dramatic impacts on replication and downstream analyses (particularly fine-mapping, where differential patterns of missingness in the reported results can muddy the signal). Third, the use of population controls without complete information about SARS-CoV-2 exposure might result in misclassification or reflect ascertainment biases in testing and reporting rather than true susceptibility to infection. Genotyping large numbers of controls who have been exposed to the virus but remained asymptomatic or experienced only mild symptoms is challenging.
Therefore, many studies prefer to use pre-existing datasets of genetically ancestry-matched samples as their controls, protecting against population stratification but potentially introducing some of these biases. Our analysis comparing the discovery meta-analysis effects to one where controls were phenotypically refined indicated that, for genome-wide significant variants, such bias was limited.
Drawing a comprehensive and reproducible map of the host genetic factors associated with COVID-19 severity and SARS-CoV-2 infection requires a sustained international effort to include diverse ancestries and study designs. The number of COVID-19 study participants and studies contributing data to this study illustrates the benefits of worldwide collaboration, open governance and planning, and sharing of technological and analytical resources. To expedite downstream scientific research and therapeutic discovery, the COVID-19 Host Genetics Initiative regularly publishes meta-analysis results from periodic data freezes on the website www.covid19hg.org as new data are included in the study. We also provide an interactive explorer where researchers can browse the results and the genomic loci in more detail. Future work will be required to better understand the biological and clinical value of these findings. Continued efforts to collect more samples and detailed phenotypic data should be endorsed globally, allowing for more thorough investigation of variable, heritable symptoms 48, 49, particularly in light of newly emerging strains of SARS-CoV-2, which may provoke different host responses leading to disease, and with the rollout of vaccines.
In total, 16 studies contributed data to the analysis of critical illness due to COVID-19, 29 studies contributed data to the hospitalized COVID-19 analysis, and 44 studies contributed to the analysis of all COVID-19 cases. Details of contributing research groups are described in Supplementary Table 1. All subjects were recruited following protocols approved by local Institutional Review Boards (IRBs). All protocols followed local ethics recommendations and informed consent was obtained when required.
COVID-19 disease status (critical illness, hospitalization status) was assessed following the Diagnosis and Treatment Protocol for Novel Coronavirus Pneumonia 50. The critically ill COVID-19 group included patients who were hospitalized due to symptoms associated with laboratory-confirmed SARS-CoV-2 infection and who required respiratory support or whose cause of death was associated with COVID-19. The hospitalized COVID-19 group included patients who were hospitalized due to symptoms associated with laboratory-confirmed SARS-CoV-2 infection. The reported infection cases group included individuals with laboratory-confirmed SARS-CoV-2 infection; COVID-19 confirmed by electronic health record, ICD coding, or clinical diagnosis; or self-reported COVID-19 (e.g. by questionnaire), with or without symptoms of any severity. Genetic ancestry-matched controls for the three case definitions were sourced from population-based cohorts, including individuals whose exposure status to SARS-CoV-2 was either unknown or infection-negative for questionnaire/electronic health record-based cohorts. Additional information regarding individual studies contributing to the consortium is described in Supplementary Table 1.
Each contributing study genotyped the samples and performed quality controls, data imputation and analysis independently, but following consortium recommendations (information available at www.covid19hg.org).
We recommended running GWAS analysis using the Scalable and Accurate Implementation of GEneralized mixed model (SAIGE) 51 on chromosomes 1-22 and X, although some studies used other software such as PLINK 52. The suggested covariates were age, age², sex, age × sex, and the first 20 principal components. Any other study-specific covariates to account for known technical artefacts could be added. SAIGE automatically accounts for sample relatedness and case-control imbalances. Individual study quality control and analysis approaches are reported in Supplementary Table 1.
Study-specific summary statistics were then processed for meta-analysis. Potential false positives, inflation, and deflation were examined for each submitted GWAS. Standard error values as a function of effective sample size were used to find studies that deviated from the expected trend. Summary statistics passing this manual quality control were included in the meta-analysis. Variants with allele frequency > 0.1% and imputation INFO > 0.6 were carried forward from each study. Variants and alleles were lifted over to genome build GRCh38, if needed, and harmonized to gnomAD 3.0 genomes 53 by finding matching variants by strand flipping or switching the ordering of alleles. If multiple variants matched, the best match was chosen by minimum absolute allele frequency fold change. Meta-analysis was performed using the inverse-variance weighted method. The method summarizes effect sizes across the multiple studies by computing the mean of the effect sizes weighted by the inverse variance in each individual study.
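The inverse-variance weighted method described above can be illustrated with a minimal sketch. This is not the consortium's pipeline code: the function names `ivw_meta`, `cochran_q`, and `chi2_sf`, and any example inputs, are assumptions made for illustration. The second function computes Cochran's Q from the same weights, the statistic used for the heterogeneity tests in this study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis (illustrative).

    betas: per-study effect sizes (e.g. log odds ratios)
    ses:   per-study standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equally long")
    weights = [1.0 / se ** 2 for se in ses]          # inverse variances
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal P-value
    return beta, se, z, p


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    m = (df - 1) // 2
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def cochran_q(betas, ses):
    """Cochran's Q: weighted squared deviations from the IVW estimate.

    Returns (Q, degrees_of_freedom, p_value), with df = k - 1.
    """
    pooled, _, _, _ = ivw_meta(betas, ses)
    q = sum((b - pooled) ** 2 / se ** 2 for b, se in zip(betas, ses))
    df = len(betas) - 1
    p = chi2_sf(q, df) if df >= 1 else float("nan")
    return q, df, p
```

Calling `cochran_q` with two effect estimates for the same variant (for example, one per phenotype definition) gives a test with one degree of freedom, which can then be compared against a multiple-testing threshold such as 0.05/15.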
For each of the 15 independent lead variants reported in Table 1, we tested whether there was heterogeneity between the effect sizes associated with hospitalized COVID-19 (progression to severe disease) and reported SARS-CoV-2 infection. We used Cochran's Q measure 54, 55, which is calculated for each variant as the weighted sum of squared differences between the two analysis effect sizes and their meta-analysis effect, the weights being the inverse variance of the effect size. Q is distributed as a chi-square statistic with k (number of studies) minus 1 degrees of freedom. A significant P-value < 0.003 (0.05/15 for multiple tests) indicates that the effect sizes for a particular variant are significantly different in the two analyses. For the 11 loci where the lead variant effect size was significantly higher for hospitalized COVID-19, we carried out the same test again, comparing effect sizes from hospitalized COVID-19 with critically ill COVID-19 (Supplementary Table 3). Further, we carried out the same test comparing meta-analyzed hospitalized COVID-19 (population as controls) and hospitalized COVID-19 (SARS-CoV-2 positive but non-hospitalized as controls) (Supplementary Table 3). For these pairs of phenotype comparisons, we generated new meta-analysis summary statistics, including only those studies that could contribute data to both phenotypes under comparison.
To prioritize candidate causal genes, we employed various gene prioritization approaches using both locus-based and similarity-based methods. Because we relied only on in silico gene prioritization results without characterizing actual functional activity in vitro or in vivo, we aimed to provide a conservative list of any potential causal genes in a locus using the following criteria:
1. Closest gene: a gene that is closest to a lead variant by distance to the gene body.
2. Genes in LD region: genes that overlap with a genomic range containing any variants in LD (r² > 0.6) with a lead variant. For LD computation, we retrieved LD matrices provided by gnomAD v2.1.1 53 for each population analyzed in this study (except for Admixed American, Middle Eastern, and South Asian, which are not available). We then constructed a weighted-average LD matrix, weighted by per-population sample sizes in each meta-analysis, which we used as an LD reference.
3. Genes with coding variants: genes with at least one loss-of-function or missense variant (annotated by VEP 56 v95 with GENCODE v29) that is in LD with a lead variant (r² > 0.6).
4. eGenes: genes with at least one fine-mapped cis-eQTL variant (PIP > 0.1) that is in LD with a lead variant (r² > 0.6) (Supplementary Table 4). We retrieved fine-mapped variants from GTEx v8 26 (https://www.finucanelab.org/) and the eQTL catalogue 57. In addition, we looked up significant associations in the Lung eQTL Consortium 27 (n = 1,103) to further support findings in lung with a larger sample size (Supplementary Table 11). We note that, unlike for GTEx or the eQTL catalogue, we only looked at associations and did not fine-map the Lung eQTL Consortium data.
5. V2G: a gene with the highest overall Variant-to-Gene (V2G) score based on Open Targets Genetics (OTG) 23. For each variant, the overall V2G score aggregates differentially weighted evidence of variant-gene association from several data sources, including molecular cis-QTL data (e.g., cis-pQTLs from ref. 58, cis-eQTLs from GTEx v7, etc.), interaction-based datasets (e.g., Promoter Capture Hi-C), genomic distance, and variant effect predictions (VEP) from Ensembl.
A detailed description of the evidence sources and weights used in the overall V2G score is provided in the OTG documentation (https://genetics-docs.opentargets.org/our-approach/data-pipeline) 23, 59.

To investigate the evidence of shared effects of the 15 index variants for COVID-19 and previously reported phenotypes, we performed a phenome-wide association study. We considered phenotypes in Open Targets Genetics (OTG) obtained from the GWAS Catalog (this included studies with and without full summary statistics, n = 300 and 14,013, respectively) 60, and from UK Biobank. Summary statistics for UK Biobank traits were extracted from SAIGE 51 for binary outcomes (n = 1,283 traits), and Neale v2 (n = 2,139 traits) for both binary and quantitative traits (http://www.nealelab.is/uk-biobank/) and FinnGen Freeze 4 cohort (https://www.finngen.fi/en/access_results). To remove plausible spurious associations, we retrieved phenotypes for GWAS lead variants that were in LD (r² > 0.8) with COVID-19 index variants.

LD score regression v1.0.1 35 was used to estimate SNP heritability of the phenotypes from the meta-analysis summary statistic files. As this method depends on matching the linkage disequilibrium (LD) structure of the analysis sample to a reference panel, the European-only summary statistics were used. Sample sizes were n = 5,101 critically ill COVID-19 cases and n = 1,383,241 controls, n = 9,986 hospitalized COVID-19 cases and n = 1,877,672 controls, and n = 38,984 cases and n = 1,644,784 controls for the all-cases analysis, all including the 23andMe cohort. Pre-calculated LD scores from the 1000 Genomes European reference population were obtained online (https://data.broadinstitute.org/alkesgroup/LDSCORE/). Analyses were conducted using the standard program settings for variant filtering (removal of non-HapMap3 SNPs, non-autosomal variants, variants with chi-square > 30 or MAF < 1%, and variants with an allele mismatch with the reference).
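The variant-filtering criteria listed above can be pictured with a short Python sketch. This is not the LD score regression program's own code: the record layout, the `reference_alleles` lookup and the function name are assumptions made for illustration, and the allele check is deliberately simple (it does not handle strand flips).

```python
from dataclasses import dataclass

AUTOSOMES = {str(c) for c in range(1, 23)}


@dataclass
class SumstatRow:
    rsid: str
    chrom: str
    alleles: frozenset  # the two alleles, e.g. frozenset({"A", "G"})
    maf: float          # minor allele frequency
    z: float            # association Z score


def keep_variant(row, reference_alleles, max_chisq=30.0, min_maf=0.01):
    """Apply the filters named in the text to one summary-statistic row.

    reference_alleles is a hypothetical dict mapping HapMap3 rsIDs to the
    allele pair recorded in the reference panel.
    """
    ref = reference_alleles.get(row.rsid)
    if ref is None:                  # not a HapMap3 SNP
        return False
    if row.chrom not in AUTOSOMES:   # non-autosomal
        return False
    if row.z ** 2 > max_chisq:       # chi-square > 30
        return False
    if row.maf < min_maf:            # MAF < 1%
        return False
    if row.alleles != ref:           # allele mismatch with reference
        return False
    return True


# Made-up reference panel and summary statistics, for illustration only
reference = {
    "rs1": frozenset("AG"), "rs2": frozenset("CT"), "rs3": frozenset("AG"),
    "rs4": frozenset("AC"), "rs5": frozenset("CT"),
}
rows = [
    SumstatRow("rs1", "1", frozenset("AG"), 0.20, 2.0),
    SumstatRow("rs2", "X", frozenset("CT"), 0.30, 1.0),
    SumstatRow("rs3", "2", frozenset("AG"), 0.005, 1.0),
    SumstatRow("rs4", "3", frozenset("AC"), 0.20, 6.0),
    SumstatRow("rs5", "4", frozenset("AG"), 0.20, 1.0),
    SumstatRow("rs6", "5", frozenset("AG"), 0.20, 1.0),
]
kept = [r.rsid for r in rows if keep_variant(r, reference)]
```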
We additionally report SNP heritability estimates for the all-ancestries meta-analyses, calculated using European panel LD scores, in Supplementary Table 6. We used partitioned LD score regression 61 to partition COVID-19 SNP heritability in cell types in our European-only summary statistics. We ran the analysis using the baseline model LD scores calculated for European populations and regression weights that are available online.

We obtained genome-wide association summary statistics for 43 complex disease, neuropsychiatric, behavioural, or biomarker phenotypes (Supplementary Table 8). These phenotypes were selected based on their putative relevance to COVID-19 susceptibility, severity, or mortality, with 19 selected based on the Centers for Disease Control list of underlying medical conditions associated with COVID-19 severity 62 or traits reported to be associated with increased risk of COVID-19 mortality by OpenSafely 63. Summary statistics generated from GWAS using individuals of European ancestry were preferentially selected if available. These summary statistics were used in subsequent genetic correlation and Mendelian randomization analyses.

LD score regression 61 was also used to estimate genetic correlations between our COVID-19 meta-analysis phenotypes reported using European-only samples, and between these and the curated set of 38 summary statistics. Genetic correlations were estimated using the same LD score regression settings as for heritability calculations.
Differences between the observed genetic correlations of SARS-CoV-2 infection and COVID-19 severity were compared using a z score method 64.

Two-sample Mendelian randomization was employed to evaluate the causal association of the 38 traits on COVID-19 hospitalization, on COVID-19 severity and SARS-CoV-2 reported infection using European-only samples. Independent genome-wide significant SNPs robustly associated with the exposures of interest (P < 5 × 10⁻⁸) were selected as genetic instruments by performing LD clumping using PLINK 52. We used a strict r² threshold of 0.001, a 10 Mb clumping window, and the European reference panel from the 1000 Genomes project 65 to discard SNPs in linkage disequilibrium with another variant that had a smaller association P-value. For genetic variants that were not present in the hospitalized COVID-19 analysis, PLINK was used to identify proxy variants that were in LD (r² > 0.8). Next, the exposure and outcome datasets were harmonized using the R package TwoSampleMR 66. Namely, we ensured that the effect of a variant on the exposure and outcome corresponded to the same allele, inferred positive-strand alleles, and dropped palindromes with ambiguous allele frequencies as well as incompatible alleles. Supplementary Table 8 includes the harmonized datasets used in the analyses.

The Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO) Global test 42 was used to investigate overall horizontal pleiotropy. In short, the standard IVW meta-analytic framework was employed to calculate the average causal effect while excluding, in turn, each genetic variant used to instrument the analysis. A global statistic was calculated by summing the observed residual sum of squares, i.e., the squared difference between the effect predicted by the IVW slope excluding the SNP and the observed SNP effect on the outcome.
Overall horizontal pleiotropy was subsequently probed by comparing the observed residual sum of squares with the residual sum of squares expected under the null hypothesis of no pleiotropy. The MR-PRESSO Global test was shown to perform well when the outcome and exposure GWASs are not disjoint (although the power to detect horizontal pleiotropy is slightly reduced by complete sample overlap). We also used the MR-Egger regression intercept 41 to evaluate potential bias due to directional pleiotropic effects. This additional check was employed in MR analyses with an I² index surpassing the recommended threshold (I² > 90%; 67).

Contingent on the MR-PRESSO Global test results, we probed the causal effect of each exposure on COVID-19 hospitalization by using a fixed-effect inverse-variance weighted (IVW) meta-analysis as the primary analysis or, if pleiotropy was present, the MR-PRESSO outlier-corrected test. The IVW approach estimates the causal effect by aggregating the single-SNP causal effects (obtained using the ratio of coefficients method, i.e., the ratio of the effect of the SNP on the outcome to the effect of the SNP on the exposure) in a fixed-effects meta-analysis. The SNPs were assigned weights based on their inverse variance. The IVW method confers the greatest statistical power for estimating causal associations 68, but assumes that all variants are valid instruments and can produce biased estimates if the average pleiotropic effect differs from zero. Alternatively, when horizontal pleiotropy was present, we used the MR-PRESSO outlier-corrected method to correct the IVW test by removing outlier SNPs.
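The fixed-effect IVW estimate built from ratio-of-coefficients estimates can be sketched as follows. This is a minimal Python illustration, not the TwoSampleMR implementation used in the analysis: the function name and input values are hypothetical, and the standard error of each ratio uses a first-order approximation that ignores uncertainty in the SNP-exposure effect.

```python
import math


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP ratio estimates.

    Each SNP contributes a ratio estimate beta_outcome / beta_exposure,
    weighted by the inverse of its approximate variance.
    """
    ratios, weights = [], []
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        ratio = by / bx              # ratio of coefficients (Wald ratio)
        se_ratio = sy / abs(bx)      # first-order approximation (assumption)
        ratios.append(ratio)
        weights.append(1.0 / se_ratio ** 2)
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Made-up SNP effects on the exposure and the outcome, for illustration only
estimate, standard_error = ivw_fixed_effect(
    beta_exposure=[0.10, 0.20, 0.05],
    beta_outcome=[0.05, 0.10, 0.025],
    se_outcome=[0.01, 0.02, 0.01],
)
```

Because SNPs with larger exposure effects and smaller outcome standard errors receive more weight, a single strong but pleiotropic instrument can dominate the estimate, which is why the outlier-corrected and robust methods are used alongside it.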
We conducted further sensitivity analyses using alternative MR methods that provide consistent estimates of the causal effect even when some instrumental variables are invalid, at the cost of reduced statistical power, including: 1) Weighted Median Estimator (WME); 2) Weighted Mode Based Estimator (WMBE); 3) MR-Egger regression. Robust causal estimates were defined as those that were significant at an FDR of 5% and either 1) showed no evidence of heterogeneity (MR-PRESSO Global test P > 0.05) or horizontal pleiotropy (Egger Intercept P > 0.05), or 2) in the presence of heterogeneity or horizontal pleiotropy, either the WME, WMBE, MR-Egger or MR-PRESSO corrected estimates were significant (P < 0.05). All statistical analyses were conducted using R version 4.0.3. MR analysis was performed using the "TwoSampleMR" version 0.5.5 package 66.

In anticipation of the need to coordinate many international partners around a single meta-analysis effort, we created the COVID-19 HGI website (https://covid19hg.org). We were able to centralize information, recruit partner studies, rapidly distribute summary statistics, and present preliminary interpretations of the results to the public. Open meetings are held on a monthly basis to discuss future plans and new results; video recordings and supporting documents are shared (https://covid19hg.org/meeting-archive). This centralized resource provides a conceptual and technological framework for organizing global academic and industry groups around a shared goal. The website source code and additional technical details are available at https://github.com/covid19-hg/covid19hg.

To recruit new international partner studies, we developed a workflow whereby new studies are registered and verified by a curation team (https://covid19hg.org/register). Users can explore the registered studies using a customized interface to find and contact studies with similar goals or approaches (https://covid19hg.org/partners).
This helps to promote organic assembly around focused projects that are adjacent to the centralized effort (https://covid19hg.org/projects). Visitors can query study information, including study design and research questions. Registered studies are visualized on a world map and are searchable by institutional affiliation, city, and country.

To encourage data sharing and other forms of participation, we created a rolling acknowledgements page (https://covid19hg.org/acknowledgements) and directions on how to contribute data to the central meta-analysis effort (https://covid19hg.org/data-sharing). Upon the completion of each data freeze, we post summary statistics, plots, and sample size breakdowns for each phenotype and contributing cohort (https://covid19hg.org/results). The results can be explored using an interactive web browser (https://app.covid19hg.org). Several computational research groups carry out follow-up analyses, which are made available for download (https://covid19hg.org/in-silico). To enhance scientific communication to the public, preliminary results are described in blog posts by the scientific communications team and shared on Twitter. The first post was translated into 30 languages with the help of 85 volunteer translators. We compile publications and preprints submitted by participating groups and summarize genome-wide significant findings from these publications (https://covid19hg.org/publications).

Summary statistics generated by COVID-19 HGI are available at https://www.covid19hg.org/results/r5/ and will be made available on GWAS Catalog. The analyses described here utilize the freeze 5 data.
COVID-19 HGI continues to regularly release new data freezes. Summary statistics for non-European ancestry samples are not currently available due to the small individual sample sizes of these groups. Meta-analysis code is available at https://github.com/covid19-hg/META_ANALYSIS/.

Comparison of the effect sizes and P-values of the 15 lead variants from the COVID-19 critical illness meta-analysis in all cohorts (y-axis) with those from analyses leaving out GenOMICC, leaving out UK Biobank (UKBB), and leaving out GenOMICC + UKBB, respectively (x-axis). Dots represent the effect size estimates (top panels) and P-values (bottom panels), and bars represent the standard error. Filled dots indicate variants that were significant in the full meta-analysis of critical illness due to COVID-19, and empty dots represent variants that were not significant for critical illness but were significant for either hospitalization due to COVID-19 or SARS-CoV-2 reported infection. Red dots represent variants that were significant in the leave-one-out analysis for GenOMICC, UKBB or GenOMICC + UKBB.

Extended Data Figure 2 Comparison of effect sizes for the 11 variants associated with severity of COVID-19 disease. A. Hospitalized COVID-19 cases vs population controls (x-axis, n=10,428 cases and n=1,483,270 controls) and critically ill COVID-19 cases vs population controls (y-axis, n=6,179 cases and n=1,483,780 controls). B. Hospitalized COVID-19 cases vs population controls (x-axis, n=5,806 cases and n=1,144,263 controls) and hospitalized COVID-19 cases vs non-hospitalized COVID-19 cases (y-axis, n=5,773 cases and n=15,497 controls). Dots represent the effect size estimates, and bars represent the confidence intervals of the estimates. Effect size estimates and P-values for the heterogeneity test are reported in Supplementary Table 3.

For each genome-wide significant locus in three meta-analyses, critical illness (labelled as Analysis A2), hospitalization (labelled as Analysis B2), and reported infection (labelled as Analysis C2), we show 1) a Manhattan plot of the locus, where color represents a weighted-average r² value (see Methods) to the lead variant; 2) r² values to the lead variant across gnomAD v2 populations, i.e., African/African-American (AFR), Latino/Admixed American (AMR), Ashkenazi Jewish (ASJ), East Asian (EAS), Estonian (EST), Finnish (FIN), Non-Finnish Europeans (NFE), North-Western Europeans (NWE), and Southern Europeans (SEU); 3) genes at the locus; and 4) genes prioritized by each gene prioritization metric, where the size of each circle represents the rank in each metric. Note that the COVID-19 lead variants were chosen across all the meta-analyses (Table 1; see Methods) and were not necessarily the variant with the most significant P-value in each meta-analysis.
1. Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: A living systematic review and meta-analysis
2. Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study
3. Lethal Infectious Diseases as Inborn Errors of Immunity: Toward a Synthesis of the Germ and Genetic Theories
4. The role of host genetics in the immune response to SARS-CoV-2 and COVID-19 susceptibility and severity
5. IFITM3 restricts the morbidity and mortality associated with influenza
6. Resistance to HIV-1 infection in caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene
7. A homozygous nonsense mutation (428G-->A) in the human secretor (FUT2) gene provides resistance to symptomatic norovirus (GGII) infections
8. Genome-wide association and HLA region fine-mapping studies identify susceptibility loci for multiple common infections
9. Genetic variation in IL28B and spontaneous clearance of hepatitis C virus
10. Human Genetic Determinants of Viral Diseases
11. Presence of Genetic Variants Among Young Men With Severe COVID-19
12. Inborn errors of type I IFN immunity in patients with life-threatening COVID-19
13. Autoantibodies against type I IFNs in patients with life-threatening COVID-19
14. Failure to replicate the association of rare loss-of-function variants in type I IFN immunity genes with severe COVID-19. medRxiv
15. Severe Covid-19 GWAS Group et al. Genomewide Association Study of Severe Covid-19 with Respiratory Failure
16. Trans-ethnic analysis reveals genetic and non-genetic associations with COVID-19 susceptibility and severity
17. Genetic mechanisms of critical illness in Covid-19
18. AncestryDNA COVID-19 host genetic study identifies three novel loci
19. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic
20. Genetic association analysis of SARS-CoV-2 infection in 455,838 UK Biobank participants
21. The major genetic risk factor for severe COVID-19 is inherited from Neanderthals
22. Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people
23. Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics
24. Resolving TYK2 locus genotype-to-phenotype differences in autoimmunity
25. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease
26. The GTEx Consortium atlas of genetic regulatory effects across human tissues
27. Lung eQTLs to help reveal the molecular underpinnings of asthma
28. Genome-wide association study identifies multiple susceptibility loci for pulmonary fibrosis
29. Meta-analysis of genome-wide association studies identifies multiple lung cancer susceptibility loci in never-smoking Asian women
30. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries
31. A common inversion under selection in Europeans
32. Structural haplotypes and recent evolution of the human 17q21.31 region
33. CXCL16/CXCR6 chemokine signaling mediates breast cancer progression by pERK1/2-dependent mechanisms
34. LZTFL1 suppresses lung tumorigenesis by maintaining differentiation of lung epithelial cells
35. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies
36. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types
37. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans
38. Mendelian randomization analysis with multiple genetic variants using summarized data
39. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator
40. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption
41. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression
42. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases
43. Novel COVID-19 phenotype definitions reveal phenotypically distinct patterns of genetic association and protective effects
44. Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations
45. Genome-wide association study of subclinical interstitial lung disease in MESA
46. Foxp4: a novel member of the Foxp subfamily of winged-helix genes co-expressed with Foxp1 and Foxp2 in pulmonary and gut tissues
47. Foxp1/4 control epithelial cell fate during lung development and regeneration through regulation of anterior gradient 2
48. COVID-19 and anosmia: A review based on up-to-date knowledge
49. Self-reported symptoms of covid-19 including symptoms most predictive of SARS-CoV-2 infection
50. Diagnosis and Treatment Protocol for Novel Coronavirus Pneumonia (Trial Version 7)
51. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies
52. PLINK: a tool set for whole-genome association and population-based linkage analyses
53. The mutational constraint spectrum quantified from variation in 141,456 humans
54. Meta-analysis methods for genome-wide association studies and beyond
55. The Combination of Estimates from Different Experiments
56. The Ensembl Variant Effect Predictor
57. eQTL Catalogue: a compendium of uniformly processed human gene expression and splicing QTLs. Cold Spring Harbor Laboratory
58. Genomic atlas of the human plasma proteome
59. Open Targets Genetics: An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Cold Spring Harbor Laboratory
60. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019
61. Partitioning heritability by functional annotation using genome-wide association summary statistics
63. Factors associated with COVID-19-related death using OpenSAFELY
64. Educational attainment and drinking behaviors: Mendelian randomization study in UK Biobank
65. A map of human genome variation from population-scale sequencing
66. The MR-Base platform supports systematic causal inference across the human phenome
67. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I2 statistic
68. A comparison of robust Mendelian randomization methods using summary data

We thank the entire COVID-19 Host Genetics Initiative community for their contributions and continued collaboration. The work of the contributing studies was supported by numerous grants from governmental and charitable bodies, and study-specific acknowledgements will be released with the publication. We thank G. Butler-Laporte, G. Wojcik, M.-G. Hollm-Delgado, C. Willer and G. Davey Smith for their extensive feedback and discussion.