key: cord-0848051-gel6k3i0
authors: Zhou, Jingqi; Sun, Yitang; Huang, Weishan; Ye, Kaixiong
title: Altered blood cell traits underlie a major genetic locus of severe COVID-19
date: 2021-02-02
journal: J Gerontol A Biol Sci Med Sci
DOI: 10.1093/gerona/glab035
sha: 0635effb71dde99873f70f080e178fa1bd3b9c8e
doc_id: 848051
cord_uid: gel6k3i0

BACKGROUND: The genetic locus 3p21.31 has been associated with severe coronavirus disease 2019 (COVID-19), but the underlying pathophysiological mechanism is unknown.

METHODS: To identify intermediate traits associated with the 3p21.31 locus, we first performed a phenome-wide association study (PheWAS) with 923 phenotypes in 310,999 European individuals from the UK Biobank. For genes potentially regulated by the COVID-19 risk variant, we examined associations between their expression and the polygenic score (PGS) of 1,263 complex traits in a meta-analysis of 31,684 blood samples. For the prioritized blood cell traits, we tested their associations with age and sex in the same UK Biobank sample.

RESULTS: Our PheWAS highlighted multiple blood cell traits to be associated with the COVID-19 risk variant, including monocyte count and percentage (p = 1.07×10^-8, 4.09×10^-13), eosinophil count and percentage (p = 5.73×10^-3, 2.20×10^-3), and neutrophil percentage (p = 3.23×10^-3). The PGS analysis revealed positive associations between the expression of candidate genes and genetically predicted counts of specific blood cells: CCR3 with eosinophil and basophil (p = 5.73×10^-21, 5.08×10^-19); CCR2 with monocytes (p = 2.40×10^-10); and CCR1 with monocytes and neutrophil (p = 1.78×10^-6, 7.17×10^-5). Additionally, we found that almost all examined white blood cell traits are significantly different across age and sex groups.
CONCLUSIONS: Our findings suggest that altered blood cell traits, especially those of monocyte, eosinophil, and neutrophil, may represent the mechanistic links between the genetic locus 3p21.31 and severe COVID-19. They may also underlie the increased risk of severe COVID-19 in older adults and men.

Coronavirus disease 2019 (COVID-19), caused by infection with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), affects individuals differently, with clinical manifestations ranging from asymptomatic infection, to mild flu-like symptoms, to severe respiratory failure (1-4). While some demographic factors and pre-existing conditions, especially older age and male sex, are well-established risk factors for severe COVID-19, the exact mechanisms are still elusive (5, 6). Genetic variation is partly responsible for varying individual responses (7-9). The first genome-wide association study (GWAS) for COVID-19 was published in June 2020, comparing 1,610 severe patients with respiratory failure to 2,205 healthy controls from Italy and Spain. It identified two genetic loci, with the strongest signal at locus 3p21.31 and the other at locus 9q34.2, coinciding with the ABO blood group locus (8). The association at locus 3p21.31 was independently replicated by the COVID-19 Host Genetics Initiative (9). The peak signal at this locus spans multiple chemokine receptor genes (e.g., CCR9, CXCR6, XCR1 and CCR1), and risk variants are associated with the expression of CXCR6, CCR1 and SLC6A20 (8). However, the underlying pathophysiological process is unknown.

Phenome-wide association study (PheWAS) is an unbiased approach that evaluates the associations of a disease-associated genetic variant (e.g., a COVID-19 risk variant) with a wide range of phenotypes (i.e., the phenome).
PheWAS may identify intermediate traits or biomarkers residing in the causal physiological route from the genetic variant to the disease of interest, or it may reveal unexpected comorbidities that indicate shared biological mechanisms (10, 11). Similarly, expression quantitative trait locus (eQTL) analysis for a trait-associated genetic variant across the transcriptome can identify candidate causal genes that are either close (in cis) or remote (in trans) to the variant (12, 13). From the perspective of a candidate gene, insights could be gained into its physiological pathways and downstream functional effects by examining the associations of its expression level with phenotypes across the phenome, or even with the genetically predicted phenotypic status if measured ones are unavailable (13). This project aims to explore the mechanistic link between the genetic locus 3p21.31 and severe COVID-19.

UK Biobank is a large population-based prospective study that recruited more than 500,000 individuals aged 40-70 years between 2006 and 2010. It was approved by the North West Multi-Centre Research Ethics Committee (11/NW/0382) and proper informed consent was obtained. All participants received baseline measurements, donated biological materials, and provided access to their medical records (14). Data for this project were accessed through an approved application to UK Biobank (Application ID: 48818).

Among all UK Biobank participants, only those fulfilling the following criteria were included in our PheWAS analyses: 1) genetic ancestry is Caucasian; 2) included in the genetic principal component analysis; 3) no sex chromosome aneuploidy; 4) no high degree of genetic kinship; and 5) for relative pairs (kinship coefficient > 0.0884), a minimum number of participants were removed so that all those remaining are unrelated. A total of 310,999 unrelated individuals passed this quality control and filtering procedure.
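The relatedness filter in criterion 5 can be illustrated with a minimal sketch. It assumes a hypothetical input of (sample, sample, kinship) tuples and uses a simple greedy heuristic that repeatedly drops the sample with the most remaining relatives. The text does not state which algorithm was actually used, and a greedy pass is not guaranteed to remove the true minimum number of participants.

```python
from collections import defaultdict

KINSHIP_THRESHOLD = 0.0884  # cutoff for relative pairs quoted in the text


def drop_related(pairs, threshold=KINSHIP_THRESHOLD):
    """Return the set of sample IDs to remove so that no related pair remains.

    `pairs` is an iterable of (id_a, id_b, kinship) tuples; this input
    format is an assumption made for illustration.
    """
    # Build an undirected graph that links every pair above the threshold.
    neighbours = defaultdict(set)
    for a, b, kinship in pairs:
        if kinship > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    removed = set()
    while True:
        # Samples that still have at least one related partner in the data.
        candidates = [(len(nbrs), sample) for sample, nbrs in neighbours.items() if nbrs]
        if not candidates:
            break
        # Greedy choice: the sample with the most relatives (ties broken by ID).
        worst = max(candidates)[1]
        for partner in neighbours.pop(worst):
            neighbours[partner].discard(worst)
        removed.add(worst)
    return removed


# Toy example: B is related to both A and C; D and E fall below the cutoff.
example_pairs = [("A", "B", 0.25), ("B", "C", 0.12), ("D", "E", 0.05)]
print(drop_related(example_pairs))  # {'B'}
```

Removing B alone resolves both related pairs, whereas removing A and C would cost two samples, which is the intuition behind dropping the most connected sample first.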
Three sets of phenotypes were examined: binary disease outcomes, blood and urine biomarkers, and blood cell traits. Binary disease status was defined by mapping ICD9/ICD10 diagnosis codes in the hospital episode statistics to phecodes in the PheCODE grouping system (15). To ensure sufficient statistical power, only phecodes with a minimum of 200 cases were retained. A total of 858 phecodes were included in our analysis. For continuous traits, our PheWAS included 30 blood and 4 urine biochemistry markers, and 31 blood cell traits (14, 16, 17). The full list of phenotypes can be found in our additional supporting materials (18).

Statistical association analyses were performed with the R PheWAS package (19). Logistic regression was performed for binary disease outcomes and linear regression for continuous blood and urine biomarkers, adjusting for age, sex, genotyping array, assessment center, and the first 10 genetic principal components. For statistical significance, we applied Bonferroni correction for the total number of phenotypes tested (i.e., p < 0.05 / 923 = 5.42×10^-5), although we note that this is a conservative approach because the phenotypes are not independent.

For blood cell traits whose associations with the severe COVID-19 risk variant are nominally significant (p < 0.05), we examined their associations in two existing studies. The first is GeneATLAS (20), and the second is a meta-analysis of three UK cohorts (21). Summary statistics for this study were retrieved from the IEU OpenGWAS database (22). We note that these two studies were also mainly based on UK Biobank samples and thus should not be considered as independent replications.

To identify genes whose expression levels are associated with the COVID-19 risk variant, we queried eQTL analysis results from GTEx and eQTLGen (13, 23). The GTEx project studies tissue-specific gene expression and regulation in 54 non-diseased tissue sites from about 1,000 individuals (23).
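As a quick arithmetic check of the PheWAS significance threshold described above, the following sketch recomputes the phenotype total and the Bonferroni cutoff from the counts given in the text. The function and variable names are illustrative only; the two p values are the monocyte and eosinophil count results quoted in the abstract.

```python
# Phenotype counts as stated in the text.
N_PHECODES = 858
N_BLOOD_BIOMARKERS = 30
N_URINE_BIOMARKERS = 4
N_BLOOD_CELL_TRAITS = 31
ALPHA = 0.05


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


n_phenotypes = (
    N_PHECODES + N_BLOOD_BIOMARKERS + N_URINE_BIOMARKERS + N_BLOOD_CELL_TRAITS
)
phewas_threshold = bonferroni_threshold(ALPHA, n_phenotypes)

print(n_phenotypes)               # 923
print(f"{phewas_threshold:.2e}")  # 5.42e-05

# p values reported in the abstract for monocyte and eosinophil count.
monocyte_count_p = 1.07e-8
eosinophil_count_p = 5.73e-3
print(monocyte_count_p < phewas_threshold)    # True: phenome-wide significant
print(eosinophil_count_p < phewas_threshold)  # False: nominal (p < 0.05) only
```

This makes explicit why some of the highlighted blood cell traits are described as nominally significant rather than phenome-wide significant.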
The eQTLGen Consortium conducted eQTL meta-analysis in 31,684 samples of blood and peripheral blood mononuclear cells from 37 datasets (13). It also performed polygenic score association analysis to evaluate the associations between the expression level of most genes and the PGS of 1,263 traits (13). The majority of samples in both studies are of European ancestry. In the PGS association analysis, multiple PGS were calculated for each trait with different GWAS, sample ancestry, and p value cutoffs (p = 0.01, 1×10^-3, 1×10^-4, 1×10^-5, 5×10^-8). For blood cell traits, three previous GWAS were used and designated as study 1 (24), study 2 (25), and study 3 (26), respectively. Statistical significance was defined with the false discovery rate approach (FDR < 0.05) (13).

We evaluated the age and sex effects on 31 blood cell traits in the UK Biobank dataset with a linear regression model that included three variables: a continuous variable for age, a categorical variable for sex (female = 0 and male = 1), and a third term for the interaction between sex and age. For statistical significance, we applied Bonferroni correction for the total number of blood cell traits tested (i.e., p < 0.05 / 31 = 1.61×10^-3). For each blood cell trait, we also reported the mean and standard deviation in all samples and in men and women separately.

The severe COVID-19 risk variant examined in this study is rs67959919 (G/A), whose risk allele A has an odds ratio (OR) of 2.07 (95% confidence interval (CI): 1.66-2.56, p = 4.69×10^-11) for severe COVID-19 after adjustment for genetic principal components, age and sex (8). [...] (Table 1, Table 2).
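The age and sex model described in the Methods above (age, sex coded female = 0 and male = 1, and their interaction) can be sketched as an ordinary least-squares fit. This is a minimal pure-Python illustration on synthetic data: the coefficient values, sample size, and noise level are arbitrary assumptions, and the sketch estimates coefficients only, without the p values that the study compared against its Bonferroni threshold.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_age_sex_model(ages, sexes, trait):
    """Least-squares fit of trait ~ intercept + age + sex + age*sex.

    Returns [intercept, beta_age, beta_sex, beta_interaction].
    """
    rows = [[1.0, age, sex, age * sex] for age, sex in zip(ages, sexes)]
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, trait)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data only: these coefficients are arbitrary illustration values.
rng = random.Random(0)
true_beta = (0.40, 0.002, 0.08, 0.001)
ages = [rng.uniform(40, 70) for _ in range(500)]
sexes = [rng.randint(0, 1) for _ in range(500)]
trait = [
    true_beta[0] + true_beta[1] * a + true_beta[2] * s
    + true_beta[3] * a * s + rng.gauss(0, 0.05)
    for a, s in zip(ages, sexes)
]

beta = fit_age_sex_model(ages, sexes, trait)
for name, value in zip(("intercept", "age", "sex", "age:sex"), beta):
    print(f"{name}: {value:.4f}")

# Bonferroni threshold for the 31 blood cell traits, as stated in the text.
blood_cell_threshold = 0.05 / 31
print(f"{blood_cell_threshold:.2e}")  # 1.61e-03
```

With this coding, the age coefficient is the slope in women and the interaction coefficient is the additional slope in men, so a non-zero interaction term means the trait changes with age at different rates in the two sexes.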
Moreover, eQTLGen (13), a meta-analysis for cis-eQTL in 31,684 blood samples, additionally identified the following genes: [...]

Integrating and reconciling association signals in PheWAS, eQTL, and PGS analysis, three candidate blood cell traits and their corresponding candidate genes were prioritized (Figure 3). First, the severe COVID-19 risk allele inhibits the expression of CCR1 and CCR2, subsequently reducing the monocyte count. Second, the risk allele downregulates CCR3 expression and further diminishes the eosinophil count. Third, the risk allele downregulates CCR3 expression and relieves its inhibition on the neutrophil count.

Given the well-established role of older age and male sex as risk factors for severe COVID-19, we examined if blood cell traits are significantly different across age and sex groups by testing their associations in the UK Biobank. The basic statistics for 31 blood cell traits were reported in Supplementary Table 4. Among the 11 traits related to white blood cells, all of them are significantly associated with age (Table 2, p < 1.61×10^-3). All but one, eosinophil count, are significantly different in men and women. The interaction between sex and age is highly significant across all these white blood cell traits. Significant age and sex effects were also observed for traits related to red blood cells and platelets (Supplementary Table 5).

With an unbiased phenome-wide scan approach, our study established two pairs of relationships: 1) associations of the severe COVID-19 risk variant with blood cell traits; and 2) associations between expression levels of candidate target genes and the PGS of blood cell traits. Integrating association signals across multiple analyses prioritizes three blood cell traits, the counts of monocyte, eosinophil and neutrophil, and their candidate target genes, CCR1, CCR2, and CCR3.
Taken together, our results propose blood cell traits as the probable mechanistic link between the risk variant at 3p21.31 and severe COVID-19. We further showed that these blood cell traits are significantly different across age and sex groups, calling for future investigation into their roles in the increased risk of severe COVID-19 in older adults and men.

Hematologic manifestations are common in COVID-19 patients, especially elevated WBC and neutrophil counts but decreased lymphocyte and platelet counts (28-30). Leukocytosis, neutrophilia, lymphopenia, thrombocytopenia, and neutrophil-to-lymphocyte ratio have been repeatedly associated with worse COVID-19 outcomes and could serve as prognostic biomarkers (1, 4, 31-33). Reduced basophil count or percentage was generally observed in patients (34-36). For monocytes, the total number in circulation does not change dramatically in COVID-19 patients, with reports of no change or only a slight increase (29, 36, 37). However, their composition exhibits a pronounced shift, with a significant expansion of inflammatory subsets, which are not typically seen in healthy individuals (6, 29, 30, 37-40). The pattern of eosinophil is less well-established. Some studies observed diminished and even undetectable eosinophil counts (i.e., eosinopenia) in COVID-19 patients (34, 36, 41-45), and it was also shown that eosinophil counts are positively associated with lymphocyte counts in both severe and non-severe cases (45). However, others did not find a significant difference (46), and there is also a report of an expanded eosinophil percentage among the total viable leukocyte CD45+ population (29). These changes in circulating blood cells are closely related to the infiltration and accumulation of lymphocyte, neutrophil, eosinophil, and inflammatory monocyte-macrophage in the lung and other organs, leading to neutrophil extracellular trap and cytokine release syndrome (30, 47-49).
Notably, an immuno-monitoring study of COVID-19 patients from acute to recovery phases observed gradual reduction of neutrophil and replenishment of basophil, eosinophil and non-inflammatory monocyte (50).

Our PheWAS in UK Biobank for the severe COVID-19 risk variant revealed that the risk allele is associated with decreased monocyte count and percentage and decreased eosinophil count and percentage, but with increased neutrophil percentage. These significant associations remained almost unchanged when the white blood cell count was included as an additional covariate. GeneATLAS reported even more significant associations for these relationships, probably due to its different quality control procedures and a larger sample size (20). It also reports suggestive evidence of negative associations between the risk allele and basophil count and percentage. These association directions are consistent with the observed blood cell count changes in COVID-19 patients, as discussed above. Of note, our associations were identified in generally healthy population samples. On the other hand, the vast majority of existing studies measured blood cell counts at hospital admission or during hospitalization, which likely reflect immune responses to SARS-CoV-2 infection. Future studies are warranted to evaluate if before-infection differences in blood cell counts play a role in modulating the risk of developing severe COVID-19, especially in the context of age and sex differences.

Our PGS analysis for the potential target genes of the COVID-19 risk variant further unraveled associations with multiple blood cell counts. It is important to stress that these associations are consistent across analyses with PGS calculated with different GWAS datasets, p value cutoffs, and sample ancestries. Intersecting and reconciling association signals across PheWAS, eQTL, and PGS analysis yielded multiple possible pathways for the COVID-19 risk allele.
Strong and consistent evidence was found on the pathways through monocyte and eosinophil. On the other hand, support for the role of neutrophil is weaker. The association with neutrophil count was not significant in a meta-analysis of three UK cohorts (21). Also, the negative associations between CCR3 expression and PGS of the neutrophil count were only suggestive (p = 0.011). In addition to these three blood cells, basophil may serve as another candidate pathway: the risk allele downregulates CCR3 expression, reduces its stimulatory effect on basophil count, and thus leads to a reduction of basophil. Additional evidence for the potential importance of these candidate genes could be drawn from their cell-type-specific expression patterns (Supplementary Figure 1). CCR3 has highly specific expression in eosinophil and basophil and only slight expression in neutrophil, CCR2 has high expression in basophil and medium expression in classical monocyte, while CCR1 has medium to high expression across all types of granulocytes and monocyte.

Notably, this pathway prioritization analysis utilized eQTL association signals in blood samples, but the regulatory effects could be different across tissues (23). Also, the eQTL analyses were based on generally healthy samples (13, 23) [...] tool to integrate information across all available tissues. First, these additional analyses confirmed the associations between the expression of candidate target genes and blood cell traits. For instance, across all examined tissues, observed monocyte count is associated with genetically predicted CCR1 (p = 2.79×10^-19) and CCR2 expression (p = 1.83×10^-28); observed eosinophil count is associated with genetically predicted CCR3 expression (p = 4.35×10^-4); observed neutrophil percentage is associated with genetically predicted CCR3 expression (p = 3.35×10^-5).
Second, the tissue-level analysis in 49 tissues revealed tissue-specific association patterns. For instance, the association between CCR1 and monocyte count is positive in some tissues (e.g., amygdala, cerebellar hemisphere, and hippocampus), negative in others (e.g., esophagus, lung, and colon), and non-significant in still others (e.g., stomach, heart atrial appendage, and putamen basal ganglia). These tissue-dependent patterns indicate that there are more possible pathways connecting the COVID-19 risk variant, blood cell traits, and COVID-19, in addition to those presented in Figure 3, which are based on associative patterns in blood samples. The relatively large number of significant associations in different tissues makes it challenging to further narrow down to specific tissues. Additionally, since the phenotypes of special interest are blood cell counts and percentages, it would be especially informative to evaluate gene expression in specific subsets of blood cells. However, these are currently unavailable in the GTEx reference dataset. Nevertheless, our additional analysis confirms the link between a major genetic locus of severe COVID-19 and blood cell traits.

The strengths of our study include the unbiased phenome-wide approach at two levels of analysis, [...] Genetically predicted lower counts of white blood cells, myeloid white blood cells, and granulocytes, and higher eosinophil percentage were found to be associated with an increased risk of severe COVID-19. These recent discoveries further strengthen the link from locus 3p21.31 to blood cell traits and then to severe COVID-19.

In conclusion, our phenome-wide association study for the severe COVID-19 risk variant at locus 3p21.31 and its candidate target genes identified altered blood cell traits, especially counts of monocyte, eosinophil, and neutrophil, as the probable mechanistic links between the genetic locus and severe COVID-19.
These blood cell traits, together with their candidate acting genes, CCR1, CCR2 and CCR3, represent compelling and testable hypotheses that call for follow-up studies into their roles in COVID-19 pathogenesis, especially in elevating the risk in older adults and men.

The authors declare that they have no conflict of interest.

This work is supported by the University of Georgia Research Foundation.

[...] traits. Each column corresponds to a gene. Each row corresponds to a polygenic score of a blood cell trait. Row names are organized as the combination of trait name, study number for the GWAS providing summary statistics, and the sample ancestry. EUR refers to European ancestry, while ALL refers to multiancestry. If no ancestry label is present, the study used only European samples. All PGS shown in this figure were calculated with a p-value cutoff of 5×10^-8. Complete association results for PGS calculated with other p-value cutoffs can be found in additional supporting materials. Blood cell traits are categorized into three groups: platelet, red blood cells, and white blood cells. The effects of association, Z-score, are shown as the heatmap. The statistical significance is indicated with "*" (p < 0.05) or "**" (FDR < 0.05).
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
OpenSAFELY: factors associated with COVID-19 death in 17 million patients
Prevalence of Asymptomatic SARS-CoV-2 Infection: A Narrative Review
SARS-CoV-2 and COVID-19 in older adults: what we may expect regarding pathogenesis, immune responses, and outcomes
Genomewide Association Study of Severe Covid-19 with Respiratory Failure
The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic
The challenges, advantages and future of phenome-wide association studies
Systematic identification of trans eQTLs as putative drivers of known disease associations
Unraveling the polygenic architecture of complex traits using blood eQTL meta-analysis. bioRxiv
Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation
Genetics of 38 blood and urine biomarkers in the UK Biobank. bioRxiv
Learning polygenic scores for human blood cell traits
Additional Supporting Materials for
R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment
An atlas of genetic associations in UK Biobank
The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease
Consortium GT, Laboratory DA, Coordinating Center -Analysis Working G, et al. Genetic effects on gene expression across human tissues
Large-Scale Exome-wide Association Analysis Identifies Loci for White Blood Cell Traits and Pleiotropy with Immune-Mediated Diseases
LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants
Extrapulmonary manifestations of COVID-19: Radiologic and clinical overview
Comprehensive mapping of immune perturbations associated with severe COVID-19
Immunology of COVID-19: Current State of the Science
Neutrophil-to-lymphocyte ratio predicts critical illness patients with 2019 coronavirus disease in the early stage
Association between platelet parameters and mortality in coronavirus disease 2019: Retrospective cohort study
Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: A meta-analysis
The underlying changes and predicting role of peripheral blood inflammatory cells in severe COVID-19 patients: A sentinel?
Epidemiologic and clinical characteristics of 91 hospitalized patients with COVID-19 in Zhejiang, China: a retrospective, multi-centre case series
Dysregulation of Immune Response in Patients With Coronavirus
COVID-19 infection induces readily detectable morphological and inflammation-related phenotypic changes in peripheral blood monocytes, the severity of which correlate with patient outcome. medRxiv
Pathogenic T-cells and inflammatory monocytes incite inflammatory storms in severe COVID-19 patients
Pathological inflammation in patients with COVID-19: a key role for monocytes and macrophages
Correlation Analysis Between Disease Severity and Inflammation-related Parameters in Patients with COVID-19 Pneumonia
Eosinopenia and COVID-19
Eosinopenia and elevated C-reactive protein facilitate triage of COVID-19 patients in fever clinic: a retrospective case-control study
The role of peripheral blood eosinophil counts in COVID-19 patients
Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan
Eosinophil count in severe coronavirus disease 2019
Targeting potential drivers of COVID-19: Neutrophil extracellular traps
Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19
Systems-Level Immunomonitoring from Acute to Recovery Phase of Severe COVID-19
Bayesian test for colocalisation between pairs of genetic association studies using summary statistics

We would like to thank UK Biobank participants and administrators for data access. We also want to thank all other Ye lab members for helpful discussions.

JZ and KY initiated and designed this study. JZ and YS performed data analysis and prepared visualizations. JZ, WH, and KY interpreted the results. JZ and KY wrote the first draft of the manuscript. All authors read, edited and approved the final version.