key: cord-306924-dw35dlx3
authors: Wohlers, Inken; Calonga-Solís, Verónica; Jobst, Jan-Niklas; Busch, Hauke
title: COVID-19 risk haplogroups differ between populations, deviate from Neanderthal haplotypes and compromise risk assessment in non-Europeans
date: 2020-11-03
journal: bioRxiv
DOI: 10.1101/2020.11.02.365551
sha:
doc_id: 306924
cord_uid: dw35dlx3

Recent genome-wide association studies (GWAS) have identified genetic risk factors for developing severe COVID-19 symptoms. The studies reported a 1 bp insertion, rs11385942, on chromosome 3 [1] and two further single nucleotide variants (SNVs), rs35044562 and rs67959919 [2]; all three are correlated with each other. Zeberg and Pääbo [3] subsequently traced them back to Neanderthal origin. They found that a 49.4 kb genomic region including the risk allele of rs35044562 is inherited from Neanderthals of Vindija in Croatia. Here we add a differently focused evaluation of this major genetic risk factor to these recent analyses. We show that (i) COVID-19-related genetic factors of Neanderthals deviate from those of modern humans and that (ii) they differ among world-wide human populations, which compromises risk prediction in non-Europeans. Currently, caution is thus advised in the genetic risk assessment of non-Europeans during this world-wide COVID-19 pandemic.

In general, GWAS relate genotypes to phenotypes such as disease susceptibility and severity. However, association does not imply causality. A GWAS association signal typically comprises many correlated variants, so a first step towards pinpointing the underlying causal variant(s) is so-called fine-mapping. Ultimately, fine-mapping must be followed by experimental validation to identify causal variant(s) and mechanisms. While GWAS are based on cohort data, a personal risk can nonetheless be assessed via associated variants as proxies for causal variants.
For this, the cohort's genetic linkage patterns need to be representative of the [...] Neanderthal study, with an overall maximum probability to include a causal variant of 61%. However, as two positions carry protective alleles, the risk probability of the previously assessed Vindija Neanderthal haplotype is only 52%.

Our results presented here complement the haplotype-based assessment of Zeberg and Pääbo. We use the same 1000 Genomes [5] data as in the original study, but with three important differences: (i) We investigate haplotypes within a larger genomic [...] frequency of 10% in the whole dataset and variable frequencies from 1 to 31% in different continental populations (Fig. 1a). Eight haplogroups, H1-H8, have counts higher than 10, and the most common is the protective haplotype H1. Risk haplotypes H2-H8 tend to differ between continental populations (Fig. 1a). For them, COVID-19 genetic risk probability varies substantially, between 8 and 96%. The high-risk haplogroups H2, H3 and H8 differ from each other by one or two alleles, and differ from the low-risk haplogroups H5, H6 and H7, all of which are similar to the protective haplogroup H1 (Fig. 1b). However, individuals carrying a risk haplogroup very dissimilar from Neanderthal haplotypes may still carry a causal variant (Fig. 1c); this holds particularly true for Africans with haplogroups H5 or H6 (19% or 11% probability) and for Asians with haplogroup H7 (8% probability). Haplogroup H3 has the highest risk probability and is the most common risk haplogroup in Europeans and Americans (Fig. 1b).

All human risk haplogroups differ from Neanderthal haplotypes (Fig. 1c) [...] If this variant were causal (2% probability), using lead variants such as rs11385942 or rs35044562 would incorrectly classify individuals carrying these haplogroups as being at risk. This applies to few Europeans, but mostly to non-Europeans.
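The risk probabilities quoted above (a 61% maximum and 52% for the Vindija Neanderthal haplotype) can be illustrated with a minimal sketch. It assumes that a haplotype's risk probability is the sum of per-variant causal probabilities, such as those produced by fine-mapping, over the positions where the haplotype carries the risk allele. This is one plausible reading of those figures, not the authors' documented procedure. The function name, the variant labels and the per-variant values are hypothetical placeholders, chosen only so that the totals match the two percentages in the text.

```python
from typing import Dict


def haplotype_risk_probability(
    causal_prob: Dict[str, float],
    carries_risk_allele: Dict[str, bool],
) -> float:
    """Sum per-variant causal probabilities over the positions where the
    haplotype carries the risk allele (assumed definition, see lead-in)."""
    return sum(
        prob
        for variant, prob in causal_prob.items()
        if carries_risk_allele.get(variant, False)
    )


# Hypothetical per-variant causal probabilities (placeholders, not from the paper).
causal_prob = {"var_a": 0.30, "var_b": 0.22, "var_c": 0.05, "var_d": 0.04}

# Upper bound: a haplotype carrying the risk allele at every position.
all_risk = {variant: True for variant in causal_prob}

# A haplotype carrying protective alleles at two of the positions.
two_protective = dict(all_risk, var_c=False, var_d=False)

max_probability = haplotype_risk_probability(causal_prob, all_risk)
haplotype_probability = haplotype_risk_probability(causal_prob, two_protective)

print(f"maximum: {max_probability:.2f}, haplotype: {haplotype_probability:.2f}")
```

Under this assumption, each protective allele lowers the haplotype's risk probability by exactly the causal probability assigned to that position, which is why haplogroups that differ at only a few positions can still differ markedly in risk.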
In conclusion, we find that classification into high and low COVID-19 risk is extremely error-prone in non-European populations if this assessment is based on currently known European risk variants and probabilities. The risk haplogroup diversity observed across populations thus compromises risk assessment in non-Europeans. This situation is currently improved by performing ancestry-matched GWAS in non-[...] using complementary, e.g. experimental approaches will help in the process. These diverse systems genetics efforts will eventually converge on genetic causes and corresponding molecular mechanisms that explain non-environmental variation in COVID-19 severity.

1. Severe Covid-19 GWAS Group et al. Genomewide Association Study of Severe Covid-19 with Respiratory Failure.
2. COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic.
3. The major genetic risk factor for severe COVID-19 is inherited from Neanderthals.
4. FINEMAP: efficient variable selection using summary data from genome-wide association studies.
5. 1000 Genomes Project Consortium et al. A global reference for human genetic variation.

We thank the COVID-19 Host Genetics Initiative for publicly releasing GWAS summary statistics. IW and HB acknowledge funding by the Deutsche