key: cord-0743902-p2eu953v
authors: Clayton, G. L.; Goncalves Soares, A.; Goulding, N.; Borges, M. C. G.; Holmes, M.; Davey Smith, G.; Tilling, K.; Lawlor, D. A.; Carter, A. R.
title: The relationship between BMI and COVID-19: exploring misclassification and selection bias in a two-sample Mendelian randomisation study
date: 2022-03-05
DOI: 10.1101/2022.03.03.22271836
sha: 0516d4b8aab98fde26ce348b84b363fd13cee7e0
doc_id: 743902
cord_uid: p2eu953v

Objective: To use the example of the effect of body mass index (BMI) on COVID-19 susceptibility and severity to illustrate methods to explore potential selection and misclassification bias in Mendelian randomisation (MR) of COVID-19 determinants.
Design: Two-sample MR analysis.
Setting: Summary statistics from the Genetic Investigation of ANthropometric Traits (GIANT) and COVID-19 Host Genetics Initiative (HGI) consortia.
Participants: 681,275 participants in GIANT and more than 2.5 million people from the COVID-19 HGI consortia.
Exposure: Genetically instrumented BMI.
Main outcome measures: Seven case/control definitions for SARS-CoV-2 infection and COVID-19 severity: very severe respiratory confirmed COVID-19 vs not hospitalised COVID-19 (A1) and vs population (those who were never tested, tested negative or had unknown testing status) (A2); hospitalised COVID-19 vs not hospitalised COVID-19 (B1) and vs population (B2); COVID-19 vs lab/self-reported negative (C1) and vs population (C2); and predicted COVID-19 from self-reported symptoms vs predicted or self-reported non-COVID-19 (D1).
Results: With the exception of the A1 comparison, genetically higher BMI was associated with higher odds of COVID-19 in all comparison groups, with odds ratios (OR) ranging from 1.11 (95%CI: 0.94, 1.32) for D1 to 1.57 (95%CI: 1.39, 1.78) for A2.
As a method to assess selection bias, we carried out a 'no-relevance' analysis in which COVID-19 was treated as the exposure, even though it occurred after BMI was measured, and found no strong evidence of an effect of COVID-19 on BMI. We found evidence of genetic correlation between COVID-19 outcomes and potential predictors of selection determined a priori (smoking, education, and income), which could indicate either selection bias or a causal pathway to infection. Results from multivariable MR adjusting for these predictors of selection were similar to those of the main analysis, suggesting the latter.
Conclusions: We have proposed a set of analyses for exploring potential selection and misclassification bias in MR studies of risk factors for SARS-CoV-2 infection and COVID-19, and demonstrated them with an illustrative example. Although selection by socioeconomic position and related traits is present, MR results are not substantially affected by selection/misclassification bias in our example. We recommend that the methods we demonstrate, for which we provide detailed analytic code, be used in MR studies assessing risk factors for COVID-19, and in other MR studies where such biases are likely in the available data.

Mendelian randomisation (MR) has been used to explore risk factors for disease occurrence and prognosis, including for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (COVID-19; outcomes hereafter referred to as COVID-19 susceptibility/severity) (1, 2). MR can be implemented as an instrumental variable (IV) analysis, which uses genetic variants related to potential exposures, or modifiable risk factors, to explore their causal effects (3, 4). Because genetic variants are randomly allocated at conception and are not modified throughout the life course, MR is less likely than conventional multivariable regression to be biased by confounding or by prevalent disease (i.e., reverse causality) (3-5).
However, like other study designs, MR can be biased, invalidating the inferences made (4, 6). Whilst horizontal pleiotropy (where genetic variants influence the outcome through pathways not including the risk factor of interest) and confounding by population stratification are widely discussed (7-9), selection and misclassification bias are less commonly discussed, particularly in applied papers (6, 10, 11).

Selection bias is a systematic error that arises when a study sample differs from the target population, e.g., when there is dropout, subgroup analysis or missing data (12). In the context of COVID-19, selection bias can arise from differential response to COVID-19 study questionnaires, differential risk of being exposed to SARS-CoV-2 infection, and differences in who receives a SARS-CoV-2 test or who is admitted to hospital (13, 14). For example, if a person never had a SARS-CoV-2 test, their COVID-19 status is unknown, and they would not be selected into any analysis that depends on having a test (see Table 1 for more details). Selection bias can occur in genome-wide association studies (GWAS) when participants selected into the GWAS are not a random sample of the […] when universal testing for SARS-CoV-2 infection was not available. In the UK, for example, at the start of the COVID-19 pandemic, people did not receive a SARS-CoV-2 test unless they were symptomatic or working in front-line services. As such, comparing test-positive COVID-19 participants to population controls (i.e. anyone who was not a case, such as those who tested negative or whose SARS-CoV-2 infection status was unknown) could result in the controls including individuals who were infected but were asymptomatic and/or had non-severe symptoms. This would lead to misclassification bias and a violation of the MR exclusion restriction assumption (11) (see Table 1 for more details).
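To make the instrumental variable idea above concrete: with a single genetic variant, the simplest MR estimate is the Wald ratio, the SNP-outcome association divided by the SNP-exposure association. The Python sketch below is illustrative only; the function name and the summary statistics are hypothetical, and the standard error uses the common first-order approximation that ignores uncertainty in the SNP-exposure association.

```python
import math
from typing import Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> Tuple[float, float]:
    """Single-SNP Wald ratio estimate of the exposure -> outcome effect.

    beta_exposure: per-allele SNP association with the exposure (e.g. SD of BMI)
    beta_outcome:  per-allele SNP association with the outcome (e.g. log odds of COVID-19)
    se_outcome:    standard error of beta_outcome
    Returns (estimate, first-order standard error).
    """
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error,
    # which is reasonable when the exposure GWAS is large and precise.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical summary statistics for one SNP (not values from this study).
est, se = wald_ratio(beta_exposure=0.05, beta_outcome=0.02, se_outcome=0.01)
odds_ratio = math.exp(est)  # OR per unit of the exposure
ci = (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))
print(f"OR {odds_ratio:.2f} (95% CI {ci[0]:.2f}, {ci[1]:.2f})")
```

With a binary outcome analysed on the log-odds scale, exponentiating the ratio gives an odds ratio per unit of the exposure (per 1 SD of BMI if the SNP-exposure association is expressed in SD units).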
There have been multiple MR studies investigating the effect of body mass index (BMI) on COVID-19 susceptibility/severity (17-23), most concluding that genetically predicted higher BMI causally increases the risk of COVID-19 susceptibility/severity. These studies are summarised in Supplementary Table 1. Most studies used population controls, and beyond describing this as a limitation, very few have attempted to explore selection and misclassification bias, the focus of this paper.

Several methods could be used to assess the potential influence of selection and misclassification bias, such as comparing results across different case/control comparison groups and 'no-relevance' control analyses. In the latter, MR analyses are carried out in a sub-population of the 'real/relevant' study population in which the genetic instrument is not expected to relate to the risk factor of interest. This approach has previously been used in MR studies to assess bias due to horizontal pleiotropy (24-26).

In this study, we use the example of the effect of BMI on COVID-19 susceptibility/severity to demonstrate methods to explore selection and misclassification bias in two-sample MR studies of risk factors for […]

medRxiv preprint doi: https://doi.org/10.1101/2022.03.03.22271836; this version posted March 5, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

[…] included in this release and we used the next most recent data for these (groups A1, C1 and D1). Although in October 2020 the COVID-19 HGI recommended that groups A1, C1 and D1 should not be considered (no further explanation is given on the website), we used these groups in this study because the aim was to assess selection and misclassification bias, about which the different case/control definitions can be informative.
UK Biobank contributed substantially to the number of participants included in some COVID-19 GWAS (e.g., 42% of the participants for A2), and it also contributed to the BMI GWAS we used for our main two-sample MR analyses. We therefore undertook sensitivity analyses using a previous GWAS of BMI that did not include UK Biobank (see below under statistical analyses).

GWAS-significant SNPs (p < 5×10⁻⁸) were selected as candidate instrumental variables for BMI. Proxy SNPs were used for groups A1, C1 and D1: where a BMI SNP-outcome association was not available in the COVID-19 GWAS, we sought to replace it with a proxy SNP in linkage disequilibrium (LD) with it (carried out in MR-Base (32)), which yielded one proxy SNP each for A1, C1 and D1 (Supplementary Figure 2). LD clumping was performed at an r² threshold of 0.001 using the 1000 Genomes European ancestry reference panel. Up to 507 independent SNPs for BMI were used.

For the 'no-relevance' study, GWAS-significant SNPs (p < 5×10⁻⁸) were selected as candidate instrumental variables for COVID-19, and proxy SNPs were used as defined above. For groups A1, C1 and D1, a threshold of p < 5×10⁻⁶ was used to select SNPs because one or fewer SNPs reached the conventional genome-wide significance level. LD clumping was performed as described above.

To ensure harmonisation, we aligned each genetic association for exposure and outcome on the same effect allele. Where possible, we used effect allele frequency (EAF) to ensure palindromic SNPs were correctly aligned.

We used inverse variance weighting (IVW) with multiplicative random effects to estimate the effect of BMI on COVID-19 susceptibility/severity (33). Standard errors (SE) were corrected to account for any […]
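The harmonisation and IVW steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code (their scripts are on the GitHub page given at the end): the `SnpPair` container and the function names are hypothetical; alleles are assumed to be single nucleotides (A/C/G/T); palindromic SNPs with a minor allele frequency above 0.42 in either GWAS are treated as ambiguous and dropped, a conventional cut-off that this paper does not state; and the multiplicative random-effects SE is obtained by inflating the fixed-effect SE by √(Q/(k−1)) when that ratio exceeds 1, leaving it unchanged otherwise. Proxy lookup and LD clumping are not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass
class SnpPair:
    """Hypothetical container for one SNP's exposure and outcome summary statistics."""
    beta_x: float    # SNP-exposure association
    effect_x: str    # effect allele in the exposure GWAS
    other_x: str
    eaf_x: float     # effect allele frequency in the exposure GWAS
    beta_y: float    # SNP-outcome association
    se_y: float
    effect_y: str    # effect allele in the outcome GWAS
    other_y: str
    eaf_y: float


def aligned_outcome_beta(p: SnpPair, maf_cutoff: float = 0.42) -> Optional[float]:
    """Return beta_y per copy of the exposure effect allele, or None to drop the SNP."""
    if COMPLEMENT[p.effect_x] == p.other_x:
        # Palindromic (A/T or C/G) SNP: the strand cannot be inferred from the alleles alone.
        if {p.effect_y, p.other_y} != {p.effect_x, p.other_x}:
            return None
        if min(p.eaf_x, 1 - p.eaf_x) > maf_cutoff or min(p.eaf_y, 1 - p.eaf_y) > maf_cutoff:
            return None  # allele frequency too close to 0.5 to resolve the strand
        # EAFs on the same side of 0.5 imply the same effect allele; otherwise flip.
        same_allele = (p.eaf_x < 0.5) == (p.eaf_y < 0.5)
        return p.beta_y if same_allele else -p.beta_y
    # Non-palindromic: try the outcome alleles as reported, then on the other strand.
    for e, o in ((p.effect_y, p.other_y), (COMPLEMENT[p.effect_y], COMPLEMENT[p.other_y])):
        if (e, o) == (p.effect_x, p.other_x):
            return p.beta_y    # same effect allele
        if (e, o) == (p.other_x, p.effect_x):
            return -p.beta_y   # effect and other alleles swapped: flip the sign
    return None                # alleles do not match: drop the SNP


def ivw_mre(beta_x: List[float], beta_y: List[float],
            se_y: List[float]) -> Tuple[float, float, float]:
    """IVW estimate with a multiplicative random-effects SE; also returns Cochran's Q."""
    w = [1 / s ** 2 for s in se_y]
    den = sum(wi * x * x for wi, x in zip(w, beta_x))
    # Weighted regression of beta_y on beta_x through the origin.
    beta = sum(wi * x * y for wi, x, y in zip(w, beta_x, beta_y)) / den
    # Cochran's Q: weighted squared residuals around the pooled estimate.
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, beta_x, beta_y))
    k = len(beta_x)
    phi = q / (k - 1) if k > 1 else 1.0
    # Inflate the fixed-effect SE only when there is over-dispersion (phi > 1).
    se = math.sqrt(1 / den) * math.sqrt(max(1.0, phi))
    return beta, se, q


# Hypothetical SNPs (illustrative numbers, not data from this study).
snps = [
    SnpPair(0.03, "A", "G", 0.30, 0.012, 0.004, "G", "A", 0.70),  # effect alleles swapped
    SnpPair(0.05, "C", "T", 0.10, 0.020, 0.005, "C", "T", 0.11),  # already aligned
    SnpPair(0.04, "A", "T", 0.20, 0.015, 0.005, "A", "T", 0.81),  # palindromic; EAF implies a flip
]
kept = []
for snp in snps:
    b = aligned_outcome_beta(snp)
    if b is not None:
        kept.append((snp.beta_x, b, snp.se_y))
bx, by, se = (list(col) for col in zip(*kept))
beta, se_beta, q = ivw_mre(bx, by, se)
print(f"IVW log-OR per unit exposure: {beta:.3f} (SE {se_beta:.3f}); Cochran's Q = {q:.2f}")
```

Harmonising before estimation matters because a single mis-aligned SNP contributes a ratio estimate with the wrong sign, which both biases the IVW estimate and inflates Cochran's Q.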
Sensitivity analyses and methods to explore the plausibility of MR assumptions

We conducted a series of sensitivity analyses to assess the validity of the core MR assumptions (5). To check the relevance assumption and weak instrument bias, we estimated the mean F-statistic and total R² for the combined association of all genetic variants with BMI, based on the main analysis using the most recent GIANT GWAS (i.e. including UK Biobank) for each case/control group (34, 35). For the 'no-relevance' analysis, we calculated the pseudo-F-statistic and R² both with (36) and without (37) assuming a prevalence for each case/control comparison based on external information (38). We checked for between-SNP heterogeneity in the main IVW analyses using Cochran's Q test and undertook sensitivity analyses using MR-Egger (33) and weighted-median (39) methods, which, when certain conditions are met, are unbiased in the presence of unbalanced horizontal pleiotropy. Single-SNP and leave-one-out analyses were also conducted. As UK Biobank contributed to both the COVID-19 HGI and the most recent GIANT GWAS for BMI, we also repeated the main analysis using summary results from a previous GIANT BMI GWAS that does not include UK Biobank, as a sensitivity analysis to explore any bias due to sample overlap. These analyses used 79 independent SNPs, and their results were compared with our main analysis results.

To assess bias caused by population stratification, i.e. confounding by population structure (see Table 2), we used skin tanning as a negative control outcome, comparing the association observed between BMI and COVID-19 with the association observed between BMI and skin tanning (7). Here, a negative control outcome is a variable that is not expected to be influenced by the exposure but shares the same underlying population structure as the outcome.
Evidence of an association when using the negative control outcome could indicate bias from population stratification. For this analysis, we also used the BMI GWAS not including UK Biobank. We similarly explored bias in the COVID-19 GWAS by using COVID-19 as the exposure and skin tanning as the outcome.

We compared different case/control definitions to explore potential misclassification or selection bias. For example, where the case definition remains the same, e.g. for C1 (COVID-19 positive vs lab/self-reported negative) and C2 (COVID-19 positive vs population), only those tested for SARS-CoV-2 (or those who self-reported COVID-19 status) could be included as cases, and selection bias can be present because individual characteristics are related to having a test or self-reporting COVID-19 status, and therefore to being in the analysis. However, when we compare COVID-19 positive cases to population controls (C2), we assume that those who have not been tested are negative (and hence classify them as controls); if individual characteristics are related to getting tested, this leads to differential misclassification. As such, if effect estimates are similar across different control definitions, this suggests that selection or misclassification bias may not be present; if the point estimates differ, selection or misclassification bias may be present. More details are given in Table 1 and Table 2.

We used COVID-19 as the exposure and BMI as the outcome in a 'no-relevance' study to determine whether the genetic instruments for COVID-19 were related to BMI.
Because COVID-19 could not have influenced BMI assessed before 2019 (i.e., before COVID-19 existed), we would plausibly expect null findings. If any effects of COVID-19 on BMI are observed, this suggests the presence of selection bias, or of other biases such as horizontal pleiotropy, and indicates that the effects of BMI on COVID-19 (main analysis) may be similarly biased.

It is possible that some of the COVID-19 SNPs identify factors associated with getting tested rather than COVID-19 per se (40). For example, being a non-smoker and being more highly educated are known predictors of getting a test (14). Using linkage disequilibrium score regression (LDSC), we estimated the genetic correlation between the COVID-19 outcomes and potential predictors of testing (smoking, education and income) (40). These potential predictors of selection were determined a priori from estimates in the previous literature (13, 14, 41) and the availability of GWAS summary statistics.

We then used multivariable MR (MVMR) to adjust for these predictors of being tested, to estimate a direct effect of BMI on COVID-19 independent of potential influences on being tested (42). Evidence of a direct effect that differs from the total effect in the main IVW analysis would support the presence of selection bias.

To identify whether the effects of misclassification and selection bias may have changed over the course of the pandemic (given that more studies and more participants were included in more recent GWAS), we […]
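The MVMR adjustment described above can be sketched as a weighted regression of the SNP-outcome associations on the SNP-exposure associations for two exposures at once, with no intercept, as in IVW. This Python sketch is a simplified, hypothetical illustration rather than the implementation used here: it handles a single adjustment variable (for example, one predictor of being tested), whereas this study adjusted for smoking, education and income; it assumes every SNP has associations available for both exposures; and it floors the residual variance at 1, mirroring the multiplicative random-effects convention.

```python
import math
from typing import List, Tuple


def mvmr_ivw_two(bx1: List[float], bx2: List[float], by: List[float],
                 se_y: List[float]) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """IVW multivariable MR with two exposures (no intercept).

    Returns ((estimate_1, se_1), (estimate_2, se_2)); estimate_1 is the direct effect
    of exposure 1 holding exposure 2 fixed.
    """
    w = [1 / s ** 2 for s in se_y]
    # Elements of X'WX and X'Wy for the two-column design matrix [bx1, bx2].
    a11 = sum(wi * x1 * x1 for wi, x1 in zip(w, bx1))
    a12 = sum(wi * x1 * x2 for wi, x1, x2 in zip(w, bx1, bx2))
    a22 = sum(wi * x2 * x2 for wi, x2 in zip(w, bx2))
    c1 = sum(wi * x1 * y for wi, x1, y in zip(w, bx1, by))
    c2 = sum(wi * x2 * y for wi, x2, y in zip(w, bx2, by))
    det = a11 * a22 - a12 * a12  # close to zero if the SNP associations are collinear
    b1 = (a22 * c1 - a12 * c2) / det
    b2 = (a11 * c2 - a12 * c1) / det
    # Residual variance of the weighted fit, floored at 1.
    k = len(by)
    rss = sum(wi * (y - b1 * x1 - b2 * x2) ** 2
              for wi, x1, x2, y in zip(w, bx1, bx2, by))
    scale = max(1.0, rss / (k - 2)) if k > 2 else 1.0
    se1 = math.sqrt(scale * a22 / det)
    se2 = math.sqrt(scale * a11 / det)
    return (b1, se1), (b2, se2)


# Hypothetical SNP associations with BMI (exposure 1), a predictor of testing
# (exposure 2) and COVID-19 (outcome, log odds); illustrative numbers only.
bmi = [0.10, 0.20, 0.30, 0.40]
predictor = [0.30, 0.10, 0.40, 0.20]
covid = [0.5 * a + 0.2 * b for a, b in zip(bmi, predictor)]
(direct, se_direct), (pred_eff, se_pred) = mvmr_ivw_two(bmi, predictor, covid, [0.01] * 4)
print(f"Direct effect of BMI: {direct:.3f} (SE {se_direct:.3f})")
```

Comparing `direct` with the univariable IVW estimate is the comparison the text describes; with more adjustment variables the same regression simply gains columns and is usually solved with a matrix library.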
From the 507 SNPs independently associated with BMI, the number of SNPs remaining after data harmonisation ranged from 487 (group A2) to 490 (group D1) (Supplementary Figure 2). The mean F-statistic for the combined association of these SNPs with BMI was 72.8, and together they explained 5% of the variation in BMI.

In IVW estimates, genetically higher BMI was associated with higher odds of COVID-19 for all groups assessed except A1 (very severe COVID-19 vs not hospitalised COVID-19; OR 0.56 per 1 SD higher BMI, 95%CI 0.22, 1.41), which had the smallest sample size (Ncases = 269; Ncontrols = 688). For the remainder, odds ratios ranged from 1.14 (95%CI 1.11, 1.18) for group C2 (COVID-19 vs population) to 1.57 (95%CI 1.39, 1.78) for group A2 (very severe COVID-19 vs population) (Figure 2(a)). Results were consistent when the same case outcome was compared with different control groups (Figure 2(a)). Sensitivity analyses using MR-Egger and weighted-median methods gave results consistent with the main IVW results, though with the expected wider confidence intervals (Supplementary Figure 3(a)).

Analyses to assess whether potential misclassification and selection bias explain some of the identified effects in our illustrative analyses

The number of SNPs (including proxies) available after harmonisation varied from two (group B1) to eight (group B2), and the mean pseudo-F-statistics varied from 21.7 (group D1) to 134.2 (group C2) (Supplementary Table 6).
The percentage of variation in COVID-19 explained by the combined SNPs included after harmonisation varied from 0.02% (group C2) to 10.6% (group A1).

For all seven case/control groups in the 'no-relevance' analyses, the effect estimates of genetic liability to COVID-19 on BMI measured pre-pandemic were all close to the null (Figure 2(b)), with consistent results when using MR-Egger and weighted-median methods (Supplementary Figure 3(b)). There was some evidence of between-SNP heterogeneity in groups A1, A2 and B2 for the associations between genetic liability to COVID-19 and BMI (Supplementary Table 4 […] Table 5).

There was evidence from LDSC analyses of genetic correlation between most groups of COVID-19 outcomes and BMI, smoking, education, and income, except for groups C1 and D1. The correlations of COVID-19 outcomes with BMI and smoking were positive, and those with education and income were negative. Correlations were typically modest to weak (ranging from -0.34 for group C1 and education to 0.35 for group A2 and BMI) (Table 3). The genetic correlation with each predictor of getting tested (selection) was broadly similar across the different case/control comparison groups.

[…] time (OR: 1.41; 95% CI: 1.21 to 1.63 based on data from release 3 compared with OR: 1.14; 95% CI: 1.11 to 1.18 based on data from release 6). With the exception of the negative association between COVID-19 severity (as the exposure) and BMI based on COVID-19 data collected up to July 2020 (mean difference: -0.010, 95% CI -0.016, -0.003), the 'no-relevance' analyses were consistent with the null throughout all data releases (Figure 4(b)). As expected, because GWAS sample sizes increased, all results became more precisely estimated over time.
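As a rough arithmetic check on the instrument-strength figures reported above for BMI (mean F-statistic 72.8, with the SNPs explaining about 5% of BMI variance), the standard overall F-statistic for k instruments explaining a proportion R² of variance in n people can be computed directly. This is a hedged sketch: using the GIANT sample size (681,275) as n and about 490 SNPs as k are simplifying assumptions (per-SNP sample sizes vary), R² is rounded, and the paper reports a mean F-statistic (34, 35), which is not the same quantity, so agreement is only expected in order of magnitude. The helper names are hypothetical.

```python
from typing import Sequence


def overall_f(r2: float, n: int, k: int) -> float:
    """F-statistic for k instruments jointly explaining a proportion r2 of variance in n people."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def mean_snp_f(beta_x: Sequence[float], se_x: Sequence[float]) -> float:
    """Mean of per-SNP F-statistics, each approximated by (beta / se) ** 2."""
    return sum((b / s) ** 2 for b, s in zip(beta_x, se_x)) / len(beta_x)


# Rough check using figures quoted in this paper: R^2 of about 5%, the GIANT sample
# size (681,275) as n, and about 490 SNPs as k.
print(f"Approximate overall F: {overall_f(0.05, 681_275, 490):.1f}")
```

With these inputs the overall F comes out at about 73, of the same order as the reported mean of 72.8 and well above the commonly used rule-of-thumb threshold of 10; given the approximations, the closeness should not be over-read.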
To our knowledge, this is the first MR study of COVID-19 outcomes to take a systematic approach to exploring selection and misclassification bias, including a novel application of 'no-relevance' analyses and the exploitation of time-varying associations. In our illustrative example and in the 'no-relevance' analyses, we conducted appropriate sensitivity analyses to assess the plausibility of the MR assumptions, including comparing estimates from a BMI GWAS without UK Biobank participants and using a negative control design to explore possible bias due to population stratification (7). The summary statistics used for COVID-19 are from the largest available GWAS, incorporating many global studies with differing selection processes (31). The 'no-relevance' analysis results may have been biased by weak instruments, as there were far fewer SNPs for any of the COVID-19 instruments than for BMI (the exposure in our 'real' illustrative example). Weak instrument bias in two-sample MR is expected to bias estimates towards the null. However, the F-statistics for the 'no-relevance' COVID-19 instruments varied from 22 to 134, suggesting that weak instrument bias is unlikely. For some of the COVID-19 exposures in the 'no-relevance' analyses we had to relax the p-value threshold in order to obtain genetic instruments, which could increase the likelihood of horizontal pleiotropy. Consistent results from MR-Egger, weighted-median and 'no-relevance' analyses, including the analysis over time as sample sizes increased, suggest that any weak instrument or horizontal pleiotropy bias is likely to have been small.
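The MR-Egger and weighted-median estimators relied on above as pleiotropy-robust checks can be sketched in pure Python. This is a minimal illustration, not the code used in this study: the function names are hypothetical; MR-Egger is fitted as a weighted regression with an intercept after orienting SNPs so that each SNP-exposure association is positive, with the residual variance floored at 1; and the weighted median uses first-order weights and returns a point estimate only (its standard error is usually obtained by bootstrapping, which is omitted here).

```python
import math
from typing import List, Tuple


def mr_egger(beta_x: List[float], beta_y: List[float],
             se_y: List[float]) -> Tuple[float, float, float, float]:
    """MR-Egger: weighted regression of beta_y on beta_x with an intercept (needs >= 3 SNPs).

    Returns (slope, se_slope, intercept, se_intercept). The slope is the causal estimate
    under the InSIDE assumption; the intercept estimates average directional pleiotropy.
    """
    # Orient SNPs so every SNP-exposure association is positive (flip both signs if not).
    x = [abs(bx) for bx in beta_x]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_x, beta_y)]
    w = [1 / s ** 2 for s in se_y]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    # Residual variance from the weighted fit, floored at 1 (multiplicative random effects).
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    scale = max(1.0, rss / (len(x) - 2))
    return slope, math.sqrt(scale * sw / det), intercept, math.sqrt(scale * swxx / det)


def weighted_median(beta_x: List[float], beta_y: List[float], se_y: List[float]) -> float:
    """Weighted median of the per-SNP Wald ratios (point estimate only).

    Weights are first-order inverse variances of the ratios, beta_x**2 / se_y**2.
    """
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y)]
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    b = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    # Standardised cumulative weight at the midpoint of each SNP's weight.
    p, cum = [], 0.0
    for wi in w:
        cum += wi
        p.append((cum - wi / 2) / total)
    # Interpolate between the two ratios whose cumulative weights straddle 0.5.
    for j in range(len(b) - 1):
        if p[j] < 0.5 <= p[j + 1]:
            return b[j] + (b[j + 1] - b[j]) * (0.5 - p[j]) / (p[j + 1] - p[j])
    return b[0]  # only reached with a single SNP


# Hypothetical data: four SNPs consistent with an effect of 0.5 plus one outlying SNP.
bx = [0.10, 0.20, 0.30, 0.40, 0.50]
by = [0.30, 0.10, 0.15, 0.20, 0.25]
se = [0.01] * 5
print("MR-Egger slope, SE, intercept, SE:", [round(v, 3) for v in mr_egger(bx, by, se)])
print("Weighted median:", round(weighted_median(bx, by, se), 3))
```

In this toy example the outlying SNP has the smallest weight, so the weighted median stays at 0.5; this illustrates why the estimator remains consistent when at least half of the weight comes from valid instruments.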
In our illustrative example, all our outcomes were binary, and we carried out analyses on the log odds ratio scale in order to compare our results for the effect of BMI on COVID-19 with others that have been published (17-23). Because of the symmetry of the odds ratio, selection biases it only if the selection probabilities depend on both the exposure (BMI) and the outcome (COVID-19) in a way that involves an interaction between them on the multiplicative scale (43). As we have carried out analyses on the log odds ratio scale, as opposed to the risk difference scale, the magnitude of any biases here may be smaller than those observed for a continuous outcome, or for analyses of COVID-19 carried out on the risk difference scale.

Our 'no-relevance' analyses are valid only if liability to COVID-19, which did not exist before late 2019/early 2020, could not influence outcomes such as BMI that were assessed before then. We feel this is plausible but, as with other 'no-relevance' analyses, violation of this assumption cannot be ruled out. For example, susceptibility to COVID-19 is likely to be influenced by innate immunity and may also be influenced by acquired immunity from other infections. Thus, in other MR studies, if an association in the 'no-relevance' study is similar to that in the 'real' analyses, interpreting it as bias in the main analysis might not be correct. This is why we consider it important to explore bias through a number of approaches, including exploring factors related to selection, undertaking multivariable MR with those factors and comparing different case/control groups, as well as considering, for specific questions, the extent to which the 'no-relevance' assumption might be violated.
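The odds-ratio argument at the start of this passage can be checked with a small worked example. The sketch below uses expected counts in a hypothetical 2×2 table with a binary exposure (a simplification, since BMI is continuous) and applies selection probabilities that depend on the exposure only, on the exposure and outcome without a multiplicative interaction, and on an exposure-outcome interaction. All numbers and names are invented for illustration.

```python
def odds_ratio(counts):
    """Odds ratio from a 2x2 table; counts[(x, y)] for exposure x and outcome y in {0, 1}."""
    return (counts[1, 1] * counts[0, 0]) / (counts[1, 0] * counts[0, 1])


def apply_selection(counts, prob):
    """Expected counts after each person in cell (x, y) is selected with probability prob(x, y)."""
    return {cell: n * prob(*cell) for cell, n in counts.items()}


# Hypothetical target population (illustrative numbers, not data from this study).
population = {(0, 0): 9000, (0, 1): 1000, (1, 0): 8000, (1, 1): 2000}

scenarios = {
    "no selection": lambda x, y: 1.0,
    "selection on exposure only": lambda x, y: 0.3 if x else 0.6,
    "exposure and outcome, no interaction": lambda x, y: (0.5 if x else 0.8) * (0.9 if y else 0.4),
    "exposure-outcome interaction": lambda x, y: 0.9 if (x and y) else 0.5,
}
for name, prob in scenarios.items():
    print(f"{name:40s} OR = {odds_ratio(apply_selection(population, prob)):.2f}")
```

Only the last scenario changes the odds ratio (from 2.25 to 4.05). Algebraically, the odds ratio in the selected sample equals the population odds ratio multiplied by s11·s00/(s10·s01), where sxy is the selection probability in cell (x, y), so selection leaves it unchanged whenever that ratio equals 1.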
Indeed, previous analyses exploring the association of COVID-19 with stroke have been framed as such (44) […] Table S1 in supplementary material). In comparison with analyses from release 4, effect estimates for the association between BMI and COVID-19 susceptibility and severity in our analyses using release 6 were smaller and estimated with greater precision, potentially reflecting larger samples and greater power. As testing capacity increases globally, there may be less misclassification of population controls in the more recently included data, and less selection bias in who can obtain a test for SARS-CoV-2 infection. Alternatively, as the pandemic persists and behaviours change, risk […] continues to unfold, whilst balancing that with the need for unbiased or minimally biased results is paramount. The risk of selection bias in this context is high given, for example, pressures on who does and does not experience and/or report symptoms, who gets tested (whether symptomatic or not) and how that has changed over time. Our a priori assumption was that genetic analyses, including GWAS and MR, are prone to bias from such selection. We are not aware of any previous studies that have explicitly explored such bias in MR studies of COVID-19 outcomes.

Given that selection and misclassification bias are an issue in this area, we have systematically carried out a range of methods, as outlined in Table 2, to help test for these biases using an illustrative example of BMI and COVID-19. The impact of selection bias and misclassification will depend on the variables included in an analysis and on the underlying data used. Therefore, although we did not find evidence of bias in our illustrative example of BMI and COVID-19, we recommend that researchers consider potential misclassification and selection bias in MR analyses, particularly when using highly selected phenotypes or disease outcomes.
We suggest that researchers look across the different methods used here and consider whether, together, they suggest no selection or misclassification bias, or some other source of bias. However, depending on the available data, it may not be possible to conduct every sensitivity analysis.

Unanswered questions and future research

As summary statistics for COVID-19 continue to be updated, these analyses should be repeated using the methods we suggest (including the time-variation assessment) to improve understanding of the aetiology of COVID-19 susceptibility and severity. Future effect estimates obtained using summary data may be less prone to selection bias as universal testing becomes more widespread, particularly in most high-income countries, and as salient symptoms have been defined for adults. However, some selection is likely to remain, for example if those whose jobs and livelihoods depend on testing negative either do not get tested or do not report positive results.

As individual studies with genotypic data obtain more data on COVID-19 in their study samples, it may be possible to have enough power to carry out participant-level MR analyses. Where individual-level data are available, further methods exist for exploring and/or accounting for selection bias, including inverse probability weighting (IPW), which can be used to weight the study sample to better reflect the population of interest.
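The idea behind IPW mentioned above can be illustrated with a minimal sketch: each selected person is weighted by the inverse of their probability of being selected, so that under-represented groups count for more. This Python example is hypothetical throughout: the selection probabilities are assumed known, whereas in practice they must be estimated (for example, from a model of selection fitted to individual-level data that include people who were not selected), and the weighted analysis is only as good as that model.

```python
from typing import Sequence


def ipw_mean(values: Sequence[float], selection_probs: Sequence[float]) -> float:
    """Inverse-probability-weighted (Hajek) mean: each selected person is weighted by 1 / P(selected)."""
    weights = [1 / p for p in selection_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical population of two equal-sized groups (1,000 people each) that differ in how
# likely they are to be selected (e.g. tested) and in how common the outcome is.
groups = [
    # (selection probability, number selected, number selected with the outcome)
    (0.8, 800, 240),  # group A: outcome prevalence 30%
    (0.2, 200, 20),   # group B: outcome prevalence 10%
]
values, probs = [], []
for p_sel, n_sel, n_pos in groups:
    values += [1] * n_pos + [0] * (n_sel - n_pos)
    probs += [p_sel] * n_sel

naive = sum(values) / len(values)
weighted = ipw_mean(values, probs)
print(f"Naive prevalence: {naive:.2f}; IPW prevalence: {weighted:.2f}; true prevalence: 0.20")
```

The naive estimate (0.26) over-represents group A, whereas weighting by 1/0.8 and 1/0.2 recovers the population prevalence of 0.20 in this constructed example.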
Table 1 Case and control definition (from COVID-19 Host Genetics Initiative) and causal questions

Figure 1 Flow chart of methods carried out to investigate selection and misclassification bias

Table 2 Summary of Mendelian randomisation methods carried out to investigate selection bias and misclassification bias in analyses of body mass index (BMI) and COVID-19 susceptibility and severity

[…] offer a method to adjust for potential selection. This approach has previously been described in the non-instrumental variable literature as a method to account for selection bias (48). A direct effect, estimated via MVMR, that is attenuated relative to the unadjusted total effect can indicate any one of pleiotropy, mediation or selection bias. Researchers would therefore need to use subject matter knowledge to determine which is most likely, and to consider carefully which variables may be regarded as predictors of selection in an analysis.

Negative control exposure/outcome: We have used a negative control study to test for population stratification. However, it is also possible to use a negative control design to test for selection bias. Here, a negative control would be a variable that is not expected to be associated with the exposure/outcome but is subject to the same selection mechanisms. Any evidence of an association would therefore indicate that selection bias is present. In this analysis, whilst many variables have been shown to be associated with selection, they have also been shown to be associated with BMI and/or COVID-19 susceptibility/severity, and therefore would not satisfy the conditions for being a negative control (18).
Note that these methods are not explicitly for testing for selection bias and can be used to test for different sources of bias in Mendelian randomisation.

Footnote: IVW: inverse variance weighting; MD: mean difference; OR: odds ratio; SNPs: single nucleotide polymorphisms.

Table 3
Case/control group | BMI: rG (se), P value | Smoking: rG (se), P value | Education: rG (se), P value | Income: rG (se), P value
A1: Very severe COVID-19 vs not hospitalised COVID-19* | n/a | n/a | n/a | n/a
[…]
rG: genetic correlation between two traits; se: standard error.
*Small heritability for this phenotype, so genetic correlation cannot be estimated.
Note: Education (years of schooling, ieu-a-1239); smoking (cigarettes per day, ieu-b-25); and income (ukb-b-7408).

Footnote: OR: odds ratio; SNPs: single nucleotide polymorphisms. ORs correspond to inverse variance weighting (IVW) estimates. Release 3: released on 2nd July 2020; release 4: released on 20th October 2020; release 5: released on 18th January 2021; release 6: released on 15th June 2021. For comparison group B1, a threshold of p < 5×10⁻⁶ was used for selection of SNPs (across all releases, to allow comparison) because no or few SNPs reached the conventional genome-wide significance level. No SNPs were available for C2 at release 3.
Note that these are not mutually exclusive samples over time, and different SNPs predict COVID-19 at each release.

Bias from questionnaire invitation and response in COVID-19 research: an example using ALSPAC.
Exploring selection bias in COVID-19 research: simulations and prospective analyses of two UK cohort studies.
Has GWAS lost its status as a paragon of open science?
Collider scope: when selection bias can substantially influence observed associations.
Cardiometabolic risk factors for COVID-19 susceptibility and severity: a Mendelian randomization analysis.
Impact of body composition on COVID-19 susceptibility and severity: a two-sample multivariable Mendelian randomization study.
Causal Inference for Genetic Obesity Cardiometabolic Profile and COVID-19 Susceptibility: A Mendelian Randomization Study. Front Genet.
Association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians.
Using multiple genetic variants as instrumental variables for modifiable risk factors. Statistical Methods in Medical Research.
A better coefficient of determination for genetic […]
Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I2 statistic.
SARS-CoV-2 antibody prevalence in England following the first peak of the pandemic.
[…] Randomization with some invalid instruments using a weighted median estimator.
An atlas of genetic correlations across human diseases and traits.
Collider bias undermines our understanding of COVID-19 disease risk and severity.

We are extremely grateful to the British Heart Foundation-National Institute for Health Research (NIHR) COVIDITY-COHORT group for their helpful discussions throughout. We also thank the COVID-19 HGI and GIANT consortia for making GWAS summary statistics publicly available.
The acknowledgment of the people and studies/projects that contributed data to the COVID-19 HGI is available at […].

The datasets analysed during the current study are all publicly available GWAS summary statistics. Analysis scripts and the analysis plan can be found on the following GitHub page: https://github.com/gc13313/BMI_COVID_2SMR.

Ethics approval was not required for this study, given that only summary-level (GWAS) data have been used.