key: cord-0002518-70dhlqoy
authors: Rimpau, Sebastian; Joffe, Ari R.
title: Interpretation of gene associations with risk of acute respiratory distress syndrome: P values, Bayes factors, positive predictive values, and need for replication
date: 2016-12-21
journal: Crit Care
DOI: 10.1186/s13054-016-1550-8
sha: 5b38429f9731b312a61176ad7299ce2b965c35d8
doc_id: 2518
cord_uid: 70dhlqoy

be produced with certain 'error rates' without consideration of information from outside the experiment" [9]. The fallacy is to think that the P value refers to a hypothesis probability, involving inductive reasoning back from evidence (observations) to underlying truth [9-11]. This leads to misinterpretations of the P value (Table 2). To make the inductive inference about hypothesis probability requires Bayesian methods. Bayesian methods are conceptually simple: (Prior-odds of null hypothesis) × (Bayes factor) = (Posterior-odds of null hypothesis) [9]. The prior-odds are based on evidence external to the study concerning the plausibility of the null hypothesis; in a field of study, this is the ratio of the number of "true relationships" to "no relationships" among those tested in the field [12]. The Bayes factor (BF) measures the relative support, from the observed evidence, for two hypotheses: (Probability of the data given the null hypothesis)/(Probability of the data given the alternative hypothesis). The BF modifies the prior probability to give the posterior probability of the null hypothesis (or, if one reverses the numerator and denominator, the post-study probability that there is a true association: positive predictive value (PPV)). One can calculate, from the same numbers used to calculate the P value, the minimum BF: the strongest evidence against the null hypothesis, using the best supported hypothesis (the observed association) as the alternative hypothesis [11].
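The odds relationship and the minimum BF described above can be sketched in a few lines. This is a minimal illustration, not the authors' calculation: it assumes the minimum BF takes the Gaussian form exp(-z^2/2) for a two-sided P value, a form that is not written out in this text, and the function names are hypothetical. Under that assumption it gives values close to the posterior probabilities quoted in Table 2 (about 13%, 31%, and 10%).

```python
import math
from statistics import NormalDist


def minimum_bayes_factor(p_value: float) -> float:
    """Minimum Bayes factor for a two-sided P value.

    Assumption (not stated in the text): the Gaussian form exp(-z^2 / 2),
    where z is the normal deviate corresponding to the two-sided P value.
    """
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return math.exp(-z * z / 2.0)


def posterior_null_probability(prior_null: float, bayes_factor: float) -> float:
    """Apply (prior-odds of null) x (Bayes factor) = (posterior-odds of null)."""
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = prior_odds * bayes_factor
    # Convert the posterior odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # (P value, prior probability of the null hypothesis)
    for p, prior in [(0.05, 0.50), (0.05, 0.75), (0.01, 0.75)]:
        bf = minimum_bayes_factor(p)
        post = posterior_null_probability(prior, bf)
        print(f"P = {p}: minimum BF = {bf:.3f}, "
              f"prior null = {prior:.2f}, posterior null >= {post:.2f}")
```

Because the minimum BF is the strongest possible evidence against the null hypothesis, the posterior probability it yields is a lower bound: any other alternative hypothesis leaves the null more probable.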
One can also calculate the PPV of a statistically significant finding using the prior probability of an association, the BF based on power and alpha [BF = αβ/(1 − α)(1 − β)], in addition to bias (affecting the accuracy of the alpha and also reflecting our estimate of the prior-odds) [12-14]. The PPV is lowered by low study power (smaller studies with small expected effect sizes), low pre-study odds (hypothesis-generating experiments), bias (flexibility in designs, definitions, outcomes, and analytic modes), and number of teams working in the field (hotter scientific fields) [12]. There are some surprising results of Bayesian methods (Table 2). There is evidence to support the predictions from Bayesian methods in interpreting study results (Table 2). This is particularly so in genetic-association studies, where the expected true odds ratios (when there is a true association) for common SNPs with common complex diseases (such as ARDS) are repeatedly found to be 1.1-1.4; this means that studies have low power unless there are >1000 subjects [12, 15].

Table 1 (footnotes)
ARDS acute respiratory distress syndrome (including PaO2/FIO2 ≤ 300), CELF-2 elav-like family member 2, CFTR cystic fibrosis transmembrane conductance regulator, IL1-ra interleukin 1 receptor antagonist, MV invasive (endotracheal tube or tracheostomy) or noninvasive (nasal prongs or face mask); oxygen delivered by low or high flow via nasal cannula was not considered MV [4], MYLK myosin light chain kinase, OR odds ratio, SNP single nucleotide polymorphism, SP-B surfactant protein B, TIA-1 T-cell intracellular antigen 1
Quality control: 5-10% genotyped a second time; blinded analysis of genotype to clinical status (but also state; two individuals independently assessed the results from the analyses and assigned genotypes [4]). Comparisons are between 0 and 1 versus 2 copies; or sometimes 0 versus 1 or 2 copies
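The dependence of the PPV on prior odds, power, alpha, and bias can be sketched as follows. This is a hedged illustration: it assumes the closed-form PPV expression published in reference [12], which is not written out in this text, and it does not use the bracketed BF expression above. The function and parameter names are hypothetical. Under these assumptions, the adequately powered exploratory example in Table 2 (prior 1:10, bias 0.3, α = 0.05, power 80%) comes out at roughly 20%.

```python
def ppv(prior_odds: float, power: float, alpha: float = 0.05, bias: float = 0.0) -> float:
    """Positive predictive value of a statistically significant finding.

    Assumed form (from reference [12], not stated in this text):
        PPV = ((1 - b)R + u*b*R) / (R + a - b*R + u - u*a + u*b*R)
    where R = prior_odds (true : no relationships), b = 1 - power,
    a = alpha, and u = bias (proportion of analyses reported as
    findings only because of bias).
    """
    beta = 1.0 - power
    # Significant findings that reflect a true relationship.
    true_positives = power * prior_odds + bias * beta * prior_odds
    # Significant findings where there is no relationship.
    false_positives = alpha + bias * (1.0 - alpha)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Exploratory epidemiologic study: prior 1:10, bias 0.3, alpha 0.05.
    for power in (0.80, 0.20):
        value = ppv(prior_odds=0.1, power=power, alpha=0.05, bias=0.3)
        print(f"power = {power:.2f}: PPV = {value:.2f}")
```

Holding the other inputs fixed, lowering the power or the prior odds, or raising the bias, lowers the PPV, which is the qualitative pattern the paragraph above describes.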
This empirical evidence (Table 2) suggests that Bayesian methods, which keep statistical evidence (conveyed traditionally by the P value and more usefully by the BF) distinct from inductive inferences about hypotheses, are useful because they incorporate data external to the study (estimation of priors) in order to arrive at a conclusion about a hypothesis (posterior probability of the probed association being true) [9-12].

Table 2

Misinterpretations of the P value (misinterpretation: correction)
- The P value is the probability that the null hypothesis is true: the P value assumes the null hypothesis is true [10]
- P ≤ 0.05 means the null hypothesis is false, or should be rejected: P ≤ 0.05 simply flags the data as being unusual if all the assumptions used to compute it were correct [10]
- P > 0.05 means the null hypothesis is true, or should be accepted: P > 0.05 only suggests that the data are not unusual if all the assumptions used to compute it were correct; the same data would also not be unusual under many other hypotheses [10]
- If you reject the null hypothesis because P ≤ 0.05, the chance your "significant finding" is a false positive is 5%: the P value only refers to how often you would be in error over very many uses of the test across different studies, and not in a single use of the test [10]

Surprising results of Bayesian methods
- In late-phase clinical trials with equipoise (the prior probability of the null hypothesis is 50%), a study with P = 0.05 makes the posterior probability of the null hypothesis no less than 13% [11]
- In more exploratory research (the prior probability of the null hypothesis is, say, 75%), a study with P = 0.05 or P = 0.01 makes the posterior probability of the null hypothesis no less than 31% and 10%, respectively [11]
- An adequately powered (80%) exploratory epidemiologic (prior 1:10, bias 0.3, α = 0.05) study with a statistically significant finding has a positive predictive value (PPV) of 20% and, if underpowered (20%), a PPV of 10% [12]
- In large traditional cohort studies (prior 1:20, bias 0.1, α = 0.05, power 90%), the false positive to false negative ratio of findings is 32:1 [13]
- In a well done (power 95%, α = 0.05) cohort study testing SNPs with less than compelling evidence (prior 1:100), with a statistically significant finding (P = 0.05 or 0.01) the PPV is 16.1% and <60%; even with fairly compelling prior evidence (prior 1:10), the PPV is 67.9% and <90% [14]

Surprising empirical evidence supporting the predictions of Bayesian methods
- In traditional genome epidemiology [a "few candidate risk factors are selected based on diverse considerations" (low prior); small sample size (low power, given the small size of expected effect); "discovery hunting using conventional levels" of statistical significance, confounding, selective reporting (bias)], the crude replication rate of statistically significant genetic associations is ~1.2% [13]
- Hallmarks of discovery exploratory research (low priors, low BF, high bias): "vibration of effects" (evidence of inflated early effect sizes in epidemiologic associations), "Proteus phenomenon" (a rapid early sequence of extreme, opposite results in retrospective hypothesis-generating molecular research), and "winner's curse" (the first positive study provides inflated estimates compared to reality) [12, 13]

Using the growing cohort of patients, six ARDS gene-association studies have been published by this group (Table 1) [2-7]. These reports were well done according to reporting guidelines [15]. We ask three questions to improve interpretation of these (and, in general, future) gene-association studies in critical care (Table 3). If bias is low (0.05-0.2 = "the proportion of probed analyses that would not have been 'research findings', but nevertheless end up presented and reported as such, because of bias"), power is 50%, and prior is very high (0.1), the PPV that a statistically significant finding is true is 35-55% [12].
The observed evidence (the P value, or better yet, the BF) can be combined with prior considerations of plausibility to determine how well two hypotheses are supported (posterior probability, PPV). The posterior probability (PPV) that there is an association between exploratory SNPs and severity of ARDS in children is low given the low prior probability, the modest BF (reflected in modest P values and power), and potential for bias. This is not necessarily a problem if our interest is in generating hypotheses for further scientific study [13]. An interesting hypothesis has been suggested (i.e., a gene association) and warrants further investigation; we should wait for replication in additional larger studies before accepting this hypothesis. These future studies will have a prior probability that is closer to 0.1 (the posterior probability after the current study), and thus replication would move us much further toward accepting the hypothesis [13-15]. Overall, caution is warranted: most genetic associations for ARDS in adults have not replicated [8].

Table 3 Considerations relevant to interpretation of the results of gene-association studies for the risk of ARDS

Question: What is the prior probability of the gene association?
- Pathophysiology of ARDS is very complex [1]
- Severity of ARDS likely also depends on the inciting cause (e.g., pathogen) and its duration prior to appropriate treatment [15]
- Many gene-association studies are the first to examine for an association between the particular genes and the development of ARDS in a hot field of interest [12, 13]
- It may be unknown if the SNPs are associated with changes in their respective protein levels or function (e.g., there is no change in the amino acid sequence of the CELF-2 protein)
- The rationale for exploration for a gene association is often based on limited prior information (e.g., the current study was done because of the previous finding, in one study with the same cohort, that CFTR gene variants are associated with ARDS in African American children with CAP) [3, 13, 14]

Question: What is the minimum BF observed, or the PPV?
- The minimum BFs for P = 0.03 and P = 0.01 are 0.095 and 0.036, respectively. This modifies the prior: even if very high (e.g., prior of 0.25 for the alternative hypothesis), the null hypothesis probability is lowered to no less than 22% or 10%, respectively [11]
- The PPV, assuming a prior of 0.01, power 0.80, and P values between 0.001 and 0.01, is 55-90%; assuming a more realistic prior of 0.001, the PPV is 10-50% [14]
- Given the expected odds ratios of gene association for complex diseases such as ARDS are <1.5, and the often low number of patients with ARDS, the power of the study was well below 0.5. This lowers the PPV even more [12, 15]

Question: How much bias occurred?
- Attrition bias: not all the cohort has genotyping done. In this example:
  - The number of patients is often lower in more recent times [15]
- Selection bias: flexibility in eligibility criteria due to different definitions of conditions. In this example:
  - CAP definition required at least two of tachypnea, dyspnea, or hypoxemia, but in one study the list also included cough or abnormal chest exam (selection bias) [6, 12, 13, 15]
- Analysis and reporting biases: flexibility in definitions of predictor variables, in decisions of which covariates to adjust for, in decisions of outcomes to examine, and in which analyses to report. In this example: [12, 13, 15]
  a. Predictor variables definitions: only in the CFTR studies was age categorized as <11 versus ≥11 years; the cohorts were analyzed separately, although in other studies African Americans and Caucasians were combined or only results for African Americans were reported [2, 3, 4, 6, 7]
  b. Covariates to adjust for: in the one study where it was examined, a statistically significant association was "lost" if adjusted for bacterial culture positivity; asthma was forced into the multiple regressions in only this most recent study; other known SNPs were not adjusted for (especially those the same group previously found associated with ARDS: IL-1ra A1 allele, SP-B); duration of symptoms prior to enrolment, treatment (e.g., antibiotics), and duration of treatment are not reported or adjusted for [2, 4, 6]
  c. Outcomes to determine and report: the outcome was ARDS, but in other studies also included mechanical ventilation, severe sepsis, vasopressor use, and renal or hematologic dysfunction, and could include hospital admission, PICU admission, and duration of hospitalization [3-7]

ARDS acute respiratory distress syndrome, BF Bayes factor, CAP community acquired pneumonia, CELF-2 elav-like family member 2, CFTR cystic fibrosis transmembrane conductance regulator, IL1-ra interleukin 1 receptor antagonist, PPV positive predictive value, SNP single nucleotide polymorphism, SP-B surfactant protein B

Abbreviations
ARDS: Acute respiratory distress syndrome; BF: Bayes factor; CAP: Community-acquired pneumonia; CELF2: Elav-like family member 2; CFTR: Cystic fibrosis transmembrane conductance regulator; PPV: Positive predictive value; SNP: Single nucleotide polymorphism; TIA-1: T-cell intracellular antigen-1

References
1. Dahmer MK, for the Pediatric Acute Lung Injury Consensus Conference Group. Pathobiology of acute respiratory distress syndrome
2. Association of polymorphisms in genes of factors involved in regulation of splicing of cystic fibrosis transmembrane conductance regulator mRNA with acute respiratory distress syndrome in children with pneumonia
3. Association of cystic fibrosis transmembrane conductance regulator gene variants with acute lung injury in African American children with pneumonia
4. Interleukin-1 receptor antagonist intron 2 variable number of tandem repeats polymorphism and respiratory failure in children with community-acquired pneumonia
5. Genetic variation in MYLK and lung injury in children and adults with community-acquired pneumonia
6. The influence of genetic variation in surfactant protein B on severe lung injury in African American children
7. Lack of association of the caspase-12 long allele with community-acquired pneumonia in people of African descent
8. Distinct and replicable genetic risk factors for acute respiratory distress syndrome of pulmonary or extrapulmonary origin
9. Toward evidence-based medical statistics. 1: the p value fallacy
10. Statistical tests, P-values, confidence intervals, and power: a guide to misinterpretations
11. Toward evidence-based medical statistics. 2: the Bayes factor
12. Why most published research findings are false
13. The false-positive to false-negative ratio in epidemiologic studies
14. Distinguishing true from false positives in genomic studies: p values
15. STrengthening the Reporting of Genetic Association Studies (STREGA) - an extension of the STROBE statement

Not applicable. There was no funding for this work. Not applicable. The authors declare that they have no competing interests.

Ethics approval and consent to participate
Not applicable.

Authors' contributions
SR and ARJ made substantial contributions to conception and design and interpretation of data; participated sufficiently in the work to take public responsibility for the content and agreed to be accountable for all aspects of the work.
ARJ wrote the first draft of the manuscript. SR revised the manuscript critically for important intellectual content. SR and ARJ have given final approval of the version to be published.