Dissolving the missing heritability problem

Abstract: Heritability estimates obtained from genome-wide association studies (GWAS) are much lower than those of traditional quantitative methods. This phenomenon has been called the “missing heritability problem”. By analyzing and comparing GWAS and traditional quantitative methods, we first show that the estimates obtained from the latter involve some terms other than additive genetic variance, while the estimates from the former do not. Second, GWAS, when used to estimate heritability, do not take into account additive epigenetic factors transmitted across generations, while traditional quantitative methods do. Given these two points we show that the missing heritability problem can largely be dissolved.

Pierrick Bourrat*
Macquarie University, Department of Philosophy, North Ryde, NSW 2109, Australia
Email: p.bourrat@gmail.com
The University of Sydney, Department of Philosophy, Unit for the History and Philosophy of Science & Charles Perkins Centre, Camperdown, NSW 2006, Australia

Qiaoying Lu*
Sun Yat-sen University, Department of Philosophy, Xingangxi Road 135, Guangzhou, Guangdong, China

* PB and QL contributed equally to this work.

Acknowledgements: We are thankful to Steve Downes, Paul Griffiths and Eva Jablonka for discussions on the topic. PB’s research was supported under the Australian Research Council's Discovery Projects funding scheme (project DP150102875). QL’s research was supported by a grant from the Ministry of Education of China (13JDZ004) and “Three Big Constructions” funds of Sun Yat-sen University.

1. Introduction.

One pervasive problem encountered when estimating the heritability of quantitative traits is that the estimates obtained from genome-wide association studies (GWAS) are much smaller than those calculated by traditional quantitative methods. This problem has been called the missing heritability problem (Turkheimer 2011). Take human height for example. Traditional quantitative methods deliver a heritability estimate of about 0.8, while the first estimates using GWAS were 0.05 (Maher 2008). More recent GWAS methods have revised this number and estimate the heritability of height to be 0.45 (Yang et al. 2010; Turkheimer 2011). [Footnote 1: According to Yang et al. (2015), GWAS may deliver a higher estimate of the heritability of height in the future.] Yet, compared to traditional quantitative methods, almost half of the heritability is still missing.

In quantitative genetics, heritability is defined as the portion of phenotypic variance in a population that is due to genetic differences (Falconer and Mackay 1996; Downes 2015; Lynch and Bourrat 2017). Traditionally, this portion is estimated by measuring the phenotypic resemblance of genetically related individuals without identifying genes at the molecular level (more particularly DNA sequences). GWAS have been developed in order to locate the DNA sequences that influence the target trait and estimate their effects, especially for common complex diseases such as obesity, diabetes and heart disease (Visscher et al.
2012; Frazer et al. 2009). As for height, almost 300 000 common DNA variants in human populations that associate with it have been identified by GWAS (Yang et al. 2010).

Because many grant that the heritability estimates obtained by traditional quantitative methods are quite reliable, it is the methods used in GWAS that have been questioned (Eichler et al. 2010). A number of partial solutions to the missing heritability problem have been proposed, with most of them focusing on improving the methodological aspects of GWAS in order to provide a more accurate estimate (e.g., Manolio et al. 2009; Eichler et al. 2010). Some authors have also suggested that heritable epigenetic factors might account for part of the missing heritability. For instance, in Eichler et al. (2010, 488), Kong notes that “[e]pigenetic effects beyond imprinting that are sequence-independent and that might be environmentally induced but can be transmitted for one or more generations could contribute to missing heritability.” Furrow et al. (2011) also claim that “[e]pigenetic variation, inherited both directly and through shared environmental effects, may make a key contribution to the missing heritability.” Others have made the same point (e.g., McCarthy and Hirschhorn 2008; Johannes et al. 2008). Yet, in the face of this idea one might notice what appears to be a contradiction: how can epigenetic factors account for the missing heritability if heritability is about genes?

To answer this question as well as to analyze the missing heritability problem, we compare the assumptions underlying heritability estimates in traditional quantitative methods with those underlying estimates in GWAS. We make two points.
First, traditional methods typically overestimate heritability (narrow-sense heritability, $h^2$) because these estimates do not successfully isolate the additive genetic component of phenotypic variance, which is part of the definition of $h^2$ (see Section 2), from the non-additive genetic and non-genetic components and from the potential effects of assortative mating. Second, the concept of the gene used in the definition of $h^2$ is an evolutionary one, and it differs from the one used in GWAS, which is DNA centered. This means that the heritability estimates obtained from traditional methods can include heritability due to heritable epigenetic factors (which can be regarded as evolutionary genes), while the effects of these factors are not included in the estimates obtained from GWAS. With these two points taken into account, we expect the missing heritability problem to be largely dissolved and the stage to be set for further discussions.

The remainder of the paper is divided into three parts. First, we briefly introduce two ways in which heritability is estimated in traditional methods, namely twin studies and parent-offspring regression. We show that the estimates obtained in each way include some non-additive and/or non-genetic elements and consequently overestimate $h^2$. Second, we outline the basic rationale underlying GWAS and illustrate that they estimate heritability by considering solely DNA variants. By arguing that the notion of additive genetic variance used in traditional methods does not necessarily refer to DNA sequences but can also refer to epigenetic factors, we show that the notion of heritability estimated in GWAS is more restrictive than $h^2$. Finally, in Section 4, based on the conclusions from Section 2 and Section 3, we show that the missing heritability problem can be partly dissolved in two ways. One is that if non-additive and non-genetic variance were removed from the estimates obtained via traditional methods, these estimates would be lower.
The other is that if additive epigenetic factors were taken into account by GWAS, the heritability estimates obtained would be higher. We conclude Section 4 by demonstrating how our analysis sheds some light on a discussion about the role played by non-additive factors in the missing heritability problem. Because human height has been “the poster child” of the missing heritability problem (Turkheimer 2011, 232), we will use it to illustrate each of our points.

2. Heritability in Traditional Quantitative Methods.

Although there exist different definitions of heritability (Jacquard 1983; Bourrat 2015; Downes 2009), according to the standard model of quantitative genetics, the phenotypic variance ($V_P$) of a population can be explained by two components, its genotypic variance ($V_G$) and its environmental variance ($V_E$). In the absence of gene-environment interaction and correlation, we have:

$$V_P = V_G + V_E \tag{1}$$

From there, broad-sense heritability ($H^2$) is defined as:

$$H^2 = \frac{V_G}{V_P} \tag{2}$$

$V_G$ can further be partitioned into the additive genetic variance ($V_A$), the dominance genetic variance ($V_D$) and the epistasis genetic variance ($V_I$). Thus Equation (1) can be rewritten as:

$$V_P = V_A + V_D + V_I + V_E \tag{3}$$

where $V_A$ is the variance due to alleles being transmitted from the parents to the offspring that contribute to the phenotype. $V_D$ is the variance due to interactions between alleles at one locus for diploid organisms, and $V_I$ is the variance due to interactions between alleles from different loci. $V_D$ and $V_I$ together represent the variance due to particular combinations of genes of an organism. Because genotypes of sexual organisms recombine at each generation via reproduction, the effects of combinations of genes, namely dominance and epistasis effects (measured respectively by $V_D$ and $V_I$), are not transmitted across generations; only the effects of the genes independent from their genetic background (measured by $V_A$) are.
By taking only $V_A$ into account, narrow-sense heritability ($h^2$), which “expresses the extent to which phenotypes are determined by the genes transmitted from the parents” (Falconer and Mackay 1996, 123), is defined as:

$$h^2 = \frac{V_A}{V_P} \tag{4}$$

$h^2$ is used in breeding studies and by evolutionary theorists who are interested in making evolutionary projections, while broad-sense heritability ($H^2$) is most used by behavioral geneticists and psychologists (Downes 2015).

Following Equation (4), to know $h^2$, both $V_A$ and $V_P$ should be known. $V_P$, for most quantitative traits (including height), can be directly obtained by measuring the phenotypes of individuals. However, traditional quantitative methods do not allow $V_A$ to be obtained directly. It is classically obtained by deduction. This deduction is based on two types of information. First, one needs one or several population-level measures of the phenotypic resemblance of pairs of family relatives. [Footnote 2: Or the mean values of their class (e.g., offspring), depending on the particular method used.] These measures are obtained by calculating the covariance of the phenotypic values for those pairs. Second, one needs the genetic relation between family pairs. It indicates the percentage of genetic material the pairs are expected to share. With these two pieces of information, assuming a large population with no interaction and correlation between some of the genetic and environmental components, one can estimate how much the genes shared by the two relatives (estimated by $V_A$) contribute to the phenotypic resemblance. From there, knowing $V_P$ and having an estimate of $V_A$ makes it possible to estimate $h^2$.

As mentioned above, for simplicity, traditional quantitative methods usually assume that there is neither gene-environment interaction nor correlation (Falconer and Mackay 1996, 131).
In such cases, the covariance between the phenotypic values (e.g., height) of pairs equals the sum of the additive genetic covariance, the dominance and epistasis genetic covariances, and the environmental covariance. Formally, this covariance for the general case can be written as follows:

$$\begin{aligned}
\mathrm{Cov}(P_1, P_2) &= \mathrm{Cov}(A_1 + D_1 + I_1 + E_1,\; A_2 + D_2 + I_2 + E_2) \\
&= \mathrm{Cov}(A_1, A_2) + \mathrm{Cov}(D_1, D_2) + \mathrm{Cov}(I_1, I_2) + \mathrm{Cov}(E_1, E_2)
\end{aligned} \tag{5}$$

where $\mathrm{Cov}(P_1, P_2)$ is the covariance between the phenotypic values of one individual and the other, with indexes “1” and “2” representing the two family members for each pair studied. $A$, $D$, $I$ and $E$ represent additive effects, dominance effects, epistasis effects and environmental effects respectively.

The most common pairs of relatives used for estimating heritability are twins (both monozygotic and dizygotic). In twin studies, one already knows that monozygotic twins share almost 100% of their genetic material while dizygotic twins share about 50%. The environment is typically divided into the part of the environment that affects both twins in the same way (the shared environment, $C$) and the part of the environment that affects one twin but not the other (the unique environment, $U$) (Silventoinen et al. 2003). In the absence of interaction and correlation between $C$ and $U$, we have:

$$E = C + U \tag{6}$$

Assuming epistasis effects to be negligible (a common assumption in twin studies), by inserting Equation (6) into Equation (5) in the case of twins, we have:

$$\begin{aligned}
\mathrm{Cov}(P_{T1}, P_{T2}) &= \mathrm{Cov}(A_{T1} + D_{T1} + C_{T1} + U_{T1},\; A_{T2} + D_{T2} + C_{T2} + U_{T2}) \\
&= \mathrm{Cov}(A_{T1}, A_{T2}) + \mathrm{Cov}(D_{T1}, D_{T2}) + \mathrm{Cov}(C_{T1}, C_{T2}) + \mathrm{Cov}(U_{T1}, U_{T2})
\end{aligned} \tag{7}$$

where $\mathrm{Cov}(P_{T1}, P_{T2})$ is the covariance between the phenotypic values of one twin and the other, with indexes “T1” and “T2” representing the two twins for each twin pair studied. Because each twin’s unique environment is, by definition, independent of that of the other twin, $\mathrm{Cov}(U_{T1}, U_{T2})$ is nil for both monozygotic and dizygotic twins.
Given that variance is a special case of covariance where the two variables are identical, and that for monozygotic twins $A_{T1}$, $D_{T1}$ and $C_{T1}$ are equal to $A_{T2}$, $D_{T2}$ and $C_{T2}$ respectively, we can reformulate Equation (7) as follows:

$$\mathrm{Cov}_{MT}(P_{T1}, P_{T2}) = V_A + V_D + V_C \tag{8}$$

where $\mathrm{Cov}_{MT}(P_{T1}, P_{T2})$ is the covariance between the phenotypic values of the monozygotic twin pairs studied (with “MT” for monozygotic twin). By contrast, dizygotic twins are expected to share half of their genes, which means that the covariance between the phenotypic values of one twin and the other ($\mathrm{Cov}_{DT}(P_{T1}, P_{T2})$, with “DT” for dizygotic twin) is expected to be equal to half of the additive genetic variance, a quarter of the dominance variance, and all of the shared environmental variance (with $\mathrm{Cov}(U_{T1}, U_{T2})$ again being zero). [Footnote 3: For each given gene with two alleles, the probability that dizygotic twins have the same genotype is one quarter.] In this case we can rewrite Equation (7) as:

$$\mathrm{Cov}_{DT}(P_{T1}, P_{T2}) = \frac{1}{2} V_A + \frac{1}{4} V_D + V_C \tag{9}$$

It is classically assumed that, for both monozygotic and dizygotic twin pairs, the shared environment acts in the same way if the pair has been reared together. [Footnote 4: Monozygotic twins are often treated more similarly than are dizygotic twins, and are more likely to share a placenta. Hence the shared environments of monozygotic twins are more similar than those of dizygotic twins. By using adoption twin studies in which environments are random on average, this shared environment difference can be mitigated.] That is to say, the term $V_C$ in Equations (8) and (9) is the same. Hence it can be cancelled by subtracting Equation (9) from Equation (8). Heritability can then be estimated as follows:

$$\hat{h}^2_{TS} = \frac{2\{\mathrm{Cov}_{MT}(P_{T1}, P_{T2}) - \mathrm{Cov}_{DT}(P_{T1}, P_{T2})\}}{V_P} = \frac{V_A}{V_P} + \frac{3}{2}\frac{V_D}{V_P} \tag{10}$$

We label the heritability estimate obtained from twin studies $\hat{h}^2_{TS}$, with the hat “^” symbolizing an estimate.
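As a minimal numerical sketch of Equations (8) to (10), the following Python snippet plugs hypothetical variance components into the expected twin covariances and recovers the twin-study estimate. The function names and the values chosen for $V_A$, $V_D$, $V_C$ and the unique-environment variance are illustrative assumptions, not figures taken from any study.

```python
# Hypothetical variance components (illustrative assumptions, not data).
V_A = 0.5   # additive genetic variance
V_D = 0.1   # dominance genetic variance
V_C = 0.2   # shared-environment variance
V_U = 0.2   # unique-environment variance
V_P = V_A + V_D + V_C + V_U  # phenotypic variance, epistasis neglected


def expected_cov_mz(v_a, v_d, v_c):
    """Expected phenotypic covariance of monozygotic twins, Equation (8)."""
    return v_a + v_d + v_c


def expected_cov_dz(v_a, v_d, v_c):
    """Expected phenotypic covariance of dizygotic twins, Equation (9)."""
    return 0.5 * v_a + 0.25 * v_d + v_c


def twin_study_estimate(cov_mz, cov_dz, v_p):
    """Twin-study heritability estimate, Equation (10)."""
    return 2.0 * (cov_mz - cov_dz) / v_p


cov_mz = expected_cov_mz(V_A, V_D, V_C)
cov_dz = expected_cov_dz(V_A, V_D, V_C)
h2_ts = twin_study_estimate(cov_mz, cov_dz, V_P)
h2_true = V_A / V_P          # narrow-sense heritability, Equation (4)
bias = h2_ts - h2_true       # expected to equal 1.5 * V_D / V_P

print(f"twin-study estimate: {h2_ts:.3f}")
print(f"narrow-sense h^2:    {h2_true:.3f}")
print(f"upward bias:         {bias:.3f}")
```

Under these assumed values the shared-environment term cancels, as the text describes, and the estimate exceeds $V_A / V_P$ by exactly $\frac{3}{2} V_D / V_P$.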
It should be noted that this estimate provides an accurate estimate of neither $H^2$ nor $h^2$, although it is a better estimate of $H^2$ than of $h^2$ (Falconer and Mackay 1996, 172).

Another traditional quantitative method often used to estimate heritability is parent-offspring regression (Falconer and Mackay 1996, 164). This method also assumes neither gene-environment interaction nor correlation. Following these assumptions, we can deduce that the covariance between the height of parents (one or the mean of both, but we will use the case with one parent here) and the mean height of their offspring is equal to the sum of the additive genetic covariance, the dominance covariance (the epistasis covariance is assumed to be small and is not included) and the environmental covariance between the heights of parents and offspring. Formally, in this case, we can write Equation (5) as follows:

$$\begin{aligned}
\mathrm{Cov}(P_P, P_O) &= \mathrm{Cov}(A_P + D_P + I_P + E_P,\; A_O + D_O + I_O + E_O) \\
&= \mathrm{Cov}(A_P, A_O) + \mathrm{Cov}(D_P, D_O) + \mathrm{Cov}(E_P, E_O)
\end{aligned} \tag{11}$$

where indexes “P” and “O” represent “parents” and “offspring”. Three further assumptions are then made. The first one is that parents are not related and consequently no dominance effects are transmitted from the parents to the offspring (Doolittle 2012, 178), which means that $\mathrm{Cov}(D_P, D_O)$ is nil. The second one is that there is no correlation between the parents’ environment and the offspring’s environment, so that $\mathrm{Cov}(E_P, E_O)$ is also nil. Finally, the third assumption is that there is no assortative mating between parents. Given that parents share, in expectation, 50% of their genes with their offspring, Equation (11) reduces to half of the additive genetic variance ($\frac{1}{2} V_A$). Given $V_P$, since by definition the slope of the regression of average offspring phenotype on parent phenotype is equal to $\frac{\mathrm{Cov}(P_P, P_O)}{V_P}$, which is equal to $\frac{1}{2}\frac{V_A}{V_P}$, $h^2$ can be estimated by doubling the value of this slope.
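The doubling of the regression slope can be sketched in a few lines of Python. The heights below are invented, noise-free values chosen only to make the arithmetic easy to follow, the helper names are hypothetical, and $V_P$ is assumed to be estimated by the sample variance of the parental values.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def parent_offspring_h2(parent_values, offspring_means):
    """Double the slope of the offspring-on-parent regression.

    The slope is Cov(P_P, P_O) / V_P, with V_P estimated here by the
    sample variance of the parental values (an assumption of this sketch).
    """
    v_p = covariance(parent_values, parent_values)
    slope = covariance(parent_values, offspring_means) / v_p
    return 2.0 * slope


# Hypothetical heights in cm: one parent per family and the mean height
# of that family's offspring (illustrative values, not data).
parent_heights = [160.0, 170.0, 180.0, 190.0]
offspring_mean_heights = [168.0, 172.0, 176.0, 180.0]

h2_por = parent_offspring_h2(parent_heights, offspring_mean_heights)
print(f"h^2 estimate from one-parent regression: {h2_por:.2f}")
```

In this toy sample the slope is 0.4, so the doubled value is 0.8. The estimate equals $h^2$ only if the three assumptions listed above hold.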
But the above three assumptions might be violated. First, there is evidence of inbreeding in human populations (Bittles and Black 2010). Without correcting for inbreeding, $\mathrm{Cov}(D_P, D_O)$ is likely to be non-nil. Second, because the environments experienced by individuals are likely to be more similar within a family line, $\mathrm{Cov}(E_P, E_O)$ might not be nil, either. Finally, there is evidence of assortative mating in human populations (Guo et al. 2014). The consequences of assortative mating for estimating heritability are complex. That said, in the case of one parent-offspring regression, when the population is at equilibrium, one effect of assortative mating is the overestimation of the value of $V_A$. If we take these three factors into consideration, the covariance between the parents and their offspring is equal to half of the additive genetic variance, plus a term representing some effects due to dominance, similarities between environments and assortative mating. This can be written formally as:

$$\mathrm{Cov}(P_P, P_O) = \frac{1}{2} V_A + \varepsilon \tag{12}$$

where $\varepsilon$ represents the sum of the covariances due to some non-additive genetic factors, environmental factors and assortative mating. Heritability, if estimated by performing a parent–offspring regression and doubling its slope, will thus have $V_A + 2\varepsilon$ as its numerator rather than solely $V_A$. Formally we will have:

$$\hat{h}^2_{POR} = \frac{2\,\mathrm{Cov}(P_P, P_O)}{V_P} = \frac{V_A}{V_P} + \frac{2\varepsilon}{V_P} \tag{13}$$

In light of the equations presented for both twin studies and parent-offspring regressions, we can conclude that heritability estimates obtained by these methods will generally overestimate $h^2$, such that:

$$\hat{h}^2_{TM} = h^2 + o \tag{14}$$

where the index “TM” is for “traditional methods” and $o$ is the part of the estimate contributed by components other than the ratio of additive genetic variance to phenotypic variance (with “$o$” for other). In the next section, we analyze the main method used in GWAS.

3. Heritability in GWAS.
Although any two unrelated individuals share about 99.5% of their DNA sequences, their genomes differ at specific nucleotide locations (Aguiar and Istrail 2013). Given two DNA fragments at the same locus of two individuals, if these fragments differ at a single nucleotide, they represent two variants of a single nucleotide polymorphism (SNP). GWAS focus on SNPs across the whole genome whose variants occur in the population with a frequency larger than 1%, referred to as “common SNPs”. If one variant of a common SNP, compared to another one, is associated with a significant change in the trait studied, then this SNP is a marker for a DNA region (or a gene) that leads to phenotypic variation.

The development of commercial SNP chips makes it possible to rapidly detect common SNPs in DNA samples from all the participants involved in a study. Based on the readings of SNP chips and by using a series of statistical tests, it can be investigated at the population level whether each SNP is associated with the target trait. For quantitative traits like height, the test reveals whether the mean height of a group with one variant of a SNP is significantly different from that of the group with another variant of the same SNP (Bush and Moore 2012).[5] Once all the SNPs associated with differences in phenotype have been identified, data from the HapMap project are used. The HapMap project provides a list of SNPs that are markers for most of the common DNA variants in human populations (International HapMap 3 Consortium 2010), which makes it possible to identify the exact genomic region for each SNP. With genetic studies examining those regions, it can then be determined whether the variants of the SNPs associated with a statistically significant difference in height do cause phenotypic variation. These variants are called “causal variants” (Visscher et al. 2012).
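The association step for a single SNP and a quantitative trait can be illustrated with a toy calculation. The sketch below assumes an additive coding in which each individual carries 0, 1 or 2 copies of one variant, and regresses height on that count; the sample, the function name and the coding are assumptions made for illustration. It computes only an effect size and a share of variance, not the significance tests and multiple-testing corrections that an actual GWAS involves.

```python
def mean(values):
    return sum(values) / len(values)


def snp_association(allele_counts, phenotypes):
    """Regress the phenotype on the allele count at one SNP.

    Returns the per-allele effect (regression slope) and the share of
    phenotypic variance explained by this SNP (squared correlation).
    """
    mg, mp = mean(allele_counts), mean(phenotypes)
    s_gp = sum((g - mg) * (p - mp) for g, p in zip(allele_counts, phenotypes))
    s_gg = sum((g - mg) ** 2 for g in allele_counts)
    s_pp = sum((p - mp) ** 2 for p in phenotypes)
    slope = s_gp / s_gg
    r_squared = s_gp ** 2 / (s_gg * s_pp)
    return slope, r_squared


# Hypothetical sample: copies (0, 1 or 2) of one SNP variant per person
# and the corresponding heights in cm (illustrative values, not data).
allele_counts = [0, 0, 1, 1, 2, 2]
heights = [168.0, 170.0, 171.0, 173.0, 174.0, 176.0]

effect, variance_explained = snp_association(allele_counts, heights)
print(f"per-allele effect: {effect:.2f} cm")
print(f"variance explained by this SNP: {variance_explained:.3f}")
```

Dividing the variance attributed to the associated SNPs by the total phenotypic variance is what yields a DNA-based heritability figure; the toy sample here has a single, unrealistically strong SNP.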
Other statistical tests combined with the ones mentioned above, the details of which would greatly exceed the scope of this paper, can also be used to estimate the effects of the SNPs that are associated with height, so that the portion of the variance in height explained by these SNPs can be calculated (e.g., Weedon et al. 2008). This portion thus represents the variance contributed by the causal variants. Since biologists classically regard genes as only made up of pieces of DNA, it is assumed that this variance should represent the additive genetic variance. With this assumption, and the ratio of this variance to the total phenotypic variance in the population, one can estimate $h^2$ (Yang et al. 2010; Visscher et al. 2006).

[Footnote 5: For categorical (often binary disease/control) traits, the test used involves measuring an odds ratio, namely the ratio of the odds of disease for individuals having a specific variant of a SNP to the odds of disease for individuals with another variant of that SNP. If this odds ratio is significantly different from 1, then that SNP is considered to be associated with the disease (Bush and Moore 2012).]

However, the claim that additive genetic effects are solely based on DNA sequences is problematic when faced with the evidence of epigenetic inheritance. As was mentioned in Section 2, traditional quantitative methods for estimating heritability are based on measuring phenotypic values and genetic relations without reaching the molecular level. The genes are not defined physically, but functionally as heritable difference makers (Lu and Bourrat forthcoming). In other words, they are theoretical units defined by their effects on the phenotype (Griffiths and Neumann-Held 1999, 661; Griffiths and Stotz 2013, 35). With the discovery of the structure of DNA in 1953, it was thought that the originally theoretical genes had been found in the physical DNA molecules. Since then, biologists have commonly referred to genes as portions of DNA, and so do the geneticists performing GWAS.
This step was taken too hastily (Lu and Bourrat forthcoming). If there is physical material, other than pieces of DNA, that can affect the phenotype and be transmitted stably across generations, then it should also be thought to contribute to additive genetic effects.

Many studies have provided evidence for epigenetic inheritance, namely the stable transmission across multiple generations of epigenetic modifications that affect organisms’ traits (e.g., Youngson and Whitelaw 2008; Dias and Ressler 2014). A classical example is the methylation pattern on the promoter of the agouti gene in mice (Morgan et al. 1999). This example shows that mice with the same genotype but different methylation levels display a range of fur colors, and that the patterns of DNA methylation can be inherited through generations, causing heritable phenotypic variation. Epigenetic factors such as self-sustaining loops, chromatin modifications and three-dimensional structures in the cell can also be transmitted over multiple generations (Jablonka et al. 2014). Studies on various species suggest that epigenetic inheritance is likely to be “ubiquitous” (Jablonka and Raz 2009).

The increasing evidence of epigenetic inheritance seriously challenges the restriction of the concept of the gene in the evolutionary sense to what is materialized in DNA. Relying on traditional quantitative methods, it is impossible to distinguish whether additive genetic variance is DNA based or based on other material(s). Some transmissible epigenetic factors, which are neither DNA based nor caused by DNA variation, might de facto be included in the additive genetic variance used to estimate $h^2$. This extension of heritable units also echoes the recent suggestion that genetic (assuming genes to be DNA based) and non-genetic heredity should be unified in an inclusive inheritance theory (Danchin 2013; Day and Bonduriansky 2011).
To apply the idea that some epigenetic factors can lead to additive genetic effects, the additive variance term in Equation (4) should be decomposed into two terms, namely the additive variance of DNA sequences ($V_{A_{DNA}}$) and the additive variance of epigenetic factors ($V_{A_{epi}}$), assuming there is no interaction between them, so that:

$$V_A = V_{A_{DNA}} + V_{A_{epi}} \tag{15}$$

Inserting Equation (15) into Equation (4) leads to:

$$h^2 = \frac{V_{A_{DNA}}}{V_P} + \frac{V_{A_{epi}}}{V_P} \tag{16}$$

Here we label the first term on the right side of Equation (16) “DNA-based narrow-sense heritability” ($h^2_{DNA}$), and the second term “epigenetic-based narrow-sense heritability” ($h^2_{epi}$). We thus have:

$$h^2_{DNA} = h^2 - h^2_{epi} \tag{17}$$

4. Dissolving the Missing Heritability Problem.

As was mentioned in the Introduction, since the first successful GWAS was published in 2005 (Klein et al. 2005), there have been many proposals for methodological improvements in GWAS (Manolio et al. 2009; Eichler et al. 2010). Studies conducted according to those proposals have yielded higher heritability estimates. Examples include increasing the sample sizes, which has resulted in more accurate estimates (e.g., Wood et al. 2014); considering all common SNPs simultaneously instead of one by one, which has increased the heritability estimates of height from 0.05 to 0.45 (see Yang et al. 2010); and conducting meta-analyses, which can lead to more accurate results when compared to single analyses (see Bush and Moore 2012). Biologists have also suggested searching for rare SNPs with frequencies lower than 1% in order to account for a wider range of possible causal variants (Schork et al. 2009).
Besides these methodological improvements, which would certainly lead to an increase in heritability estimates obtained from GWAS and thus reduce the gap between the estimates obtained from GWAS and traditional quantitative methods, our analysis reveals two other reasons explaining away the missing heritability problem:

a) In traditional quantitative methods, heritability is overestimated because the methods used cannot fully isolate the additive genetic variance from other components of variance;
b) In GWAS, heritability is estimated based solely on causal DNA variants, while in traditional quantitative methods the additive effects contributed by epigenetic differences ($V_{A_{epi}}$) are de facto included in the estimates.

These two reasons, as well as the potential methodological flaws, can be expressed formally using the equations presented in Section 2 and Section 3. Using our terminology, an estimate of the missing heritability ($\widehat{MH}$) can be obtained by subtracting the heritability estimates obtained by GWAS ($h^2_{DNA}$), plus some of the potential methodological flaws in GWAS mentioned above, from the estimates obtained by traditional quantitative methods ($\hat{h}^2_{TM}$). We thus have:

$$\widehat{MH} = \hat{h}^2_{TM} - (h^2_{DNA} + e) \tag{18}$$

with $e$ representing errors coming from methodological flaws in GWAS (“e” for errors; we assume otherwise no measurement errors). Replacing $\hat{h}^2_{TM}$ and $h^2_{DNA}$ in Equation (18) by the right-hand sides of Equations (14) and (17), we obtain:

$$\widehat{MH} = h^2 + o - (h^2 - h^2_{epi} + e) = h^2_{epi} + o - e \tag{19}$$

This means that the missing heritability, excluding potential methodological flaws in GWAS, results from the part of heritability originating from additive epigenetic factors, plus the overestimation obtained from family studies, in which the additive genetic term cannot be fully isolated from other terms. Those other terms include non-additive genetic terms, non-genetic terms and terms coming from assortative mating.
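A short numerical check can make the algebra of Equations (14) and (17) to (19) concrete. In the Python sketch below, the decomposition into $h^2$, $h^2_{epi}$, $o$ and $e$ is entirely hypothetical: the values are chosen only so that the two resulting estimates match the height figures quoted earlier (0.8 and 0.45), and the negative sign of `e` encodes the assumption that methodological flaws lower the GWAS estimate.

```python
def missing_heritability(h2, h2_epi, o, e):
    """Compute the missing heritability via Equations (18) and (19).

    h2     : narrow-sense heritability
    h2_epi : epigenetic-based narrow-sense heritability
    o      : overestimation by traditional methods, Equation (14)
    e      : error from methodological flaws in GWAS
    """
    h2_tm = h2 + o                    # Equation (14)
    h2_dna = h2 - h2_epi              # Equation (17)
    gwas_estimate = h2_dna + e
    mh_eq18 = h2_tm - gwas_estimate   # Equation (18)
    mh_eq19 = h2_epi + o - e          # Equation (19)
    return h2_tm, gwas_estimate, mh_eq18, mh_eq19


# Hypothetical decomposition (invented values, not estimates from any study).
h2_tm, gwas_estimate, mh18, mh19 = missing_heritability(
    h2=0.6, h2_epi=0.1, o=0.2, e=-0.05
)
print(f"traditional estimate: {h2_tm:.2f}")
print(f"GWAS estimate:        {gwas_estimate:.2f}")
print(f"missing (Eq. 18):     {mh18:.2f}")
print(f"missing (Eq. 19):     {mh19:.2f}")
```

Both routes give the same gap, which is the point of Equation (19): the gap splits into an epigenetic part, an overestimation part and a GWAS error part, whatever their actual sizes turn out to be.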
Our illustration of how part of the missing heritability problem can be dissolved by considering non-additive genetic factors supports the claim that one reason for the existence of missing heritability might be that almost all GWAS to date have focused on additive genetic effects (McCarthy and Hirschhorn 2008). Although there is not enough data to confirm that non-additive effects do explain away some part of the missing heritability, this claim appears numerous times in discussions of the missing heritability problem (see for instance Maher 2008; Frazer et al. 2009; Eichler et al. 2010). Yang et al. (2010, 565) disagree with this claim and respond that “[n]on-additive genetic effects do not contribute to the narrow-sense heritability, so explanations based on non-additive effects are not relevant to the problem of missing heritability.” We agree with Yang et al. (2010) that non-additive genetic effects do not contribute to $h^2$. That said, because the heritability estimates obtained from traditional quantitative methods do not strictly correspond to $h^2$ but include some terms different from $V_A$, those factors cannot be dismissed as irrelevant in the missing heritability debate. And indeed, Visscher et al. (2008, 258) have pointed out that assumptions made in traditional methods such as twin studies may deliver a heritability estimate that is biased upward. Although Visscher et al. (2008) mention only shared environmental effects as an example of a source of this upward bias, we showed in Section 2 that non-additive genetic effects could be another one. More recently, Yang et al. (2015) also considered this upward bias as one of three hypotheses regarding the missing heritability problem (Bourrat, Lu and Jablonka in press).

5. Conclusion.

We have explained away the missing heritability problem in two major ways. First, the heritability estimates from traditional quantitative methods are overestimated when compared to the theoretical definition of heritability, namely $h^2$.
The resulting estimates would be smaller if the additive genetic component of phenotypic variance were accurately separated from other terms. Second, the theoretical notion of heritability used in GWAS ($h^2_{DNA}$) does not strictly correspond to $h^2$, for it does not include the additive effects of epigenetic factors on phenotype, which are indistinguishable from the effects of DNA sequences. Hence the heritability estimates obtained from GWAS would be higher if those factors were taken into account. We have deliberately stayed away from the question of whether heritability should be defined strictly relative to DNA sequences or whether it should encompass any factors behaving effectively like evolutionary genes. Our inclination is that there is no principled reason to exclude non-DNA transmissible factors from the definitions of heritability, but our analysis does not bear on this choice.

References:

Aguiar, Derek, and Sorin Istrail. 2013. “Haplotype Assembly in Polyploid Genomes and Identical by Descent Shared Tracts.” Bioinformatics 29 (13): i352–i360.
Bittles, Alan H., and M. L. Black. 2010. “Consanguinity, Human Evolution, and Complex Diseases.” Proceedings of the National Academy of Sciences 107 (suppl 1): 1779–786.
Bourrat, Pierrick. 2015. “How to Read ‘heritability’ in the Recipe Approach to Natural Selection.” The British Journal for the Philosophy of Science 66: 883–903.
Bourrat, Pierrick, Qiaoying Lu, and Eva Jablonka. In press. “Why the Missing Heritability Might Not Be in the DNA.” BioEssays.
Bush, William S., and Jason H. Moore. 2012. “Genome-Wide Association Studies.” PLoS Computational Biology 8 (12): e1002822.
Consortium, International HapMap 3. 2010. “Integrating Common and Rare Genetic Variation in Diverse Human Populations.” Nature 467 (7311): 52–58.
Danchin, Étienne. 2013. “Avatars of Information: Towards an Inclusive Evolutionary Synthesis.” Trends in Ecology & Evolution 28 (6): 351–58.
Day, Troy, and Russell Bonduriansky. 2011.
“A Unified Approach to the Evolutionary Consequences of Genetic and Nongenetic Inheritance.” The American Naturalist 178 (2): E18–E36. Dias, Brian G., and Kerry J. Ressler. 2014. “Parental Olfactory Experience Influences Behavior and Neural Structure in Subsequent Generations.” Nature Neuroscience 17 (1): 89–96. Doolittle, Donald P. 2012. Population Genetics: Basic Principles. Vol. 16. Springer Science & Business Media. Downes, Stephen M. 2009. “Moving Past the Levels of Selection Debates.” Biology and Philosophy 24 (5): 703–09. Downes, Stephen M. 2015. “Heritability.” In Stanford Encyclopedia of Philosophy, ed. Edward N. Zalta. Stanford, CA: Stanford University. Eichler, Evan E., Jonathan Flint, Greg Gibson, Augustine Kong, Suzanne M. Leal, Jason H. Moore, and Joseph H. Nadeau. 2010. “Missing Heritability and Strategies for Finding the Underlying Causes of Complex Disease.” Nature Reviews Genetics 11 (6): 446–450. Falconer, Douglas S., and Trudy F. C. Mackay. 1996. Introduction to Quantitative Genetics. 4th edition. Longman: Benjamin Cummings. Frazer, Kelly A., Sarah S. Murray, Nicholas J. Schork, and Eric J. Topol. 2009. “Human Genetic Variation and Its Contribution to Complex Traits.” Nature Reviews Genetics 10 (4): 241–251. Furrow, Robert E., Freddy B. Christiansen, and Marcus W. Feldman. 2011. “Environment- Sensitive Epigenetics and the Heritability of Complex Diseases.” Genetics 189 (4): 1377–387. Griffiths, Paul E., and Eva M. Neumann-Held. 1999. “The Many Faces of the Gene.” BioScience 49 (8): 656–662. Griffiths, Paul E., and Karola Stotz. 2013. Genetics and Philosophy: An Introduction. Cambridge University Press. Guo, Guang, Lin Wang, Hexuan Liu, and Thomas Randall. 2014. “Genomic Assortative Mating in Marriages in the United States.” PloS One 9 (11): e112322. Jablonka, Eva, and Gal Raz. 2009. 
“Transgenerational Epigenetic Inheritance: Prevalence, Mechanisms, and Implications for the Study of Heredity and Evolution.” The Quarterly Review of Biology 84 (2): 131–176. Jablonka, Eva, Marion J Lamb, and Anna Zeligowski. 2014. Evolution in Four Dimensions: Genetic, Epigenetic, Behavioral, and Symbolic Variation in the History of Life. Revised edition. MIT Press. Jacquard, Albert. 1983. “Heritability: One Word, Three Concepts.” Biometrics 39 (2): 465–477. Johannes, Frank, Vincent Colot, and Ritsert C. Jansen. 2008. “Epigenome Dynamics: A Quantitative Genetics Perspective.” Nature Reviews Genetics 9 (11): 883–890. Klein, Robert J., Caroline Zeiss, Emily Y. Chew, Jen-Yue Tsai, Richard S. Sackler, Chad Haynes, Alice K. Henning, John Paul SanGiovanni, Shrikant M. Mane, and Susan T. Mayne. 2005. “Complement Factor H Polymorphism in Age-Related Macular Degeneration.” Science 308 (5720): 385–89. Lu, Qiaoying, and Bourrat Pierrick. Forthcoming. “The Evolutionary Gene and the Extended Evolutionary Synthesis.” British Journal for Philosophy of Science. 10.1093/bjps/axw035. Lynch, Kate E., and Bourrat Pierrick. 2017. “Interpreting Heritability Causally.” Philosophy of Science 84 (1): 14–34. Maher, Brendan. 2008. “Personal genomes: The Case of the Missing Heritability.” Nature News 456 (7218):18–21. Manolio, Teri A., Francis S. Collins, Nancy J. Cox, David B. Goldstein, Lucia A. Hindorff, David J. Hunter, Mark I. McCarthy, Erin M. Ramos, Lon R. Cardon, and Aravinda Chakravarti. 2009. “Finding the Missing Heritability of Complex Diseases.” Nature 461 (7265): 747–753. McCarthy, Mark I., and Joel N. Hirschhorn. 2008. “Genome-Wide Association Studies: Potential next Steps on a Genetic Journey.” Human Molecular Genetics 17 (R2): R156–165. Morgan, Hugh D., Heidi GE Sutherland, David IK Martin, and Emma Whitelaw. 1999. “Epigenetic Inheritance at the Agouti Locus in the Mouse.” Nature Genetics 23 (3): 314–18. Schork, Nicholas J., Sarah S. Murray, Kelly A. Frazer, and Eric J. 
Topol. 2009. “Common vs. Rare Allele Hypotheses for Complex Diseases.” Current Opinion in Genetics & Development 19 (3): 212–19. Silventoinen, Karri, Sampo Sammalisto, Markus Perola, Dorret I. Boomsma, Belinda K. Cornes, Chayna Davis, Leo Dunkel, Marlies De Lange, Jennifer R. Harris, and Jacob VB Hjelmborg. 2003. “Heritability of Adult Body Height: A Comparative Study of Twin Cohorts in Eight Countries.” Twin Research 6 (05): 399–408. Turkheimer, Eric. 2011. “Still Missing.” Research in Human Development 8 (3-4): 227-– 241. Visscher, Peter M., Matthew A. Brown, Mark I. McCarthy, and Jian Yang. 2012. “Five Years of GWAS Discovery.” The American Journal of Human Genetics 90 (1): 7–24. Visscher, Peter M., Sarah E. Medland, Manuel AR Ferreira, Katherine I. Morley, Gu Zhu, Belinda K. Cornes, Grant W. Montgomery, and Nicholas G. Martin. 2006. “Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings.” PLoS Genet 2 (3): e41. Visscher, Peter M., William G. Hill, and Naomi R. Wray. 2008. “Heritability in the Genomics Era—concepts and Misconceptions.” Nature Reviews Genetics 9 (4): 255– 266. Weedon, Michael N., Hana Lango, Cecilia M. Lindgren, Chris Wallace, David M. Evans, Massimo Mangino, Rachel M. Freathy, John RB Perry, Suzanne Stevens, and Alistair S. Hall. 2008. “Genome-Wide Association Analysis Identifies 20 Loci that Influence Adult Height.” Nature Genetics 40 (5): 575–583. Wood, Andrew R, Tonu Esko, Jian Yang, Sailaja Vedantam, Tune H Pers, Stefan Gustafsson, Audrey Y Chu, Karol Estrada, Jian'an Luan, and Zoltán Kutalik. 2014. “Defining the Role of Common Variation in the Genomic and Biological Architecture of Adult Human Height.” Nature genetics 46 (11): 1173–186. Yang, Jian, Beben Benyamin, Brian P. McEvoy, Scott Gordon, Anjali K. Henders, Dale R. Nyholt, Pamela A. Madden, et al. 2010. “Common SNPs Explain a Large Proportion of the Heritability for Human Height.” Nature Genetics 42 (7): 565–69. 
Yang, Jian, Andrew Bakshi, Zhihong Zhu, Gibran Hemani, Anna AE Vinkhuyzen, Sang Hong Lee, Matthew R. Robinson, et al. 2015. “Genetic Variance Estimation with Imputed Variants Finds Negligible Missing Heritability for Human Height and Body Mass Index.” Nature Genetics 47: 1114–120. Youngson, Neil A., and Emma Whitelaw. 2008. “Transgenerational Epigenetic Effects.” Annual Review of Genomics and Human Genetics 9: 233–257.