Scalable Bias-corrected Linkage Disequilibrium Estimation Under Genotype Uncertainty


Scalable Bias-corrected Linkage Disequilibrium Estimation Under

Genotype Uncertainty

David Gerard

Department of Mathematics and Statistics, American University, Washington, DC, 20016, USA

Abstract

Linkage disequilibrium (LD) estimates are often calculated genome-wide for use in many
tasks, such as SNP pruning and LD decay estimation. However, in the presence of genotype
uncertainty, naive approaches to calculating LD have extreme attenuation biases, incorrectly
suggesting that SNPs are less dependent than in reality. These biases are particularly strong in
polyploid organisms, which often exhibit greater levels of genotype uncertainty than diploids. A
principled approach using maximum likelihood estimation with genotype likelihoods can reduce
this bias, but is prohibitively slow for genome-wide applications. Here, we present scalable
moment-based adjustments to LD estimates based on the marginal posterior distributions of
the genotypes. We demonstrate, on both simulated and real data, that these moment-based
estimators are as accurate as maximum likelihood estimators, and are almost as fast as naive
approaches based only on posterior mean genotypes. This opens up bias-corrected LD estimation
to genome-wide applications. Additionally, we provide standard errors for these moment-based
estimators. All methods are implemented in the ldsep R package on GitHub https://github.
com/dcgerard/ldsep.

1 Introduction

Pairwise linkage disequilibrium (LD), the statistical association between alleles at two different loci,
has applications in genotype imputation [Wen and Stephens, 2010], genome-wide association studies
[Zhu and Stephens, 2018], genomic prediction [Wientjes et al., 2013], population genetics [Slatkin,
2008], and many other tasks [Sved and Hill, 2018]. LD is often estimated from next-generation
sequencing technologies, where the genotypes and haplotypes are not known with certainty [Gerard
et al., 2018]. Thus, researchers typically use estimated genotypes, such as posterior mean genotypes
[Fox et al., 2019], to estimate LD. However, this can cause biased LD estimates, attenuated toward
zero, implying loci are less dependent than in reality. This bias is particularly strong in polyploids,
and so in Gerard [2020] we derived maximum likelihood estimates (MLEs) that have lower bias and
are consistent estimates of LD.

Unfortunately, the MLE approach is prohibitively slow. Researchers typically calculate pairwise
LD at genome-wide scales, and the MLE approach takes on the order of a tenth of a second. Thus,
for many genome-wide applications, containing millions of SNPs, LD estimation using the MLE
approach would take years of computation time. This is not conducive to large-scale applications.

Keywords and phrases: attenuation bias, genotype likelihood, linkage disequilibrium, polyploidy, reliability ratio.

1

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://github.com/dcgerard/ldsep
https://github.com/dcgerard/ldsep
https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


Here, we derive scalable approaches to estimate LD that account for genotype uncertainty (Sec-
tion 2). Our methods use only the first two moments of the marginal posterior genotype distribution
for each individual at each locus, which are often provided or easily obtainable from many geno-
typing programs. We calculate sample moments from these posterior moments, and use these to
multiplicatively inflate naive LD estimates. We show, through simulations (Section 3.1) and real
data (Section 3.2), that our estimates can reduce attenuation bias and improve LD estimates when
genotypes are uncertain. All calculations have computational complexities that are linear in the
sample size, and so these estimates are scalable to genome-wide applications.

2 Methods

In this section, we will define moment-based estimators of the LD coefficient ∆ [Lewontin and
Kojima, 1960], the standardized LD coefficient ∆′ [Lewontin, 1964], and the Pearson correlation ρ
[Hill and Robertson, 1968]. We will only consider estimating the “composite” versions of these LD
measures which, advantageously, are appropriate LD measures for generic autopolyploid, allopoly-
ploid, and segmental allopolyploid populations, even in the absence of Hardy-Weinberg equilibrium
[Gerard, 2020]. We will also only consider biallelic loci, where the genotype for each individual is
the dosage (from 0 to the ploidy) of one of the two alleles.

To define our estimators of LD, we assume the user provides the posterior means and variances
for the genotypes for each individual at two loci. The full posterior genotype distribution for each
individual is often provided by genotyping software [Gerard et al., 2018, Gerard and Ferrão, 2019,
e.g.], from which these posterior moments can be obtained. If genotype posteriors are not provided,
genotype likelihoods may be normalized to posterior probabilities (assuming a uniform prior) and
used in what follows. Let XiA and XiB be the posterior means at loci A and B for individual
i ∈ {1, . . . ,n}. Let YiA and YiB be the posterior variances at loci A and B for individual i. Our
estimators are based entirely on the following sample moments of these posterior moments, which
may be calculated in linear time in the sample size, n.

uxA :=
1

n

n∑
i=1

XiA, uxB :=
1

n

n∑
i=1

XiB, (1)

vxA :=
1

n− 1

n∑
i=1

(XiA −uxA)2, vxB :=
1

n− 1

n∑
i=1

(XiA −uxB)2, (2)

cx :=
1

n− 1

n∑
i=1

(XiA −uxA)(XiB −uxB), (3)

uyA :=
1

n

n∑
i=1

YiA, and uyB :=
1

n

n∑
i=1

YiB. (4)

For a K-ploid species, our LD estimators, which we derive in Section S1, are as follows. The
estimated LD coefficient is

∆̂ :=

(
uyA + vxA

vxA

)(
uyB + vxB

vxB

)(cx
K

)
. (5)

2

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


The estimated Pearson correlation is

ρ̂ :=

√
uyA + vxA

vxA

√
uyB + vxB

vxB

cx√
vxAvxB

. (6)

Note that cx/
√
vxAvxB is the sample Pearson correlation between posterior mean genotypes. The

estimated standardized LD coefficient is

∆̂′ := ∆̂/∆̂m, where (7)

∆̂m :=

{
min{uxAuxB, (K −uxA)(K −uxB)}/K2 if cx < 0, and
min{uxA(K −uxB), (K −uxA)uxB}/K2 if cx > 0.

(8)

Equations (5)–(7) take the naive estimators most researchers use in practice (the sample covari-
ance/correlation of posterior means) and inflate these by a multiplicative effect. Such multiplicative
effects are sometimes called “reliability ratios” in the measurement error models literature [Fuller,
2009]. Due to sampling variability, this inflation could result in estimates that lie beyond the theo-
retical bounds of the parameters being estimated. In such cases, we apply the following truncations.

ρ̃ :=

{
max{ρ̂,−1} if ρ̂ < 0
min{ρ̂, 1} if ρ̂ > 0

(9)

∆̃ :=

{
max{∆̂,−

√
(vxA + uyA)(vxB + uyB)/K} if ∆̂ < 0

min{∆̂,
√

(vxA + uyA)(vxB + uyB)/K} if ∆̂ > 0
(10)

∆̃′ :=

{
max{∆̂′,−K} if ∆̂′ < 0
min{∆̂′,K} if ∆̂′ > 0

(11)

Standard errors are important for hypothesis testing [Brown, 1975], read-depth suggestions
[Maruki and Lynch, 2014], and shrinkage [Dey and Stephens, 2018]. Because estimators (5)–(7) are
functions of sample moments, deriving their standard errors can be accomplished by appealing to
the central limit theorem, followed by an application of the delta method (Section S2).

Additional considerations for improving our estimates of the reliability ratios, such as using
hierarchical shrinkage [Stephens, 2016], are considered in Section S3.

All methods are implemented in the ldsep R package on GitHub https://github.com/dcgerard/
ldsep.

3 Results

3.1 Simulations

We compared our moment-based estimators (5)–(7) to those of the MLE of Gerard [2020] as well as
the naive estimator that calculates the sample covariance and sample correlation between posterior
mean genotypes at two loci. Each replication, we generated genotypes for n ∈{10, 100, 1000} indi-
viduals with ploidy K ∈{2, 4, 6, 8} under Hardy-Weinberg equilibrium at two loci with major allele
frequencies (pA,pB) ∈{(0.5, 0.5), (0.5, 0.75), (0.9, 0.9)} and Pearson correlation ρ ∈{0, 0.5, 0.9}. We
then used updog’s rflexdog() function [Gerard et al., 2018, Gerard and Ferrão, 2019] to generate
read-counts at read-depths of either 10 or 100, a sequencing error rate of 0.01, an overdispersion

3

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://github.com/dcgerard/ldsep
https://github.com/dcgerard/ldsep
https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


value of 0.01, and no allele bias. Updog was then used to generate genotype likelihoods and genotype
posterior distributions for each individual at each SNP. These were then fed into ldsep to obtain
the MLE, our new moment-based estimator, and the naive estimator. Simulations were replicated
200 times for each unique combination of simulation parameters.

The accuracy of estimating ρ2 when pA = pB = 0.5 at a read-depth of 10 is presented in
Figure 1. The results for other scenarios are similar and may be found on GitHub (https://
github.com/dcgerard/ldfast_sims). We see that the moment-based estimator and the MLE
perform comparably, even for small read-depth and sample size. The naive estimator has a strong
attenuation bias toward zero. This bias is particularly prominent for higher ploidy levels. For
example, for an octoploid species where the true ρ2 is 0.81, the naive estimator appears to converge
to a ρ2 estimate of around 0.25. This bias does not disappear with increasing sample size. Estimated
standard errors are reasonably well-behaved, except for ρ̂ and ρ̂2 when the sample size is small and
the LD is large (Figure 2).

3.2 LD estimates for Solanum tuberosum

We evaluated our methods on the autotetraploid potato (Solanum tuberosum, 2n = 4x = 48)
genotyping-by-sequencing data from Uitdewilligen et al. [2013]. We used updog [Gerard et al., 2018,
Gerard and Ferrão, 2019] to obtain the posterior moments for each individual’s genotype at each SNP
on a single super scaffold (PGSC0003DMB000000192). To remove monoallelic SNPs, we filtered
out SNPs with allele frequencies either greater than 0.95 or less than 0.05, and filtered out SNPs
with a variance of posterior means less than 0.05. This resulted in 2108 SNPs. We then estimated
the squared correlation between each SNP using either the naive approach of calculating the sample
Pearson correlation between posterior means, or using our new moment-based approach (6).

Our estimators are scalable. On a 1.9 GHz quad-core PC running Linux with 32 GB of memory,
it took a total of 1.9 seconds to estimate all pairwise correlations using our new moment-based
approach, which is a small increase over the 0.7 seconds it took to estimate all pairwise correlations
using the naive approach. In Gerard [2020], we found that the MLE approach took about 0.1
seconds for each pair of SNPs for a tetraploid individual. Extrapolating this to 2108 SNPs would
indicate that the MLE approach would take about 2.5 days of computation time to calculate all
pairwise LD estimates on this dataset (

(
2108

2

)
×0.1sec×1min/60sec×1hr/60min×1d/24hr = 2.57d).

The histogram of estimated reliability ratios are presented in Figure 3. We see there that the
reliability ratios of most SNPs only increase their correlation estimates by less than 10%. But a
not insignificant portion have reliability ratios that increase the correlation estimates by more than
10%. To evaluate the LD estimates of high reliability ratio SNPs, we calculated the MLEs for ρ2

between the twenty SNPs with the largest reliability ratios. A pairs plot for ρ2 estimates between
the three approaches is presented in Figure 4. We see there that the MLE and new moment-based
approach result in very similar ρ2 estimates, while the naive approach using posterior means results
in much smaller ρ2 estimates.

4 Discussion

It has been known since at least the time of Spearman that the sample correlation coefficient
(or, similarly, the ordinary least squares estimator in simple linear regression) is attenuated in the
presence of uncertain variables [Spearman, 1904]. Methods to adjust for this bias include assuming
prior knowledge on the measurement variances or the ratio of measurement variances (resulting

4

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://github.com/dcgerard/ldfast_sims
https://github.com/dcgerard/ldfast_sims
https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


from, for example, repeated measurements on the same individuals) [Koopmans, 1937, Degracie
and Fuller, 1972], using instrumental variables [Carter and Fuller, 1980], and using distributional
assumptions [Pal, 1980]. See Fuller [2009] for a detailed introduction to this vast field. Our solution
was to use sample moments of marginal posterior moments which, to our knowledge, has never been
proposed before.

It is natural to ask if our methods could be used to account for uncertain genotypes in genome-
wide association studies. However, the moment-based techniques we used in this manuscript, when
applied to simple linear regression with an additive effects model (where the SNP effect is pro-
portional to the dosage), result in the standard ordinary least squares estimates when using the
posterior mean as a covariate (Section S4). This supports using the posterior mean as a covariate
in simple linear regression with an additive effects model. This is not to say, however, that using
the posterior mean is also appropriate for more complicated models of gene action [Rosyara et al.,
2016], or for non-linear models.

Acknowledgments

Most analyses were performed using the R statistical language [R Core Team, 2020].

Data availability

All methods discussed in this manuscript are implemented in the ldsep R package, available on
GitHub (https://github.com/dcgerard/ldsep) under a GPL-3 license. Scripts to reproduce the
results of this research are available on GitHub (https://github.com/dcgerard/ldfast_sims).
All datasets used in this manuscript are publicly available [Uitdewilligen et al., 2013] and may be
downloaded from:

• https://doi.org/10.1371/journal.pone.0062355.s004
• https://doi.org/10.1371/journal.pone.0062355.s007
• https://doi.org/10.1371/journal.pone.0062355.s009
• https://doi.org/10.1371/journal.pone.0062355.s010

5

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://github.com/dcgerard/ldsep
https://github.com/dcgerard/ldfast_sims
https://doi.org/10.1371/journal.pone.0062355.s004
https://doi.org/10.1371/journal.pone.0062355.s007
https://doi.org/10.1371/journal.pone.0062355.s009
https://doi.org/10.1371/journal.pone.0062355.s010
https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


5 Figures

0 0.25 0.81

2
4

6
8

10 100 1000 10 100 1000 10 100 1000

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Sample Size

ρ̂2

method

MLE

MoM

Naive

Figure 1: Estimate of ρ2 (y-axis) for the maximum likelihood estimator [Gerard, 2020] (MLE),
our new moment-based estimator (6) (MoM), and the naive squared sample correlation coefficient
between posterior mean genotypes (Naive). The x-axis indexes the sample size, the row-facets index
the ploidy, and the column-facets index the true ρ2, which is also presented by the horizontal dashed
red line. These simulations were performed using a read-depth of 10, and major allele frequencies
of 0.5 at each locus. The naive estimator presents a strong attenuation bias toward 0, particularly
for higher ploidy regimes.

6

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


ẑ ∆̂′

∆̂ ρ̂ ρ̂
2

0.0 0.2 0.4 0.6 0 1 2

0.00 0.05 0.10 0.15 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6
0.0

0.2

0.4

0.6

0.0

0.2

0.4

0.0

0.5

1.0

1.5

0.00

0.05

0.10

0.00

0.25

0.50

0.75

1.00

MAD of Estimates

M
e

d
ia

n
 o

f 
S

ta
n

d
a

rd
 E

rr
o

rs

n  and  ρ

other

n = 10, ρ = 0.9

Figure 2: Median of estimated standard errors (y-axis) versus median absolute deviations (x-axis)
of each of the moment-based LD estimators (facets). The line is the y = x line, and points above
this line indicate that the estimated standard errors are typically larger than the true standard
errors. Estimated standard error are reasonably unbiased except for ρ̂ and ρ̂2 in scenarios with
small sample sizes (n = 10) and a large levels of LD (ρ = 0.9) (color and shape).

0

200

400

600

1.00 1.25 1.50 1.75 2.00
Reliability Ratio Estimate

co
u

n
t

Figure 3: Histogram of estimated reliability ratios (S69) using the data from Uitdewilligen et al.
[2013].

7

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


MLE MoM Naive

M
L

E
M

o
M

N
a

ive

0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00 0.0 0.2 0.4 0.6 0.8

0

20

40

60

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

Figure 4: Pairs plot for ρ2 estimates between the twenty SNPs from Uitdewilligen et al. [2013] with
the largest estimated reliability ratios when using either maximum likelihood estimation (MLE)
[Gerard, 2020], our new moment-based approach (6) (MoM), or the naive approach using just
posterior means (Naive). The dashed line is the y = x line. The MLE and the moment-based
approach result in much more similar LD estimates.

8

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


Supplementary Material

S1 Derivation of LD estimators

In this section, we derive estimators (5)–(7). We do this by assuming a normal model on the data
and the genotypes. This is obviously not appropriate when using genotypes and sequencing data,
but our simulations in Section 3.1 were also accomplished using sequencing data and resulted in
very good performance.

Let Gi = (GiA,GiB)
ᵀ be the genotype for individual i at loci A and B. Let Zi = (ZiA,ZiB)

ᵀ be
the data for individual i at loci A and B. Then we let

Gi ∼ N2(µ, Σ), and (S1)
Zi|Gi ∼ N2(Gi,S), where (S2)

µ = (µ1,µ2)
ᵀ, (S3)

Σ =

(
σ11 σ12
σ12 σ22

)
, and (S4)

S =

(
s11 0
0 s22

)
. (S5)

To interpret these terms, µ1/K and µ2/K are the allele frequencies at each locus, σ11 and σ22
are the variances of the genotypes at each locus, s11 and s22 are the variances of the genotyping
errors at each locus, and σ12 is covariance between genotypes. By elementary methods, we have the
well-known result that, marginally,

Zi ∼ N2(µ, Σ + S). (S6)

We assume the user has provided posterior moments on the genotypes

XiA = E[GiA|ZiA],XiB = E[GiB|ZiB],YiA = var(GiA|ZiA), and YiB = var(GiB|ZiB). (S7)

These posterior moments are marginal in that they only condition on either ZiA or ZiB, but not
both. Thus, we assume they are well-approximated by the model

GiA ∼ N(µ1,σ11) (S8)
ZiA|GiA ∼ N(GiA,s11) (S9)

GiB ∼ N(µ2,σ22) (S10)
ZiB|GiB ∼ N(GiB,s22). (S11)

By standard methods, this results in

GiA|ZiA ∼ N

[(
1

σ11
+

1

s11

)−1 (
1

σ11
µ1 +

1

s11
ZiA

)
,

(
1

σ11
+

1

s11

)−1]
, and (S12)

GiB|ZiB ∼ N

[(
1

σ22
+

1

s22

)−1 (
1

σ22
µ2 +

1

s22
ZiB

)
,

(
1

σ22
+

1

s22

)−1]
. (S13)

9

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


Treating only Zi as random from distribution (S6), we have

uxA ≈ E

[(
1

σ11
+

1

s11

)−1 (
1

σ11
µ1 +

1

s11
ZiA

)]
(S14)

=

(
1

σ11
+

1

s11

)−1 (
1

σ11
µ1 +

1

s11
E[ZiA]

)
(S15)

=

(
1

σ11
+

1

s11

)−1 (
1

σ11
µ1 +

1

s11
µ1

)
(S16)

= µ1. (S17)

Similarly,

uxB ≈ µ2. (S18)

Furthermore,

vxA ≈ var

[(
1

σ11
+

1

s11

)−1 (
1

σ11
µ1 +

1

s11
ZiA

)]
(S19)

=

(
1

σ11
+

1

s11

)−2
1

s211
var(ZiA) (S20)

=

(
1

σ11
+

1

s11

)−2
σ11 + s11

s211
(S21)

=

(
1

σ11
+

1

s11

)−1
σ11
s11

. (S22)

Similarly,

vxB ≈
(

1

σ22
+

1

s22

)−1
σ22
s22

. (S23)

Now, using the posterior variances, we have

uyA ≈
(

1

σ11
+

1

s11

)−1
, and (S24)

uyB ≈
(

1

σ22
+

1

s22

)−1
. (S25)

10

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


Finally, the expectation of the sample covariance of posterior means is

cx ≈ cov

[(
1

σ11
+

1

s11

)−1 (
1

σ11
µ1 +

1

s11
ZiA

)
,

(
1

σ22
+

1

s22

)−1 (
1

σ22
µ2 +

1

s22
ZiB

)]
(S26)

=

(
1

σ11
+

1

s11

)−1 (
1

σ22
+

1

s22

)−1
1

s11

1

s22
cov(ZiA,ZiB) (S27)

=

(
1

σ11
+

1

s11

)−1 (
1

σ22
+

1

s22

)−1
1

s11

1

s22
σ12. (S28)

Using a method-of-moments approach, we now have a system of five equations and five un-
knowns:

vxA =

(
1

σ11
+

1

s11

)−1
σ11
s11

, (S29)

vxB =

(
1

σ22
+

1

s22

)−1
σ22
s22

, (S30)

uyA =

(
1

σ11
+

1

s11

)−1
, (S31)

uyB =

(
1

σ22
+

1

s22

)−1
, and (S32)

cx =

(
1

σ11
+

1

s11

)−1 (
1

σ22
+

1

s22

)−1
1

s11

1

s22
σ12. (S33)

Solving for s11, s22, σ11, σ22, and σ12, we obtain:

ŝ11 =
uyA(uyA + vxA)

vxA
(S34)

ŝ22 =
uyB(uyB + vxB)

vxB
(S35)

σ̂11 = uyA + vxA (S36)

σ̂22 = uyB + vxB (S37)

σ̂12 =
uyA + vxA

vxA

uyB + vxB
vxB

cx. (S38)

Using (S14)–(S18), we also have

µ̂1 = uxA, and (S39)

µ̂2 = uxB. (S40)

The LD coefficient estimates (5)–(7) can be obtained by substituting in parameter estimates in

11

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


the following equations [Gerard, 2020]

∆ = σ12/K, (S41)

ρ = σ12/
√
σ11σ22, and (S42)

∆′ = ∆/∆m, where (S43)

∆m =

{
min{µ1µ2, (K −µ1)(K −µ2)}/K2 if ∆ < 0, and
min{µ1(K −µ2), (K −µ1)µ2}/K2 if ∆ > 0.

(S44)

S2 Derivation of standard errors

Let

Mi := (XiA,X
2
iA,XiB,X

2
iB,XiAXiB,YiA,YiB)

ᵀ. (S45)

Then, by the central limit theorem, we have for

M̄ :=
1

n

n∑
i=1

Mi, (S46)

that
√
nM̄ is asymptotically multivariate normal with some limiting covariance, say, Ω. Finite

variances are guaranteed by the finite support of the genotypes. We can estimate Ω with the
sample covariance matrix

Ω̂ :=
1

n− 1

n∑
i=1

(Mi −M̄)(Mi −M̄)ᵀ. (S47)

Estimators (5)–(7) are approximately functions of M̄. Namely

∆̂ ≈
(
M̄6 + M̄2 −M̄21

M̄2 −M̄21

)(
M̄7 + M̄4 −M̄23

M̄4 −M̄23

)(
M̄5 −M̄1M̄3

K

)
(S48)

ρ̂ ≈

(√
M̄6 + M̄2 −M̄21
M̄2 −M̄21

)(√
M̄7 + M̄4 −M̄23
M̄4 −M̄23

)(
M̄5 −M̄1M̄3

)
(S49)

∆̂′ ≈
(
M̄6 + M̄2 −M̄21

M̄2 −M̄21

)(
M̄7 + M̄4 −M̄23

M̄4 −M̄23

)(
M̄5 −M̄1M̄3

K

)
/∆̂m, where (S50)

∆̂m =

{
min{M̄1M̄3, (K −M̄1)(K −M̄3)}/K2 if M̄5 −M̄1M̄3 < 0, and
min{M̄1(K −M̄3), (K −M̄1)M̄3}/K2 if M̄5 −M̄1M̄3 > 0.

(S51)

These are smooth functions of M̄ (except on a space of Lebesgue measure zero), and so admit the

12

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


following gradients, calculated in Mathematica [Wolfram Research, Inc., 2020]:

g∆ :=
d∆̂

dM̄
=




−(
M̄41 M̄3−2M̄1M̄5M̄6+M̄

2
1 M̄3(−2M̄2+M̄6)+M̄2M̄3(M̄2+M̄6))(M̄

2
3−M̄4−M̄7)

K(M̄21−M̄2)
2
(M̄23−M̄4)

(M̄1M̄3−M̄5)M̄6(M̄23−M̄4−M̄7)
K(M̄21−M̄2)

2
(M̄23−M̄4)

−(
M̄21−M̄2−M̄6)(−2M̄3M̄5M̄7+M̄1(M̄

4
3 +M̄

2
3 (−2M̄4+M̄7)+M̄4(M̄4+M̄7)))

K(M̄21−M̄2)(M̄
2
3−M̄4)

2

(M̄1M̄3−M̄5)(M̄21−M̄2−M̄6)M̄7
K(M̄21−M̄2)(M̄

2
3−M̄4)

2

(−M̄21 +M̄2+M̄6)(−M̄
2
3 +M̄4+M̄7)

K(M̄21−M̄2)(M̄
2
3−M̄4)

(−M̄1M̄3+M̄5)(−M̄23 +M̄4+M̄7)
K(M̄21−M̄2)(M̄

2
3−M̄4)

(−M̄1M̄3+M̄5)(−M̄21 +M̄2+M̄6)
K(M̄21−M̄2)(M̄

2
3−M̄4)




, (S52)

gρ :=
dρ̂

dM̄
=




(M̄31 M̄5+M̄
2
1 M̄3(−M̄2+M̄6)+M̄2M̄3(M̄2+M̄6)−M̄1M̄5(M̄2+2M̄6))

√
−M̄23 +M̄4+M̄7

(M̄21−M̄2)
2
(M̄23−M̄4)

√
−M̄21 +M̄2+M̄6

(M̄1M̄3−M̄5)(M̄21−M̄2−2M̄6)
√
−M̄23 +M̄4+M̄7

2(M̄21−M̄2)
2
(M̄23−M̄4)

√
−M̄21 +M̄2+M̄6

−
√
−M̄21 +M̄2+M̄6(M̄1M̄

2
3 (M̄4−M̄7)−M̄1M̄4(M̄4+M̄7)+M̄3M̄5(−M̄

2
3 +M̄4+2M̄7))

(M̄21−M̄2)(M̄
2
3−M̄4)

2√
−M̄23 +M̄4+M̄7

(M̄1M̄3−M̄5)
√
−M̄21 +M̄2+M̄6(M̄

2
3−M̄4−2M̄7)

2(M̄21−M̄2)(M̄
2
3−M̄4)

2√
−M̄23 +M̄4+M̄7√

−M̄21 +M̄2+M̄6
√
−M̄23 +M̄4+M̄7

(M̄21−M̄2)(M̄
2
3−M̄4)

(−M̄1M̄3+M̄5)
√
−M̄23 +M̄4+M̄7

2(M̄21−M̄2)(M̄
2
3−M̄4)

√
−M̄21 +M̄2+M̄6

(−M̄1M̄3+M̄5)
√
−M̄21 +M̄2+M̄6

2(M̄21−M̄2)(M̄
2
3−M̄4)

√
−M̄23 +M̄4+M̄7




, (S53)

13

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


and

g∆′ :=
d∆̂

dM̄
= g∆/∆̂m −A, where (S54)

A =




∆̂C1(M̄1,M̄3,M̄5)/∆̂
2
m

0

∆̂C3(M̄1,M̄3,M̄5)/∆̂
2
m

0
0
0
0




(S55)

C1(M̄1,M̄3,M̄5) =



M̄3/K

2 if M̄5 < M̄1M̄3 and M̄1M̄3 < (K −M̄1)(K −M̄3)
−(K −M̄3)/K2 if M̄5 < M̄1M̄3 and M̄1M̄3 > (K −M̄1)(K −M̄3)
−M̄3/K2 if M̄5 > M̄1M̄3 and M̄1(K −M̄3) > (K −M̄1)M̄3
(K −M̄3)/K2 if M̄5 > M̄1M̄3 and M̄1(K −M̄3) < (K −M̄1)M̄3

(S56)

C3(M̄1,M̄3,M̄5) =



M̄1/K

2 if M̄5 < M̄1M̄3 and M̄1M̄3 < (K −M̄1)(K −M̄3)
−(K −M̄1)/K2 if M̄5 < M̄1M̄3 and M̄1M̄3 > (K −M̄1)(K −M̄3)
(K −M̄1)/K2 if M̄5 > M̄1M̄3 and M̄1(K −M̄3) > (K −M̄1)M̄3
−M̄1/K2 if M̄5 > M̄1M̄3 and M̄1(K −M̄3) < (K −M̄1)M̄3

(S57)

Though these gradients are rather complicated, they are not computationally intensive and may
be calculated in constant time in the sample size.

The asymptotic variances of ∆̂, ρ̂, and ∆̂′ are

1

n
g
ᵀ
∆Ω̂g∆,

1

n
gᵀρΩ̂gρ, and

1

n
g
ᵀ
∆′ Ω̂g∆′, (S58)

respectively.
To accommodate missing data, we use only pairwise complete observations for the sample

covariance matrix (S47). This ensures that Ω̂ is positive definite and, thus, the resulting stan-
dard errors are non-negative. However, we use all non-missing observations for M̄. That is, let

14

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


ΘA, ΘB ⊆{1, 2, . . . ,n} be the index sets of non-missing values at loci A and B, respectively. Then

M̄1 =
1

|ΘA|
∑
i∈ΘA

XiA (S59)

M̄2 =
1

|ΘA|
∑
i∈ΘA

X2iA (S60)

M̄3 =
1

|ΘB|
∑
i∈ΘB

XiB (S61)

M̄4 =
1

|ΘB|
∑
i∈ΘB

X2iB (S62)

M̄5 =
1

|ΘA ∩ ΘB|
∑

i∈ΘA∩ΘB

XiAXiB (S63)

M̄6 =
1

|ΘA|
∑
i∈ΘA

YiA (S64)

M̄7 =
1

|ΘB|
∑
i∈ΘB

YiB (S65)

M̄
∗

=
1

|ΘA ∩ ΘB|
∑

i∈ΘA∩ΘB

Mi (S66)

Ω̂ =
1

|ΘA ∩ ΘB|− 1
∑

i∈ΘA∩ΘB

(Mi −M̄
∗
)(Mi −M̄

∗
)ᵀ (S67)

The asymptotic variances of ∆̂, ρ̂, and ∆̂′ are then

1

|ΘA ∩ ΘB|
g
ᵀ
∆Ω̂g∆,

1

|ΘA ∩ ΘB|
gᵀρΩ̂gρ, and

1

|ΘA ∩ ΘB|
g
ᵀ
∆′ Ω̂g∆′, (S68)

respectively.

S3 Adjusting the reliability ratios

S3.1 Adaptive shrinkage on the reliability ratios

Each SNP has an estimated reliability ratio,

bj :=
uyj + vxj

vxj
, (S69)

which corresponds to the multiplicative adjustment to all LD estimates that include that SNP
(see (5)). These reliability ratios might have high variance due to (i) lower sequencing depth
or (ii) containing fewer individuals with non-missing data. Thus, some reliability ratios may be
noisy. Hierarchical shrinkage is a statistical technique that allows high-variance observations to
borrow strength from low-variance observations and thus improve estimation performance. Adap-
tive shrinkage (ash) [Stephens, 2016] is a recently proposed general-purpose hierarchical shrinkage

15

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


technique that we can use to model the distribution of reliability ratios flexibly, only constraining
them to be unimodal. In this section, we will use ash to improve our reliability ratio estimates.

We will now describe the procedure for applying ash to shrink the reliability ratios. Our strategy
will be to derive the standard errors for the log of the reliability ratios (S69) and apply ash on the
log-scale using these standard errors. To begin, let Xij be the posterior mean for individual i at
SNP j. Let Yij be the posterior variance for individual i at SNP j. Finally, let

Mij = (Xij,X
2
ij,Yij), (S70)

M̄j =
1

n

n∑
i=1

Mij, so (S71)

M̄j1 =
1

n

n∑
i=1

Xij, (S72)

M̄j2 =
1

n

n∑
i=1

X2ij, and (S73)

M̄j3 =
1

n

n∑
i=1

Yij. (S74)

Then the log of the reliability ratio for SNP j is

Lj := log

(
M̄j3 + M̄j2 −M̄2j1

M̄j2 −M̄2j1

)
(S75)

= log(M̄j3 + M̄j2 −M̄2j1) − log(M̄j2 −M̄
2
j1). (S76)

Let the sample covariance be

Ω̂j :=
1

n− 1

n∑
i=1

(Mij −M̄j)(Mij −M̄j)ᵀ. (S77)

Then we have by the central limit theorem that
√
nM̄j is asymptotically multivariate normal, and

we can use Ω̂j as the estimate of the covariance matrix. The gradients for (S75) are

gj1 :=
dLj
dM̄j1

=
−2M̄j1

M̄j3 + M̄j2 −M̄2j1
+

2M̄j1
M̄j2 −M̄2j1

(S78)

gj2 :=
dLj
dM̄j2

=
1

M̄j3 + M̄j2 −M̄2j1
−

1

M̄j2 −M̄2j1
(S79)

gj3 :=
dLj
dM̄j3

=
1

M̄j3 + M̄j2 −M̄2j1
(S80)

Then, with gj := (gj1,gj2,gj3)
ᵀ, the variance for Lj is

ŝ2j :=
1

n
g
ᵀ
j Ω̂jgj. (S81)

16

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


We apply ash to (L1, ŝ1), . . . , (Lm, ŝm) to obtain shrunken log reliability ratios L̂1, . . . , L̂m. Because
ash’s grid-based scheme for estimating the mode is not the most computationally efficient, we used
the half-sample mode estimator of Robertson and Cryer [1974] prior to running ash.

This procedure seems to result in improved performance for SNPs with unusually variable reli-
ability ratios (Figure S1).

S3.2 Thresholding the reliability ratios

If a researcher accidentally provides a monoallelic SNP, its reliability ratio could explode due to
having a denominator close to zero in (S69). For example, the right panel of Figure S2 contains a
monoallelic SNP (PotVar0080327) whose reliability ratio estimate (S69) is 100.92. This can provide
unstable estimates of LD as some SNPs will, due to sampling variability, have correlations with
these monoallelic SNPs on the order of 0.01. For example, the sample correlation between posterior
means of PotVar0080327 and PotVar0078678 (left facet of Figure S2) -0.0098. But due to the
extreme reliability ratio of PotVar0080327, the genotype-error adjusted correlation estimate is -1.
This is, of course, unsettling. So by default, our software will take all reliability ratio estimates
(S69) above a user-provided value (default of 10) and assign these to have reliability ratios of the
median reliability ratio in the dataset.

S4 Genome-wide Association Studies

In this section, we demonstrate that the techniques used in Section S1, when applied to simple linear
regression with an additive effects model [Rosyara et al., 2016], result in the standard ordinary least
squares estimate when using the posterior mean as a covariate. This indicates that for genome-wide
association studies, using the posterior mean is appropriate in a linear regression context when
using an additive model for gene action.

Let Gi be the genotype for individual i at a locus. Let Zi be the data that lead to the genotyping
for individual i at the same locus. Let Wi be some quantitative trait of interest for individual i.
Then we let

Wi|Gi ∼ N(β0 + β1Gi,σ2) (S82)
Zi|Gi ∼ N(Gi,s2) (S83)
Gi ∼ N(µ,τ2). (S84)

We suppose the user is only provided the posterior means and variances of each Gi|Zi. Let Xi =
E[Gi|Zi] and Yi = var(Gi|Zi). From elementary methods, we have

Zi ∼ N(µ,s2 + τ2) (S85)

Gi|Zi ∼ N

[(
1

τ2
+

1

s2

)−1 (
1

τ2
µ +

1

s2
Zi

)
,

(
1

τ2
+

1

s2

)−1]
. (S86)

17

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


Let

uw =
1

n

n∑
i=1

Wi (S87)

ux =
1

n

n∑
i=1

Xi (S88)

cxw =
1

n− 1

n∑
i=1

(Wi −uw)(Xi −ux) (S89)

vx =
1

n− 1

n∑
i=1

(Xi −ux)2 (S90)

vw =
1

n− 1

n∑
i=1

(Wi −uw)2. (S91)

We have that

cwx ≈ cov(Wi,Xi) (S92)

≈ cov

(
Wi,

(
1

τ2
+

1

s2

)−1 (
1

τ2
µ +

1

s2
Zi

))
(S93)

=

(
1

τ2
+

1

s2

)−1
1

s2
cov(Wi,Zi) (S94)

=

(
1

τ2
+

1

s2

)−1
1

s2
β1 var(Gi) (S95)

=

(
1

τ2
+

1

s2

)−1
τ2

s2
β1. (S96)

We also have from (S19)–(S22) that

vx ≈
(

1

τ2
+

1

s2

)−1
τ2

s2
. (S97)

Using method of moments with equations (S96) and (S97), we have the following estimator for
β1

β̂1 = cwx/vx (S98)

=
cwx√
vxvw

√
vw√
vx
. (S99)

Equation (S99) is the sample correlation between the Wi’s and the Xi’s (cwx/
√
vxvw) multiplied

by the ratio of the sample standard deviations of the Wi’s and the Xi’s (
√
vw/
√
vx). This is the

well-known formula for the ordinary least squares estimate of β1 from a regression of Wi on Xi.

18

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


S5 Supplementary figures

0.0

0.1

0.2

0.3

0.0 0.2 0.4 0.6
Log of Reliability Ratio

E
st

im
a

te
d

 S
ta

n
d

a
rd

 E
rr

o
r

(A)

0

25

50

75

100

125

0 25 50 75 100 125
Alternative Counts

R
e

fe
re

n
ce

 C
o

u
n

ts

(B)

0

40

80

120

0 40 80 120
Alternative Counts

R
e

fe
re

n
ce

 C
o

u
n

ts

(C)

1.00

1.25

1.50

1.75

2.00

1.00 1.25 1.50 1.75 2.00
Raw Reliability Ratio

S
h

ru
n

ke
n

 R
e

lia
b

ili
ty

 R
a

tio

(D)

Figure S1: (A) The log of the reliability ratios (x-axis) versus their estimated standard errors
(y-axis). The two highlighted points do not seem to fit the trend. When we plot the read-counts
for these highlighted points ((B) and (C)), we notice that these two SNPs are almost monoallelic,
providing doubts on their unusually large reliability ratios. We plot the shrunken reliability ratios
(y-axis) against their original values (x-axis) in (D), noting that the problem SNPs (color) have
their reliability ratios highly adjusted.

19

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


PotVar0078678 PotVar0080327

0 50 100 150 200 0 50 100 150 200

0

50

100

150

200

Alternative Counts

R
e

fe
re

n
ce

 C
o

u
n

ts

Figure S2: Plots of read-counts of two SNPs (facets) from Uitdewilligen et al. [2013]. Alternative
counts lie on the x-axis and reference counts lie on the y-axis. The right SNP is monoallelic and
because of this the estimated correlation between the two SNPs using raw reliability ratios is -1,
even though the sample correlation between posterior means is only -0.0098.

20

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


References

A. Brown. Sample sizes required to detect linkage disequilibrium between two or three loci. Theoreti-
cal Population Biology, 8(2):184 – 201, 1975. ISSN 0040-5809. doi: 10.1016/0040-5809(75)90031-3.

R. L. Carter and W. A. Fuller. Instrumental variable estimation of the simple errors-in-
variables model. Journal of the American Statistical Association, 75(371):687–692, 1980. doi:
10.1080/01621459.1980.10477534.

J. S. Degracie and W. A. Fuller. Estimation of the slope and analysis of covariance when the
concomitant variable is measured with error. Journal of the American Statistical Association, 67
(340):930–937, 1972. doi: 10.1080/01621459.1972.10481321.

K. K. Dey and M. Stephens. CorShrink: Empirical Bayes shrinkage estimation of correlations, with
applications. bioRxiv, 2018. doi: 10.1101/368316.

E. A. Fox, A. E. Wright, M. Fumagalli, and F. G. Vieira. ngsLD: evaluating linkage disequilibrium
using genotype likelihoods. Bioinformatics, 35(19):3855–3856, 03 2019. ISSN 1367-4803. doi:
10.1093/bioinformatics/btz200.

W. A. Fuller. Measurement error models. John Wiley & Sons, 2009.
D. Gerard. Pairwise linkage disequilibrium estimation for polyploids. bioRxiv, 2020. doi:

10.1101/2020.08.03.234476.
D. Gerard and L. F. V. Ferrão. Priors for genotyping polyploids. Bioinformatics, 36(6):1795–1800,

11 2019. ISSN 1367-4803. doi: 10.1093/bioinformatics/btz852. bioRxiv: 751784.
D. Gerard, L. F. V. Ferrão, A. A. F. Garcia, and M. Stephens. Genotyping polyploids from

messy sequencing data. Genetics, 210(3):789–807, 2018. ISSN 0016-6731. doi: 10.1534/ge-
netics.118.301468.

W. Hill and A. Robertson. Linkage disequilibrium in finite populations. Theoretical and applied
genetics, 38(6):226–231, 1968. doi: 10.1007/BF01245622.

T. C. Koopmans. Linear regression analysis of economic time series, volume 20. De erven F. Bohn
nv, 1937.

R. Lewontin. The interaction of selection and linkage. i. general considerations; heterotic models.
Genetics, 49(1):49, 1964. URL https://www.genetics.org/content/49/1/49.

R. C. Lewontin and K.-i. Kojima. The evolutionary dynamics of complex polymorphisms. Evolution,
14(4):458–472, 1960. doi: 10.1111/j.1558-5646.1960.tb03113.x.

T. Maruki and M. Lynch. Genome-wide estimation of linkage disequilibrium from population-
level high-throughput sequencing data. Genetics, 197(4):1303–1313, 2014. ISSN 0016-6731. doi:
10.1534/genetics.114.165514.

M. Pal. Consistent moment estimators of regression coefficients in the presence of errors in vari-
ables. Journal of Econometrics, 14(3):349 – 364, 1980. ISSN 0304-4076. doi: 10.1016/0304-
4076(80)90032-9.

R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria, 2020. URL https://www.R-project.org/.

T. Robertson and J. D. Cryer. An iterative procedure for estimating the mode. Journal of the Amer-
ican Statistical Association, 69(348):1012–1016, 1974. doi: 10.1080/01621459.1974.10480246.

U. R. Rosyara, W. S. De Jong, D. S. Douches, and J. B. Endelman. Software for genome-wide
association studies in autopolyploids and its application to potato. The Plant Genome, 9(2),
2016. doi: 10.3835/plantgenome2015.08.0037.

M. Slatkin. Linkage disequilibrium-understanding the evolutionary past and mapping the medical
future. Nature Reviews Genetics, 9(6):477, 2008. doi: 10.1038/nrg2361.

21

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

http://doi.org/10.1016/0040-5809(75)90031-3
http://doi.org/10.1080/01621459.1980.10477534
http://doi.org/10.1080/01621459.1980.10477534
http://doi.org/10.1080/01621459.1972.10481321
http://doi.org/10.1101/368316
http://doi.org/10.1093/bioinformatics/btz200
http://doi.org/10.1093/bioinformatics/btz200
http://doi.org/10.1101/2020.08.03.234476
http://doi.org/10.1101/2020.08.03.234476
http://doi.org/10.1093/bioinformatics/btz852
http://doi.org/10.1534/genetics.118.301468
http://doi.org/10.1534/genetics.118.301468
http://doi.org/10.1007/BF01245622
https://www.genetics.org/content/49/1/49
http://doi.org/10.1111/j.1558-5646.1960.tb03113.x
http://doi.org/10.1534/genetics.114.165514
http://doi.org/10.1534/genetics.114.165514
http://doi.org/10.1016/0304-4076(80)90032-9
http://doi.org/10.1016/0304-4076(80)90032-9
https://www.R-project.org/
http://doi.org/10.1080/01621459.1974.10480246
http://doi.org/10.3835/plantgenome2015.08.0037
http://doi.org/10.1038/nrg2361
https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/


C. Spearman. The proof and measurement of association between two things. The American journal
of psychology, 15(1):72–101, 1904. doi: 10.2307/1422689.

M. Stephens. False discovery rates: a new deal. Biostatistics, 18(2):275–294, 10 2016. ISSN 1465-
4644. doi: 10.1093/biostatistics/kxw041.

J. A. Sved and W. G. Hill. One hundred years of linkage disequilibrium. Genetics, 209(3):629–636,
2018. ISSN 0016-6731. doi: 10.1534/genetics.118.300642.

J. G. A. M. L. Uitdewilligen, A.-M. A. Wolters, B. B. D’hoop, T. J. A. Borm, R. G. F. Visser,
and H. J. van Eck. A next-generation sequencing method for genotyping-by-sequencing of
highly heterozygous autotetraploid potato. PLOS ONE, 8(5):1–14, 05 2013. doi: 10.1371/jour-
nal.pone.0062355.

X. Wen and M. Stephens. Using linear predictors to impute allele frequencies from summary or
pooled genotype data. The annals of applied statistics, 4(3):1158–1182, 2010. ISSN 1932-6157.
doi: 10.1214/10-aoas338.

Y. C. J. Wientjes, R. F. Veerkamp, and M. P. L. Calus. The effect of linkage disequilibrium and
family relationships on the reliability of genomic prediction. Genetics, 193(2):621–631, 2013. ISSN
0016-6731. doi: 10.1534/genetics.112.146290.

Wolfram Research, Inc. Mathematica, Version 12.2, 2020. URL https://www.wolfram.com/
mathematica. Champaign, IL.

X. Zhu and M. Stephens. Large-scale genome-wide enrichment analyses identify new trait-associated
genes and pathways across 31 human phenotypes. Nature communications, 9(1):1–14, 2018. doi:
10.1038/s41467-018-06805-x.

22

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.08.430270doi: bioRxiv preprint 

http://doi.org/10.2307/1422689
http://doi.org/10.1093/biostatistics/kxw041
http://doi.org/10.1534/genetics.118.300642
http://doi.org/10.1371/journal.pone.0062355
http://doi.org/10.1371/journal.pone.0062355
http://doi.org/10.1214/10-aoas338
http://doi.org/10.1534/genetics.112.146290
https://www.wolfram.com/mathematica
https://www.wolfram.com/mathematica
http://doi.org/10.1038/s41467-018-06805-x
http://doi.org/10.1038/s41467-018-06805-x
https://doi.org/10.1101/2021.02.08.430270
http://creativecommons.org/licenses/by-nc-nd/4.0/

	Introduction
	Methods
	Results
	Simulations
	LD estimates for Solanum tuberosum

	Discussion
	Figures
	Derivation of LD estimators
	Derivation of standard errors
	Adjusting the reliability ratios
	Adaptive shrinkage on the reliability ratios
	Thresholding the reliability ratios

	Genome-wide Association Studies
	Supplementary figures