key: cord-0133449-0ciu7giu
authors: Henrion, Marc Yves Romain
title: Relaxation of the parameter independence assumption in the `bootComb` R package
date: 2022-02-09
journal: nan
DOI: nan
sha: 45595b06a1df90d6ffbafeefd60c452b73435118
doc_id: 133449
cord_uid: 0ciu7giu

Background. The bootComb R package allows researchers to derive confidence intervals with correct target coverage for arbitrary combinations of arbitrary numbers of independently estimated parameters. Previous versions (<1.1.0) of bootComb used independent bootstrap sampling and required that the parameters themselves are independent - an unrealistic assumption in some real-world applications. Findings. Using Gaussian copulas to define the dependence between parameters, the bootComb package has been extended to allow for dependent parameters. Implications. The updated bootComb package can now handle cases of dependent parameters, with users specifying a correlation matrix defining the dependence structure. While in practice it may be difficult to know the exact dependence structure between parameters, `bootComb` allows running sensitivity analyses to assess the impact of parameter dependence on the resulting confidence interval for the combined parameter. Availability. bootComb is available from the Comprehensive R Archive Network (https://CRAN.R-project.org/package=bootComb).

The bootcomb R package Henrion (2021) was recently published. This package for the statistical computation environment R (R Core Team, 2021) allows researchers to derive confidence intervals with correct coverage for combinations of independently estimated parameters. Important applications include adjusting a prevalence for estimated test sensitivity and specificity (e.g. Mandolo et al. (2021) ) or combining conditional prevalence estimates (e.g. Stockdale et al. (2020) ).

Briefly, for each of the input parameters, bootComb finds a best-fit parametric distribution based on the confidence interval for that parameter estimate. bootComb then uses the parametric bootstrap to sample many sets of parameter estimates from these best-fit distributions and computes the corresponding combined parameter estimate for each set. This builds up an empirical distribution of parameter estimates for the combined parameter. Finally, bootComb uses either the percentile or the highest density interval method to derive a confidence interval for the combined parameter estimate. Full details of the algorithm are given in Henrion (2021) .

A key point of the algorithm is that the best-fit distributions for the different parameters are sampled from independently. This requires the parameters to be independent. This may not be a realistic assumption in some real-world applications.

While for most practical applications the input parameters are typically estimated from independent experiments (otherwise the combined parameter could be directly estimated), the parameters themselves may not be independent. This is for instance the case when adjusting a prevalence for the diagnostic test's sensitivity and specificity. The latter two parameters are not independent: higher sensitivity can be achieved by lowering specificity and vice versa.

If the experiments estimating these parameters are sufficiently large, then the violation of the assumption of parameter independence may only have negligible impact on the resulting confidence interval for the combined parameter. However, for the sake of general applicability and to allow running sensitivity analyses, the author felt it was beneficial to extend bootComb to handle dependent parameters.

Copulas are multivariate distribution functions where the marginal probability distribution of each variable is the uniform distribution on the interval [0, 1]. Copulas allow to specify the intercorrelation between random

variables. An important probability theory result, Sklar's Theorm (Sklar, 1959) , states that any multivariate probability distribution can be expressed in terms of its univariate marginal distributions and a copula defining the dependence between the variables.

Mathematically, let X 1 , X 2 . . . , X d be d random variables and define U i = F i (X i ), i = 1, . . . , d. Then the copula C of (X 1 , . . . , X d ) is defined as the joint cumulative distribution function of (U 1 , . . . , U d ):

. . , d are continuous. Then, via the probability integral transform (Angus, 1994) , the random vector (U 1 , U 2 , . . . , U d ) has marginals that are

bootComb makes use of the fact that the above can be reversed: given a sample (u 1 , . . . , u d ), a sample for

. The inverse functions F −1 i (u) will be defined if the marginals F i (x) are continuous. For the use of bootComb, where users input confidence intervals for an estimated numeric parameter, this will always be the case.

bootComb will proceed as follows to generate samples from a multivariate distribution of d dependent variables:

• Estimate best-fit distributions F 1 , . . . , F d for each of the d parameters X 1 , . . . , X d given the lower and upper limits of the estimated confidence intervals for each parameter.

• Sample (z 1 , . . . , z d ) from a multivariate normal distribution N (0, Σ) where the variances in Σ are all 1.

• Since the marginals of this normal distribution are all N (0, 1), compute u i = Φ(z i ) where Φ is the cumulative distribution function of the standard normal.

The resulting vector (x 1 , . . . , x d ) will be a sample from the multivariate distribution of (X 1 , . . . , X d ). Note that the dependence structure was completely specified through the covariance matrix Σ (since the covariances are assumed to be 1, this really is a correlation matrix) and marginal distributions for each parameter were specified by F i , i = 1, . . . , d.

We repeat the 2 examples from Henrion (2021) here, but look at the effect of specifying a dependence between the input parameters.

With an application to hepatitis D and B viruses (HDV and HBV respectively) from Stockdale et al. (2020) , Henrion (2021) showed how to use bootComb to obtain a valid confidence interval forp aHDV , the prevalence of HDV specific immunoglobulin G antibodies (anti-HDV) in the general population.

HBV is a pre-condition for HDV and hence to derivep aHDV Stockdale et al. (2020) , obtained estimates of the prevalence of surface antigen of the hepatitis B virus (HBsAg),p HBsAg = 3.5%, and the conditional prevalence of anti-HDV given the presence of HBsAg,p aHDV |HBsAg = 4.5%:

•p HBsAg = 3.5% with 95% CI (2.7%, 5.0%).

•p aHDV |HBsAg = 4.5% with 95% CI (3.6%, 5.7%).

Assuming these 2 parameters to be independent, Henrion (2021) derived a 95% confidence interval for the estimatep aHDV =p aHDV |HBsAg ·p HBsAg using bootComb, (0.11%, 0.25%).

If, however, the 2 input prevalences are not independent, e.g. if anti-HDV is more common among people with presence of HBsAg the higher the population prevalence of HBsAg is, then that assumption of independence would not hold. We can investigate how strong an effect dependence of the parameters can have on the resulting confidence estimate. For example, let's run the same example using bootComb with specifying the following covariance matrix for the bivariate normal copula: This yields the 95% confidence interval (0.10%, 0.26%), a slightly wider interval -which makes sense, as the positive correlation means it is more likely for pairs of bootstrapped input parameters to be both near the upper (respectively lower) end of their confidence intervals.

For this particular application, a dependence between both prevalence parameters,p HBsAg andp aHDV |HBsAg , is unlikely and we have therefore not considered this example any further. Henrion (2021) gave an example of adjusting an estimated SARS-CoV-2 seroprevalence for the estimated sensitivity and specificity of the test assay. Specifically:

• 84 out of 500 study participants tested positive for SARS-CoV-2 antibodies, yielding a seroprevalence estimateπ raw = 16.8% with exact binomial 95% CI (13.6%, 20.4%).

• Estimated assay sensitivity: 238 out of 270 known positive samples tested positivep sens = 88.1%, 95% CI (83.7%, 91.8%).

• Estimated assay specificity: 82 out of 88 known negative samples tested negativep spec = 93.2%, 95% CI (85.7%, 97.5%).

Assuming the sensitivity and specificity to be independent, Henrion (2021) reported an adjusted seroprevalence estimateπ = 12.3% with 95% CI (3.9%, 19.0%).

However in this case, the assumption of independence is not fully realistic: there is a trade-off between sensitivity and specificity of the test assay, and as such one would expect a negative dependence between the two parameters: sensitivity can be increased at the cost of decreased specificity and vice versa.

Assuming that the sensitivity and specificity are negatively correlated with the copula correlation parameter ρ = −0.5 between these two parameters, using the extension of bootComb we can now account for the dependence of the parameters: The reported confidence interval is now (3.8%, 19.4%) -marginally wider than when the dependence was ignored.

If we additionally specify returnBootVals=TRUE in the function call, we can extract and plot the sampled pairs of sensitivity and specificity values to check the dependence structure. This is shown on Figure   @ref(fig:Fig1 ): as the correlation parameter ρ in the copula between the sensitivity and specificity is decreased from 0 to -1, the dependence between both parameters becomes more and more pronounced as one would expect.

This shows that a simple correlation matrix specified for the Gaussian copula results in this case in a non-trivial dependence structure between two beta-distributed variables, respecting the specified marginal distributions.

We can also visualise the effect on the estimated confidence interval, as shown on Figure Figure @ref(fig:Fig2) .

We can see that in this case, with a negative correlation, the width of the CI increases at the correlation becomes stronger. However, looking at the scale of the y-axis we see that this is just a marginal effect.

The R package bootComb has been extended and, using Gaussian copulas, it can now handle the case of dependent input parameters. For many applications, the effect of dependence between the parameters will be marginal or even negligible. However, the package now allows users to do sensitivity analyses to assess the effects of a miss-specified dependence structure between the parameters that are being combined. 

All data to support this work are contained within the article. The software package itself is available from https://cran.r-project.org/package=bootComb.

Author Marc Y. R. Henrion declares none.

The Probability Integral Transform and Related Results

bootComb: Combine Parameter Estimates via Parametric Bootstrap (R package version 1.1.2) [Computer software

bootComb-an R package to derive confidence intervals for combinations of independent parameter estimates

Reduction in Severity of All-Cause Gastroenteritis Requiring Hospitalisation in Children Vaccinated against Rotavirus in Malawi

R: A language and environment for statistical computing. R Foundation for Statistical Computing

Fonctions de Répartition À N Dimensions Et Leurs Marges

The global prevalence of hepatitis D virus infection: Systematic review and meta-analysis