Confidence Intervals for Prevalence Estimates from Complex Surveys with Imperfect Assays
Damon Bayer, Michael Fay, Barry Graubard
2022-05-26

We present several related methods for creating confidence intervals to assess disease prevalence in a variety of survey sampling settings. These include simple random samples with imperfect tests, weighted sampling with perfect tests, and weighted sampling with imperfect tests, with the first two settings considered special cases of the third. Our methods use survey results and measurements of test sensitivity and specificity to construct melded confidence intervals. We demonstrate that our methods appear to guarantee coverage in simulated settings, while competing methods are shown to achieve much lower than nominal coverage. We apply our method to a seroprevalence survey of SARS-CoV-2 in undiagnosed adults in the United States between May and July 2020.

Estimating and quantifying uncertainty for disease prevalence is a standard task in epidemiology. For rare events, these estimates are highly sensitive to misclassification 1, making adjustments for sensitivity and specificity critically important. While estimating prevalence (or any event proportion in a population) in complex surveys and adjusting estimates for misclassification have been well studied separately, performing both of these tasks simultaneously remains relatively unexplored. Recent overviews of methods for estimating prevalence in surveys without misclassification are provided by Dean and Pagano 2 and Franco, et al 3. For simple random sample surveys with imperfect sensitivity and specificity, Lang and Reiczigel 4 proposed an approximate method that performed well in simulations. Recent work by DiCiccio, et al 5 and Cai, et al 6 studies both valid (i.e., exact) and approximate methods. Their valid methods use test inversion and the adjustment of Berger and Boos 7, while their approximate methods use the bootstrap with the test inversion approach. Fewer methods are available for constructing frequentist confidence intervals for prevalence estimates from complex surveys while adjusting for sensitivity and specificity. Kalish, et al 8 developed one such method that is closely related to one of the methods presented here, but that method's properties were not studied. Cai, et al 6 (see also the discussion in DiCiccio, et al 5) modify their approximation approach to allow sample weights, but it assumes that the counts of events within the strata are large (see their Remark 4). Thus, it would not apply to a weighted survey method where each individual has their own weight. Another recent advancement is the method developed by Rosin, et al 9, which makes use of asymptotic normal approximations that reduce to the Wald interval when sensitivity and specificity are perfect. This problem has also previously been addressed in the Bayesian literature, recently by Gelman and Carpenter 10.

We work up to our ultimate goal in stages. First, in Section 2.2, we propose confidence intervals for simple random samples where prevalence is assessed with an assay with imperfect sensitivity and/or specificity. Next, in Section 2.3, we present confidence intervals for weighted samples where prevalence is assessed with an assay without misclassification.
In Section 2.4, we combine these methods to create confidence intervals for weighted samples where prevalence is assessed with an assay with imperfect sensitivity and specificity. Because the combined method reduces to one of the first two methods as a special case, we can think of the first two stages as testing the combined method in those cases. Finally, in Section 2.5 we show how certain complex surveys may fit into the format for our new method. Our new method is designed to guarantee coverage in all situations. In simulations, we compare our method to established frequentist competitors and show that it beats the best of those in each of the three stages with respect to guaranteeing coverage. We did not include in our simulations the new methods that have been developed in response to the COVID-19 pandemic and are not yet in print in peer-reviewed journals 6,5,9. The exact method of DiCiccio, et al 5 would guarantee coverage, although applying it to a survey with a large number of strata would be "computationally expensive", and it has not been applied to surveys using post-stratification weighting. In contrast, our new method remains very tractable in those situations.

To introduce notation, consider first the stratified simple random sample. Suppose we have a population partitioned into $K$ strata, with $N_1, N_2, \ldots, N_K$ individuals in the $K$ strata of the population. We sample $n_1, n_2, \ldots, n_K$ individuals via a simple random sample from each of the $K$ strata to have an assay performed to determine who has the disease. Let $x_k$ be the number of positive results from the assay performed on the $n_k$ individuals from stratum $k$, and assume $x_k \sim \text{Binomial}(n_k, \theta_k)$, where $\theta_k$ is the population frequency of positive results for assays performed on individuals from stratum $k$. Similarly, let $x^*_k$ be the unobserved true number of people with the disease among the $n_k$ individuals from stratum $k$, and assume $x^*_k \sim \text{Binomial}(n_k, \theta^*_k)$, where $\theta^*_k$ is the population frequency of cases in stratum $k$. In the case of a perfect assay, $\theta_k = \theta^*_k$. Therefore, the population prevalence is $\beta^* = \sum_{k=1}^K w_k \theta^*_k$ and the apparent prevalence is $\beta = \sum_{k=1}^K w_k \theta_k$, where $w_k = N_k / \sum_{j=1}^K N_j$ and, therefore, $\sum_{k=1}^K w_k = 1$. This set-up will approximately work for other complex survey samples, where we can estimate survey weights such that the complex survey sample may be treated as a multinomial sample with probabilities proportional to those weights (see Section 2.5).

We can relate $\beta$ and $\beta^*$ using the sensitivity ($\phi_p$) and specificity ($1 - \phi_n$) of the assay, where $\phi_p$ and $\phi_n$ are the proportions of positive assays from a population of positive controls (i.e., individuals known to have the disease) and negative controls (i.e., individuals known to be without the disease), respectively. Then $\beta = \phi_p \beta^* + \phi_n (1 - \beta^*)$, or equivalently,
$$\beta^* = \frac{\beta - \phi_n}{\phi_p - \phi_n}. \qquad (3)$$
Suppose the assay is measured on $n_n$ individuals known to not have the disease and on $n_p$ individuals known to have the disease. Let $y_n$ and $y_p$ be the numbers who test positive from the respective samples. Assume that the negative and positive controls act like simple random samples from their respective populations. Thus, $y_n \sim \text{Binomial}(n_n, \phi_n)$, where $1 - \phi_n$ is the specificity of the assay, and $y_p \sim \text{Binomial}(n_p, \phi_p)$, where $\phi_p$ is the sensitivity of the assay. Let $\hat\phi_n = y_n/n_n$, $\hat\phi_p = y_p/n_p$, and $\hat\beta = \sum_{k=1}^K w_k x_k/n_k$. Then a plug-in estimator for $\beta^*$ is
$$\hat\beta^* = \frac{\hat\beta - \hat\phi_n}{\hat\phi_p - \hat\phi_n}. \qquad (4)$$
This estimator serves as an important basis for developing confidence intervals in this work. Section 2.2 is concerned with confidence intervals for $\beta^*$ in the case where $K = 1$, $\phi_n > 0$, $\phi_p < 1$, i.e., estimating prevalence from a simple random sample with an imperfect assay. Section 2.3 is concerned with confidence intervals for $\beta^*$ in the case where $K > 1$, $\phi_n = 0$, $\phi_p = 1$, i.e., estimating prevalence from a weighted sample with a perfect assay. Section 2.4 is concerned with confidence intervals for $\beta^*$ in the case where $K > 1$, $\phi_n > 0$, $\phi_p < 1$, i.e., estimating prevalence from a weighted sample with an imperfect assay.
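As a concrete illustration of the plug-in estimator in Equations 3-4, the following minimal sketch (in Python; the function name and argument layout are ours, not the authors' software) computes $\hat\beta^*$ from stratum-level survey counts and the independent negative- and positive-control samples.

```python
import numpy as np

def plug_in_prevalence(x, n, w, y_n, n_n, y_p, n_p):
    """Plug-in estimate of the true prevalence beta* (Equation 4), truncated to [0, 1]."""
    x, n, w = (np.asarray(a, dtype=float) for a in (x, n, w))
    beta_hat = np.sum(w * x / n)      # apparent prevalence: sum_k w_k * x_k / n_k
    phi_n_hat = y_n / n_n             # estimated false-positive rate (1 - specificity)
    phi_p_hat = y_p / n_p             # estimated sensitivity
    if phi_p_hat == phi_n_hat:        # degenerate case; define 0/0 = 0 as in the text
        return 0.0
    return float(np.clip((beta_hat - phi_n_hat) / (phi_p_hat - phi_n_hat), 0.0, 1.0))

# Example: one stratum (K = 1) with 5 positives out of 100 tested;
# 3 of 300 known negatives and 57 of 60 known positives test positive.
print(plug_in_prevalence(x=[5], n=[100], w=[1.0], y_n=3, n_n=300, y_p=57, n_p=60))
```

The truncation to [0, 1] anticipates the estimator used for the confidence intervals in Section 2.2 (Equation 5).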
First, we consider the scenario where $K = 1$, $\phi_n > 0$, and $\phi_p < 1$, and we develop a confidence interval for the population prevalence, $\beta^*$. When $K = 1$, the estimand in Equation 3 becomes $\beta^* = (\theta_1 - \phi_n)/(\phi_p - \phi_n)$. We have $\phi_p > \phi_n$ for any useful assay, and since the sample is a mixture of individuals with and without the disease of interest, $\phi_n \le \theta_1 \le \phi_p$, so that $\beta^*$ lies in $[0, 1]$. We therefore work with the truncated version of the plug-in estimator,
$$\hat\beta^* = g(\hat\theta_1, \hat\phi_n, \hat\phi_p) = \min\left\{1,\ \max\left(0,\ \frac{\hat\theta_1 - \hat\phi_n}{\hat\phi_p - \hat\phi_n}\right)\right\}, \qquad (5)$$
where we define $0/0 = 0$.

To create a confidence interval for $\beta^*$, we use a generalization of the melding method 11, which makes use of lower and upper confidence distributions on functions of independent estimators to account for variability in $\hat\theta_1$, $\hat\phi_n$, and $\hat\phi_p$. Confidence distributions are like frequentist posterior distributions 12. The lower and upper confidence distributions are used with discrete responses to ensure the validity of the resulting inferences, and for the binomial case they are equivalent to the posterior distributions that result from using well-calibrated null preference priors 13. Each estimated component in Equation 5 is a binomial probability parameter. For each of these, we use distributions associated with the exact binomial confidence interval. For a binomial experiment with $x$ successes out of $n$ trials, the lower confidence distribution is Beta($x$, $n - x + 1$) with associated random variable $W_L$, and the upper confidence distribution is Beta($x + 1$, $n - x$) with random variable $W_U$, where for $a > 0$ we let Beta($0$, $a$) and Beta($a$, $0$) be point masses at 0 and 1, respectively. Let $q(W, r)$ be the $r$th quantile of a random variable $W$. Then the exact $100(1 - \alpha)\%$ central confidence interval of Clopper-Pearson 14 for the binomial parameter is
$$\left[\, q\!\left(W_L, \tfrac{\alpha}{2}\right),\ q\!\left(W_U, 1 - \tfrac{\alpha}{2}\right) \right]. \qquad (6)$$

Fay et al 11 proposed a method for obtaining confidence intervals for functions of two parameters that are monotonic within the allowable range for each parameter given that the other is fixed. Here we generalize that to $\beta^*$, which is a function of three parameters. When $1 \ge \phi_p > \theta_1 > \phi_n \ge 0$, $\beta^*$ is monotonically increasing in $\theta_1$, monotonically decreasing in $\phi_n$, and monotonically decreasing in $\phi_p$. For an assessment of monotonicity in other scenarios, see Appendix A. Let $T^{\theta_1}_L$ and $T^{\theta_1}_U$ denote random variables with the lower and upper confidence distributions for $\theta_1$, and define $T^{\phi_n}_L, T^{\phi_n}_U$ and $T^{\phi_p}_L, T^{\phi_p}_U$ analogously from the negative- and positive-control samples. Then the $100(1 - \alpha)\%$ confidence interval for $\beta^*$ is
$$\left[\, q\!\left( g\!\left(T^{\theta_1}_L, T^{\phi_n}_U, T^{\phi_p}_U\right), \tfrac{\alpha}{2} \right),\ q\!\left( g\!\left(T^{\theta_1}_U, T^{\phi_n}_L, T^{\phi_p}_L\right), 1 - \tfrac{\alpha}{2} \right) \right], \qquad (7)$$
where $g(\cdot)$ is defined in Equation 5. The quantiles of these melded distributions are calculated by Monte Carlo sampling from each of the component distributions.

We compare this method to one described in Lang and Reiczigel 4, as implemented in the prevSeSp function in 15, which provides approximate confidence intervals for true prevalence when sensitivity and specificity are estimated from independent samples, as they are in this section. The Lang-Reiczigel interval is an approximate Wald-type interval for $\beta^*$ based on adjusted estimates of the apparent prevalence, sensitivity, and specificity, in the spirit of Agresti and Coull 17, with $z \equiv \Phi^{-1}(1 - \alpha/2)$ denoting the standard normal quantile; see Lang and Reiczigel 4 for the explicit formula.
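The melded interval in Equation 7 is straightforward to compute by Monte Carlo. The sketch below (Python; our own function names, not the authors' implementation) draws from the lower and upper Beta confidence distributions of each component, applies $g$, and takes the appropriate quantiles.

```python
import numpy as np
from scipy import stats

def beta_cd(x, n, size, rng, lower=True):
    """Sample the lower (Beta(x, n-x+1)) or upper (Beta(x+1, n-x)) confidence
    distribution, treating Beta(0, b) and Beta(a, 0) as point masses at 0 and 1."""
    a, b = (x, n - x + 1) if lower else (x + 1, n - x)
    if a == 0:
        return np.zeros(size)
    if b == 0:
        return np.ones(size)
    return stats.beta.rvs(a, b, size=size, random_state=rng)

def g(theta1, phi_n, phi_p):
    """Plug-in map from (theta_1, phi_n, phi_p) to beta*, truncated to [0, 1]."""
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (theta1 - phi_n) / (phi_p - phi_n)
    return np.clip(np.nan_to_num(out, nan=0.0), 0.0, 1.0)   # define 0/0 = 0

def melded_ci(x1, n1, y_n, n_n, y_p, n_p, alpha=0.05, nmc=100_000, seed=1):
    rng = np.random.default_rng(seed)
    # lower limit: lower CD for theta_1, upper CDs for phi_n and phi_p
    # (g is increasing in theta_1 and decreasing in phi_n and phi_p)
    lo = g(beta_cd(x1, n1, nmc, rng, lower=True),
           beta_cd(y_n, n_n, nmc, rng, lower=False),
           beta_cd(y_p, n_p, nmc, rng, lower=False))
    hi = g(beta_cd(x1, n1, nmc, rng, lower=False),
           beta_cd(y_n, n_n, nmc, rng, lower=True),
           beta_cd(y_p, n_p, nmc, rng, lower=True))
    return np.quantile(lo, alpha / 2), np.quantile(hi, 1 - alpha / 2)

print(melded_ci(x1=5, n1=100, y_n=3, n_n=300, y_p=57, n_p=60))
```

The same construction is reused in Section 2.4, with the confidence distribution for the apparent prevalence replaced as described there.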
Next, we present a confidence interval for the population prevalence, $\beta^*$, in the scenario $K > 1$, $\phi_n = 0$, $\phi_p = 1$. Our method is a straightforward adaptation of the gamma confidence interval presented in Fay and Feuer 16, which was developed to create confidence intervals for a population rate that is assumed to be a weighted sum of Poisson rate parameters. We note that for sufficiently large sample size $n_k$ and small rate $\theta_k$, a Poisson($n_k\theta_k$) distribution is approximately equal in distribution to a Binomial($n_k$, $\theta_k$) distribution. Under this Poisson assumption, we suggest the $100(1 - \alpha)\%$ gamma confidence interval for $\beta^*$,
$$\left[\, q\!\left(T^*_L, \tfrac{\alpha}{2}\right),\ q\!\left(T^*_U, 1 - \tfrac{\alpha}{2}\right) \right],$$
where, writing $\tilde w_k = w_k/n_k$, $\hat\beta = \sum_{k=1}^K \tilde w_k x_k$, $\hat v = \sum_{k=1}^K \tilde w_k^2 x_k$, and $\tilde w_{\max} = \max_k \tilde w_k$, the random variables $T^*_L$ and $T^*_U$ have gamma distributions (in the shape and scale parameterization),
$$T^*_L \sim \text{Gamma}\!\left(\frac{\hat\beta^2}{\hat v},\ \frac{\hat v}{\hat\beta}\right) \quad\text{and}\quad T^*_U \sim \text{Gamma}\!\left(\frac{(\hat\beta + \tilde w_{\max})^2}{\hat v + \tilde w_{\max}^2},\ \frac{\hat v + \tilde w_{\max}^2}{\hat\beta + \tilde w_{\max}}\right),$$
with $T^*_L$ taken to be a point mass at 0 when $\hat\beta = 0$. We call this the wsPoisson method, since it assumes a weighted sum of Poissons.

We compare the wsPoisson confidence interval to two methods presented in Dean and Pagano 2, which were recommended for scenarios with low prevalence. Dean and Pagano showed in simulations that the standard Wald interval had poor coverage with low prevalence (e.g., Fig. 1 of that paper showed 95% confidence intervals with coverage of less than 85% for prevalence values less than 2%). Since the confidence interval of Rosin, et al 9 reduces to the Wald interval with perfect assays, we do not include that method in the simulation comparisons.

The first recommended method of Dean and Pagano is an adaptation of the method of Agresti and Coull 17 to the survey setting. Let $\hat\theta_k = x_k/n_k$, let $\widehat{\text{var}}(\hat\beta)$ denote the estimated design-based variance of $\hat\beta = \sum_{k=1}^K w_k\hat\theta_k$, and define the effective sample size $n_{\text{eff}} = \hat\beta(1-\hat\beta)/\widehat{\text{var}}(\hat\beta)$ and $z = \Phi^{-1}(1-\alpha/2)$. The interval for $\beta^*$ is given by
$$\tilde\beta \pm z\sqrt{\frac{\tilde\beta(1-\tilde\beta)}{n_{\text{eff}} + z^2}}, \qquad\text{where}\qquad \tilde\beta = \frac{n_{\text{eff}}\hat\beta + z^2/2}{n_{\text{eff}} + z^2}.$$
In the case where $\widehat{\text{var}}(\hat\beta) = 0$, so that $n_{\text{eff}}$ is undefined, we instead let $n_{\text{eff}} = \sum_{k=1}^K n_k$. We also compare our suggested method to Dean and Pagano's modification of the method of Korn and Graubard 18,2. This interval is given by
$$\left[\, q\!\left(B_L, \tfrac{\alpha}{2}\right),\ q\!\left(B_U, 1 - \tfrac{\alpha}{2}\right) \right],$$
where, analogously to the Clopper-Pearson interval (see Equation 6), $B_L \sim \text{Beta}\!\left(n_{\text{eff}}\hat\beta,\ n_{\text{eff}}(1-\hat\beta) + 1\right)$ and $B_U \sim \text{Beta}\!\left(n_{\text{eff}}\hat\beta + 1,\ n_{\text{eff}}(1-\hat\beta)\right)$, with $\hat\beta$ and $n_{\text{eff}}$ defined as above. Although Dean and Pagano 2 expressed this in terms of F distributions, the beta distribution representation is equivalent.

Lastly, we develop a confidence interval for the population prevalence, $\beta^*$, in the case where $K > 1$, $\phi_n > 0$, $\phi_p < 1$. The two methods we discuss are closely related to each other and to the methods discussed in Sections 2.2 and 2.3. As in Section 2.2, we use the melding method 11 to create a $100(1 - \alpha)\%$ confidence interval very similar to Equation 7. The confidence distributions for $\phi_n$ and $\phi_p$ are the same Beta distributions as in Section 2.2. The two methods differ in their confidence distributions for the apparent prevalence $\beta$. In the first case, we use the adaptation of the gamma confidence interval 16 presented in Section 2.3 to derive the $100(1 - \alpha)\%$ confidence interval for $\beta^*$,
$$\left[\, q\!\left( g\!\left(T^*_L, T^{\phi_n}_U, T^{\phi_p}_U\right), \tfrac{\alpha}{2} \right),\ q\!\left( g\!\left(T^*_U, T^{\phi_n}_L, T^{\phi_p}_L\right), 1 - \tfrac{\alpha}{2} \right) \right],$$
where $T^*_L$ and $T^*_U$ are defined in Section 2.3. We refer to this method as WprevSeSp Poisson: weighted prevalence with sensitivity and specificity, where the prevalence confidence distribution is based on the weighted sum of Poissons. The alternative method is very similar to that used in Kalish, et al 8. We use the modification of Korn and Graubard 18 presented in Dean and Pagano 2, as in Section 2.3, to derive the $100(1 - \alpha)\%$ confidence interval for $\beta^*$,
$$\left[\, q\!\left( g\!\left(B_L, T^{\phi_n}_U, T^{\phi_p}_U\right), \tfrac{\alpha}{2} \right),\ q\!\left( g\!\left(B_U, T^{\phi_n}_L, T^{\phi_p}_L\right), 1 - \tfrac{\alpha}{2} \right) \right],$$
where $B_L$ and $B_U$ are as defined in Section 2.3. Although equivalent, this expression looks different from that in Kalish, et al 8 because they used a parameter for specificity, rather than $\phi_n$, which is 1 minus specificity. We refer to this as WprevSeSp Binomial: weighted prevalence with sensitivity and specificity, where the prevalence confidence distribution is based on a binomial variance assumption.
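The sketch below (Python) illustrates the wsPoisson gamma interval for a weighted sum of Poisson counts as we read the Fay and Feuer 16 construction; the specific shape and scale expressions, and the use of the largest weight for the upper limit, are our reconstruction under that reading rather than code from the paper.

```python
import numpy as np
from scipy import stats

def ws_poisson_ci(counts, weights, alpha=0.05):
    """Gamma CI for sum_k weights[k] * counts[k], with counts[k] treated as Poisson."""
    counts = np.asarray(counts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    y = np.sum(weights * counts)      # point estimate of the weighted sum
    v = np.sum(weights**2 * counts)   # Poisson-model variance estimate
    w_max = np.max(weights)
    lower = 0.0 if y == 0 else stats.gamma.ppf(alpha / 2, a=y**2 / v, scale=v / y)
    yu, vu = y + w_max, v + w_max**2  # upper limit: add one "event" at the largest weight
    upper = stats.gamma.ppf(1 - alpha / 2, a=yu**2 / vu, scale=vu / yu)
    return lower, upper

# Example: 50 strata of 200 subjects each; per-count weights are w_k / n_k.
rng = np.random.default_rng(0)
w_k, n_k = np.full(50, 1 / 50), 200
x_k = rng.binomial(n_k, 0.005, size=50)
print(ws_poisson_ci(x_k, w_k / n_k))
```

Melding draws from the corresponding gamma confidence distributions with the Beta confidence distributions for $\phi_n$ and $\phi_p$ (as in the sketch in Section 2.2) gives the WprevSeSp Poisson interval.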
In Section 2.1, we derived methods assuming that the apparent prevalence was estimated by a weighted sum of binomial random variables, $\hat\beta = \sum_{k=1}^K w_k x_k/n_k$, where $x_k \sim \text{Binomial}(n_k, \theta_k)$. We used the fact that for small $\theta_k$ and large $n_k$ the binomial can be approximated by the Poisson, giving $x_k$ approximately Poisson($n_k\theta_k$). Thus, whenever we can model a complex survey estimator of apparent prevalence as a weighted sum of Poisson variates, we can apply the methods of this paper. In the upcoming Section 2.5.2, we give a detailed review relating the multinomial sampling model to a weighted sum of Poisson variates model.

The multinomial sampling model treats the survey sample as if it were sampled with replacement from the entire population of $N$ individuals, where each of the $N$ individuals has a probability $\pi_i$ of being sampled for each of the $n$ samples from the survey, with $\sum_{i=1}^N \pi_i = 1$. Under this model, the number of times each of the $N$ individuals is included in the sample is multinomial with parameters $n$ and $[\pi_1, \ldots, \pi_N]$. The multinomial model describes sampling with replacement, but it is nevertheless used to approximate a sampling design where the $i$th individual is sampled without replacement with probability $n\pi_i$, even though under that design (unlike the multinomial model) no individual is included in the sample more than once. The multinomial model is a common approximation for other complex survey designs; see, e.g., Korn and Graubard 19, p. 14. For example, in the Kalish, et al 8 analysis of Section 4, each individual in the sample is assigned a pseudo-weight approximating one over their sampling probability from a multinomial model. The actual sample was not a probability sample. In fact, it was a quota sample from a very large pool of self-selected volunteers, and the pseudo-weights were calculated using a different large survey that was a probability weighted survey. The pseudo-weights were calculated such that, if they were analyzed under the multinomial model, they would adjust for selection bias due to self-selection of the volunteers and the imperfection of the quota sampling.

Let $Y_1, \ldots, Y_N$ be the binary indicators of the event in the $N$ individuals in the population of interest, so the prevalence is $P = N^{-1}\sum_{i=1}^N Y_i$. There are many ways to design a complex survey sample, and it is often useful to analyze them as if individuals were sampled with replacement, with the sampling probability of the $i$th individual equal to $\pi_i$ and $\sum_{i=1}^N \pi_i = 1$. In other words, we treat the sample as if it were $n$ independent multinomial samples, each with one trial and selection probability vector $[\pi_1, \ldots, \pi_N]$. Let $z_{ij} = 1$ if the $j$th draw for the sample is individual $i$ in the population, and 0 otherwise. Then let $y_j = Y_i$ and $p_j = \pi_i$ when $z_{ij} = 1$. Here, following the tradition in the survey literature, we use capital letters for the population of interest (e.g., $N, Y, P$) and lower case letters for the sample (e.g., $n, y, p$). In this notation, both $N$ and $Y_1, \ldots, Y_N$ are fixed, and only the variables representing the sampling (i.e., the $z_{ij}$ variables) are random. Under this independent multinomial model, since $E(z_{ij}) = \pi_i$, an unbiased estimator of $P$ is
$$\hat{P} = \frac{1}{n}\sum_{j=1}^{n} \frac{y_j}{N p_j},$$
and an unbiased estimator of $\text{var}(\hat P)$ under the multinomial model is
$$\widehat{\text{var}}_{\text{mult}}(\hat P) = \frac{1}{n(n-1)} \sum_{j=1}^{n} \left( \frac{y_j}{N p_j} - \hat P \right)^2$$
(see Korn and Graubard 19, Problem 2.2-10).

We can write $\hat P$ as a weighted sum. Traditional survey weighting defines the weights so that the weight for the $j$th sampled individual can be interpreted as the number of individuals in the population that the $j$th sampled individual represents. Following that tradition, let $W_i = 1/(n\pi_i)$ and $w_j = 1/(n p_j)$; then the expected sum of the sampled weights is $N$. Sometimes the weights are scaled after selection so that the scaled weights are $w^{(s)}_j = N w_j / \sum_{j'=1}^n w_{j'}$ and are forced to sum to $N$. For example, in Kalish et al 8, rescaling (sometimes called post-stratification) was done in a more complicated manner, to ensure that the weights summed to the US census population within age group, sex, race, ethnicity, and region. For this paper we define the weights differently, because we want to model our estimator as a weighted sum of Poisson random variables. Thus, we use $M_i = 1/(N n \pi_i)$ and $m_j = 1/(N n p_j)$, so that the sums have expectation 1 and $\hat P = \sum_{j=1}^n m_j y_j$.
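The following small sketch (Python; illustrative names, and illustrative weights in place of design-based ones) computes the weighted estimator $\hat P = \sum_j m_j y_j$ from the normalized weights, together with the variance estimate $\sum_j m_j^2 y_j$ that is justified by the Poisson-model argument in the next subsection.

```python
import numpy as np

def weighted_prevalence(y, m):
    """y: 0/1 outcomes for the sampled individuals; m: normalized weights m_j = 1/(N*n*p_j)."""
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    p_hat = np.sum(m * y)        # weighted prevalence estimate, sum_j m_j * y_j
    var_hat = np.sum(m**2 * y)   # Poisson-model variance estimate (see next subsection)
    return p_hat, var_hat

# Example with 8000 sampled individuals, each carrying their own weight.
rng = np.random.default_rng(1)
m = rng.dirichlet(np.ones(8000))          # illustrative weights that sum to 1
y = rng.binomial(1, 0.005, size=8000)     # illustrative 0/1 outcomes
print(weighted_prevalence(y, m))
```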
In the complex survey case, we start with the independent multinomial model described above, and then use the relationship between the multinomial and Poisson distributions. Writing $z_i = \sum_{j=1}^n z_{ij}$ for the number of times individual $i$ appears in the sample, we have $\hat P = \sum_{i=1}^N M_i z_i Y_i$. Using the "multinomial-Poisson transformation", the maximum likelihood estimates (MLEs) for a multinomial random variable are equivalent to the MLEs for independent Poisson random variables, and the variances are asymptotically equivalent (see Baker 20). Even though we model $\hat P$ using multinomial random variables where there are many missing values (which occurs in our situation whenever $z_i = 0$), that multinomial-Poisson relationship holds even when there are missing variables (see Baker 20, Section 3). For both the Poisson and multinomial models, $E(z_i) = n\pi_i$, and $\hat P$ is unbiased under either model. For the Poisson model, all the $z_i$ are independent and each mean equals its variance, so that the variance of $\hat P$ under this model is
$$\text{var}(\hat P) = \sum_{i=1}^{N} M_i^2 \, n\pi_i \, Y_i.$$
We estimate $\text{var}(\hat P)$ by multiplying each term in the sum by $z_i/(n\pi_i)$, which has an expectation of 1 and eliminates terms for non-selected individuals, giving
$$\widehat{\text{var}}(\hat P) = \sum_{i=1}^{N} M_i^2 \, z_i \, Y_i = \sum_{j=1}^{n} m_j^2 \, y_j.$$
Under the Poisson model, $\widehat{\text{var}}(\hat P)$ is an unbiased estimator of $\text{var}(\hat P)$.

We assess and compare our new method (Melding, i.e., Equation 7) to that of Lang and Reiczigel (LR) in a variety of simulated settings. In each simulation, 100 subjects are tested to estimate prevalence, 60 are tested to estimate sensitivity, and 300 are tested to estimate specificity. Several combinations of prevalences (0.5%-2%), sensitivities (75%-100%), and specificities (75%-100%) are assessed. Each simulated scenario is replicated 10,000 times. Figure 1 compares the two methods based on coverage, while Figures 2 and 3 present the lower and upper error frequencies for these scenarios, respectively. Figure 1 shows that, when specificity is less than perfect, both methods achieve approximately nominal coverage, with the melding method being somewhat more conservative. When specificity is 100%, both methods are conservative. Figure 3 shows that both methods make upper errors with roughly the same frequency. Figure 2 demonstrates that, while the melding procedure bounds the lower error frequency below 2.5%, the Lang-Reiczigel method generally has a lower error frequency above 2.5%, which is undesirable for applications in which there is a need to bound the lower errors.

We compare the wsPoisson method to the more traditional Dean-Pagano modification of the Agresti-Coull method (DPAC) and to the Korn-Graubard (KG) method for survey proportions in a variety of settings. Our simulations examine varying levels of disease prevalence (0.5% or 5%), different types of survey designs (50 sampling strata with 200 subjects each, or 8000 individuals each with their own weight), distributions of weights among the sampling strata or individuals (coefficient of variation from approximately 0% to nearly 600%), and the number and weights of sampling strata with non-zero prevalence. For each combination of prevalence and group type, up to 500 sets of weights are simulated. These 500 sets of weights are designed to span a range of coefficients of variation. For a target coefficient of variation $c$, weights are simulated by generating $K$ samples from a Beta distribution whose two parameters are chosen to yield the target coefficient of variation, and normalizing so that $\sum_{k=1}^K w_k = 1$. This assures that the coefficient of variation among these weights is approximately $c$. Then, certain weights are chosen to have non-zero prevalence (5%, 25%, or 75% of the weights, distributed either among the highest weights, among the lowest weights, or uniformly). These weights with non-zero prevalence are given a prevalence such that $\sum_{k=1}^K w_k \theta_k$ equals the target prevalence. For each simulated set of parameters and weights, 10,000 data sets are simulated and assessed.
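A sketch of the weight-generation step is below (Python). The Beta parameterization is our reconstruction from the stated target, chosen so that the draws have mean $1/K$ and coefficient of variation $c$ before normalization; the authors' exact parameterization may differ.

```python
import numpy as np
from scipy import stats

def simulate_weights(K, c, rng):
    """Draw K Beta variates with mean 1/K and coefficient of variation c, then normalize.

    Requires c**2 < K - 1 so that both Beta parameters are positive."""
    s = (K - 1) / c**2 - 1            # a + b for the Beta distribution
    a, b = s / K, s * (K - 1) / K     # mean a/(a+b) = 1/K, CV = c before normalizing
    d = stats.beta.rvs(a, b, size=K, random_state=rng)
    return d / d.sum()                # normalized weights; CV approximately c

rng = np.random.default_rng(2)
w = simulate_weights(K=50, c=2.0, rng=rng)
print(w.sum(), w.std() / w.mean())    # ~1.0 and roughly the target coefficient of variation
```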
The coverage properties for these simulations are presented in Figures 4-7. From Figures 4-7, we note that the two competitor methods generally exhibit lower coverage as the coefficient of variation among the weights increases. In Figure 4, this coverage falls below 60% when the prevalence is very low and concentrated among the highest weights and the coefficient of variation among the weights exceeds 4. Uniform distribution of prevalence among the weights, increased overall prevalence, and larger sample sizes among fewer groups all appear to lessen the severity of this problem. In contrast, the wsPoisson method appears to guarantee coverage in all scenarios. The wsPoisson method tends to become more conservative as the coefficient of variation among the weights increases, which is when the other methods can have problems guaranteeing coverage. In all cases, the wsPoisson method is more conservative than the competitor methods. This is similar to the behavior observed in Fay and Feuer 16, where, in simulations, the overall error rate for the gamma intervals decreased as the variance of the weights increased. Because our methods appear to be very conservative, with coverage near 100% in some cases, we present the widths of the confidence intervals in Figures B9-B12. In scenarios where the coefficient of variation among the survey weights is high, the wsPoisson intervals are often two or three times wider than the intervals produced by competing methods.

We compare the properties of our melded confidence interval, WprevSeSp Poisson, to another melded confidence interval method, WprevSeSp Binomial, and to one method, wsPoisson, which does not account for the imperfect assay. The methods are assessed in several simulated scenarios with varying levels of disease prevalence (0.5% or 5%), types of groups surveyed (50 groups of 200 subjects or 8000 groups of 1 subject), distributions of weights among the groups (coefficient of variation from approximately 0% to nearly 600%), numbers of groups with non-zero prevalence, and specificities of the assay (80%-100%). In each scenario, the assay has 95% sensitivity. Each scenario creates up to 500 new sets of weights and parameters (as in Section 3.2), and each of those is simulated 10,000 times, with new prevalence, sensitivity, and specificity surveys generated and 95% confidence intervals created. Modelled after the study of Kalish, et al 8, the simulated sensitivity is assessed based on 60 tests, while specificity is based on 300 tests. The coverage properties for these simulations are presented in Figures 8-11. Additional properties for these simulations are presented in Figures B13-B24. Based on Figures B13-B20, we note that, in most settings, the two melding procedures result in conservative confidence intervals, often nearing 100% coverage. The WprevSeSp Binomial method fails to maintain nominal coverage when specificity is perfect and the coefficient of variation among the weights is high. Only our proposed WprevSeSp Poisson method maintains or exceeds the desired coverage in all scenarios. In these scenarios, we also assess properties of the wsPoisson procedure, which does not account for the imperfect assay. This method results in approximately 0% coverage in any scenario where specificity is less than perfect.
For this reason, results from this method are omitted from the figures. Because our methods appear to be very conservative, with coverage near 100% in some cases, we present the widths of the confidence intervals in Figures B21-B24. The WprevSeSp Binomial and WprevSeSp Poisson methods typically produce wide intervals of approximately the same size, sometimes as wide as 12%, even when true prevalence is 0.05%. One notable exception to this is presented in Figure B23, which shows that, for tests with perfect specificity, the WprevSeSp Binomial method produces much narrower confidence intervals than the other method.

We apply these two methods to a real data set from Kalish, et al 8. This data set was collected to estimate the seroprevalence of SARS-CoV-2 in undiagnosed adults in the United States between May and July 2020. The assay used for these data is estimated to have perfect sensitivity, based on 56 tests on individuals with confirmed SARS-CoV-2, and perfect specificity, based on 300 tests on individuals confirmed to not have SARS-CoV-2. First we apply the methods to the full data set (n = 8058, weight coefficient of variation = 252%). The seroprevalence estimate in Kalish, et al was 4.6% (95% CI: 2.6% to 6.5%), using a confidence interval method that was nearly the same as the WprevSeSp Binomial method (the method of Kalish et al included a calculation of the variability of the weights due to the estimation of the weights, whereas in this paper we treat the weights as fixed constants). The Korn and Graubard type melded confidence interval with imperfect assay adjustments (WprevSeSp Binomial) studied in this paper produced a nearly equivalent 95% confidence interval for population prevalence, (2.53%, 6.68%), while the wsPoisson type melded confidence interval with imperfect assay adjustments (WprevSeSp Poisson) produced the 95% confidence interval (2.56%, 7.54%). We also apply the wsPoisson method from Section 2.3, which does not account for imperfections in the assay, resulting in a 95% confidence interval of (3.04%, 7.39%). While all three intervals overlap to a large degree, the WprevSeSp Poisson interval is the widest. Our simulations show that in this situation the WprevSeSp Binomial interval may be the best choice, because with a coefficient of variation of about 250% (see Figures B15 and B20, top right panel) the error on both sides of the confidence interval is bounded at 2.5%, and the widths of the intervals are better (Figure B23).

We also apply the methods to the subset of only Hispanic participants (n = 1281, weight coefficient of variation = 306%), where Kalish et al estimated the undiagnosed adult seroprevalence as 6.1% (95% CI: 2.4% to 11.5%). The WprevSeSp Binomial method produces a 95% confidence interval for population prevalence of (2.35%, 11.75%), while the WprevSeSp Poisson method produces a 95% confidence interval of (2.40%, 20.02%). The wsPoisson method produces a 95% confidence interval of (2.80%, 19.63%). In this case, the WprevSeSp Poisson and wsPoisson confidence intervals are much wider than the WprevSeSp Binomial interval, which is as expected since those methods are designed to guarantee coverage, although the simulations show that the WprevSeSp Binomial interval may be reasonable (see, e.g., Figure B20). The narrower WprevSeSp Binomial interval is also unsurprising and is similar to the results observed in our simulation study.
We presented several methods for creating confidence intervals to assess disease prevalence in a variety of settings, including simple random samples with imperfect tests, weighted sampling with perfect tests, and weighted sampling with imperfect tests. One of the new methods is very similar to the method used by Kalish et al 8, and in this paper we have explored its properties. These new confidence intervals appear to guarantee coverage in most simulated settings and in general demonstrate higher coverage than competitor methods. In the case of the simple random sample with an imperfect test, our new methods are able to bound the lower error rate for a 95% confidence interval at 2.5%, while the Lang-Reiczigel 4 method maintains 95% coverage by allowing a higher lower error rate. A big advantage of our method is that it may be applied with complex survey methods where each individual has their own weight, such as in Kalish et al 8. However, we studied our method only with fixed weights, not with weights that are estimated as in Kalish et al. Further work is needed to address such cases. In addition, further work is needed to consider other complex sample designs, such as the multistage cluster designs that are used in household and institutional surveys (e.g., hospital and medical practice surveys). Our method was slightly different from that used in Kalish, et al 8, in that the latter method included estimates of the variability of the weights; however, recalculating the confidence intervals on the same data shows that in that case there was little difference. Our methods' conservative properties are especially advantageous in settings where the competitor methods exhibit much lower than nominal coverage. For example, when there is high variance among the sample weights and prevalence is concentrated among the highest-weighted samples, coverage of nominal 95% confidence intervals can fall to 60% for competitor methods, while our method exhibits greater than 95% coverage. Thus, we suggest that our melding methods be employed in survey settings that involve high variance among the weights or in which lower errors are particularly undesirable.

Data sharing is not applicable to this article, as no new data were created or analyzed in this study.

Appendix A. It is clear that $g$ is monotonic within each of its piecewise-defined branches. The remaining question is whether monotonicity holds at the change points between the branches; this is considered case by case according to the ordering of $\hat\theta_1$, $\hat\phi_n$, and $\hat\phi_p$. Of the cases considered, monotonicity holds at the change point in all but the first.

[Figure caption] Confidence interval width properties for the confidence interval procedures WprevSeSp Binomial and WprevSeSp Poisson. Each point represents 10,000 simulated datasets from a population with 5% prevalence in which 8000 individuals are sampled. Each dataset also includes simulated results of tests to evaluate the sensitivity and specificity of the assay, performed on 60 and 300 individuals, respectively. Colored dashed lines are estimates from a logistic regression model using quadratic splines.
References (titles recovered from the extracted bibliography)

The Myth of Millions of Annual Self-Defense Gun Uses: A Case Study of Survey Overestimates of Rare Events
Evaluating confidence interval methods for binomial proportions in clustered surveys
Comparative study of confidence intervals for proportions in complex sample surveys
Estimating SARS-CoV-2 Seroprevalence
Bayesian analysis of tests with unknown specificity and sensitivity
Combining one-sample confidence procedures for inference in the two-sample case
Confidence Distribution, the Frequentist Distribution Estimator of a Parameter: A Review
Interpreting P-values and Confidence Intervals using Well-Calibrated Null Preference Priors
The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial
Confidence intervals for directly standardized rates: a method based on the gamma distribution
Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions
Confidence intervals for proportions with small expected number of positive counts estimated from survey data
Analysis of health surveys
The multinomial-Poisson transformation
Confidence intervals for directly standardized rates using mid-p gamma intervals
Estimating prevalence from the results of a screening test
Surveys to measure programme coverage and impact: a review of the methodology used by the expanded programme on immunization
Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments
Comparison of Bayesian and frequentist methods for prevalence estimation under misclassification