key: cord-0592883-fgfgks1q authors: Scherting, Braden; Peel, Alison; Plowright, Raina; Hoegh, Andrew title: Pool samples to efficiently estimate pathogen prevalence dynamics date: 2021-11-11 journal: nan DOI: nan sha: 1c31f0eea117838e29980f5840edb327a7a69a2e doc_id: 592883 cord_uid: fgfgks1q

Estimating the prevalence of a disease is necessary for evaluating and mitigating risks of its transmission within or between populations. Estimates that consider how prevalence changes with time provide more information about these risks but are difficult to obtain due to the necessary sampling intensity and commensurate testing costs. We propose pooling and jointly testing multiple samples to reduce testing costs and use a novel nonparametric, hierarchical Bayesian model to infer population prevalence from the pooled test results. Through two synthetic studies and two case studies of natural infection data, this approach is shown to reduce uncertainty compared to individual testing at the same budget and to produce estimates similar to those from individual testing at a much higher budget.

The risk of pathogen spillover from animals to people (zoonotic spillover) is a function of the prevalence of the pathogen in animal populations in space and time [1]. Therefore, understanding patterns of pathogen prevalence is fundamental to mitigating zoonotic disease events, but the intensive testing required to do so is often prohibitively costly. Testing costs can be reduced by testing pools of samples, but it is difficult to estimate prevalence from pooled samples over time. In this paper, we develop inferential tools for efficiently estimating pathogen prevalence time series processes from pooled data. These methods can be applied to estimate and mitigate the risk of zoonotic spillover and therefore provide a tool for pandemic prevention. Prevalence estimation is central to disease surveillance in reservoir host populations [2]; however, because true prevalence is not observed except in trivial settings, estimating prevalence requires sampling, testing, and inference. Modeling prevalence dynamics can lead to more precise predictions of prevalence and, thereby, spillover, but most approaches require intensive sampling and testing efforts [3]. Furthermore, the diversity and complexity of additional factors that contribute to spillover risk limit the availability of resources dedicated to sampling and testing for prevalence estimation [4]. Efficient allocation of testing resources coupled with dynamic modeling will enable better identification of current and future pathogenic hazards. A popular strategy for reducing testing costs when prevalence is expected to be low is to test multiple samples jointly through a procedure known as group testing or pooled testing. Pooled testing proceeds by randomly constructing groups or pools of individual samples and expending a single test on each pool. With a perfect test, a positive result implies that one or more individuals in the pool are infected, and a negative result implies that all individuals are uninfected. These implications are complicated slightly by imperfect tests and pooling-induced dilution, though these limitations are usually outweighed by benefits in practice. Individual diagnoses could then be obtained by retesting members of pools that test positive.
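To make the pooled-testing arithmetic above concrete, the following minimal sketch (ours, not from the paper) computes the probability that a pool tests positive and the expected number of tests per individual when members of positive pools are retested. It assumes a perfect test and independent infections; the function names are illustrative.

```python
# Minimal sketch (not from the paper): pooled-testing arithmetic under a
# perfect test with independent infections.

def pool_positive_prob(p, m):
    """Probability that a pool of m samples tests positive when prevalence is p."""
    return 1.0 - (1.0 - p) ** m

def expected_tests_per_individual(p, m):
    """Two-stage design: one pooled test per m individuals, plus m individual
    retests whenever the pool tests positive."""
    return 1.0 / m + pool_positive_prob(p, m)

for p in (0.01, 0.05, 0.20):
    for m in (3, 5, 10):
        print(f"p={p:.2f}, m={m:2d}: P(pool positive)={pool_positive_prob(p, m):.3f}, "
              f"E[tests per individual]={expected_tests_per_individual(p, m):.3f}")
```

When prevalence is low, the expected tests per individual fall well below one, which is the efficiency that motivates pooling; when only prevalence (rather than individual diagnoses) is of interest, the retesting term can be dropped entirely.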
Originally developed to scale up the US military's ability to screen recruits for syphilis, the strategy has seen a resurgence in popularity due to the COVID-19 pandemic [5]. When applied to syphilis screening and standard COVID-19 testing, group testing is used in a diagnostic capacity. However, when individual results are not required and pathogen prevalence is the quantity of interest, as is the case in surveillance of reservoir host populations, follow-up testing can be omitted, reducing testing costs further. [6] found that pooled testing can be used to estimate prevalence as accurately as, and more efficiently than, individual testing. However, their results are not directly applicable to observations that exhibit natural temporal correlation, such as multiple samples collected from the same site at different times. Knowledge of the distribution of infection with respect to time enables better understanding of the factors that lead to high prevalence and, therefore, better prediction of spillover events. However, testing limitations often beget inferences with limited temporal scope or resolution and wide uncertainty intervals. Efficient sampling and inference strategies for estimating prevalence therefore must be extended to account for and leverage correlation among temporally-indexed observations. Such an extension should achieve interpolation between observation times and smoothing of noisy observations. Interpolation generalizes pointwise inferences to unobserved, intervening times and enables improved qualitative and quantitative characterizations of trends that may not be apparent from pointwise inferences alone. Independent estimates of prevalence at different points in time are noisy; under the assumption that nearby events (in space or time) are more closely related than distant ones, smoothing de-noises estimates by sharing information between neighboring observations. Smoothing can also produce more comprehensive representations of uncertainty. In the absence of direct observation, prevalence is modeled as a latent process with a transformed Gaussian process (GP) prior and is related to pooled test results through a Bayesian hierarchical model. Our goals are twofold: 1) establish the ability of pooled data to recover true, underlying time-varying prevalence and 2) demonstrate that pooled testing can reduce testing costs without degrading estimates or, alternatively, generate more precise inferences at a fixed testing budget.

The remainder of the article is organized as follows. In Section 2, we introduce the motivating data and the hypothetical pooling design. In Section 3, we briefly review pooled testing and GP methodology, introduce the modeling framework, and comment on relevant computational considerations. Section 4 reports results from synthetic studies demonstrating recovery of true prevalence and case studies evidencing the relative efficiency of pooled data. The results from the motivating case studies are reported in Section 5, along with additional consideration of sampling variability inherent to our studies. In Section 6 we discuss implications of the present work and future directions.

We consider two motivating case studies. The first is an example of pathogen surveillance among a reservoir host population where low-cost prevalence estimation is necessary for determining trends and designing interventions. The second considers disease surveillance in a human population where knowledge of prevalence may instead be used to inform institutional and public health policies.
[7] report the results from 3,561 coronavirus surveillance tests performed on samples from bats, rodents, and primates collected over 12 years throughout the Republic of Congo and the Democratic Republic of Congo. We restrict our attention to samples collected from bats. Sampling efforts occurred roughly monthly, and each site was visited roughly twice per year. An approximately two-year lapse in sampling occurred between 2013 and 2015, so, absent the ability to meaningfully interpolate over two years, we evaluate our methods on a subset of the total sampling interval comprising 752 individual tests performed between 24 August 2015 and 20 July 2018. If an individual was tested twice (e.g., using both fecal and saliva samples), we record it only once and deem the individual positive if any of the samples tests positive. Though the date that each sample was collected is recorded, sampling efforts span multiple days. Because we do not expect prevalence to change meaningfully over the short window of days within which sampling occurs, we aggregate results from the same location that occur within fewer than 10 days of each other and record the date as that of the first sample collection in the sampling effort. This procedure is consistent with the description of sampling provided by the authors and reflects the reality of sampling reservoir hosts. All processing described here was performed in advance of any analysis and is identical between the pooled and individual testing analyses.

In the face of the COVID-19 pandemic, many higher education institutions in the United States implemented on-campus diagnostic and surveillance testing during the 2020/21 academic year. This, in combination with other public health precautions, was intended to limit transmission of SARS-CoV-2 through early detection and subsequent isolation [8]. The University of Notre Dame, a medium-sized university in Indiana, was one such institution. We examine results from nasal swab and saliva tests administered to asymptomatic students on campus on a daily basis between August 3 and December 16, 2020, published publicly in dashboard format on the university's website [9]. The 81,872 tests identified 248 positive cases, for an overall positive rate of 0.30%. In this scenario, individual diagnostic results were sought, as is often the case when testing human populations. However, prevalence estimation remains relevant. In humans, knowledge of population prevalence is used to inform decision-making (e.g., whether to revert to online instruction). Pooled test results from initial testing may also be used to direct later, follow-up testing to obtain individual diagnoses. For example, testing bottlenecks may be combated by prioritizing pooled testing and prevalence estimation for decision-making, and follow-up testing of positive pools can occur when testing resources become available.

Both data sets contain individual test results. To evaluate methods for estimating true prevalence from tests performed on pooled samples, we construct hypothetical pools by randomly grouping individual results and computing pooled test results: a pool is positive if one or more of its individuals tested positive in the original data and negative otherwise, thereby censoring individual-level data (a minimal sketch of this construction follows below). Broadly, testing costs can be broken down into a fixed cost that is invariant to the number of tests and a variable cost that scales with the number of tests performed.
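The hypothetical pool construction described above amounts to the following minimal sketch. The helper name and the use of Python's random module are our illustrative choices, not the paper's.

```python
import random

def make_pools(individual_results, m, seed=None):
    """Randomly group binary individual results (1 = positive) from a single
    sampling instance into pools of size at most m. A pool is positive if any
    member is positive, which censors the individual-level data."""
    rng = random.Random(seed)
    shuffled = list(individual_results)
    rng.shuffle(shuffled)
    pools = [shuffled[i:i + m] for i in range(0, len(shuffled), m)]
    pool_sizes = [len(pool) for pool in pools]
    pool_results = [int(any(pool)) for pool in pools]
    return pool_sizes, pool_results

# Example: 17 individuals pooled with m = 3 gives 5 pools of size 3 and 1 pool
# of size 2, i.e., 6 tests in place of 17.
sizes, results = make_pools([0] * 15 + [1] * 2, m=3, seed=1)
print(sizes, results)
```

The number of pools produced here (6) is also the testing budget used for the matched m = 1* subsample in the comparisons that follow.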
Reductions in variable testing costs can be achieved through two parallel approaches: 1) reducing the number of tests needed to obtain similarly precise and accurate estimates (obtaining the same results at lower cost), or 2) improving the precision and accuracy of estimates at a fixed budget (obtaining better results at the same cost). The first approach is evaluated by comparing prevalence estimates obtained from pooling to prevalence estimates obtained from the original data. The second approach is evaluated by comparing pooled estimates to estimates obtained from a subsample of the original data representing the same number of tests (i.e., fixed variable cost). Let m be the number of individuals in a given pool. Prevalence estimates that use all available data are labeled m = 1 and are regarded as the best approximation to true prevalence and therefore the best known alternative to the proposed method. Estimates from subsampled, individual tests are labeled m = 1*, and estimates from pooled tests are labeled with their respective pool sizes (e.g., m = 5). Synthetic pooling and subsampling are performed within sampling instances or time steps. For example, if 17 individuals were originally sampled at time t_i, then the m = 1 estimate is based on all 17 individuals, the m = 3 estimate is based on 5 pools of size 3 and 1 of size 2, and the m = 1* estimate is based on 6 subsampled individuals, because 6 pools and therefore 6 tests are used in the pooling setup. When multiple pool sizes are considered simultaneously, the larger testing budget is used to construct the m = 1* subsample. In the Congo Basin bat study, pool sizes 3 and 5 are used. In the Notre Dame study, pool sizes 5 and 10 are used.

The methods described here can be used to infer the population prevalence, denoted p: the probability that an individual selected randomly from a specified population is infected with an infectious disease of interest at any given time within a specified interval. We observe only whether or not a sample tests positive for the pathogen and the sample collection date. Under appropriate conditions, testing pools or groups comprising samples from multiple individuals confers considerable reductions in testing costs. Therefore, a hierarchical model is used to relate the binary or count data obtained through pooled testing to the probability that a pool tests positive, π, and to relate this pool probability back to the individual probability, p. In possession of test results from pooled samples, estimating prevalence at the pool level is straightforward: one could, for example, employ a beta prior and binomial likelihood and proceed with a conjugate analysis analogous to the procedure described in Section 2.2.1 of [6]. However, when prevalence at the individual level is the quantity of interest, any inference on π must be related to p. To do so, we use the inverse prevalence transformation of [10], π = 1 - (1 - p)^m, where m is, again, the pool size. Inspection reveals that pool probability is high when either prevalence is high or pool size is large. At lower prevalence, p is approximately linear in π. For high prevalence, however, p is highly non-linear in π: many values of p imply similar, large values of π, hindering identifiability. The non-linearity also grows with m (see [10], Figure 1). High prevalence requires small pools; low prevalence permits large pools.
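The identifiability point above can be checked numerically. The sketch below is ours (the paper provides no code here) and assumes a perfect test; it evaluates the transformation and its inverse for a few values of p and m.

```python
def pool_prob(p, m):
    """Pool-level positive probability implied by prevalence p and pool size m."""
    return 1.0 - (1.0 - p) ** m

def prevalence_from_pool_prob(pi, m):
    """Inverse transformation: prevalence implied by pool-level probability pi."""
    return 1.0 - (1.0 - pi) ** (1.0 / m)

# Low prevalence: the mapping is nearly linear (pi is roughly m * p).
print(pool_prob(0.02, 3), pool_prob(0.02, 10))    # ~0.059, ~0.183

# High prevalence with m = 10: the wide prevalence range [0.26, 0.50] is squeezed
# into the narrow pool-probability range [0.95, 0.999], so distinct prevalences
# become nearly indistinguishable from pooled results.
print(pool_prob(0.26, 10), pool_prob(0.50, 10))   # ~0.951, ~0.999
print(prevalence_from_pool_prob(0.95, 10))        # ~0.259
print(prevalence_from_pool_prob(0.999, 10))       # ~0.499
```

This is the numerical counterpart of the guidance that high prevalence calls for small pools while low prevalence permits large ones.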
[11] describes experimental design considerations for optimal pool size; [12] additionally considers optimality under two-stage designs, where individuals from positive pools are retested.

GPs are popular tools for nonparametric regression as they make no assumptions about the functional forms of relationships and few assumptions generally. They can therefore be used as priors over functions to model unknown, possibly non-linear functions automatically. By definition, a GP is a stochastic process {X_t}_{t in T} for which any finite collection of variables X_t = (X_t1, . . . , X_tn) is distributed as multivariate Gaussian. The support of a GP is therefore the real numbers, limiting applications. However, transformations can be used to map this support to different intervals, greatly broadening its uses. The Gaussian distribution function (inverse probit) φ(·) is one such transformation, commonly used to map GP random variables to [0, 1] for use in classification. We adopt a similar approach. Because p_ti ∈ [0, 1], it is convenient to model a latent prevalence process W := {W_t}_{t in T} with real support and transform to obtain p_t = φ(w_t). Prevalence, and therefore the form of the latent process, are unknown. Together with assumptions of smoothness, continuity, and real support, this motivates a GP prior on W. The only additional user-specified components of a GP are the covariance function and priors over covariance hyperparameters. The form of the covariance function determines the nature of the relationship between process values as a function of time, and proximity in time determines the strength of the covariance between any two process values. In general, we expect process values at similar times to be more similar than process values at distant times. This agrees with our notion of smoothing. Covariance can be computed between any pair of times to obtain a function value, achieving interpolation. A common, general-purpose covariance function, and the function we use throughout this paper, is Cov(t_i, t_j | θ) = σ² exp(-(t_i - t_j)² / (2ℓ²)), known as the exponentiated quadratic covariance function. The hyperparameters θ = {σ², ℓ} control the amplitude and oscillation speed of a sampled process, respectively. Many other covariance functions can be used instead to match the application. The covariance matrix C of our multivariate Gaussian distribution is obtained by letting C_ij = C_ji = Cov(t_i, t_j | θ), and we write W_t ∼ GP(0, C). Here, we have specified a zero-mean GP; a useful property is that if F ∼ GP(µ, C) and G ∼ GP(0, C), then F = G + µ in distribution. The mean can be modeled independently either as a scalar or as some function of covariates, and the covariance matrix will capture any residual structure not modeled in the mean.

Models of test results from pooled samples depend primarily on the size of the pools, because the probability that a pooled sample tests positive is a function of both population prevalence and pool size. Different models arise depending on how pool sizes are assigned. At one extreme, pool sizes and the number of pools are determined in advance of sampling individuals and all pools are the same size. At the other extreme, pools of many different sizes are tested, possibly in advance of any analysis. Here, we consider three scenarios representing 1) the general case, where all pools may be different sizes, 2) the ideal case, in which all pools are the same size, and 3) an efficient alternative to the general case that represents many realistic situations.
The idealized and efficient models can be regarded as special cases of the general model. In all scenarios, the pool sizes m, number of pools k, and number of individual samples n are assumed to be known for all pools. Unknown pool sizes could, in principle, be estimated from data, but strong prior information or other forms of regularization would likely be required for identification. In all scenarios, small notational simplifications are obtained when n (and therefore k and m) is constant in time.

In the general setting, each pool is given its own probability; no assumptions are needed about the size or number of pools. We use j ∈ {1, . . . , k_ti} to index pools at time t_i. The sampling model is

Y_ti,j ∼ Bernoulli(π_ti,j), π_ti,j = 1 - (1 - p_ti)^(m_ti,j),

where Y_ti,· is a vector of binary variables indicating which pools of samples test positive at time t_i. At each time, π is indexed by pools, but p is constant. We use this model in the Congo Basin bat surveillance example. If m_ti,a = m_ti,b for all a, b ∈ {1, . . . , k_ti}, this model simplifies to the idealized model described next.

In the ideal setting, all pools at time t_i contain m_ti individual samples, and k_ti pools are tested. This requires m_ti × k_ti = n_ti. In the absence of pool-level covariates, all pools at a given time have the same probability of testing positive because both m_ti and p_ti are the same for all pools. The sampling model is

Y_ti ∼ Binomial(k_ti, π_ti), π_ti = 1 - (1 - p_ti)^(m_ti),

where Y_ti ∈ {0, 1, . . . , k_ti} represents the number of pools that test positive at t_i. Synthetic examples follow this framework. This model is not appropriate when pools are different sizes, as is the case when the pool size does not divide the number of individual samples.

The total number of individual samples may not be perfectly divisible by the chosen pool size. For example, if as many samples as possible are collected rather than a predetermined number, it is unlikely that m divides n. Suppose we test pools no larger than m*. We can use the idealized model when m* divides n_ti, and the general model otherwise. However, when k is on the order of tens or hundreds, it is computationally advantageous to model pool structure more deliberately. When m* does not divide n_ti, test k_ti - 1 pools of size m* and one pool of size (n_ti mod m*) = m_ti. This motivates the final sampling model:

Y_ti,1 ∼ Binomial(k_ti - 1, π_ti,1), π_ti,1 = 1 - (1 - p_ti)^(m*),
Y_ti,2 ∼ Bernoulli(π_ti,2), π_ti,2 = 1 - (1 - p_ti)^(m_ti).

Here, the response Y_ti = (Y_ti,1, Y_ti,2) is a multivariate random vector comprising the number of positive pools among those of size m* and a binary variable indicating the test result of the final, odd-sized pool. Because the number of individual samples may vary over time in this scenario, it may be the case that n_ti < m* or k_ti × m* = n_ti. This model handles both situations naturally. In the former case, we have Y_ti,1 ∼ Binomial(0, π_ti,1) and, in the latter, Y_ti,2 ∼ Bernoulli(0), which imply Pr(Y_ti,1 = 0) = Pr(Y_ti,2 = 0) = 1; in effect, either the ideal or general model is used automatically. Obviously, y_ti,1 and y_ti,2 must be coded as zeros in the respective situations. This formulation can reduce the number of likelihood terms by orders of magnitude, lending a substantial speed-up. We use this model in the Notre Dame COVID-19 testing example, in which more than 80,000 test results are analyzed.

Let t = {t_1, t_2, . . . , t_n} be an index set of observation times in the observation interval T with observed responses y_t = {y_t1, . . . , y_tn}, where y_ti ∈ {0, 1, . . . , k_ti}, corresponding to the ideal scenario described above. Additionally, let W_t be the set of latent prevalence values at times t.
The hierarchical model is

Y_ti | π_ti ∼ Binomial(k_ti, π_ti),
π_ti = 1 - (1 - p_ti)^(m_ti),
p_ti = φ(w_ti),
W_t | µ, θ ∼ GP(µ, C).     (7)

Replace the first two lines with the appropriate sampling model in non-ideal scenarios. In the absence of covariates, we model a scalar mean µ. Instead modeling the mean as a linear combination of predictors, µ = Xβ, would be a natural extension. The model used to infer prevalence from individual samples is the same as (7) with the prevalence transformation omitted and the sampling model

Y_ti | p_ti ∼ Binomial(k_ti, p_ti),

where k_ti is now the number of individuals tested at t_i and y_ti ∈ {0, 1, . . . , k_ti} is the observed number of infected individuals.

In the interest of propagating uncertainty as thoroughly as possible, we perform fully Bayesian inference on all model parameters and hyperparameters by use of Markov chain Monte Carlo (MCMC). Probabilistic inference on GP hyperparameters is challenging in this setting due to the posterior dependence between latent variables and hyperparameters; the coupling can be resolved by intensive sampling, but the O(N^3) scaling behavior of GP prediction renders such intensive, repeated computation impractical. The data sets considered here are sufficiently small that inference in Stan [13] remains practical. For larger data sets, the dependence structure must be addressed directly. Several methods exist; see [14] for a review. We use normal and half-normal priors on µ and σ, respectively. Values of ℓ less than the shortest time between observations (low) or greater than the total observation interval (high) are not identified. Accordingly, an inverse-gamma prior on ℓ weakly informs the range of plausible values between low and high.

Dynamic prevalence estimation from pooled samples is first evaluated through two studies of simulated data. The simulated studies are used to evaluate the proposed method's ability to recover known, underlying prevalence in settings similar to those of the case studies and are designed to reflect characteristics present in the data used in the applications. Clearly, it is not possible to simulate a perfect match to the unobserved prevalence processes that gave rise to the observed data, but we approximately match the order of magnitude, overall trend, and sampling design. In the first study, true prevalence ranges from 0 to 0.12 and oscillates slowly in that range. The prevalence is 'observed' at 25 evenly spaced times within a 1000-day interval. For each observed prevalence, 45 individual test results are simulated and 15 pools of size 3 (k = 15, m = 3) are constructed. In the second study, true prevalence does not exceed 0.05 and tests are simulated every day for 150 days. Each day, 500 individual test results are simulated and pool size m = 10 is used, so 50 pools are required (k = 50, m = 10). Here, a larger pool size is tolerated because prevalence is low. This would not be known a priori, but may be available through domain knowledge, past analyses, or initial testing. In both studies, m = 1 is informed by k × m individual results, m = 1* is informed by k individual results, and m > 1 estimates are informed by k pools of size m.

In both simulated studies, there is strong agreement between the estimated curves for individual (m = 1) and pooled (m = 3 and m = 10) data, displayed in Figure 1. In the top panel, pooled and individual curves track the true prevalence closely, with a clear but slight exception around day 625. The budgeted individual curve (m = 1*) is both less precise and less accurate in general.
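For intuition, the ideal-scenario generative process in (7) can be simulated directly. The sketch below is ours; the hyperparameter values are assumed for illustration only, though the dimensions echo the first synthetic study.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative setup echoing the first synthetic study: 25 evenly spaced
# sampling times in a 1000-day interval, k = 15 pools of size m = 3 per time.
t = np.linspace(0.0, 1000.0, 25)
k, m = 15, 3
mu, sigma, ell = -1.8, 0.5, 200.0   # assumed GP mean, amplitude, length-scale

# Exponentiated quadratic covariance; a small jitter keeps C positive definite.
diff = t[:, None] - t[None, :]
C = sigma**2 * np.exp(-diff**2 / (2.0 * ell**2)) + 1e-9 * np.eye(len(t))

w = rng.multivariate_normal(mu * np.ones(len(t)), C)   # latent process W_t
p = norm.cdf(w)                                        # p_ti = phi(w_ti)
pi = 1.0 - (1.0 - p)**m                                # pool-level probability
y = rng.binomial(k, pi)                                # positive pools per time

print(np.round(p, 3))
print(y)
```

Fitting the model then amounts to inverting this process with MCMC (the paper uses Stan), treating w, µ, σ, and ℓ as unknown.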
The second study (Figure 1, panel 2), in which much lower prevalence is estimated from more data over a shorter interval, provides even more compelling results. True prevalence is estimated well by individual and pooled data, and the two are largely indistinguishable. Furthermore, using budgeted individual data grossly over-smooths the curve, reducing the process to an approximately linear function. Notably, where true prevalence is 0, pooled results are disproportionately informative because it is known that exactly k × m individuals are negative, whereas individual results at the same testing budget indicate only that k individuals are negative; pooling lends greater precision through sample size when prevalence is near zero. Table 1 supports these results; compared to m = 1 and pooled estimates, m = 1* estimates are either less accurate, less precise, or both, with the exception of µ in the second synthetic study. Together, these simulation studies demonstrate that 1) true, underlying prevalence can be estimated using pooled sample data together with the proposed hierarchical model, 2) for a fixed number of tests, use of pooled sample data confers substantial gains in precision and accuracy, and 3) results from pooled data are consistent with results from individual data, which require m times as many tests and therefore m times higher variable cost.

Figure 1: Pooled samples recover true prevalence. Estimated median curves and 95% credible regions are displayed for each synthetic study under each sampling strategy. Rug marks indicate when sampling occurred. In both cases, the pooled estimates closely track true, unobserved prevalence (black, dotted) and m = 1 estimates (black, solid). Estimates from limited individual sampling (blue, solid) are less precise and fail to accurately recover true prevalence.

Having established that true prevalence can be efficiently estimated from pooled data, we proceed with analyzing the data described in Section 2. Pooled estimates (m = 3 and m = 5), representing 3- and 5-fold reductions in the number of tests, closely match the m = 1 curve obtained through universal individual testing, both in median and uncertainty intervals (Figure 2). The m = 1* curve, which has testing cost equal to that of m = 3 and greater than that of m = 5, is both less precise and a poorer fit to the best alternative, m = 1. Table 2 tells a similar story. Parameter estimates from pooled data closely match m = 1 estimates in terms of both means and 95% interval endpoints. Estimates from m = 1* are either biased (relative to m = 1) or less precise.

Compared to the previous example, the number and frequency of tests lend greater precision across all testing strategies. Figure 3 displays estimated curves from m = 5 and m = 10, which are nearly indistinguishable from the universal individual testing curve, m = 1. Due to the low prevalence and testing intensity, pooled testing is extremely efficient: an order-of-magnitude reduction in testing costs affects estimates almost imperceptibly. However, uncertainty intervals for m = 1* are again far wider and the estimates are clearly over-smoothed. Table 3 also indicates that pooled estimates recover m = 1 estimates with high fidelity, but m = 1* estimates differ considerably in expectation and precision.
Observed frequencies at most times are consistent with the posterior predictive distribution, but very low observed frequencies coupled with large sample sizes are represented only in the extreme left tails of the predictive distributions at the respective times. Incorporating demographic information or correlation among individuals tested at the same time may resolve this.

In practice, pooling is not achieved by subsampling individual results. Instead, pooled testing either incorporates information from more individuals than would otherwise be tested or tests all individuals at a lower cost. Figure 4 is included for completeness and suggests that the pooled results displayed in Figure 2 are typical under the pool sampling design. Individual results at the same budget, however, are far more variable. Consider a scenario where k × m individuals could be sampled, but only k tests are budgeted. Pooled testing permits the incorporation of information from all k × m individuals, and subsequent analyses provide stable approximations to estimates generated from universal individual testing data.

The simulation studies have established that pooled testing data together with latent GP regression can be used to estimate prevalence over time. The case studies further established that pooled estimates closely match the far more costly alternative (m = 1) and that pooled estimates are more precise than the budgeted alternative (m = 1*). The realized pool assignments and resulting estimates are typical under the pool simulation scheme. Smoothed and interpolated prevalence estimates are more representative of prevalence dynamics than sets of pointwise estimates. Obtaining such estimates requires more intensive sampling and careful inference. The inferential efficiency of pooled testing, which had previously been shown only in static settings, is seen here to extend to the dynamic modeling proposed in this article. Population prevalence as a function of time is estimable from pooled samples. Compared to individual testing, pooling serves to greatly reduce testing costs without qualitatively affecting estimates or uncertainty; at a fixed budget, pooling generates more precise estimates. Applied to surveillance of reservoir host populations, the proposed methodology enables efficient, precise estimation of pathogen prevalence dynamics.

Throughout, we assumed that unlimited, universal testing best approximates true prevalence among the strategies considered. A two-stage design is an interesting intermediary between pooled testing and individual testing. It would likely be more costly than pooling and less accurate than individual testing but may achieve a more favorable cost/accuracy trade-off than either. We leave this for future work. Other natural extensions to this work exist, some of which immediately resolve current limitations. These extensions belong to three thematic categories: incorporating additional information, using estimates to inform sampling, and elaborating on the chosen inference tools. Known values of sensitivity and specificity may additionally be incorporated at a commensurate expense to precision. A limitation of pooling is the potential for dilution-induced changes in sensitivity and specificity as a function of pool size. Resolving this entails a trade-off between cost and precision. Similarly, modeling covariance jointly in space and time would be an exciting extension applicable to sufficiently rich data sets.
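On incorporating known sensitivity and specificity: one standard adjustment (an assumption on our part; the paper does not commit to a particular form) replaces the perfect-test pool probability with the probability of an observed positive, as sketched below. The assay values shown are hypothetical, and the sketch assumes sensitivity and specificity are unaffected by pooling-induced dilution.

```python
def observed_pool_prob(p, m, sensitivity, specificity):
    """Probability that a pool of size m returns a positive result when prevalence
    is p, for an assay with the given sensitivity and specificity. Assumes the
    assay characteristics are unaffected by pooling-induced dilution."""
    pi = 1.0 - (1.0 - p) ** m            # probability the pool truly contains a positive
    return sensitivity * pi + (1.0 - specificity) * (1.0 - pi)

# Hypothetical assay characteristics, for illustration only.
print(observed_pool_prob(0.02, 5, sensitivity=0.95, specificity=0.99))  # ~0.100
print(1.0 - (1.0 - 0.02) ** 5)                                          # perfect test: ~0.096
```

Substituting this quantity for π in the sampling models would propagate test error into the prevalence estimates, at the cost of precision noted above; letting sensitivity vary with m would be one way to represent dilution.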
In online learning settings, where data are analyzed as they become available, various forms of adaptive sampling may be used to design optimal pool sizes at subsequent times. It may also be advantageous to tune sampling frequency in real time to ensure that dynamics of interest are identified. These adaptive sampling strategies would ensure testing resources are efficiently and effectively expended. Lastly, the nonparametric tools described in this article were used with relatively basic specifications to establish their general applicability. The model could be strengthened by prior-encoded domain knowledge, alternative covariance functions, and covariate-informed mean structure, depending on the application. Furthermore, as the model is expanded to handle more information, efficient or approximate computation will become increasingly relevant. We look forward to these extensions as exciting future work.

AH and BS developed the proposed methodology. BS formulated the model, performed analyses, and led writing of the manuscript. AP and RP developed the application setting. AH supervised the project. All authors contributed to drafts and gave final approval for publication.

Data and code used in this work are available at https://github.com/braden-scherting/temporal_prevalence

[1] Ecological conditions experienced by bat reservoir hosts predict the intensity of Hendra virus excretion over space and time.
[2] Alejandro Schudel, Klaus Stöhr, and ADME Osterhaus. Pathogen surveillance in animals.
[3] Sampling to elucidate the dynamics of infections in reservoir hosts.
[4] Pathways to zoonotic spillover.
[5] Evaluation of sample pooling for diagnosis of COVID-19 by real time-PCR: A resource-saving combat strategy.
[6] Estimating viral prevalence with data integration for adaptive two-phase pooled sampling.
[7] Coronavirus surveillance in wildlife from two Congo Basin countries detects RNA of multiple species circulating in bats and rodents.
[8] Response to a COVID-19 outbreak on a university campus-Indiana.
[9] COVID-19 protocols and policies: Fall 2020 data.
[10] Estimating prevalence using composites.
[11] Group testing for estimating infection rates and probabilities of disease transmission.
[12] Optimal sampling strategies for two-stage studies.
[13] Stan: A probabilistic programming language.
[14] A comparative evaluation of stochastic-based inference methods for Gaussian process models.