key: cord-0668812-5xo0pnxc authors: Liva, Gianluigi; Paolini, Enrico; Chiani, Marco title: Optimum Detection of Defective Elements in Non-Adaptive Group Testing date: 2021-02-10 journal: nan DOI: nan sha: 7a590a7c2e1b4592ba9b6e2e8429851260d735e8 doc_id: 668812 cord_uid: 5xo0pnxc

We explore the problem of deriving a posteriori probabilities of being defective for the members of a population in the non-adaptive group testing framework. Both noiseless and noisy testing models are addressed. The technique, which relies on a trellis representation of the test constraints, can be applied efficiently to moderate-size populations. The complexity of the approach is discussed and numerical results on the false positive probability vs. false negative probability trade-off are presented.

between the false positive and false negative probabilities (i.e., between the false alarm and the miss-detection probabilities). To combine the advantages of both techniques while mitigating their limitations, it is sometimes preferable to implement a hybrid approach, where a first screening is performed via a non-adaptive testing step, followed by an adaptive (or even individual) testing step for the population members that are identified as potentially infected. The role of the first step is to prune the sample population, delivering to the second step a small fraction of the original set of individuals for additional testing. Approaches of this kind, which date back to the original work of Dorfman [3], enable remarkable savings in the number of tests. Several ongoing investigations on the use of group testing for SARS-CoV-2 screening follow this line [7], [8].

In this paper, we address the problem of efficient a posteriori probability (APP) detection of defective elements in a non-adaptive setting. Our work falls along the lines of [9], [10], [11], where belief propagation was used to detect defective elements.
In particular, we investigate the use of a trellis description of the test matrix to enable the use of the forward-backward algorithm [12]. The technique is reminiscent of the trellis representation of linear block codes based on the parity-check matrix [12], [13], and yields APP estimates for each element of the population with a complexity that grows exponentially in the number of tests (rather than in the population size). The approach can be applied to small and moderate-size test matrices and may be considered a building block for more sophisticated group testing strategies [14], [15]. It is developed for both noiseless and noisy group testing settings.

The paper is organized as follows. Section II provides the main definitions and the notation used in the rest of the manuscript. Section III presents the trellis construction. The application of the forward-backward algorithm (derived in Appendix A) is discussed in Section IV, along with some numerical examples. Conclusions follow in Section V.

We consider a non-adaptive group testing problem where $m$ pooled tests are applied to a population of $n$ elements. The status of the population is described by the defectivity vector $x = (x_1, x_2, \ldots, x_n)$, where each element belongs to $\{0, 1\}$. For the defectivity vector, we adopt an independent, identically distributed (i.i.d.) model where each element is defective (i.e., it takes value 1) with probability $\delta$, where $\delta$ is referred to as the prevalence. We denote by $s = (s_1, \ldots, s_m)$ the syndrome vector, where $s_i = 0$ if none of the elements of $x$ participating in the $i$th pool is defective, while $s_i = 1$ if at least one element participating in the pool is defective. The tests are, therefore, non-quantitative. The allocation of the population elements to the pools is described by an $m \times n$ binary test matrix $A = \{a_{i,\ell}\}$, where $a_{i,\ell} = 1$ if and only if the $\ell$th element of the population participates in the $i$th pool.
Compactly, we write

$$s = x \vee A^{\mathsf T}$$

where the $\vee$ operator between the vector $x$ and the matrix $A^{\mathsf T}$ is defined to yield

$$s_i = \bigvee_{\ell=1}^{n} \left( x_\ell \wedge a_{i,\ell} \right).$$

Here, $\vee$ is the inclusive logical disjunction ("or") and $\wedge$ is the logical conjunction ("and"). We consider two models for the tests. In the first (noiseless) model, the test vector $t$ is equal to the syndrome, $t = s$, i.e., tests are error-free. In the second model, we observe a noisy version of the syndrome, yielding a test vector that is only statistically dependent on the syndrome according to a generic distribution $Q(t \mid s)$. We further assume the test vector to take values in $\{0, 1\}^m$. The random vectors associated with $x$ and $t$ are indicated as $X$ and $T$, respectively. We denote the set of defectivity vectors compatible with a syndrome $s$ as

$$\mathcal{X}_s = \left\{ x : x \vee A^{\mathsf T} = s \right\}.$$

The decision taken on the status of the elements is $\hat{x}$ (and $\hat{X}$ is the corresponding random vector). The false-alarm probability $P_{\mathrm{FA}}$ is the probability that a non-defective element is declared defective, and the miss-detection probability $P_{\mathrm{MD}}$ is the probability that a defective element is declared non-defective. In the following, $\log$ is the natural logarithm, and $w_{\mathrm H}(x)$ is the Hamming weight of the vector $x$.

In this section, we illustrate how the sets of defectivity vectors $\mathcal{X}_s$ can be compactly represented through a trellis diagram with $n$ sections and at most $2^m$ states per section. The trellis construction follows the one introduced in [12], [13] to represent a linear block code based on the code parity-check matrix. We denote by $S_\ell$ the state at depth $\ell$, where the state can take values in $\{0, 1, \ldots, 2^m - 1\}$. We further introduce the partial syndrome vector at depth $\ell$ as $s_\ell$. Observe that the syndrome can be obtained as

$$s = \bigvee_{\ell=1}^{n} \left( x_\ell \wedge a_\ell^{\mathsf T} \right)$$

where $a_\ell$ is the $\ell$th column of the test matrix and the $\wedge$ operation is to be understood element-wise. Owing to the associativity of the $\vee$ operator, we can obtain $s = s_n$ through the recursion

$$s_\ell = s_{\ell-1} \vee \left( x_\ell \wedge a_\ell^{\mathsf T} \right) \quad \text{for } \ell = 1, \ldots, n,$$

where $s_0 := (0, 0, \ldots, 0)$. Following this observation, we associate to each possible partial syndrome $s_\ell$ the state at depth $\ell$ with index equal to the decimal representation of the syndrome.
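The syndrome recursion and the syndrome-to-state mapping can be sketched in a few lines of Python. This is only an illustrative sketch: the 3-by-5 test matrix `A` is a hypothetical example (it is not claimed to be the matrix of Example 1), the function names are our own, and the state index uses the first test as the least significant bit, which is the convention made explicit in the text that follows.

```python
# Hypothetical 3 x 5 test matrix (rows = pools, columns = population elements).
A = [
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
]

def syndrome(A, x):
    """Compute s = x OR A^T through the recursion s_l = s_{l-1} OR (x_l AND a_l)."""
    m = len(A)
    s = [0] * m                      # s_0 = (0, ..., 0)
    for l, x_l in enumerate(x):
        if x_l:                      # x_l AND a_l is the all-zero vector when x_l = 0
            s = [s[i] | A[i][l] for i in range(m)]
    return s

def state_index(s):
    """Decimal representation [s]_D, with s_1 as the least significant bit."""
    return sum(bit << i for i, bit in enumerate(s))

def state_to_syndrome(S, m):
    """Binary expansion [S]_B of a state index into an m-bit syndrome."""
    return [(S >> i) & 1 for i in range(m)]
```

For instance, with only the first element defective, `syndrome(A, [1, 0, 0, 0, 0])` returns the first column of `A`, whose state index can then be read off with `state_index`.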
Specifically, to a syndrome $s = (s_1, s_2, \ldots, s_m)$ we associate the state index $[s]_{\mathrm D} = \sum_{i=1}^{m} s_i 2^{i-1}$. Similarly, we retrieve the syndrome associated with a state $S$ as the binary expansion of the state index through the operator $[S]_{\mathrm B}$, i.e., $s = [[s]_{\mathrm D}]_{\mathrm B}$.

The trellis construction proceeds as follows. At depth 0, the trellis admits only state 0. At depth 1, two states $[x_1 \wedge a_1^{\mathsf T}]_{\mathrm D}$ for $x_1 \in \{0, 1\}$ are allowed: it is easy to check that the first state is (again) state 0, and that the second state has index $[a_1^{\mathsf T}]_{\mathrm D}$. We then connect state 0 at depth 0 to state 0 at depth 1 through a 0-labeled edge (i.e., associated with $x_1 = 0$), and to state $[a_1^{\mathsf T}]_{\mathrm D}$ through a 1-labeled edge (i.e., associated with $x_1 = 1$). The construction proceeds recursively: for each admitted state $S_{\ell-1}$ at depth $\ell - 1$, we draw an $x_\ell$-labeled edge connecting it to state $S_\ell$ if and only if

$$S_\ell = \left[ [S_{\ell-1}]_{\mathrm B} \vee \left( x_\ell \wedge a_\ell^{\mathsf T} \right) \right]_{\mathrm D}.$$

The procedure is repeated until $\ell = n$. We refer to the trellis obtained by following this procedure as the complete trellis. Note that all paths reaching the final state $[s]_{\mathrm D}$ correspond to the defectivity vectors in $\mathcal{X}_s$. Note also that the trellis diagram may present parallel edges between two states. The trellis diagram can be used to efficiently obtain the APP $P(x_\ell \mid t)$ for each element in $x$ via the forward-backward algorithm [12], as illustrated in Section IV. Before proceeding, we highlight some features of the trellis representation that are important in the noiseless group testing setting.

Remark 1. In a noiseless group testing setting (i.e., where $t = s$), upon observing the test vector $t$ the trellis diagram can be expurgated by removing all paths that do not terminate at the state $[t]_{\mathrm D}$. This can be done without incurring any loss of information. The path removal leads to an expurgated trellis diagram with a (possibly) reduced number of states.
The paths contained in the new trellis correspond to defectivity vectors compatible with the syndrome $s$, i.e., all vectors in $\mathcal{X}_s$.

Following Example 1, Figure 2 reports the trellis associated with the final state $[(1, 0, 1)]_{\mathrm D} = 5$. In a noiseless group testing setting, following [13], we refer to the trellis obtained by removing all paths that do not yield the observed syndrome as the expurgated trellis. By visual inspection of the expurgated trellis of Figure 2, we see that the second, third, and fifth trellis sections contain only 0-labeled edges, i.e., $x_2 = x_3 = x_5 = 0$ with certainty. This fact is not surprising: whenever a given test evaluates to 0, the elements in $x$ participating in the test can be surely marked as non-defective, as foreseen, for example, by the combinatorial orthogonal matching pursuit (COMP) algorithm [5], [16]. In light of this, the following property holds.

Property 1. Denote by $m_0$ the number of non-zero tests in $t$ (i.e., $m_0 = w_{\mathrm H}(t)$), and by $n_0$ the number of elements in $x$ that participate only in pools resulting in a non-zero test. Then, in a noiseless group testing setting (i.e., where $t = s$), upon observing the test vector $t$ the trellis diagram can be reduced to a trellis with $n_0$ sections and at most $2^{m_0}$ states per section.

We refer to the trellis following from Property 1 as the reduced trellis associated with the test vector $t$. Figure 3 provides the reduced trellis for $t = (1, 0, 1)$, for the test matrix of Example 1. Note that, in a noiseless group testing setting, the possibility of describing the whole set of defectivity vectors with a reduced trellis possessing at most $2^{m_0}$ states per section enables dramatic savings in the average complexity of the detection algorithm provided in the next section.

Let us consider the general case of a noisy group testing setting as described in Section II.
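The reduction stated in Property 1 can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' implementation: the test matrix `A` is a hypothetical 3-by-5 example (chosen so that the second pool contains elements 2, 3 and 5, but not claimed to be the matrix of Example 1), and the function name `reduce_noiseless` is our own.

```python
# Hypothetical 3 x 5 test matrix (rows = pools, columns = population elements).
A = [
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
]

def reduce_noiseless(A, t):
    """Noiseless setting only: drop zero tests and the elements they clear.

    Returns (kept_elements, positive_tests, A_reduced), using 0-based indices.
    len(kept_elements) is n_0 and len(positive_tests) is m_0 = w_H(t).
    """
    m, n = len(A), len(A[0])
    positive_tests = [i for i in range(m) if t[i] == 1]
    # An element is kept only if every pool it belongs to tested non-zero;
    # any element in a zero test is non-defective with certainty (COMP rule).
    kept_elements = [
        l for l in range(n)
        if all(t[i] == 1 for i in range(m) if A[i][l] == 1)
    ]
    A_reduced = [[A[i][l] for l in kept_elements] for i in positive_tests]
    return kept_elements, positive_tests, A_reduced
```

With this hypothetical matrix and `t = [1, 0, 1]`, the second test clears elements 2, 3 and 5, so the reduced problem involves two elements and two tests, and a trellis built on `A_reduced` has at most four states per section instead of eight.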
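We are interested in evaluating the logarithmic APP ratio

$$L^{\mathrm{APP}}_\ell = \log \frac{P(X_\ell = 0 \mid T = t)}{P(X_\ell = 1 \mid T = t)}. \tag{1}$$

By means of the complete trellis representation introduced in Section III, (1) can be computed efficiently via the forward-backward algorithm [12] as

$$L^{\mathrm{APP}}_\ell = \log \frac{\sum_{(s', s) \in E_\ell(0)} \alpha_{\ell-1}(s')\, \gamma_\ell(s', s)\, \beta_\ell(s)}{\sum_{(s', s) \in E_\ell(1)} \alpha_{\ell-1}(s')\, \gamma_\ell(s', s)\, \beta_\ell(s)}. \tag{2}$$

In (2), $E_\ell(x)$ is the set of $x$-labeled edges in section $\ell$, and $(s', s)$ denotes an edge connecting state $s'$ at depth $\ell - 1$ with state $s$ at depth $\ell$. Moreover, the forward metric at state $s$ and depth $\ell$ can be recursively computed as

$$\alpha_\ell(s) = \sum_{s'} \alpha_{\ell-1}(s')\, \gamma_\ell(s', s) \tag{3}$$

and the backward metric at state $s'$ and depth $\ell$ can be obtained as

$$\beta_\ell(s') = \sum_{s} \beta_{\ell+1}(s)\, \gamma_{\ell+1}(s', s) \tag{4}$$

with

$$\gamma_\ell(s', s) = \begin{cases} 1 - \delta & \text{if } (s', s) \in E_\ell(0) \\ \delta & \text{if } (s', s) \in E_\ell(1). \end{cases}$$

The initial condition for the recursion (3) is $\alpha_0(0) = 1$ and $\alpha_0(s') = 0$ for $s' = 1, \ldots, 2^m - 1$, whereas for the backward recursion (4) it is $\beta_n(s) = Q(t \mid [s]_{\mathrm B})$ for $s = 0, \ldots, 2^m - 1$. For the sake of completeness, the derivation of (2), as well as of (3), (4), is provided in the Appendix.

For the special case of a noiseless group testing setting, the likelihood $Q(t \mid s)$ takes value 1 for $t = s$, and it is 0 otherwise. It follows that the forward-backward algorithm can be run on the expurgated (or on the reduced) trellis associated with the syndrome $s$, by initializing the backward metric to $\beta_n([s]_{\mathrm D}) = 1$. Note also that (1) can be obtained, in the noiseless setting, by observing that $P(X_\ell = 0 \mid T = t)$ and $P(X_\ell = 1 \mid T = t)$ are given, up to the common normalization factor $1/P(T = t)$, by

$$P(X_\ell = 0 \mid T = t) \propto \sum_{\substack{x \in \mathcal{X}_t \\ x_\ell = 0}} \delta^{w_{\mathrm H}(x)} (1 - \delta)^{n - w_{\mathrm H}(x)} \tag{5}$$

and

$$P(X_\ell = 1 \mid T = t) \propto \sum_{\substack{x \in \mathcal{X}_t \\ x_\ell = 1}} \delta^{w_{\mathrm H}(x)} (1 - \delta)^{n - w_{\mathrm H}(x)}. \tag{6}$$

In this case, the forward-backward algorithm can be seen as an efficient way to attack the enumeration problem entailed by (5), (6).

A decision about each element in $x$ can be obtained by applying a threshold test to (1), i.e.,

$$L^{\mathrm{APP}}_\ell \underset{\hat{x}_\ell = 1}{\overset{\hat{x}_\ell = 0}{\gtrless}} \Lambda \tag{7}$$

or, by recasting (7) as a log-likelihood ratio (LLR) test with $L^{\mathrm{LLR}}_\ell = \log \left[ P(T = t \mid X_\ell = 0) / P(T = t \mid X_\ell = 1) \right] = L^{\mathrm{APP}}_\ell - \log \frac{1 - \delta}{\delta}$, as

$$L^{\mathrm{LLR}}_\ell \underset{\hat{x}_\ell = 1}{\overset{\hat{x}_\ell = 0}{\gtrless}} \Lambda - \log \frac{1 - \delta}{\delta}. \tag{8}$$

The test (8) is optimal in the Neyman-Pearson sense. Moreover, for fixed $\delta$ and a given noise model $Q(t \mid s)$, the forward-backward algorithm is deterministic, since it associates to each test vector $t$ a fixed logarithmic APP ratio vector $(L^{\mathrm{APP}}_1, L^{\mathrm{APP}}_2, \ldots, L^{\mathrm{APP}}_n)$.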
It follows that, for a given threshold $\Lambda$, the final decision $\hat{x}$ is fixed and only a discrete set of pairs $(P_{\mathrm{MD}}, P_{\mathrm{FA}})$ can be achieved, whereas the operating points on the segment joining two pairs $(P_{\mathrm{MD}}(\Lambda_1), P_{\mathrm{FA}}(\Lambda_1))$ and $(P_{\mathrm{MD}}(\Lambda_2), P_{\mathrm{FA}}(\Lambda_2))$ are achievable through randomized tests. In the noiseless setting, by fixing the threshold $\Lambda$ to a large value, we recover the COMP algorithm [5], [16].

Borrowing from the jargon of detection theory, the receiver operating characteristic (ROC) curves (displaying the probability of successful detection $1 - P_{\mathrm{MD}}$ vs. the probability of false alarm $P_{\mathrm{FA}}$ as the threshold $\Lambda$ varies) for a $7 \times 64$ test matrix are given in Figure 4. The curves have been obtained via Monte Carlo simulations. The test matrix is based on the parity-check matrix of a $(64, 57)$ extended Bose-Chaudhuri-Hocquenghem (BCH) code in cyclic form, where the Hamming weight of each row is 32. The ROC curves are provided for a prevalence $\delta = 0.015$ and for both noiseless and noisy settings. In the noisy case, each syndrome bit $s_i$ is observed through a binary symmetric channel with transition probabilities $Q(t_i \mid s_i)$ given by $Q(0 \mid 0) = Q(1 \mid 1) = 1 - \epsilon$ and $Q(1 \mid 0) = Q(0 \mid 1) = \epsilon$. In particular, two crossover probabilities are considered, $\epsilon = 0.05$ and $\epsilon = 0.1$. In the noiseless setting, by setting $\Lambda$ to a large value we obtain the working point of the COMP algorithm, characterized by a zero miss-detection probability. The impact of imperfect tests is remarkable already for a test accuracy of 95% ($\epsilon = 0.05$), where achieving a 98% detection success rate requires a false-alarm rate as high as 30%.

Figure 5 reports the ROC curves for the same conditions considered in the previous example, for the case where the $9 \times 84$ test matrix is given by the incidence matrix of an order-9, 3-uniform complete hypergraph (i.e., each column has Hamming weight 3 and the matrix $A$ is composed of the set of all possible weight-3 columns).
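The forward and backward recursions (3), (4) and the combination step (2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: it works on the complete trellis with all $2^m$ states, it assumes independent bit flips with crossover probability `eps` as the noise model (`eps = 0` gives the noiseless case), it returns the posterior $P(X_\ell = 1 \mid T = t)$ instead of the logarithmic ratio to avoid infinite values, and all function names are our own. A brute-force enumeration over all $2^n$ defectivity vectors is included only as a cross-check for small $n$.

```python
from itertools import product

def bsc_likelihood(t, s, eps):
    """Q(t | s) assuming independent bit flips with crossover probability eps."""
    q = 1.0
    for t_i, s_i in zip(t, s):
        q *= (1.0 - eps) if t_i == s_i else eps
    return q

def app_forward_backward(A, t, delta, eps):
    """Return the list of P(X_l = 1 | T = t) computed on the complete trellis."""
    m, n = len(A), len(A[0])
    n_states = 1 << m
    # Column l as a state index: bit i is set iff element l is in pool i (0-based).
    col = [sum(A[i][l] << i for i in range(m)) for l in range(n)]

    # Forward pass, recursion (3): alpha[l][s] = P(S_l = s).
    alpha = [[0.0] * n_states for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for l in range(n):
        for sp, a in enumerate(alpha[l]):
            if a > 0.0:
                alpha[l + 1][sp] += a * (1.0 - delta)       # 0-labeled edge
                alpha[l + 1][sp | col[l]] += a * delta      # 1-labeled edge

    # Backward pass, recursion (4): beta[l][s] = P(T = t | S_l = s).
    beta = [[0.0] * n_states for _ in range(n + 1)]
    for s in range(n_states):
        s_bits = [(s >> i) & 1 for i in range(m)]
        beta[n][s] = bsc_likelihood(t, s_bits, eps)
    for l in range(n - 1, -1, -1):
        for sp in range(n_states):
            beta[l][sp] = ((1.0 - delta) * beta[l + 1][sp]
                           + delta * beta[l + 1][sp | col[l]])

    # Combination step (2): sums over the 0- and 1-labeled edges of each section.
    posterior = []
    for l in range(n):
        p0 = sum(alpha[l][sp] * (1.0 - delta) * beta[l + 1][sp]
                 for sp in range(n_states))
        p1 = sum(alpha[l][sp] * delta * beta[l + 1][sp | col[l]]
                 for sp in range(n_states))
        posterior.append(p1 / (p0 + p1))   # assumes P(T = t) > 0
    return posterior

def app_brute_force(A, t, delta, eps):
    """Same posteriors by enumerating all 2^n defectivity vectors (small n only)."""
    m, n = len(A), len(A[0])
    joint1, total = [0.0] * n, 0.0
    for x in product((0, 1), repeat=n):
        s = [int(any(x[l] and A[i][l] for l in range(n))) for i in range(m)]
        w = sum(x)
        p = delta ** w * (1.0 - delta) ** (n - w) * bsc_likelihood(t, s, eps)
        total += p
        for l in range(n):
            if x[l]:
                joint1[l] += p
    return [j / total for j in joint1]
```

The logarithmic APP ratio (1) follows from a posterior `p` as `log((1 - p) / p)`, taken as infinite when `p` is zero. The trellis pass costs on the order of $n 2^m$ operations, against $2^n$ terms for the enumeration, which reflects the complexity argument made above for matrices with a moderate number of tests.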
An open question relates to the test matrix design criteria that, for given matrix dimensions, provide the best miss-detection vs. false-alarm probability trade-off under the forward-backward detection algorithm.

In this paper, we addressed the problem of deriving a posteriori probabilities of being defective for the members of a population in the non-adaptive, non-quantitative group testing framework, both in the noiseless and noisy settings. The approach relies on a trellis representation of the test constraints and it can be applied efficiently to testing matrices involving a moderate number of tests. The peculiarities of the technique, when applied to the noiseless setting, are discussed, emphasizing the implications for the complexity of the algorithm. Numerical results on the false positive probability vs. false negative probability trade-off are presented. The approach may also be applied to the schemes of [14], [15], where the algorithm can be employed at the level of the signature matrices. An open research direction is to find (classes of) test matrices capable of providing the best miss-detection vs. false-alarm probability trade-off under the forward-backward detection algorithm.

$$\begin{aligned} P(S_{\ell-1} = s', S_\ell = s, T = t) &\overset{(a)}{=} P(T = t \mid S_{\ell-1} = s', S_\ell = s)\, P(S_{\ell-1} = s', S_\ell = s) \\ &\overset{(b)}{=} P(T = t \mid S_\ell = s)\, P(S_{\ell-1} = s', S_\ell = s) \\ &\overset{(c)}{=} P(T = t \mid S_\ell = s)\, P(S_\ell = s \mid S_{\ell-1} = s')\, P(S_{\ell-1} = s') \end{aligned}$$

where (a) follows from Bayes' rule, and (b) is due to the fact that the final state depends on the state at depth $\ell - 1$ only through the state at depth $\ell$. Furthermore, (c) is obtained again by application of Bayes' rule. We introduce the shorthand

$$\alpha_{\ell-1}(s') := P(S_{\ell-1} = s'), \qquad \beta_\ell(s) := P(T = t \mid S_\ell = s), \qquad \gamma_\ell(s', s) := P(S_\ell = s \mid S_{\ell-1} = s').$$

Observe that

$$P(S_\ell = s \mid S_{\ell-1} = s') = \begin{cases} 1 - \delta & \text{if } (s', s) \in E_\ell(0) \\ \delta & \text{if } (s', s) \in E_\ell(1) \end{cases}$$
and that, by the total probability theorem,

$$\alpha_\ell(s) = P(S_\ell = s) = \sum_{s'} P(S_{\ell-1} = s')\, P(S_\ell = s \mid S_{\ell-1} = s') = \sum_{s'} \alpha_{\ell-1}(s')\, \gamma_\ell(s', s).$$

Similarly,

$$\begin{aligned} \beta_\ell(s') = P(T = t \mid S_\ell = s') &\overset{(a)}{=} \sum_{s} P(T = t, S_{\ell+1} = s \mid S_\ell = s') \\ &\overset{(b)}{=} \sum_{s} P(T = t \mid S_\ell = s', S_{\ell+1} = s)\, P(S_{\ell+1} = s \mid S_\ell = s') \\ &\overset{(c)}{=} \sum_{s} P(T = t \mid S_{\ell+1} = s)\, P(S_{\ell+1} = s \mid S_\ell = s') \\ &= \sum_{s} \beta_{\ell+1}(s)\, \gamma_{\ell+1}(s', s) \end{aligned}$$

where (a) is again due to the total probability theorem, (b) follows from Bayes' rule, and (c) follows by observing that the final state depends on the state at depth $\ell$ only through the state at depth $\ell + 1$.

[1] Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo
[2] Pooling RT-PCR or NGS samples has the potential to cost-effectively generate estimates of COVID-19 prevalence in resource limited environments
[3] The detection of defective members of large populations
[4] The mathematical strategy that could transform coronavirus testing
[5] Group testing: An information theory perspective
[6] Group testing to eliminate efficiently all defectives in a binomial sample
[7] Detection of SARS-CoV-2 in a plurality of biological samples
[8] Two-Stage Adaptive Pooling with RT-qPCR for COVID-19 Screening
[9] Note on noisy group testing: Asymptotic bounds and belief propagation reconstruction
[10] Semiquantitative group testing
[11] Bitwise MAP estimation for group testing based on holographic transformation
[12] Optimal decoding of linear codes for minimizing symbol error rate (Corresp.)
[13] Efficient maximum likelihood decoding of linear block codes using a trellis
[14] SAFFRON: a fast, efficient, and robust framework for group testing based on sparse-graph codes
[15] Non-adaptive quantitative group testing using irregular sparse graph codes
[16] Nonrandom binary superimposed codes