Null-free False Discovery Rate Control Using Decoy Permutations
Kun He, Meng-jie Li, Yan Fu, Fu-zhou Gong, Xiao-ming Sun
Acta Math Appl Sin, 2022-04-09. DOI: 10.1007/s10255-022-1077-5

Abstract: The traditional approaches to false discovery rate (FDR) control in multiple hypothesis testing are usually based on the null distribution of a test statistic. However, all types of null distributions, including the theoretical, permutation-based and empirical ones, have some inherent drawbacks. For example, the theoretical null might fail because of improper assumptions on the sample distribution. Here, we propose a null distribution-free approach to FDR control for multiple hypothesis testing in the case-control study. This approach, named the target-decoy procedure, simply builds on the ordering of tests by some statistic or score, the null distribution of which is not required to be known. Competitive decoy tests are constructed from permutations of the original samples and are used to estimate the false target discoveries. We prove that this approach controls the FDR when the score function is symmetric and the scores are independent between different tests. Simulation demonstrates that it is more stable and powerful than two popular traditional approaches, even in the presence of dependency. Evaluation is also made on two real datasets, including an Arabidopsis genomics dataset and a COVID-19 proteomics dataset.

Multiple testing has become increasingly popular in the present big-data era. For example, a typical scenario of applying multiple testing in biomedical studies is to look for differentially expressed genes/proteins, from thousands of candidates, between two groups (i.e., cases and controls) of samples [13, 16].
Currently, controlling the false discovery rate (FDR), defined as the expected proportion of incorrect rejections among all rejections [6], is the predominant way of performing multiple testing. FDR control procedures aim at selecting a subset of rejected hypotheses such that the FDR is no more than a given level. Because a p-value is typically computed from the null distribution of a test statistic in each single test, the canonical approaches to FDR control for multiple testing at present are based on the p-values of all tests, or at least on the null distribution of the test statistic. Here, we propose a new approach to FDR control in the case-control study, named the target-decoy procedure, which is free of both the null distribution and the null proportion. The procedure (simplified version) can be briefly described with the following steps. First, a target score and a number of decoy scores are calculated for each hypothesis test. These scores measure the (dis)similarity of the two groups of samples, and can be common statistics (e.g., the t-value) or other scoring functions. The target score is calculated from the original samples, while the decoy scores are calculated from random permutations of the original samples. Then, based on the target score and decoy scores, a label and a final score are assigned to each test in a competitive manner. If the target score is more significant than half of the decoy scores, the test is labelled as target and the final score is set as the target score. Otherwise, the test is labelled as decoy and the final score is set as the decoy score with a specific rank that is mapped symmetrically from the rank of the target score.
Next, the tests are sorted by their final scores in descending order (assuming larger scores are more significant), and for each test, the ratio (N_d + 1)/N_t is calculated, where N_d and N_t denote the numbers of decoy and target tests ranked above this test (inclusive), respectively. Finally, the lowest-ranked test whose ratio (N_d + 1)/N_t is below the given FDR control level is located, and the hypotheses of all target tests ranking no lower than this test are rejected. The addition of one (the +1 correction) to the number of decoy tests is essential to our approach. We prove that the target-decoy procedure with this correction rigorously controls the FDR when the score function is symmetric and the scores are independent between different tests. Our approach is based exclusively on the scores and labels of tests. The scoring function is not limited to traditional p-values or test statistics with known null distributions, but can take any form possessing a certain symmetry property. Therefore, our approach provides great flexibility and can potentially be more powerful than traditional approaches, whose performance largely relies on the precision of p-values or on the sample size of each test. Monte-Carlo simulations demonstrate that our approach effectively controls the FDR and is more powerful than two popular methods, the Bayes method [44]-[46] and the empirical Bayes method [16, 18, 19]. The performances of the three methods were also compared on two real biological datasets, including an Arabidopsis genomics dataset and a COVID-19 proteomics dataset. Because our procedure is more straightforward and can be used with arbitrary score functions, we believe that it will have many practical applications. Our approach was inspired by the widely used target-decoy database search approach to estimating the FDR of peptide identifications in tandem mass spectrometry-based proteomics [20].
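The threshold selection described above can be sketched as follows. `select_threshold` is a hypothetical helper name introduced for illustration; it takes the target/decoy labels of tests already sorted by final score (descending) and returns the largest rank at which the estimated FDR (N_d + 1)/(N_t ∨ 1) stays within the control level.

```python
def select_threshold(labels, alpha):
    """labels: 'T'/'D' labels of tests sorted by final score, descending.
    Returns the largest rank K with (N_d + 1) / max(N_t, 1) <= alpha."""
    n_t = n_d = 0
    best_k = 0
    for k, lab in enumerate(labels, start=1):
        if lab == 'T':
            n_t += 1
        else:
            n_d += 1
        if (n_d + 1) / max(n_t, 1) <= alpha:   # the +1 correction is essential
            best_k = k
    return best_k

# Four high-scoring targets followed by two decoys: K = 4 at alpha = 0.25,
# since (0 + 1)/4 = 0.25 at rank 4 and the ratio exceeds 0.25 afterwards.
print(select_threshold(['T', 'T', 'T', 'T', 'D', 'D'], 0.25))  # → 4
```

All hypotheses of target tests with rank no lower than the returned threshold would then be rejected.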
In this approach, tandem mass spectra of peptides are searched against a database consisting of equal numbers of target and decoy protein sequences. The peptide-spectrum matches (PSMs) are scored and filtered by some score threshold. The FDR of the selected PSMs is estimated by the ratio of the number of decoy matches to the number of target matches. Usually, the lowest score threshold is taken such that the estimated FDR is below a given level. Although this empirical target-decoy approach to FDR estimation has been very effective in practice, its theoretical foundation was not established until we proved that a +1 correction to the number of decoy matches leads to rigorous FDR control under the assumption of independence between PSMs [27]. Our work in the context of mass spectrometry was initially submitted to journals in 2013 (unpublished) and was made public in 2015 [28]. The extension to general multiple testing as presented here was first described in an earlier manuscript [29]. The work most related to ours is the knockoff filter method proposed by Barber and Candès [2], which aims to control the FDR of variables selected via Lasso regression for a Gaussian linear model. In this method, knockoff variables, which are not associated with the response (conditioning on the original variables), are constructed and subjected to competition with the original variables (covariates). The basic rationale of the knockoff filter in FDR control is identical to that of the target-decoy approach. First, knockoff is essentially synonymous with decoy in its role. Second, the method used by the knockoff filter to derive the rejection region, i.e., the FDR estimation formula with the +1 correction and the procedure for selecting the score threshold, is exactly the same as in the target-decoy approach.
Third, after the proof of equal probabilities of a null variable obtaining a positive score (target label) or a negative score (decoy label), the proof of FDR control is the same mathematical problem addressed by the knockoff filter and the target-decoy approach, although the proving techniques are different. The main contribution of the knockoff filter is its sophisticated knockoff construction method, which makes possible the proof of the aforementioned 'equal probabilities' for dependent variables. The knockoff filter allows the variables to be correlated with each other, but assumes Gaussian noise in the linear model. In comparison, our approach (this paper) achieves FDR control for independent variables only, but puts no assumptions on the distribution of the variables. In addition, the original knockoff filter method required the sample size n to be no less than the number of variables p for FDR control. Candès et al. [9] later re-framed the knockoff procedure and proposed the so-called model-X knockoffs method. Unlike the original linear model in which X_{i,j} was treated as fixed (the randomness coming from the Gaussian noise), the model-X knockoffs method treats X_{i,j} as random. It assumes knowledge of the joint distribution of the covariates, and constructs knockoffs probabilistically instead of geometrically. This removes the restriction on the sample size (n ≥ p) and makes the method applicable to both linear and non-linear models. Although the construction of model-X knockoffs does not in principle rely on a specific distributional form of the original variables, the Gaussian distribution is the only one that can be implemented at present. Another limitation of the knockoff method is the high computational cost of knockoff construction, which involves complex matrix computation, such as eigenvalue computation and semidefinite programming.
In the current knockoff methods, only one knockoff copy is constructed for each original variable, and the probability of a null variable or its knockoff copy being selected is equal (0.5). In our target-decoy procedure, multiple decoy permutations are constructed for each original variable, which offers the flexibility of setting different probabilities of producing target or decoy tests for true null hypotheses. This kind of multiple competition can enhance the power, as we illustrate experimentally. Recently, Emery et al. [22] investigated the multiple-competition problem in more depth. They presented two methods, namely the max method and the mirror method, for competition with multiple decoys/knockoffs. The max method is the most intuitive: it selects the variable (original or knockoff) with the highest score. Gimenez and Zou [26] also used the max method for multiple knockoffs. The mirror method is like what we do in our standard target-decoy procedure but is more flexible: it uses two adjustable rank cutoffs for target/decoy labelling, while we use only one adjustable cutoff for target labelling. Emery and Keich [23] also proposed methods to construct multiple knockoffs that offer both FDR control and enhanced power. In recent years, the approach of FDR control using competitive decoys/knockoffs has attracted much attention in the field of statistics [3, 4, 24, 25, 34, 36, 37, 40]. No doubt, much of this success is owed to the publication of the knockoff method by Candès et al. However, it should be noted that we first proposed the FDR estimation formula with the +1 correction, which is the key to FDR control, and gave the first proof of FDR control (in the context of mass spectrometry and under the independence assumption) [27, 28]. We also first introduced the multiple-competition strategy [29]. These contributions have been recognized by the community, e.g., [11, 12, 21, 22, 30, 35].
Thus, despite the similarity of our approach to the well-known knockoff method, which was published earlier [2], we would still like to introduce the target-decoy procedure to the community with our original notations and proofs [27]-[29]. As far as we know, the target-decoy approach to FDR control was first used and named in mass spectrometry-based proteomics as early as 2007 [20]; therefore, we use the terminology decoy instead of knockoff. Moreover, compared to the knockoff method, our approach has different technical arguments and different motivations, and is verified with different simulation experiments and real data here. Among other related works, Levitsky et al. [35] proposed an interpretation of the +1 correction based on the negative binomial distribution. However, this interpretation assumes that the number of null targets can be infinite with a uniform prior probability, and therefore is not a rigorous interpretation. Storey et al. [46] also had a +1 correction in their pFDR estimation to achieve FDR control. However, that correction was applied to the number of p-values greater than a fixed threshold λ, which amounts to the total number of decoys in our case. This is very different from the target-decoy/knockoff approach, in which the +1 correction is applied to the number of decoys/knockoffs in the rejection region. Organization. The rest of the paper is organized as follows. Section 2 describes our target-decoy approach to FDR control. Section 2.1 discusses a general scenario of the case-control study. The simplified and standard target-decoy procedures are presented in Sections 2.2 and 2.3, respectively. Section 2.4 provides an adaptive version of the target-decoy procedure. Section 2.5 establishes the theoretical foundation of our approach (proofs are given in the Supplementary Material). Numerical results on independent and dependent variables are given in Section 3. Applications to real data are shown in Section 4.
Section 5 concludes the paper and points out some directions worthy of further study.

Consider a two-group (case-control) study involving m random variables X_1, X_2, ..., X_m. For each random variable X_j, 1 ≤ j ≤ m, there are n random samples X_{j,1}, X_{j,2}, ..., X_{j,n}, in which X_{j,1}, ..., X_{j,n_1} are from the n_1 cases and X_{j,n_1+1}, ..., X_{j,n} are from the n_0 = n − n_1 controls. For simplicity, the numbers of random samples (and similarly, of cases and controls) are assumed to be the same for all random variables here, although our method does not rely on this assumption. The goal is to search for random variables distributed differently between cases and controls. The null hypothesis used here for random variable X_j is the exchangeable hypothesis H_{j0}: the joint distribution of X_{j,1}, X_{j,2}, ..., X_{j,n} is symmetric. In other words, the joint probability density function of X_{j,1}, ..., X_{j,n} (or the joint probability mass function if X_{j,1}, ..., X_{j,n} are discrete) satisfies

f_{X_{j,1},...,X_{j,n}}(x_{j,1}, ..., x_{j,n}) = f_{X_{j,1},...,X_{j,n}}(π_n(x_{j,1}, ..., x_{j,n}))

for any possible x_{j,1}, ..., x_{j,n} and any permutation π_n of x_{j,1}, ..., x_{j,n}. If X_{j,1}, ..., X_{j,n} are independent, this hypothesis is equivalent to X_{j,1}, ..., X_{j,n} being identically distributed. We use the exchangeable hypothesis to handle the case where X_{j,1}, ..., X_{j,n} are correlated but still form an exchangeable sequence of random variables [10]. Let S(x_1, x_2, ..., x_n) be a scoring function satisfying

S(x_1, ..., x_n) = S(π_{n_1}(x_1, ..., x_{n_1}), π_{n_0}(x_{n_1+1}, ..., x_n))

for any possible x_1, ..., x_n, any permutation π_{n_1}(·) of the first n_1 elements and any permutation π_{n_0}(·) of the last n_0 elements. Note that most scoring functions evaluating the difference between x_1, ..., x_{n_1} and x_{n_1+1}, ..., x_n have this symmetry property, including commonly used test statistics, e.g., the t-value as used in this paper.
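The symmetry property can be checked numerically. The sketch below uses an illustrative Welch-type score (any statistic invariant to within-group permutations would do) and verifies that permuting samples within the case block and within the control block leaves the score unchanged.

```python
from statistics import mean, stdev

def t_score(x, n1):
    """Welch-type t statistic comparing x[:n1] (cases) with x[n1:] (controls);
    an example of a scoring function with the symmetry property of Section 2.1."""
    a, b = x[:n1], x[n1:]
    return abs(mean(a) - mean(b)) / (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5

x = [1.2, 0.7, 2.5, 0.3, 0.9, 1.1]   # toy samples: n1 = 3 cases, then 3 controls
orig = t_score(x, 3)
perm = x[:3][::-1] + x[3:][::-1]     # permute within cases and within controls
assert abs(t_score(perm, 3) - orig) < 1e-12
```

Permutations that mix cases with controls, by contrast, generally change the score; the decoy scores defined below exploit exactly this.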
Without loss of generality, we assume that larger scores are more significant. Note that neither the null distributions of the scores nor the distributions of the random variables are required to be known. We first introduce the simplified version of our target-decoy procedure for FDR control. The intuition of the procedure is to let each random variable X_j be labelled as target or decoy with the same chance if the null hypothesis for X_j is true. At the same time, the chance of X_j being labelled as decoy is expected to be negligible if its null hypothesis is false (this assumption is not needed for FDR control). Thus, the number of target tests of true null hypotheses beyond a threshold can be approximated by the number of decoy ones.

Algorithm 2.1. The simplified target-decoy procedure.
1. For each 1 ≤ j ≤ m, calculate t scores, including a target score and t − 1 decoy scores. The target score is S^T_j = S(X_{j,1}, X_{j,2}, ..., X_{j,n}). Each decoy score is obtained by first sampling a permutation π_n of X_{j,1}, ..., X_{j,n} at random and then calculating the score S(π_n(X_{j,1}, ..., X_{j,n})). Sort these t scores in descending order. For equal scores, sort them randomly with equal probability.
2. For each test j, calculate a final score S_j and assign it a label L_j ∈ {T, D}, where T and D stand for target and decoy, respectively. Assume that the rank of S^T_j is i. If i < (t + 1)/2, let L_j be T and set S_j as S^T_j. If i > (t + 1)/2, let L_j be D and set S_j as the score ranking i − ⌈t/2⌉. Otherwise, i = (t + 1)/2; let L_j be T or D at random and set S_j as S^T_j.
3. Sort the m tests in descending order of the final scores. Let i_1, ..., i_m be such that S_{i_1} ≥ ... ≥ S_{i_m} (with ties broken at random), and let L_(1), ..., L_(m) be the corresponding labels L_{i_1}, ..., L_{i_m}, respectively.
4. If the specified FDR control level is α, let

K = max{ k : (#{j ≤ k : L_(j) = D} + 1) / (#{j ≤ k : L_(j) = T} ∨ 1) ≤ α }    (2.1)

(K = 0 if no such k exists), and reject the hypothesis with rank j if L_(j) = T and j ≤ K.
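Algorithm 2.1 can be sketched in code as follows. The score here is an illustrative Welch-type t statistic, and the tie handling is a simplification of the random tie-breaking of Steps 1 and 3; function names are introduced for illustration only.

```python
import math
import random
from statistics import mean, stdev

def t_score(x, n1):
    """Illustrative symmetric score: Welch-type t statistic (larger = more significant)."""
    a, b = x[:n1], x[n1:]
    return abs(mean(a) - mean(b)) / (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5

def simplified_target_decoy(data, n1, score, alpha, t=2, seed=0):
    """Returns the indices of rejected tests (Steps 1-4 of Algorithm 2.1)."""
    rng = random.Random(seed)
    results = []                                   # (final score S_j, label L_j, test index j)
    for j, row in enumerate(data):
        scores = [score(row, n1)]                  # index 0 holds the target score
        for _ in range(t - 1):                     # t - 1 random decoy permutations
            perm = row[:]
            rng.shuffle(perm)
            scores.append(score(perm, n1))
        order = sorted(range(t), key=lambda k: (-scores[k], rng.random()))
        i = order.index(0) + 1                     # rank of the target score, ties random
        if i < (t + 1) / 2:
            results.append((scores[0], 'T', j))
        elif i > (t + 1) / 2:                      # decoy: score with rank i - ceil(t/2)
            results.append((scores[order[i - math.ceil(t / 2) - 1]], 'D', j))
        else:                                      # i == (t+1)/2, only possible for odd t
            results.append((scores[0], rng.choice('TD'), j))
    results.sort(key=lambda p: (-p[0], rng.random()))
    n_t = n_d = big_k = 0
    for k, (_, lab, _) in enumerate(results, 1):   # estimated FDR = (N_d + 1)/(N_t ∨ 1)
        n_t += lab == 'T'
        n_d += lab == 'D'
        if (n_d + 1) / max(n_t, 1) <= alpha:
            big_k = k
    return sorted(j for k, (_, lab, j) in enumerate(results, 1) if lab == 'T' and k <= big_k)

# Usage: 10 strongly shifted rows (tests 0-9) and 10 null rows (tests 10-19)
rng = random.Random(7)
data = [[100 + rng.random() for _ in range(5)] + [rng.random() for _ in range(5)]
        for _ in range(10)]
data += [[rng.random() for _ in range(10)] for _ in range(10)]
rejected = simplified_target_decoy(data, n1=5, score=t_score, alpha=0.25)
```

With such a strong shift, essentially all of tests 0-9 are labelled as target and rank at the top, so the estimated FDR at rank 10 is (0 + 1)/10 = 0.1 ≤ 0.25 and they are rejected.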
Note that there is a +1 correction to the number of decoy tests in the numerator of equation (2.1), which is key to FDR control as shown in Section 2.5. This correction was first proposed in the context of proteomics for target-decoy based FDR control of peptide identifications [27, 28]. An example of the simplified target-decoy procedure is shown in Figure 2.1. In it, m = n = 6 and t = 2. The first three columns of the data are from cases and the other three columns are from controls. The scoring function S(·) satisfies the symmetry property defined in Section 2.1, but its null distribution is unknown. For each row, a target score S^T_j is first calculated from the original samples. Then, since t − 1 = 1, the procedure performs one permutation π_6 and calculates one decoy score S(π_6(X_{j,1}, ..., X_{j,6})). If S^T_j > S(π_6(X_{j,1}, ..., X_{j,6})), the final score S_j is set as S^T_j and L_j is set as T. Otherwise, if S^T_j < S(π_6(X_{j,1}, ..., X_{j,6})), S_j is set as S(π_6(X_{j,1}, ..., X_{j,6})) and L_j is set as D (Figure 2.1(c)). The 6 tests are sorted in descending order of S_j to derive i_j, S_{i_j} and L_{i_j} (i.e., L_(j)). For example, i_1 is 4 because S_4 is the maximum of all the final scores. Then, with L_(1), ..., L_(6), we can calculate (#{j ≤ k : L_(j) = D} + 1)/(#{j ≤ k : L_(j) = T} ∨ 1) for each rank k. If α is set as 0.25, we reject the first four hypotheses, since this ratio equals (0 + 1)/4 = 0.25 at k = 4 and is larger than 0.25 for any k > 4 (Figure 2.1(d)). For comparison, the Bayes method [46] is also applied to this example. In our setting, λ is set as 1/2. Meanwhile, the p-values are calculated with the Wilcoxon rank sum test since the null distributions of the random variables are unknown. If α is set as 0.25, no hypothesis is rejected by the Bayes method (Figure 2.1(e)). Note that this toy example is for illustration purposes only. In most real-world applications, the Bayes method works well and the choice of α = 0.25 is too loose.
The random permutation used in the procedure can be generated by simple random sampling either with or without replacement, just as in permutation tests. Similarly, with a larger sampling number t − 1, the power of our approach becomes slightly stronger, as shown in Section 3. We can set t as min{ , τ }, where τ is the maximum number of permutations we would perform. Unlike other FDR control methods, our approach does not depend on the null distribution, and the number of permutations, t − 1, can be much smaller than that used in permutation tests. In our simulations, t − 1 was set as 49 or 1, while in the real-data experiments, it was set as 19. Simulations demonstrate that the target-decoy approach can still control the FDR even if t − 1 is set as 1, in which case little information about the null distribution is revealed. The +1 in the numerator of equation (2.1) is essential to accomplishing FDR control. However, it has a side effect of reducing the power, and this effect can be amplified under some conditions, e.g., when the number of false null hypotheses or the total number of hypotheses is small. To enhance the power, we introduce a parameter r (> 1) into the procedure. The intuition is to let each random variable X_j be labelled as target with probability 1/(2r) and as decoy with probability 1/2 if the null hypothesis for X_j is true. Thus, the number of decoy tests beyond a threshold is about r times the number of target tests of true null hypotheses, and then 1/r times the ratio of the number of decoy tests (plus one) to the number of target tests beyond a threshold can be used for FDR control. For any fixed 1 ≤ r ≤ (n choose n_0), the standard target-decoy procedure (we will omit the word "standard" below for simplicity) is as follows.

Algorithm 2.2. The target-decoy procedure.
1. For each 1 ≤ j ≤ m, calculate t scores (a target score S^T_j and t − 1 decoy scores) and sort them in descending order, as in Algorithm 2.1.
2. For each 1 ≤ j ≤ m, let Λ_j = i − P_j, where i is the rank of S^T_j among the t scores and P_j is a random draw from the uniform[0, 1) distribution.
Calculate a final score S_j and assign a label L_j ∈ {T, D, U}, where T, D and U stand for target, decoy and unused, respectively. If Λ_j ≤ t/(2r), let L_j be T and S_j be S^T_j. If t/2 < Λ_j ≤ t, let Λ'_j = (Λ_j − t/2)/r, let L_j be D and S_j be the score ranking ⌈Λ'_j⌉-th. Otherwise, let L_j be U and S_j be −∞.
3. Sort the m tests in descending order of the final scores, deriving the labels L_(1), ..., L_(m) as in Algorithm 2.1.
4. If the specified FDR control level is α, let

K = max{ k : (#{j ≤ k : L_(j) = D} + 1) / (r(#{j ≤ k : L_(j) = T} ∨ 1)) ≤ α }    (2.2)

(K = 0 if no such k exists), and reject the hypothesis with rank j if L_(j) = T and j ≤ K.

In Step 2, we introduce P_j so that Pr(L_j = T) = 1/(2r) and Pr(L_j = D) = 1/2 if the null hypothesis for random variable X_j is true. If Λ_j ≤ t/(2r), X_j is labelled as target and S_j is set as S^T_j. If t/2 < Λ_j ≤ t, X_j is labelled as decoy, and S_j is a random score sampled from the largest ⌈t/(2r)⌉ scores. Otherwise, t/(2r) < Λ_j ≤ t/2, X_j is labelled as unused, and S_j is set as −∞ so that it falls at the end of the queue after Step 3. Section 2.5 will show that the above target-decoy procedure controls the FDR for any fixed r. In practice, one can set the value of r empirically or simply set r = 1, which reduces the target-decoy procedure to its simplified version described in Section 2.2. Alternatively, an algorithm can be used to choose r adaptively for a given dataset, as discussed in Section 2.4. The parameter r adjusts the probability that a true null hypothesis is labelled as T. On the one hand, equation (2.2) can be too conservative for a small r (e.g., r = 1 as in the simplified target-decoy procedure), because of the addition of 1 in the numerator, if there are only a few false null hypotheses. For example, assume that the total number of tests is 80 and the FDR control level is 0.01. If r is set as 1, no hypothesis will be rejected, because the numerator of equation (2.2) is always at least 1 and hence the fraction is at least 1/80 > 0.01. On the other hand, if r is too large, many false null hypotheses will be labelled as U or D, potentially decreasing the power of testing. Thus, r should be set appropriately in practice to enhance the power.
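The Step-2 labelling rule can be sketched as follows. The decoy rank mapping Λ'_j = (Λ_j − t/2)/r used here is our illustrative reading of the symmetric mapping from decoy positions onto the top ⌈t/(2r)⌉ ranks, and the helper name is hypothetical.

```python
import math
import random

def label_test(i, t, r, u):
    """Step-2 labelling (sketch): i is the rank of the target score among the t
    scores; u is a Uniform[0,1) draw, so lam = i - u is Uniform(0, t] under the
    null.  The decoy mapping lam' = (lam - t/2)/r is an illustrative assumption:
    it sends (t/2, t] onto (0, t/(2r)], i.e. onto the top ceil(t/(2r)) ranks."""
    lam = i - u
    if lam <= t / (2 * r):
        return 'T', i                               # target keeps the target score
    if lam > t / 2:
        return 'D', math.ceil((lam - t / 2) / r)    # decoy takes a top-ranked score
    return 'U', None                                # unused, S_j = -infinity

# Under a true null, the target rank i is uniform on {1, ..., t}, hence
# Pr(T) = 1/(2r) and Pr(D) = 1/2:
rng = random.Random(1)
counts = {'T': 0, 'D': 0, 'U': 0}
for _ in range(20000):
    lab, _ = label_test(rng.randrange(1, 101), t=100, r=5, u=rng.random())
    counts[lab] += 1
```

With t = 100 and r = 5, the empirical frequencies come out near 1/(2r) = 0.1 for T and 1/2 for D, matching the intended labelling probabilities.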
Below, we provide an adaptive procedure to choose a suitable r for the given dataset and FDR control level. The intuition of the adaptive procedure is to split the data into two parts, with one part used to choose r and the other used for inference.

Algorithm 2.3. The adaptive target-decoy procedure.
1. Divide the samples of each random variable into two parts as follows. Choose a suitable n_2 smaller than n_0 and n_1 from some range, say 5 ≤ n_2 ≤ min{⌊n_0/2⌋, ⌊n_1/2⌋}. For each random variable X_j, 1 ≤ j ≤ m, randomly choose n_2 samples from X_{j,1}, ..., X_{j,n_1} and n_2 samples from X_{j,n_1+1}, ..., X_{j,n}. Let X^1_{j,1}, X^1_{j,2}, ..., X^1_{j,2n_2} be these samples. The rest consists of n_1 − n_2 samples from the cases and n_0 − n_2 samples from the controls; let X^2_{j,1}, X^2_{j,2}, ..., X^2_{j,n−2n_2} be these remaining samples.
2. Perform the target-decoy procedure on X^1_{j,1}, ..., X^1_{j,2n_2}, 1 ≤ j ≤ m, for each r in some range, say R = {1, 2, 5, 10, 15, 20, 25}. Let r_max be the value for which the most hypotheses are rejected by the target-decoy procedure.
3. Perform the target-decoy procedure on X^2_{j,1}, ..., X^2_{j,n−2n_2}, 1 ≤ j ≤ m, with r = r_max and reject the corresponding hypotheses.

In this section, we show that the target-decoy procedure controls the FDR. Let H_j = 0 and H_j = 1 denote that the null hypothesis for test j is true and false, respectively. Note that H_1, H_2, ..., H_m are constants in the setting of hypothesis testing. Define Z_j for 1 ≤ j ≤ m as follows: Z_j = 1 if L_j = T and H_j = 0; Z_j = 0 if L_j = T and H_j = 1; Z_j = −1 if L_j = D; and Z_j = −2 if L_j = U. Let S_(1), S_(2), ..., S_(m) denote the sorted scores and Z_(1), Z_(2), ..., Z_(m) denote the correspondingly sorted sequence of Z_1, Z_2, ..., Z_m. We write S_(·) for the sorted score vector (S_(1), ..., S_(m)) and S_(≠j) for (S_(1), ..., S_(j−1), S_(j+1), ..., S_(m)).
We define s_(·) and s_(≠j) similarly for observed values; for example, s_(·) denotes a sequence of m constants, s_(1), ..., s_(m), which is one of the observed values of S_(·). We also write z_(≠j), etc., for the corresponding sequences of Z. Then we have the following three theorems.

Theorem 2.1. In the simplified target-decoy procedure, if the m random variables are independent, then for any fixed 1 ≤ j ≤ m and any possible s_(·) and z_(≠j), we have

Pr(Z_(j) = 1 | S_(·) = s_(·), Z_(≠j) = z_(≠j)) ≤ Pr(Z_(j) = −1 | S_(·) = s_(·), Z_(≠j) = z_(≠j)).

Theorem 2.2. In the target-decoy procedure, if the m random variables are independent, then for any fixed 1 ≤ j ≤ m and any possible s_(·) and z_(≠j), we have

r · Pr(Z_(j) = 1 | S_(·) = s_(·), Z_(≠j) = z_(≠j)) ≤ Pr(Z_(j) = −1 | S_(·) = s_(·), Z_(≠j) = z_(≠j)).

Theorem 2.3. Suppose that S_(1), S_(2), ..., S_(m), Z_(1), Z_(2), ..., Z_(m) are random variables satisfying S_(1) ≥ S_(2) ≥ ... ≥ S_(m) and Z_(1), ..., Z_(m) ∈ {−2, −1, 0, 1}, and r is a positive constant. For any α ∈ (0, 1], define

K = max{ k : (#{j ≤ k : Z_(j) = −1} + 1) / (r(#{j ≤ k : Z_(j) ∈ {0, 1}} ∨ 1)) ≤ α }.

If there is no such k, let K = 0. If for any fixed j and any possible s_(·) and z_(≠j),

r · Pr(Z_(j) = 1 | S_(·) = s_(·), Z_(≠j) = z_(≠j)) ≤ Pr(Z_(j) = −1 | S_(·) = s_(·), Z_(≠j) = z_(≠j)),

then E[ #{j ≤ K : Z_(j) = 1} / (#{j ≤ K : Z_(j) ∈ {0, 1}} ∨ 1) ] ≤ α.

The proofs of these theorems are given in the Supplementary Materials. Theorem 2.3 indicates that the target-decoy procedure controls the FDR if the m random variables are independent. In particular, all of the above theorems hold for the adaptive target-decoy procedure. Recall that the null hypothesis used here for random variable X_j is the exchangeable hypothesis H_{j0}: the joint probability density function of X_{j,1}, ..., X_{j,n} satisfies f_{X_{j,1},...,X_{j,n}}(x_{j,1}, ..., x_{j,n}) = f_{X_{j,1},...,X_{j,n}}(π_n(x_{j,1}, ..., x_{j,n})) for any possible x_{j,1}, ..., x_{j,n} and any permutation π_n. If H_{j0} is true, it is easy to see that X^2_{j,1}, X^2_{j,2}, ..., X^2_{j,n−2n_2} are also exchangeable.

We used Monte-Carlo simulations to study the performance of our approach. The target-decoy procedure was compared with two popular traditional multiple testing methods, the Bayes method [44]-[46] and the empirical Bayes method [16, 18, 19].
Simulations were conducted for both independent and dependent random variables. We mainly evaluated the performance of the simplified target-decoy procedure. To show the effectiveness of adjusting r, we also ran a simulation on a small dataset and compared the adaptive target-decoy procedure with the simplified one. In the simulations, we considered case-control studies in which the random variables follow the normal distribution or the gamma distribution. In addition to the normal distribution, we ran experiments with the gamma distribution because many real-world random variables are gamma-distributed. Recall that the case-control study consists of m random variables. For each random variable, there are n random samples, n_1 of which are from the cases and the other n_0 = n − n_1 from the controls. Let X_{j,1}, X_{j,2}, ..., X_{j,n} be the n random samples for random variable X_j. The observations from the normal distribution were generated in a way similar to [7]. First, let ζ_0, ζ_{1,1}, ..., ζ_{1,n}, ..., ζ_{m,1}, ..., ζ_{m,n} be independent and identically distributed N(0, 1) random variables. Next, let X_{j,i} = √ρ · ζ_0 + √(1 − ρ) · ζ_{j,i} + µ_{j,i} for j = 1, ..., m and i = 1, ..., n. We used ρ = 0, 0.4 and 0.8, with ρ = 0 corresponding to independence, and ρ = 0.4 and 0.8 corresponding to typical moderate and high correlation values estimated from real microarray data, respectively [1]. The values of µ_{j,i} are zero for i = n_1+1, ..., n, the n_0 controls. For the n_1 cases, i = 1, ..., n_1, the values of µ_{j,i} are also zero for j = 1, ..., m_0, the m_0 hypotheses that are true null. The values of µ_{j,i} for i = 1, ..., n_1 and j = m_0+1, ..., m are set as follows. We let µ_{j,i} = 1, 2, 3 and 4 for j = m_0+1, m_0+2, m_0+3 and m_0+4, respectively. Similarly, we let µ_{j,i} = 1, 2, 3 and 4 for j = m_0+5, m_0+6, m_0+7 and m_0+8, respectively.
This cycle was repeated to produce µ_{j,i} for all the false null hypotheses. The observations from the gamma distribution, which is parameterized by shape and scale, were generated in the following way. First, let Γ_0, Γ_{1,1}, ..., Γ_{1,n}, ..., Γ_{m,1}, ..., Γ_{m,n} be independent random variables, where Γ_0 follows the Γ(k_0, 1) distribution and Γ_{j,i} follows the Γ(k_{j,i}, 1) distribution for any j = 1, ..., m and i = 1, ..., n. Next, let X_{j,i} = Γ_{j,i} for j = 1, ..., m and i = 1, ..., n in the simulation study for independent random variables, and let X_{j,i} = Γ_0 + Γ_{j,i} for dependent random variables. To obtain reasonable correlation values, k_0 was set as 4, and k_{j,i} was set as 1 for i = n_1+1, ..., n, the n_0 controls. For the n_1 cases, i = 1, ..., n_1, k_{j,i} was set as 1 for j = 1, ..., m_0, the m_0 hypotheses that are true null. The values of k_{j,i} for i = 1, ..., n_1 and j = m_0+1, ..., m are set as follows. We let k_{j,i} = 2, 3, 4 and 5 for j = m_0+1, m_0+2, m_0+3 and m_0+4, respectively. Similarly, we let k_{j,i} = 2, 3, 4 and 5 for j = m_0+5, m_0+6, m_0+7 and m_0+8, respectively. This cycle was repeated to produce k_{j,i} for all the false null hypotheses. The specified FDR control level α was set as 5% or 10%. The total number of tests, m, was set as 10000. The proportion of false null hypotheses was 1% or 10%. The total sample size, n, was set as 20, with equal numbers of cases and controls. Three approaches to FDR control were compared: the Bayes method [44]-[46], the empirical Bayes method [16, 18, 19] and our target-decoy approach. The Bayes method and the empirical Bayes method are among the most remarkable multiple testing methods. To compare the power of these methods, we rejected the hypotheses against the specified FDR control level α.
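The normal data-generation scheme above can be sketched as follows; `simulate_normal` is a hypothetical helper, and the 0-based indexing of the false null block is an implementation convenience, not the paper's notation.

```python
import random
from statistics import mean

def simulate_normal(m, n, n1, m0, rho, rng):
    """Sketch of the normal simulation of Section 3: a single shared factor z0
    gives every pair of observations correlation rho; case means of the false
    null rows (j >= m0, 0-based here) cycle through 1, 2, 3, 4."""
    z0 = rng.gauss(0, 1)
    data = []
    for j in range(m):
        row = []
        for i in range(n):
            mu = ((j - m0) % 4) + 1 if (j >= m0 and i < n1) else 0.0
            row.append(rho ** 0.5 * z0 + (1 - rho) ** 0.5 * rng.gauss(0, 1) + mu)
        data.append(row)
    return data

rng = random.Random(0)
data = simulate_normal(m=100, n=20, n1=10, m0=90, rho=0.4, rng=rng)
case_shift = mean(data[93][:10]) - mean(data[93][10:])   # mu is 4 for row 93
```

The shared factor z0 cancels when comparing cases against controls of the same variable, so the case-control mean difference of a false null row concentrates around its µ value.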
The rejection threshold, s, for the Bayes method was set as the largest p-value such that q-value(s) is no more than α [44, 45]. The rejection threshold, s, for the empirical Bayes method was set as the minimum z-value such that Efdr(s) is no more than α, where Efdr(s) is the expected fdr (local false discovery rate) of hypotheses with z-values no smaller than s [14, 15]. Specifically, the R packages "locfdr" version 1.1-8 [14] and "qvalue" version 2.4.2 [47] were used. Each simulation experiment was repeated 1000 times. We calculated the mean number of rejected hypotheses to evaluate the power of each method. The realized FDR of rejected hypotheses was calculated as the mean of the observed false discovery proportions (FDPs) over all repetitions. Note that the variance of the mean FDP of 1000 repetitions is one thousandth of the variance of the FDPs. We also estimated the standard deviation of the mean FDP from the sample standard deviation of the FDPs. The p-values of the Bayes method and the z-values of the empirical Bayes method were calculated with the Student's t-test, the Wilcoxon rank sum test or the Student's t-test with permutation. For the Student's t-test, we used Welch's t-test, a two-sample unequal-variances t-test, defined as

t = (X̄_1 − X̄_2) / √(s_1²/N_1 + s_2²/N_2).

Here, X̄_i, s_i and N_i are the i-th group's sample mean, sample standard deviation and sample size, respectively. For the Student's t-test with permutation, we permuted the cases and the controls of each test, calculated the z-values for the permuted data by t-test, and calculated the p-values from the null distribution of pooled z-values [38, 52]. The number of sampled permutations was set as 10 [17]. For our target-decoy approach, the cases and the controls of each test were permuted 49 times or only once, and the t-values and the test statistics of the Wilcoxon rank sum test were used as scores.
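The pooled-permutation p-value computation used by the comparator methods can be sketched as follows. The statistic `mean_diff` and the helper names are illustrative, and the (1 + count)/(n + 1) convention for avoiding zero p-values is an assumption of this sketch, not necessarily the cited implementation.

```python
import bisect
import random
from statistics import mean

def mean_diff(x, n1):
    """A simple symmetric statistic for illustration: |mean(cases) - mean(controls)|."""
    return abs(mean(x[:n1]) - mean(x[n1:]))

def pooled_perm_pvalues(data, n1, stat, b, rng):
    """Pooled-permutation p-values (sketch): permute each test b times, pool all
    permuted statistics into one empirical null, and convert each observed
    statistic into a p-value against that pool."""
    observed = [stat(row, n1) for row in data]
    pool = []
    for row in data:
        for _ in range(b):
            perm = row[:]
            rng.shuffle(perm)
            pool.append(stat(perm, n1))
    pool.sort()
    n = len(pool)
    # p-value: (1 + #pooled statistics >= observed) / (n + 1)
    return [(n - bisect.bisect_left(pool, obs) + 1) / (n + 1) for obs in observed]

# One strongly shifted test (index 0) among nine null tests
rng = random.Random(3)
rows = [[100 + rng.random() for _ in range(5)] + [rng.random() for _ in range(5)]]
rows += [[rng.random() for _ in range(10)] for _ in range(9)]
pvals = pooled_perm_pvalues(rows, 5, mean_diff, b=50, rng=rng)
```

Pooling permuted statistics across tests is what allows a small per-test permutation number (here b = 50) to still yield fine-grained p-values; the target-decoy approach avoids this step entirely.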
We conducted the one-permutation experiments, in which little information about the null distributions is revealed, to demonstrate that our approach does not rely on the null distribution. Because permutation is performed inherently in our target-decoy approach, no extra permutation is necessary. We will use abbreviations to represent the experiments. For example, Bayes, permutation, Normal, 10%, ρ = 0.8 represents the simulation experiment where the Bayes method combined with the pooled permutation is used, the random variables follow the normal distribution, the proportion of false null hypotheses is 10% and the correlation values are 0.8. For our target-decoy approach, t-value, 49, Gamma, 1% represents the simulation experiment where the t-value is used as the score, 49 permutations are performed for each test, the random variables follow the gamma distribution and the proportion of false null hypotheses is as low as 1%. Figure 3.1 shows the realized FDRs of different methods with independent random variables when the specified FDR control level α was no more than 10%. Table 3.1 gives the realized FDRs when the specified FDR control level α was 5% or 10%. In all cases, the target-decoy approach controlled the FDR, and the realized FDRs were favourably close to α. The empirical Bayes and Bayes methods performed well when the random variables followed the normal distribution. However, they considerably overestimated the FDRs with the t-test for the gamma distribution. With the pooled permutation, some of the realized FDRs exceeded α for the gamma distribution, as marked by asterisks in the table. Of course, small exceedances are not necessarily evidence of a failure of FDR control but may be due to Monte Carlo error. Finally, the Wilcoxon rank-sum test coupled with Bayes or empirical Bayes occasionally overestimated the FDRs. Table 3.2 shows the statistical powers of different methods with independent random variables.
When the random variables followed the normal distribution, the powers of the three methods were overall comparable with each other. In the case of the gamma distribution, the target-decoy approach was much more powerful than Bayes and empirical Bayes when the t-test was used. Permutation-based Bayes and empirical Bayes had higher power, but at the cost of uncontrolled FDR. When the Wilcoxon rank-sum test was used, our approach was more powerful than the other two methods except in the single case of Gamma, 1% with α = 0.05. In all the above experiments, the target-decoy approach successfully controlled the FDR while remaining remarkably powerful. Notably, the results obtained with 49 permutations or 1 permutation in the target-decoy approach were quite similar, indicating that the proposed approach is not sensitive to the number of permutations. Table 3.4 shows the statistical power of different methods with dependent random variables. When the random variables followed the normal distribution, the Bayes method was less powerful than the target-decoy approach when the Wilcoxon rank-sum test was used. Though the Bayes method seems to be a little more powerful than the target-decoy approach when the t-test was used, the realized FDR of this method exceeded the specified FDR control level. The empirical Bayes method was less powerful than the Bayes method and our target-decoy approach in the Normal, 10%, ρ = 0.4 experiments. When the random variables followed the gamma distribution, the target-decoy approach was much more powerful than the Bayes and empirical Bayes methods, even if only one permutation was performed. Though the pooled permutation seems to be powerful, the FDRs were not controlled. Similar to the results for independent random variables, the target-decoy approach performed significantly better than the other methods for dependent random variables. It controlled the FDR in all cases without loss of statistical power.
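The competition underlying the one-permutation experiments can be sketched as follows. This is our own illustrative sketch, not the published R package: the helper name and the (decoys + 1)/targets rejection rule are common target-decoy conventions and are assumptions here; the paper's exact Step 2 may differ.

```python
import random

def target_decoy_discoveries(samples, n_cases, score, alpha=0.05, seed=0):
    """Simplified target-decoy competition with a single decoy per test.

    For each test, a decoy score is computed on one random permutation of
    its samples.  The test is labelled 'target' if its target score beats
    the decoy, and its final score is the larger of the two.  Hypotheses
    are then rejected at the most permissive cutoff where the estimate
    (#decoys + 1) / #targets above the cutoff is at most alpha.
    """
    rng = random.Random(seed)
    labelled = []
    for j, obs in enumerate(samples):
        t_score = score(obs[:n_cases], obs[n_cases:])
        perm = list(obs)
        rng.shuffle(perm)                      # one permutation per test
        d_score = score(perm[:n_cases], perm[n_cases:])
        if t_score > d_score:
            labelled.append((t_score, "target", j))
        else:
            labelled.append((d_score, "decoy", j))
    labelled.sort(key=lambda x: x[0], reverse=True)   # most significant first
    decoys = targets = 0
    accepted, best = [], []
    for s, lab, j in labelled:
        if lab == "decoy":
            decoys += 1
        else:
            targets += 1
            accepted.append(j)
        if targets > 0 and (decoys + 1) / targets <= alpha:
            best = list(accepted)              # remember the largest valid set
    return best                                # indices of rejected hypotheses
```

Here `score` can be any symmetric statistic, e.g. an absolute t-value; its null distribution is never used.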
To show the effectiveness of the adaptive target-decoy procedure for small datasets, a case-control study involving 200 random variables was simulated. The null hypotheses of 20 random variables were true and the others were false. For each random variable, there were 20 random samples, 10 of which were from the cases and the other 10 from the controls. The observation values from the cases where the null hypotheses were false followed the N(4, 1) distribution, and all the other observation values followed the N(0, 1) distribution. All the observation values were independent. In the simulation, the cases and the controls of each test were permuted 49 times and the t-values were used. As shown in Table 3.5, the adaptive procedure controlled the FDR for all values of α, and its power was much larger than that of the simplified target-decoy procedure for small α. We applied the target-decoy approach to two real biological datasets, including an Arabidopsis genomics dataset and a COVID-19 proteomics dataset. Similar to the simulation experiments, the Bayes method, the empirical Bayes method and our target-decoy approach (the simplified procedure) are compared here. The p-values in the Bayes method and the z-values in the empirical Bayes method were calculated with the Student's t-test, the Wilcoxon rank sum test, and the Student's t-test with permutation. For the Bayes method, two-tailed tests were used. For the empirical Bayes method, we first transformed the FDR control level to the threshold of local fdr and then identified differentially expressed genes/proteins according to the threshold. For the target-decoy approach, the absolute t-values and the test statistics of the Wilcoxon rank sum test were used. To determine whether Arabidopsis genes respond to oncogenes encoded by the transfer-DNA (T-DNA) or to bacterial effector proteins codelivered by Agrobacteria into the plant cells, Lee et al.
[33] conducted microarray experiments at 3 h and 6 d after inoculating wounded young Arabidopsis plants with two different Agrobacterium strains, C58 and GV3101. Strain GV3101 is a cognate of strain C58 that lacks only the T-DNA but still possesses proteinaceous virulence (Vir) factors such as VirD2, VirE2, VirE3 and VirF [51]. Wounded but uninfected stalks served as controls. Here we use only the 6-d postinoculation data as an example (downloaded from http://www.ncbi.nlm.nih.gov/geo/, GEO accession: GSE14106). The data, consisting of 22810 genes, were obtained from the C58-infected and control stalks. Both the infected and control stalks had three replicates. Because it is unknown which genes were really differentially expressed, the realized FDRs cannot be computed here. Instead, the power of these methods is compared. For fairness, the sampling numbers were set as 19 = C(6, 3) − 1 in all the experiments, including the pooled permutation and the target-decoy approach. That is, all possible permutations were generated for each gene. As shown in Table 4.1, no differentially expressed genes were found by the empirical Bayes method or the Wilcoxon rank-sum test. For the Bayes method, the t-test was more powerful than the pooled permutation for small α (≤ 0.05), while the pooled permutation was more powerful for large α (≥ 0.06). The target-decoy approach with the t-test was most powerful for 0.04 ≤ α ≤ 0.09. The additional genes identified by the target-decoy approach are reliable, because similar numbers of genes, i.e., 785 genes for FDR 0.034, 1427 genes for FDR 0.050 and 2071 genes for FDR 0.065, were reported by a more specific analysis [49]. In a study to discover differentially expressed proteins that correlate with the COVID-19 disease, the sera of 118 subjects were sampled, including 28 severe COVID-19 patients, 37 nonsevere COVID-19 patients, 25 non-COVID-19 patients and 28 healthy subjects [43].
The proteins from these serum samples were analyzed with tandem mass spectrometry. From the resulting mass spectra, 791 proteins were successfully identified and quantified, and were subjected to subsequent statistical analysis. To find differentially expressed proteins specific to the COVID-19 patients, the healthy subjects first served as the control group and were compared with each of the other three groups. The FDR control level of 0.05 was used. The numbers of differentially expressed proteins found by the three FDR methods (Bayes, empirical Bayes and target-decoy) are listed in columns 2 to 4 of Table 4.2. Then, the proteins found in the severe or nonsevere COVID-19 patients but not in the non-COVID-19 patients were regarded as the final set of differentially expressed proteins related to the COVID-19 disease. Their numbers are listed in column 5 of Table 4.2. As shown, the target-decoy method using the t-test reported 132 proteins, more than the other methods. In the original study [43], 105 COVID-19-related proteins were reported with the FDR controlled at 0.05 using the BH method [6]. Here, we compared the proteins found by the three methods with these 105 proteins. The numbers of consistent proteins are listed in column 6 of Table 4.2. Higher numbers probably indicate higher sensitivity and precision. It can be seen that both the target-decoy method and the Bayes method found 104 of the 105 originally reported proteins when the t-test was used. Moreover, the target-decoy method reported 28 additional proteins, which could also be COVID-19 related. In this paper, we presented the target-decoy approach to FDR control for multiple hypothesis testing. This approach is free of estimating the null distribution or the null proportion, and rigorously controls the FDR for independent variables. Simulation studies demonstrated its ability to control the FDR and its higher power compared with two representative traditional methods.
The higher power of the approach was also illustrated by two applications to real biomedical data. In the target-decoy approach, the scores are only used to determine the labels and ranks of tests, and the statistical meaning of the scores is not required. Therefore, any test statistic can be used, regardless of whether or not its null distribution is known. This flexibility brings the potential to increase the power of multiple testing. In this paper, we only used the t-value and the test statistic of the Wilcoxon rank sum test for a fair comparison with the traditional FDR control methods. In the simulation study, the t-value was more powerful than the statistic of the Wilcoxon rank sum test for the normal distribution and less powerful for the gamma distribution. In the applications to real data, no differentially expressed genes or proteins were found by the Wilcoxon rank-sum test, but the t-value performed well. Overall, the t-value is a good choice for the target-decoy procedure. Trying other statistics or engineering specific scoring functions for different types of data is a topic worthy of future research. For example, machine learning-derived feature importance scores can in principle be directly used in our approach. The adaptive target-decoy procedure chooses an r by data splitting and thus reduces the size of the data used for inference. As shown in Section 3.4, the impact of this size reduction is insignificant if the sample size of each variable is moderate. In this case, the adaptive target-decoy procedure can be much more powerful than the simplified procedure. However, if the sample size is very small, the size reduction may diminish the power greatly. How to choose r properly in that case deserves further study. In this paper, FDR control was proved for independent variables, and only simulation evaluation was performed for dependent variables. The theoretical analysis under dependency will be our future work.
In particular, whether permutation-based decoys can lead to FDR control under some kinds of dependency is an interesting problem that needs to be addressed. Moreover, our control theorem is based on the exchangeability hypothesis. This null hypothesis is stronger than the more popular hypothesis that the two groups have the same means. The performance of our approach under the 'equality of means' hypothesis needs further study. Finally, our approach can be extended to the pair-matched case-control study by adjusting Step 1 of the target-decoy procedure, i.e., randomly exchanging the paired observed values, as in permutation tests for pair-matched studies, instead of permuting all samples. The other steps and analyses are the same.

In this part, we present the simulation results for the simplified target-decoy procedure on dependent random variables. Table 3.3 shows the realized FDRs of different methods with dependent random variables when the specified FDR control level α was 5% or 10%. The results show that the t-test with empirical Bayes overestimated the FDRs for the gamma distribution. The realized FDRs of the pooled permutation significantly exceeded α when the random variables followed the gamma distribution. The Wilcoxon rank-sum test with Bayes or empirical Bayes overestimated the FDRs. The target-decoy approach controlled the FDR in all cases.

Supplementary Materials. The supplementary material provides the proofs of the theorems in the main text.

Software package. The R package for the target-decoy procedure can be downloaded from http://fugroup.amss.ac.cn/software/TDFDR/TDFDR.html.

Acknowledgments. The authors would like to thank Xiaoya Sun for her help in data analysis.

References
[1] Utility of correlation measures in analysis of gene expression
[2] Controlling the false discovery rate via knockoffs
[3] A knockoff filter for high-dimensional selective inference
[4] Robust inference with knockoffs
[5] Weighted false discovery rate control in large-scale multiple testing
[6] Controlling the false discovery rate: a practical and powerful approach to multiple testing
[7] Adaptive linear step-up procedures that control the false discovery rate
[8] The control of the false discovery rate in multiple testing under dependency
[9] Panning for gold: model-X knockoffs for high-dimensional controlled variable selection
[10] Probability theory: independence, interchangeability, martingales
[11] Beyond target-decoy competition: stable validation of peptide and protein identifications in mass spectrometry-based discovery proteomics
[12] Bias in false discovery rate estimation in mass-spectrometry-based peptide identification
[13] Multiple hypothesis testing in proteomics: a strategy for experimental work
[14] Large-scale simultaneous hypothesis testing: the choice of a null hypothesis
[15] Size, power and false discovery rates
[16] Microarrays, empirical Bayes and the two-groups model
[17] Large-scale inference: empirical Bayes methods for estimation, testing, and prediction
[18] Empirical Bayes methods and false discovery rates for microarrays
[19] Empirical Bayes analysis of a microarray experiment
[20] Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry
[21] Controlling the FDR through multiple competition
[22] Multiple competition-based FDR control and its application to peptide detection
[23] Controlling the FDR in variable selection via multiple knockoffs
[24] RANK: large-scale inference with graphical nonlinear knockoffs
[25] IPAD: stable interpretable forecasting with knockoffs inference
[26] Improving the stability of the knockoff procedure: multiple simultaneous knockoffs and entropy maximization
[27] Multiple hypothesis testing methods for large-scale peptide identification in computational proteomics
[28] A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics
[29] A direct approach to false discovery rates by decoy permutations
[30] Averaging strategy to reduce variability in target-decoy estimates of false discovery rate
[31] Comments on the analysis of unbalanced microarray data
[32] Estimating the proportion of true null hypotheses, with application to DNA microarray data
[33] Agrobacterium tumefaciens promotes tumor induction by modulating pathogen defense in Arabidopsis thaliana
[34] Power of ordered hypothesis testing
[35] Unbiased false discovery rate estimation for shotgun proteomics based on the target-decoy approach
[36] GGM knockoff filter: false discovery rate control for Gaussian graphical models
[37] Model-free feature screening and FDR control with knockoff features
[38] Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control
[39] Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses
[40] Deep knockoffs
[41] Some results on false discovery rate in stepwise multiple testing procedures
[42] Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
[43] Proteomic and metabolomic characterization of COVID-19 patient sera
[44] A direct approach to false discovery rates
[45] The positive false discovery rate: a Bayesian interpretation and the q-value
[46] Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach
[47] Statistical significance for genomewide studies
[48] A unified approach to false discovery rate estimation
[49] A general method for accurate estimation of false discovery rates in identification of differentially expressed genes
[50] Significance analysis of microarrays applied to the ionizing radiation response
[51] Recognition of the Agrobacterium tumefaciens VirE2 translocation signal by the VirB/D4 transport system does not require VirE1
[52] A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data
[53] A parametric model to estimate the proportion from true null using a distribution for p-values
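The pair-matched extension mentioned in the discussion, in which only Step 1 of the procedure changes, can be sketched as follows. The function and its names are ours, and `score` stands for any symmetric statistic mapping (cases, controls) to a value; this is an illustrative sketch, not the package implementation.

```python
import random

def paired_decoy_scores(pairs, score, n_decoys=1, seed=0):
    """Sketch of the pair-matched decoy construction: each decoy randomly
    swaps the two observations within every matched pair (a sign flip of
    the paired difference), instead of permuting samples across groups."""
    rng = random.Random(seed)
    target = score([a for a, b in pairs], [b for a, b in pairs])
    decoys = []
    for _ in range(n_decoys):
        # Swap each (case, control) pair independently with probability 1/2.
        flipped = [(b, a) if rng.random() < 0.5 else (a, b) for a, b in pairs]
        decoys.append(score([a for a, b in flipped], [b for a, b in flipped]))
    return target, decoys
```

The target and decoy scores returned here would then enter the same competition and labelling steps as in the unpaired procedure.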