Multiple testing
Multiple testing
The multiple comparisons or multiple testing problem occurs when one considers a set of statistical tests simultaneously. In our context, this problem occurs when one wants to identify or select the variables (thousands of genes) linked to the phenotype response.
Increase of the first risk
Let us start by an example to explain why the first kind risk increases in the context of multiple testing.
Example Say you wish to test simultaneously the level of expression of a set of genes between two conditions (say dead/alive subjects).
The first idea is to test each gene's level of expression separately with a t.test
, using a first kind risk α. This idea reveals not good. Let us consider a simple case where you have 20 genes to test, and a first kind risk α of 0.05.
What’s the probability of observing at least one significant result just due to chance?
Let us assume that the tests are independent (they are usually not in genomics). Then we have:
P(at least one significant result) = 1 − P(no significant results)
= 1 − (1 − 0.05)20 ≈ 0.64
Therefore, with 20 tests being considered, we have a 64% chance of observing at least one significant result, even if all of the null hypotheses are actually true. In genomics, the number of simultaneous tests to be performed is usually larger than 20... and the probability of getting a significant result simply due to chance keeps going up.
These errors are called false positives and Type I errors.
Methods for dealing with multiple testing, also called controlling procedures, are based on adjusting the first kind risk α in some way. The objective is that the probability of observing at least one significant result due to chance remains below the desired significance level.
False Discovery Rate
An important notion in large-scale multiple testing (as often happens in genomics) is the false discovery rate (FDR). This is defined as the proportion of false positives among all significant results. The FDR works by estimating some rejection region so that, on average, FDR < α.
Such procedure is said to control the FDR.
Controlling procedures
Different controlling procedures have been proposed. Let us start with Bonferroni correction. The number of tests to be applied is denoted n in the following.
Bonferroni correction
In the example above with 20 genes and α = 0.05, you’d only reject a null hypothesis if the p-value is less than α/20 = 0.0025.
The Bonferroni correction tends to be a bit too conservative, especially if there are a large number of tests and/or the test statistics are positively correlated. The correction comes at the cost of increasing the probability of producing false negatives, i.e., reducing statistical power.
The Bonferroni correction is implemented in R
, through the function
p.adjust
This function takes as input a vector of p-values, corresponding to the n tests of comparison in level of expression. Creating this vector manually would be time-consuming (and stupid when n is large!). We use the function apply
that applies the test function (t.test
, wilcox.test
, etc) to all the columns of a genomics matrix.
Example We use the two data sets LenzT
and LenzI
. Let us assume we want to compare the level of expression of genes according to the status. We apply a t.test
to each gene.
R> Y<-readRDS("data/LenzT.rds") R> X<-readRDS("data/LenzI.rds") R> names(X) [1] "gender" "age" "diagno" "status" "follup" "regim" "ecog" "stage" [9] "ldhrat" "extnod" R> dim(Y) [1] 414 17290There are n = 17290 genes. We focus on the first 100 genes. We have to calculate n = 100 p-values of a
t.test
. Let us use the function apply
:
R> Y<-Y[,1:100] R> pstatus<-apply(Y,2,function(v){t.test(v~X$status)$p.value})The vector
pstatus
contains the n = 100 p-values.
R> pstatus DDR1 RFC2 HSPA6 PAX8 GUCA1A UBA7 1.772361e-01 4.455526e-01 1.332655e-02 3.539837e-05 3.437319e-01 2.220430e-06 THRA PTPN21 CCL5 CYP2E1 EPHB3 ESRRA 4.395653e-02 8.213417e-01 3.717196e-07 6.337907e-01 1.785658e-01 2.932282e-05 CYP2A6 SCARB1 TTLL12 WFDC2 MAPK1 ADAM32 1.312198e-01 8.361984e-01 4.448381e-07 5.442100e-01 7.911499e-01 2.008663e-01 SPATA17 PRR22 PXK VPS18 MSANTD3 SLC46A1 7.814449e-02 8.003794e-02 9.469434e-06 6.124951e-03 1.292448e-01 2.233710e-06 TIMD4 SLC39A5 ZDHHC11 ATP6V1E2 CILP2 PIGX 2.417954e-02 5.743720e-03 2.899784e-05 1.245774e-04 2.010269e-01 9.173653e-01 TMEM196 SLC39A13 BEST4 AK9 CORO6 TMEM106A 8.659954e-01 1.287665e-07 1.105442e-03 1.892304e-09 1.697229e-03 3.407954e-02 ALG10 TTC39C NEXN C15orf40 RAX2 MFAP3 2.885099e-01 7.064045e-05 2.201809e-05 3.260682e-01 1.044170e-01 7.477207e-01 EYA3 GIMAP1 KLK8 CCDC65 FAM122C CFAP53 6.291313e-01 1.485883e-02 3.705238e-01 8.790098e-01 3.898992e-01 6.005410e-01 ARMCX4 RBBP6 CENPBD1 TRIOBP CATSPER1 HOXD4 9.246867e-02 1.586550e-01 8.786392e-02 4.656764e-01 2.287150e-01 8.749417e-01 GSC SP7 PDE7A CNOT7 CRYZL1 PRSS33 4.653813e-02 3.701823e-01 2.548724e-03 3.632067e-07 6.967562e-01 4.906279e-04 CBARP MCMDC2 TIRAP LEAP2 MSI2 SCIN 1.918569e-01 9.493551e-01 3.749062e-04 5.328031e-03 6.306454e-02 1.104760e-01 CTCFL C4orf33 ZNF333 TVP23C RDH10 SRSF12 2.385870e-01 1.113861e-02 4.647826e-04 4.835807e-04 4.033206e-01 2.919043e-01 FAM71A GAPT ERICH5 CCDC185 ENTHD1 TSSK3 7.703352e-04 3.124296e-01 8.477699e-03 4.482195e-01 1.685822e-05 1.073876e-03 WFDC6 CLEC12A BRF1 TMEM266 CALML6 NLRP5 6.301815e-02 7.594683e-01 6.900515e-01 5.271653e-01 6.412504e-02 7.925233e-02 ODF4 CLEC4F WFDC9 NEDD1 TTLL10 CALR3 3.898969e-01 8.298857e-03 2.438103e-03 4.898098e-03 3.748525e-01 3.659087e-02 ETV3 KLHL10 TM2D3 ZNF485 WDR17 MFSD6L 7.060238e-01 4.953442e-02 1.013675e-01 1.814030e-01 1.856326e-02 3.131468e-01 CDH23 ANKAR MEGF11 GPR182 7.272198e-02 5.385366e-01 1.606548e-02 8.070153e-04Then, to apply the Bonferroni correction, we use the function
p.adjust
:
R> pstatusadB<-p.adjust(pstatus,method="bonferroni")The p-values have been increased. As seen on the density plot below, a lot of p-values have been set to 1 with the Bonferroni correction method. This is an illustration of the fact that Bonferroni is too conservative. Code R :
R> plot(density(pstatus), main="") R> lines(density(pstatusadB), col="red")
Résultat : #r{codeimg}
To display the most significant genes, we sort them and display their names and p-values:R> sort(pstatusadB) AK9 SLC39A13 CNOT7 CCL5 TTLL12 UBA7 1.892304e-07 1.287665e-05 3.632067e-05 3.717196e-05 4.448381e-05 2.220430e-04 SLC46A1 PXK ENTHD1 NEXN ZDHHC11 ESRRA 2.233710e-04 9.469434e-04 1.685822e-03 2.201809e-03 2.899784e-03 2.932282e-03 PAX8 TTC39C ATP6V1E2 TIRAP ZNF333 TVP23C 3.539837e-03 7.064045e-03 1.245774e-02 3.749062e-02 4.647826e-02 4.835807e-02 PRSS33 FAM71A GPR182 TSSK3 BEST4 CORO6 4.906279e-02 7.703352e-02 8.070153e-02 1.073876e-01 1.105442e-01 1.697229e-01 WFDC9 PDE7A NEDD1 LEAP2 SLC39A5 VPS18 2.438103e-01 2.548724e-01 4.898098e-01 5.328031e-01 5.743720e-01 6.124951e-01 CLEC4F ERICH5 DDR1 RFC2 HSPA6 GUCA1A 8.298857e-01 8.477699e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 THRA PTPN21 CYP2E1 EPHB3 CYP2A6 SCARB1 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 WFDC2 MAPK1 ADAM32 SPATA17 PRR22 MSANTD3 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 TIMD4 CILP2 PIGX TMEM196 TMEM106A ALG10 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 C15orf40 RAX2 MFAP3 EYA3 GIMAP1 KLK8 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 CCDC65 FAM122C CFAP53 ARMCX4 RBBP6 CENPBD1 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 TRIOBP CATSPER1 HOXD4 GSC SP7 CRYZL1 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 CBARP MCMDC2 MSI2 SCIN CTCFL C4orf33 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 RDH10 SRSF12 GAPT CCDC185 WFDC6 CLEC12A 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 BRF1 TMEM266 CALML6 NLRP5 ODF4 TTLL10 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 CALR3 ETV3 KLHL10 TM2D3 ZNF485 WDR17 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 MFSD6L CDH23 ANKAR MEGF11 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
Benjamini-Hochberg
The Benjamini-Hochberg (1995) procedure controls the FDR at level α. The idea is the following. Let us consider n tests and their corresponding p-values, in ascending order P1, …, Pn. The procedure is in two steps:
find the largest k such that $P_k \leq \frac kn \alpha$.
Reject the null hypothesis for the tests i = 1, …, k.
R> pstatusadBH<-p.adjust(pstatus,method="BH")The sorted p-values are as follow:
R> sort(pstatusadBH) AK9 SLC39A13 CCL5 TTLL12 CNOT7 UBA7 1.892304e-07 6.438326e-06 8.896762e-06 8.896762e-06 8.896762e-06 3.191014e-05 SLC46A1 PXK ENTHD1 NEXN ESRRA ZDHHC11 3.191014e-05 1.183679e-04 1.873136e-04 2.201809e-04 2.443568e-04 2.443568e-04 PAX8 TTC39C ATP6V1E2 TIRAP PRSS33 ZNF333 2.722952e-04 5.045746e-04 8.305162e-04 2.343164e-03 2.582252e-03 2.582252e-03 TVP23C FAM71A GPR182 BEST4 TSSK3 CORO6 2.582252e-03 3.842930e-03 3.842930e-03 4.806272e-03 4.806272e-03 7.071789e-03 WFDC9 PDE7A NEDD1 LEAP2 SLC39A5 VPS18 9.752412e-03 9.802786e-03 1.814110e-02 1.902868e-02 1.980593e-02 2.041650e-02 ERICH5 CLEC4F C4orf33 HSPA6 GIMAP1 MEGF11 2.649281e-02 2.649281e-02 3.375335e-02 3.919573e-02 4.245379e-02 4.462633e-02 WDR17 TIMD4 TMEM106A CALR3 THRA GSC 5.017097e-02 6.363038e-02 8.738344e-02 9.147718e-02 1.072110e-01 1.108051e-01 KLHL10 MSI2 WFDC6 CALML6 CDH23 SPATA17 1.151963e-01 1.394023e-01 1.394023e-01 1.394023e-01 1.547276e-01 1.600759e-01 PRR22 NLRP5 CENPBD1 ARMCX4 TM2D3 RAX2 1.600759e-01 1.600759e-01 1.722822e-01 1.778244e-01 1.912594e-01 1.933649e-01 SCIN CYP2A6 MSANTD3 RBBP6 DDR1 EPHB3 2.008654e-01 2.302101e-01 2.302101e-01 2.735431e-01 2.973820e-01 2.973820e-01 ZNF485 CBARP ADAM32 CILP2 CATSPER1 CTCFL 2.973820e-01 3.094466e-01 3.141046e-01 3.141046e-01 3.518693e-01 3.614955e-01 ALG10 SRSF12 GAPT MFSD6L C15orf40 GUCA1A 4.292710e-01 4.292710e-01 4.473525e-01 4.473525e-01 4.592510e-01 4.774054e-01 KLK8 SP7 TTLL10 FAM122C ODF4 RDH10 4.998033e-01 4.998033e-01 4.998033e-01 5.063626e-01 5.063626e-01 5.170777e-01 RFC2 CCDC185 TRIOBP TMEM266 WFDC2 ANKAR 5.602744e-01 5.602744e-01 5.749091e-01 6.428845e-01 6.478690e-01 6.478690e-01 CFAP53 CYP2E1 EYA3 CRYZL1 BRF1 ETV3 7.065188e-01 7.284950e-01 7.284950e-01 7.828721e-01 7.828721e-01 7.844709e-01 MFAP3 CLEC12A MAPK1 PTPN21 SCARB1 TMEM196 8.216711e-01 8.255090e-01 8.506988e-01 8.737678e-01 8.802089e-01 8.969488e-01 CCDC65 HOXD4 PIGX MCMDC2 8.969488e-01 8.969488e-01 9.266316e-01 9.493551e-01They are not in the same order than before. Let us now compare the density plots: Code R :
R> plot(density(pstatus), main="") R> lines(density(pstatusadB), col="red") R> lines(density(pstatusadBH), col="blue")
Résultat : #r{codeimg}
Note that the procedure is valid when the n tests are independent, wich is quite unlikely in genomics analysis. We therefore recommend the use of Benjamini-Yekutieli procedure.
Benjamini-Yekutieli
The Benjamini-Yekutieli procedure controls the FDR under some dependence assumptions between the n tests. The difference with the Benjamini-Hochberg procedure is in the first step:
find the largest k such that $P_k\leq \frac k{n c(n)} \alpha$
Reject the null hypothesis for the tests i = 1, …, k.
The weights c(n) are $c(n)=\sum_{i=1}^n \frac 1i$ under dependence assumptions between the tests.
Back to the example We apply the Benjamini-Yekutieli procedure to the previous vector of p-values:R> pstatusadBY<-p.adjust(pstatus,method="BY")Let us plot the density of the BY correction (green curve): Code R :
R> plot(density(pstatus), main="") R> lines(density(pstatusadB), col="red") R> lines(density(pstatusadBH), col="blue") R> lines(density(pstatusadBY), col="green")
Résultat : #r{codeimg}
The sorted p-values are as follow:R> sort(pstatusadBY) AK9 SLC39A13 CCL5 TTLL12 CNOT7 UBA7 9.816098e-07 3.339803e-05 4.615086e-05 4.615086e-05 4.615086e-05 1.655300e-04 SLC46A1 PXK ENTHD1 NEXN ESRRA ZDHHC11 1.655300e-04 6.140191e-04 9.716661e-04 1.142161e-03 1.267571e-03 1.267571e-03 PAX8 TTC39C ATP6V1E2 TIRAP PRSS33 ZNF333 1.412498e-03 2.617419e-03 4.308201e-03 1.215488e-02 1.339512e-02 1.339512e-02 TVP23C FAM71A GPR182 BEST4 TSSK3 CORO6 1.339512e-02 1.993473e-02 1.993473e-02 2.493195e-02 2.493195e-02 3.668404e-02 WFDC9 PDE7A NEDD1 LEAP2 SLC39A5 VPS18 5.058944e-02 5.085075e-02 9.410475e-02 9.870896e-02 1.027408e-01 1.059081e-01 ERICH5 CLEC4F C4orf33 HSPA6 GIMAP1 MEGF11 1.374282e-01 1.374282e-01 1.750914e-01 2.033230e-01 2.202238e-01 2.314936e-01 WDR17 TIMD4 TMEM106A CALR3 THRA GSC 2.602558e-01 3.300748e-01 4.532909e-01 4.745266e-01 5.561442e-01 5.747877e-01 KLHL10 MSI2 WFDC6 CALML6 CDH23 SPATA17 5.975668e-01 7.231321e-01 7.231321e-01 7.231321e-01 8.026306e-01 8.303740e-01 PRR22 NLRP5 CENPBD1 ARMCX4 TM2D3 DDR1 8.303740e-01 8.303740e-01 8.936928e-01 9.224421e-01 9.921345e-01 1.000000e+00 RFC2 GUCA1A PTPN21 CYP2E1 EPHB3 CYP2A6 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 SCARB1 WFDC2 MAPK1 ADAM32 MSANTD3 CILP2 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 PIGX TMEM196 ALG10 C15orf40 RAX2 MFAP3 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 EYA3 KLK8 CCDC65 FAM122C CFAP53 RBBP6 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 TRIOBP CATSPER1 HOXD4 SP7 CRYZL1 CBARP 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 MCMDC2 SCIN CTCFL RDH10 SRSF12 GAPT 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 CCDC185 CLEC12A BRF1 TMEM266 ODF4 TTLL10 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 ETV3 ZNF485 MFSD6L ANKAR 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00