Multiple testing

Multiple testing
- Increase of the first risk
- False Discovery Rate
Controlling procedures

Multiple testing

The multiple comparisons or multiple testing problem occurs when one considers a set of statistical tests simultaneously. In our context, this problem occurs when one wants to identify or select the variables (thousands of genes) linked to the phenotype response.

Increase of the first risk

Let us start by an example to explain why the first kind risk increases in the context of multiple testing.

Example Say you wish to test simultaneously the level of expression of a set of genes between two conditions (say dead/alive subjects).

The first idea is to test each gene's level of expression separately with a t.test, using a first kind risk α. This idea reveals not good. Let us consider a simple case where you have 20 genes to test, and a first kind risk α of 0.05.

What’s the probability of observing at least one significant result just due to chance?

Let us assume that the tests are independent (they are usually not in genomics). Then we have:

P(at least one significant result) = 1 − P(no significant results)

= 1 − (1 − 0.05)20 ≈ 0.64

Therefore, with 20 tests being considered, we have a 64% chance of observing at least one significant result, even if all of the null hypotheses are actually true. In genomics, the number of simultaneous tests to be performed is usually larger than 20... and the probability of getting a significant result simply due to chance keeps going up.

When statistical tests are used repeatedly, the increase in type I error occurs.

These errors are called false positives and Type I errors.

Methods for dealing with multiple testing, also called controlling procedures, are based on adjusting the first kind risk α in some way. The objective is that the probability of observing at least one significant result due to chance remains below the desired significance level.

False Discovery Rate

An important notion in large-scale multiple testing (as often happens in genomics) is the false discovery rate (FDR). This is defined as the proportion of false positives among all significant results. The FDR works by estimating some rejection region so that, on average, FDR < α.

Such procedure is said to control the FDR.

Controlling procedures

Different controlling procedures have been proposed. Let us start with Bonferroni correction. The number of tests to be applied is denoted n in the following.

Bonferroni correction

The Bonferroni correction sets the significance level of each test at α/n.

In the example above with 20 genes and α = 0.05, you’d only reject a null hypothesis if the p-value is less than α/20 = 0.0025.

The Bonferroni correction tends to be a bit too conservative, especially if there are a large number of tests and/or the test statistics are positively correlated. The correction comes at the cost of increasing the probability of producing false negatives, i.e., reducing statistical power.

The Bonferroni correction is implemented in R, through the function

p.adjust

This function takes as input a vector of p-values, corresponding to the n tests of comparison in level of expression. Creating this vector manually would be time-consuming (and stupid when n is large!). We use the function apply that applies the test function (t.test, wilcox.test, etc) to all the columns of a genomics matrix.

Example We use the two data sets LenzT and LenzI. Let us assume we want to compare the level of expression of genes according to the status. We apply a t.test to each gene.

R> Y<-readRDS("data/LenzT.rds")
R> X<-readRDS("data/LenzI.rds")
R> names(X)
 [1] "gender" "age"    "diagno" "status" "follup" "regim"  "ecog"   "stage" 
 [9] "ldhrat" "extnod"
R> dim(Y)
[1]   414 17290

There are n = 17290 genes. We focus on the first 100 genes. We have to calculate n = 100 p-values of a t.test. Let us use the function apply:

R> Y<-Y[,1:100]
R> pstatus<-apply(Y,2,function(v){t.test(v~X$status)$p.value})

The vector pstatus contains the n = 100 p-values.

R> pstatus
        DDR1         RFC2        HSPA6         PAX8       GUCA1A         UBA7 
1.772361e-01 4.455526e-01 1.332655e-02 3.539837e-05 3.437319e-01 2.220430e-06 
        THRA       PTPN21         CCL5       CYP2E1        EPHB3        ESRRA 
4.395653e-02 8.213417e-01 3.717196e-07 6.337907e-01 1.785658e-01 2.932282e-05 
      CYP2A6       SCARB1       TTLL12        WFDC2        MAPK1       ADAM32 
1.312198e-01 8.361984e-01 4.448381e-07 5.442100e-01 7.911499e-01 2.008663e-01 
     SPATA17        PRR22          PXK        VPS18      MSANTD3      SLC46A1 
7.814449e-02 8.003794e-02 9.469434e-06 6.124951e-03 1.292448e-01 2.233710e-06 
       TIMD4      SLC39A5      ZDHHC11     ATP6V1E2        CILP2         PIGX 
2.417954e-02 5.743720e-03 2.899784e-05 1.245774e-04 2.010269e-01 9.173653e-01 
     TMEM196     SLC39A13        BEST4          AK9        CORO6     TMEM106A 
8.659954e-01 1.287665e-07 1.105442e-03 1.892304e-09 1.697229e-03 3.407954e-02 
       ALG10       TTC39C         NEXN     C15orf40         RAX2        MFAP3 
2.885099e-01 7.064045e-05 2.201809e-05 3.260682e-01 1.044170e-01 7.477207e-01 
        EYA3       GIMAP1         KLK8       CCDC65      FAM122C       CFAP53 
6.291313e-01 1.485883e-02 3.705238e-01 8.790098e-01 3.898992e-01 6.005410e-01 
      ARMCX4        RBBP6      CENPBD1       TRIOBP     CATSPER1        HOXD4 
9.246867e-02 1.586550e-01 8.786392e-02 4.656764e-01 2.287150e-01 8.749417e-01 
         GSC          SP7        PDE7A        CNOT7       CRYZL1       PRSS33 
4.653813e-02 3.701823e-01 2.548724e-03 3.632067e-07 6.967562e-01 4.906279e-04 
       CBARP       MCMDC2        TIRAP        LEAP2         MSI2         SCIN 
1.918569e-01 9.493551e-01 3.749062e-04 5.328031e-03 6.306454e-02 1.104760e-01 
       CTCFL      C4orf33       ZNF333       TVP23C        RDH10       SRSF12 
2.385870e-01 1.113861e-02 4.647826e-04 4.835807e-04 4.033206e-01 2.919043e-01 
      FAM71A         GAPT       ERICH5      CCDC185       ENTHD1        TSSK3 
7.703352e-04 3.124296e-01 8.477699e-03 4.482195e-01 1.685822e-05 1.073876e-03 
       WFDC6      CLEC12A         BRF1      TMEM266       CALML6        NLRP5 
6.301815e-02 7.594683e-01 6.900515e-01 5.271653e-01 6.412504e-02 7.925233e-02 
        ODF4       CLEC4F        WFDC9        NEDD1       TTLL10        CALR3 
3.898969e-01 8.298857e-03 2.438103e-03 4.898098e-03 3.748525e-01 3.659087e-02 
        ETV3       KLHL10        TM2D3       ZNF485        WDR17       MFSD6L 
7.060238e-01 4.953442e-02 1.013675e-01 1.814030e-01 1.856326e-02 3.131468e-01 
       CDH23        ANKAR       MEGF11       GPR182 
7.272198e-02 5.385366e-01 1.606548e-02 8.070153e-04

Then, to apply the Bonferroni correction, we use the function p.adjust:

R> pstatusadB<-p.adjust(pstatus,method="bonferroni")

The p-values have been increased. As seen on the density plot below, a lot of p-values have been set to 1 with the Bonferroni correction method. This is an illustration of the fact that Bonferroni is too conservative. Code R :

R> plot(density(pstatus), main="")
R> lines(density(pstatusadB), col="red")

Résultat : #r{codeimg}

To display the most significant genes, we sort them and display their names and p-values:

R> sort(pstatusadB)
         AK9     SLC39A13        CNOT7         CCL5       TTLL12         UBA7 
1.892304e-07 1.287665e-05 3.632067e-05 3.717196e-05 4.448381e-05 2.220430e-04 
     SLC46A1          PXK       ENTHD1         NEXN      ZDHHC11        ESRRA 
2.233710e-04 9.469434e-04 1.685822e-03 2.201809e-03 2.899784e-03 2.932282e-03 
        PAX8       TTC39C     ATP6V1E2        TIRAP       ZNF333       TVP23C 
3.539837e-03 7.064045e-03 1.245774e-02 3.749062e-02 4.647826e-02 4.835807e-02 
      PRSS33       FAM71A       GPR182        TSSK3        BEST4        CORO6 
4.906279e-02 7.703352e-02 8.070153e-02 1.073876e-01 1.105442e-01 1.697229e-01 
       WFDC9        PDE7A        NEDD1        LEAP2      SLC39A5        VPS18 
2.438103e-01 2.548724e-01 4.898098e-01 5.328031e-01 5.743720e-01 6.124951e-01 
      CLEC4F       ERICH5         DDR1         RFC2        HSPA6       GUCA1A 
8.298857e-01 8.477699e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
        THRA       PTPN21       CYP2E1        EPHB3       CYP2A6       SCARB1 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
       WFDC2        MAPK1       ADAM32      SPATA17        PRR22      MSANTD3 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
       TIMD4        CILP2         PIGX      TMEM196     TMEM106A        ALG10 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
    C15orf40         RAX2        MFAP3         EYA3       GIMAP1         KLK8 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
      CCDC65      FAM122C       CFAP53       ARMCX4        RBBP6      CENPBD1 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
      TRIOBP     CATSPER1        HOXD4          GSC          SP7       CRYZL1 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
       CBARP       MCMDC2         MSI2         SCIN        CTCFL      C4orf33 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
       RDH10       SRSF12         GAPT      CCDC185        WFDC6      CLEC12A 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
        BRF1      TMEM266       CALML6        NLRP5         ODF4       TTLL10 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
       CALR3         ETV3       KLHL10        TM2D3       ZNF485        WDR17 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
      MFSD6L        CDH23        ANKAR       MEGF11 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00

Interpretation: Only the first genes are of importance. They are sorted by their p-values of a different level of expression between dead and alive patients.

Note that the R code above can be adapted to other tests such as var.test, ks.test, oneway.test, wilcox.test.

Benjamini-Hochberg

The Benjamini-Hochberg (1995) procedure controls the FDR at level α. The idea is the following. Let us consider n tests and their corresponding p-values, in ascending order P₁, …, P_n. The procedure is in two steps:

find the largest k such that $P_k \leq \frac kn \alpha$.
Reject the null hypothesis for the tests i = 1, …, k.

In R, the Benjamini-Hochberg procedure is implemented in the function p.ajust with the option BH.

Back to the example We apply the Benjamini-Hochberg procedure to the previous vector of p-values:

R> pstatusadBH<-p.adjust(pstatus,method="BH")

The sorted p-values are as follow:

R> sort(pstatusadBH)
         AK9     SLC39A13         CCL5       TTLL12        CNOT7         UBA7 
1.892304e-07 6.438326e-06 8.896762e-06 8.896762e-06 8.896762e-06 3.191014e-05 
     SLC46A1          PXK       ENTHD1         NEXN        ESRRA      ZDHHC11 
3.191014e-05 1.183679e-04 1.873136e-04 2.201809e-04 2.443568e-04 2.443568e-04 
        PAX8       TTC39C     ATP6V1E2        TIRAP       PRSS33       ZNF333 
2.722952e-04 5.045746e-04 8.305162e-04 2.343164e-03 2.582252e-03 2.582252e-03 
      TVP23C       FAM71A       GPR182        BEST4        TSSK3        CORO6 
2.582252e-03 3.842930e-03 3.842930e-03 4.806272e-03 4.806272e-03 7.071789e-03 
       WFDC9        PDE7A        NEDD1        LEAP2      SLC39A5        VPS18 
9.752412e-03 9.802786e-03 1.814110e-02 1.902868e-02 1.980593e-02 2.041650e-02 
      ERICH5       CLEC4F      C4orf33        HSPA6       GIMAP1       MEGF11 
2.649281e-02 2.649281e-02 3.375335e-02 3.919573e-02 4.245379e-02 4.462633e-02 
       WDR17        TIMD4     TMEM106A        CALR3         THRA          GSC 
5.017097e-02 6.363038e-02 8.738344e-02 9.147718e-02 1.072110e-01 1.108051e-01 
      KLHL10         MSI2        WFDC6       CALML6        CDH23      SPATA17 
1.151963e-01 1.394023e-01 1.394023e-01 1.394023e-01 1.547276e-01 1.600759e-01 
       PRR22        NLRP5      CENPBD1       ARMCX4        TM2D3         RAX2 
1.600759e-01 1.600759e-01 1.722822e-01 1.778244e-01 1.912594e-01 1.933649e-01 
        SCIN       CYP2A6      MSANTD3        RBBP6         DDR1        EPHB3 
2.008654e-01 2.302101e-01 2.302101e-01 2.735431e-01 2.973820e-01 2.973820e-01 
      ZNF485        CBARP       ADAM32        CILP2     CATSPER1        CTCFL 
2.973820e-01 3.094466e-01 3.141046e-01 3.141046e-01 3.518693e-01 3.614955e-01 
       ALG10       SRSF12         GAPT       MFSD6L     C15orf40       GUCA1A 
4.292710e-01 4.292710e-01 4.473525e-01 4.473525e-01 4.592510e-01 4.774054e-01 
        KLK8          SP7       TTLL10      FAM122C         ODF4        RDH10 
4.998033e-01 4.998033e-01 4.998033e-01 5.063626e-01 5.063626e-01 5.170777e-01 
        RFC2      CCDC185       TRIOBP      TMEM266        WFDC2        ANKAR 
5.602744e-01 5.602744e-01 5.749091e-01 6.428845e-01 6.478690e-01 6.478690e-01 
      CFAP53       CYP2E1         EYA3       CRYZL1         BRF1         ETV3 
7.065188e-01 7.284950e-01 7.284950e-01 7.828721e-01 7.828721e-01 7.844709e-01 
       MFAP3      CLEC12A        MAPK1       PTPN21       SCARB1      TMEM196 
8.216711e-01 8.255090e-01 8.506988e-01 8.737678e-01 8.802089e-01 8.969488e-01 
      CCDC65        HOXD4         PIGX       MCMDC2 
8.969488e-01 8.969488e-01 9.266316e-01 9.493551e-01

They are not in the same order than before. Let us now compare the density plots: Code R :

R> plot(density(pstatus), main="")
R> lines(density(pstatusadB), col="red")
R> lines(density(pstatusadBH), col="blue")

Résultat : #r{codeimg}

The density plot shows that the BH correction (blue curve) is less conservative than Bonferroni (red curve). It is however very close to the unadjusted p-values: the correction has a very small effect.

Note that the procedure is valid when the n tests are independent, wich is quite unlikely in genomics analysis. We therefore recommend the use of Benjamini-Yekutieli procedure.

Benjamini-Yekutieli

The Benjamini-Yekutieli procedure controls the FDR under some dependence assumptions between the n tests. The difference with the Benjamini-Hochberg procedure is in the first step:

find the largest k such that $P_k\leq \frac k{n c(n)} \alpha$
Reject the null hypothesis for the tests i = 1, …, k.

The weights c(n) are $c(n)=\sum_{i=1}^n \frac 1i$ under dependence assumptions between the tests.

In R, the Benjamini-Yekutieli procedure is implemented in the function p.ajust with the option BY.

Back to the example We apply the Benjamini-Yekutieli procedure to the previous vector of p-values:

R> pstatusadBY<-p.adjust(pstatus,method="BY")

Let us plot the density of the BY correction (green curve): Code R :

R> plot(density(pstatus), main="")
R> lines(density(pstatusadB), col="red")
R> lines(density(pstatusadBH), col="blue")
R> lines(density(pstatusadBY), col="green")

Résultat : #r{codeimg}

The density plot shows that the BY correction is a mixture between the Bonferroni (too conservative) and the BH (no enough conservative) corrections

The sorted p-values are as follow:

R> sort(pstatusadBY)
         AK9     SLC39A13         CCL5       TTLL12        CNOT7         UBA7 
9.816098e-07 3.339803e-05 4.615086e-05 4.615086e-05 4.615086e-05 1.655300e-04 
     SLC46A1          PXK       ENTHD1         NEXN        ESRRA      ZDHHC11 
1.655300e-04 6.140191e-04 9.716661e-04 1.142161e-03 1.267571e-03 1.267571e-03 
        PAX8       TTC39C     ATP6V1E2        TIRAP       PRSS33       ZNF333 
1.412498e-03 2.617419e-03 4.308201e-03 1.215488e-02 1.339512e-02 1.339512e-02 
      TVP23C       FAM71A       GPR182        BEST4        TSSK3        CORO6 
1.339512e-02 1.993473e-02 1.993473e-02 2.493195e-02 2.493195e-02 3.668404e-02 
       WFDC9        PDE7A        NEDD1        LEAP2      SLC39A5        VPS18 
5.058944e-02 5.085075e-02 9.410475e-02 9.870896e-02 1.027408e-01 1.059081e-01 
      ERICH5       CLEC4F      C4orf33        HSPA6       GIMAP1       MEGF11 
1.374282e-01 1.374282e-01 1.750914e-01 2.033230e-01 2.202238e-01 2.314936e-01 
       WDR17        TIMD4     TMEM106A        CALR3         THRA          GSC 
2.602558e-01 3.300748e-01 4.532909e-01 4.745266e-01 5.561442e-01 5.747877e-01 
      KLHL10         MSI2        WFDC6       CALML6        CDH23      SPATA17 
5.975668e-01 7.231321e-01 7.231321e-01 7.231321e-01 8.026306e-01 8.303740e-01 
       PRR22        NLRP5      CENPBD1       ARMCX4        TM2D3         DDR1 
8.303740e-01 8.303740e-01 8.936928e-01 9.224421e-01 9.921345e-01 1.000000e+00 
        RFC2       GUCA1A       PTPN21       CYP2E1        EPHB3       CYP2A6 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
      SCARB1        WFDC2        MAPK1       ADAM32      MSANTD3        CILP2 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
        PIGX      TMEM196        ALG10     C15orf40         RAX2        MFAP3 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
        EYA3         KLK8       CCDC65      FAM122C       CFAP53        RBBP6 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
      TRIOBP     CATSPER1        HOXD4          SP7       CRYZL1        CBARP 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
      MCMDC2         SCIN        CTCFL        RDH10       SRSF12         GAPT 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
     CCDC185      CLEC12A         BRF1      TMEM266         ODF4       TTLL10 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
        ETV3       ZNF485       MFSD6L        ANKAR 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00

Be careful: The most significant genes might not be the same depending on the correction. The knowledge of the specialist (the biologist) is of great importance !