Statistical tests

Main Page

A statistical test is a decision method that helps us to examine two opposing conjectures, called hypotheses and to validate or invalidate these hypotheses with a certain degree of confidence.

Hypothesis

Null and alternative hypotheses

In a statistical test, one want to examine two opposing conjectures, also called hypotheses H0 and H1. These two hypotheses are mutually exclusive and exhaustive so that one is true to the exclusion of the other. Usually we choose the two hypotheses such that:
  • the null hypothesis H0 is the one you must have good reasons to reject
  • the alternative H1 is the opposite: the hypothesis you must have good reasons to accept

The purpose of a statistical test is to determine which of the two hypotheses is true and which is false from information collected in a sample.

Example: Test of the mean: the null hypothesis is the mean is zero (H0: μ = 0) against the alternative hypothesis the mean is not zero (H1: μ ≠ 0).

In practice, the null hypothesis always contains the '=' sign, and the alternative hypothesis never contains just the '=' sign.

Notion of risks

When taking a decision, two different errors could occur:

Decision
Truth H0 H1
H0 OK first error
H1 2nd error OK
  • The first error is the decision of rejecting H0 wrongly
  • The second error is the decision of accepting H0 wrongly

The first error is considered as the most serious. This is the one we want to control, that is we want a small probability of the first error. This is the notion of risk:

  • The first kind risk is the probability of the first error, usually denoted α
    PH0[Reject H0]=P[Reject H0 wrongly]=P[Reject H0 while H0 is true]=α
  • The second kind risk is the probability of the second error, of accepting H0 wrongly, ie the probability of accepting H0 when the alternative hypothesis H1 is true.
    PH1[Accept H0]=β
  • The power of the test is 1 − β. It is the probability of rightyfully rejecting H0.

Example For an adult, the logarithm of the D-dimer concentration, denoted by X, is modeled by a normal random variable with expectation μ and standard deviation σ. The variable X is an indicator for the risk of thrombosis: it is considered that for healthy individuals, μ is −1; whereas for individuals at risk, μ is 0. In both cases, the value of σ is the same: 0.3.

Answer If Dr. House does not want to worry a patient, the hypothesis he considers as dangerous to reject wrongly is that the patient is not at risk, thus that his value of X (the test statistic) has expectation −1. His hypothesis H0 is μ = −1 (the patient is not at risk), that he will test against H1: μ = 0 (the patient is at risk).

Answer If Dr. Cuddy does not want to miss a patient at risk, the hypothesis she considers as dangerous to reject wrongly is that he is at risk, and that his variable X has expectation 0. Her hypothesis H0 is μ = 0 (the patient is at risk), that she will test against H1: μ = −1 (the patient is not at risk).

Vocabulary of test

The statistical test helps the statistician to decide between the two hypotheses, based on the information contained in the data. A rule of decision is defined based on a test statistic. The two notions are defined below.

Test statistic

A statistical test uses the observations to determine the statistical support for the null hypothesis H0 against the alternative H1.

It needs a test statistic:
  • a test statistic is a function of the data, for which the probability distribution under the null hypothesis H0 is known
  • a decision rule that specifies, as a function of the values taken by the test statistic, in which cases the hypothesis H0 should be rejected

Example Dr. House will choose to reject too high values for X, ie to decide to reject H0 (and decide H1) when X has high values. Dr. Cuddy will choose to reject lower values of X, ie to decide to reject H0 (and decide H1) when X has low values.

One-sided or two-sided tests

According to the alternative, the test may be
  • left-sided if the decision rule is
    Reject H0 if T < ℓ
    (reject too small values)
  • right-sided if the decision rule is
    Reject H0 if T > ℓ
    (reject too large values)
  • two-sided if the decision rule is
    Reject H0 if T ∉ [ℓ,ℓ′]
    (reject too small or too large values)

The values and ℓ′ are chosen to control the probability of the error of rejecting H0 wrongly. This is an important notion in statistical testing, that is developed below.

The values and ℓ′ are chosen such that the first kind risk is controlled. The standard choices are:
  • left-sided test: is such that PH0[T < ℓ]=α
  • right-sided test: is such that PH0[T > ℓ]=α
  • two-sided test: and ℓ′ are chosen such that PH0[T < ℓ]=PH0[T > ℓ′] = α/2

The first kind risk is usually α = 5%.

Example

The test is right-sided (Dr House will choose to reject too high values for X). The decision rule will be:


Reject H0  ⇔  X > l ,

where H0[X > l]=α .

According to the null hypothesis H0, the test statistic X follows the ${\cal N}(-1,0.3)$ distribution. The bound of the rejection region is the quantile of that distribution at level 1 − α:

R> qnorm(0.95,mean=-1,sd=0.3)
[1] -0.5065439
R> qnorm(0.99,mean=-1,sd=0.3)
[1] -0.3020956

Interpretation Dr. House declares the patient is at risk if the logarithm of his D-dimer concentration is higher than −0.5065 for a risk of 0.05%, −0.3021 for a risk of 0.01%.

The test is left-sided (she will choose to reject lower values of X). The decision rule will be:


Reject H0  ⇔  X < l′ ,
where H0[X < l′] = α .

Under hypothesis H0, the test statistic X follows the ${\cal N}(0,0.3)$ distribution. The bound of the rejection region is the quantile of that distribution at the threshold 1 − α:

R> qnorm(0.05,mean=0,sd=0.3)
[1] -0.4934561
R> qnorm(0.01,mean=0,sd=0.3)
[1] -0.6979044

Interpretation Dr. Cuddy declares the patient is not at risk if the logarithm of his D-dimer concentration is lower than −0.4935 for 0.05, −0.6980 for 0.01.

This is a two-sided test. The decision rule will be:


Reject H0  ⇔  X ∉ [l1, l2] ,
where H0[ X ∉ [l1, l2] ]=0.05 .

Under hypothesis H0, the test statistic X follows the ${\cal N}(-1,0.3)$ distribution, hence $\frac{X-(-1)}{0.3}$ follows the ${\cal N}(0,1)$. The interval [l1 ; l2] must contain 95% of the values taken by a variable with ${\cal N}(-1,0.3)$ distribution. The interval centered at −1 is chosen:

R> qnorm(c(0.025,0.975),mean=-1,sd=0.3)
[1] -1.5879892 -0.4120108

Interpretation At threshold 0.05 the decision rule of the two-tailed test is:
Reject H0  ⇔  X ∉ [ − 1.588 ;   − 0.412] .
The patient is said to have logarithm of D-dimer concentration significantly different from −1 when his variable X is either lower than −1.588, or larger than −0.488.

p-value

A last central notion in statistical tests is the p-value.
  • The p-value is the proportion of values at least as bad as the observations, if H0 is true.

Interpretation: the p-value measures how likely are the data, assuming the null hypothesis is true. It is the maximal risk to reject H0 wrongly.

It is used as a test statistic, with the following decision rule
  • If the p-value is smaller than the first kind risk, reject H0
  • If the p-value is not smaller than the first kind risk, do not reject H0.

Example

The p-value is the threshold at which −0.46 would be the bound. Knowing the results of the first question, since −0.46 lies between −0.5065 and −0.3021, the p-value must be between 0.05 and 0.01. It is the probability under H0, that the variable X is higher than −0.46, that is the right tail at −0.46 of the ${\cal N}(-1,0.3)$ distrubution, that is

R> pnorm(-0.46,mean=-1,sd=0.3,lower.tail=FALSE)
[1] 0.03593032

Interpretation For Dr. House's test, the p-value of a patient with a value of log-dimer equal to −0.46 is 0.0359. At risk 5%, the patient is declared at risk by Dr. House.

The p-value is the threshold for which the observed value would be a bound of the rejection region. That rejection region is centered at −1. The other bound should be −1 − ( − 0.46 − ( − 1)) = −1.54. The normal distribution 𝒩(−1, 0.3) being symmetric at −1, the probabilility to be smaller than −1.54 is equal to the probability to be larger than −0.46. The later is

R> pnorm(-0.46,mean=-1,sd=0.3,lower.tail=FALSE)
[1] 0.03593032

The later has been calculated in question 3; it must be doubled.

R> 2*pnorm(-0.46,mean=-1,sd=0.3,lower.tail=FALSE)
[1] 0.07186064

Interpretation The p-value for a two-sided test is that of a one-sided test, multiplied by 2.

For the two-sided test, the p-value of a patient with a value of log-dimer equal to −0.46 is 0.0718. At risk 5%, the patient is declared not at risk.

7/9/2020