Statistical tests

Hypothesis
- Null and alternative hypotheses
- Notion of risks
Vocabulary of test

A statistical test is a decision method for examining two opposing conjectures, called hypotheses, and for validating or invalidating them with a certain degree of confidence.

Hypothesis

Null and alternative hypotheses

In a statistical test, one wants to examine two opposing conjectures, called the hypotheses H₀ and H₁. These two hypotheses are mutually exclusive and exhaustive, so that exactly one of them is true. Usually the two hypotheses are chosen such that:

the null hypothesis H₀ is the one you must have good reasons to reject
the alternative H₁ is the opposite: the hypothesis you must have good reasons to accept

The purpose of a statistical test is to decide between the two hypotheses, using the information collected in a sample.

Example: Test of the mean: the null hypothesis is that the mean is zero (H₀: μ = 0), against the alternative hypothesis that the mean is not zero (H₁: μ ≠ 0).

In practice, the null hypothesis always contains the '=' sign, and the alternative hypothesis never contains just the '=' sign.

Notion of risks

When making a decision, two different errors can occur:

Truth \ Decision	H₀	H₁
H₀	OK	first error
H₁	second error	OK

The first error is the decision to reject H₀ wrongly.
The second error is the decision to accept H₀ wrongly.

The first error is considered the more serious of the two. This is the one we want to control, that is, we want the probability of the first error to be small. This is the notion of risk:

The first kind risk is the probability of the first error, usually denoted α
P_H₀[Reject H₀]=P[Reject H₀ wrongly]=P[Reject H₀ while H₀ is true]=α
The second kind risk is the probability of the second error, i.e. the probability of accepting H₀ when the alternative hypothesis H₁ is true.
P_H₁[Accept H₀]=β
The power of the test is 1 − β. It is the probability of rightfully rejecting H₀.

Example For an adult, the logarithm of the D-dimer concentration, denoted by X, is modeled by a normal random variable with expectation μ and standard deviation σ. The variable X is an indicator for the risk of thrombosis: it is considered that for healthy individuals, μ is −1; whereas for individuals at risk, μ is 0. In both cases, the value of σ is the same: 0.3.

Dr. House does not want to worry his patients if there is no need to. What hypotheses H₀ and H₁ will he choose to test?

Answer If Dr. House does not want to worry a patient, the hypothesis he considers as dangerous to reject wrongly is that the patient is not at risk, thus that his value of X (the test statistic) has expectation −1. His hypothesis H₀ is μ = −1 (the patient is not at risk), that he will test against H₁: μ = 0 (the patient is at risk).

Dr. Cuddy's point of view is that she'd rather worry a patient wrongly than not warn him of an actual risk. What hypotheses H₀ and H₁ will she choose to test?

Answer If Dr. Cuddy does not want to miss a patient at risk, the hypothesis she considers as dangerous to reject wrongly is that he is at risk, and that his variable X has expectation 0. Her hypothesis H₀ is μ = 0 (the patient is at risk), that she will test against H₁: μ = −1 (the patient is not at risk).

Vocabulary of test

The statistical test helps the statistician to decide between the two hypotheses, based on the information contained in the data. A decision rule is defined, based on a test statistic. The two notions are defined below.

Test statistic

A statistical test uses the observations to determine the statistical support for the null hypothesis H₀ against the alternative H₁.

It needs two ingredients:

a test statistic: a function of the data whose probability distribution under the null hypothesis H₀ is known
a decision rule: a rule that specifies, as a function of the values taken by the test statistic, in which cases the hypothesis H₀ should be rejected

Example Dr. House will reject values of X that are too high, i.e. he decides to reject H₀ (and to accept H₁) when X takes high values. Dr. Cuddy will reject values of X that are too low, i.e. she decides to reject H₀ (and to accept H₁) when X takes low values.

One-sided or two-sided tests

According to the alternative, the test may be

left-sided if the decision rule is
Reject H₀ if T < ℓ
(reject too small values)
right-sided if the decision rule is
Reject H₀ if T > ℓ
(reject too large values)
two-sided if the decision rule is
Reject H₀ if T ∉ [ℓ,ℓ′]
(reject too small or too large values)

The values ℓ and ℓ′ are chosen to control the probability of rejecting H₀ wrongly, that is, the first kind risk α defined above. The standard choices are:

left-sided test: ℓ is such that P_H₀[T < ℓ]=α
right-sided test: ℓ is such that P_H₀[T > ℓ]=α
two-sided test: ℓ and ℓ′ are chosen such that P_H₀[T < ℓ]=P_H₀[T > ℓ′] = α/2

The first kind risk is usually set to α = 5%.
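
The three standard choices above can be sketched in R for a test statistic that is normally distributed under H₀. The helper name `critical_values` and its arguments are illustrative assumptions, not part of the course material.

```r
# Hypothetical helper: bounds of the rejection region for a test statistic
# assumed to follow N(mean0, sd0) under H0.
critical_values <- function(alpha, mean0 = 0, sd0 = 1,
                            side = c("left", "right", "two")) {
  side <- match.arg(side)
  switch(side,
    left  = qnorm(alpha, mean = mean0, sd = sd0),       # reject if T < l
    right = qnorm(1 - alpha, mean = mean0, sd = sd0),   # reject if T > l
    two   = qnorm(c(alpha / 2, 1 - alpha / 2),          # reject if T outside [l, l']
                  mean = mean0, sd = sd0)
  )
}

critical_values(0.05, side = "right")   # standard normal, right-sided
critical_values(0.05, side = "two")     # standard normal, two-sided
```

For the two-sided case the function returns both bounds ℓ and ℓ′, each tail carrying a probability of α/2.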

Example

Give the decision rule for Dr. House's test, at threshold 1%, and at threshold 5%.

The test is right-sided (Dr. House rejects values of X that are too high). The decision rule will be:

Reject H₀ ⇔ X > l ,

where ℙ_H₀[X > l]=α .

Under the null hypothesis H₀, the test statistic X follows the ${\cal N}(-1,0.3)$ distribution (mean −1, standard deviation 0.3). The bound of the rejection region is the quantile of that distribution at level 1 − α:

R> qnorm(0.95,mean=-1,sd=0.3)
[1] -0.5065439
R> qnorm(0.99,mean=-1,sd=0.3)
[1] -0.3020956

Interpretation Dr. House declares the patient at risk if the logarithm of his D-dimer concentration is higher than −0.5065 for a risk of 5%, or higher than −0.3021 for a risk of 1%.
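
The second kind risk and the power of Dr. House's test follow from the same bounds, this time using the distribution under H₁ (μ = 0, σ = 0.3). The following is a minimal sketch; the variable names are illustrative.

```r
alpha <- c(0.05, 0.01)
# Bounds of the rejection region, computed under H0: X ~ N(-1, 0.3)
bound <- qnorm(1 - alpha, mean = -1, sd = 0.3)
# Second kind risk: probability of not rejecting H0 when H1 (mu = 0) is true
beta <- pnorm(bound, mean = 0, sd = 0.3)
# Power: probability of rightfully rejecting H0
power <- 1 - beta
data.frame(alpha, bound, beta, power)
```

Lowering α from 5% to 1% moves the bound to the right, which increases β and therefore decreases the power.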

Give the decision rule for Dr. Cuddy's test, at threshold 1%, and at threshold 5%.

The test is left-sided (she will choose to reject lower values of X). The decision rule will be:

Reject H₀ ⇔ X < l′ ,
where ℙ_H₀[X < l′] = α .

Under hypothesis H₀, the test statistic X follows the ${\cal N}(0,0.3)$ distribution. The test being left-sided, the bound of the rejection region is the quantile of that distribution at level α:

R> qnorm(0.05,mean=0,sd=0.3)
[1] -0.4934561
R> qnorm(0.01,mean=0,sd=0.3)
[1] -0.6979044

Interpretation Dr. Cuddy declares the patient not at risk if the logarithm of his D-dimer concentration is lower than −0.4935 for a risk of 5%, or lower than −0.6979 for a risk of 1%.
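
The meaning of the first kind risk can be illustrated by simulation: among patients who really are at risk (H₀ of Dr. Cuddy is true), about 5% should be wrongly declared not at risk at threshold 5%. The sample size and the seed below are arbitrary assumptions made for this sketch.

```r
set.seed(1)                                # arbitrary seed, for reproducibility
n <- 100000                                # number of simulated patients (assumption)
x <- rnorm(n, mean = 0, sd = 0.3)          # patients at risk: Dr. Cuddy's H0 is true
bound <- qnorm(0.05, mean = 0, sd = 0.3)   # bound of the rejection region at threshold 5%
mean(x < bound)                            # proportion wrongly declared not at risk
```

The proportion obtained should be close to 0.05, up to simulation noise.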

Give the decision rule of the test for the null hypothesis H₀ : μ = −1 against the alternative one H₁ : μ ≠ −1, at threshold 5%.

This is a two-sided test. The decision rule will be:

Reject H₀ ⇔ X ∉ [l₁, l₂] ,
where ℙ_H₀[ X ∉ [l₁, l₂] ]=0.05 .

Under hypothesis H₀, the test statistic X follows the ${\cal N}(-1,0.3)$ distribution, hence $\frac{X-(-1)}{0.3}$ follows the ${\cal N}(0,1)$ distribution. The interval [l₁ ; l₂] must contain 95% of the values taken by a variable with ${\cal N}(-1,0.3)$ distribution. The interval centered at −1 is chosen:

R> qnorm(c(0.025,0.975),mean=-1,sd=0.3)
[1] -1.5879892 -0.4120108

Interpretation At threshold 0.05 the decision rule of the two-sided test is:
Reject H₀ ⇔ X ∉ [ − 1.588 ; − 0.412] .
The patient is said to have a logarithm of D-dimer concentration significantly different from −1 when his variable X is either lower than −1.588, or larger than −0.412.
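
The same bounds can be recovered from the standard normal distribution, using the standardization (X − (−1))/0.3, and the first kind risk of the resulting rule can be checked numerically. This is a sketch; the variable names are illustrative.

```r
# Quantiles of N(0, 1), then back-transformation to the N(-1, 0.3) scale
z <- qnorm(c(0.025, 0.975))
bounds <- -1 + 0.3 * z
bounds
# Probability, under H0, of falling outside [l1, l2]
risk <- pnorm(bounds[1], mean = -1, sd = 0.3) +
  pnorm(bounds[2], mean = -1, sd = 0.3, lower.tail = FALSE)
risk
```

The two tails each contribute 0.025, so the total probability of rejecting H₀ wrongly is 0.05.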

p-value

A last central notion in statistical tests is the p-value.

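The p-value is the probability, if H₀ is true, of observing a value of the test statistic at least as extreme as the one actually observed.

Interpretation: the p-value measures how compatible the data are with the null hypothesis: the smaller it is, the less plausible the observation is under H₀. It is also the smallest first kind risk at which the observed data lead to rejecting H₀.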

It is used as a test statistic, with the following decision rule:

If the p-value is smaller than the first kind risk, reject H₀
If the p-value is not smaller than the first kind risk, do not reject H₀.
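
The decision rule based on the p-value gives the same decisions as the rule based on the rejection region. A small sketch for Dr. House's right-sided test at α = 5%, checked on an arbitrary grid of possible observations (the grid is an assumption made for illustration):

```r
alpha <- 0.05
bound <- qnorm(1 - alpha, mean = -1, sd = 0.3)   # bound of Dr. House's rejection region
x <- seq(-2, 0, by = 0.1)                        # grid of possible observed values
# Right-tail probability under H0 for each value of the grid
p_value <- pnorm(x, mean = -1, sd = 0.3, lower.tail = FALSE)
# TRUE when both rules agree for every value of the grid
all((p_value < alpha) == (x > bound))
```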

Example

A patient has a value of X equal to −0.46. Find the p-value of Dr. House's test.

The p-value is the threshold at which −0.46 would be the bound of the rejection region. From the decision rule of Dr. House's test, since −0.46 lies between −0.5065 and −0.3021, the p-value must be between 0.01 and 0.05. It is the probability, under H₀, that the variable X is higher than −0.46, that is, the right tail at −0.46 of the ${\cal N}(-1,0.3)$ distribution:

R> pnorm(-0.46,mean=-1,sd=0.3,lower.tail=FALSE)
[1] 0.03593032

Interpretation For Dr. House's test, the p-value of a patient with a log D-dimer value of −0.46 is 0.0359. Since 0.0359 < 0.05, at risk 5% the patient is declared at risk by Dr. House.

A patient has a value of X equal to −0.46. Find the p-value for the two-sided test.

The p-value is the threshold for which the observed value would be a bound of the rejection region. That rejection region is centered at −1. The other bound should be −1 − ( − 0.46 − ( − 1)) = −1.54. The normal distribution 𝒩(−1, 0.3) being symmetric about −1, the probability of being smaller than −1.54 is equal to the probability of being larger than −0.46. The latter is

R> pnorm(-0.46,mean=-1,sd=0.3,lower.tail=FALSE)
[1] 0.03593032

This probability was already calculated in the previous question; it must be doubled.

R> 2*pnorm(-0.46,mean=-1,sd=0.3,lower.tail=FALSE)
[1] 0.07186064

Interpretation Because the normal distribution is symmetric, the p-value of the two-sided test is that of the one-sided test, multiplied by 2.

For the two-sided test, the p-value of a patient with a log D-dimer value of −0.46 is 0.0719. Since 0.0719 > 0.05, at risk 5% H₀ is not rejected: the patient is declared not at risk.
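
The p-value computations above can be gathered in one function. This is a sketch under the assumption that the test statistic is normal under H₀; the name `p_value_normal` and its arguments are hypothetical.

```r
# Hypothetical helper: p-value for an observation x of a test statistic
# assumed to follow N(mean0, sd0) under H0.
p_value_normal <- function(x, mean0, sd0, side = c("left", "right", "two")) {
  side <- match.arg(side)
  left  <- pnorm(x, mean = mean0, sd = sd0)                      # P_H0[X < x]
  right <- pnorm(x, mean = mean0, sd = sd0, lower.tail = FALSE)  # P_H0[X > x]
  switch(side,
    left  = left,
    right = right,
    two   = 2 * min(left, right)   # doubling relies on the symmetry of the normal
  )
}

p_value_normal(-0.46, mean0 = -1, sd0 = 0.3, side = "right")  # Dr. House's test
p_value_normal(-0.46, mean0 = -1, sd0 = 0.3, side = "two")    # two-sided test
p_value_normal(-0.46, mean0 = 0,  sd0 = 0.3, side = "left")   # Dr. Cuddy's test
```

The first two calls reproduce the values 0.0359 and 0.0719 found above. The last call is Dr. Cuddy's p-value for the same patient, which the examples above do not compute.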