One Sample Tests
We focus on one sample tests.
Tests on a Gaussian sample
Consider a random sample (X1, …, Xn) from a distribution with mean μ and standard deviation σ. Recall that
- The empirical mean is $\bar X = \frac{X_1 + \ldots + X_n}{n}$
- The empirical variance is $S^2 = \frac{n}{n-1} \left(\frac{X_1^2 +\ldots + X_n^2}{n} - \bar X^2\right)$.
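In R, these quantities are computed by the functions mean, var and sd (a minimal sketch on a simulated sample; var and sd use the denominator n−1, so they match S²):
R> X <- rnorm(20, mean=-1, sd=0.3)   # simulated sample, for illustration only
R> mean(X)                           # empirical mean
R> var(X)                            # empirical variance S^2
R> sd(X)                             # empirical standard deviation S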
Test of the mean
A first test is on the mean of the sample.
Example. For an adult, the logarithm of the D-dimer concentration, denoted by X, is modeled by a normal random variable with mean μ and standard deviation σ. The variable X is an indicator for the risk of thrombosis: it is considered that for healthy individuals, μ is −1, whereas for individuals at risk μ is 0. The influence of olive oil on thrombosis risk must be evaluated. A group of 13 patients, previously considered as being at risk, had an olive oil enriched diet. After the diet, their value of X was measured, and this gave an empirical mean of −0.15.
The doctor would like to decide if the olive oil diet has improved the D-dimer concentration.
The decision rule depends on H1. It consists of computing the bounds beyond which we reject H0. The bounds also depend on the risk of the test (the risk of the first kind, i.e., the type I error probability).
Back to the example. We assume that the sample of 13 patients is a Gaussian sample. The standard deviation σ is assumed to be known and equal to 0.3. We want to test
H0 : μ = 0 versus H1 : μ = −1
The test statistic is
$$ T = \sqrt{13} \left(\frac{\bar X - 0}{0.3}\right)$$
According to the null hypothesis H0, T follows the normal distribution 𝒩(0, 1). The hypothesis H0 is rejected when T takes low values. At risk 5%, the bound is
R> qnorm(0.05,0,1)
[1] -1.644854
The decision rule is: reject H0 if T < −1.6449.
For $\bar X= -0.15$, the test statistic takes the value
R> n<-13
R> Xbar<--0.15
R> sig<-0.3
R> mu0<-0
R> t<-sqrt(n)*(Xbar-mu0)/sig
R> t
[1] -1.802776
Since T ≈ −1.80 < −1.6449, H0 is rejected at risk 5%.
The previous case assumes that the standard deviation σ is known, which is usually not the case in practice. The test of the mean is adapted to an unknown variance as follows: the decision rules are as before, but the bounds are computed from the Student distribution instead of the normal distribution.
Back to the example. We assume that the standard deviation σ is unknown and estimated by S = 0.3. We want to test
H0 : μ = 0 versus H1 : μ = −1
The test statistic is
$$ T = \sqrt{13} \left(\frac{\bar X - 0}{S}\right), \quad \text{with } S = 0.3$$
According to the null hypothesis H0, T follows a Student distribution with 12 degrees of freedom. The hypothesis H0 is rejected when T takes low values. At risk 5%, the bound is
R> qt(0.05,12)
[1] -1.782288
The decision rule is: reject H0 if T < −1.7823.
For $\bar X= -0.15$, the test statistic takes the value
R> n<-13
R> Xbar<--0.15
R> s<-0.3
R> mu0<-0
R> t<-sqrt(n)*(Xbar-mu0)/s
R> t
[1] -1.802776
Since T ≈ −1.80 < −1.7823, H0 is again rejected at risk 5%.
The previous example uses the estimates of the mean and the standard deviation, together with the sample size. In practice, all the values of the sample are usually available. In that case, the user can compute the mean and the standard deviation and apply the previous instructions, or directly use the function t.test.
The function computes the test statistic of Student's t-test comparing mean(X) to mu, and the corresponding p-value according to the alternative. The null hypothesis H0 is "the mean is equal to mu". The alternative is one of "two.sided" (default), "less", "greater"; they are understood as:
- two.sided: the mean is not equal to mu,
- less: the mean is less than mu,
- greater: the mean is greater than mu.
Example To test if the mean of the bmi in the bosson sample is equal to 23, we run the following code
R> B <- read.table("data/bosson.csv", header=TRUE, sep=";")
R> b<-B$bmi
R> t.test(b, mu=23)

	One Sample t-test

data:  b
t = -0.80562, df = 208, p-value = 0.4214
alternative hypothesis: true mean is not equal to 23
95 percent confidence interval:
 22.17514 23.34628
sample estimates:
mean of x
 22.76071
The two hypotheses of this t-test are H0 : μ = 23 and H1 : μ ≠ 23. The output reads as follows:
- t = -0.80562 is the value of the test statistic,
- df = 208 is the number of degrees of freedom of the Student distribution (equal to n-1),
- p-value = 0.4214 is the p-value of the t-test, corresponding to the two-sided test (by default, the alternative is two-sided),
- alternative… recalls the choice of the alternative,
- 95 percent confidence interval gives the 95% confidence interval of the mean, assuming the standard deviation unknown,
- the last value, 22.76071, is the estimate of the mean of the sample.
We can also apply a one-sided test by changing the alternative. To test the alternative H1 : μ < 23, run
R> t.test(b, mu=23, alternative="less")

	One Sample t-test

data:  b
t = -0.80562, df = 208, p-value = 0.2107
alternative hypothesis: true mean is less than 23
95 percent confidence interval:
     -Inf 23.25146
sample estimates:
mean of x
 22.76071
Note that even if the empirical mean (22.76071) is less than 23, the difference is not significant, and we cannot conclude, at a risk of 5%, that the mean is less than 23. Several reasons could be involved: the variability in the sample is too large (the standard error of the empirical mean is large) or the size of the sample is not large enough (the empirical mean is not estimated with enough precision).
Test of the standard deviation or the variance
One can also test the value of the standard deviation or the variance of a Gaussian sample.
Test of the mean for large samples
Finally, a test of the mean exists for large samples; thanks to the Central Limit Theorem, we do not need the assumption that the sample is Gaussian.
With R, the test of the mean for large samples can be applied with the function t.test (as above), since the normal distribution is very close to a Student distribution with a large number of degrees of freedom.
Testing particular values
A particular value of a mean, a standard deviation, a proportion, or a quantile can be tested using the functions t.test, sd.test, or prop.test.
Value of a mean
A value of the mean can be tested using the function t.test. This can be applied when the sample is Gaussian or when the sample is large enough. The use in R is the same as explained above for the Gaussian case.
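For instance (a minimal sketch; X stands for the observed sample and mu0 for the tested value, both placeholders):
R> t.test(X, mu=mu0)                        # two-sided test of H0: mu = mu0
R> t.test(X, mu=mu0, alternative="less")    # one-sided test against H1: mu < mu0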
Value of a standard deviation
A value of the standard deviation can be tested using the function sd.test. This can be applied when the sample is Gaussian or when the sample is large enough.
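Note that sd.test does not appear to be part of base R; it may be provided by an add-on package or by course material. As an alternative, here is a minimal sketch of the classical chi-squared test of H0 : σ = σ0 for a Gaussian sample (the data are simulated for illustration only):
R> X <- rnorm(25, mean=0, sd=0.35)                 # simulated sample, for illustration only
R> sigma0 <- 0.3                                   # hypothesized standard deviation
R> stat <- (length(X)-1)*var(X)/sigma0^2           # follows a chi-squared(n-1) under H0
R> pval <- 2*min(pchisq(stat, df=length(X)-1),
+                pchisq(stat, df=length(X)-1, lower.tail=FALSE))   # two-sided p-value
R> c(statistic=stat, p.value=pval)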
Value of a proportion or a quantile
When the variable of interest is binary (two possible modalities), we are interested in comparing the proportion of the first modality with a theoretical proportion p0.
Example For a certain disease, there exists a treatment that cures 70% of the cases. A laboratory proposes a new treatment claiming that it is better than the previous one. Out of 100 patients having received the new treatment, 74 of them have been cured. The expert would like to decide whether the new treatment should be authorized.
Back to the example The hypotheses we want to test are H0 : p = 0.7 versus H1 : p > 0.7.
The test is applied with the function prop.test. The null hypothesis H0 is: "the proportion of x out of n is equal to p". The alternative is one of "two.sided" (default), "less", "greater"; they are understood as:
- two.sided: the proportion x/n is not equal to p,
- less: the proportion x/n is less than p,
- greater: the proportion x/n is greater than p.
Back to the example The one-sided test is applied running
R> prop.test(x=74, n=100, p=0.7, alternative="greater")
Note that when the whole binary sample X is available (and not only the count of "successes"), the instruction is prop.test(sum(X), length(X), p, alternative).
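For instance, with a simulated 0/1 sample (a sketch; the data are illustrative only):
R> X <- rbinom(100, size=1, prob=0.74)   # hypothetical binary sample coded 0/1
R> prop.test(sum(X), length(X), p=0.7, alternative="greater")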
Goodness-of-fit tests
A goodness-of-fit test answers the question: could the sample have been drawn at random from a particular distribution?
Example Consider a diploid population with allele frequencies 0.4 for A, 0.6 for a. At the Hardy-Weinberg equilibrium, the probabilities of the three genotypes AA, Aa, aa are (0.16, 0.48, 0.36). The observed frequency table of the three genotypes is (1600, 4900, 3500). Is the theoretical model plausible?
Chi-squared test
For a discrete variable, the goodness-of-fit is measured by a distance between the relative frequencies of the variable, and the probabilities of the target distribution.
Under the alternative, the distance should be large, so that the p-value is computed as the right-tail probability of the chi-squared distribution at the distance.
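The corresponding command has the following form (a sketch; X denotes the raw sample and p the vector of theoretical probabilities, both placeholders):
R> chisq.test(table(X), p=p)   # table(X): observed absolute frequencies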
In that command,
- table(X) is a table of absolute frequencies (any vector of integers can be tested),
- the probability distribution p is a vector of probabilities, with the same length as table(X).
The answer is “the fit is good”, if the p-value is large (above the risk).
If some frequencies are too small, a warning message may be issued. If one or more parameters have been estimated from the data, the test statistic should be extracted and the p-value recomputed as its right-tail probability under the chi-squared distribution with a reduced number of degrees of freedom (one fewer per estimated parameter).
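A minimal sketch of this adjustment, assuming res holds the result of chisq.test and that k parameters were estimated (obs, p and k are placeholders):
R> res <- chisq.test(obs, p=p)                                    # obs: observed frequencies
R> k <- 1                                                         # number of estimated parameters (assumption)
R> pchisq(res$statistic, df=res$parameter - k, lower.tail=FALSE)  # adjusted right-tail p-value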
Back to the example The frequency table of the three genotypes AA, Aa, aa is (1600, 4900, 3500). The theoretical probabilities are (0.16, 0.48, 0.36). The chi-squared test is applied running
R> chisq.test(c(1600, 4900, 3500), p=c(0.16, 0.48, 0.36))

	Chi-squared test for given probabilities

data:  c(1600, 4900, 3500)
X-squared = 4.8611, df = 2, p-value = 0.08799
The outputs are
- data: the observed data,
- X-squared: the value of the chi-squared distance,
- df: the degrees of freedom of the chi-squared distribution,
- p-value: the p-value.
Kolmogorov-Smirnov test
For a continuous variable, the goodness-of-fit is measured by a distance between the empirical cumulative distribution function (ecdf) of the variable, and the cdf of the target distribution.
The answer is “the fit is good”, if the p-value is large (above the risk). The variable X should not have ties (equal values). If some values are equal, a warning message is issued, indicating that the p-value is not quite as precise. This does not affect the validity of the result.
The null hypothesis H0 is: "the distribution of the sample is the theoretical cdf". The alternative is one of "two.sided" (default), "less", "greater"; they are understood as:
- two.sided: the ecdf of the sample is different from the theoretical cdf,
- less: the ecdf of the sample is under the theoretical cdf (the values of the sample are larger than those of the theoretical distribution),
- greater: the ecdf of the sample is above the theoretical cdf (the values of the sample are smaller than those of the theoretical distribution).
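A generic call has the following form (a sketch; X is the sample and the parameters of the target distribution are passed as additional named arguments, here mu0 and sigma0 as placeholders):
R> ks.test(X, "pnorm", mean=mu0, sd=sigma0)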
Example Let us study the distribution of the variable aneurysm in the data set bosson. Let us plot the ecdf of aneurysm together with the cdf of a normal distribution.
R> A<-B$aneurysm
R> plot(ecdf(A))
R> curve(pnorm(x, mean(A), sd(A)), col="red", add=TRUE)
(Figure: ecdf of aneurysm (black) with the fitted normal cdf (red).)
The red curve is the cdf of a normal distribution with parameters μ = 47.57 and σ = 13.76. The ecdf (black curve) is not always close to the theoretical cdf. The aneurysm variable is probably not normally distributed.
To test whether the distribution of aneurysm is the normal distribution with parameters μ = 47.57 and σ = 13.76 or not, run
R> ks.test(A, "pnorm", c(47.57,13.76))

	One-sample Kolmogorov-Smirnov test

data:  A
D = 0.99522, p-value < 2.2e-16
alternative hypothesis: two-sided
The outputs are
- data: the observed data,
- D: the value of the Kolmogorov-Smirnov distance,
- p-value: the p-value.
The histogram of the variable aneurysm reveals a right-skewed distribution, close to a log-normal distribution. Let us log-transform the data
R> LA<-log(A)
and plot the ecdf (black curve) of the log-aneurysm and the cdf (red curve) of a normal distribution with parameters 3.82 and 0.28 (roughly the mean and the standard deviation of the log-transformed data).
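A sketch of the corresponding commands (assuming LA as defined above):
R> plot(ecdf(LA))
R> curve(pnorm(x, 3.82, 0.28), col="red", add=TRUE)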
The two curves are quite close. Let us test whether the distribution of LA is the normal distribution with parameters 3.82 and 0.28:
R> ks.test(LA, "pnorm", c(3.82,0.28))

	One-sample Kolmogorov-Smirnov test

data:  LA
D = 0.99237, p-value < 2.2e-16
alternative hypothesis: two-sided
Remark: The chosen parameters may not be the right ones for H0 to be accepted. Below, we test the normality family (which is different from testing a specific normal distribution with given parameters) and see that the normality assumption is then accepted.
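Note also that in base R the parameters of the target distribution are usually passed to ks.test as separate arguments; a single vector such as c(3.82, 0.28) is matched to the mean argument of pnorm, with sd kept at its default value of 1, which inflates the distance D. A call of the following form (a sketch) passes the parameters explicitly:
R> ks.test(LA, "pnorm", mean=3.82, sd=0.28)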
Normality test
Testing whether a variable is normally distributed is different from testing whether a particular normal distribution with given parameters fits the variable.
Back to the example Test of the normality of the log-aneurysm in the bosson dataset:
R> shapiro.test(LA)

	Shapiro-Wilk normality test

data:  LA
W = 0.9968, p-value = 0.9471
The outputs are
- data: the observed data,
- W: the value of the Shapiro-Wilk statistic,
- p-value: the p-value.
Since the p-value (0.9471) is large, the normality of LA is accepted.