Wednesday, July 22, 2009

One sample Z-test

Comparison of the sample mean with a known population mean and standard deviation.

Suppose that 10 volunteers have taken an intelligence test; their scores are listed below. The mean score on the same test for the entire population is 75. You want to check whether there is a statistically significant difference (at a significance level of 0.05, i.e. a confidence level of 95%) between the mean of the sample and the mean of the population, assuming that the population variance is known and equal to 18.
65, 78, 88, 55, 48, 95, 66, 57, 79, 81

To solve this problem we need a one-sample Z-test. Base R does not provide a function for it, so we can write our own.
Recalling the formula for calculating the value of z, we will write this function:


z.test = function(a, mu, var){
   zeta = (mean(a) - mu) / (sqrt(var / length(a)))
   return(zeta)
}

We have thus built the function z.test: it takes as input a vector of values (a), the population mean to compare against (mu), and the population variance (var), and it returns the value of zeta. Now apply the function to our problem.

a = c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)

z.test(a, 75, 18)
[1] -2.832353
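
The same number can be checked by hand. As a sketch, write \bar{x} for the sample mean, \mu for the population mean, \sigma^2 for the known population variance and n for the sample size (this notation is introduced here for illustration; the post itself only uses the R names a, mu and var). The ten scores sum to 712, so \bar{x} = 71.2, and:

```latex
\[
z \;=\; \frac{\bar{x} - \mu}{\sqrt{\sigma^{2}/n}}
  \;=\; \frac{71.2 - 75}{\sqrt{18/10}}
  \;\approx\; \frac{-3.8}{1.3416}
  \;\approx\; -2.83
\]
```

Since \sqrt{\sigma^{2}/n} = \sigma/\sqrt{n}, the denominator is the usual standard error of the mean.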

The value of zeta is -2.83; its absolute value, 2.83, is greater than the critical value Zcv = 1.96 for alpha = 0.05 (two-tailed test). We therefore conclude that the mean of our sample is significantly different from the mean of the population.
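
The comparison with the critical value can also be done in R instead of reading 1.96 from a table. The following is only a minimal sketch: it uses the base R functions qnorm and pnorm, and the names sigma2, alpha, z_crit, p_value and reject_h0 are introduced here for illustration and do not appear in the original post.

```r
# Data and parameters from the example above
a <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
mu <- 75        # population mean
sigma2 <- 18    # population variance, assumed known

# Same statistic that z.test(a, mu, sigma2) computes
z <- (mean(a) - mu) / sqrt(sigma2 / length(a))

alpha <- 0.05
z_crit <- qnorm(1 - alpha / 2)   # two-tailed critical value, about 1.96
p_value <- 2 * pnorm(-abs(z))    # two-tailed p-value

# Reject the null hypothesis when |z| exceeds the critical value
reject_h0 <- abs(z) > z_crit

print(c(z = z, z_crit = z_crit, p_value = p_value))
print(reject_h0)
```

Comparing abs(z) with z_crit and comparing p_value with alpha are two views of the same decision, so they should always agree.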

1 comment:

  1. Hey buddy,
    you made a mistake in your z.test function definition!
    You are not allowed to take the sqrt of the fraction (var/length(n)). I see that you have to take the sqrt of the variance, but just assume the standard deviation is needed for your function and not the variance; then you'll get something like (x.mean - mu) / (SE / sqrt(n)), and in your case it will be (x.mean - mu) / (sqrt(var) / sqrt(n)). The reciprocal has to be applied first; otherwise you get a different, wrong result!

    Here is some r-script to visualize it:
    constant = 1 # a substitute for x.mean - mu
    a <- 100 # could be the length
    b <- 25 # let's assume this is the variance
    (constant) / (sqrt(b) / sqrt(a)) # this is what you wanna do
    sqrt(a) * ( constant) / sqrt(b) # this is how it is right
    (constant) / sqrt(a/b) # that is how you do it