Friday, July 31, 2009

Kruskal-Wallis one-way analysis of variance

If you need to compare more than two groups but cannot run an ANOVA because the groups do not follow a normal distribution, you can use the Kruskal-Wallis test. It is a rank-based (non-parametric) test, so it does not require the assumption that the groups follow a Gaussian distribution.
It extends the Wilcoxon rank-sum test (Mann-Whitney U test) for 2 samples to more than two groups.
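For exactly two groups the relationship is exact: the Kruskal-Wallis statistic equals the square of the z statistic of the Wilcoxon rank-sum test in its normal approximation (both with the tie correction), so the two p-values coincide. A minimal check using groups A and B from the example below; exact = FALSE and correct = FALSE are set in wilcox.test() so that it uses the same large-sample approximation, without continuity correction:

```r
# Groups A and B from the example in this post
a <- c(1, 5, 8, 17, 16)
b <- c(2, 16, 5, 7, 4)

# Kruskal-Wallis with only two groups: chi-square approximation with 1 df
kw <- kruskal.test(list(a, b))

# Wilcoxon rank-sum test: normal approximation with tie correction,
# no exact p-value and no continuity correction
wx <- wilcox.test(a, b, exact = FALSE, correct = FALSE)

# The two p-values agree up to floating-point error
print(c(kruskal = kw$p.value, wilcoxon = wx$p.value))
print(all.equal(kw$p.value, wx$p.value))  # TRUE
```

With more than two groups there is no Wilcoxon counterpart, which is where kruskal.test() is needed.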

Suppose you want to check whether the following 4 groups of values differ significantly:
Group A: 1, 5, 8, 17, 16
Group B: 2, 16, 5, 7, 4
Group C: 1, 1, 3, 7, 9
Group D: 2, 15, 2, 9, 7


To run the Kruskal-Wallis test, enter the data and then collect the vectors in a list:


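# one numeric vector per group
a = c(1, 5, 8, 17, 16)
b = c(2, 16, 5, 7, 4)
c = c(1, 1, 3, 7, 9)   # calls to c() still work, but another name would avoid confusion
d = c(2, 15, 2, 9, 7)

# kruskal.test() accepts a list with one element per group
dati = list(g1=a, g2=b, g3=c, g4=d)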


Now we can apply the kruskal.test() function:


kruskal.test(dati)

Kruskal-Wallis rank sum test

data: dati
Kruskal-Wallis chi-squared = 1.9217, df = 3, p-value = 0.5888
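kruskal.test() also accepts data in "long" format (one column of values, one column of group labels) through its formula interface. A sketch with the same data; the names long_data, value and group are illustrative choices, not part of the original post. It should report the same statistic, df and p-value as the list version above:

```r
# Same data in long format: one row per observation
long_data <- data.frame(
  value = c(1, 5, 8, 17, 16,   # group A
            2, 16, 5, 7, 4,    # group B
            1, 1, 3, 7, 9,     # group C
            2, 15, 2, 9, 7),   # group D
  group = factor(rep(c("A", "B", "C", "D"), each = 5))
)

# Formula interface: response ~ grouping factor
res <- kruskal.test(value ~ group, data = long_data)
print(res)
```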


The test statistic is 1.9217; this value already includes the correction for ties (repeated values). The p-value (0.5888) is greater than 0.05 and, equivalently, the test statistic is lower than the critical value of the chi-square distribution with 3 degrees of freedom at the 5% level:


qchisq(0.950, 3)
[1] 7.814728
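To see what the tie correction does, the statistic can be computed by hand: rank all 20 observations together (tied values get their average rank), compute H = 12/(N(N+1)) * sum(R_i^2/n_i) - 3(N+1), where R_i and n_i are the rank sum and size of group i, then divide H by 1 - sum(t^3 - t)/(N^3 - N), where t is the size of each set of tied values. The p-value is the upper tail of a chi-square distribution with k - 1 degrees of freedom. A minimal sketch:

```r
# Data from the post, one vector per group
dati <- list(g1 = c(1, 5, 8, 17, 16),
             g2 = c(2, 16, 5, 7, 4),
             g3 = c(1, 1, 3, 7, 9),
             g4 = c(2, 15, 2, 9, 7))

x <- unlist(dati)                              # all observations
g <- factor(rep(names(dati), lengths(dati)))   # group label of each observation
N <- length(x)

r <- rank(x)               # tied values receive the average rank
R <- tapply(r, g, sum)     # rank sum per group
n <- tapply(r, g, length)  # group sizes

# Kruskal-Wallis statistic without the tie correction
H_raw <- 12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)

# Tie correction: ties holds the number of times each distinct value occurs
ties <- table(x)
H <- H_raw / (1 - sum(ties^3 - ties) / (N^3 - N))

# Upper-tail chi-square p-value with k - 1 degrees of freedom
p <- pchisq(H, df = length(dati) - 1, lower.tail = FALSE)

print(R)
print(c(H_raw = H_raw, H = H, p = p))
```

With this data the rank sums are 64, 53, 38.5 and 54.5, the uncorrected statistic is 1.9, and dividing by the tie correction (about 0.9887) gives 1.9217, the value reported by kruskal.test().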


The conclusion is therefore that I cannot reject the null hypothesis H0: the data provide no evidence of a difference between the 4 groups.
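The decision can also be written out explicitly. In the sketch below, kw_decision() is a hypothetical helper (not part of R) that applies both criteria used above, the p-value against alpha and the statistic against the chi-square critical value; the two rules always agree. Strictly speaking, a p-value above alpha means H0 is not rejected, which is weaker than showing that the groups are equal.

```r
# Hypothetical helper: run kruskal.test() and apply the decision rule at level alpha
kw_decision <- function(groups, alpha = 0.05) {
  res  <- kruskal.test(groups)
  stat <- unname(res$statistic)
  df   <- unname(res$parameter)
  crit <- qchisq(1 - alpha, df)   # critical value: 7.814728 for df = 3, alpha = 0.05
  list(
    statistic = stat,
    critical  = crit,
    p.value   = res$p.value,
    reject.H0 = res$p.value < alpha,
    # "p < alpha" and "statistic > critical value" are equivalent rejection rules
    agree     = (res$p.value < alpha) == (stat > crit)
  )
}

dati <- list(g1 = c(1, 5, 8, 17, 16), g2 = c(2, 16, 5, 7, 4),
             g3 = c(1, 1, 3, 7, 9),   g4 = c(2, 15, 2, 9, 7))
out <- kw_decision(dati)
str(out)
```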

2 comments:

  1. Thanks, helpful!

  2. Although you're not aiming to discuss the Kruskal-Wallis test itself, your conclusion seems wrong to me. By not rejecting H0, all you can say is that you've found no evidence of the groups being different. Forget about the means.

    Null hypothesis

    The null hypothesis is that the samples come from populations such that the probability that a random observation from one group is greater than a random observation from another group is 0.5.

    The Kruskal–Wallis test does not test the null hypothesis that the populations have identical means, which is the null hypothesis of a one-way anova. It is therefore incorrect to say something like "The mean amount of substance X was significantly higher in muscle tissue than in liver (Kruskal–Wallis test, P=0.012)." It also does not test the null hypothesis that the populations have equal medians, although you will see this error many places, including some statistics textbooks.

    from: http://udel.edu/~mcdonald/statkruskalwallis.html
