Tuesday, July 28, 2009

Wilcoxon-Mann-Whitney rank sum test (or U test)

This test compares the locations (informally, the averages) of two independent groups of samples when we cannot assume a Gaussian distribution; it is also known as the Mann-Whitney U test.

Suppose you want to check whether two football teams concede the same number of goals on average over the years. Below are the numbers of goals conceded by each team in 6 games.
Team A: 6, 8, 2, 4, 4, 5
Team B: 7, 10, 4, 3, 5, 6


The Wilcoxon-Mann-Whitney test (or Wilcoxon rank sum test, or Mann-Whitney U test) is used to compare two groups that do not follow a normal distribution: it is a non-parametric test. It is the non-parametric counterpart of the t test for independent samples.
Let's see how to solve the problem with R:


a = c(6, 8, 2, 4, 4, 5)
b = c(7, 10, 4, 3, 5, 6)

wilcox.test(a,b, correct=FALSE)

Wilcoxon rank sum test

data: a and b
W = 14, p-value = 0.5174
alternative hypothesis: true location shift is not equal to 0


The p-value is greater than 0.05, so we cannot reject the null hypothesis H0 that the two groups have the same location (loosely, the same mean).
If you run wilcox.test(b, a, correct = FALSE), the p-value is, as expected, the same:
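One detail worth noting: the data contain ties, so R cannot compute an exact p-value and falls back to a normal approximation (with a warning). The sketch below requests that approximation explicitly and makes the decision rule visible. The names res, alpha, p_value and reject_h0 are my own, and the 0.05 level is the one assumed in this post.

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)

# exact = FALSE asks for the normal approximation up front, which avoids
# the warning about exact p-values in the presence of ties
res <- wilcox.test(a, b, exact = FALSE, correct = FALSE)

alpha <- 0.05                  # assumed significance level
p_value <- res$p.value         # about 0.5174, as in the output above
reject_h0 <- p_value < alpha   # FALSE: no evidence of a difference

print(round(p_value, 4))
print(reject_h0)
```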


a = c(6, 8, 2, 4, 4, 5)
b = c(7, 10, 4, 3, 5, 6)

wilcox.test(b,a, correct=FALSE)

Wilcoxon rank sum test

data: b and a
W = 22, p-value = 0.5174
alternative hypothesis: true location shift is not equal to 0
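The two W values differ, but they are tied together: their sum equals the number of (a, b) pairs, which is why swapping the arguments cannot change a two-sided p-value. A minimal check, with w_ab and w_ba as illustrative names of my own:

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)

# Extract the W statistic for both argument orders
w_ab <- unname(wilcox.test(a, b, exact = FALSE, correct = FALSE)$statistic)
w_ba <- unname(wilcox.test(b, a, exact = FALSE, correct = FALSE)$statistic)

# 14 + 22 = 36 = 6 * 6, the number of (a, b) pairs
pairs_total <- length(a) * length(b)
print(isTRUE(all.equal(w_ab + w_ba, pairs_total)))
```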


The value W is computed as follows:


sum.rank.a = sum(rank(c(a,b))[1:6]) #sum of ranks assigned to the group a
W = sum.rank.a - (length(a)*(length(a)+1)) / 2
W
[1] 14

sum.rank.b = sum(rank(c(a,b))[7:12]) #sum of ranks assigned to the group b
W = sum.rank.b - (length(b)*(length(b)+1)) / 2
W
[1] 22
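From W you can also rebuild the p-value by hand. The sketch below uses the normal approximation with a correction for ties and no continuity correction, which is my understanding of what wilcox.test does for these data with correct=FALSE; the variable names are my own.

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)
n_a <- length(a)
n_b <- length(b)
n <- n_a + n_b

r <- rank(c(a, b))                               # mid-ranks for tied values
w <- sum(r[seq_len(n_a)]) - n_a * (n_a + 1) / 2  # W for group a (14)

ties <- table(r)                                 # size of each tie group
mu <- n_a * n_b / 2                              # mean of W under H0
sigma <- sqrt(n_a * n_b / 12 *
              ((n + 1) - sum(ties^3 - ties) / (n * (n - 1))))

z <- (w - mu) / sigma
p <- 2 * pnorm(-abs(z))                          # two-sided p-value
print(round(p, 4))
```

The result should agree with the p-value reported by wilcox.test(a, b, correct=FALSE).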


Finally, we can compare our rank sums with the critical values tabulated for the Wilcoxon rank sum test for independent samples. For two groups of 6 samples each, the tabulated interval (two-sided test, 5% level) is (26, 52), while the rank sums of our samples are:


sum(rank(c(a,b))[1:6]) #sum of ranks assigned to the group a
[1] 35
sum(rank(c(a,b))[7:12]) #sum of ranks assigned to the group b
[1] 43


Since the computed rank sums (35, 43) lie within the tabulated interval (26, 52), we again do not reject the hypothesis H0 of equality of means.
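The tabulated limits 26 and 52 can be recovered in R from the exact null distribution of W (which ignores ties). This is a sketch under the assumption that the table uses a two-sided 5% level and rejects H0 when a rank sum reaches or passes one of the limits; w_low, w_high and offset are names of my own.

```r
n_a <- 6
n_b <- 6
alpha <- 0.05   # assumed two-sided significance level

# pwilcox() works on the W scale (rank sum minus n_a * (n_a + 1) / 2).
# Find the largest W whose lower tail probability is still <= alpha / 2.
w_values <- 0:(n_a * n_b)
w_low <- max(w_values[pwilcox(w_values, n_a, n_b) <= alpha / 2])
w_high <- n_a * n_b - w_low   # upper limit, by symmetry of the distribution

# Shift back to the rank-sum scale used by the printed tables
offset <- n_a * (n_a + 1) / 2
rank_sum_limits <- c(w_low, w_high) + offset
print(rank_sum_limits)   # 26 52
```

On the W scale the same limits are 5 and 31, and the values 14 and 22 computed earlier fall between them, so both ways of reading the table lead to the same conclusion.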

1 comment:

  1. Where do the 26 and 52 come from? Is that a mistake and you mean 14 and 22 you calculated above?
