Statistic on aiR: Analysis of variance: ANOVA, for multiple comparisons

Thursday, July 30, 2009

Analysis of variance: ANOVA, for multiple comparisons

Analysis of variance: ANOVA, for multiple comparisons

The ANOVA model can be used to compare the mean of several groups with each other, using a parametric method (assuming that the groups follow a Gaussian distribution).
Proceed with the following example:

The manager of a supermarket chain wants to see if the consumption in kilowatts of 4 stores between them are equal. He collects data at the end of each month for 6 months. The results are:

Store A: 65, 48, 66, 75, 70, 55
Store B: 64, 44, 70, 70, 68, 59
Store C: 60, 50, 65, 69, 69, 57
Store D: 62, 46, 68, 72, 67, 56

To proceed with the verification ANOVA, we must first verify the homoskedasticity (ie test for homogeneity of variances). The software R provides two tests: the Bartlett test, and the Fligner-Killeen test.

We begin with the Bartlett test.

First we create the 4 vectors:


a = c(65, 48, 66, 75, 70, 55)
b = c(64, 44, 70, 70, 68, 59)
c = c(60, 50, 65, 69, 69, 57)
d = c(62, 46, 68, 72, 67, 56)

Now we combine the 4 vectors in a single vector:


dati = c(a, b, c, d)

Now, on this vector in which are stored all the data, we create the 4 levels:


groups = factor(rep(letters[1:4], each = 6))

We can observe the contents of the vector groups simply by typing groups + [enter].

At this point we start the Bartlett test:


bartlett.test(dati, groups)

        Bartlett test of homogeneity of variances

data:  dati and groups 
Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228

The function gave us the value of the statistical tests (K squared), and the p-value. Can be argued that the variances are homogeneous since p-value > 0.05. Alternatively, we can compare the Bartlett's K-squared with the value of chi-square-tables; we compute that value, assigning the value of alpha and degrees of freedom at the qchisq function:


qchisq(0.950, 3)
[1] 7.814728

Chi-squared > Bartlett's K-squared: we accept the null hypothesis H0 (variances homogeneity)

We try now to check the homoskedasticity, with the Fligner-Killeen test.
The syntax is quite similar, and then proceed quickly.


a = c(65, 48, 66, 75, 70, 55)
b = c(64, 44, 70, 70, 68, 59)
c = c(60, 50, 65, 69, 69, 57)
d = c(62, 46, 68, 72, 67, 56)

dati = c(a, b, c, d)

groups = factor(rep(letters[1:4], each = 6))

fligner.test(dati, groups)

        Fligner-Killeen test of homogeneity of variances

data:  dati and groups 
Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878

The conclusions are similar to those for the test of Bartlett.

Having verified the homoskedasticity of the 4 groups, we can proceed with the ANOVA model.

First organize the values, fitting the model:


fit = lm(formula = dati ~ groups)

Then we analyze the ANOVA model:


anova (fit)

Analysis of Variance Table

Response: dati
          Df  Sum Sq Mean Sq F value Pr(>F)
groups     3    8.46    2.82  0.0327 0.9918
Residuals 20 1726.50   86.33

The output of the function is a classical ANOVA table with the following data:
Df = degree of freedom
Sum Sq = deviance (within groups, and residual)
Mean Sq = variance (within groups, and residual)
F value = the value of the Fisher statistic test, so computed (variance within groups) / (variance residual)
Pr(>F) = p-value

Since p-value > 0.05, we accept the null hypothesis H0: the four means are statistically equal. We can also compare the computed F-value with the tabulated F-value:

qf(0.950, 20, 3)
[1] 8.66019

Tabulated F-value > computed F-value: we accept the null hyptohesis.

13 comments:

Duleep SamuelJanuary 7, 2010 at 7:54 AM
Sir, I wanted to compute Fisher's least significant difference (LSD) method
I used the following code to read the table

library(RODBC)
> library(agricolae)
> channel <- odbcConnectExcel("G:/oneway.xls")
> mydata <- sqlFetch(channel,"Sheet1")
> library(reshape)
> mydata.m <- melt(mydata, id=c("Treatment"))
> fit <- aov(value~Treatment, data=mydata.m)
> summary(fit)
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 9 23494.8 2610.54 1589.5 < 2.2e-16 ***
Residuals 23 37.8 1.64
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

then I used the following code as given in the following sites

http://cran.r-project.org/web/packages/agricolae/index.html
http://tarwi.lamolina.edu.pe/~fmendiburu/

library(agricolae)
LSD.test()
comparison <- LSD.test(Treatment,Residuals,df,MSerror,group=F)

but I could not proceed any further,

How can I read ANOVA data from an Excel table and find out the LSD and graph it as done in Agricolae. This information will help me analyse field experimental data, thanks, Samuel, Bangalore, India
ReplyDelete
Replies
Todos LogosJanuary 7, 2010 at 9:06 AM
Hi Samuel,
why you want to read the ANOVA table from Excel, while you have all data in R, to perform the LSD.test?

With your data, you can proceed as follow:

df<-df.residual(fit)
MSerror<-deviance(fit)/df
library(agricolae)
comparison <- LSD.test(value, Treatment, df, MSerror, group=FALSE)
comparison
bar.err(comparison,std=TRUE,ylim=c(0,45),density=4,border="blue")

So you can perform your post-hoc test.
ReplyDelete
Replies
Todos LogosJanuary 7, 2010 at 9:14 AM
Sorry, before the comparison, you have to attach your data:

attach(mydata.m)

I hope this help you.
ReplyDelete
Replies
Duleep SamuelJanuary 7, 2010 at 10:22 AM
Sir, it works so nicely, thanks, the reason I wanted to use is this, I wanted to do my data entry in MS excel as I wanted multiple columns, each column as a replication.
Treatment Rep1 Rep2 Rep3
Tridemorph (0.1%) 33.89 32.45 35.67

Sir I have another request, in the graph I would be happy if each bar has the treatment name as its label, now I am seeing only 3 labels,
thanking you very much, yours sincerely,
Dr.D.K.Samuel, Plant Virologist, Ind Inst of Horticultural Research, Bangalore - 89, India
ReplyDelete
Replies
Todos LogosJanuary 7, 2010 at 1:31 PM
Hi Samuel,
A question: are you performing a comparison between Rep1, Rep2, Rep3; or are you performing a comparison between Treatments, considering only one Rep?
Because you cannot perform an ANOVA considering replication AND treatment simultaneously: it is a "repeated measures ANOVA", which in R has another syntax.

However the bar.err() function plots the name of the factor you are considering in default.

(maybe, if you want, send at my e-mail your Excel file with all data, so I can have a better idea of your problem)
ReplyDelete
Replies
UnknownFebruary 2, 2010 at 6:41 AM
Hi Sir ,

I have the same test dataset organized in two different formats. I tried to do single factor ANOVA on both the formats which resulted two different results for P value. Can you please help me to know where the problem is? I am afraid I did something wrong in declaring the factor variable.

Dataset 1

110 17 11 10
13 12 7 12
14 18 9 8
13 13 13 7

Dataset 2
10 1
13 1
14 1
13 1
17 2
12 2
18 2
13 2
11 3
7 3
9 3
13 3
10 4
12 4
8 4
7 4

I followed this procedure for Dataset 1 in R following the blog

> test1 <- read.table("c://anova/dataset1.txt")
> names(test1) = c ("low" , "mod","high","veryhigh")
> attach(test1)
> merge <- c(low , mod , high , veryhigh)
> groups <- factor(rep(letters[1:4], each = 4))
> fit <- lm(formula = merge ~ groups)
> anova(fit)

Analysis of Variance Table

Response: merge
Df Sum Sq Mean Sq F value Pr(>F)
groups 3 2119.2 706.4 1.199 0.3518
Residuals 12 7069.8 589.1

I followed this procedure for Dataset 2 in R follwing (http://www.agr.kuleuven.ac.be/vakken/statisticsbyR/ANOVAbyRr/multiplecomp.htm)

> test2 <- read.table ("c://anova/dataset2.txt")
> names(test2) = c("data", "Brand")
> attach(test2)
> Brand <- factor(Brand)
> fit <- lm(formula = data ~ Brand)
> anova(fit)

Analysis of Variance Table

Response: data
Df Sum Sq Mean Sq F value Pr(>F)
Brand 3 81.688 27.229 4.6846 0.02176 *
Residuals 12 69.750 5.812
---
This will help me to do ANOVA in the orginal dataset for my analysis.

Many Thanks ,

Rodney , Australia
Rodney
ReplyDelete
Replies
Todos LogosFebruary 2, 2010 at 9:00 AM
Hi Rodney,
the two methods for declaring groups are totally equivalent, and you write the right code.
There is only an error:

Group 1 of dataset 1 is:
110, 13, 14, 13
Group 1 of dataset 2 is:
10, 13, 14, 13

Of course the two ANOVA tables are different (different variables). If you correct the first variable, you will obtain the same ANOVA table.

:)
ReplyDelete
Replies
UnknownFebruary 2, 2010 at 10:03 AM
Hi Sir ,

It was really a silly mistake. Thank you very much for pointing it out.

Best Regards,

Rodney
ReplyDelete
Replies
AnonymousJanuary 4, 2011 at 1:16 PM
thank you
ReplyDelete
Replies
AnonymousNovember 23, 2012 at 5:42 PM
I have a doubt: I´ve run an anova of 2 factors with only one observation per cell. In this case one can´t test the homogeneity of variances due to lack of observations in each group.However, if you run a bartlett test, you get a result. What does it do in this case? I can´t understand
ReplyDelete
Replies
AnonymousAugust 19, 2014 at 11:05 AM
Hi Sir
Would it be possible to run the Flingner-Killeen test on the data from the four stores if one of the managers had forgotten to report their consumption of kilowatts one month? what would be the r-code for the groups be then?
Best regards
ReplyDelete
Replies
ANOVAOctober 13, 2016 at 7:38 PM
Thank you for this nice tutorial. Another good ressources on ANOVA can be found at http://www.sthda.com/english/wiki/one-way-anova-test-in-r
ReplyDelete
Replies
AnonymousSeptember 7, 2024 at 4:33 PM
Thanks and I have a tremendous supply: Where To Start House Remodeling split level kitchen remodel
ReplyDelete
Replies

Add comment

Statistic on aiR

Thursday, July 30, 2009

Analysis of variance: ANOVA, for multiple comparisons

13 comments:

Google Ads

Tag Cloud

Blog's Info & Utilities

Contact me at:

Blog Archive

Last Comments