The ANOVA model can be used to compare the mean of several groups with each other, using a parametric method (assuming that the groups follow a Gaussian distribution).
Proceed with the following example:
The manager of a supermarket chain wants to see if the consumption in kilowatts of 4 stores between them are equal. He collects data at the end of each month for 6 months. The results are:
Store A: 65, 48, 66, 75, 70, 55
Store B: 64, 44, 70, 70, 68, 59
Store C: 60, 50, 65, 69, 69, 57
Store D: 62, 46, 68, 72, 67, 56
Store B: 64, 44, 70, 70, 68, 59
Store C: 60, 50, 65, 69, 69, 57
Store D: 62, 46, 68, 72, 67, 56
To proceed with the verification ANOVA, we must first verify the homoskedasticity (ie test for homogeneity of variances). The software R provides two tests: the Bartlett test, and the Fligner-Killeen test.
We begin with the Bartlett test.
First we create the 4 vectors:
a = c(65, 48, 66, 75, 70, 55)
b = c(64, 44, 70, 70, 68, 59)
c = c(60, 50, 65, 69, 69, 57)
d = c(62, 46, 68, 72, 67, 56)
Now we combine the 4 vectors in a single vector:
dati = c(a, b, c, d)
Now, on this vector in which are stored all the data, we create the 4 levels:
groups = factor(rep(letters[1:4], each = 6))
We can observe the contents of the vector
groups
simply by typing groups + [enter]
.At this point we start the Bartlett test:
bartlett.test(dati, groups)
Bartlett test of homogeneity of variances
data: dati and groups
Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228
The function gave us the value of the statistical tests (K squared), and the p-value. Can be argued that the variances are homogeneous since p-value > 0.05. Alternatively, we can compare the Bartlett's K-squared with the value of chi-square-tables; we compute that value, assigning the value of alpha and degrees of freedom at the
qchisq
function:
qchisq(0.950, 3)
[1] 7.814728
Chi-squared > Bartlett's K-squared: we accept the null hypothesis H0 (variances homogeneity)
We try now to check the homoskedasticity, with the Fligner-Killeen test.
The syntax is quite similar, and then proceed quickly.
a = c(65, 48, 66, 75, 70, 55)
b = c(64, 44, 70, 70, 68, 59)
c = c(60, 50, 65, 69, 69, 57)
d = c(62, 46, 68, 72, 67, 56)
dati = c(a, b, c, d)
groups = factor(rep(letters[1:4], each = 6))
fligner.test(dati, groups)
Fligner-Killeen test of homogeneity of variances
data: dati and groups
Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878
The conclusions are similar to those for the test of Bartlett.
Having verified the homoskedasticity of the 4 groups, we can proceed with the ANOVA model.
First organize the values, fitting the model:
fit = lm(formula = dati ~ groups)
Then we analyze the ANOVA model:
anova (fit)
Analysis of Variance Table
Response: dati
Df Sum Sq Mean Sq F value Pr(>F)
groups 3 8.46 2.82 0.0327 0.9918
Residuals 20 1726.50 86.33
The output of the function is a classical ANOVA table with the following data:
Df
= degree of freedomSum Sq
= deviance (within groups, and residual)Mean Sq
= variance (within groups, and residual)F value
= the value of the Fisher statistic test, so computed (variance within groups) / (variance residual)Pr(>F)
= p-valueSince p-value > 0.05, we accept the null hypothesis H0: the four means are statistically equal. We can also compare the computed F-value with the tabulated F-value:
qf(0.950, 20, 3)
[1] 8.66019
Tabulated F-value > computed F-value: we accept the null hyptohesis.
Sir, I wanted to compute Fisher's least significant difference (LSD) method
ReplyDeleteI used the following code to read the table
library(RODBC)
> library(agricolae)
> channel <- odbcConnectExcel("G:/oneway.xls")
> mydata <- sqlFetch(channel,"Sheet1")
> library(reshape)
> mydata.m <- melt(mydata, id=c("Treatment"))
> fit <- aov(value~Treatment, data=mydata.m)
> summary(fit)
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 9 23494.8 2610.54 1589.5 < 2.2e-16 ***
Residuals 23 37.8 1.64
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
then I used the following code as given in the following sites
http://cran.r-project.org/web/packages/agricolae/index.html
http://tarwi.lamolina.edu.pe/~fmendiburu/
library(agricolae)
LSD.test()
comparison <- LSD.test(Treatment,Residuals,df,MSerror,group=F)
but I could not proceed any further,
How can I read ANOVA data from an Excel table and find out the LSD and graph it as done in Agricolae. This information will help me analyse field experimental data, thanks, Samuel, Bangalore, India
Hi Samuel,
ReplyDeletewhy you want to read the ANOVA table from Excel, while you have all data in R, to perform the LSD.test?
With your data, you can proceed as follow:
df<-df.residual(fit)
MSerror<-deviance(fit)/df
library(agricolae)
comparison <- LSD.test(value, Treatment, df, MSerror, group=FALSE)
comparison
bar.err(comparison,std=TRUE,ylim=c(0,45),density=4,border="blue")
So you can perform your post-hoc test.
Sorry, before the comparison, you have to attach your data:
ReplyDeleteattach(mydata.m)
I hope this help you.
Sir, it works so nicely, thanks, the reason I wanted to use is this, I wanted to do my data entry in MS excel as I wanted multiple columns, each column as a replication.
ReplyDeleteTreatment Rep1 Rep2 Rep3
Tridemorph (0.1%) 33.89 32.45 35.67
Sir I have another request, in the graph I would be happy if each bar has the treatment name as its label, now I am seeing only 3 labels,
thanking you very much, yours sincerely,
Dr.D.K.Samuel, Plant Virologist, Ind Inst of Horticultural Research, Bangalore - 89, India
Hi Samuel,
ReplyDeleteA question: are you performing a comparison between Rep1, Rep2, Rep3; or are you performing a comparison between Treatments, considering only one Rep?
Because you cannot perform an ANOVA considering replication AND treatment simultaneously: it is a "repeated measures ANOVA", which in R has another syntax.
However the bar.err() function plots the name of the factor you are considering in default.
(maybe, if you want, send at my e-mail your Excel file with all data, so I can have a better idea of your problem)
Hi Sir ,
ReplyDeleteI have the same test dataset organized in two different formats. I tried to do single factor ANOVA on both the formats which resulted two different results for P value. Can you please help me to know where the problem is? I am afraid I did something wrong in declaring the factor variable.
Dataset 1
110 17 11 10
13 12 7 12
14 18 9 8
13 13 13 7
Dataset 2
10 1
13 1
14 1
13 1
17 2
12 2
18 2
13 2
11 3
7 3
9 3
13 3
10 4
12 4
8 4
7 4
I followed this procedure for Dataset 1 in R following the blog
> test1 <- read.table("c://anova/dataset1.txt")
> names(test1) = c ("low" , "mod","high","veryhigh")
> attach(test1)
> merge <- c(low , mod , high , veryhigh)
> groups <- factor(rep(letters[1:4], each = 4))
> fit <- lm(formula = merge ~ groups)
> anova(fit)
Analysis of Variance Table
Response: merge
Df Sum Sq Mean Sq F value Pr(>F)
groups 3 2119.2 706.4 1.199 0.3518
Residuals 12 7069.8 589.1
I followed this procedure for Dataset 2 in R follwing (http://www.agr.kuleuven.ac.be/vakken/statisticsbyR/ANOVAbyRr/multiplecomp.htm)
> test2 <- read.table ("c://anova/dataset2.txt")
> names(test2) = c("data", "Brand")
> attach(test2)
> Brand <- factor(Brand)
> fit <- lm(formula = data ~ Brand)
> anova(fit)
Analysis of Variance Table
Response: data
Df Sum Sq Mean Sq F value Pr(>F)
Brand 3 81.688 27.229 4.6846 0.02176 *
Residuals 12 69.750 5.812
---
This will help me to do ANOVA in the orginal dataset for my analysis.
Many Thanks ,
Rodney , Australia
Rodney
Hi Rodney,
ReplyDeletethe two methods for declaring groups are totally equivalent, and you write the right code.
There is only an error:
Group 1 of dataset 1 is:
110, 13, 14, 13
Group 1 of dataset 2 is:
10, 13, 14, 13
Of course the two ANOVA tables are different (different variables). If you correct the first variable, you will obtain the same ANOVA table.
:)
Hi Sir ,
ReplyDeleteIt was really a silly mistake. Thank you very much for pointing it out.
Best Regards,
Rodney
thank you
ReplyDeleteI have a doubt: I´ve run an anova of 2 factors with only one observation per cell. In this case one can´t test the homogeneity of variances due to lack of observations in each group.However, if you run a bartlett test, you get a result. What does it do in this case? I can´t understand
ReplyDeleteHi Sir
ReplyDeleteWould it be possible to run the Flingner-Killeen test on the data from the four stores if one of the managers had forgotten to report their consumption of kilowatts one month? what would be the r-code for the groups be then?
Best regards
Thank you for this nice tutorial. Another good ressources on ANOVA can be found at http://www.sthda.com/english/wiki/one-way-anova-test-in-r
ReplyDeleteThanks and I have a tremendous supply: Where To Start House Remodeling split level kitchen remodel
ReplyDelete