The ANOVA model can be used to compare the mean of several groups with each other, using a parametric method (assuming that the groups follow a Gaussian distribution).

Proceed with the following example:

The manager of a supermarket chain wants to see if the consumption in kilowatts of 4 stores between them are equal. He collects data at the end of each month for 6 months. The results are:

Store A: 65, 48, 66, 75, 70, 55

Store B: 64, 44, 70, 70, 68, 59

Store C: 60, 50, 65, 69, 69, 57

Store D: 62, 46, 68, 72, 67, 56

Store B: 64, 44, 70, 70, 68, 59

Store C: 60, 50, 65, 69, 69, 57

Store D: 62, 46, 68, 72, 67, 56

To proceed with the verification ANOVA, we must first verify the homoskedasticity (ie test for homogeneity of variances). The software R provides two tests: the Bartlett test, and the Fligner-Killeen test.

We begin with the

**Bartlett test**.

First we create the 4 vectors:

a = c(65, 48, 66, 75, 70, 55)

b = c(64, 44, 70, 70, 68, 59)

c = c(60, 50, 65, 69, 69, 57)

d = c(62, 46, 68, 72, 67, 56)

Now we combine the 4 vectors in a single vector:

dati = c(a, b, c, d)

Now, on this vector in which are stored all the data, we create the 4 levels:

groups = factor(rep(letters[1:4], each = 6))

We can observe the contents of the vector

`groups`

simply by typing `groups + [enter]`

.At this point we start the Bartlett test:

bartlett.test(dati, groups)

Bartlett test of homogeneity of variances

data: dati and groups

Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228

The function gave us the value of the statistical tests (

**K squared**), and the p-value. Can be argued that the variances are homogeneous since

*p-value > 0.05*. Alternatively, we can compare the Bartlett's K-squared with the value of chi-square-tables; we compute that value, assigning the value of alpha and degrees of freedom at the

`qchisq`

function:

qchisq(0.950, 3)

[1] 7.814728

Chi-squared > Bartlett's K-squared: we accept the null hypothesis H0 (variances homogeneity)

We try now to check the homoskedasticity, with the

**Fligner-Killeen test**.

The syntax is quite similar, and then proceed quickly.

a = c(65, 48, 66, 75, 70, 55)

b = c(64, 44, 70, 70, 68, 59)

c = c(60, 50, 65, 69, 69, 57)

d = c(62, 46, 68, 72, 67, 56)

dati = c(a, b, c, d)

groups = factor(rep(letters[1:4], each = 6))

fligner.test(dati, groups)

Fligner-Killeen test of homogeneity of variances

data: dati and groups

Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878

The conclusions are similar to those for the test of Bartlett.

Having verified the homoskedasticity of the 4 groups, we can proceed with the

**ANOVA model**.

First organize the values, fitting the model:

fit = lm(formula = dati ~ groups)

Then we analyze the ANOVA model:

anova (fit)

Analysis of Variance Table

Response: dati

Df Sum Sq Mean Sq F value Pr(>F)

groups 3 8.46 2.82 0.0327 0.9918

Residuals 20 1726.50 86.33

The output of the function is a classical ANOVA table with the following data:

`Df`

= degree of freedom`Sum Sq`

= deviance (within groups, and residual)`Mean Sq`

= variance (within groups, and residual)`F value`

= the value of the Fisher statistic test, so computed (variance within groups) / (variance residual)`Pr(>F)`

= p-valueSince

*p-value > 0.05*, we accept the null hypothesis H0: the four means are statistically equal. We can also compare the computed F-value with the tabulated F-value:

qf(0.950, 20, 3)

[1] 8.66019

Tabulated F-value > computed F-value: we accept the null hyptohesis.

Sir, I wanted to compute Fisher's least significant difference (LSD) method

ReplyDeleteI used the following code to read the table

library(RODBC)

> library(agricolae)

> channel <- odbcConnectExcel("G:/oneway.xls")

> mydata <- sqlFetch(channel,"Sheet1")

> library(reshape)

> mydata.m <- melt(mydata, id=c("Treatment"))

> fit <- aov(value~Treatment, data=mydata.m)

> summary(fit)

Df Sum Sq Mean Sq F value Pr(>F)

Treatment 9 23494.8 2610.54 1589.5 < 2.2e-16 ***

Residuals 23 37.8 1.64

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

then I used the following code as given in the following sites

http://cran.r-project.org/web/packages/agricolae/index.html

http://tarwi.lamolina.edu.pe/~fmendiburu/

library(agricolae)

LSD.test()

comparison <- LSD.test(Treatment,Residuals,df,MSerror,group=F)

but I could not proceed any further,

How can I read ANOVA data from an Excel table and find out the LSD and graph it as done in Agricolae. This information will help me analyse field experimental data, thanks, Samuel, Bangalore, India

Hi Samuel,

ReplyDeletewhy you want to read the ANOVA table from Excel, while you have all data in R, to perform the LSD.test?

With your data, you can proceed as follow:

df<-df.residual(fit)MSerror<-deviance(fit)/df

library(agricolae)

comparison <- LSD.test(value, Treatment, df, MSerror, group=FALSE)

comparison

bar.err(comparison,std=TRUE,ylim=c(0,45),density=4,border="blue")

So you can perform your post-hoc test.

Sorry, before the comparison, you have to attach your data:

ReplyDeleteattach(mydata.m)I hope this help you.

Sir, it works so nicely, thanks, the reason I wanted to use is this, I wanted to do my data entry in MS excel as I wanted multiple columns, each column as a replication.

ReplyDeleteTreatment Rep1 Rep2 Rep3

Tridemorph (0.1%) 33.89 32.45 35.67

Sir I have another request, in the graph I would be happy if each bar has the treatment name as its label, now I am seeing only 3 labels,

thanking you very much, yours sincerely,

Dr.D.K.Samuel, Plant Virologist, Ind Inst of Horticultural Research, Bangalore - 89, India

Hi Samuel,

ReplyDeleteA question: are you performing a comparison between Rep1, Rep2, Rep3; or are you performing a comparison between Treatments, considering only one Rep?

Because you cannot perform an ANOVA considering replication AND treatment simultaneously: it is a "repeated measures ANOVA", which in R has another syntax.

However the bar.err() function plots the name of the factor you are considering in default.

(maybe, if you want, send at my e-mail your Excel file with all data, so I can have a better idea of your problem)

Hi Sir ,

ReplyDeleteI have the same test dataset organized in two different formats. I tried to do single factor ANOVA on both the formats which resulted two different results for P value. Can you please help me to know where the problem is? I am afraid I did something wrong in declaring the factor variable.

Dataset 1

110 17 11 10

13 12 7 12

14 18 9 8

13 13 13 7

Dataset 2

10 1

13 1

14 1

13 1

17 2

12 2

18 2

13 2

11 3

7 3

9 3

13 3

10 4

12 4

8 4

7 4

I followed this procedure for Dataset 1 in R following the blog

> test1 <- read.table("c://anova/dataset1.txt")

> names(test1) = c ("low" , "mod","high","veryhigh")

> attach(test1)

> merge <- c(low , mod , high , veryhigh)

> groups <- factor(rep(letters[1:4], each = 4))

> fit <- lm(formula = merge ~ groups)

> anova(fit)

Analysis of Variance Table

Response: merge

Df Sum Sq Mean Sq F value Pr(>F)

groups 3 2119.2 706.4 1.199 0.3518

Residuals 12 7069.8 589.1

I followed this procedure for Dataset 2 in R follwing (http://www.agr.kuleuven.ac.be/vakken/statisticsbyR/ANOVAbyRr/multiplecomp.htm)

> test2 <- read.table ("c://anova/dataset2.txt")

> names(test2) = c("data", "Brand")

> attach(test2)

> Brand <- factor(Brand)

> fit <- lm(formula = data ~ Brand)

> anova(fit)

Analysis of Variance Table

Response: data

Df Sum Sq Mean Sq F value Pr(>F)

Brand 3 81.688 27.229 4.6846 0.02176 *

Residuals 12 69.750 5.812

---

This will help me to do ANOVA in the orginal dataset for my analysis.

Many Thanks ,

Rodney , Australia

Rodney

Hi Rodney,

ReplyDeletethe two methods for declaring groups are totally equivalent, and you write the right code.

There is only an error:

Group 1 of dataset 1 is:

110, 13, 14, 13Group 1 of dataset 2 is:

10, 13, 14, 13Of course the two ANOVA tables are different (different variables). If you correct the first variable, you will obtain the same ANOVA table.

:)

Hi Sir ,

ReplyDeleteIt was really a silly mistake. Thank you very much for pointing it out.

Best Regards,

Rodney

thank you

ReplyDeleteI have a doubt: I´ve run an anova of 2 factors with only one observation per cell. In this case one can´t test the homogeneity of variances due to lack of observations in each group.However, if you run a bartlett test, you get a result. What does it do in this case? I can´t understand

ReplyDeleteHi Sir

ReplyDeleteWould it be possible to run the Flingner-Killeen test on the data from the four stores if one of the managers had forgotten to report their consumption of kilowatts one month? what would be the r-code for the groups be then?

Best regards

Thank you for this nice tutorial. Another good ressources on ANOVA can be found at http://www.sthda.com/english/wiki/one-way-anova-test-in-r

ReplyDelete