We want to compare the heights, in centimeters, of two groups of individuals. Here are the measurements:
A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 120, 180, 125, 188, 130, 190, 110, 185, 112, 188
As we have seen in a previous exercise, we must first check whether the variances are homogeneous (homoskedasticity) with Fisher's F-test:
a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)
var.test(b,a)
F test to compare two variances
data: b and a
F = 14.6431, num df = 9, denom df = 9, p-value = 0.0004636
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
3.637133 58.952936
sample estimates:
ratio of variances
14.64308
We obtained a p-value less than 0.05, so the two variances are not homogeneous. We can confirm this by comparing the computed value of F with the tabulated value of F for alpha = 0.05, with 9 degrees of freedom in the numerator and 9 degrees of freedom in the denominator, using the function qf(p, df.num, df.den):
qf(0.95, 9, 9)
[1] 3.178893
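The computed F (14.64) is greater than the tabulated F (3.18), so we can reject the null hypothesis H0 of homogeneity of variances. Strictly speaking, var.test performs a two-sided test, for which the critical value is qf(0.975, 9, 9), about 4.03; the conclusion is the same.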
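The F-test above can also be reproduced by hand, which makes it clearer where the statistic and the p-value come from. The following is a minimal sketch; the variable names (f_stat, p_two_sided, f_crit) are chosen here for illustration and do not come from the original exercise.

```r
# Same data as in the exercise
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# F statistic: ratio of the two sample variances (b in the numerator, as in var.test(b, a))
f_stat <- var(b) / var(a)

# Degrees of freedom for numerator and denominator
df_num <- length(b) - 1
df_den <- length(a) - 1

# Two-sided p-value: twice the smaller of the two tail probabilities
p_lower <- pf(f_stat, df_num, df_den)
p_upper <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
p_two_sided <- 2 * min(p_lower, p_upper)

# Two-sided critical value at alpha = 0.05
f_crit <- qf(0.975, df_num, df_den)

f_stat        # should match the F reported by var.test
p_two_sided   # should match the p-value reported by var.test
f_stat > f_crit
```

The values of f_stat and p_two_sided should agree with the var.test output shown above.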
To compare the two groups, we use the function t.test with non-homogeneous variances (var.equal = FALSE, which can be omitted because the function assumes unequal variances by default) and independent samples (paired = FALSE, which can also be omitted because the function treats the samples as independent by default), in this way:
t.test(a,b, var.equal=FALSE, paired=FALSE)
Welch Two Sample t-test
data: a and b
t = 1.8827, df = 10.224, p-value = 0.08848
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.95955 47.95955
sample estimates:
mean of x mean of y
174.8 152.8
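To check the statement that both arguments can be omitted, one can compare the result of the explicit call with the result of the bare call. This is only a small sanity check; the names explicit and implicit are illustrative.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# Call with the arguments spelled out, and call relying on the defaults
explicit <- t.test(a, b, var.equal = FALSE, paired = FALSE)
implicit <- t.test(a, b)

# The statistic, degrees of freedom and p-value should be the same
same_t  <- isTRUE(all.equal(explicit$statistic, implicit$statistic))
same_df <- isTRUE(all.equal(explicit$parameter, implicit$parameter))
same_p  <- isTRUE(all.equal(explicit$p.value, implicit$p.value))

c(same_t, same_df, same_p)
```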
As the header of the output shows, R performed a two-sample t-test with the degrees of freedom computed by the Welch-Satterthwaite formula (the result is df = 10.224), which is used when the variances are not homogeneous. The Welch-Satterthwaite equation is also called the Dixon-Massey formula when the comparison is between two groups, as in this case.
We obtained a p-value greater than 0.05, so we cannot reject the null hypothesis that the means of the two groups are equal (although the p-value is fairly close to the 0.05 threshold). Note that this is not evidence that the means are the same, only that the data do not show a significant difference. Indeed, the value of t is less than the tabulated t-value for 10.224 degrees of freedom, which we can calculate in R:
qt(0.975, 10.224)
[1] 2.221539
We therefore do not reject the hypothesis H0 of equality of means.
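The comparison between the computed t and the tabulated t can be sketched step by step as follows. The degrees of freedom are taken from the t.test result, and the names t_manual, t_crit and p_manual are introduced here only for illustration.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# Standard error of the difference in means, without pooling the variances
se <- sqrt(var(a) / length(a) + var(b) / length(b))

# Welch t statistic
t_manual <- (mean(a) - mean(b)) / se

# Degrees of freedom as computed by t.test (Welch-Satterthwaite)
df_welch <- unname(t.test(a, b)$parameter)

# Two-sided critical value at alpha = 0.05 and two-sided p-value
t_crit   <- qt(0.975, df_welch)
p_manual <- 2 * pt(abs(t_manual), df_welch, lower.tail = FALSE)

# H0 is rejected only if |t| exceeds the critical value
reject_h0 <- abs(t_manual) > t_crit

t_manual
p_manual
reject_h0
```

Here reject_h0 should be FALSE, in line with the p-value being above 0.05.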
Welch-Satterthwaite formula (general case with k groups):
$$df=\frac{\left(\sum_{i=1}^{k}\frac{S_i^2}{n_i}\right)^2}{\sum_{i=1}^{k}\frac{\left(\frac{S_i^2}{n_i}\right)^2}{n_i-1}}$$
Dixon-Massey formula:
$$df=\frac{\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}+\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\frac{\displaystyle\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}\right)^2}{\displaystyle n_1-1}+\frac{\displaystyle\left(\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\displaystyle n_2-1}}$$
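As a check, the two-group formula can be evaluated directly on the data of this exercise. The sketch below uses illustrative names (va, vb, df_manual) and compares the result with the degrees of freedom reported by t.test.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

n1 <- length(a)
n2 <- length(b)

# S_i^2 / n_i for each group
va <- var(a) / n1
vb <- var(b) / n2

# Two-group degrees of freedom formula
df_manual <- (va + vb)^2 / (va^2 / (n1 - 1) + vb^2 / (n2 - 1))

# Degrees of freedom reported by t.test, for comparison
df_ttest <- unname(t.test(a, b)$parameter)

df_manual
isTRUE(all.equal(df_manual, df_ttest))
```

The value of df_manual should be approximately 10.224, the same as in the t.test output.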