Monday, August 3, 2009

Parametric method for the study of the correlation: the Pearson r-test

Suppose you want to study whether there is a correlation between two sets of data. To do this we compute the Pearson product-moment correlation coefficient r, which measures the linear dependence between two variables X and Y; we then compute a t-test to check whether r is significantly different from zero. We can use this test when the data follow a Gaussian distribution.

A new test to measure IQ is administered to 10 volunteers. You want to see whether its results correlate with those of the classical test, with a view to replacing the old test with the new one. These are the values:
Old test: 15, 21, 25, 26, 30, 30, 22, 29, 19, 16
New test: 55, 56, 89, 67, 84, 89, 99, 62, 83, 88


R has a single function, cor.test, which directly returns the Pearson coefficient together with the t-test for its significance:


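# Scores of the 10 volunteers on the old (a) and new (b) test
a <- c(15, 21, 25, 26, 30, 30, 22, 29, 19, 16)
b <- c(55, 56, 89, 67, 84, 89, 99, 62, 83, 88)

# Pearson correlation coefficient with its t-test
cor.test(a, b)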

Pearson's product-moment correlation

data: a and b
t = 0.4772, df = 8, p-value = 0.646
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5174766 0.7205107
sample estimates:
cor
0.166349
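The numbers in this output can be reproduced by hand, which shows what the test computes. The following is only a sketch, assuming the textbook formulas for Pearson's r, the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and a confidence interval based on the Fisher z-transformation; the variable names (t_stat, dof, half_width, ci) are chosen here for illustration.

```r
# Same data as in the post
a <- c(15, 21, 25, 26, 30, 30, 22, 29, 19, 16)
b <- c(55, 56, 89, 67, 84, 89, 99, 62, 83, 88)

n <- length(a)

# Pearson r: sum of cross-products of the deviations from the means,
# divided by the square root of the product of the sums of squares
r <- sum((a - mean(a)) * (b - mean(b))) /
  sqrt(sum((a - mean(a))^2) * sum((b - mean(b))^2))

# t statistic with n - 2 degrees of freedom
dof <- n - 2
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), dof)

# 95% confidence interval via the Fisher z-transformation (assumed method)
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(c(z - half_width, z + half_width))

print(c(r = r, t = t_stat, p = p_value))
print(ci)
```

The printed values should agree with the cor, t, p-value and confidence interval lines of the cor.test output.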


The Pearson coefficient is 0.166: it is a very low value, which indicates a weak linear correlation between the variables.
Furthermore, the p-value (0.646) is greater than 0.05, so we cannot reject the null hypothesis that the true correlation is zero: the Pearson coefficient is not statistically significant.
So we have no evidence of a correlation between the results of the two tests.
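This conclusion relies on the Gaussian assumption stated at the start. One possible way to examine it, which is not part of the analysis above, is a Shapiro-Wilk test on each variable with shapiro.test; with only 10 observations such a check has little power, so it is a rough guide at best. As a further assumption-light cross-check, the sketch below also runs cor.test with method = "spearman", a rank-based alternative.

```r
# Same data as in the post
a <- c(15, 21, 25, 26, 30, 30, 22, 29, 19, 16)
b <- c(55, 56, 89, 67, 84, 89, 99, 62, 83, 88)

# Shapiro-Wilk test for each variable
# (null hypothesis: the sample comes from a Gaussian distribution)
sw_a <- shapiro.test(a)
sw_b <- shapiro.test(b)
print(c(old = sw_a$p.value, new = sw_b$p.value))

# Rank-based alternative to Pearson's r.
# exact = FALSE because the tied values in `a` (30, 30)
# rule out an exact p-value.
sp <- cor.test(a, b, method = "spearman", exact = FALSE)
print(sp$estimate)
print(sp$p.value)
```

A small Shapiro-Wilk p-value would argue against the Gaussian assumption, in which case the Spearman result is the safer one to report.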

3 comments:

  1. I think the normality assumption is not that strict with Pearson Product-Moment Correlation Coefficient, especially if the data-set is larger.

    On the other hand, it is important to have integer/real values rather than ordinal ones; for ordinal data one should use Kendall or Spearman...

    Great tutorials I must say, a useful reference for people like me who only very occasionally use R. Thanks.

  2. Furthermore, the p-value is greater than 0.05; so we reject the null hypothesis:
    I think this is wrong. When p is greater than 0.05 (alpha) we cannot reject null hypothesis. Please correct me if I am wrong.

  3. "Furthermore, the p-value is greater than 0.05; so we reject the null hypothesis: "
    I think this is wrong. When p is greater than 0.05 (alpha) we cannot reject null hypothesis. Please correct me if I am wrong.
