## Monday, August 3, 2009

### Parametric method for the study of the correlation: the Pearson r-test

Suppose you want to study whether there is a correlation between two sets of data. To do this, we compute the Pearson product-moment correlation coefficient r, which measures the linear dependence between two variables X and Y; we then compute a t-statistic to assess whether r differs significantly from zero. We can use this test when the data follow a Gaussian distribution.
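For reference, the two quantities described above can be written out as follows. This is the standard textbook form, not something specific to this post; the symbols $x_i$, $y_i$ (the paired observations), $\bar{x}$, $\bar{y}$ (their means) and $n$ (the number of pairs) are notation introduced here for illustration:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}
$$

Under the null hypothesis of zero correlation, and assuming bivariate normal data, $t$ follows a Student t distribution with $n - 2$ degrees of freedom.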

A new test to measure IQ is administered to 10 volunteers. You want to see whether there is a correlation between the new experimental test and the classical test, in order to replace the old test with the new one. These are the values:
Old test: 15, 21, 25, 26, 30, 30, 22, 29, 19, 16
New test: 55, 56, 89, 67, 84, 89, 99, 62, 83, 88

R provides a single function, `cor.test`, which directly returns both the Pearson coefficient and the t-test for the significance of the coefficient:

```
a = c(15, 21, 25, 26, 30, 30, 22, 29, 19, 16)
b = c(55, 56, 89, 67, 84, 89, 99, 62, 83, 88)

cor.test(a, b)

        Pearson's product-moment correlation

data:  a and b
t = 0.4772, df = 8, p-value = 0.646
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5174766  0.7205107
sample estimates:
     cor
0.166349
```
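As a cross-check, the numbers reported by `cor.test` can be reproduced by hand. The following is a minimal sketch, assuming the standard definitions of the Pearson coefficient and its t-statistic; the variable names `n`, `r`, `t_stat` and `p_value` are chosen here for illustration and are not part of the original post:

```r
# Data from the example above
a <- c(15, 21, 25, 26, 30, 30, 22, 29, 19, 16)
b <- c(55, 56, 89, 67, 84, 89, 99, 62, 83, 88)

n <- length(a)

# Pearson coefficient: sum of cross-products of deviations,
# divided by the square root of the product of the sums of squares
r <- sum((a - mean(a)) * (b - mean(b))) /
  sqrt(sum((a - mean(a))^2) * sum((b - mean(b))^2))

# t-statistic for H0: true correlation = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the Student t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(r = r, t = t_stat, p = p_value))
```

The three values should agree with the `cor`, `t` and `p-value` entries in the `cor.test` output; `cor(a, b)` gives the coefficient alone.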

The value of the Pearson coefficient is 0.166: this is a very low value, which indicates a weak linear correlation between the variables.
Furthermore, the p-value (0.646) is greater than 0.05, so we cannot reject the null hypothesis that the true correlation is zero: the Pearson coefficient is not statistically significant.
So we can say that the data give no evidence of a correlation between the results of the two tests.

1. I think the normality assumption is not that strict with Pearson Product-Moment Correlation Coefficient, especially if the data-set is larger.

On the other hand, it is important to have integer/real values rather than ordinal ones; for ordinal data one should use Kendall or Spearman...

Great tutorials, I must say; a useful reference for people like me who only very occasionally use R. Thanks.

2. "Furthermore, the p-value is greater than 0.05; so we reject the null hypothesis:"
I think this is wrong. When p is greater than 0.05 (alpha), we cannot reject the null hypothesis. Please correct me if I am wrong.
