What's the difference between correlation and simple linear regression?
In particular, I am referring to the Pearson product-moment correlation coefficient.
Note that one perspective on the relationship between regression & correlation can be discerned from my answer here: What is the difference between doing linear regression on y with x versus x with y?.
What's the difference between the correlation between $X$ and $Y$ and a linear regression predicting $Y$ from $X$?
First, some similarities:
- The standardised regression coefficient is the same as Pearson's correlation coefficient
- The square of Pearson's correlation coefficient is the same as the $R^2$ in simple linear regression
- Neither simple linear regression nor correlation answers questions of causality directly. This point is important, because I've met people who think that simple regression can magically allow an inference that $X$ causes $Y$.
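The first two similarities are easy to verify numerically. Here is a minimal sketch on simulated data (in Python with NumPy/SciPy rather than R, purely for illustration): rescaling the regression slope by the two standard deviations recovers Pearson's $r$, and the $R^2$ computed from the residuals equals $r^2$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

r, _ = stats.pearsonr(x, y)
res = stats.linregress(x, y)  # simple linear regression of y on x

# Standardised slope: b * sd(x) / sd(y) equals Pearson's r
std_slope = res.slope * np.std(x, ddof=1) / np.std(y, ddof=1)
assert np.isclose(std_slope, r)

# R^2 computed from the residual sum of squares equals r^2
y_hat = res.intercept + res.slope * x
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
assert np.isclose(r_squared, r ** 2)
```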
Second, some differences:
- The regression equation (i.e., $a + bX$) can be used to make predictions of $Y$ based on values of $X$
- While correlation typically refers to the linear relationship, it can refer to other forms of dependence, such as polynomial or truly nonlinear relationships
- While correlation typically refers to Pearson's correlation coefficient, there are other types of correlation, such as Spearman's.
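The differences above can be illustrated with a short sketch on simulated data (Python/SciPy; the variable names are purely illustrative). The fitted equation $a + bX$ yields predictions at new $X$ values, which a bare correlation coefficient cannot do, and Spearman's $\rho$ appears as an example of a non-Pearson correlation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 3 + 1.5 * x + rng.normal(size=50)

# Fit the regression equation a + bX ...
res = stats.linregress(x, y)

# ... and use it to predict Y at new X values
x_new = np.array([2.0, 5.0, 8.0])
y_pred = res.intercept + res.slope * x_new

# A correlation coefficient summarises direction and strength but
# provides no prediction equation; Spearman's rho is a non-Pearson
# correlation that captures monotonic (possibly nonlinear) dependence
r, _ = stats.pearsonr(x, y)
rho, _ = stats.spearmanr(x, y)
assert -1 <= r <= 1 and -1 <= rho <= 1
```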
Hi Jeromy, thank you for your explanation, but I still have a question here: what if I do not need to make predictions and just want to know how closely two variables are related, and in which direction/strength? Is there still a difference between these two techniques?
@yue86231 Then it sounds like a measure of correlation would be more appropriate.
(+1) To the similarities it might be useful to add that standard tests of the hypothesis "correlation=0" or, equivalently, "slope=0" (for the regression in either order), such as carried out by `lm` and `cor.test` in `R`, will yield identical p-values.
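For readers who want to check this point numerically, here is a small sketch (in Python/SciPy rather than R's `lm` and `cor.test`, but testing the same hypotheses): the p-value for the slope in either regression order matches the p-value for the test of zero correlation, since all three are the same t-test algebraically.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)

res_yx = stats.linregress(x, y)  # t-test of slope = 0 for y ~ x
res_xy = stats.linregress(y, x)  # t-test of slope = 0 for x ~ y
r, p_cor = stats.pearsonr(x, y)  # t-test of correlation = 0

# All three tests are algebraically equivalent, so the p-values agree
assert np.isclose(res_yx.pvalue, p_cor)
assert np.isclose(res_xy.pvalue, p_cor)
```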
I agree that the suggestion from @whuber should be added, but at a very basic level I think it is worth pointing out that the *sign* of the regression slope and the correlation coefficient are equal. This is probably one of the first things most people learn about the relationship between correlation and a "line of best fit" (even if they don't call it "regression" yet), but I think it's worth noting. To the differences, the fact that you get the same answer whether you correlate X with Y or vice versa, but that the regression of Y on X is different to that of X on Y, might also merit a mention.
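Both observations in the comment above can be checked directly. In this sketch (Python/SciPy, simulated data), the correlation and both regression slopes share the same sign, the two regressions give different slopes, and the product of the two slopes recovers $r^2$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = -1.2 * x + rng.normal(size=200)

r, _ = stats.pearsonr(x, y)
b_yx = stats.linregress(x, y).slope  # slope of y regressed on x
b_xy = stats.linregress(y, x).slope  # slope of x regressed on y

# The correlation and both slopes always share a sign ...
assert np.sign(r) == np.sign(b_yx) == np.sign(b_xy)
# ... but the two regression orders give different slopes ...
assert not np.isclose(b_yx, b_xy)
# ... whose product is exactly r^2 (b_yx = r*sy/sx, b_xy = r*sx/sy)
assert np.isclose(b_yx * b_xy, r ** 2)
```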