What's the difference between correlation and simple linear regression?

• In particular, I am referring to the Pearson product-moment correlation coefficient.

Note that one perspective on the relationship between regression & correlation can be discerned from my answer here: What is the difference between doing linear regression on y with x versus x with y?

• What's the difference between the correlation between \$X\$ and \$Y\$ and a linear regression predicting \$Y\$ from \$X\$?

First, some similarities:

• The standardised regression coefficient is the same as Pearson's correlation coefficient
• The square of Pearson's correlation coefficient is the same as the \$R^2\$ in simple linear regression
• Neither simple linear regression nor correlation answers questions of causality directly. This point is important, because I've met people who think that simple regression can magically allow an inference that \$X\$ causes \$Y\$.

Second, some differences:

• The regression equation (i.e., \$a + bX\$) can be used to make predictions of \$Y\$ based on values of \$X\$
• While correlation typically refers to the linear relationship, it can refer to other forms of dependence, such as polynomial or truly nonlinear relationships
• While correlation typically refers to Pearson's correlation coefficient, there are other types of correlation, such as Spearman's.

Hi Jeromy, thank you for your explanation, but I still have a question here: what if I don't need to make predictions and just want to know how close two variables are and in which direction/strength? Is there still a difference between using these two techniques?

@yue86231 Then it sounds like a measure of correlation would be more appropriate.

(+1) To the similarities it might be useful to add that standard tests of the hypothesis "correlation=0" or, equivalently, "slope=0" (for the regression in either order), such as carried out by `lm` and `cor.test` in `R`, will yield identical p-values.

I agree that the suggestion from @whuber should be added, but at a very basic level I think it is worth pointing out that the *sign* of the regression slope and the correlation coefficient are equal. This is probably one of the first things most people learn about the relationship between correlation and a "line of best fit" (even if they don't call it "regression" yet), but I think it's worth noting. To the differences, the fact that you get the same answer correlating X with Y or vice versa, but that the regression of Y on X is different to that of X on Y, might also merit a mention.