What are the differences between Factor Analysis and Principal Component Analysis?

  • It seems that a number of the statistical packages that I use wrap these two concepts together. However, I'm wondering if there are different assumptions or data 'formalities' that must be true to use one over the other. A real example would be incredibly useful.

    The principal components analysis and factor analysis chapters in the following book, which is available in most college libraries, address your question exactly: http://www.apa.org/pubs/books/4316510.aspx

    In addition to the answers below you might **also read** this and this of mine.

    And another good question like "should I use PCA or FA": http://stats.stackexchange.com/q/123063/3277.

    Explanation of why factor scores are inexact while component scores are true: http://stats.stackexchange.com/q/127483/3277.

    @ttnphns: I would encourage you to issue an answer in this thread, perhaps consisting of an annotated list of your answers in other related threads. This could replace your comments above (currently four comments with links), and would be more practical, especially if you briefly annotated each link. E.g. look here for the explanation of this issue, look there for an explanation of that issue, etc. It is just a suggestion, but I believe this thread would greatly benefit from it! One particular advantage is that you can always add more links to that answer.

    Personally, I like the analogy of PCA = **formative** and FA = **reflective**, see https://stats.stackexchange.com/q/279062/27276. But probably not all share that view.

    A similar question was asked on MathOverflow, and received what I would consider an excellent answer: https://mathoverflow.net/questions/40191/the-difference-between-principal-components-analysis-pca-and-factor-analysis

  • Principal component analysis involves extracting linear composites of observed variables.

    Factor analysis is based on a formal model predicting observed variables from theoretical latent factors.
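
    In symbols, the distinction can be sketched as follows. The notation is assumed here and is not taken from the answer: $x$ is the centered vector of observed variables with covariance matrix $\Sigma$, $w_k$ are component weights, $\Lambda$ is the loading matrix, $f$ the latent factors, and $\varepsilon$ the unique parts. The FA line assumes orthogonal, unit-variance factors.

```latex
% PCA: each component is a weighted sum of the observed variables
c_k = w_k^\top x, \qquad \Sigma w_k = \ell_k w_k, \quad \ell_1 \ge \ell_2 \ge \dots

% FA: each observed variable is modelled as a function of latent factors
x = \Lambda f + \varepsilon, \qquad
\operatorname{Cov}(f) = I, \quad \operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)}, \quad
\Sigma = \Lambda \Lambda^\top + \Psi
```

    Read this way, PCA runs from the observed variables to the composites, while FA runs from the latent factors to the observed variables.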

    In psychology these two techniques are often applied in the construction of multi-scale tests to determine which items load on which scales. They typically yield similar substantive conclusions (for a discussion see Comrey (1988) Factor-Analytic Methods of Scale Development in Personality and Clinical Psychology). This helps to explain why some statistics packages seem to bundle them together. I have also seen situations where "principal component analysis" is incorrectly labelled "factor analysis".

    In terms of a simple rule of thumb, I'd suggest that you:

    1. Run factor analysis if you assume or wish to test a theoretical model of latent factors causing observed variables.

    2. Run principal component analysis if you simply want to reduce your correlated observed variables to a smaller set of important independent composite variables.
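
    To make the two rules of thumb concrete, here is a minimal sketch in plain Python; it is not taken from the answer. It assumes a hypothetical one-factor model with made-up loadings (0.8, 0.7, 0.6, 0.5), builds the correlation matrix that model implies, and compares first-component PCA loadings with loadings from iterated principal axis factoring, which is one common way (among several) of fitting an exploratory factor model. The helper names `dominant_eigenpair`, `first_loadings` and `principal_axis_one_factor` are illustrative only.

```python
import math

# Hypothetical one-factor loadings, chosen for illustration only.
true_loadings = [0.8, 0.7, 0.6, 0.5]
p = len(true_loadings)

# Correlation matrix implied by the factor model: r_ij = l_i * l_j, r_ii = 1.
R = [[1.0 if i == j else true_loadings[i] * true_loadings[j]
      for j in range(p)] for i in range(p)]


def dominant_eigenpair(M, n_iter=200):
    """Largest eigenvalue and unit eigenvector of M, by power iteration."""
    v = [1.0] * len(M)
    value = 0.0
    for _ in range(n_iter):
        w = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]
        value = math.sqrt(sum(x * x for x in w))
        v = [x / value for x in w]
    return value, v


def first_loadings(M):
    """Loadings on the first dimension: eigenvector scaled by sqrt(eigenvalue)."""
    value, v = dominant_eigenpair(M)
    return [math.sqrt(value) * x for x in v]


def principal_axis_one_factor(R, n_iter=100):
    """Iterated principal axis factoring for a single factor."""
    # Starting communalities: largest absolute off-diagonal correlation per row.
    h2 = [max(abs(r) for j, r in enumerate(row) if j != i)
          for i, row in enumerate(R)]
    loadings = []
    for _ in range(n_iter):
        reduced = [row[:] for row in R]
        for i in range(len(R)):
            reduced[i][i] = h2[i]          # communalities replace the ones
        loadings = first_loadings(reduced)
        h2 = [l * l for l in loadings]     # update the communality estimates
    return loadings


pca_loadings = first_loadings(R)              # PCA: full correlation matrix
fa_loadings = principal_axis_one_factor(R)    # FA: reduced correlation matrix

print("true:", true_loadings)
print("PCA :", [round(x, 3) for x in pca_loadings])
print("FA  :", [round(x, 3) for x in fa_loadings])
```

    Under these assumptions the factor loadings recover the generating values, whereas the component loadings come out larger: PCA analyses total variance (ones on the diagonal), while FA analyses only the shared variance (communalities on the diagonal).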

    The rule of thumb there is highly useful. Thanks for that.

    Regarding rule of thumb (1): Wouldn't I test a theoretical model of latent factors with a confirmatory factor analysis (CFA) rather than an exploratory factor analysis (EFA)?

    @roman Yes. A CFA gives you much more control over the model than EFA. E.g., you can constrain loadings to zero; equate loadings; have correlated residuals; add higher order factors; etc.

    @Jeromy Anglim Is it really correct to say that PCA makes a "smaller set of important independent composite variables"? Or should you really say "smaller set of important uncorrelated composite variables"? If the underlying data used in PCA are not (multivariate) normally distributed, won't the reduced-dimensional data only be uncorrelated, not independent?
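
    On the independent-versus-uncorrelated point: PCA guarantees only that the component scores are uncorrelated. Zero correlation implies independence when the data are multivariate normal, but not in general. The sketch below illustrates this on constructed toy data (a uniform variable and its square, an assumption made purely for illustration); the helper names are hypothetical.

```python
import math
import random

random.seed(0)
n = 5000

# Toy non-normal data: the second variable is a deterministic function of the first.
u = [random.uniform(-1.0, 1.0) for _ in range(n)]
x1 = u
x2 = [t * t for t in u]


def centered(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def cov(xs, ys):
    """Sample covariance of two already-centered lists."""
    return sum(x * y for x, y in zip(xs, ys)) / (len(xs) - 1)


def corr(xs, ys):
    xs, ys = centered(xs), centered(ys)
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


c1, c2 = centered(x1), centered(x2)
a, b, c = cov(c1, c1), cov(c1, c2), cov(c2, c2)

# For a 2x2 covariance matrix, PCA amounts to a rotation by theta.
theta = 0.5 * math.atan2(2.0 * b, a - c)
cos_t, sin_t = math.cos(theta), math.sin(theta)
score1 = [cos_t * p + sin_t * q for p, q in zip(c1, c2)]
score2 = [-sin_t * p + cos_t * q for p, q in zip(c1, c2)]

corr_scores = corr(score1, score2)                       # linear association
corr_squared = corr([s * s for s in score1], score2)     # nonlinear association
print("corr(score1, score2)    =", round(corr_scores, 6))
print("corr(score1**2, score2) =", round(corr_squared, 3))
```

    The two score columns are uncorrelated up to floating-point error, yet the square of the first is strongly correlated with the second, so the scores are clearly not independent.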

    The second rule of thumb is easy to get, but how do I apply the first? It may sound strange, but how do I know when I want to run a factor model against observed variables?

    In other words, principal component analysis is to factor analysis as induction is to deduction.

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM