Why do we need to normalize data before principal component analysis (PCA)?

  • I'm doing principal component analysis on my dataset and my professor told me that I should normalize the data before doing the analysis. Why?

    • What would happen if I did PCA without normalization?
    • Why do we normalize data in general?
    • Could someone give a clear and intuitive example that demonstrates the consequences of not normalizing the data before analysis?

    If some variables have a large variance and some small, PCA (maximizing variance) will load on the large variances. For example if you change one variable from km to cm (increasing its variance), it may go from having little impact to dominating the first principle component. If you want your PCA to be independent of such rescaling, standardizing the variables will do that. On the other hand, if the specific scale of your variables matters (in that you want your PCA to be in that scale), maybe you don't want to standardize.
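
A minimal sketch of the rescaling effect described in the comment above, assuming NumPy is available. The variables `distance_km` and `weight_kg` and the helpers `first_pc` and `standardize` are made up for illustration; PCA is done here by an eigendecomposition of the sample covariance matrix rather than through any particular library's PCA routine.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Two hypothetical, independent variables measured on 200 items.
distance_km = rng.normal(loc=5.0, scale=1.0, size=200)   # SD of about 1 km
weight_kg = rng.normal(loc=70.0, scale=10.0, size=200)   # SD of about 10 kg


def first_pc(X):
    """Return (loadings of PC1, share of total variance explained by PC1)."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1] / eigvals.sum()


def standardize(X):
    """Subtract each column's mean and divide by its sample SD."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


X_km = np.column_stack([distance_km, weight_kg])
X_cm = np.column_stack([distance_km * 100_000.0, weight_kg])  # 1 km = 100,000 cm

load_km, share_km = first_pc(X_km)
load_cm, share_cm = first_pc(X_cm)
load_std_km, share_std_km = first_pc(standardize(X_km))
load_std_cm, share_std_cm = first_pc(standardize(X_cm))

# Columns of the loadings are [distance, weight].
print("km, |PC1 loadings|:", np.round(np.abs(load_km), 3))
print("cm, |PC1 loadings|:", np.round(np.abs(load_cm), 3))
print("standardized PC1 share is unit-free:", np.isclose(share_std_km, share_std_cm))
```

With distance in km, the first component should load almost entirely on weight; after converting to cm it should load almost entirely on distance. The standardized versions give the same answer in either unit.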

    Watch out: normalize in statistics sometimes carries the meaning of transform to be closer to a normal or Gaussian distribution. As @Glen_b exemplifies, it is better to talk of standardizing when what is meant is scaling by (value - mean)/SD (or some other _specified_ standardization).

    Ouch, that 'principle' instead of 'principal' in my comment up there is going to drive me crazy every time I look at it.

    @Glen_b In principle, you do know how to spell it. Getting it right all the time is the principal difficulty.

    These are multiple questions so there is no one exact duplicate, but every one of them is extensively and well discussed elsewhere on this site. A good search to begin with is on pca correl* covariance.

    @NickCox The generally accepted definition of normalise is to transform a random variable to one with zero mean and unit standard deviation. This is also what Google gives when you search "define normalise". Therefore it is not better to use a different word for the same thing.

    @Robino I agree with your conclusion but I disagree with your assertion. The problem is that there is not a generally accepted meaning across statistics and machine learning. Normalise is used with the sense I mention and with other senses too, e.g. scaling to within [0, 1].

    @NickCox Should I use mean normalization, i.e. x-mean/std, or just use feature scaling before applying PCA? I am applying PCA to images whose pixel values vary from 0 to 255.

    @Boris I can't possibly advise remotely on what is best for you beyond pointing out that (x $-$ mean) / SD is one possible method and x $-$ mean/SD certainly is not. If all your variables are in [0, 255], it's conceivable that not scaling at all makes as much sense as any other approach.

    @NickCox means it doesn't matter

    Not what I meant. Not knowing which method is best for your data and your project doesn't mean that I am implying that choice of method doesn't matter.
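
To make the parenthesization point in this exchange concrete, here is a small sketch, assuming NumPy. The pixel values in `x` are invented; the three expressions compare (x - mean)/SD, the mis-parenthesized x - mean/SD, and the [0, 1] scaling mentioned earlier in the thread.

```python
import numpy as np

x = np.array([0.0, 64.0, 128.0, 255.0])  # hypothetical pixel intensities
mean, sd = x.mean(), x.std(ddof=1)

standardized = (x - mean) / sd                  # zero mean, unit SD
mis_parenthesized = x - mean / sd               # only shifts x by the constant mean/sd
min_max = (x - x.min()) / (x.max() - x.min())   # rescaled into [0, 1]

print("standardized mean and SD:", standardized.mean(), standardized.std(ddof=1))
print("SD after x - mean/SD vs original SD:", mis_parenthesized.std(ddof=1), sd)
print("min-max range:", min_max.min(), min_max.max())
```

Because division binds tighter than subtraction, x - mean/SD leaves the spread of x unchanged, so it does nothing to put variables on a common scale.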

    @whuber: You get 0 hits with your search.

    @MSIS Thank you. Somehow the system eliminated the wild card "*" after "correl". I have re-inserted it and hope it stays there this time! It now returns 316 results.

  • Dr. Mike

    Correct answer

    7 years ago

    Normalization is important in PCA because PCA is a variance-maximizing exercise: it projects your original data onto the directions that maximize the variance. The first plot below shows the share of total variance explained by each principal component when the data have not been normalized. As you can see, component one appears to explain most of the variance in the data.

    [Figure: Without normalization]

    In the second picture, the data have been normalized first. Here it is clear that the other components contribute as well. This is because PCA seeks to maximize the variance of each component. The covariance matrix of this particular dataset is:

                 Murder   Assault   UrbanPop      Rape
    Murder    18.970465  291.0624   4.386204  22.99141
    Assault  291.062367 6945.1657 312.275102 519.26906
    UrbanPop   4.386204  312.2751 209.518776  55.76808
    Rape      22.991412  519.2691  55.768082  87.72916
    

    Given this structure, PCA will project as much as possible in the direction of Assault, since its variance is much greater than that of the other variables. So for finding features usable in a downstream model, a PCA without normalization would generally perform worse than one with normalization.

    [Figure: With normalization]
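
The missing plots can be approximated from the covariance matrix printed in this answer. The following is a sketch under assumptions, not the answerer's original code: it uses NumPy, treats PCA as an eigendecomposition of the covariance matrix, and treats "normalized" PCA as an eigendecomposition of the correlation matrix derived from it. The helper names `explained_variance_ratio` and `first_loadings` are made up for illustration.

```python
import numpy as np

names = ["Murder", "Assault", "UrbanPop", "Rape"]

# Covariance matrix copied from the answer above.
cov = np.array([
    [18.970465,  291.0624,    4.386204,  22.99141],
    [291.062367, 6945.1657, 312.275102, 519.26906],
    [4.386204,   312.2751,  209.518776,  55.76808],
    [22.991412,  519.2691,   55.768082,  87.72916],
])
cov = (cov + cov.T) / 2  # remove tiny asymmetries caused by rounding in the printout

# The correlation matrix is the covariance matrix of the standardized variables.
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)


def explained_variance_ratio(matrix):
    """Eigenvalues of a symmetric matrix as shares of their sum, largest first."""
    eigvals = np.linalg.eigvalsh(matrix)[::-1]
    return eigvals / eigvals.sum()


def first_loadings(matrix):
    """Absolute loadings of the first principal component."""
    _, eigvecs = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    return np.abs(eigvecs[:, -1])


ratio_raw = explained_variance_ratio(cov)
ratio_std = explained_variance_ratio(corr)

print("without normalization:", np.round(ratio_raw, 3))
print("with normalization:   ", np.round(ratio_std, 3))
print("|PC1 loadings|, raw:  ", dict(zip(names, np.round(first_loadings(cov), 3))))
```

On the raw covariance matrix the first component should take nearly all of the variance and load mostly on Assault; on the correlation matrix the variance should be spread much more evenly across components.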

    You explain standardizing, not normalization, but anyway, good stuff here :)

    @Erogol that is true.

    Great post! Perfectly reproducible with sklearn. BTW, the USArrests dataset can be downloaded from here: https://vincentarelbundock.github.io/Rdatasets/datasets.html

    Just curious: how come the autocorrelations in your data are not 1?

    @gary this is a covariance matrix, not a correlation matrix, therefore the diagonal elements are not necessarily equal to 1.
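
A short sketch of the covariance versus correlation distinction made in the reply above, assuming NumPy and using randomly generated data (the array `X` is hypothetical). It also illustrates why PCA on standardized data amounts to PCA on the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: three columns with very different spreads.
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])

cov = np.cov(X, rowvar=False)        # diagonal holds each column's variance
corr = np.corrcoef(X, rowvar=False)  # diagonal is 1 by construction

# Standardize each column: subtract its mean, divide by its sample SD.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_standardized = np.cov(Z, rowvar=False)

print("diagonal of covariance matrix: ", np.round(np.diag(cov), 2))
print("diagonal of correlation matrix:", np.diag(corr))
print("cov(standardized X) equals corr(X):", np.allclose(cov_of_standardized, corr))
```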

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM