PCA model in R

Question

What is Principal Component Analysis, and how do I create its model in R?

Sahiti · Answer 1 · Jul 17, 2018

Principal Component Analysis (PCA) is a method for dimensionality reduction. An observation is often described by many dimensions (features), which makes the data noisy and hard to work with. That is why it is useful to reduce the number of dimensions.

The concept of Principal Component Analysis is this:

The data is transformed to a new space with the same number of dimensions or fewer. These new dimensions (features) are known as principal components.
The first principal component captures the maximum amount of variance in the original data.
The second principal component is orthogonal to the first and captures the maximum amount of the remaining variance.
The same holds for every subsequent principal component: they are all mutually uncorrelated, and each captures less variance than the one before it.
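The properties above can be checked by hand. The following is a minimal sketch, not the only way to compute PCA: it assumes the built-in USArrests dataset purely for illustration and uses an eigendecomposition of the covariance matrix of the standardized data.

```r
# Manual PCA sketch. USArrests is an illustrative choice, not part of the question.
X <- scale(USArrests, center = TRUE, scale = TRUE)  # standardize each feature

# Eigendecomposition of the covariance matrix of the standardized data
eig <- eigen(cov(X))
loadings  <- eig$vectors   # columns are the principal directions
variances <- eig$values    # variance captured by each component

# Project the data onto the principal directions to get the components
scores <- X %*% loadings

# Variances come out in decreasing order: each component matters less
print(variances)

# Components are uncorrelated: off-diagonal covariances are numerically zero
print(round(cov(scores), 10))

# Directions are orthogonal: t(V) %*% V is the identity matrix
print(round(crossprod(loadings), 10))
```

The diagonal of cov(scores) equals the eigenvalues, which is the sense in which each component "captures" a share of the total variance.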

You can perform PCA in R with the prcomp() function, which is part of the base stats package.
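A minimal sketch of a prcomp() call, assuming the built-in iris dataset as example data (the dataset and the choice to center and scale are mine, not part of the question):

```r
# PCA on the four numeric columns of iris (illustrative dataset).
# center/scale. standardize the features so no single unit dominates.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Standard deviation and proportion of variance for each component
print(summary(pca))

# Loadings: how each original feature contributes to each component
print(pca$rotation)

# Scores: the observations expressed in the new space
print(head(pca$x))

# Proportion of variance explained, computed from the standard deviations
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(var_explained)
```

To reduce the dimensionality, keep only the first few columns of pca$x, for example pca$x[, 1:2], choosing the number of columns from the proportions of variance explained.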


zombie · Answer 2 · Jul 19, 2018

Principal component analysis (PCA) is routinely employed on a wide range of problems, from the detection of outliers to predictive modeling. PCA projects observations described by $p$ variables onto a few orthogonal components, defined along the directions in which the data ‘stretch’ the most, giving a simplified overview. PCA is particularly powerful in dealing with multicollinearity and with variables that outnumber the samples ($p \gg n$).
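To illustrate the $p \gg n$ case, here is a small sketch on simulated data. The sizes n = 10 and p = 200 and the random seed are arbitrary assumptions chosen for the example.

```r
# Simulated data with far more variables than samples (arbitrary sizes).
set.seed(1)
n <- 10
p <- 200
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# prcomp() works through a singular value decomposition, so it does not
# need to invert or even form the full p x p covariance matrix.
pca <- prcomp(X, center = TRUE)

# Only min(n, p) components are returned, however large p is
print(dim(pca$x))
print(dim(pca$rotation))

# After centering, at most n - 1 components carry variance,
# so the last standard deviation is numerically zero.
print(pca$sdev)
```

The 200 original variables are summarized by at most n - 1 components that carry any variance, which is why PCA remains usable when variables outnumber samples.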

It is an unsupervised method, meaning it always seeks the greatest sources of variation, regardless of any outcome or group structure in the data. Its counterpart, partial least squares (PLS), is a supervised method that performs the same sort of covariance decomposition, but builds a user-defined number of components (frequently called latent variables) that minimize the sum of squared errors (SSE) when predicting a specified outcome by ordinary least squares (OLS).

Although there is a plethora of PCA methods available for R, I will only introduce two,