Principal Component Analysis (PCA) is an unsupervised learning algorithm: it ignores the class labels and instead finds the directions (the so-called principal components) that maximize the variance in a dataset. In other words, PCA is essentially a summarization of the data. PCA does not select a subset of features and discard the rest; it infers new features from the existing ones that best describe the variance in the data.
PCA works with the eigenvectors and eigenvalues of the covariance matrix, which is equivalent to fitting straight principal-component lines to the variance of the data. Why? Because the eigenvectors trace the principal lines of force. In other words, PCA determines the lines of variance in the dataset, which are called the principal components: the first principal component has the maximum variance, the second principal component the second-most variance, and so on.
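To make this concrete, here is a minimal NumPy sketch of PCA via the eigen-decomposition of the covariance matrix; the dataset `X` here is random and purely illustrative.

```python
import numpy as np

# Illustrative random dataset: 100 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Center the data, then compute the covariance matrix of the features.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Eigen-decomposition: eigenvectors are the principal components,
# eigenvalues measure the variance captured along each component.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort components by decreasing variance (eigh returns ascending order).
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Project the data onto the top two principal components.
X_reduced = X_centered @ eigenvectors[:, :2]
print(X_reduced.shape)  # (100, 2)
```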
Linear Discriminant Analysis (LDA) is a supervised algorithm, as it takes the class labels into consideration. It is a way to reduce dimensionality while preserving as much of the class-discrimination information as possible.
LDA helps you find the boundaries around clusters of classes. It projects your data points onto a line so that the clusters are as separated as possible, with the points of each cluster staying close to their own centroid.
So the question arises: how are these clusters defined, and how do we get the reduced feature set in the case of LDA?
Basically, LDA finds the centroid of each class's data points. For example, with thirteen different features, LDA will find the centroid of each class using all thirteen features. On that basis, it determines a new dimension, which is simply an axis that should satisfy two criteria (see the sketch after this list):
1. Maximize the distance between the centroids of the classes.
2. Minimize the variation (which LDA calls scatter, represented by s²) within each category.
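A hedged sketch of these two criteria in practice, using scikit-learn's LinearDiscriminantAnalysis on the classic Wine dataset (which happens to have thirteen features and three classes, matching the example above; the dataset choice is mine, not the original's):

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Wine: 178 samples, 13 features, 3 classes.
X, y = load_wine(return_X_y=True)

# LDA can project onto at most (n_classes - 1) = 2 discriminant axes.
# Internally it picks axes that maximize between-class separation
# while minimizing within-class scatter.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)   # (178, 2): each sample projected onto 2 axes
print(lda.means_)    # the per-class centroids in the original 13-D space
```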
PCA tends to perform better when the number of samples per class is small, whereas LDA works better with large datasets having multiple classes, where class separability is an important factor in reducing dimensionality.
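One way to probe this claim empirically is to reduce the same dataset to two dimensions with each technique and compare downstream classification accuracy. This is only an illustrative harness (the dataset, classifier, and cross-validation setup are my assumptions), not a definitive benchmark:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Reduce to 2 dimensions with each technique, then classify.
for reducer in (PCA(n_components=2), LinearDiscriminantAnalysis(n_components=2)):
    model = make_pipeline(StandardScaler(), reducer, LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, X, y, cv=5)
    print(type(reducer).__name__, round(scores.mean(), 3))
```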