Machine Learning Algorithms
Giuseppe Bonaccorso
Kernel PCA
We're going to discuss kernel methods in Chapter 7, Support Vector Machines; however, it's useful to mention the class KernelPCA here, which performs a PCA on non-linearly separable datasets. To understand the logic of this approach (the mathematical formulation isn't very simple), it's useful to imagine projecting each sample into a particular space where the dataset becomes linearly separable. The components of that space correspond to the first, second, ... principal components, and a kernel PCA algorithm therefore computes the projection of our samples onto each of them.
Let's consider a dataset made up of a circle with a blob inside:
from sklearn.datasets import make_circles
>>> Xb, Yb = make_circles(n_samples=500, factor=0.1, noise=0.05)
The graphical representation is shown in the following picture. In this case, a classic PCA approach can't capture the non-linear dependency between the two components (the reader can verify that the projection is equivalent to the original dataset). However, looking at the samples and switching to polar coordinates (a space where the projection becomes trivial), it's easy to separate the two sets by considering the radius alone:
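As a quick check of both claims, here's a minimal sketch (assuming NumPy and the standard PCA class; the variable names are purely illustrative) that computes the linear projection and the polar radius of each sample:
from sklearn.decomposition import PCA
import numpy as np
>>> pca = PCA(n_components=2)
>>> X_pca = pca.fit_transform(Xb)   # only a rotation of the original dataset, so no separation is gained
>>> r = np.sqrt(Xb[:, 0] ** 2 + Xb[:, 1] ** 2)   # polar radius of each sample
>>> print(r[Yb == 1].max(), r[Yb == 0].min())   # inner-blob radii vs. outer-circle radii; the two ranges typically don't overlap
A single threshold on the radius is therefore enough to separate the two sets, which is exactly the kind of non-linear structure a standard PCA cannot expose.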
Considering the structure of the dataset, it's possible to investigate the behavior of a PCA with a radial basis function kernel. As the default value for gamma is 1.0/number of features (for now, consider this parameter as inversely proportional to the variance of a Gaussian), we need to increase it to capture the external circle. A value of 1.0 is enough:
from sklearn.decomposition import KernelPCA
>>> kpca = KernelPCA(n_components=2, kernel='rbf', fit_inverse_transform=True, gamma=1.0)
>>> X_kpca = kpca.fit_transform(Xb)
The instance variable X_transformed_fit_ will contain the projection of our dataset into the new space. Plotting it, we get:
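For reference, a minimal plotting sketch (assuming Matplotlib is available, and reusing the variables from the snippet above) could look like this:
import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(figsize=(8, 6))
>>> ax.scatter(X_kpca[:, 0], X_kpca[:, 1], c=Yb, s=10)   # color each projected sample by its original label
>>> ax.set_xlabel('First principal component')
>>> ax.set_ylabel('Second principal component')
>>> plt.show()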
The plot shows the expected separation, and it's also possible to see that the points belonging to the central blob are distributed along a curve because they are more sensitive to the distance from the center.
Kernel PCA is a powerful instrument when we think of our dataset as made up of elements that can be a function of the components (in particular, radial basis functions or polynomials), but we aren't able to determine a linear relationship among them.