- Machine Learning Algorithms (Second Edition)
- Giuseppe Bonaccorso
Sparse PCA
The scikit-learn library provides different PCA variants that can solve particular problems, and I do suggest reading the original documentation. However, I'd like to mention SparsePCA, which allows exploiting the natural sparsity of the data while extracting the principal components. If you think about handwritten digits or other images that must be classified, their initial dimensionality can be quite high (a 10 x 10 image has 100 features). However, applying a standard PCA selects only the components that are, on average, the most important, under the assumption that every sample can be rebuilt using the same set of components. Simplified, this is equivalent to the following:
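In illustrative notation (the symbols are assumed here: c_i are the mixing coefficients and v_i the principal components), every sample is expressed as a dense combination of all k components:

x = c_1 \cdot v_1 + c_2 \cdot v_2 + \ldots + c_k \cdot v_k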
On the other hand, we can always use a limited number of components, but without the limitation imposed by a dense projection matrix. This can be achieved by using sparse matrices (or vectors), where the number of non-zero elements is quite low. In this way, each sample can be rebuilt using its own specific components (in most cases, these will still be the most important ones), which can include elements normally discarded by a dense PCA. The previous expression now becomes the following:
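Keeping the same illustrative notation, and reordering the terms so that the j non-null components come first, the sparse decomposition can be written as:

x = (c_1 \cdot v_1 + \ldots + c_j \cdot v_j) + (0 \cdot v_{j+1} + \ldots + 0 \cdot v_k)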
Here, the non-null components have been put into the first block (they don't follow the same order as in the previous expression), while all the other zero terms have been separated. In terms of linear algebra, the vector space still has the original dimensionality. However, by exploiting the power of sparse matrices (provided by scipy.sparse), scikit-learn can solve this problem much more efficiently than a classical PCA.
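As a minimal sketch (the matrix below is just a toy example, not taken from the book), scipy.sparse keeps only the non-zero entries of such a matrix, which is what makes sparse representations cheap to store and to multiply:

import numpy as np
from scipy import sparse
# A mostly-zero projection matrix: only 2 of the 8 entries are non-zero
W = np.array([[0.0, 0.7, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.3]])
# CSR format stores just the non-zero values together with their indices
W_sparse = sparse.csr_matrix(W)
print(W_sparse.nnz)
2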
The following snippet shows a SparsePCA with 60 components. In this context, the components are usually called atoms, and the amount of sparsity can be controlled via L1-norm regularization (higher values of the alpha parameter lead to sparser results). This approach is very common in classification algorithms and will be discussed in the upcoming Atom extraction and dictionary learning section, as well as in the following chapters:
from sklearn.datasets import load_digits
from sklearn.decomposition import SparsePCA
# 8 x 8 digit images: 64 features, pixel values in [0, 16]
digits = load_digits()
# 60 sparse atoms; alpha controls the strength of the L1 penalty
spca = SparsePCA(n_components=60, alpha=0.1)
X_spca = spca.fit_transform(digits.data / 16.0)
print(spca.components_.shape)
(60, 64)
As we are going to discuss, the extraction of sparse components is very helpful whenever it's necessary to rebuild each sample starting from a finite subset of features. In this particular case, we are no longer considering the explained variance, but we are focusing on finding all the elements that can be used as distinctive atoms. For example, we could apply a SparsePCA (or dictionary learning, which is equivalent in scikit-learn) to the MNIST dataset in order to find the basic geometrical components (such as vertical/horizontal lines) without caring about the actual dimensionality reduction (which becomes a secondary goal in this case).
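As a quick sketch of this idea (the code below assumes the spca model fitted in the previous snippet and is not part of the original example), we can check how many entries of the extracted atoms are exactly zero and approximately rebuild a sample from its codes:

import numpy as np
# Fraction of exactly-zero entries in the extracted atoms
print(np.mean(spca.components_ == 0.0))
# Approximate reconstruction of the first sample from its codes:
# each image is a combination of a limited number of atoms
x_rec = np.dot(X_spca[0], spca.components_) + spca.mean_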