Non-Negative Matrix Factorization

When the dataset X is made up of non-negative elements, it's possible to use Non-Negative Matrix Factorization (NNMF) instead of standard PCA. This algorithm optimizes a loss function (alternately on W and H) based on the Frobenius norm:
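
L = ‖X − WH‖_F

Here, ‖·‖_F denotes the Frobenius norm, that is, the square root of the sum of the squared elements of a matrix.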

If dim(X) = n × m, then dim(W) = n × p and dim(H) = p × m with p equal to the number of requested components (the n_components parameter), which is normally smaller than the original dimensions n and m. In some implementations, the loss function is squared and L1/L2 penalties are applied to both the W and H matrices:
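
For example, scikit-learn's NMF minimizes an elastic-net-style objective of this general form (here, α is the overall regularization strength and ρ corresponds to the l1_ratio parameter):

L = ½‖X − WH‖²_F + αρ(‖W‖_1 + ‖H‖_1) + ½α(1 − ρ)(‖W‖²_F + ‖H‖²_F)

where ‖·‖_1 is the sum of the absolute values of all the elements.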

As we are going to discuss in the next chapter, Chapter 4, Regression Algorithms, the regularization terms can improve the accuracy and increase the sparseness of W and H (L1 penalty) by keeping only the elements that are significantly greater than zero and forcing all the others to become null. The final reconstruction is purely additive, and it has been shown to be particularly effective for images and text, where there are normally no negative elements. The algorithm starts by setting random values for both W and H, and then alternates between two optimization steps:

  1. W is kept constant and L is minimized with respect to H
  2. H is kept constant and L is minimized with respect to W

The procedure is repeated until both W and H become stable. Clearly, the goal is to find a basis, H, such that X ≈ WH. The main difference from PCA is that there's no constraint on the explained variance; instead, both matrices must be non-negative.
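
As a purely illustrative sketch (this is not the solver used internally by scikit-learn, whose NMF defaults to a coordinate descent method), the two alternating steps can be implemented with the classical multiplicative-update rules:

import numpy as np

def nmf_sketch(X, p, n_iter=200, eps=1e-9, seed=0):
    # Alternating multiplicative updates for X ≈ WH with X, W, H non-negative
    rng = np.random.RandomState(seed)
    n, m = X.shape
    W = rng.rand(n, p)
    H = rng.rand(p, m)
    for _ in range(n_iter):
        # Step 1: W is kept constant and H is updated to reduce the Frobenius loss
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Step 2: H is kept constant and W is updated
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

X = np.abs(np.random.RandomState(1000).randn(10, 4))
W, H = nmf_sketch(X, p=3)
print(np.linalg.norm(X - np.dot(W, H)))

Under these updates, the Frobenius loss is non-increasing, which mirrors the alternating minimization described above.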

In the following snippet, there's an example based on the Iris dataset. The init parameter can assume different values (see the documentation) that determine how the W and H matrices are initialized. With init='random', they are filled with non-negative random values that are only scaled (no SVD is performed):

from sklearn.datasets import load_iris
from sklearn.decomposition import NMF

iris = load_iris()
print(iris.data.shape)
(150, 4)

nmf = NMF(n_components=3, init='random', l1_ratio=0.1)
Xt = nmf.fit_transform(iris.data)

print(nmf.reconstruction_err_)
1.8819327624141866

print(iris.data[0])
array([ 5.1, 3.5, 1.4, 0.2])

print(Xt[0])
array([ 0.20668461, 1.09973772, 0.0098996 ])

print(nmf.inverse_transform(Xt[0]))
array([ 5.10401653, 3.49666967, 1.3965409 , 0.20610779])
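
Continuing this session, the same reconstruction can be obtained manually, because the inverse transformation is simply the dot product of the transformed vector with the factor matrix H, which scikit-learn stores in the components_ attribute:

import numpy as np

# Equivalent to nmf.inverse_transform(Xt[0]): project back through H
print(np.dot(Xt[0], nmf.components_))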

NNMF, together with other factorization methods, will be very useful for more advanced techniques, such as recommendation systems and topic modeling, so I invite the reader to understand the logic of these methods. The factorization of a matrix allows you to express it as a product of two entities that share one dimension. For example, if X is made up of user vectors where each component refers to a product, W will associate the users with some unknown (so-called latent) factors, and H will associate the latent factors with the products. As we are going to discuss in Chapter 12, Introduction to Recommendation Systems, this assumption becomes a fundamental concept. The latent factors behave like a basis in linear algebra: a reference system that allows us to find the components of any point in a specific subspace.
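
As a purely hypothetical illustration (the matrix and its values are invented), a small user-product matrix can be factorized in exactly the same way, with W associating users with the latent factors and H associating the latent factors with the products:

import numpy as np
from sklearn.decomposition import NMF

# Hypothetical user-product matrix: rows are users, columns are products,
# and the entries are non-negative interaction counts (0 = no interaction)
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

nmf = NMF(n_components=2, init='random', random_state=1000, max_iter=1000)
W = nmf.fit_transform(R)    # users -> latent factors
H = nmf.components_         # latent factors -> products

print(np.round(np.dot(W, H), 2))

The product WH is an additive, non-negative approximation of R, and the reconstructed values in the originally zero positions give a rough indication of how strongly a user might be associated with a product.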

NNMF is very sensitive to its parameters (in particular, initialization and regularization), so I suggest reading the official documentation for further information: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html.