- Machine Learning Algorithms
- Giuseppe Bonaccorso
- 255字
- 2021-07-02 18:53:31
Non-negative matrix factorization
When the dataset is made up of non-negative elements, it's possible to use non-negative matrix factorization (NNMF) instead of standard PCA. The algorithm optimizes a loss function (alternatively on W and H) based on the Frobenius norm:
If dim(X) = n x m, then dim(W) = n x p and dim(H) = p x m with p equal to the number of requested components (the n_components parameter), which is normally smaller than the original dimensions n and m.
The final reconstruction is purely additive and it has been shown that it's particularly efficient for images or text where there are normally no non-negative elements. In the following snippet, there's an example using the Iris dataset (which is non-negative). The init parameter can assume different values (see the documentation) which determine how the data matrix is initially processed. A random choice is for non-negative matrices which are only scaled (no SVD is performed):
from sklearn.datasets import load_iris
from sklearn.decomposition import NMF
>>> iris = load_iris()
>>> iris.data.shape
(150L, 4L)
>>> nmf = NMF(n_components=3, init='random', l1_ratio=0.1)
>>> Xt = nmf.fit_transform(iris.data)
>>> nmf.reconstruction_err_
1.8819327624141866
>>> iris.data[0]
array([ 5.1, 3.5, 1.4, 0.2])
>>> Xt[0]
array([ 0.20668461, 1.09973772, 0.0098996 ])
>>> nmf.inverse_transform(Xt[0])
array([ 5.10401653, 3.49666967, 1.3965409 , 0.20610779])
NNMF, together with other factorization methods, will be very useful for more advanced techniques, such as recommendation systems and topic modeling.