NMF - Misplaced Pages

Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.


81-527: NMF may refer to: Non-negative matrix factorization; National Medical Fellowships, a nonprofit organization providing scholarships and awards to underrepresented minority medical students in the United States; Neuenfelder Maschinenfabrik, a German company for ship cranes; N-Methylformamide; New minor forcing, a contract bridge bidding convention; Neural modeling fields,

162-489: $\|A\|_{1}=\max_{1\leq j\leq n}\sum_{i=1}^{m}|a_{ij}|,$ which is simply the maximum absolute column sum of the matrix. $\|A\|_{\infty}=\max_{1\leq i\leq m}\sum_{j=1}^{n}|a_{ij}|,$ which

243-694: A p-norm unit ball $V_{p,m}$ in $K^{m}$, then multiply it by at least $\|A\|_{p}$, in order for it to be large enough to contain $AV_{p,n}$. When $p=1,\infty$, we have simple formulas.

324-434: A compatible vector norm on $K^{n}$ by defining $\|v\|:=\|(v,v,\dots,v)\|$. These norms treat an $m\times n$ matrix as

405-490: A document (column vector) from our input matrix by a linear combination of our features (column vectors in W) where each feature is weighted by the feature's cell value from the document's column in H. NMF has an inherent clustering property, i.e., it automatically clusters the columns of input data $\mathbf{V}=(v_{1},\dots,v_{n})$. More specifically,

486-491: A local minimum, rather than a global minimum of the cost function. A provably optimal algorithm is unlikely in the near future as the problem has been shown to generalize the k-means clustering problem which is known to be NP-complete . However, as in many other data mining applications, a local minimum may still prove to be useful. In addition to the optimization step, initialization has a significant effect on NMF. The initial values chosen for W and H may affect not only

567-434: A long history under the name "self modeling curve resolution". In this framework the vectors in the right matrix are continuous curves rather than discrete vectors. Also early work on non-negative matrix factorizations was performed by a Finnish group of researchers in the 1990s under the name positive matrix factorization . It became more widely known as non-negative matrix factorization after Lee and Seung investigated

648-545: A mathematical framework for machine learning; Nippon Music Foundation, an organisation to develop international networks of music and foster public interest in music in Japan; Green Warriors of Norway (Norwegian: Norges Miljøvernforbund), a Norwegian environmental NGO; Norman Music Festival, an annual three-day American music festival in Norman, Oklahoma. This disambiguation page lists articles associated with

729-429: A method to study the common properties of astronomical objects and post-process the astronomical observations. The advances in the spectroscopic observations by Blanton & Roweis (2007) take into account the uncertainties of astronomical observations; this was later improved by Zhu (2016), where missing data are also considered and parallel computing is enabled. Their method was then adopted by Ren et al. (2018) to

810-484: A monomial submatrix of rank equal to its rank was given by Campbell and Poole in 1981. Kalofolias and Gallopoulos (2012) solved the symmetric counterpart of this problem, where V is symmetric and contains a diagonal principal submatrix of rank r. Their algorithm runs in $O(rm^{2})$ time in the dense case. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial time algorithm for exact NMF that works for

891-472: A vector norm $\|\cdot\|_{\beta}$ on $K^{m}$ are given. Any $m\times n$ matrix A induces a linear operator from $K^{n}$ to $K^{m}$ with respect to


972-506: A vector of size $m\cdot n$, and use one of the familiar vector norms. For example, using the p-norm for vectors, p ≥ 1, we get: $\|\operatorname{vec}(A)\|_{p}=\left(\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^{p}\right)^{1/p}$. This is a different norm from the induced p-norm (see above) and the Schatten p-norm (see below), but the notation is the same. The special case p = 2 is the Frobenius norm, and p = ∞ yields

1053-837: Is consistent with the vector norms that induce it, giving $\|Ax\|_{\beta}\leq\|A\|_{\alpha,\beta}\|x\|_{\alpha}.$ Suppose $\|\cdot\|_{\alpha,\beta}$; $\|\cdot\|_{\beta,\gamma}$; and $\|\cdot\|_{\alpha,\gamma}$ are operator norms induced by

1134-469: Is multiplication. Matrix norms are particularly useful if they are also sub-multiplicative: $\|AB\|\leq\|A\|\|B\|$. Every norm on $K^{n\times n}$ can be rescaled to be sub-multiplicative; in some books, the terminology matrix norm is reserved for sub-multiplicative norms. Suppose a vector norm $\|\cdot\|_{\alpha}$ on $K^{n}$ and

1215-423: Is a non-negative monomial matrix . In this simple case it will just correspond to a scaling and a permutation . More control over the non-uniqueness of NMF is obtained with sparsity constraints. In astronomy, NMF is a promising method for dimension reduction in the sense that astrophysical signals are non-negative. NMF has been applied to the spectroscopic observations and the direct imaging observations as

1296-408: Is also related to the latent class model . NMF with the least-squares objective is equivalent to a relaxed form of K-means clustering : the matrix factor W contains cluster centroids and H contains cluster membership indicators. This provides a theoretical foundation for using NMF for data clustering. However, k-means does not enforce non-negativity on its centroids, so the closest analogy

1377-400: Is an m × n matrix, W is an m × p matrix, and H is a p × n matrix then p can be significantly less than both m and n . Here is an example based on a text-mining application: This last point is the basis of NMF because we can consider each original document in our example as being built from a small set of hidden features. NMF generates these features. It

1458-439: Is an instance of nonnegative quadratic programming, just like the support vector machine (SVM). However, SVM and NMF are related at a more intimate level than that of NQP, which allows direct application of the solution algorithms developed for either of the two methods to problems in both domains. The factorization is not unique: A matrix and its inverse can be used to transform the two factorization matrices by, e.g., $\mathbf{WH}=\mathbf{WBB}^{-1}\mathbf{H}$. If

1539-415: Is an operator norm on the space of square matrices $K^{n\times n}$ induced by vector norms $\|\cdot\|_{\alpha}$ and $\|\cdot\|_{\alpha}$. Then,

1620-1029: Is called consistent with a vector norm $\|\cdot\|_{\alpha}$ on $K^{n}$ and a vector norm $\|\cdot\|_{\beta}$ on $K^{m}$, if: $\|Ax\|_{\beta}\leq\|A\|\|x\|_{\alpha}$ for all $A\in K^{m\times n}$ and all $x\in K^{n}$. In

1701-519: Is fixed and W found by a non-negative least squares solver, then W is fixed and H is found analogously. The procedures used to solve for W and H may be the same or different, as some NMF variants regularize one of W and H . Specific approaches include the projected gradient descent methods, the active set method, the optimal gradient method, and the block principal pivoting method among several others. Current algorithms are sub-optimal in that they only guarantee finding


1782-573: Is in fact with "semi-NMF". NMF can be seen as a two-layer directed graphical model with one layer of observed random variables and one layer of hidden random variables. NMF extends beyond matrices to tensors of arbitrary order. This extension may be viewed as a non-negative counterpart to, e.g., the PARAFAC model. Other extensions of NMF include joint factorization of several data matrices and tensors where some factors are shared. Such models are useful for sensor fusion and relational learning. NMF

1863-541: Is not explicitly imposed, the orthogonality holds to a large extent, and the clustering property holds too. When the error function to be used is Kullback–Leibler divergence , NMF is identical to the probabilistic latent semantic analysis (PLSA), a popular document clustering method. Usually the number of columns of W and the number of rows of H in NMF are selected so the product WH will become an approximation to V . The full decomposition of V then amounts to

1944-507: Is proven by singular value decomposition of $A$, and the fact that the trace is invariant under circular shifts. The Frobenius norm is an extension of the Euclidean norm to $K^{n\times n}$ and comes from the Frobenius inner product on the space of all matrices. The Frobenius norm

2025-1284: Is simply the maximum absolute row sum of the matrix. For example, for $A=\begin{bmatrix}-3&5&7\\2&6&4\\0&2&8\end{bmatrix},$ we have that $\|A\|_{1}=\max(|{-3}|+2+0;\,5+6+2;\,7+4+8)=\max(5,13,19)=19,$ $\|A\|_{\infty}=\max(|{-3}|+5+7;\,2+6+4;\,0+2+8)=\max(15,12,10)=15.$ When $p=2$ (the Euclidean norm or $\ell_{2}$-norm for vectors),
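
The worked example for the induced 1-norm and ∞-norm can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative only.

```python
import numpy as np

# The 3x3 matrix from the worked example.
A = np.array([[-3, 5, 7],
              [2, 6, 4],
              [0, 2, 8]], dtype=float)

# Induced 1-norm: maximum absolute column sum.
norm_1 = np.abs(A).sum(axis=0).max()

# Induced infinity-norm: maximum absolute row sum.
norm_inf = np.abs(A).sum(axis=1).max()

print(norm_1, norm_inf)  # 19.0 15.0
```

NumPy's `np.linalg.norm(A, 1)` and `np.linalg.norm(A, np.inf)` compute the same two quantities for a 2-D array.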

2106-682: Is sub-multiplicative and is very useful for numerical linear algebra. The sub-multiplicativity of the Frobenius norm can be proved using the Cauchy–Schwarz inequality. The Frobenius norm is often easier to compute than induced norms, and has the useful property of being invariant under rotations (and unitary operations in general). That is, $\|A\|_{\text{F}}=\|AU\|_{\text{F}}=\|UA\|_{\text{F}}$ for any unitary matrix $U$. This property follows from
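
Both properties of the Frobenius norm mentioned here (unitary invariance and sub-multiplicativity) are easy to test on random data. This is a sketch under the assumption that NumPy is used and that a real orthogonal matrix stands in for a general unitary one; it checks single instances and is not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# The Q factor of a QR factorization is orthogonal (a real unitary matrix).
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))

fro_A = np.linalg.norm(A, "fro")
fro_AU = np.linalg.norm(A @ U, "fro")
fro_UA = np.linalg.norm(U @ A, "fro")

# Unitary invariance: all three values agree up to rounding.
invariant = np.isclose(fro_A, fro_AU) and np.isclose(fro_A, fro_UA)

# Sub-multiplicativity: ||AB||_F <= ||A||_F * ||B||_F.
submultiplicative = np.linalg.norm(A @ B, "fro") <= fro_A * np.linalg.norm(B, "fro")

print(invariant, submultiplicative)
```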

2187-418: Is the i -th column vector of the product matrix V and h i is the i -th column vector of the matrix H . When multiplying matrices, the dimensions of the factor matrices may be significantly lower than those of the product matrix and it is this property that forms the basis of NMF. NMF generates factors with significantly reduced dimensions compared to the original matrix. For example, if V
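
The column-wise view of the product and the size reduction can be illustrated with a small sketch. The sizes `m`, `n`, `p` below are arbitrary illustrative choices, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 100, 80, 5           # illustrative sizes with p much smaller than m and n
W = rng.random((m, p))         # non-negative features matrix
H = rng.random((p, n))         # non-negative coefficients matrix
V = W @ H

# Column i of V is a linear combination of the columns of W,
# with coefficients taken from column i of H.
i = 3
col_ok = np.allclose(V[:, i], W @ H[:, i])

# Number of stored entries: the full matrix versus the two factors.
entries_V = m * n              # 100 * 80
entries_WH = m * p + p * n     # 100 * 5 + 5 * 80

print(col_ok, entries_V, entries_WH)
```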

2268-568: Is the Frobenius inner product, and Re is the real part of a complex number (irrelevant for real matrices). The max norm is the elementwise norm in the limit as p = q goes to infinity: $\|A\|_{\max}=\max_{i,j}|a_{ij}|$. This norm is not sub-multiplicative; but modifying the right-hand side to $\sqrt{mn}\max_{i,j}|a_{ij}|$ makes it so. Note that in some literature (such as Communication complexity), an alternative definition of max-norm, also called
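
A two-by-two matrix of ones is enough to see that the max norm fails sub-multiplicativity while the rescaled version does not fail on the same example. The helper names `max_norm` and `scaled_max_norm` are hypothetical, introduced only for this sketch.

```python
import numpy as np

def max_norm(M):
    """Elementwise max norm: the largest absolute entry."""
    return np.abs(M).max()

def scaled_max_norm(M):
    """Max norm rescaled by sqrt(m * n) for an m-by-n matrix."""
    m, n = M.shape
    return np.sqrt(m * n) * np.abs(M).max()

A = np.ones((2, 2))
B = np.ones((2, 2))

# Every entry of A @ B equals 2, while max_norm(A) * max_norm(B) equals 1.
violates = max_norm(A @ B) > max_norm(A) * max_norm(B)

# With the sqrt(mn) factor the inequality holds for this pair (4 <= 2 * 2).
holds = scaled_max_norm(A @ B) <= scaled_max_norm(A) * scaled_max_norm(B)

print(violates, holds)
```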

2349-626: Is the i-th row of matrix $A$. In the special cases of $\alpha=1$ and $\beta=2$, the induced matrix norms can be computed by $\|A\|_{1,2}=\max_{1\leq j\leq n}\|A_{:j}\|_{2},$ where $A_{:j}$

2430-399: Is the j-th column of matrix $A$. Hence, $\|A\|_{2,\infty}$ and $\|A\|_{1,2}$ are the maximum row and column 2-norm of the matrix, respectively. Any operator norm

2511-435: Is the sum of the Euclidean norms of the columns of the matrix: $\|A\|_{2,1}=\sum_{j=1}^{n}\|a_{j}\|_{2}=\sum_{j=1}^{n}\left(\sum_{i=1}^{m}|a_{ij}|^{2}\right)^{1/2}$. The $L_{2,1}$ norm as an error function is more robust, since the error for each data point (a column) is not squared. It is used in robust data analysis and sparse coding. For p, q ≥ 1, the $L_{2,1}$ norm can be generalized to
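
As a small illustration of the column-wise definition, the $L_{2,1}$ norm can be computed by taking the Euclidean norm of each column and summing. This sketch assumes NumPy; the example matrix is made up for the purpose.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# Euclidean norm of each column (axis=0 runs down the rows), then the sum.
column_norms = np.linalg.norm(A, axis=0)   # columns (3, 4) and (0, 5)
l21 = column_norms.sum()

# For comparison, the Frobenius norm squares the column norms before summing.
fro = np.linalg.norm(A, "fro")

print(l21, fro)
```

Here both columns have Euclidean norm 5, so the $L_{2,1}$ norm is 10, while the Frobenius norm is $\sqrt{50}$.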


2592-431: Is useful to think of each feature (column vector) in the features matrix W as a document archetype comprising a set of words where each word's cell value defines the word's rank in the feature: The higher a word's cell value the higher the word's rank in the feature. A column in the coefficients matrix H represents an original document with a cell value defining the document's rank for a feature. We can now reconstruct

2673-417: The $\gamma_{2}$-norm, refers to the factorization norm: The Schatten p-norms arise when applying the p-norm to the vector of singular values of a matrix. If the singular values of the $m\times n$ matrix $A$ are denoted by $\sigma_{i}$, then
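
The description of Schatten p-norms translates directly into code: compute the singular values, then apply the vector p-norm to them. The function name `schatten_norm` is hypothetical; the sketch assumes NumPy and uses the well-known special cases (p = 1 nuclear norm, p = 2 Frobenius norm, p = ∞ spectral norm) as a cross-check.

```python
import numpy as np

def schatten_norm(A, p):
    """Vector p-norm of the singular values of A."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return np.linalg.norm(sigma, ord=p)

A = np.array([[-3.0, 5.0, 7.0],
              [2.0, 6.0, 4.0],
              [0.0, 2.0, 8.0]])

nuclear = schatten_norm(A, 1)          # sum of singular values
frobenius = schatten_norm(A, 2)        # root of the sum of squared singular values
spectral = schatten_norm(A, np.inf)    # largest singular value

print(nuclear, frobenius, spectral)
```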

2754-506: The $L_{p,q}$ norm as follows: When p = q = 2 for the $L_{p,q}$ norm, it is called the Frobenius norm or the Hilbert–Schmidt norm, though the latter term is used more frequently in the context of operators on (possibly infinite-dimensional) Hilbert space. This norm can be defined in various ways: where

2835-432: The $\|A\|_{p}$ defined previously is the special case of $\|A\|_{p,p}$. In the special cases of $\alpha=2$ and $\beta=\infty$,

2916-695: The "entry-wise" p-norms and the Schatten p-norms for matrices treated below, which are also usually denoted by $\|A\|_{p}.$ Geometrically speaking, one can imagine a p-norm unit ball $V_{p,n}=\{x\in K^{n}:\|x\|_{p}\leq 1\}$ in $K^{n}$, then apply

2997-491: The K-vector space of matrices with $m$ rows and $n$ columns and entries in the field $K$. A matrix norm is a norm on $K^{m\times n}$. Norms are often expressed with double vertical bars (like so: $\|A\|$). Thus,

3078-467: The Karhunen–Loève theorem, an application of PCA, using the plot of eigenvalues. A typical choice of the number of components with PCA is based on the "elbow" point; the existence of a flat plateau then indicates that PCA is not capturing the data efficiently, and finally a sudden drop reflects the capture of random noise, which falls into the regime of overfitting. For sequential NMF,

3159-569: The exoplanets and circumstellar disks . Matrix norm#Frobenius norm In the field of mathematics , norms are defined for elements within a vector space . Specifically, when the vector space comprises matrices, such norms are referred to as matrix norms . Matrix norms differ from vector norms in that they must also interact with matrix multiplication. Given a field K {\displaystyle K} of either real or complex numbers , let K m × n {\displaystyle K^{m\times n}} be

3240-731: The spectral radius formula : lim r → ∞ ‖ A r ‖ 1 / r = ρ ( A ) . {\displaystyle \lim _{r\to \infty }\|A^{r}\|^{1/r}=\rho (A).} If the vector norms ‖ ⋅ ‖ α {\displaystyle \|\cdot \|_{\alpha }} and ‖ ⋅ ‖ β {\displaystyle \|\cdot \|_{\beta }} are given in terms of energy norms based on symmetric positive definite matrices P {\displaystyle P} and Q {\displaystyle Q} respectively,

3321-590: The supremum . This norm measures how much the mapping induced by A {\displaystyle A} can stretch vectors. Depending on the vector norms ‖ ⋅ ‖ α {\displaystyle \|\cdot \|_{\alpha }} , ‖ ⋅ ‖ β {\displaystyle \|\cdot \|_{\beta }} used, notation other than ‖ ⋅ ‖ α , β {\displaystyle \|\cdot \|_{\alpha ,\beta }} can be used for


3402-429: The trace is the sum of diagonal entries, and σ i ( A ) {\displaystyle \sigma _{i}(A)} are the singular values of A {\displaystyle A} . The second equality is proven by explicit computation of t r a c e ( A ∗ A ) {\displaystyle \mathrm {trace} (A^{*}A)} . The third equality

3483-488: The PCA components are ranked by the magnitude of their corresponding eigenvalues; for NMF, its components can be ranked empirically when they are constructed one by one (sequentially), i.e., learn the $(n+1)$-th component with the first $n$ components constructed. The contribution of the sequential NMF components can be compared with

3564-600: The above minimization is mathematically equivalent to the minimization of K-means clustering . Furthermore, the computed H {\displaystyle H} gives the cluster membership, i.e., if H k j > H i j {\displaystyle \mathbf {H} _{kj}>\mathbf {H} _{ij}} for all i ≠ k , this suggests that the input data v j {\displaystyle v_{j}} belongs to k {\displaystyle k} -th cluster. The computed W {\displaystyle W} gives

3645-896: The approximation of V {\displaystyle \mathbf {V} } by V ≃ W H {\displaystyle \mathbf {V} \simeq \mathbf {W} \mathbf {H} } is achieved by finding W {\displaystyle W} and H {\displaystyle H} that minimize the error function (using the Frobenius norm ) ‖ V − W H ‖ F , {\displaystyle \left\|V-WH\right\|_{F},} subject to W ≥ 0 , H ≥ 0. {\displaystyle W\geq 0,H\geq 0.} , If we furthermore impose an orthogonality constraint on H {\displaystyle \mathbf {H} } , i.e. H H T = I {\displaystyle \mathbf {H} \mathbf {H} ^{T}=I} , then

3726-462: The case where one of the factors W satisfies a separability condition. In Learning the parts of objects by non-negative matrix factorization Lee and Seung proposed NMF mainly for parts-based decomposition of images. It compares NMF to vector quantization and principal component analysis , and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results. It

3807-391: The cluster centroids, i.e., the k {\displaystyle k} -th column gives the cluster centroid of k {\displaystyle k} -th cluster. This centroid's representation can be significantly enhanced by convex NMF. When the orthogonality constraint H H T = I {\displaystyle \mathbf {H} \mathbf {H} ^{T}=I}

3888-471: The columns of V represent data sampled over spatial or temporal dimensions, e.g. time signals, images, or video, features that are equivariant w.r.t. shifts along these dimensions can be learned by Convolutional NMF. In this case, W is sparse with columns having local non-zero weight windows that are shared across shifts along the spatio-temporal dimensions of V , representing convolution kernels . By spatio-temporal pooling of H and repeatedly using

3969-411: The corresponding operator norm is ‖ A ‖ α , β = sup x ≠ 0 ‖ A x ‖ β ‖ x ‖ α . {\displaystyle \|A\|_{\alpha ,\beta }=\sup _{x\neq 0}{\frac {\|Ax\|_{\beta }}{\|x\|_{\alpha }}}.} In particular,

4050-725: The cyclic nature of the trace ( trace ⁡ ( X Y Z ) = trace ⁡ ( Y Z X ) = trace ⁡ ( Z X Y ) {\displaystyle \operatorname {trace} (XYZ)=\operatorname {trace} (YZX)=\operatorname {trace} (ZXY)} ): and analogously: where we have used the unitary nature of U {\displaystyle U} (that is, U ∗ U = U U ∗ = I {\displaystyle U^{*}U=UU^{*}=\mathbf {I} } ). It also satisfies and where ⟨ A , B ⟩ F {\displaystyle \langle A,B\rangle _{\text{F}}}

4131-436: The data are provided in streaming fashion. One such use is for collaborative filtering in recommendation systems , where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for optimization in these cases may or may not be the same as for standard NMF, but the algorithms need to be rather different. If


4212-409: The data. In standard NMF, matrix factor W ∈ R + ， i.e., W can be anything in that space. Convex NMF restricts the columns of W to convex combinations of the input data vectors ( v 1 , … , v n ) {\displaystyle (v_{1},\dots ,v_{n})} . This greatly improves the quality of data representation of W . Furthermore,

4293-414: The direct imaging field as one of the methods of detecting exoplanets , especially for the direct imaging of circumstellar disks . Ren et al. (2018) are able to prove the stability of NMF components when they are constructed sequentially (i.e., one by one), which enables the linearity of the NMF modeling process; the linearity property is used to separate the stellar light and the light scattered from

4374-501: The divergence between V and WH and possibly by regularization of the W and/or H matrices. Two simple divergence functions studied by Lee and Seung are the squared error (or Frobenius norm ) and an extension of the Kullback–Leibler divergence to positive matrices (the original Kullback–Leibler divergence is defined on probability distributions). Each divergence leads to a different NMF algorithm, usually minimizing

4455-403: The divergence using iterative update rules. The factorization problem in the squared error version of NMF may be stated as: Given a matrix V {\displaystyle \mathbf {V} } find nonnegative matrices W and H that minimize the function Another type of NMF for images is based on the total variation norm . When L1 regularization (akin to Lasso ) is added to NMF with

4536-407: The induced matrix norm is the spectral norm . The two values do not coincide in infinite dimensions — see Spectral radius for further discussion. The spectral radius should not be confused with the spectral norm. The spectral norm of a matrix A {\displaystyle A} is the largest singular value of A {\displaystyle A} , i.e., the square root of

4617-393: The induced matrix norms can be computed by ‖ A ‖ 2 , ∞ = max 1 ≤ i ≤ m ‖ A i : ‖ 2 , {\displaystyle \|A\|_{2,\infty }=\max _{1\leq i\leq m}\|A_{i:}\|_{2},} where A i : {\displaystyle A_{i:}}

4698-588: The inequality for all positive integers r , where ρ ( A ) is the spectral radius of A . For symmetric or hermitian A , we have equality in ( 1 ) for the 2-norm, since in this case the 2-norm is precisely the spectral radius of A . For an arbitrary matrix, we may not have equality for any norm; a counterexample would be A = [ 0 1 0 0 ] , {\displaystyle A={\begin{bmatrix}0&1\\0&0\end{bmatrix}},} which has vanishing spectral radius. In any case, for any matrix norm, we have

4779-701: The largest eigenvalue of the matrix A ∗ A , {\displaystyle A^{*}A,} where A ∗ {\displaystyle A^{*}} denotes the conjugate transpose of A {\displaystyle A} : ‖ A ‖ 2 = λ max ( A ∗ A ) = σ max ( A ) . {\displaystyle \|A\|_{2}={\sqrt {\lambda _{\max }\left(A^{*}A\right)}}=\sigma _{\max }(A).} where σ max ( A ) {\displaystyle \sigma _{\max }(A)} represents

4860-589: The largest singular value of matrix A . {\displaystyle A.} There are further properties: We can generalize the above definition. Suppose we have vector norms ‖ ⋅ ‖ α {\displaystyle \|\cdot \|_{\alpha }} and ‖ ⋅ ‖ β {\displaystyle \|\cdot \|_{\beta }} for spaces K n {\displaystyle K^{n}} and K m {\displaystyle K^{m}} respectively;

4941-436: The linear map A {\displaystyle A} to the ball. It would end up becoming a distorted convex shape A V p , n ⊂ K m {\displaystyle AV_{p,n}\subset K^{m}} , and ‖ A ‖ p {\displaystyle \|A\|_{p}} measures the longest "radius" of the distorted convex shape. In other words, we must take


5022-609: The matrix norm is a function ‖ ⋅ ‖ : K m × n → R {\displaystyle \|\cdot \|:K^{m\times n}\to \mathbb {R} } that must satisfy the following properties: For all scalars α ∈ K {\displaystyle \alpha \in K} and matrices A , B ∈ K m × n {\displaystyle A,B\in K^{m\times n}} , The only feature distinguishing matrices from rearranged vectors

5103-445: The maximum norm. Let ( a 1 , … , a n ) {\displaystyle (a_{1},\ldots ,a_{n})} be the columns of matrix A {\displaystyle A} . From the original definition, the matrix A {\displaystyle A} presents n data points in m-dimensional space. The L 2 , 1 {\displaystyle L_{2,1}} norm

5184-424: The mean squared error cost function, the resulting problem may be called non-negative sparse coding due to the similarity to the sparse coding problem, although it may also still be referred to as NMF. Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where

5265-860: The multiplicative factors for W and H , i.e. the W T V W T W H {\textstyle {\frac {\mathbf {W} ^{\mathsf {T}}\mathbf {V} }{\mathbf {W} ^{\mathsf {T}}\mathbf {W} \mathbf {H} }}} and V H T W H H T {\textstyle {\textstyle {\frac {\mathbf {V} \mathbf {H} ^{\mathsf {T}}}{\mathbf {W} \mathbf {H} \mathbf {H} ^{\mathsf {T}}}}}} terms, are matrices of ones when V = W H {\displaystyle \mathbf {V} =\mathbf {W} \mathbf {H} } . More recently other algorithms have been developed. Some approaches are based on alternating non-negative least squares : in each step of such an algorithm, first H

5346-530: The operator norm can be expressed as the spectral norm of a modified matrix: ‖ A ‖ P , Q = ‖ Q 1 / 2 A P − 1 / 2 ‖ 2 . {\displaystyle \|A\|_{P,Q}=\|Q^{1/2}AP^{-1/2}\|_{2}.} A matrix norm ‖ ⋅ ‖ {\displaystyle \|\cdot \|} on K m × n {\displaystyle K^{m\times n}}

5427-439: The operator norm is a sub-multiplicative matrix norm: ‖ A B ‖ α , α ≤ ‖ A ‖ α , α ‖ B ‖ α , α . {\displaystyle \|AB\|_{\alpha ,\alpha }\leq \|A\|_{\alpha ,\alpha }\|B\|_{\alpha ,\alpha }.} Moreover, any such norm satisfies

5508-697: The operator norm. If the p -norm for vectors ( 1 ≤ p ≤ ∞ {\displaystyle 1\leq p\leq \infty } ) is used for both spaces K n {\displaystyle K^{n}} and K m , {\displaystyle K^{m},} then the corresponding operator norm is: ‖ A ‖ p = sup x ≠ 0 ‖ A x ‖ p ‖ x ‖ p . {\displaystyle \|A\|_{p}=\sup _{x\neq 0}{\frac {\|Ax\|_{p}}{\|x\|_{p}}}.} These induced norms are different from

5589-461: The plot of eigenvalues is approximated by the plot of the fractional residual variance curves, where the curves decreases continuously, and converge to a higher level than PCA, which is the indication of less over-fitting of sequential NMF. Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix V . A polynomial time algorithm for solving nonnegative rank factorization if V contains

5670-445: The properties of the algorithm and published some simple and useful algorithms for two types of factorizations. Let matrix V be the product of the matrices W and H , Matrix multiplication can be implemented as computing the column vectors of V as linear combinations of the column vectors in W using coefficients supplied by columns of H . That is, each column of V can be computed as follows: where v i

5751-400: The rate of convergence, but also the overall error at convergence. Some options for initialization include complete randomization, SVD , k-means clustering, and more advanced strategies based on these and other paradigms. The sequential construction of NMF components ( W and H ) was firstly used to relate NMF with Principal Component Analysis (PCA) in astronomy. The contribution from

5832-1774: The respective pairs of vector norms ( ‖ ⋅ ‖ α , ‖ ⋅ ‖ β ) {\displaystyle (\|\cdot \|_{\alpha },\|\cdot \|_{\beta })} ; ( ‖ ⋅ ‖ β , ‖ ⋅ ‖ γ ) {\displaystyle (\|\cdot \|_{\beta },\|\cdot \|_{\gamma })} ; and ( ‖ ⋅ ‖ α , ‖ ⋅ ‖ γ ) {\displaystyle (\|\cdot \|_{\alpha },\|\cdot \|_{\gamma })} . Then, this follows from ‖ A B x ‖ γ ≤ ‖ A ‖ β , γ ‖ B x ‖ β ≤ ‖ A ‖ β , γ ‖ B ‖ α , β ‖ x ‖ α {\displaystyle \|ABx\|_{\gamma }\leq \|A\|_{\beta ,\gamma }\|Bx\|_{\beta }\leq \|A\|_{\beta ,\gamma }\|B\|_{\alpha ,\beta }\|x\|_{\alpha }} and sup ‖ x ‖ α = 1 ‖ A B x ‖ γ = ‖ A B ‖ α , γ . {\displaystyle \sup _{\|x\|_{\alpha }=1}\|ABx\|_{\gamma }=\|AB\|_{\alpha ,\gamma }.} Suppose ‖ ⋅ ‖ α , α {\displaystyle \|\cdot \|_{\alpha ,\alpha }}

5913-477: The resulting matrix factor H becomes more sparse and orthogonal. In case the nonnegative rank of V is equal to its actual rank, V = WH is called a nonnegative rank factorization (NRF). The problem of finding the NRF of V , if it exists, is known to be NP-hard. There are different types of non-negative matrix factorizations. The different types arise from using different cost functions for measuring

5994-508: The resulting operator norm is given as ‖ A ‖ P , Q = sup x ≠ 0 ‖ A x ‖ Q ‖ x ‖ P . {\displaystyle \|A\|_{P,Q}=\sup _{x\neq 0}{\frac {\|Ax\|_{Q}}{\|x\|_{P}}}.} Using the symmetric matrix square roots of P {\displaystyle P} and Q {\displaystyle Q} respectively,

6075-408: The resulting representation as input to convolutional NMF, deep feature hierarchies can be learned. There are several ways in which the W and H may be found: Lee and Seung's multiplicative update rule has been a popular method due to the simplicity of implementation. This algorithm is: Note that the updates are done on an element by element basis not matrix multiplication. We note that

6156-556: The special case of m = n and α = β {\displaystyle \alpha =\beta } , ‖ ⋅ ‖ {\displaystyle \|\cdot \|} is also called compatible with ‖ ⋅ ‖ α {\displaystyle \|\cdot \|_{\alpha }} . All induced norms are consistent by definition. Also, any sub-multiplicative matrix norm on K n × n {\displaystyle K^{n\times n}} induces

6237-1262: The standard basis, and one defines the corresponding induced norm or operator norm or subordinate norm on the space K m × n {\displaystyle K^{m\times n}} of all m × n {\displaystyle m\times n} matrices as follows: ‖ A ‖ α , β = sup { ‖ A x ‖ β : x ∈ K n with ‖ x ‖ α = 1 } = sup { ‖ A x ‖ β ‖ x ‖ α : x ∈ K n with x ≠ 0 } . {\displaystyle {\begin{aligned}\|A\|_{\alpha ,\beta }&=\sup\{\|Ax\|_{\beta }:x\in K^{n}{\text{ with }}\|x\|_{\alpha }=1\}\\&=\sup \left\{{\frac {\|Ax\|_{\beta }}{\|x\|_{\alpha }}}:x\in K^{n}{\text{ with }}x\neq 0\right\}.\end{aligned}}} where sup {\displaystyle \sup } denotes

6318-734: The title NMF . If an internal link led you here, you may wish to change the link to point directly to the intended article. Retrieved from " https://en.wikipedia.org/w/index.php?title=NMF&oldid=1248061487 " Category : Disambiguation pages Hidden categories: Short description is different from Wikidata All article disambiguation pages All disambiguation pages Non-negative matrix factorization NMF finds applications in such fields as astronomy , computer vision , document clustering , missing data imputation , chemometrics , audio signal processing , recommender systems , and bioinformatics . In chemometrics non-negative matrix factorization has

6399-618: The two new matrices W ~ = W B {\displaystyle \mathbf {{\tilde {W}}=WB} } and H ~ = B − 1 H {\displaystyle \mathbf {\tilde {H}} =\mathbf {B} ^{-1}\mathbf {H} } are non-negative they form another parametrization of the factorization. The non-negativity of W ~ {\displaystyle \mathbf {\tilde {W}} } and H ~ {\displaystyle \mathbf {\tilde {H}} } applies at least if B

6480-492: The two non-negative matrices W and H as well as a residual U , such that: V = WH + U . The elements of the residual matrix can either be negative or positive. When W and H are smaller than V they become easier to store and manipulate. Another reason for factorizing V into smaller matrices W and H , is that if one's goal is to approximately represent the elements of V by significantly less data, then one has to infer some latent structure in

6561-474: Was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA". When NMF is obtained by minimizing the Kullback–Leibler divergence , it is in fact equivalent to another instance of multinomial PCA, probabilistic latent semantic analysis , trained by maximum likelihood estimation. That method is commonly used for analyzing and clustering textual data and
