Fudong LiuZheng ShanYihang Chen
Nonnegative matrix factorization (NMF) decomposes a high-dimensional nonnegative matrix into the product of two reduced dimensional nonnegative matrices. However, conventional NMF neither qualifies large-scale datasets as it maintains all data in memory nor preserves the geometrical structure of data which is needed in some practical tasks. In this paper, we propose a parallel NMF with manifold regularization method (PNMF-M) to overcome the aforementioned deficiencies by parallelizing the manifold regularized NMF on distributed computing system. In particular, PNMF-M distributes both data samples and factor matrices to multiple computing nodes instead of loading the whole dataset in a single node and updates both factor matrices locally on each node. In this way, PNMF-M succeeds to resolve the pressure of memory consumption for large-scale datasets and to speed up the computation by parallelization. For constructing the adjacency matrix in manifold regularization, we propose a two-step distributed graph construction method, which is proved to be equivalent to the batch construction method. Experimental results on popular text corpora and image datasets demonstrate that PNMF-M significantly improves both scalability and time efficiency of conventional NMF thanks to the parallelization on distributed computing system; meanwhile it significantly enhances the representation ability of conventional NMF thanks to the incorporated manifold regularization.
Wenjun HuKup‐Sze ChoiPeiliang WangYunliang JiangShitong Wang
Wenjun HuKup‐Sze ChoiJianwen TaoYunliang JiangShitong Wang
Huirong LiYani ZhouPengjun ZhaoLei WangChengxiang Yu