Abstract
As big data applications develop, multi-type relational data sampled from nonlinear manifolds keep growing in scale, their geometric structure becomes more complex, and heterogeneous relational data become extremely sparse. As a result, data mining becomes harder and less accurate. To address these problems, this paper proposes a co-clustering method for multi-type relational data based on manifold nonnegative matrix tri-factorization (MNMTF). First, for the smaller-scale entity type, a correlation matrix is constructed from the natural relationships or content relevance of the entities and is decomposed to obtain the cluster indicator matrix of that entity type, which serves as the input to the nonnegative matrix tri-factorization. Then, manifold regularization is added to fast nonnegative matrix tri-factorization (FNMTF) so that inter-type and intra-type relationships are clustered jointly, which further improves clustering accuracy. Experiments show that MNMTF outperforms traditional co-clustering algorithms based on nonnegative matrix factorization in both accuracy and overall performance.
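The idea of adding manifold regularization to a tri-factorization can be illustrated with a minimal sketch. It is not the paper's MNMTF algorithm: it skips the first step (deriving an indicator matrix from the correlation matrix of the smaller entity type) and replaces the FNMTF updates with plain multiplicative updates for a generic graph-regularized objective, ||X - F S G^T||_F^2 + lam * tr(F^T L_F F) + mu * tr(G^T L_G G). The function names, the 0/1 k-nearest-neighbour graph, and the parameters `lam`, `mu`, and `k` are all assumptions made for illustration.

```python
import numpy as np

def knn_graph(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-matches
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest per row
    W = np.zeros(d2.shape)
    rows = np.repeat(np.arange(X.shape[0]), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)                         # symmetrize

def objective(X, F, S, G, Wf, Wg, lam, mu):
    """Reconstruction error plus graph-Laplacian penalties on F and G."""
    Lf = np.diag(Wf.sum(axis=1)) - Wf
    Lg = np.diag(Wg.sum(axis=1)) - Wg
    return (np.linalg.norm(X - F @ S @ G.T) ** 2
            + lam * np.trace(F.T @ Lf @ F)
            + mu * np.trace(G.T @ Lg @ G))

def graph_regularized_nmtf(X, c_rows, c_cols, lam=1.0, mu=1.0, k=5,
                           n_iter=200, seed=0, eps=1e-10):
    """Hypothetical helper: X ~ F S G^T with nonnegative F, S, G."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, c_rows))
    S = rng.random((c_rows, c_cols))
    G = rng.random((m, c_cols))
    Wf, Wg = knn_graph(X, k), knn_graph(X.T, k)       # row and column graphs
    Df, Dg = np.diag(Wf.sum(axis=1)), np.diag(Wg.sum(axis=1))
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        F *= (X @ G @ S.T + lam * Wf @ F) / (
            F @ (S @ G.T @ G @ S.T) + lam * Df @ F + eps)
        G *= (X.T @ F @ S + mu * Wg @ G) / (
            G @ (S.T @ F.T @ F @ S) + mu * Dg @ G + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        history.append(objective(X, F, S, G, Wf, Wg, lam, mu))
    return F, S, G, history
```

In this sketch, row and column cluster labels would be read off as `F.argmax(axis=1)` and `G.argmax(axis=1)`. The hard 0/1 indicator constraints that FNMTF uses are not enforced here.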
References
[1] ROWEIS S T,SAUL L K.Nonlinear dimensionality reduction by locally linear embedding[J].Science,2000,290(5500):2323-2326.
[2] BELKIN M,NIYOGI P.Laplacian eigenmaps for dimensionality reduction and data representation[J].Neural Computation,2003,15(6):1373-1396.
[3] AILEM M,ROLE F,NADIF M.Co-clustering document-term matrices by direct maximization of graph modularity[C]//ACM International Conference on Information and Knowledge Management.New York:ACM Press,2015:1807-1810.
[4] HONDA K,TANAKA D,NOTSU A.Incremental algorithms for fuzzy co-clustering of very large cooccurrence matrix[C]//IEEE International Conference on Fuzzy Systems.Piscataway:IEEE Press,2014:2494-2499.
[5] LEE D D,SEUNG H S.Learning the parts of objects by non-negative matrix factorization[J].Nature,1999,401(6755):788-791.
[6] LEE D D,SEUNG H S.Algorithms for non-negative matrix factorization[C]//Neural Information Processing Systems.New York:NIPC Press,2000:535-541.
[7] DING C,HE X,SIMON H D,et al.On the equivalence of nonnegative matrix factorization and spectral clustering[C]//SIAM International Conference on Data Mining.Philadelphia:SIAM Press,2005:606-610.
[8] DING C,LI T,PENG W,et al.Orthogonal nonnegative matrix tri-factorizations for clustering[C]//ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM Press,2006:126-135.
[9] LI Z,WU X.Weighted nonnegative matrix tri-factorization for co-clustering[C]//IEEE International Conference on Tools with Artificial Intelligence.Piscataway:IEEE Press,2011:811-816.
[10] BUONO N D,PIO G.Non-negative Matrix Tri-Factorization for co-clustering:An analysis of the block matrix[J].Information Sciences,2015,301(20):13-26.
[11] GU Q,ZHOU J.Co-clustering on manifolds[C]//ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM Press,2009:359-368.
[12] WANG S,HUANG A.Penalized nonnegative matrix tri-factorization for co-clustering[J].Expert Systems with Applications,2017,78(C):64-73.
[13] WANG S,GUO W.Robust co-clustering via dual local learning and high-order matrix factorization[J].Knowledge-Based Systems,2017,138(15):176-187.
[14] WANG H,NIE F,HUANG H,et al.Fast nonnegative matrix tri-factorization for large-scale data co-clustering[C]//International Joint Conference on Artificial Intelligence.Menlo Park:AAAI Press,2011:1553-1558.
[15] SHEN G,YANG W,WANG W,et al.Large-scale heterogeneous data co-clustering based on nonnegative matrix factorization[J].Journal of Computer Research and Development,2016,53(2):459-466.(in Chinese)