Abstract
In multi-label classification, existing methods based on network representation learning use only the neighborhood information of nodes and ignore the structural similarity between nodes, which lowers classification accuracy. This paper therefore proposes a multi-label classification model based on a deep autoencoder. The method first uses the orbit counting algorithm (Orca) to compute the structural similarity of nodes at different scales of the network; this similarity is the input to the deep autoencoder and improves the vector representations in the hidden layers, so that the global structure of the network is preserved. The neighborhood information of the nodes is then jointly optimized with the global structure in the model, so that the highly nonlinear structure of the network is captured effectively. Finally, the node vectors obtained from the hidden layer are classified with a support vector machine (SVM) to perform multi-label classification. Validation experiments on three public real-world network datasets show that the proposed method outperforms the baseline methods on the multi-label classification task.
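The structural-similarity step can be illustrated with a minimal sketch. It is not Orca itself and not the authors' implementation: Orca counts orbits of larger graphlets (four and five nodes), whereas this sketch only counts the four orbits of the 2- and 3-node graphlets for each node. The function names `orbit_counts` and `cosine_similarity`, the toy graph, and the choice of cosine similarity as the comparison measure are all assumptions made for illustration; the abstract does not state which similarity measure the paper uses.

```python
from itertools import combinations
from math import sqrt


def orbit_counts(adj):
    """Count the orbits of 2- and 3-node graphlets for every node.

    adj maps each node to the set of its neighbours (undirected graph).
    Returns {node: [o0, o1, o2, o3]} where
      o0 = degree (endpoint of an edge),
      o1 = end of an induced 3-node path,
      o2 = middle of an induced 3-node path,
      o3 = member of a triangle.
    Simplified illustration only; Orca handles larger graphlets.
    """
    counts = {}
    for v, nbrs in adj.items():
        deg = len(nbrs)
        # A triangle through v is a pair of neighbours that are adjacent.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Every non-adjacent neighbour pair makes v the middle of a path.
        mid = deg * (deg - 1) // 2 - tri
        # v is an end of the path v-u-w when w is not v and not adjacent to v.
        end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
        counts[v] = [deg, end, mid, tri]
    return counts


def cosine_similarity(x, y):
    """Cosine similarity of two orbit-count vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


# Toy graph (hypothetical): a triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
counts = orbit_counts(adj)
for node in sorted(counts):
    print(node, counts[node])
```

Nodes 0 and 1 occupy the same structural position in the toy graph, so their orbit vectors coincide even though structural position is not something adjacency alone expresses. In the pipeline the abstract describes, a matrix of such pairwise similarities would be the input to the deep autoencoder, and the hidden-layer vectors would then be passed to the SVM.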