Multi-label Classification Based on the Deep Autoencoder

Original (Chinese) title: 基于深度自动编码器的多标签分类研究
Authors: NIE Yu; LIAO Xiangwen; WEI Jingjing; YANG Dingda; CHEN Guolong (Chinese: 聂煜; 廖祥文; 魏晶晶; 杨定达; 陈国龙)
Keywords: multi-label classification; network embedding (network representation learning); structural similarity; deep autoencoder
Journal: Journal of Guangxi Normal University (Natural Science Edition)
CNKI journal code: GXSF
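Affiliations: College of Mathematics and Computer Science, Fuzhou University; Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing (Fuzhou University); Digital Fujian Institute of Financial Big Data (Fuzhou University); College of Electronics and Information Science, Fujian Jiangxia University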
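Publication date: 2019-01-10
Year: 2019
Volume: 37
Issue: 01
Pages: 75-83 (9 pages)
CN: 45-1067/N
CNKI article ID: GXSF201901008
Funding: National Natural Science Foundation of China (61772135, U1605251); Open Fund of the CAS Key Laboratory of Network Data Science and Technology (CASNDST201708, CASNDST201606); Director Fund of the Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education (2017KF01); Natural Science Foundation of Fujian Province (2017J01755)
Language: Chinese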

Abstract

Existing multi-label classification methods based on network representation learning use only the neighborhood (adjacency) information of nodes and ignore the structural similarity between nodes, which leads to low classification accuracy. This paper therefore proposes a multi-label classification model based on a deep autoencoder. The method first applies the orbit counting algorithm (Orca) to compute the structural similarity of nodes at different scales of the network; this similarity is fed to the deep autoencoder as input, improving the vector representations in the hidden layer and preserving the global network structure. The nodes' neighborhood information is then jointly optimized in the same model, so that the highly nonlinear structure of the network is captured effectively. Finally, the node vectors are read from the hidden layer and a support vector machine (SVM) performs multi-label classification of the nodes. Experiments on three public real-world network datasets show that the proposed model outperforms the state-of-the-art baseline methods on the multi-label classification task.
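The first step of the method turns graphlet orbit counts into a node-to-node structural similarity. Orca itself counts orbits of graphlets with up to 4 or 5 nodes using a combinatorial method; the sketch below is a deliberately simplified, hypothetical stand-in that counts only the four orbits of 2- and 3-node graphlets (edge endpoint, end of a path, middle of a path, triangle member) directly. The log scaling and the cosine similarity are assumptions made for illustration, since the exact similarity function is not stated here, and `orbit_counts` and `structural_similarity` are illustrative names, not the authors' code.

```python
import math
from itertools import combinations


def orbit_counts(adj):
    """Count orbits 0-3 (2- and 3-node graphlets) for every node.

    Simplified stand-in for Orca (which goes up to 4- or 5-node graphlets).
    adj: dict mapping node -> set of neighbours (undirected, no self-loops).
    Returns dict node -> [o0, o1, o2, o3]:
      o0 = degree (endpoint of an edge)
      o1 = end of an induced 3-node path
      o2 = middle of an induced 3-node path
      o3 = member of a triangle
    """
    counts = {}
    for v, nbrs in adj.items():
        deg = len(nbrs)
        # Triangles through v: pairs of neighbours that are themselves adjacent.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # v in the middle of a path a-v-b where a and b are NOT adjacent.
        mid = deg * (deg - 1) // 2 - tri
        # v at the end of a path v-u-w: two-step walks that do not return to v,
        # minus the closed ones (w adjacent to v); each triangle appears twice.
        end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri
        counts[v] = [deg, end, mid, tri]
    return counts


def structural_similarity(counts):
    """Cosine similarity of log-scaled orbit vectors (an assumed measure).

    Returns a square matrix (list of lists) indexed by sorted node id.
    """
    nodes = sorted(counts)
    vecs = {v: [math.log1p(c) for c in counts[v]] for v in nodes}

    def cosine(x, y):
        dot = sum(a * b for a, b in zip(x, y))
        nx = math.sqrt(sum(a * a for a in x))
        ny = math.sqrt(sum(b * b for b in y))
        return dot / (nx * ny) if nx and ny else 0.0

    return [[cosine(vecs[u], vecs[w]) for w in nodes] for u in nodes]


if __name__ == "__main__":
    # Path 0-1-2-3-4: the two end nodes share no neighbours but play the same role.
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    counts = orbit_counts(path)
    print(counts[0], counts[4])  # [1, 1, 0, 0] [1, 1, 0, 0]
    sim = structural_similarity(counts)
    print(round(sim[0][4], 6))   # 1.0
```

In the usage example, nodes 0 and 4 have no common neighbours, so a purely neighborhood-based embedding has no direct signal that they are alike, yet their orbit vectors are identical and their similarity is 1.0. Under the further assumption that each row of this matrix is one node's input vector to the deep autoencoder, this is the kind of global structural information the abstract says the model adds on top of adjacency.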
