Attribute Selection Algorithm Based on K-Means Clustering and Low-Rank Constraint (融合K均值聚类和低秩约束的属性选择算法)
  • Author: YANG Changqing (School of Materials Engineering, Xi'an Aeronautical University)
  • Keywords: attribute selection; self-expression method; K-means clustering; low-rank constraint; sparse learning
  • Journal: Journal of Chinese Information Processing (中文信息学报), journal code MESS
  • Publication date: 2018-07-15
  • Year: 2018
  • Volume: v.32
  • Issue: 07
  • Pages: 96-103 (8 pages)
  • Document ID: MESS201807012
  • CN: 11-2325/N
  • Funding: Shaanxi Provincial Education Department project (15BY117)
  • Language: Chinese
Abstract
        Unsupervised attribute selection algorithms lack class information and do not consider the low-rank property of attributes. To address these problems, this paper proposes an attribute selection algorithm that combines K-means clustering and a low-rank constraint. The algorithm embeds the self-expression method into a linear regression framework; K-means clustering is used to generate pseudo-class labels that maximize the inter-class spacing, yielding a better sparse structure; and the l2,p-norm replaces the traditional l2,1-norm, so that the sparsity of the result can be tuned flexibly through the parameter p. It is also proved that the algorithm implicitly performs linear discriminant analysis and that it converges. Experiments show that the proposed algorithm improves classification accuracy over the NFS, LDA, RFS, and RSR algorithms by 17.04%, 13.95%, 3.6%, and 9.39% on average, respectively, and that its variance in classification accuracy is the smallest, indicating stable classification results.
