A Continuous Feature Selection Method for Decision Information Systems
  • English title: A Continuous Feature Selection Method of Decision Information System
  • Authors: LI Guohe (李国和); YANG Shaowei (杨绍伟); WU Weijiang (吴卫江); ZHENG Yifeng (郑艺峰)
  • Keywords: feature reduction; random forest; similarity measure; secondary filtering
  • Journal: Information and Control (信息与控制), CNKI journal code XXYK
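  • Affiliations: Beijing Key Lab of Petroleum Data Mining, China University of Petroleum (Beijing); College of Geophysics and Information Engineering, China University of Petroleum (Beijing); Key Laboratory of Data Science and Intelligence Application (Fujian Province), School of Computer Sciences, Minnan Normal University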
  • Publication date: 2019-04-15
  • Published by: Information and Control (信息与控制)
  • Year: 2019
  • Volume: 48
  • Issue: 02
  • Language: Chinese
  • Pages: 100-107 (8 pages)
  • CNKI article ID: XXYK201902014
  • CN: 21-1138/TP
Abstract
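In big data applications, reducing the feature set to lower the data dimensionality helps improve the generalization ability of data models. The method first makes an initial selection of features by combining random-forest model selection with a similarity measure, then applies a second round of filtering to the initial set using a forward search strategy with distance as the evaluation criterion, yielding the final feature subset. The algorithm uses local traversal to improve execution efficiency, and the forward selection step resolves the inability of traditional methods to determine the optimal number of features. Experimental results show that the proposed method selects feature subsets more effectively and improves the classification accuracy of the model.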
In big data applications, reducing the feature set improves the generalization ability of the data model. We combine random forest model selection with a similarity measure to make an initial selection from the feature set, and then adopt a forward search strategy to perform a second filtering. The algorithm uses local traversal to improve execution efficiency. At the same time, it effectively solves the problem of determining the optimal number of features. The experimental results show that the method obtains feature subsets more effectively and improves classification accuracy.
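The abstract only outlines the two stages, so the Python sketch below is a rough illustration under assumptions that are not taken from the paper: the random-forest importance scores are passed in as a plain list (in practice they could come from any random-forest implementation), the similarity measure is assumed to be the absolute Pearson correlation between feature columns, and the distance criterion is a hypothetical class-separability ratio (mean distance between class centroids over mean distance of samples to their own centroid). The local-traversal optimization is not modeled, and the names `initial_selection`, `separability`, and `forward_filter` are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two columns; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def initial_selection(X: List[List[float]], importances: Sequence[float],
                      top_k: int, max_sim: float) -> List[int]:
    """Stage 1 (assumed form): walk features in descending importance and keep up to
    top_k of them, skipping any whose |Pearson r| with an already kept feature
    exceeds max_sim."""
    cols = list(zip(*X))  # column-major view: cols[j] is feature j over all samples
    order = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    kept: List[int] = []
    for j in order:
        if len(kept) == top_k:
            break
        if all(abs(pearson(cols[j], cols[k])) <= max_sim for k in kept):
            kept.append(j)
    return kept


def separability(X: List[List[float]], y: Sequence[int], feats: List[int]) -> float:
    """Hypothetical distance criterion: mean Euclidean distance between class
    centroids divided by mean distance of samples to their own class centroid."""
    def dist(u: Sequence[float], v: Sequence[float]) -> float:
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    classes = sorted(set(y))
    cent: Dict[int, List[float]] = {}
    for c in classes:
        rows = [X[i] for i in range(len(X)) if y[i] == c]
        cent[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    within = sum(dist([X[i][j] for j in feats], cent[y[i]]) for i in range(len(X))) / len(X)
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    between = sum(dist(cent[a], cent[b]) for a, b in pairs) / len(pairs)
    return between / within if within > 0 else float("inf")


def forward_filter(X: List[List[float]], y: Sequence[int], candidates: List[int]) -> List[int]:
    """Stage 2: greedily add the candidate that most raises separability and stop
    when no candidate improves it, so the subset size is not fixed in advance."""
    selected: List[int] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        score, j = max((separability(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected


# Toy data (made up): feature 0 separates the classes, feature 1 is a
# near-copy of feature 0, feature 2 is noise unrelated to the class.
X = [[0.0, 0.0, 5.0], [1.0, 1.1, 1.0], [10.0, 10.0, 5.0], [11.0, 11.2, 1.0]]
y = [0, 0, 1, 1]
importances = [0.6, 0.3, 0.1]  # stand-in for random-forest importance scores

stage1 = initial_selection(X, importances, top_k=3, max_sim=0.9)
print(stage1)  # [0, 2]  (feature 1 dropped as redundant with feature 0)
print(forward_filter(X, y, stage1))  # [0]
```

Because stage 2 stops as soon as no remaining candidate raises the criterion, the subset size falls out of the search instead of being fixed beforehand, which is the property the abstract credits the forward selection step with.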
