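Divide and conquer algorithm for cost-sensitive feature selection based on dynamic misclassification costs (动态误分类代价下代价敏感属性选择分治算法)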
  • English title: Divide and conquer algorithm for cost-sensitive feature selection based on dynamic misclassification costs
  • Authors: HUANG Weiting (黄伟婷); ZHAO Hong (赵红)
  • Affiliations: School of Computing, Minnan Normal University; Lab of Granular Computing, Minnan Normal University
  • Keywords: rough sets; cost-sensitive; feature selection; dynamic misclassification cost; adaptive divide and conquer
  • Chinese journal title: JSGG
  • English journal title: Computer Engineering and Applications
  • Publication date: 2017-02-16 10:54
  • Publisher: Computer Engineering and Applications (计算机工程与应用)
  • Year: 2018
  • Issue: v.54; No.898
  • Funding: Fujian Provincial Department of Education projects (No. JAT160287, No. JAT160307); Natural Science Foundation of Zhangzhou (No. ZZ2016J35); National Natural Science Foundation of China (No. 61379049, No. 61379089)
  • Language: Chinese
  • Pages: JSGG201803026
  • Page count: 7
  • CN: 03
  • Classification number: 171-176+216
Abstract
Cost-sensitive feature selection aims to find an attribute subset with minimal total cost by trading off test costs against misclassification costs. Most existing cost-sensitive feature selection methods consider only fixed misclassification costs, so they handle problems such as imbalanced class distributions poorly. Poor efficiency on large-scale datasets is another major problem. To address these issues, this paper designs a new dynamic misclassification cost mechanism whose objective is to minimize total cost. Following the divide-and-conquer idea, each dataset is split adaptively by column according to its scale. The minimal-cost feature selection problem is then redefined in terms of dynamic misclassification costs, and a divide-and-conquer algorithm for cost-sensitive feature selection under dynamic misclassification costs is proposed. The proposed algorithm is compared with two other algorithms on seven UCI datasets. The experiments show that it improves efficiency while obtaining the optimal misclassification costs, which ensures that the selected attribute subset has minimal total cost.
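The two ideas named in the abstract, total cost as test cost plus misclassification cost and column-wise divide and conquer, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names (`average_total_cost`, `best_subset`, `divide_and_conquer`), the fixed `block_size`, the exhaustive search inside each block, and the toy data are all assumptions made for illustration. The misclassification cost table is passed in as a plain parameter, and the paper's dynamic cost mechanism and adaptive split rule are not reproduced because the abstract does not specify them.

```python
from collections import defaultdict
from itertools import combinations


def average_total_cost(rows, labels, subset, test_cost, mis_cost):
    """Test cost of `subset` plus the average misclassification cost.

    Rows that agree on every column in `subset` are indistinguishable,
    so each such group receives the single cheapest prediction.
    """
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row[j] for j in subset)].append(y)
    classes = sorted(set(labels))
    total_mis = 0.0
    for ys in groups.values():
        total_mis += min(
            sum(mis_cost[(y, p)] for y in ys if y != p) for p in classes
        )
    return sum(test_cost[j] for j in subset) + total_mis / len(rows)


def best_subset(rows, labels, columns, test_cost, mis_cost):
    """Exhaustive search over subsets of `columns` (small blocks only)."""
    best = ()
    best_cost = average_total_cost(rows, labels, (), test_cost, mis_cost)
    for k in range(1, len(columns) + 1):
        for subset in combinations(columns, k):
            cost = average_total_cost(rows, labels, subset, test_cost, mis_cost)
            if cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost


def divide_and_conquer(rows, labels, test_cost, mis_cost, block_size=3):
    """Split columns into blocks, keep each block's winners, then merge."""
    columns = list(range(len(rows[0])))
    while len(columns) > block_size:
        survivors = []
        for i in range(0, len(columns), block_size):
            block = columns[i:i + block_size]
            chosen, _ = best_subset(rows, labels, block, test_cost, mis_cost)
            survivors.extend(chosen)
        if len(survivors) == len(columns):
            break  # nothing was pruned; stop instead of looping forever
        columns = survivors
    return best_subset(rows, labels, columns, test_cost, mis_cost)


# Toy data (hypothetical): column 0 and column 2 each determine the label.
rows = [(0, 0, 1, 0), (0, 1, 1, 1), (1, 0, 0, 0), (1, 1, 0, 1)]
labels = [0, 0, 1, 1]
test_cost = [5.0, 1.0, 2.0, 1.0]
mis_cost = {(0, 1): 50.0, (1, 0): 100.0}  # keyed by (true label, predicted label)

print(divide_and_conquer(rows, labels, test_cost, mis_cost, block_size=2))
# prints ((2,), 2.0)
```

On the toy data, both column 0 and column 2 separate the classes perfectly, and the sketch keeps column 2 because its test cost is lower. Searching blocks independently is a heuristic: features that are only useful in combination across blocks can be pruned, so the result is not guaranteed to match a full exhaustive search.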
