Random Forest Algorithm Based on Randomization of Classes (基于类别随机化的随机森林算法)
  • English title: Randomization of Classes Based Random Forest Algorithm
  • Authors: GUAN Xiao-qiang (关晓蔷); PANG Ji-fang (庞继芳); LIANG Ji-ye (梁吉业)
  • Affiliations: School of Computer and Information Technology, Shanxi University; Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education
  • Keywords: random forest; multi-class classification problems; randomization of classes; diversity
  • Journal code: JSJA
  • Journal: Computer Science
  • Publication date: 2019-02-15
  • Publisher: Computer Science (计算机科学)
  • Year: 2019
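  • Volume: 46
  • Issue: 02
  • Pages: 205-210
  • Page count: 6
  • Funding: National Natural Science Foundation of China (61876103); Youth Science and Technology Foundation of Shanxi Province (201701D221098); Key Research and Development Program of Shanxi Province (201603D111014); Overseas Study Foundation of Shanxi Province (2016-003)
  • Language: Chinese
  • Record ID: JSJA201902034
  • CN: 50-1075/TP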
Abstract
Random forest is a widely used classification method in data mining and machine learning. It has attracted sustained research attention and has been applied to a wide range of practical problems. Traditional random forest methods do not consider how the number of classes affects classification performance, and they ignore the relationship between base classifiers and classes, which limits their performance on multi-class classification problems. To address this, the paper proposes a random forest algorithm based on randomization of classes (RCRF), designed around the characteristics of multi-class problems. On top of the two traditional randomizations of random forest, RCRF adds a randomization over classes, so that base classifiers with different emphases are built for different classes. Because different base classifiers focus on distinguishing different classes, the decision trees they produce differ in structure. This preserves the performance of each individual base classifier while further increasing the diversity among base classifiers. To verify its effectiveness, RCRF is compared with other algorithms on 21 data sets from the UCI repository. The experiments have two parts: accuracy, F1-measure, and the Kappa coefficient are used to assess the performance of RCRF, and κ-error diagrams are used to compare the algorithms in terms of diversity. The results show that the proposed algorithm effectively improves the overall performance of the ensemble model and has a clear advantage on multi-class classification problems.
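The κ-error diagram named in the abstract can be illustrated with a minimal sketch. It assumes the usual construction of such diagrams: each pair of base classifiers contributes one point whose x-coordinate is their pairwise Cohen's kappa (agreement) and whose y-coordinate is their average error rate. The function names and the toy predictions below are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of positions where the labels match.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Assumption: both sequences are the same constant label,
        # which is treated here as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def error_rate(pred, y_true):
    """Fraction of samples that a classifier labels incorrectly."""
    return sum(p != t for p, t in zip(pred, y_true)) / len(y_true)


def kappa_error_points(all_preds, y_true):
    """Return one (kappa, mean error) point per pair of base classifiers."""
    errors = [error_rate(pred, y_true) for pred in all_preds]
    points = []
    for i, j in combinations(range(len(all_preds)), 2):
        kappa = cohen_kappa(all_preds[i], all_preds[j])
        points.append((kappa, (errors[i] + errors[j]) / 2))
    return points


if __name__ == "__main__":
    # Hypothetical predictions of three base classifiers on six samples.
    y_true = [0, 1, 2, 0, 1, 2]
    all_preds = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 1],
        [0, 0, 2, 0, 1, 2],
    ]
    for kappa, err in kappa_error_points(all_preds, y_true):
        print(f"kappa={kappa:.3f}  mean_error={err:.3f}")
```

In such a diagram, points further to the left (lower kappa) indicate more diverse pairs, and points lower down indicate more accurate pairs. Applying `cohen_kappa` to an ensemble's predictions and the true labels gives a Kappa coefficient of the kind used as a performance metric.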
