Random forest based on double features and relaxation boundary for anomaly detection (基于双特征和松弛边界的随机森林进行异常点检测)
  • English title: Random forest based on double features and relaxation boundary for anomaly detection
  • Authors: HU Miao (胡淼); WANG Kaijun (王开军)
  • Affiliations: College of Mathematics and Informatics, Fujian Normal University; Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University
  • Keywords: anomaly detection; Random Forest (RF); double-feature filtering; relaxation boundary
  • Journal code: JSJY
  • Journal: Journal of Computer Applications (计算机应用)
  • Publication date: 2018-11-26 13:57
  • Year: 2019
  • Volume/issue: v.39; No.344
  • Funding: National Natural Science Foundation of China (61672157); Natural Science Foundation of Fujian Province (2018J01778)
  • Language: Chinese
  • CNKI file ID: JSJY201904005
  • Page count: 7
  • Issue: 04
  • CN: 51-1307/TP
  • Pages: 28-34
Abstract
To address the low performance of existing random-forest-based anomaly detection algorithms, a random forest algorithm combining double features and a relaxation boundary is proposed for anomaly detection. First, while the binary decision trees of the random forest are built from normal-class data only, each tree node records the value ranges of two features (one range per feature), and these double-feature value ranges serve as the basis for judging anomalies. Second, during detection, a sample that falls outside the double-feature value ranges of a tree node is marked as a candidate anomaly; otherwise it moves down to the child node and is compared with that node's double-feature value ranges, and if no lower node exists it is marked as a candidate normal sample. Finally, the discrimination mechanism of the random forest determines the class of the sample. Anomaly detection experiments on five UCI datasets show that the proposed method outperforms existing random forest algorithms for anomaly detection, and that its overall performance is comparable to or better than that of isolation Forest (iForest) and One-Class SVM (OCSVM) while remaining stable at a high level.
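The three steps summarized above (train on normal data, check two feature ranges per node while descending, combine per-tree verdicts) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The record does not specify how splits are chosen when only one class is present, which two features a node records, how the relaxation boundary widens a range, or what the forest's discrimination mechanism is. Here, trees grow with random axis-aligned splits, each node records its split feature plus one other randomly drawn feature, each observed range is widened by a hypothetical factor `relax` times its width, and the forest labels a sample anomalous by strict majority vote. `DoubleFeatureForest`, `build_tree`, `is_candidate_anomaly`, `relaxed_range`, and every parameter name are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Range = Tuple[float, float]
Rows = Sequence[Sequence[float]]


@dataclass
class Node:
    feats: Tuple[int, int]            # the two features checked at this node
    ranges: Tuple[Range, Range]       # relaxed value range for each of them
    split_feat: Optional[int] = None  # None marks a leaf (no lower node)
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def relaxed_range(values: Sequence[float], relax: float) -> Range:
    """Observed [min, max] widened by relax * width on each side (assumed rule)."""
    lo, hi = min(values), max(values)
    margin = relax * (hi - lo)
    return (lo - margin, hi + margin)


def build_tree(X: Rows, rng: random.Random, depth: int,
               max_depth: int, min_size: int, relax: float) -> Node:
    """Grow one tree from normal-class rows only, using random splits (assumption)."""
    f1, f2 = rng.sample(range(len(X[0])), 2)  # f1: split feature, f2: extra feature
    node = Node((f1, f2), (relaxed_range([x[f1] for x in X], relax),
                           relaxed_range([x[f2] for x in X], relax)))
    if depth >= max_depth or len(X) < min_size:
        return node
    t = rng.uniform(min(x[f1] for x in X), max(x[f1] for x in X))
    left = [x for x in X if x[f1] < t]
    right = [x for x in X if x[f1] >= t]
    if not left or not right:  # degenerate split: keep this node as a leaf
        return node
    node.split_feat, node.threshold = f1, t
    node.left = build_tree(left, rng, depth + 1, max_depth, min_size, relax)
    node.right = build_tree(right, rng, depth + 1, max_depth, min_size, relax)
    return node


def is_candidate_anomaly(node: Node, x: Sequence[float]) -> bool:
    """Walk one tree; leaving any node's double-feature ranges flags x."""
    while True:
        for f, (lo, hi) in zip(node.feats, node.ranges):
            if not lo <= x[f] <= hi:
                return True           # candidate anomaly
        if node.split_feat is None:
            return False              # no lower node: candidate normal
        node = node.left if x[node.split_feat] < node.threshold else node.right


class DoubleFeatureForest:
    """Hypothetical detector; all parameter names are illustrative."""

    def __init__(self, n_trees: int = 50, max_depth: int = 8, min_size: int = 5,
                 relax: float = 0.1, bootstrap: bool = True, seed: int = 0):
        self.n_trees, self.max_depth, self.min_size = n_trees, max_depth, min_size
        self.relax, self.bootstrap, self.seed = relax, bootstrap, seed
        self.trees: List[Node] = []

    def fit(self, X_normal: Rows) -> "DoubleFeatureForest":
        if len(X_normal[0]) < 2:
            raise ValueError("need at least two features per row")
        rng = random.Random(self.seed)
        n = len(X_normal)
        self.trees = []
        for _ in range(self.n_trees):
            # Bagging is assumed here; it is standard for random forests.
            sample = ([X_normal[rng.randrange(n)] for _ in range(n)]
                      if self.bootstrap else list(X_normal))
            self.trees.append(build_tree(sample, rng, 0, self.max_depth,
                                         self.min_size, self.relax))
        return self

    def predict(self, X: Rows) -> List[int]:
        """1 = anomaly, 0 = normal; strict majority of candidate votes (assumed)."""
        half = len(self.trees) / 2
        return [int(sum(is_candidate_anomaly(t, x) for t in self.trees) > half)
                for x in X]


if __name__ == "__main__":
    data_rng = random.Random(42)
    normal = [[data_rng.random() for _ in range(3)] for _ in range(200)]
    forest = DoubleFeatureForest(bootstrap=False).fit(normal)
    print(forest.predict([normal[0], [5.0, 5.0, 5.0]]))  # [0, 1]
```

With `bootstrap=False` every tree sees the full normal set, so a training row always lies inside the recorded ranges along its own path and is never flagged; the default bagging adds diversity between trees at the cost of that guarantee. The `relax` factor is one plausible reading of a relaxation boundary: it keeps unseen but normal samples that land slightly outside a small node's observed range from being flagged.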
