Classification Mining Method for Data Streams Based on Instances Transfer (基于实例迁移的数据流分类挖掘方法)
  • English title: Classification Mining Method for Data Streams Based on Instances Transfer
  • Authors: LIU Sanmin (刘三民); LIU Yuxia (刘余霞)
  • Keywords: mutual nearest neighbor; transfer learning; data stream classification; incremental learning
  • Journal (Chinese abbreviation): XXYK
  • Journal: Information and Control (信息与控制)
  • Affiliation: College of Computer and Information, Anhui Polytechnic University (安徽工程大学计算机与信息学院)
  • Publication date: 2019-04-04 08:34
  • Publisher: Information and Control (信息与控制)
  • Year: 2019
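  • Volume: 48
  • Funding: National Natural Science Foundation of China (71371012); Natural Science Foundation of Anhui Province (1608085MF147); Humanities and Social Sciences Foundation of the Ministry of Education (18YJA630114); Anhui Province Promotion Plan General Project (TSKJ2016B05)
  • Language: Chinese
  • Article ID: XXYK201903020
  • Page count: 5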
  • Issue: 03
  • CN: 21-1138/TP
  • Pages: 133-137
Abstract
To address the problems of sample labeling and concept drift in data stream classification, we propose a data stream classification mining model based on instance transfer. First, the model uses a support vector machine as the learner; the support vectors of the resulting classification model constitute the source domain, and the current data block to be classified is the target domain. Then, following the mutual nearest neighbor idea, we select from the source domain the true neighbors of the target-domain samples for instance transfer, which avoids negative transfer. Finally, we merge the target domain with the transferred samples to form the training set, which increases the number of labeled samples and strengthens the generalization ability of the model. Theoretical analysis and experimental results indicate that the proposed method is feasible and achieves higher classification accuracy than the other learning methods compared.
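The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cross-domain definition of "mutual k-nearest neighbor" used here (a source sample and a target sample each rank the other among their k nearest samples in the opposite domain), the Euclidean distance, the function names `k_nearest` and `select_transfer_instances`, and the toy data are all assumptions made for illustration. The source set stands in for the support vectors of a previously trained SVM; no SVM is trained here.

```python
import math


def k_nearest(query, points, k):
    """Return the indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return set(order[:k])


def select_transfer_instances(source, target, k=1):
    """Return sorted indices of source samples that are mutual k-nearest
    neighbors of at least one target sample (assumed definition, see lead-in)."""
    # For each target sample: its k nearest source samples.
    target_to_source = [k_nearest(t, source, k) for t in target]
    # For each source sample: its k nearest target samples.
    source_to_target = [k_nearest(s, target, k) for s in source]

    selected = set()
    for ti, neighbors in enumerate(target_to_source):
        for si in neighbors:
            # Keep the source sample only if the relation holds in both directions.
            if ti in source_to_target[si]:
                selected.add(si)
    return sorted(selected)


# Hypothetical toy data: the source domain plays the role of stored support vectors.
source_x = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
source_y = [0, 0, 1, 1]
# Hypothetical labeled samples from the current data block (target domain).
target_x = [(0.2, 0.1), (1.1, 0.9)]
target_y = [0, 0]

picked = select_transfer_instances(source_x, target_x, k=1)

# Merge the target domain with the transferred source samples to form the training set.
train_x = target_x + [source_x[i] for i in picked]
train_y = target_y + [source_y[i] for i in picked]
```

In this toy run the two source points far from the current block are not transferred, which is the intent of the mutual-neighbor filter: only source samples that resemble the current concept enlarge the training set.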
