Abstract
Classifying imbalanced data streams with concept drift is both a difficult and an active research problem in machine learning and in real-world applications. To address the dynamic and imbalanced nature of such streams, this paper proposes a random balance sampling algorithm (RBS) for rebalancing imbalanced data streams. Building on RBS, it then proposes a new ensemble classification algorithm, the random balance sampling concept-drifting imbalanced streaming ensemble algorithm (RBSCISE), designed to withstand both the concept drift and the class imbalance of streaming data. Theoretical analysis and experiments show that the proposed ensemble classifier achieves strong diversity and generalization ability on imbalanced data streams with concept drift, and is more robust on such streams.