Abstract
The emergence and development of high-dimensional big data streams pose many challenges to traditional machine learning and data mining algorithms. Exploiting the fact that streaming big data arrives sequentially, this paper first constructs an adaptive incremental feature extraction algorithm model. Then, for noisy environments, it establishes an incremental manifold learning algorithm model based on feature space alignment to address the small sample size problem. Finally, it constructs a regularization optimization framework for manifold learning that addresses the dimensionality reduction errors arising during feature extraction from high-dimensional data streams, and obtains the final optimal solution. Experimental results show that the proposed algorithm framework satisfies three evaluation criteria for manifold learning algorithms: stability, enhancement, and a learning curve that rises rapidly to a relatively stable level. The framework thus achieves efficient learning on high-dimensional data streams.
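The abstract does not give the update rules of the proposed adaptive incremental feature extraction model. For orientation only, the following is a minimal sketch of a well-known incremental feature extractor that operates in the same one-sample-at-a-time setting: candid covariance-free incremental PCA (CCIPCA, Weng et al., 2003). It is a substitute technique, not the authors' model. It assumes a zero-mean stream and plain Python lists; the class name `CCIPCA`, its method names and the clamping of the amnesic parameter are illustrative choices of this sketch.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


class CCIPCA:
    """Candid covariance-free incremental PCA (after Weng et al., 2003).

    Sketch only: assumes a zero-mean stream (centre the data upstream if not).
    One sample is consumed per partial_fit call; no covariance matrix is formed.
    """

    def __init__(self, n_components, amnesic=0.0):
        self.k = n_components
        self.amnesic = amnesic  # amnesic parameter l; larger values favour recent samples
        self.n = 0              # number of samples seen so far
        self.v = []             # unnormalised component estimates

    def partial_fit(self, x):
        self.n += 1
        n = self.n
        # Clamp l so the weight on the old estimate never goes negative
        # (a simplification chosen for this sketch).
        l = min(self.amnesic, n - 1)
        w_old = (n - 1 - l) / n
        w_new = (1 + l) / n
        u = list(x)  # residual, deflated after each component
        for i in range(min(self.k, n)):
            if i == len(self.v):
                # The i-th component is initialised from the current residual.
                self.v.append(u[:])
            else:
                vi = self.v[i]
                vn = norm(vi)
                if vn == 0.0:
                    self.v[i] = u[:]
                else:
                    # v <- w_old * v + w_new * u * (u . v / |v|)
                    coef = dot(u, vi) / vn
                    self.v[i] = [w_old * a + w_new * coef * b for a, b in zip(vi, u)]
            # Deflate: remove the residual's projection on the updated component.
            vi = self.v[i]
            vn2 = dot(vi, vi)
            if vn2 > 0.0:
                proj = dot(u, vi) / vn2
                u = [a - proj * b for a, b in zip(u, vi)]
        return self

    def components(self):
        """Unit-length component directions in deflation order."""
        return [[a / norm(v) for a in v] if norm(v) > 0.0 else list(v) for v in self.v]

    def transform(self, x):
        """Project one sample onto the current components."""
        return [dot(x, c) for c in self.components()]


if __name__ == "__main__":
    rng = random.Random(0)
    model = CCIPCA(n_components=2)
    for _ in range(2000):
        t = rng.gauss(0.0, 3.0)  # strong variation along the (1, 1) direction
        model.partial_fit([t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1)])
    print(model.components()[0])
```

Each `partial_fit` call costs O(kd) time for k components in d dimensions, and memory stays O(kd) no matter how many samples have arrived, which is what makes this family of methods usable on streams. A positive amnesic parameter weights recent samples more heavily, one simple way of adapting to a drifting stream.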