A chunk increment partial least square algorithm (一种块增量偏最小二乘算法)
  • English title: A chunk increment partial least square algorithm
  • Authors: 曾雪强; 叶震麟; 左家莉; 万中英; 吴水秀
  • Authors (English): ZENG Xue-qiang; YE Zhen-lin; ZUO Jia-li; WAN Zhong-ying; WU Shui-xiu
  • Keywords: incremental learning; partial least square; data chunk; dimension reduction
  • Journal title (Chinese): SDDX
  • Journal title (English): Journal of Shandong University (Natural Science)
  • Affiliations: Information Engineering School, Nanchang University; School of Computer & Information Engineering, Jiangxi Normal University
  • Publication date: 2019-02-25 14:59
  • Publisher: Journal of Shandong University (Natural Science)
  • Year: 2019
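  • Volume: 54
  • Issue: 03
  • Funding: National Natural Science Foundation of China (61463033, 61866017); Jiangxi Province Outstanding Young Talents Funding Program (20171BCB23013); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ150354)
  • Language: Chinese
  • Record ID: SDDX201903013
  • Page count: 9
  • Pages: 97-105
  • CN: 37-1389/N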
Abstract
Incremental learning is an effective and efficient technique for mining large-scale data. Incremental partial least square (IPLS) is an improved partial least square (PLS) method based on incremental learning, and it achieves competitive dimension reduction performance. However, IPLS must update the model incrementally every time a single new sample arrives, so training takes a long time, which is a drawback for online learning. To address this problem, we propose an extension of IPLS called chunk incremental partial least square (CIPLS), based on the idea of updating the model chunk by chunk. CIPLS divides the sample data into several data chunks and then updates the model incrementally one chunk at a time, which greatly reduces the number of model updates and improves learning efficiency. Comparative experiments on the K8 version of the p53 protein data set (the k8 cancer rescue mutants data set) and the Reuters-21578 text classification corpus show that CIPLS greatly shortens the training time of the incremental partial least square model without sacrificing dimension reduction performance.
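The abstract does not give the CIPLS update rule, so the following is only a minimal sketch of the general idea it describes: absorbing a whole chunk of samples in one model update instead of updating once per sample. It is not the authors' algorithm. It maintains a running centered cross-product between the features and a single response, which for a one-response PLS model is proportional to the first weight direction. The class `ChunkCrossCovariance`, the helper `fit_in_chunks`, and the synthetic data are all hypothetical names and values chosen for illustration.

```python
import math
import random


class ChunkCrossCovariance:
    """Running centered cross-product between feature rows x and a scalar y."""

    def __init__(self, dim):
        self.n = 0                    # samples absorbed so far
        self.mean_x = [0.0] * dim     # running feature means
        self.mean_y = 0.0             # running response mean
        self.c = [0.0] * dim          # sum of (x - mean_x) * (y - mean_y)
        self.updates = 0              # number of model updates performed

    def update(self, xs, ys):
        """Merge one chunk: xs is a list of feature rows, ys the matching responses."""
        m, dim = len(xs), len(self.mean_x)
        chunk_mx = [sum(row[j] for row in xs) / m for j in range(dim)]
        chunk_my = sum(ys) / m
        total = self.n + m
        dy = chunk_my - self.mean_y
        weight = self.n * m / total   # cross-term weight when merging two groups
        for j in range(dim):
            chunk_c = sum((row[j] - chunk_mx[j]) * (y - chunk_my)
                          for row, y in zip(xs, ys))
            dx = chunk_mx[j] - self.mean_x[j]
            self.c[j] += chunk_c + dx * dy * weight
            self.mean_x[j] += dx * m / total
        self.mean_y += dy * m / total
        self.n = total
        self.updates += 1

    def direction(self):
        """Unit vector along the accumulated cross-product."""
        norm = math.sqrt(sum(v * v for v in self.c))
        return [v / norm for v in self.c]


def fit_in_chunks(xs, ys, dim, chunk_size):
    """Feed the data to a fresh model chunk_size samples at a time."""
    model = ChunkCrossCovariance(dim)
    for start in range(0, len(xs), chunk_size):
        model.update(xs[start:start + chunk_size], ys[start:start + chunk_size])
    return model


# Synthetic data (an assumption for the demo, not the paper's data sets).
random.seed(0)
xs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
ys = [2.0 * row[0] - row[1] + random.gauss(0, 0.1) for row in xs]

per_sample = fit_in_chunks(xs, ys, dim=3, chunk_size=1)
chunked = fit_in_chunks(xs, ys, dim=3, chunk_size=50)

print(per_sample.updates, chunked.updates)  # 200 updates versus 4
gap = max(abs(a - b) for a, b in zip(per_sample.direction(), chunked.direction()))
print(gap < 1e-9)  # both schedules reach the same direction up to rounding
```

In this sketch the chunked schedule reaches the same direction as the per-sample schedule while performing far fewer updates. Whether the same holds exactly for the later components of CIPLS depends on details the abstract does not state.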
