An Effective Algorithm of Restricted Boltzmann Machine Based on Momentum Method

Authors: SHEN Hui-hui; LI Hong-wei
Affiliations: School of Mathematics and Physics, China University of Geosciences; School of Statistics & Information Management, Hubei University of Economics; Hubei Subsurface Multi-scale Imaging Key Laboratory, China University of Geosciences
Keywords: deep learning; restricted Boltzmann machine; Kullback-Leibler (KL) divergence; Monte Carlo method; momentum
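Journal (Chinese, abbreviated): DZXU
Journal (English): Acta Electronica Sinica
Publication date: 2019-01-15
Publisher: Acta Electronica Sinica (电子学报)
Year: 2019
Issue: v.47; No.431
Funding: Key Project of the Science and Technology Division, Hubei Provincial Department of Education (No.20182203); Program for Outstanding Young and Middle-aged Innovative Teams in Higher Education Institutions of Hubei Province (No.T201516)
Language: Chinese
Page: DZXU201901023
Page count: 7
CN: 01
ISSN: 11-2087/TN
Classification number: 178-184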

Abstract

Deep learning has brought major changes to pattern recognition and machine learning and has been applied successfully to language processing, image processing, signal processing, business and economics, and other fields. The restricted Boltzmann machine (RBM) is a generative model with strong representational power, but a deep belief net (DBN) built by stacking multiple RBMs takes a long time to train. To shorten DBN training time and improve classification performance, this paper proposes an effective RBM algorithm based on the momentum method. In the RBM pre-training phase, the algorithm uses a rapidly ascending form of momentum suited to gradient ascent, which greatly speeds up learning. In the fine-tuning phase with the back-propagation (BP) algorithm, it introduces a slowly descending momentum term suited to gradient descent so that the optimum can be located accurately. Different forms of momentum are thus used for gradient ascent and for gradient descent. Experiments on the MNIST handwritten digit dataset and the CMU-PIE face database show that the improved algorithm effectively enhances the representation of image features and improves both classification accuracy and computational efficiency.
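The abstract states that momentum is applied both to gradient ascent (RBM pre-training) and to gradient descent (BP fine-tuning), but it does not give the update rules. The following is a minimal sketch of the general idea only, assuming classical momentum combined with one-step contrastive divergence (CD-1). The function names, the 0.5 to 0.9 momentum schedule, and all hyperparameters are illustrative assumptions, not the authors' "rapidly ascending" or "slowly descending" schemes.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_gradients(v0, W, b, c, rng):
    """Estimate the log-likelihood gradient of a binary RBM with CD-1."""
    h0_prob = sigmoid(v0 @ W + c)                      # P(h=1 | v0)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    v1_prob = sigmoid(h0 @ W.T + b)                    # reconstruction
    h1_prob = sigmoid(v1_prob @ W + c)                 # P(h=1 | v1)
    batch = v0.shape[0]
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    db = (v0 - v1_prob).mean(axis=0)
    dc = (h0_prob - h1_prob).mean(axis=0)
    return dW, db, dc


def momentum_step(param, velocity, grad, lr, mu, ascent=True):
    """Classical momentum update, applied in place.

    ascent=True moves along the gradient (RBM pre-training);
    ascent=False moves against it (BP fine-tuning).
    """
    sign = 1.0 if ascent else -1.0
    velocity *= mu
    velocity += sign * lr * grad
    param += velocity
    return param, velocity


def pretrain_rbm(data, n_hidden, epochs=5, batch_size=20, lr=0.1, seed=0):
    """Pre-train one RBM with CD-1 and momentum (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible biases
    c = np.zeros(n_hidden)    # hidden biases
    vW, vb, vc = np.zeros_like(W), np.zeros_like(b), np.zeros_like(c)
    for epoch in range(epochs):
        # Assumed schedule: small momentum at first, larger afterwards.
        mu = 0.5 if epoch < 2 else 0.9
        for start in range(0, data.shape[0], batch_size):
            v0 = data[start:start + batch_size]
            dW, db, dc = cd1_gradients(v0, W, b, c, rng)
            momentum_step(W, vW, dW, lr, mu, ascent=True)
            momentum_step(b, vb, db, lr, mu, ascent=True)
            momentum_step(c, vc, dc, lr, mu, ascent=True)
    return W, b, c
```

For fine-tuning, the same `momentum_step` would be called with `ascent=False` on the back-propagated loss gradients, typically with a different momentum coefficient than the one used in pre-training.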

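1) Given the training set S = {v1, v2, …, vT}, train in batches, each batch containing S = T samples; let the numbers of visible and hidden units be n and m, respectively, and the learning rate of each RBM is set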
