Abstract
To achieve faster convergence and better training results, an adaptive learning-rate algorithm is proposed that applies different adjustment strategies according to the model's test accuracy. The training process is divided into three stages: early, middle, and late. In the early stage, the learning rate is increased moderately. In the middle and late stages, the learning rate is reduced by decay factors of different sizes, determined by a decay-factor function of the increment in test accuracy: a smaller increment indicates that the model is closer to convergence, so a smaller decay factor is used. Experiments on the CIFAR-10 and CIFAR-100 datasets, implemented with the MXNet framework, show that the proposed method achieves better results in both convergence speed and converged accuracy.
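The three-stage schedule described in the abstract can be sketched as a simple update rule. The sketch below is illustrative only: the stage boundary, the warm-up gain, the decay-factor range, and the linear mapping from accuracy increment to decay factor are all assumptions, since the abstract does not give the concrete form of the decay-factor function.

```python
def adjust_learning_rate(lr, epoch, acc_history,
                         warmup_epochs=10,   # assumed length of the early stage
                         warmup_gain=1.05,   # assumed per-epoch increase factor
                         max_decay=0.9, min_decay=0.5):
    """Return the learning rate for the next epoch.

    Early stage (epoch < warmup_epochs): increase the learning rate slightly.
    Middle/late stages: multiply the learning rate by a decay factor that
    shrinks as the test-accuracy increment shrinks, so a near-converged
    model (tiny increment) gets a stronger learning-rate reduction.
    """
    if epoch < warmup_epochs:
        return lr * warmup_gain
    if len(acc_history) < 2:
        return lr  # not enough evaluations yet to compute an increment
    # Increment of test accuracy between the last two evaluations.
    delta = max(acc_history[-1] - acc_history[-2], 0.0)
    # Map the increment (assumed to saturate at 0.05) onto [min_decay, max_decay]:
    # smaller increment -> smaller decay factor -> smaller learning rate.
    frac = min(delta / 0.05, 1.0)
    decay = min_decay + (max_decay - min_decay) * frac
    return lr * decay
```

For example, a large accuracy jump (0.60 → 0.70) keeps the decay factor near `max_decay`, while a near-flat accuracy curve (0.700 → 0.701) pushes it toward `min_decay`, cutting the learning rate more aggressively.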
References
[1] ZHOU Z H. Machine learning [M]. Beijing: Tsinghua University Press, 2016.
[2] ZHANG Y, DAI H, XU C, et al. Sequential click prediction for sponsored search with recurrent neural networks [C]// Proc of the 28th AAAI Conference on Artificial Intelligence. Quebec City: AAAI Press, 2014: 1369-1375.
[3] YU X H, CHEN G A, CHENG S X. Dynamic learning rate optimization of the backpropagation algorithm [M]. Piscataway: IEEE, 1995.
[4] SMITH L N. Cyclical learning rates for training neural networks [C]// Proc of 2017 IEEE Winter Conference on Applications of Computer Vision. Santa Rosa: IEEE, 2017: 464-472.
[5] BABICHEV D, BACH F. Constant step size stochastic gradient descent for probabilistic modeling [EB/OL]. [2018-06-08]. http://arxiv.org/abs/1804.05567.
[6] FU Q, LUO F, LIU J, et al. Improving learning algorithm performance for spiking neural networks [C]// Proc of 2017 IEEE 17th International Conference on Communication Technology. Chengdu: IEEE, 2017: 1916-1919.
[7] XIAO W, BHARDWAJ R, RAMJEE R, et al. Gandiva: introspective cluster scheduling for deep learning [C]// Proc of the 13th USENIX Symposium on Operating Systems Design and Implementation. Carlsbad: USENIX Association, 2018: 595-610.
[8] HOFFER E, HUBARA I, SOUDRY D. Train longer, generalize better: closing the generalization gap in large batch training of neural networks [C]// Proc of Advances in Neural Information Processing Systems. Long Beach: NIPS, 2017: 1729-1739.
[9] IDA Y, FUJIWARA Y, IWAMURA S. Adaptive learning rate via covariance matrix based preconditioning for deep neural networks [C]// Proc of the 26th International Joint Conference on Artificial Intelligence. Melbourne: IJCAI, 2017: 1923-1929.
[10] KESKAR N S, SAON G. A nonmonotone learning rate strategy for SGD training of deep neural networks [C]// Proc of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. Brisbane: IEEE, 2015: 4974-4978.