Abstract
To achieve faster convergence and better training results, an adaptive learning-rate algorithm is proposed that applies different adjustment strategies according to the model's test accuracy. The training process is divided into three stages: early, middle, and late. In the early stage, the learning rate is increased moderately. In the middle and late stages, the learning rate is reduced by decay factors of different sizes, determined by a decay-factor function of the increment in test accuracy: a smaller increment indicates that the model is closer to convergence, so a smaller decay factor is used. Experiments on the CIFAR-10 and CIFAR-100 datasets, implemented with the MXNet framework, show that the proposed method achieves better results in both convergence speed and converged accuracy.
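The three-stage schedule described in the abstract can be sketched as a simple update rule. The sketch below is illustrative only: the stage boundary, the warm-up gain, the decay-factor range, and the linear mapping from accuracy increment to decay factor are all assumptions, since the abstract does not give the concrete form of the decay-factor function.

```python
def adjust_learning_rate(lr, epoch, acc_history,
                         warmup_epochs=10,   # assumed length of the early stage
                         warmup_gain=1.05,   # assumed per-epoch increase factor
                         max_decay=0.9, min_decay=0.5):
    """Return the learning rate for the next epoch.

    Early stage (epoch < warmup_epochs): increase the learning rate slightly.
    Middle/late stages: multiply the learning rate by a decay factor that
    shrinks as the test-accuracy increment shrinks, so a near-converged
    model (tiny increment) gets a stronger learning-rate reduction.
    """
    if epoch < warmup_epochs:
        return lr * warmup_gain
    if len(acc_history) < 2:
        return lr  # not enough evaluations yet to compute an increment
    # Increment of test accuracy between the last two evaluations.
    delta = max(acc_history[-1] - acc_history[-2], 0.0)
    # Map the increment (assumed to saturate at 0.05) onto [min_decay, max_decay]:
    # smaller increment -> smaller decay factor -> smaller learning rate.
    frac = min(delta / 0.05, 1.0)
    decay = min_decay + (max_decay - min_decay) * frac
    return lr * decay
```

For example, a large accuracy jump (0.60 → 0.70) keeps the decay factor near `max_decay`, while a near-flat accuracy curve (0.700 → 0.701) pushes it toward `min_decay`, cutting the learning rate more aggressively.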
References
[1] ZHOU Z H. Machine learning [M]. Beijing: Tsinghua University Press, 2016.
[2] ZHANG Y, DAI H, XU C, et al. Sequential click prediction for sponsored search with recurrent neural networks [C]// Proc of the 28th AAAI Conference on Artificial Intelligence. Quebec City: AAAI Press, 2014: 1369-1375.
[3] YU X H, CHEN G A, CHENG S X. Dynamic learning rate optimization of the backpropagation algorithm [M]. Piscataway: IEEE, 1995.
[4] SMITH L N. Cyclical learning rates for training neural networks [C]// Proc of 2017 IEEE Winter Conference on Applications of Computer Vision. Santa Rosa: IEEE, 2017: 464-472.
[5] BABICHEV D, BACH F. Constant step size stochastic gradient descent for probabilistic modeling [EB/OL]. [2018-06-08]. http://arxiv.org/abs/1804.05567.
[6] FU Q, LUO F, LIU J, et al. Improving learning algorithm performance for spiking neural networks [C]// Proc of 2017 IEEE 17th International Conference on Communication Technology. Chengdu: IEEE, 2017: 1916-1919.
[7] XIAO W, BHARDWAJ R, RAMJEE R, et al. Gandiva: introspective cluster scheduling for deep learning [C]// Proc of the 13th USENIX Symposium on Operating Systems Design and Implementation. Carlsbad: USENIX Association, 2018: 595-610.
[8] HOFFER E, HUBARA I, SOUDRY D. Train longer, generalize better: closing the generalization gap in large batch training of neural networks [C]// Proc of Advances in Neural Information Processing Systems. Long Beach: NIPS, 2017: 1729-1739.
[9] IDA Y, FUJIWARA Y, IWAMURA S. Adaptive learning rate via covariance matrix based preconditioning for deep neural networks [C]// Proc of the 26th International Joint Conference on Artificial Intelligence. Melbourne: IJCAI, 2017: 1923-1929.
[10] KESKAR N S, SAON G. A nonmonotone learning rate strategy for SGD training of deep neural networks [C]// Proc of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. Brisbane: IEEE, 2015: 4974-4978.