A study of common optimization algorithms for deep learning (深度学习常用优化算法研究)
  • Author: Jia Tong (贾桐)
  • Affiliation: National Computer System Engineering Research Institute of China (华北计算机系统工程研究所)
  • Keywords: optimization algorithm; machine learning; deep learning
  • Journal: Information Technology and Network Security (信息技术与网络安全); journal code: WXJY
  • Publication date: 2019-07-10
  • Year/Issue: 2019, No.07 (v.38; No.507)
  • Language: Chinese
  • Pages: 46-50 (5 pages)
  • CN: 10-1543/TP
  • Database record ID: WXJY201907008
Abstract
With the rapid growth in the computing power of Central Processing Units (CPUs) and Graphics Processing Units (GPUs), and the exponential increase in the scale of data collected, deep learning has flourished, demonstrating strong capabilities in areas such as image recognition, natural language understanding, and speech recognition. Training a deep learning model is usually formulated as an unconstrained optimization problem, which must then be solved by a concrete optimization algorithm. Through comparative analysis, this paper introduces the heuristic optimization algorithms based on stochastic gradient descent that are commonly used in deep learning, compares their advantages and disadvantages, and summarizes practical tips and caveats for their use.
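The full text is not included in this record. Purely as an illustration of the SGD-family heuristics the abstract refers to (and not code taken from the paper), the sketch below contrasts a plain stochastic gradient descent step with an Adam step on a toy unconstrained problem; all function names and hyperparameter values are assumptions chosen for the demo.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Vanilla SGD: move against the (stochastic) gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum (m) plus per-coordinate step scaling (v)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2    # second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy unconstrained problem: minimize f(w) = ||w||^2, whose gradient is 2w.
w_sgd = w_adam = np.array([3.0, -2.0])
m = v = np.zeros_like(w_adam)
for t in range(1, 101):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_adam, m, v = adam_step(w_adam, 2 * w_adam, m, v, t)
print(w_sgd, w_adam)  # both approach the minimizer [0, 0]
```

In real training the gradient would come from a minibatch of data rather than a closed-form function; the point of the comparison is that Adam combines the two ideas this kind of survey typically covers, momentum and per-parameter adaptive learning rates, on top of the basic SGD update.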