Distributed Deep Networks Based on the Bagging-Down SGD Algorithm
  • Title (English): Distributed deep networks based on Bagging-Down SGD algorithm
  • Authors: QIN Chao (秦超); GAO Xiaoguang (高晓光); CHEN Daqing (陈大庆)
  • Affiliations: School of Electronics and Information, Northwestern Polytechnical University; London South Bank University
  • Keywords: deep network; distributed; Bootstrap aggregating-down stochastic gradient descent (Bagging-Down SGD); speed controller
  • Journal: Systems Engineering and Electronics (系统工程与电子技术); journal code: XTYD
  • Publication date: 2019-03-22
  • Year: 2019
  • Volume/Issue: v.41; No.476, Issue 05
  • Fund: National Natural Science Foundation of China (61573285)
  • Language: Chinese
  • Pages: 90-96 (7 pages)
  • CN: 11-2422/TN
  • Record ID: XTYD201905013
Abstract
Training on large amounts of data with a distributed deep learning algorithm can learn good data representations, but traditional distributed deep learning algorithms face two main problems on large datasets: training is slow, or training accuracy is low. The Bootstrap aggregating-down stochastic gradient descent (Bagging-Down SGD) algorithm is proposed, focused on improving the learning speed of distributed deep networks. Bagging-Down SGD adds a speed controller above the many single-machine models: the parameter values computed by the individual machines are combined statistically, the frequency of parameter updates is reduced, and single-machine model training is decoupled from parameter updating to a certain extent. This improves the training speed of the whole distributed model while preserving training accuracy. Experiments show that the algorithm is general and can learn the structure of many different kinds of data.
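For a concrete picture of the update scheme the abstract describes, below is a minimal single-process sketch in Python of the Bagging-Down SGD idea. It is not the paper's implementation: it assumes a simple least-squares model, synchronous rounds, bootstrap resampling per worker, and plain parameter averaging as the "statistical processing"; the names `n_workers` and `speed_controller_period` are illustrative.

```python
# Minimal sketch of the Bagging-Down SGD idea (illustrative assumptions,
# not the paper's exact formulation).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data shared by all workers.
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1000)

n_workers = 4                  # number of single-machine models
speed_controller_period = 50   # steps between global parameter aggregations
lr = 0.01
steps = 500

# Each worker trains on its own bootstrap resample of the data (the "Bagging" part).
boot_idx = [rng.integers(0, len(X), size=len(X)) for _ in range(n_workers)]
workers = [np.zeros(10) for _ in range(n_workers)]

for step in range(1, steps + 1):
    # Local SGD: each worker updates its own parameter copy independently,
    # so local training does not wait on the other machines.
    for k in range(n_workers):
        i = boot_idx[k][rng.integers(0, len(X))]   # draw one bootstrap sample
        grad = (X[i] @ workers[k] - y[i]) * X[i]   # squared-loss gradient
        workers[k] -= lr * grad

    # Speed controller: only every `speed_controller_period` steps are the
    # worker parameters combined (here: averaged) and pushed back down,
    # reducing the frequency of global parameter updates.
    if step % speed_controller_period == 0:
        global_w = np.mean(workers, axis=0)
        workers = [global_w.copy() for _ in range(n_workers)]

print("parameter error:", np.linalg.norm(np.mean(workers, axis=0) - true_w))
```

In a real distributed deployment each worker loop would run on its own machine and the speed controller would sit on a parameter server; lowering the aggregation frequency trades communication cost against how far the single-machine models are allowed to drift apart between updates.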
