Abstract
In the convolutional neural network (CNN) framework, automatically obtaining a model's hyper-parameters has long been an important problem. This paper proposes a CNN hyper-parameter optimization method based on an improved Bayesian optimization algorithm. The method uses an improved Thompson sampling method as the acquisition function and an improved Markov chain Monte Carlo algorithm to accelerate training of the Gaussian-process surrogate model. It can optimize hyper-parameters for CNN frameworks with different hyper-parameter spaces. The algorithm's performance was evaluated on the CIFAR-10, MRBI, and SVHN test sets; the experimental results show that the improved CNN hyper-parameter optimization algorithm outperforms comparable hyper-parameter optimization algorithms.
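The core loop the abstract describes, a Gaussian-process surrogate queried through a Thompson-sampling acquisition step, can be sketched in a few lines. This is a generic illustration on a 1-D toy objective, not the paper's implementation: the toy `objective`, the RBF kernel, its length scale, and all function names are assumptions, and the paper's specific improvements to Thompson sampling and to MCMC-based surrogate training are not reproduced here.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential kernel between 1-D point sets a and b
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_cand, noise=1e-6):
    # Standard GP regression posterior over candidate points
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_cand)
    Kss = rbf_kernel(x_cand, x_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha                     # posterior mean
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v                   # posterior covariance
    return mu, cov

def objective(x):
    # Stand-in for validation error as a function of one hyper-parameter
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
x_cand = np.linspace(0.0, 2.0, 200)       # hyper-parameter search grid
x_obs = list(rng.uniform(0.0, 2.0, 3))    # initial random evaluations
y_obs = [objective(x) for x in x_obs]

for _ in range(15):
    mu, cov = gp_posterior(np.array(x_obs), np.array(y_obs), x_cand)
    # Thompson sampling: draw one function from the posterior and
    # evaluate the objective where that draw is minimal
    jitter = 1e-6 * np.eye(len(x_cand))
    sample = rng.multivariate_normal(mu, cov + jitter)
    x_next = x_cand[np.argmin(sample)]
    x_obs.append(x_next)
    y_obs.append(objective(x_next))

best = min(y_obs)
```

In each iteration the posterior draw is cheap to minimize over the grid, while the expensive `objective` (in the paper's setting, training a CNN and measuring validation error) is evaluated only once, which is the point of surrogate-based hyper-parameter optimization.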