Abstract
In the convolutional neural network (CNN) framework, automatically obtaining a model's hyper-parameters has long been an important problem. This paper proposes a CNN hyper-parameter optimization method based on an improved Bayesian optimization algorithm. The method uses an improved Thompson sampling method as the acquisition function and an improved Markov chain Monte Carlo algorithm to accelerate training of the Gaussian-process surrogate model. It can optimize hyper-parameters for CNN frameworks with different hyper-parameter spaces. The algorithm's performance was evaluated on the CIFAR-10, MRBI, and SVHN test sets; the experimental results show that the improved CNN hyper-parameter optimization algorithm outperforms comparable hyper-parameter optimization algorithms.
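The core loop the abstract describes, a Gaussian-process surrogate queried through a Thompson-sampling acquisition step, can be sketched in a few lines. This is a generic illustration on a 1-D toy objective, not the paper's implementation: the toy `objective`, the RBF kernel, its length scale, and all function names are assumptions, and the paper's specific improvements to Thompson sampling and to MCMC-based surrogate training are not reproduced here.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential kernel between 1-D point sets a and b
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_cand, noise=1e-6):
    # Standard GP regression posterior over candidate points
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_cand)
    Kss = rbf_kernel(x_cand, x_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha                     # posterior mean
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v                   # posterior covariance
    return mu, cov

def objective(x):
    # Stand-in for validation error as a function of one hyper-parameter
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
x_cand = np.linspace(0.0, 2.0, 200)       # hyper-parameter search grid
x_obs = list(rng.uniform(0.0, 2.0, 3))    # initial random evaluations
y_obs = [objective(x) for x in x_obs]

for _ in range(15):
    mu, cov = gp_posterior(np.array(x_obs), np.array(y_obs), x_cand)
    # Thompson sampling: draw one function from the posterior and
    # evaluate the objective where that draw is minimal
    jitter = 1e-6 * np.eye(len(x_cand))
    sample = rng.multivariate_normal(mu, cov + jitter)
    x_next = x_cand[np.argmin(sample)]
    x_obs.append(x_next)
    y_obs.append(objective(x_next))

best = min(y_obs)
```

In each iteration the posterior draw is cheap to minimize over the grid, while the expensive `objective` (in the paper's setting, training a CNN and measuring validation error) is evaluated only once, which is the point of surrogate-based hyper-parameter optimization.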