A Convolutional Neural Network Training Strategy Using Knowledge Transfer
  • English title: Convolutional neural network training strategy using knowledge transfer
  • Authors: 罗可 (LUO Ke); 周安众 (ZHOU An-zhong); 罗潇 (LUO Xiao)
  • English authors: LUO Ke; ZHOU An-zhong; LUO Xiao (College of Computer and Communication Engineering, Changsha University of Science and Technology)
  • Keywords: convolutional neural network; knowledge transfer; overfitting; gradient vanishing; pre-training; fine-tuning
  • Journal code: KZYC
  • English journal title: Control and Decision
  • Institution: College of Computer and Communication Engineering, Changsha University of Science and Technology
  • Online publication date: 2018-01-08 11:00
  • Published in: 控制与决策 (Control and Decision)
  • Year: 2019
  • Volume: 34
  • Issue: 03
  • Fund: National Natural Science Foundation of China projects (11671125, 71371065, 51707013)
  • Language: Chinese
  • Article ID: KZYC201903008
  • Pages: 66-73
  • Page count: 8
  • CN: 21-1124/TP
Abstract
To overcome the overfitting and gradient vanishing problems that arise when deep convolutional neural networks are trained on limited labeled samples, a strategy is proposed that transfers knowledge from a source model to train a deep target model. The transferred knowledge consists of the class distribution of the samples and the low-level features of the source model. The class distribution provides inter-class correlation information about the samples, which extends the supervised information of the training set and alleviates the problem of insufficient samples. The low-level features capture local characteristics of the samples, are general across related tasks, and can help the target model escape poor local minima. The target model is pre-trained with these two kinds of knowledge so that it converges to a better position, and is then fine-tuned with the real labeled samples. Experimental results show that the proposed method improves both the model's resistance to overfitting and its prediction accuracy.
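For concreteness, the following is a minimal sketch of the strategy described above, written in PyTorch under assumptions of our own: the architectures `SourceCNN` and `TargetCNN`, the temperature `T`, the optimizer settings, and the 32x32 input size are illustrative placeholders rather than the paper's actual configuration. The sketch copies the source model's low-level convolutional weights into the target model, pre-trains the target to match the source's softened class distributions, and then fine-tunes on the real labels.

```python
# Minimal sketch (assumptions: PyTorch, 32x32 RGB inputs such as CIFAR-10;
# layer shapes, temperature and optimizer settings are illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SourceCNN(nn.Module):
    """Shallow source model assumed to be already trained on a related task."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.low = nn.Sequential(                      # low-level feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16 * 16, num_classes))
    def forward(self, x):
        return self.head(self.low(x))

class TargetCNN(nn.Module):
    """Deeper target model to be pre-trained with the transferred knowledge."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.low = nn.Sequential(                      # same structure as SourceCNN.low
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.high = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64 * 8 * 8, num_classes))
    def forward(self, x):
        return self.head(self.high(self.low(x)))

def pretrain_with_transfer(source, target, loader, epochs=5, T=3.0, lr=1e-3):
    """Pre-train the target with the two kinds of transferred knowledge:
    (1) copy the source's low-level conv weights, (2) fit the target to the
    source's softened class distributions (soft labels)."""
    target.low.load_state_dict(source.low.state_dict())   # low-level feature transfer
    source.eval()
    opt = torch.optim.Adam(target.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:                                # true labels not used here
            with torch.no_grad():
                soft = F.softmax(source(x) / T, dim=1)     # class distribution of samples
            loss = F.kl_div(F.log_softmax(target(x) / T, dim=1), soft,
                            reduction="batchmean") * T * T
            opt.zero_grad(); loss.backward(); opt.step()

def finetune(target, loader, epochs=5, lr=1e-4):
    """Fine-tune the pre-trained target model on the real labeled samples."""
    opt = torch.optim.Adam(target.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            loss = F.cross_entropy(target(x), y)
            opt.zero_grad(); loss.backward(); opt.step()

# Example usage with random data (illustrative only):
# loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(4)]
# src, tgt = SourceCNN(), TargetCNN()
# pretrain_with_transfer(src, tgt, loader, epochs=1)
# finetune(tgt, loader, epochs=1)
```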
