Abstract
In natural language processing, neural network architectures are designed by hand, which often leaves complex networks with a large amount of redundancy. To reduce this redundancy, researchers commonly apply model compression methods such as pruning. These methods, however, trim the model directly according to criteria unrelated to the training process, which often degrades performance. This paper therefore explores a method for automatically learning the connections between neurons in a neural network. The method grows and deletes neuron connections dynamically during training, so that the network's connectivity is adjusted as the model learns, yielding a more compact and efficient network structure. Applying this automatic growth and elimination to a neural language model reduces the network size by 49% while maintaining the original network performance.
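The abstract does not state which criteria decide when a connection is deleted or grown. The sketch below shows one structural update on a single weight matrix under loudly stated assumptions: a connection is deleted when its weight magnitude falls below a threshold, and the inactive connections with the largest loss-gradient magnitude are grown back with a zero initial weight. These rules are borrowed from published grow-and-prune and dynamic-sparse-training schemes (such as NeST), not taken from this paper. The function names `prune`, `grow`, `density`, `grow_and_prune_step` and the parameters `threshold` and `k` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[int]]  # 1 = connection exists, 0 = connection deleted


def prune(weights: Matrix, mask: Mask, threshold: float) -> Tuple[Matrix, Mask]:
    """Delete active connections whose |weight| is below `threshold` (assumed criterion)."""
    new_mask = [
        [int(m == 1 and abs(w) >= threshold) for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]
    # Deleted (and already inactive) connections have their weight reset to zero.
    new_weights = [
        [w if m else 0.0 for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, new_mask)
    ]
    return new_weights, new_mask


def grow(weights: Matrix, mask: Mask, grads: Matrix, k: int) -> Tuple[Matrix, Mask]:
    """Activate the k inactive connections with the largest |gradient| (assumed criterion).

    `grads` is assumed to be the dense loss gradient, i.e. it is also
    defined for connections that are currently masked out.
    """
    candidates = sorted(
        (
            (abs(grads[i][j]), i, j)
            for i, row in enumerate(mask)
            for j, m in enumerate(row)
            if m == 0
        ),
        reverse=True,
    )
    new_weights = [row[:] for row in weights]
    new_mask = [row[:] for row in mask]
    for _, i, j in candidates[:k]:
        new_mask[i][j] = 1
        new_weights[i][j] = 0.0  # assumption: grown connections start at zero, then train
    return new_weights, new_mask


def density(mask: Mask) -> float:
    """Fraction of possible connections that are currently active."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def grow_and_prune_step(
    weights: Matrix, mask: Mask, grads: Matrix, threshold: float, k: int
) -> Tuple[Matrix, Mask]:
    """One structural update: delete weak connections, then grow promising ones."""
    weights, mask = prune(weights, mask, threshold)
    return grow(weights, mask, grads, k)


if __name__ == "__main__":
    weights = [[0.9, -0.05, 0.0], [0.02, -0.7, 0.0]]
    mask = [[1, 1, 0], [1, 1, 0]]
    grads = [[0.1, 0.2, 0.8], [0.05, 0.3, 0.01]]
    weights, mask = grow_and_prune_step(weights, mask, grads, threshold=0.1, k=1)
    print(mask)           # [[1, 0, 1], [0, 1, 0]]
    print(density(mask))  # 0.5
```

In this toy run two weak connections are deleted and one new connection is grown, so density drops from 4/6 to 3/6. In a real training loop such a step would presumably be interleaved with ordinary gradient updates, which is what lets structure learning react to the training process rather than to a one-off, training-independent pruning criterion.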