Automatic learning method of neuron connections for language models

Original title: 面向语言模型的神经元连接自动学习方法
Authors: JIANG Yufan (姜雨帆); LI Bei (李北); LIN Ye (林野); LI Yinqiao (李垠桥); XIAO Tong (肖桐); ZHU Jingbo (朱靖波)
Affiliation: Natural Language Processing Laboratory, School of Computer Science and Engineering, Northeastern University (东北大学计算机科学与工程学院自然语言处理实验室)
Keywords: language model; neuron connection; pruning
Journal code: XDZK
Journal: Journal of Xiamen University (Natural Science) (厦门大学学报(自然科学版))
Publication date: 2019-03-28
Year: 2019
Volume (cumulative issue): v.58; No.269
Funding: National Natural Science Foundation of China (61432013, 61732005, 61876035); Fundamental Research Funds for the Central Universities (N161604007); Liaoning Province Higher Education Innovative Talents Support Program (LR20170606)
Language: Chinese
Page: XDZK201902013
Page count: 6
Issue: 02
CN: 35-1070/N
Classification number: 83-88

Abstract

In natural language processing, neural network architectures are designed by hand, which tends to leave a large amount of redundancy in complex networks. To reduce this redundancy, model compression methods such as pruning are commonly used, but these methods prune the model directly according to indicators unrelated to the training process and often cause a loss in performance. This paper therefore explores a method for automatically learning the neuron connections of a neural network. By dynamically growing and deleting neuron connections during training, the method manipulates the network connections more effectively and yields a more compact and efficient network structure. Applying automatic growth and elimination to a neural language model reduces the network size by 49% while keeping the network performance unchanged.
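
The idea of growing and deleting connections during training can be illustrated with a connection mask over a weight matrix. The abstract does not state the criteria used for growth or deletion, so the sketch below assumes two common choices: a connection is deleted when its weight magnitude is small, and it is grown when its gradient magnitude is large. All names (prune, grow, density, threshold, k, init_scale) are hypothetical and are not taken from the paper.

```python
import random


def prune(weights, mask, threshold):
    """Delete active connections whose weight magnitude is below threshold (assumed criterion)."""
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if mask[i][j] and abs(w) < threshold:
                mask[i][j] = 0
                weights[i][j] = 0.0


def grow(weights, mask, grads, k, init_scale=0.01, rng=random):
    """Activate the k inactive connections with the largest gradient magnitude (assumed criterion)."""
    candidates = [
        (abs(grads[i][j]), i, j)
        for i in range(len(mask))
        for j in range(len(mask[i]))
        if not mask[i][j]
    ]
    candidates.sort(reverse=True)
    for _, i, j in candidates[:k]:
        mask[i][j] = 1
        # A newly grown connection starts from a small random weight.
        weights[i][j] = rng.uniform(-init_scale, init_scale)


def density(mask):
    """Fraction of connections that are currently active."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Toy 2x3 layer: every connection starts active.
weights = [[0.5, 0.001, -0.3], [0.002, 0.8, -0.004]]
mask = [[1, 1, 1], [1, 1, 1]]
grads = [[0.0, 0.9, 0.0], [0.1, 0.0, 0.2]]

prune(weights, mask, threshold=0.01)  # drops the near-zero weights
grow(weights, mask, grads, k=1)       # re-activates the inactive connection with the largest gradient
print(mask, density(mask))
```

In a real training loop the two steps would alternate with ordinary gradient updates, and the mask would be applied to the weights in the forward pass. How often to grow or delete, and by how much, is left open here.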

