On Storage Compression for Neural Machine Translation
Details
  • Chinese title: 面向神经机器翻译的模型存储压缩方法分析
  • Authors: LIN Ye (林野); JIANG Yufan (姜雨帆); XIAO Tong (肖桐); LI Hengyu (李恒雨)
  • Affiliation: NLP Laboratory, Northeastern University (东北大学自然语言处理实验室)
  • Keywords: model compression; pruning; quantization; low-precision; machine translation
  • Journal: Journal of Chinese Information Processing (中文信息学报), journal code MESS
  • Publication date: 2019-01-15
  • Year: 2019
  • Volume: v.33
  • Issue: 01
  • Pages: 98-107
  • Page count: 10
  • CN: 11-2325/N
  • Funding: National Natural Science Foundation of China (61876035, 61432013, 61732005); Fundamental Research Funds for the Central Universities; Liaoning Provincial Program for Supporting Innovative Talents in Higher Education
  • Language: Chinese
  • Record ID: MESS201901015
Abstract
Model storage compression aims to substantially reduce the storage cost caused by the large number of redundant parameters in neural networks, without degrading model performance. Most existing work on storage compression targets computer vision tasks, and compression methods for neural machine translation models remain less explored. This paper experimentally compares three compression methods, namely pruning, quantization, and low-precision representation, on the Transformer and RNN (recurrent neural network) models for machine translation. By combining the three methods, we achieve compression ratios of 5.8× on the Transformer model and 11.7× on the RNN model while maintaining the original BLEU score. We also analyze the strengths and weaknesses of the three compression methods on the different models.
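The abstract only names the three techniques; as a rough illustration of what each one does to a stored weight matrix, here is a minimal Python/NumPy sketch. The helper names (prune_by_magnitude, quantize_uniform, dequantize), the 512×512 random matrix, and the 60% sparsity / 8-bit settings are all illustrative assumptions, not the authors' actual pipeline or hyperparameters.

    import numpy as np

    def prune_by_magnitude(w, sparsity):
        """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
        threshold = np.quantile(np.abs(w), sparsity)
        return np.where(np.abs(w) < threshold, 0.0, w).astype(w.dtype)

    def quantize_uniform(w, bits=8):
        """Map weights to `bits`-bit integer codes plus a per-matrix scale and offset."""
        lo, hi = float(w.min()), float(w.max())
        levels = (1 << bits) - 1
        scale = (hi - lo) / levels if hi > lo else 1.0
        codes = np.round((w - lo) / scale).astype(np.uint8)
        return codes, scale, lo

    def dequantize(codes, scale, lo):
        """Recover approximate float weights from the integer codes."""
        return codes.astype(np.float32) * scale + lo

    # A random matrix standing in for one weight matrix of an NMT model (hypothetical).
    w = np.random.randn(512, 512).astype(np.float32)

    w_pruned = prune_by_magnitude(w, sparsity=0.6)         # pruning: 60% of weights set to zero
    codes, scale, lo = quantize_uniform(w_pruned, bits=8)  # quantization: 8-bit codes
    w_fp16 = w_pruned.astype(np.float16)                   # low precision: 16-bit floats

    # Storage comparison for this single matrix (sparse-index overhead ignored for brevity).
    print(f"fp32: {w.nbytes} B, int8 codes: {codes.nbytes} B, fp16: {w_fp16.nbytes} B")
    print(f"max quantization error: {np.abs(dequantize(codes, scale, lo) - w_pruned).max():.4f}")

In the paper's setting the three operations are combined: pruning reduces how many non-zero weights must be stored, while quantization and low-precision storage reduce the bits per stored weight, and the reported 5.8× and 11.7× ratios for the full models come from stacking these effects.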
References
[1] Hinton G, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups[J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.
    [2] Collobert R, Weston J. A unified architecture for natural language processing: Deep neural networks with multitask learning[C]//Proceedings of the 25th International Conference on Machine Learning, 2008: 160-167.
    [3] Zhang J, Zong C. Deep neural networks in machine translation: An overview[J]. IEEE Intelligent Systems, 2015, 30(5): 16-25.
    [4] Dean J, et al. Large scale distributed deep networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012: 1223-1231.
    [5] Denil M, et al. Predicting parameters in deep learning[C]//Advances in Neural Information Processing Systems, 2013: 2148-2156.
    [6] Howard A G, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
    [7] Zhang X, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[J]. arXiv preprint arXiv:1707.01083, 2017.
    [8] Iandola F N, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.
    [9] Lin M, et al. Network in network[J]. arXiv preprint arXiv:1312.4400, 2014.
    [10] LeCun Y, Denker J S, Solla S A. Optimal brain damage[M]//Touretzky D S. Advances in Neural Information Processing Systems 2. San Francisco: Morgan Kaufmann Publishers Inc., 1990: 598-605.
    [11] Courbariaux M, Bengio Y, David J P. Training deep neural networks with low precision multiplications[J]. arXiv preprint arXiv:1412.7024, 2015.
    [12] Sharma M. Compression using Huffman coding[J]. International Journal of Computer Science and Network Security (IJCSNS), 2010(5): 133-141.
    [13] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
    [14] Wu Y, et al. Google's neural machine translation system: Bridging the gap between human and machine translation[J]. arXiv preprint arXiv:1609.08144, 2016.
    [15] Han S, Mao H, Dally W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding[J]. arXiv preprint arXiv:1510.00149, 2015.
    [16] See A, et al. Compression of neural machine translation models via pruning[C]//Proceedings of the Conference on Computational Natural Language Learning, 2016: 291-301.
    [17] Oliver B M, Pierce J R, Shannon C E. The philosophy of PCM[J]. Proceedings of the IRE, 1948, 36(11): 1324-1331.
    [18] Mishra A, Marr D. Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy[J]. arXiv preprint arXiv:1711.05852, 2017.
    [19] Lin D D, et al. Fixed point quantization of deep convolutional networks[C]//Proceedings of the International Conference on Machine Learning, 2016: 2849-2858.
    [20] Courbariaux M, et al. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1[J]. arXiv preprint arXiv:1602.02830, 2016.
    [21] Vaswani A, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017: 5998-6008.
    [22] Xiao T, et al. NiuTrans: An open source toolkit for phrase-based and syntax-based machine translation[C]//Proceedings of the ACL 2012 System Demonstrations. Association for Computational Linguistics, 2012: 19-24.
    (1) LDC2000T46, LDC2000T47, LDC2000T50, LDC2003E14, LDC2005T10, LDC2002E18, LDC2007T09, LDC2004T08
