English-Chinese translation based on an improved seq2seq model (original title: 基于改进seq2seq模型的英汉翻译研究)
  • Authors: XIAO Xin-feng (肖新凤); LI Shi-jun (李石君); YU Wei (余伟); LIU Jie (刘杰); LIU Bei-xiong (刘倍雄)
  • Keywords: deep learning; neural machine translation; seq2seq model; attention mechanism; named entity recognition
  • Journal: Computer Engineering & Science (计算机工程与科学); CNKI journal code: JSJK
  • Affiliations: Department of Mechanical and Electrical Engineering, Guangdong Polytechnic of Environmental Protection Engineering; School of Computer Science, Wuhan University
  • Publication date: 2019-07-15
  • Year: 2019
  • Volume/Issue: Vol. 41, Issue 7 (No. 295 overall)
  • Funding: National Natural Science Foundation of China (61502350); 2017 Guangdong Provincial Key Platform and Major Scientific Research Project for Colleges and Universities (2017GKTSCX042)
  • Language: Chinese
  • CN: 43-1258/TP
  • Pages: 117-125 (9 pages)
  • Article ID: JSJK201907016
Abstract
Current machine translation systems are mainly optimized and evaluated on Indo-European languages; little work has targeted Chinese. Moreover, the best-performing model in machine translation, the attention-based neural machine translation model seq2seq, does not account for grammatical transformation between languages. We propose an optimized English-Chinese translation model. It uses different text preprocessing and embedding-layer parameter initialization methods, and improves the seq2seq model structure by adding a transform layer between the encoder and the decoder to handle grammatical transformation. Preprocessing reduces the parameter size and training time of the translation model by 20% and raises translation performance by 0.4 BLEU. The seq2seq model with a transform layer improves translation performance by a further 0.7 to 1.0 BLEU. Experiments on English-Chinese translation tasks over corpora of different sizes show that, compared with the existing mainstream attention-based seq2seq model, the proposed model trains in the same amount of time while improving performance by 1 to 2 BLEU.
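The authors' implementation is not included in this record. As a rough illustration of what the abstract describes, the PyTorch sketch below builds a Luong-style attention seq2seq model with one extra transform layer between the encoder and the decoder, and optionally initializes the source embedding layer from pretrained vectors. The class name, the GRU choice, the layer sizes, and the single feed-forward form of the transform layer are all assumptions for illustration, not the paper's design.

import torch
import torch.nn as nn

class Seq2SeqWithTransformLayer(nn.Module):
    """Attention-based seq2seq with an extra transform layer between the
    encoder and the decoder. All names and sizes are illustrative
    assumptions; the paper's own implementation is not published here."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512,
                 src_pretrained=None):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        if src_pretrained is not None:
            # Embedding-layer initialization from pretrained vectors
            # (e.g. word2vec [15] / GloVe [16]); one reading of the
            # "embedding layer parameter initialization" in the abstract.
            self.src_emb.weight.data.copy_(src_pretrained)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Hypothetical transform layer: re-maps encoder states so the decoder
        # attends over a representation adapted to target-language grammar.
        self.transform = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.Tanh())
        self.attn = nn.Linear(hid_dim, hid_dim)  # Luong-style "general" score
        self.decoder = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        enc_out, dec_h = self.encoder(self.src_emb(src))  # (B,S,H), (1,B,H)
        enc_out = self.transform(enc_out)                 # grammar-transform step
        logits = []
        for t in range(tgt.size(1)):                      # teacher forcing
            emb_t = self.tgt_emb(tgt[:, t:t + 1])         # (B,1,E)
            query = self.attn(dec_h[-1]).unsqueeze(1)     # (B,1,H)
            weights = torch.softmax(
                torch.bmm(query, enc_out.transpose(1, 2)), dim=-1)  # (B,1,S)
            context = torch.bmm(weights, enc_out)         # (B,1,H)
            dec_out, dec_h = self.decoder(
                torch.cat([emb_t, context], dim=-1), dec_h)
            logits.append(self.out(dec_out))              # (B,1,V)
        return torch.cat(logits, dim=1)                   # (B,T,V)

# Shape check with random token ids:
model = Seq2SeqWithTransformLayer(src_vocab=8000, tgt_vocab=8000)
src = torch.randint(0, 8000, (2, 10))  # two source (English) sentences
tgt = torch.randint(0, 8000, (2, 12))  # two target (Chinese) sentences
print(model(src, tgt).shape)           # torch.Size([2, 12, 8000])

Placing the transform after the encoder means both the attention context and the decoder's initial state come from the re-mapped states; whether the paper transforms the encoder states, the attention output, or both is not specified in this record.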
References
[1] Marcu D,Wong W.A phrase-based,joint probability model for statistical machine translation[C]//Proc of the ACL-02 Conference on Empirical Methods in Natural Language Processing,2002:133-139.
    [2] Bengio Y,Ducharme R,Vincent P,et al.A neural probabilistic language model[J].Journal of Machine Learning Research,2003,3(6):1137-1155.
    [3] Vaswani A,Zhao Y,Fossum V,et al.Decoding with large-scale neural language models improves translation[C]//Proc of the 2013 Conference on Empirical Methods in Natural Language Processing,2013:1387-1392.
    [4] Cho K,van Merrienboer B,Gulcehre C,et al.Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proc of the 2014 Conference on Empirical Methods in Natural Language Processing,2014:1724-1734.
    [5] Sutskever I,Vinyals O,Le Q V.Sequence to sequence learning with neural networks[C]//Proc of the 27th Conference on Neural Information Processing Systems,2014:3104-3112.
    [6] Bahdanau D,Cho K,Bengio Y.Neural machine translation by jointly learning to align and translate[J].arXiv preprint arXiv:1409.0473v1,2014.
    [7] Luong M T,Pham H,Manning C D.Effective approaches to attention-based neural machine translation[C]//Proc of the 2015 Conference on Empirical Methods in Natural Language Processing,2015:1412-1421.
    [8] Vaswani A,Shazeer N,Parmar N,et al.Attention is all you need[C]//Proc of the 31st Conference on Neural Information Processing Systems,2017:5998-6008.
    [9] Bengio Y,Simard P,Frasconi P.Learning long-term dependencies with gradient descent is difficult[J].IEEE Transactions on Neural Networks,1994,5(2):157-166.
    [10] Pascanu R,Mikolov T,Bengio Y.On the difficulty of training recurrent neural networks[C]//Proc of International Conference on Machine Learning,2013:1310-1318.
    [11] Tiedemann J.Parallel data,tools and interfaces in OPUS[C]//Proc of the 8th International Conference on Language Resources and Evaluation,2012:2214-2218.
    [12] Tian L,Wong D F,Chao L S,et al.UM-corpus:A large English-Chinese parallel corpus for statistical machine translation[C]//Proc of the 9th International Conference on Language Resources and Evaluation,2014:1837-1842.
    [13] Ziemski M,Junczys-Dowmunt M,Pouliquen B.The United Nations parallel corpus v1.0[C]//Proc of the 10th International Conference on Language Resources and Evaluation,2016:3530-3534.
    [14] Han Dong-xu,Chang Bao-bao.Approaches to domain adaptive Chinese segmentation model[J].Chinese Journal of Computers,2015,38(2):272-281.(in Chinese)
    [15] Mikolov T,Chen K,Corrado G,et al.Efficient estimation of word representations in vector space[J].arXiv preprint arXiv:1301.3781v3,2013.
    [16] Pennington J,Socher R,Manning C D.GloVe:Global vectors for word representation[C]//Proc of the 2014 Conference on Empirical Methods in Natural Language Processing,2014:1532-1543.
