Statistical Machine Translation Model Pruning Based on Translation Log

Original title (Chinese): 基于翻译日志的统计机器翻译模型剪枝
Authors: LIU Kai (刘凯); LÜ Yajuan (吕雅娟); JIANG Wenbin (姜文斌); LIU Qun (刘群)
Keywords: statistical machine translation; model pruning; translation log
Journal name (Chinese): BJDZ
Journal name (English): Acta Scientiarum Naturalium Universitatis Pekinensis
Affiliations: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, University of Chinese Academy of Sciences; Dublin City University (DCU)
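Publication date: 2013-11-06 11:23
Publisher: 北京大学学报(自然科学版) (Acta Scientiarum Naturalium Universitatis Pekinensis)
Year: 2014
Issue: v.50; No.261
Funding: 863 Program (2011AA01A207); National Key Technology Support Program (2012BAH39B03)
Language: Chinese
Page: BJDZ201401024
Number of pages: 6
CN: 01
ISSN: 11-2442/N
Classification number: 170-175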

Abstract

The authors propose a translation-log-based pruning method for statistical machine translation models. The method filters translation rules by their hit counts in the translation log, keeping only the minimal rule table that the current translation model needs. Experiments show that the method retains only 1%-3% of the original model's translation rules while achieving translation quality that is not significantly different from that of the full model.
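The filtering idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the log is modeled as one list of rule identifiers per translated sentence, rules are (source, target) pairs with made-up scores, and the names `count_rule_hits`, `prune_rule_table`, and the `min_hits` threshold are hypothetical.

```python
from collections import Counter


def count_rule_hits(translation_log):
    """Count how often each rule identifier appears in the log.

    translation_log: iterable of lists, one list of rule identifiers per
    translated sentence (a hypothetical log format assumed for this sketch).
    """
    hits = Counter()
    for rules_used in translation_log:
        hits.update(rules_used)
    return hits


def prune_rule_table(rule_table, hits, min_hits=1):
    """Keep only the rules whose hit count is at least min_hits."""
    # Counter returns 0 for rules that never appear in the log.
    return {rule: entry for rule, entry in rule_table.items()
            if hits[rule] >= min_hits}


# Toy rule table: keys are (source, target) pairs, values are made-up scores.
rule_table = {
    ("src_a", "tgt_a1"): 0.70,
    ("src_a", "tgt_a2"): 0.10,
    ("src_b", "tgt_b1"): 0.80,
    ("src_b", "tgt_b2"): 0.05,
}

# Toy log: the rules used while translating two sentences.
translation_log = [
    [("src_a", "tgt_a1"), ("src_b", "tgt_b1")],
    [("src_a", "tgt_a1")],
]

hits = count_rule_hits(translation_log)
pruned = prune_rule_table(rule_table, hits, min_hits=1)
kept_fraction = len(pruned) / len(rule_table)
```

In this toy run, the two rules that never appear in the log are dropped and half of the table is kept. The 1%-3% figure reported in the abstract refers to the authors' own experiments, not to this example.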

References

[1] Quirk C, Menezes A. Do we need phrases?: challenging the conventional wisdom in statistical machine translation//Proceedings of the main conference on human language technology conference of the North American chapter of the Association of Computational Linguistics (NAACL). New York: Association for Computational Linguistics, 2006: 9–16
[2] Johnson H, Martin J, Foster G, et al. Improving translation quality by discarding most of the phrasetable//Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Prague: Association for Computational Linguistics, 2007: 967–975
[3] Kavitha K M, Gomes L, Lopes G P, et al. Using SVMs for filtering translation tables for parallel corpora alignment//EPIA. Lisbon, 2011: 690–702
[4] Zettlemoyer L S, Moore R C. Selective phrase pair extraction for improved statistical machine translation//Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics. Rochester: Association for Computational Linguistics, 2007: 209–212
[5] Tu Zhaopeng, Liu Qun, Lin Shouxun. Extracting long distance reordering rules with dependency restriction. Journal of Chinese Information Processing, 2011, 25(2): 55–60
[6] Liu Qun, He Zhongjun, Liu Yang, et al. Maximum entropy based rule selection model for syntax-based statistical machine translation//Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP). Honolulu, Hawaii: Association for Computational Linguistics, 2008: 85–97
① http://www.sogou.com//labs/dl/ca.html
