Abstract
This paper proposes a translation-log-based method for pruning statistical machine translation models. The method filters translation rules by how often each rule is hit in the translation log, keeping only the minimal rule table that the current translation model needs. Experiments show that the method retains only 1% to 3% of the original translation rules while achieving translation quality that is not significantly different from that of the full model.
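The hit-count filtering idea can be illustrated with a minimal sketch. This is not the authors' implementation: the rule representation (a source/target phrase pair), the log format (a list of rules used per translated sentence), the function names, and the `min_hits` threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical rule representation: (source phrase, target phrase).
Rule = Tuple[str, str]


def count_hits(log_entries: Iterable[Iterable[Rule]]) -> Counter:
    """Count how often each rule appears across the logged translations."""
    hits: Counter = Counter()
    for used_rules in log_entries:
        hits.update(used_rules)
    return hits


def prune_rules(rule_table: Dict[Rule, float], hits: Counter,
                min_hits: int = 1) -> Dict[Rule, float]:
    """Keep only rules hit at least min_hits times (threshold is an assumption)."""
    return {rule: score for rule, score in rule_table.items()
            if hits[rule] >= min_hits}


# Toy rule table with made-up scores, and a toy translation log.
rule_table = {
    ("mao", "cat"): 0.9,
    ("mao", "kitty"): 0.1,
    ("gou", "dog"): 0.8,
}
log = [
    [("mao", "cat"), ("gou", "dog")],  # rules used for sentence 1
    [("mao", "cat")],                  # rules used for sentence 2
]

hits = count_hits(log)
pruned = prune_rules(rule_table, hits)
```

In this toy run the rule `("mao", "kitty")` never appears in the log, so it is dropped, while the two rules that were actually used are kept with their original scores.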