Design of a High-Quality English-Chinese Machine Translation Engine Based on a Hybrid Strategy
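  • English title: Design of a Hybrid High Quality English-Chinese Machine Translation System
  • Authors: 胡小鹏; 耿鑫辉; 袁琦
  • English authors: Xiaopeng Hu; Jinhui Geng; Qi Yuan; Beijing CCID Translation Technology Co., Ltd.
  • Keywords: hybrid-strategy machine translation; bilingual terminology extraction; translation template; statistical machine translation system
  • English keywords: Hybrid machine translation; Bilingual terminology extraction; Translation template; Statistical machine translation system
  • Chinese journal title: GYJS
  • English journal title: Industrial Technology Innovation
  • Institution: Beijing CCID Translation Technology Co., Ltd. (北京赛迪翻译技术有限公司)
  • Publication date: 2014-06-25
  • Publisher: Industrial Technology Innovation (工业技术创新)
  • Year: 2014
  • Issue: v.01; No.02
  • Funding: National Natural Science Foundation of China (No. 61172101, No. 61172102)
  • Language: Chinese
  • Pages: GYJS201402006
  • Page count: 7
  • CN: 02
  • ISSN: 10-1231/F
  • Classification code: 35-41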
Abstract
As globalization accelerates, the demands placed on machine translation performance keep rising, and systems built on a single method can hardly meet them. Because different machine translation methods have complementary strengths, combining them is a reasonable way to improve translation quality. This paper presents a method for building a high-quality hybrid machine translation engine that takes a traditional multi-engine system based on examples, rules, and templates as its foundation and integrates statistical machine translation at a deep level. Statistical methods are used to mine corpora and extract bilingual resources that expand the system's knowledge bases. The outputs of the two machine translation systems are aligned, lexical substitution is performed on the aligned output, and the candidates are scored with a statistical language model to improve phrase collocation and fluency. This multi-paradigm approach combines the strengths of the individual methods: it greatly reduces the cost and time of building the engine and also improves the accuracy and fluency of the translations.
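The align, substitute, and score step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train_bigram_lm`, `log_prob`, `combine`) are hypothetical, the alignment uses Python's `difflib.SequenceMatcher` as a stand-in for whatever alignment the paper uses, and the language model is a toy add-one smoothed bigram model trained on three sentences.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from itertools import product


def train_bigram_lm(corpus):
    """Count unigrams and bigrams over tokenized sentences (toy language model)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def combine(primary, secondary, unigrams, bigrams):
    """Align two translations and keep the substitutions the LM prefers."""
    matcher = SequenceMatcher(a=primary, b=secondary, autojunk=False)
    segments = []  # each entry lists the alternative token spans for one region
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # The systems disagree here: either span may be used.
            segments.append([primary[i1:i2], secondary[j1:j2]])
        elif tag != "insert":
            # "equal" and "delete" regions keep the primary system's tokens;
            # tokens that only the secondary system produced are ignored.
            segments.append([primary[i1:i2]])
    candidates = [
        [tok for span in choice for tok in span] for choice in product(*segments)
    ]
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


# Hypothetical toy data: a tiny target-side corpus and two system outputs.
corpus = [
    ["他", "打开", "了", "窗户"],
    ["她", "打开", "了", "门"],
    ["他", "关上", "了", "门"],
]
unigrams, bigrams = train_bigram_lm(corpus)
rule_based = ["他", "开", "了", "窗户"]
statistical = ["他", "打开", "了", "窗户"]
print(combine(rule_based, statistical, unigrams, bigrams))
# prints ['他', '打开', '了', '窗户']
```

The sketch enumerates every combination of disagreeing spans, so its cost grows exponentially with the number of disagreements; a practical system would need a beam or lattice search and a language model trained on a large corpus.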
