Research on Key Technologies of Bilingual Corpus-Based Machine Translation (基于双语语料库的机器翻译关键技术研究)

English title: Research on Bilingual Corpus-Based Machine Translation
Author: Chao Wenhan (巢文涵)
Degree level: Doctoral
Discipline: Computer Science and Technology
Keywords: corpus; statistical machine translation; example-based machine translation; word alignment; reordering; tree-tree translation model; similar example retrieval; example-based statistical machine translation
Degree year: 2008
Advisor: Li Zhoujun (李舟军)
Discipline code: 081202
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Submission date: 2008-04-01

Abstract

Machine translation has been studied for a long time, but its quality has not yet reached the level people expect. With the rapid development of computer hardware and software and the maturing of corpus construction, machine translation based on statistical knowledge has become feasible, and translation quality has a chance to move closer to that expectation. Since the noisy channel model, and especially the maximum entropy model, were proposed for machine translation, a central task has been to integrate more effective knowledge, particularly linguistic knowledge, into the model in order to improve translation quality further. This thesis focuses on machine translation between Chinese and English. It presents a systematic, in-depth study of how to incorporate syntactic knowledge into bilingual corpus-based machine translation, and implements a complete system. Specifically, the thesis covers the following work:
     1. We propose a syntax-based word alignment model and method.
     Word alignment is the foundation of statistical machine translation, and its quality ultimately affects the quality of translation. To address the difficulties of Chinese-English word alignment, we propose an improved word alignment model that introduces syntactic knowledge to explain the complex word-order changes between Chinese and English.
     We first transform the constraints that are implicit in the inversion transduction grammar (ITG) into explicit position judgments, so that the ITG model can be introduced effectively into the log-linear word alignment model. We also design similarity metrics between the syntactic parse tree and the ITG tree, which integrate the constraints of the parse tree into the ITG-based word alignment model. By combining these two types of syntactic knowledge, the model can better constrain word-order changes in word alignment.
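The abstract does not spell out the explicit position judgments, so the following is only a sketch of one standard way to test the ITG constraint, not the thesis's formulation. It assumes a one-to-one alignment written as a permutation (entry i is the target position of source word i) and checks, with a shift-reduce pass, whether the permutation can be built purely from straight and inverted combinations of adjacent blocks. The function name `is_itg_compatible` is hypothetical.

```python
from typing import List, Sequence, Tuple


def is_itg_compatible(perm: Sequence[int]) -> bool:
    """Return True if `perm` can be derived with ITG straight/inverted rules.

    `perm[i]` is the target position aligned to source position i; the input
    is assumed to be a permutation of 0..n-1 (a one-to-one alignment).
    """
    stack: List[Tuple[int, int]] = []  # target spans (lo, hi) of finished blocks
    for target_pos in perm:
        stack.append((target_pos, target_pos))
        # Merge the two top blocks while their target spans are contiguous.
        while len(stack) >= 2:
            lo2, hi2 = stack[-1]   # block further right on the source side
            lo1, hi1 = stack[-2]   # block further left on the source side
            if hi1 + 1 == lo2:     # straight: same order on both sides
                merged = (lo1, hi2)
            elif hi2 + 1 == lo1:   # inverted: order swapped on the target side
                merged = (lo2, hi1)
            else:
                break
            stack[-2:] = [merged]
    return len(stack) <= 1
```

For example, `is_itg_compatible([1, 0, 3, 2])` holds, while `[1, 3, 0, 2]` and `[2, 0, 3, 1]` are the two four-word orderings that no ITG derivation can produce.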
     2. We propose a tree-tree statistical machine translation model and method.
     Because word order differs between the source and target sentences, reordering, which handles changes in the order of target words during translation, is one of the hard problems that statistical machine translation (SMT) must face.
     We present a tree-tree SMT model. By mapping between the syntactic tree of the source sentence and the ITG tree, the model constrains the reordering of target phrases at the global level. At the local level, the model includes an ITG-based local reordering model as a feature, in which the reordering probability of two blocks is decomposed into the product of the reordering probabilities of their adjacent child blocks, so that the model can predict the translation orientation between two blocks of arbitrary length. Integrating the local and global reordering models allows the tree-tree model to explain the complex relationship between source and target sentences.
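A local reordering model of this kind predicts whether two neighbouring blocks keep or swap their order on the target side. The sketch below only illustrates how those two ITG orientations can be read off a pair of aligned blocks; it does not implement the thesis's probability decomposition, and the block representation (half-open source and target spans) and the name `orientation` are assumptions made for illustration.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]      # half-open interval [start, end)
Block = Tuple[Span, Span]   # (source span, target span)


def orientation(left: Block, right: Block) -> Optional[str]:
    """Classify how two source-adjacent blocks are ordered on the target side.

    Returns "straight" if the target spans follow the source order,
    "inverted" if they are swapped, and None if the blocks are not adjacent
    on both sides (so no single ITG rule can combine them).
    """
    (l_src, l_tgt), (r_src, r_tgt) = left, right
    if l_src[1] != r_src[0]:
        return None               # not neighbours on the source side
    if l_tgt[1] == r_tgt[0]:
        return "straight"         # left block is translated first
    if r_tgt[1] == l_tgt[0]:
        return "inverted"         # right block is translated first
    return None                   # gap or overlap on the target side
```

Orientation labels collected this way from word-aligned sentence pairs are the kind of training events a reordering classifier could be estimated from.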
     3. We propose a similar example retrieval method based on bilingual information.
     Example-based machine translation (EBMT) translates by analogy and can produce fluent translations when similar examples are available. Retrieving similar examples from a large-scale example base is therefore important to the quality of EBMT.
     We propose a novel retrieval method that exploits the word alignment information within the examples. We design a series of similarity metrics, based on the word alignment within each example, to measure the similarity between the input sentence to be translated and the examples in the training corpus; these metrics improve retrieval quality. To speed up retrieval, we also design a two-level inverted index table, which improves retrieval efficiency.
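The layout of the two-level inverted index is not described in the abstract. As a rough illustration of the idea, the sketch below assumes that the first level maps a word to the examples containing it and the second level maps each of those examples to the word's positions. Ranking uses a plain Dice coefficient over source-side word types, which is a stand-in and not the alignment-based metrics of the thesis. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class TwoLevelIndex:
    """Toy inverted index: word -> example id -> positions of the word."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[int, List[int]]] = defaultdict(dict)
        self.type_counts: Dict[int, int] = {}  # distinct words per example

    def add(self, ex_id: int, words: Sequence[str]) -> None:
        """Index the source side of one translation example."""
        self.type_counts[ex_id] = len(set(words))
        for pos, word in enumerate(words):
            self.postings[word].setdefault(ex_id, []).append(pos)

    def retrieve(self, query: Sequence[str], top_k: int = 3) -> List[Tuple[int, float]]:
        """Return up to `top_k` (example id, Dice score) pairs, best first."""
        query_types = set(query)
        shared: Dict[int, int] = defaultdict(int)
        # Only examples sharing at least one word with the query are touched.
        for word in query_types:
            for ex_id in self.postings.get(word, {}):
                shared[ex_id] += 1
        scored = [
            (ex_id, 2.0 * n / (len(query_types) + self.type_counts[ex_id]))
            for ex_id, n in shared.items()
        ]
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:top_k]
```

Because lookup starts from the query words, examples with no word in common are never scored, which is where an inverted index saves time over scanning the whole example base.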
     4. We propose an example-based statistical machine translation model and method.
     The tree-tree model above starts from the source sentence and tries to make the structure of the generated translation satisfy the constraints of the source sentence's syntactic tree. It therefore cannot guarantee that the structure of the target sentence is reasonable.
     We present a hybrid model that extends the tree-tree model by incorporating example knowledge into SMT, in order to ensure that the translation is structurally sound and fluent. We also present an example-based decoder, which combines statistical knowledge with example information to improve the quality and efficiency of decoding.

