Research on Several Key Problems in the Machine Translation System Based on Semantic Language
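Abstract
To date, no fully automatic, high-quality machine translation system has appeared, owing to the complexity of linguistic knowledge and the limits of human understanding of how language works. Machine translation research is no longer confined to the syntactic and semantic analysis of individual sentences; it must also mine the internal regularities of language and examine contextual information of every kind, including sentence groups, paragraphs, discourse, and genre. The multilingual machine translation method based on Semantic Language analyzes a source-language sentence semantically against a Semantic Element Base and then expands the result into a target-language sentence. The system has two parts: a unified multi-natural-language machine translation program, and a multilingual unified Semantic Element Base that is high-quality, extensible, complete, and free of discardable entries, duplicates, and abnormal ambiguity. However, because the corpora drawn on are limited and linguistic phenomena are complex and variable, the existing Semantic Element Base is not complete (semantic elements and their representations are missing or incorrect), so it cannot guarantee that every sentence is translated, or translated correctly. The system also contains several difficult problems that remain to be solved. To improve translation quality and obtain translations that are correct and coherent, this thesis studies three of these key problems: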
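     (1) The first key problem in the machine translation system based on Semantic Language: classifiers.
     First, on the basis of the semantic element theory of unified linguistics, English and Chinese nominal classifiers are given a unified classification, in which the different semantic element representations of a classifier in English and in Chinese correspond to the same classifier semantic type. Next, noun-classifier phrases are formalized as Chinese and English semantic element representations. Finally, the nouns that collocate with formalized classifiers are represented semantically using the lexical definitions in HowNet, a formalized classifier-noun collocation rule base is built, and a method for selecting Chinese formalized classifiers in the system, based on the rule base and noun examples, is proposed and implemented.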
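     (2) The second key problem in the machine translation system based on Semantic Language: semantic disambiguation and Chinese translation of English prepositions.
     First, on the basis of semantic element theory and the characteristics of phrases and sentences containing prepositions, a semantic element (representation) base specific to prepositions is built: the preposition Semantic Pattern Base. Prepositions are translated in two steps. The first step analyzes the English phrase or sentence containing the preposition against the Semantic Pattern Base to obtain a complete English semantic pattern. The second step substitutes and expands that complete English semantic pattern, again using the Semantic Pattern Base, into a complete Chinese semantic pattern and the Chinese translation. Finally, three preposition-specific methods are proposed for the semantic analysis stage. The first is based on Link Grammar and semantic patterns: a correspondence is established between preposition-bearing phrases and sentences and the connectors, and the output of the Link Grammar parser is matched against it to obtain the grammatical structure of the sentence. The second is semantic pattern decomposition: the four basic forms of preposition-bearing phrases and sentences are decomposed into simple preposition semantic patterns. The third is semantic pattern extension: starting from the preposition, verbs, nouns, and adjectives are extended layer by layer to the left and right to obtain the actual extended form. Experimental results confirm that the preposition Semantic Pattern Base and the three semantic analysis methods are effective for preposition semantic analysis and Chinese translation in the machine translation system based on Semantic Language.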
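     (3) The third key problem in the machine translation system based on Semantic Language: Chinese-English tense conversion.
     First, on the basis of an analysis of Chinese-English sentence pairs and a summary of how Chinese sentences express time, the temporal information of Chinese sentences is reclassified, starting from the reverse mapping from the sixteen English tenses to Chinese temporal expressions. This classification effectively avoids the complexity and category overlap of traditional tense classifications. Next, the concept of a temporal pattern and related notions are introduced and used to formalize each class of temporal information, and a Temporal Pattern Base is built. Finally, a multi-strategy method for temporal analysis and English translation of Chinese sentences is proposed, combining a temporal analysis algorithm for Chinese simple sentences, a temporal analysis algorithm for sentences marked by connectives, and a temporal analysis algorithm for subjunctive-like sentences with rules for recognizing discourse information. This gives the machine translation system based on Semantic Language a feasible method of tense conversion.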
To date, no fully automatic, high-quality machine translation system has been developed. The difficulty lies mainly in the complexity of language knowledge and in our limited understanding of the rules of human language. Research on machine translation is no longer confined to the semantic and syntactic analysis of single sentences; it must also explore the inherent regularities of language and contextual information such as sentence groups, paragraphs, texts, and genres. Multi-language machine translation based on Semantic Language analyzes source-language sentences semantically with reference to the Semantic Element Base and expands the result into target-language sentences to realize translation. The machine translation system is composed of two parts: one is unified multi-language machine translation software; the other is a multi-language Semantic Element Base that is high-quality, expandable, complete, free of discardable entries, free of repetition, and free of abnormal ambiguity. However, limited by the corpora and affected by complex and variable language phenomena, the present Semantic Element Base is not complete (Semantic Elements and their representations are missing or wrong), so it cannot guarantee a correct translation for every sentence. Furthermore, the system still has several difficult problems that need to be solved.
     To improve translation quality and obtain correct translations, a series of studies was carried out on three of these key problems, covering the following three aspects:
     (1) Study on the first key problem in the machine translation system based on Semantic Language: classifiers.
     First, based on the theory of Semantic Elements in unified linguistics, a new unified classification of English and Chinese nominal classifiers is proposed, in which the different Semantic Element representations of a classifier in English and Chinese share the same classifier semantic type. Noun-classifier phrases are then formalized into English and Chinese Semantic Element representations respectively. Next, the nouns that collocate with the formalized classifiers are represented semantically using the lexical definitions in HowNet, and a Formalized Classifier-Noun Collocation Rule Base is constructed. Finally, a Chinese formalized classifier selection method based on the Collocation Rule Base and noun examples is proposed and implemented.
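     The selection step described above can be illustrated with a minimal Python sketch. It assumes a lookup order (stored noun examples first, then a rule keyed on the noun's semantic class, then a general classifier); that order, the names `NOUN_EXAMPLES`, `NOUN_SEMANTIC_CLASS`, `COLLOCATION_RULES`, and `select_classifier`, and all table entries are hypothetical and are not taken from the thesis or from HowNet.

```python
# Hypothetical toy data; not from the thesis or from HowNet.
NOUN_EXAMPLES = {"书": "本", "马": "匹"}  # noun -> classifier observed in an example

NOUN_SEMANTIC_CLASS = {  # noun -> coarse semantic class (stand-in for a HowNet definition)
    "书": "publication",
    "杂志": "publication",
    "马": "animal",
    "狗": "animal",
    "桌子": "furniture",
}

COLLOCATION_RULES = {  # semantic class -> classifier (stand-in for the rule base)
    "publication": "本",
    "animal": "只",
    "furniture": "张",
}

DEFAULT_CLASSIFIER = "个"  # assumed fallback when nothing else applies


def select_classifier(noun: str) -> str:
    """Pick a Chinese classifier for a noun: examples first, then rules, then default."""
    if noun in NOUN_EXAMPLES:
        return NOUN_EXAMPLES[noun]
    semantic_class = NOUN_SEMANTIC_CLASS.get(noun)
    return COLLOCATION_RULES.get(semantic_class, DEFAULT_CLASSIFIER)


if __name__ == "__main__":
    for noun in ["马", "狗", "杂志", "电脑"]:
        print(noun, select_classifier(noun))
```

     In this sketch the example table overrides the class rule, so "马" receives "匹" even though its class rule would give "只"; a noun with no entry at all falls back to "个".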
     (2) Study on the second key problem in the machine translation system based on Semantic Language: semantic disambiguation and Chinese translation of English prepositions.
     First, based on the theory of Semantic Elements and the characteristics of phrases and sentences containing prepositions, the Semantic Pattern Bases of English prepositions (Semantic Element Representation Bases specific to prepositions) are constructed. Prepositions are then translated in two steps: the first obtains the complete English semantic pattern by semantic analysis against the Semantic Pattern Base; the second expands it into the complete Chinese semantic pattern and the Chinese translation. Finally, three preposition-specific methods are proposed for the semantic analysis step. The first is based on Link Grammar and semantic patterns: the output of the Link Grammar Parser is matched against the correspondence between preposition-bearing phrases and sentences and the connectors to obtain the grammatical structure. The second is based on semantic pattern decomposition: the four basic forms of phrases and sentences containing prepositions are decomposed into simple semantic patterns. The third is based on semantic pattern extension: verbs, nouns, and adjectives are extended layer by layer to the left and right of the preposition to obtain the actual extended form. The test results show that the Semantic Pattern Bases of English prepositions and the three semantic analysis methods are effective for the semantic analysis and Chinese translation of English prepositions in the machine translation system based on Semantic Language.
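     The two-step translation process (analysis against a pattern base, then expansion into Chinese) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: patterns are written as regular expressions with named slots, and the names `SEMANTIC_PATTERNS`, `LEXICON`, `analyze`, `expand`, and `translate`, together with the two toy patterns, are hypothetical rather than the representation used in the thesis.

```python
import re

# Hypothetical pattern base: (English pattern with named slots, Chinese template).
SEMANTIC_PATTERNS = [
    (re.compile(r"^(?P<N1>\w+) on the (?P<N2>\w+)$"), "{N2}上的{N1}"),
    (re.compile(r"^(?P<V>\w+) with (?P<N>\w+)$"), "用{N}{V}"),
]

# Hypothetical word-level lexicon used to fill the slots.
LEXICON = {"book": "书", "table": "桌子", "cut": "切", "knife": "刀"}


def analyze(phrase):
    """Step 1: match the phrase against the pattern base; return template and slot fillers."""
    for pattern, template in SEMANTIC_PATTERNS:
        match = pattern.match(phrase)
        if match:
            return template, match.groupdict()
    return None


def expand(template, slots):
    """Step 2: substitute translated slot fillers into the Chinese template."""
    translated = {name: LEXICON.get(word, word) for name, word in slots.items()}
    return template.format(**translated)


def translate(phrase):
    """Run both steps; return None when no pattern matches."""
    result = analyze(phrase)
    if result is None:
        return None
    template, slots = result
    return expand(template, slots)


if __name__ == "__main__":
    print(translate("book on the table"))
    print(translate("cut with knife"))
```

     Keeping the English pattern and its Chinese template in one entry means the preposition is never translated word for word; its rendering ("上的", "用") is fixed by whichever pattern matched.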
     (3) Study on the third key problem in the machine translation system based on Semantic Language: Chinese-English temporal transfer.
     First, based on an analysis of bilingual sentence pairs and a summary of the regularities of Chinese temporal expressions, a new classification of Chinese temporal information is proposed, starting from the reverse mapping from the sixteen English tense and aspect forms to Chinese temporal expressions. This classification effectively avoids the complexity and category overlap of traditional classifications. Next, the concept of the Temporal Pattern is introduced to formalize each type of temporal information, and the Temporal Pattern Base is constructed. Finally, temporal analysis and translation of Chinese sentences in the system are handled by combining temporal analysis algorithms for Chinese simple sentences, conjunction-marked sentences, and subjunctive-like sentences with discourse context rules. This provides a feasible solution for temporal transfer in the machine translation system based on Semantic Language.
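     As a rough illustration of combining sentence-level temporal patterns with a context rule, the Python sketch below maps Chinese temporal markers to an English tense label and falls back to the tense carried over from the preceding context when no marker is found. The marker list, its ordering, the tense labels, and the names `TEMPORAL_PATTERNS` and `analyze_tense` are hypothetical simplifications; the thesis's Temporal Pattern Base and its three sentence-type algorithms are not reproduced here.

```python
# Hypothetical, deliberately crude marker table; earlier entries take priority.
TEMPORAL_PATTERNS = [
    ("正在", "present progressive"),
    ("已经", "present perfect"),
    ("明天", "simple future"),
    ("昨天", "simple past"),
]

DEFAULT_TENSE = "simple present"  # assumed default when nothing else is known


def analyze_tense(sentence, context_tense=None):
    """Return an English tense label for a Chinese sentence.

    Sentence-internal markers win; otherwise the tense inherited from the
    surrounding discourse (context_tense) is used, then the default.
    """
    for marker, tense in TEMPORAL_PATTERNS:
        if marker in sentence:
            return tense
    return context_tense or DEFAULT_TENSE


if __name__ == "__main__":
    print(analyze_tense("他正在看书"))
    print(analyze_tense("我昨天去了北京"))
    print(analyze_tense("他看书", context_tense="simple past"))
```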