Research on Several Key Technologies for Tree-to-String Tibetan Machine Translation

Author: 华却才让
Degree level: Doctoral
Discipline: Computer Software and Theory
Keywords: Tibetan lexical analysis; Tibetan dependency parsing; Tibetan treebank; syntactic translation model; Tibetan dependency tree-to-string machine translation
Degree year: 2014
Advisors: 赵海兴; 刘群
Discipline code: 081202
Degree-granting institution: Shaanxi Normal University (陕西师范大学)
Thesis submission date: 2014-05-01

Abstract

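Statistical machine translation is now the mainstream of machine translation research. It has evolved from word-based models through phrase-based models to syntax-based models, and is moving toward models that use semantic knowledge. Remarkable results have been achieved for languages such as English and Chinese. Research on syntactic translation models for Tibetan, however, is still at an early stage. One reason is that Tibetan information processing started relatively late; another is that the basic key technologies needed for Tibetan syntactic translation have not yet been fully solved.
     A syntactic translation model is a translation model based on syntax trees, built from the syntactic and semantic knowledge those trees contain. Its prerequisites are reasonably mature techniques for lexical analysis, syntactic parsing, and automatic extraction of translation rules from syntax trees. Moreover, dependency tree structure, as a preliminary step toward semantic analysis, helps improve the quality of statistical machine translation. This thesis therefore explores several key technologies for dependency tree-to-string Tibetan machine translation, with the goal of fully implementing a machine translation system whose source side is a Tibetan dependency tree. The main research content and results fall into four parts, as follows:
     1. A Tibetan lexical analysis system comprising word segmentation and part-of-speech (POS) tagging is implemented. With the practicality of Tibetan lexical analysis in mind, a strategy of segmenting first and tagging afterwards is adopted for research and experiments. First, for segmentation, a Tibetan word segmentation method combining a discriminative perceptron model with word-lattice reranking is proposed, together with a rule-based Tibetan syllable segmentation method. A perceptron model with syllable features performs coarse segmentation and generates a word lattice; then, while the shortest path over the lattice is computed, a dictionary is consulted to penalize edge weights, yielding the best segmentation. This accounts for both the local features of the syllables that make up words and the non-local features between words. Second, for POS tagging, a perceptron-based discriminative method for tagging Tibetan text is proposed: Tibetan case-particle continuation features and lexical features are combined to train an online averaged perceptron tagging model, and a beam search decoding algorithm implements the POS tagging module that runs after segmentation. Experiments show fairly satisfactory results, and the system has been applied in the national Tibetan-Chinese machine translation evaluation and in syntactic parsing research.
     2. Based on the characteristics of Tibetan, a dependency annotation specification with 36 classes is first defined. Second, to address problems in building a Tibetan dependency treebank, a novel semi-automatic treebank construction approach is proposed, and a visual semi-automatic construction tool based on a word-pair dependency classification model is implemented. The first Tibetan dependency treebank, TDTreebank V1.1, is built, containing 11,000 sentences. Third, a statistical Tibetan dependency parsing model that incorporates rich features tailored to Tibetan is proposed, and a Tibetan dependency parser based on a single-layer perceptron model is implemented. Experiments show that Tibetan dependency parsing has essentially reached a practically usable level, providing an initial solution to the lack of a dependency annotation specification, treebank, and dependency parser for Tibetan.
     3. A translation rule extraction algorithm for the Tibetan dependency tree-to-string model is implemented. Following the government principle of dependency relations in the dependency tree, the Tibetan dependency tree is decomposed into head-dependent relation (HDR) fragments, ensuring that each HDR fragment contains nodes that overlap with other HDR fragments, so that substitution alone suffices as the basic operation for describing how the dependency tree is generated. Rule extraction proceeds in three steps: tree annotation, identification of acceptable HDR fragments, and rule generation. The source side of a translation rule is a generalized HDR fragment, and the target side is a sequence of variables and target-language words. During generalization, constraints on Tibetan open-class words and closed-class parts of speech are introduced to improve the discriminative power of the rules. In addition, a translation model for Tibetan basic numerals is introduced when head-node translation rules are generated. Experiments show that the POS constraints and basic numeral translation help improve the performance of the dependency tree-to-string model.
     4. A decoding algorithm for Tibetan dependency tree-to-string machine translation is implemented. A bottom-up chart parsing algorithm is chosen. Because acceptable HDR fragments are identified using subtree-consistent spans, the only operation on head-dependent structural units is substitution, and reordering information is already encoded in the translation rules, no separate reordering model is needed, which simplifies decoding. To handle lexicalized rules and the several generalized rule forms, all translation rules are applied with an exact-match strategy, and conditional filtering and cube pruning are used to generate the final translation hypotheses. Experiments on a small Tibetan-Chinese parallel corpus show that the Tibetan dependency tree-to-string model performs fairly well. This system is currently the first Tibetan statistical machine translation system built on a Tibetan syntactic translation model.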
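The lattice step described in item 1 can be illustrated with a minimal sketch. It is not the thesis's implementation: the placeholder syllables, candidate costs, dictionary contents, and penalty value are all invented for illustration, and the perceptron that would score candidate words is replaced by hard-coded numbers. The sketch builds a word lattice over syllable positions, adds a penalty to edges whose word is missing from the dictionary, and recovers the lowest-cost path by dynamic programming.

```python
from typing import List, Set, Tuple

# An edge (start, end, cost) proposes that syllables[start:end] form one word.
# Every edge is assumed to satisfy start < end.
Edge = Tuple[int, int, float]


def best_segmentation(
    syllables: List[str],
    edges: List[Edge],
    dictionary: Set[str],
    oov_penalty: float = 1.0,  # hypothetical value, not taken from the thesis
) -> List[str]:
    """Return the lowest-cost segmentation encoded in the word lattice."""
    n = len(syllables)
    best_cost = [float("inf")] * (n + 1)
    back = [-1] * (n + 1)
    best_cost[0] = 0.0
    # Edges only point forward, so visiting them sorted by start position
    # processes lattice nodes in topological order.
    for start, end, cost in sorted(edges):
        if best_cost[start] == float("inf"):
            continue  # this node cannot be reached from position 0
        word = "".join(syllables[start:end])
        if word not in dictionary:
            cost += oov_penalty  # penalize edges the dictionary does not support
        if best_cost[start] + cost < best_cost[end]:
            best_cost[end] = best_cost[start] + cost
            back[end] = start
    if n > 0 and back[n] == -1:
        raise ValueError("lattice has no complete path")
    # Follow back-pointers from the end of the sentence to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append("".join(syllables[back[pos]:pos]))
        pos = back[pos]
    return words[::-1]


syllables = ["a", "b", "c"]  # placeholder syllables, not real Tibetan
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 2, 1.5), (1, 3, 1.5)]

print(best_segmentation(syllables, edges, {"a", "ab", "c"}))
print(best_segmentation(syllables, edges, {"a", "bc"}))
```

With the same lattice, changing only the dictionary changes which path is cheapest, which is the effect the dictionary penalty is meant to have.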
Statistical machine translation (SMT) has become the dominant approach in machine translation in recent years. Research has moved from word-based models to phrase-based and syntax-based models, and is now evolving toward models that exploit semantic knowledge. SMT has achieved remarkable results for languages such as English and Chinese. Research on syntactic translation models for Tibetan is still at an initial stage, partly because Tibetan information processing started relatively late and partly because the basic key technologies for Tibetan syntactic translation are not yet fully resolved.
     Syntactic translation models are based on syntax trees and are built from the syntactic and semantic knowledge those trees contain. Their prerequisites are reasonably mature techniques for lexical analysis, syntactic analysis, and automatic extraction of translation rules from syntax trees. Dependency structure holds both syntactic and semantic knowledge and is viewed as a transition from syntactic to semantic representation, which helps improve the quality of statistical machine translation. For this reason, this thesis focuses on these key technologies for a Tibetan dependency tree-to-string SMT model, aiming to develop a machine translation system that takes Tibetan dependency trees as its source.
     Specifically, the research content and results of the thesis are summarized in four parts, as follows:
     1. A Tibetan lexical analysis system that includes word segmentation and POS tagging is implemented. Considering the practicality of Tibetan lexical analysis, the thesis adopts a strategy of word segmentation first and POS tagging afterwards. The word segmentation part proposes a Tibetan word segmentation method based on a perceptron model with lattice reranking, and a new rule-based Tibetan syllable segmentation method. A discriminative model with syllable features coarsely segments words and generates a word lattice; the shortest path is then computed with edge weights penalized by dictionary lookup, producing the optimal segmentation. This method captures both local features within words and non-local features between words. The POS tagging part proposes a discriminative, perceptron-based Tibetan POS tagging method. Feature templates are designed from Tibetan lexical features, averaged perceptron weights are trained, and a beam search decoding algorithm assigns POS tags to segmented sentences. Experiments show that the Tibetan lexical analysis system has reached a practical level; it has been applied to the Tibetan-Chinese machine translation evaluation at CWMT and to syntactic analysis.
     2. No practical Tibetan dependency parser, dependency annotation standard, or treebank previously existed. The thesis first defines 36 Tibetan dependency annotation classes. Second, to address the existing problems in building a Tibetan dependency treebank, it proposes a semi-automatic treebank construction method based on Tibetan word dependency classification, comprising a word-pair dependency classification model and a dependency edge labeling model. Semi-automatic syntax tree annotation software is developed with carefully designed feature templates, and the models are trained with maximum entropy. Using this tool, a Tibetan dependency treebank, TDTreebank 1.1, containing 11 thousand sentences is proofread and constructed. Third, an online averaged perceptron training algorithm and a decoding algorithm based on maximum spanning trees are implemented. Experiments show that the Tibetan dependency parser has almost reached a usable level.
     3. A translation rule acquisition algorithm is implemented for the Tibetan dependency tree-to-string model. Following the governing criterion of the dependency tree, the Tibetan dependency tree is decomposed into head-dependent relation (HDR) fragments. Each HDR fragment is guaranteed to contain nodes that overlap with other HDR fragments, so that substitution alone serves as the basic operation for generating a dependency tree-to-string translation. Rule extraction proceeds in three steps: tree labeling, recognition of acceptable HDR fragments, and rule generation. To improve the discriminative power of translation rules, the open and closed POS classes of Tibetan words are used to constrain rules during generalization. For head-node rules, a Tibetan basic numeral translation model is introduced. Experiments show that the POS constraints and basic numeral translation help improve the performance of the dependency tree-to-string model.
     4. A decoding algorithm for Tibetan syntactic translation is implemented, based on bottom-up chart parsing. Because subtree-consistent spans are used as the constraint for identifying acceptable HDR fragments on the bilingual corpus, a reordering model is no longer needed. For lexicalized rules and the various generalized rules, a scheme of exact matching against all rules is chosen, together with cube pruning. Experiments on a small Tibetan-Chinese bilingual corpus show that the Tibetan dependency tree-to-string model performs well. This is the first SMT system to implement a Tibetan syntactic translation model.
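The HDR decomposition described in item 3 can be sketched as follows. This is one illustrative reading of the description, not the thesis's code: the encoding of the tree as a list of head indices, the function name `hdr_fragments`, and the toy tree are assumptions. Each fragment consists of a head and its direct dependents, so an internal non-root node appears both as a dependent in its parent's fragment and as the head of its own fragment. That shared node is the overlap that allows fragments to be joined by substitution.

```python
from typing import Dict, List, Tuple


def hdr_fragments(heads: List[int]) -> List[Tuple[int, List[int]]]:
    """Split a dependency tree into (head, dependents) fragments.

    heads[i] is the index of word i's head, or -1 for the root.
    Leaf words have no dependents and so produce no fragment of their own.
    """
    children: Dict[int, List[int]] = {}
    for node, head in enumerate(heads):
        if head >= 0:
            children.setdefault(head, []).append(node)
    return sorted(children.items())


# Toy tree over five word positions (hypothetical, no real sentence implied):
# word 4 is the root, words 1 and 3 depend on it,
# word 0 depends on word 1, and word 2 depends on word 3.
heads = [1, 4, 3, 4, -1]
fragments = hdr_fragments(heads)

# Nodes that are a head in one fragment and a dependent in another.
fragment_heads = {head for head, _ in fragments}
fragment_deps = {dep for _, deps in fragments for dep in deps}
overlap = fragment_heads & fragment_deps

print(fragments)
print(sorted(overlap))
```

In this toy tree the fragments headed by words 1 and 3 attach to the root fragment at exactly those two nodes, so rebuilding the tree needs no operation other than substituting a fragment at its head node.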

