A Study of Alignment Annotation for a Chinese-English Discourse Structure Parallel Corpus
  • English title: Alignment and Annotation of Chinese-English Discourse Structure Parallel Corpus
  • Author: 冯文贺
  • Author (English): FENG Wenhe; Department of Chinese Language and Literature, Henan Institute of Science and Technology; School of Computer, Wuhan University
  • Keywords: parallel corpus; alignment and annotation; discourse structure
  • English keywords: parallel corpus; alignment; discourse structure
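  • Journal code: MESS
  • Journal (English name): Journal of Chinese Information Processing
  • Affiliation: Department of Chinese Language and Literature, Henan Institute of Science and Technology; School of Computer, Wuhan University
  • Publication date: 2013-11-15
  • Published in: 中文信息学报 (Journal of Chinese Information Processing)
  • Year: 2013
  • Volume: 27
  • Funding: National Natural Science Foundation of China (61273320); Ministry of Education Humanities and Social Sciences Youth Project (13YJC740022); China Postdoctoral Science Foundation (2013M540594); National Social Science Foundation of China (13BYY026)
  • Language: Chinese
  • Record ID: MESS201306023
  • Page count: 8
  • Issue: 06
  • CN: 11-2325/N
  • Pages: 162-168+190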
Abstract
A discourse structure parallel corpus is a corpus in which bilingual texts that are translations of each other are annotated with parallel discourse structure information. Alignment annotation is the core theoretical basis of the Chinese-English discourse structure parallel corpus. This paper proposes an alignment and annotation strategy of "structural alignment, relational alignment" and applies it to segmentation alignment, hierarchical structure alignment, relation alignment, and center alignment. The result is a working mode for parallel corpus construction in which alignment and annotation proceed in parallel, and unit alignment and structural alignment advance together. Supported by a corresponding annotation platform, working procedures, and solutions to the main difficulties, the strategy has been shown to be an efficient way of building a discourse structure parallel corpus.
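The four alignment steps named in the abstract (segment, hierarchical structure, relation, and center) can be pictured with a small data structure. The following Python sketch is only an illustration under stated assumptions: the class names, field names, relation label, and example sentences are all hypothetical, "center" is taken here to mean the more prominent unit of a relation, and nothing below reproduces the paper's actual annotation scheme or platform.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class AlignedUnit:
    """A pair of mutually translated segments (segment alignment)."""
    zh: str
    en: str


@dataclass
class AlignedNode:
    """An internal node shared by both language sides (structure alignment)."""
    relation: str  # one label for both sides (relation alignment)
    children: List[Union["AlignedNode", AlignedUnit]]
    zh_center: int  # index of the central child on the Chinese side
    en_center: int  # index of the central child on the English side


def leaves(node):
    """Return the aligned segment pairs under a node, left to right."""
    if isinstance(node, AlignedUnit):
        return [node]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def center_mismatches(node):
    """Collect relation labels of nodes whose center differs between sides."""
    if isinstance(node, AlignedUnit):
        return []
    found = [node.relation] if node.zh_center != node.en_center else []
    for child in node.children:
        found.extend(center_mismatches(child))
    return found


# Hypothetical example: two clause pairs joined by one causal relation.
tree = AlignedNode(
    relation="cause",
    children=[
        AlignedUnit(zh="因为下雨,", en="Because it rained,"),
        AlignedUnit(zh="比赛取消了。", en="the match was cancelled."),
    ],
    zh_center=1,
    en_center=1,
)
```

Storing one tree for both languages, with a separate center index per side, is one way to let alignment and annotation happen in a single pass; a scheme with two linked trees would be an equally plausible design.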
