Chinese-Oriented Rhetorical Structure Relation Taxonomy and Unambiguous Annotation Method
  • Original title (Chinese): 面向中文的修辞结构关系分类体系及无歧义标注方法
  • Authors: HOU Shengluan (侯圣峦); FEI Chaoqun (费超群); ZHANG Shuhan (张书涵)
  • Affiliations: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Keywords: natural language processing; rhetorical structure theory; rhetorical structure relation; discourse parsing
  • Journal: Journal of Chinese Information Processing (中文信息学报)
  • Journal code: MESS
  • Publication date: 2019-07-15
  • Year: 2019
  • Volume: 33
  • Issue: 07
  • Funding: National Key Research and Development Program of China (2016YFB1000902); National Natural Science Foundation of China (61232015, 61472412, 61621003)
  • Language: Chinese
  • Record ID: MESS201907004
  • Pages: 25-35
  • Page count: 11
  • CN: 11-2325/N
Abstract
Rhetorical Structure Theory (RST) is an important theory of discourse structure whose core is the rhetorical structure relation (RSR). Building on RST and the characteristics of Chinese text, this paper proposes a hierarchical taxonomy of Chinese-oriented RSRs together with multiple definitions of the relations. To address the ambiguity that annotators encounter, it also proposes an unambiguous annotation method. To ease annotation, the authors designed and implemented RSTTagger, an annotation tool with a Java graphical interface. The tool takes as its elementary annotation unit a tuple of keywords drawn from a sentence's subject-predicate structure and annotates bottom-up, level by level, until the text forms a complete rhetorical structure relation tree. To check annotation consistency, 160 Chinese texts from the foreign trade domain were annotated; 50 of them were randomly selected and annotated by different annotators, with an inter-annotator agreement of 76.63%. The annotation framework can be applied to corpora from other domains, and the 160 annotated texts can serve as a base corpus for research on discourse structure theory.
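The bottom-up annotation described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Node` layout, the helper names, the relation labels ("Elaboration", "Cause", "Result"), the sample keywords, and the agreement measure (overlap of labeled spans) are hypothetical and are not taken from RSTTagger or from the paper, which the abstract does not describe at this level of detail.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Node:
    """A node in a rhetorical structure tree (hypothetical layout)."""
    start: int                       # index of the first elementary unit covered
    end: int                         # index of the last elementary unit covered
    relation: Optional[str] = None   # None for leaves
    keywords: Tuple[str, ...] = ()   # subject-predicate keywords (leaves only)
    children: Tuple["Node", ...] = ()


def leaf(index: int, keywords: Tuple[str, ...]) -> Node:
    """Create an elementary unit from a tuple of keywords."""
    return Node(start=index, end=index, keywords=keywords)


def merge(relation: str, left: Node, right: Node) -> Node:
    """One bottom-up step: join two adjacent spans under a relation label."""
    if left.end + 1 != right.start:
        raise ValueError("spans must be adjacent")
    return Node(left.start, right.end, relation, children=(left, right))


def labeled_spans(node: Node) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, relation) for every internal node."""
    if node.relation is None:
        return set()
    spans = {(node.start, node.end, node.relation)}
    for child in node.children:
        spans |= labeled_spans(child)
    return spans


def agreement(a: Node, b: Node) -> float:
    """Share of labeled spans found in both trees (an assumed measure)."""
    sa, sb = labeled_spans(a), labeled_spans(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Three elementary units with made-up keywords.
u0 = leaf(0, ("exports", "rose"))
u1 = leaf(1, ("growth", "reached"))
u2 = leaf(2, ("demand", "recovered"))

# Two annotators agree on the lower level but label the root differently.
tree_a = merge("Cause", merge("Elaboration", u0, u1), u2)
tree_b = merge("Result", merge("Elaboration", u0, u1), u2)

print(agreement(tree_a, tree_b))
```

Comparing labeled spans penalizes both a different bracketing and a different relation label on the same bracket; other agreement measures are equally plausible.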
