A Macro Discourse Primary and Secondary Relation Recognition Method Based on Topic Similarity

Authors: JIANG Feng; CHU Xiaomin; XU Sheng; LI Peifeng; ZHU Qiaoming
Keywords: macro discourse-level primary and secondary relation; topic similarity; word2vec; LDA
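Journal code: MESS
Journal: Journal of Chinese Information Processing
Affiliations: School of Computer Sciences and Technology, Soochow University; Jiangsu Provincial Key Laboratory for Computer Information Processing Technology
Publication date: 2018-01-15
Publisher: Journal of Chinese Information Processing (中文信息学报)
Year: 2018
Volume/issue: v.32
Funding: National Natural Science Foundation of China (61773276, 61472265, 61772354); Jiangsu Provincial Science and Technology Program (BK20151222)
Language: Chinese
Page: MESS201801007
Page count: 8
CN: 01
ISSN: 11-2325/N
Classification code: 47-54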

Abstract

Discourse analysis is an important task in natural language processing. Analyzing the primary and secondary relations in a discourse helps in understanding its structure and semantics and provides strong support for natural language processing applications. Building on prior work on micro discourse-level primary and secondary relation recognition, this paper focuses on macro discourse-level primary and secondary relations and proposes a recognition model based on topic similarity computed with word2vec and LDA. The word2vec-based and LDA-based topic similarities measure semantic similarity along different dimensions and are complementary at the semantic level, which strengthens the model's ability to recognize macro discourse-level primary and secondary relations. On the Macro Chinese Discourse TreeBank (MCDTB), the model achieves an F1-score of 79.9% and an accuracy of 81.82%, improvements of 1.7% and 1.81%, respectively, over the baseline system.
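
The abstract does not give the formulas behind the two topic similarities, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a discourse unit is compared against a document-level topic text, that the word2vec-based score is the cosine similarity of averaged word vectors, and that the LDA-based score is the cosine similarity of topic distributions. The function names, the toy embeddings, and the toy topic distributions are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is empty or all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(tokens, embeddings):
    """Average the embeddings of the tokens that have one (assumed pooling)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return []
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def topic_similarity_features(unit_tokens, topic_tokens, embeddings,
                              unit_topics, doc_topics):
    """Return two complementary similarity features for one discourse unit.

    unit_topics and doc_topics stand in for LDA topic distributions that
    would be inferred by a trained topic model (hypothetical inputs here).
    """
    return {
        "w2v_sim": cosine(mean_vector(unit_tokens, embeddings),
                          mean_vector(topic_tokens, embeddings)),
        "lda_sim": cosine(unit_topics, doc_topics),
    }


# Toy data, invented purely for illustration.
toy_embeddings = {
    "economy": [1.0, 0.0],
    "growth": [0.8, 0.6],
    "weather": [0.0, 1.0],
}
on_topic = topic_similarity_features(
    ["economy", "growth"], ["economy"], toy_embeddings,
    unit_topics=[0.9, 0.1], doc_topics=[0.8, 0.2],
)
off_topic = topic_similarity_features(
    ["weather"], ["economy"], toy_embeddings,
    unit_topics=[0.1, 0.9], doc_topics=[0.8, 0.2],
)
print(on_topic, off_topic)
```

In this sketch the two scores would be passed as features to a downstream classifier. A unit that scores higher on both is closer to the document topic, which is the intuition behind treating it as the primary unit.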

