Semantic Subgraph Predictive Summary Algorithm Based on Weighted AMR Graph

Original title: 基于加权AMR图的语义子图预测摘要算法
Authors: MING Tuosiyu (明拓思宇); CHEN Hongchang (陈鸿昶); HUANG Ruiyang (黄瑞阳); LIU Yang (柳杨)
Keywords: Abstract Meaning Representation (AMR) graph; semantic summary subgraph; semantic information; redundant information; summary evaluation index
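Journal (Chinese abbreviation): JSJC
Journal (English name): Computer Engineering
Institution: National Digital Switching System Engineering and Technological R&D Center
Publication date: 2018-10-15
Publisher: Computer Engineering (计算机工程)
Year: 2018
Issue: v.44; No.493
Funding: National Natural Science Foundation of China (61601513)
Language: Chinese
Pages: JSJC201810046
Page count: 7
CN: 10
ISSN: 31-1289/TP
Classification number: 298-303+308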

Abstract

Most existing text summarization methods stop at mining shallow semantic relations between words and do not make good use of the complete semantic information between words and sentences. To address this, an improved semantic subgraph prediction algorithm for summarization is proposed. The original text is converted into the corresponding Abstract Meaning Representation (AMR) graphs, which are merged into a single overall AMR graph, and redundant information is filtered out using the WordNet semantic dictionary. On this basis, comprehensive statistical features are used to assign weights to the otherwise unweighted AMR graph nodes, the parts with high importance are selected to form the semantic summary subgraph, and the quality of the generated summary is measured jointly with the ROUGE and Smatch metrics. Experimental results show that, compared with baseline text summarization algorithms that mine only shallow semantic relations, the proposed algorithm achieves markedly higher ROUGE and Smatch scores.
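The pipeline outlined in the abstract (merge per-sentence graphs, collapse redundant concepts, weight nodes, keep the heaviest part) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: graphs are plain (source, relation, target) triples, the `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, the weight is a toy mix of frequency and first position, and `triple_f1` is only a simplified overlap score, since real Smatch also searches over variable alignments.

```python
from collections import Counter

# Hypothetical synonym table standing in for a WordNet lookup (assumption).
SYNONYMS = {"automobile": "car", "purchase": "buy"}


def canonical(concept):
    """Map a concept to its representative so redundant nodes collapse."""
    return SYNONYMS.get(concept, concept)


def merge_graphs(sentence_graphs):
    """Merge per-sentence triples into one graph and collect node statistics."""
    edges, freq, first_seen = set(), Counter(), {}
    for idx, triples in enumerate(sentence_graphs):
        for src, rel, dst in triples:
            s, d = canonical(src), canonical(dst)
            edges.add((s, rel, d))
            for node in (s, d):
                freq[node] += 1
                first_seen.setdefault(node, idx)
    return edges, freq, first_seen


def node_weights(freq, first_seen, n_sentences):
    """Toy statistical weight: frequency plus a bonus for early appearance."""
    return {n: freq[n] + (n_sentences - first_seen[n]) / n_sentences
            for n in freq}


def summary_subgraph(edges, weights, k):
    """Keep only the edges whose two endpoints are among the k heaviest nodes."""
    top = set(sorted(weights, key=lambda n: (-weights[n], n))[:k])
    return {e for e in edges if e[0] in top and e[2] in top}


def triple_f1(predicted, gold):
    """Simplified triple-overlap F1 (nodes assumed already aligned by name)."""
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Two toy sentence graphs that express overlapping content.
graphs = [
    [("buy", "ARG0", "boy"), ("buy", "ARG1", "car")],
    [("purchase", "ARG0", "boy"), ("purchase", "ARG1", "automobile"),
     ("automobile", "mod", "red")],
]
edges, freq, first_seen = merge_graphs(graphs)
weights = node_weights(freq, first_seen, len(graphs))
subgraph = summary_subgraph(edges, weights, k=3)
gold = {("buy", "ARG0", "boy"), ("buy", "ARG1", "car")}
score = triple_f1(subgraph, gold)
```

In this toy run the two sentences collapse into three distinct edges, and the low-weight modifier node `red` is dropped from the summary subgraph. A fuller version would replace the synonym table with an actual WordNet similarity test and the overlap score with the Smatch and ROUGE tools named in the abstract.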
