Improved TextRank Double-Layer Single-Document Summary Extraction Algorithm

English title: Improved TextRank Double Layers Single-document Summation Extracting Algorithm
Authors: 何春辉; 李云翔; 王孟然; 王梦贤
Authors (English): HE Chunhui; LI Yunxiang; WANG Mengran; WANG Mengxian
Keywords: TextRank; information extraction; summarization algorithm; cumulative contribution rate
English keywords: TextRank; information extraction; summation algorithm; accumulating contribution rate
Journal title (Chinese): HNCG
Journal title (English): Journal of Hunan City University (Natural Science)
Affiliations: School of Mathematics and Computational Sciences, Xiangtan University; School of Science, Hunan City University; Yinshan School of Changsha County; School of Management, Hunan City University
Publication date: 2017-11-15
Publisher: Journal of Hunan City University (Natural Science Edition)
Year: 2017
Issue: v.26; No.90
Funding: Yiyang Science and Technology Plan Project (2014JZ40)
Language: Chinese
Page: HNCG201706012
Page count: 6
CN: 06
ISSN: 43-1428/TU
Classification number: 58-63

Abstract

This paper proposes a summary-sentence selection algorithm based on the cumulative contribution rate of sentence importance, together with an improved TextRank double-layer single-document summary extraction algorithm. The extraction algorithm has a layered structure: the cumulative-contribution-rate selection algorithm is integrated at the different layers, and the text is segmented into both long sentences and short sentences, with the two segmentation schemes combined to build the summary. The algorithm was evaluated on a manually compiled Chinese single-document summarization dataset, and the results indicate that the extracted summaries are of high quality.
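
The abstract names two building blocks, TextRank sentence scoring and selection by cumulative contribution rate, but this record does not give the paper's formulas. The Python sketch below is one plausible reading and should not be taken as the authors' method. It assumes that importance is a standard weighted TextRank score computed from a sentence similarity matrix, and that sentences are taken in descending order of importance until their share of the total importance reaches a threshold. The function names, the damping factor of 0.85, the iteration count, and the threshold values are all hypothetical. The double-layer structure and the long/short sentence segmentation are not modeled.

```python
from typing import List, Sequence


def textrank_scores(similarity: Sequence[Sequence[float]],
                    damping: float = 0.85,
                    iterations: int = 50) -> List[float]:
    """Score sentences by power iteration over a weighted similarity graph.

    similarity[j][i] is the (assumed) edge weight from sentence j to sentence i.
    """
    n = len(similarity)
    # Total outgoing edge weight of each sentence, ignoring self-loops.
    out_weight = [sum(similarity[j][k] for k in range(n) if k != j)
                  for j in range(n)]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            rank = sum(similarity[j][i] / out_weight[j] * scores[j]
                       for j in range(n) if j != i and out_weight[j] > 0)
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    return scores


def select_by_cumulative_contribution(scores: Sequence[float],
                                      threshold: float = 0.6) -> List[int]:
    """Return indices of the top sentences whose cumulative share of the
    total importance first reaches `threshold`, in original document order."""
    total = sum(scores)
    if total <= 0:
        return []
    # Visit sentences from most to least important.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    cumulative = 0.0
    for i in ranked:
        chosen.append(i)
        cumulative += scores[i]
        if cumulative / total >= threshold:
            break
    # Restore document order so the summary reads naturally.
    return sorted(chosen)


if __name__ == "__main__":
    # Toy graph: sentence 0 is similar to both other sentences.
    sim = [[0.0, 1.0, 1.0],
           [1.0, 0.0, 0.0],
           [1.0, 0.0, 0.0]]
    importance = textrank_scores(sim)
    print(select_by_cumulative_contribution(importance, threshold=0.4))
```

A threshold on the cumulative share lets the summary length adapt to how concentrated the importance scores are, rather than fixing the number of sentences in advance.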

References

[1]LUHN H P.The automatic creation of literature abstracts[J].IBM Journal of Research and Development,1958,2(2):159-165.
[2]WANG Y C.Automatic extraction of words from Chinese textual data[J].Journal of Computer Science and Technology,1987,2(4):287-291.
[3]曹洋.Research on single-document automatic summarization based on the TextRank algorithm[D].Jiangsu:Nanjing University,2016.
[4]MIHALCEA R,TARAU P.TextRank:Bringing order into texts[C].Conference on Empirical Methods in Natural Language Processing,EMNLP 2004,2004:404-411.
[5]PAGE L,BRIN S,MOTWANI R,et al.The PageRank citation ranking:Bringing order to the web[J].Stanford Digital Libraries Working Paper,1998,9(1):1-17.
[6]余珊珊,苏锦钿,李鹏飞.An automatic summary extraction method based on improved TextRank[J].Computer Science,2016,43(6):240-247.
[7]AIZAWA A.An information-theoretic perspective of TF–IDF measures[J].Information Processing & Management,2003,39(1):45-65.
[8]HU B T,CHEN Q C,ZHU F Z.LCSTS:A large scale Chinese short text summarization dataset[J].Computer Science,2015,1967-1972.
[9]EDMUNDSON H P.New methods in automatic extracting[J].Journal of the ACM,1969,16(2):264-285.
[10]秦玉平,邱凤凤,冷强奎.A multi-class text classification algorithm combining "组合凸线器" with Hadamard error-correcting codes[J].Journal of Bohai University (Natural Science Edition),2017,38(1):71-75.
