Automatic digest optimization algorithm based on TextRank


Title: Automatic digest optimization algorithm based on TextRank
Authors: Li Nana (李娜娜); Liu Peiyu (刘培玉); Liu Wenfeng (刘文锋); Liu Weitong (刘伟童)
Affiliations: School of Information Science & Engineering, Shandong Normal University; Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology; School of Computer Science, Heze University
Keywords: abstract extraction; TextRank; structure information; digest candidate sentence group; redundancy processing
Journal code: JSYJ
Journal: Application Research of Computers (计算机应用研究)
Publication date: 2018-03-14 17:30
Publisher: Application Research of Computers (计算机应用研究)
Year: 2019
Issue: v.36; No.330
Funding: National Natural Science Foundation of China (61373148); National Natural Science Foundation of China for Young Scholars (61502151); Shandong Provincial Social Science Planning Project (17CHLJ18, 17CHLJ33, 17CHLJ30); Shandong Provincial Natural Science Foundation (ZR2014FL010); Shandong Provincial Department of Education Fund (J15LN34)
Language: Chinese
Pages: JSYJ201904020
Page count: 6
CN: 04
ISSN: 51-1196/TP
Classification number: 91-96

Abstract

When extracting summaries from Chinese text, the traditional TextRank algorithm considers only the similarity between nodes and ignores other important information in the text. For single Chinese documents, and building on existing research, this paper uses TextRank with sentence similarity and combines it with the overall structural information of the text and the contextual information of sentences. Examples include the physical position of sentences or paragraphs in the document, feature sentences, core sentences, and other sentences whose weight may be raised. These signals together generate the digest candidate sentence group. Redundancy processing is then applied to the candidate group to remove highly similar sentences, which yields the final summary. Experiments show that the algorithm improves the accuracy of the generated summaries, indicating its effectiveness.
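
The pipeline described in the abstract (similarity-based TextRank, a structural weight adjustment, then redundancy removal) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their formulas, so the position bonus (`position_weight`), the Jaccard-based redundancy check, the whitespace-style `tokenize` helper, and all parameter values are assumptions chosen only for illustration. Chinese input would need a word segmenter in place of `tokenize`.

```python
import math
import re


def tokenize(sentence):
    # Hypothetical tokenizer for the demo; Chinese text needs a word segmenter.
    return re.findall(r"\w+", sentence.lower())


def similarity(a, b):
    # Word-overlap similarity in the style of the original TextRank paper.
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    denom = math.log(len(ta)) + math.log(len(tb))
    if denom <= 0:
        return 0.0
    return len(set(ta) & set(tb)) / denom


def jaccard(a, b):
    # Assumed redundancy measure (bounded in [0, 1]); the paper may use another.
    ta, tb = set(tokenize(a)), set(tokenize(b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    # Weighted PageRank over the sentence similarity graph.
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def position_weight(index, total, boost=0.2):
    # Assumed heuristic: the first and last sentences receive a small bonus.
    return 1.0 + boost if index in (0, total - 1) else 1.0


def summarize(sentences, k=2, redundancy_threshold=0.5):
    n = len(sentences)
    base = textrank(sentences)
    # Candidate group: sentences ranked by TextRank score times position weight.
    ranked = sorted(range(n), key=lambda i: base[i] * position_weight(i, n),
                    reverse=True)
    chosen = []
    for i in ranked:
        # Redundancy processing: skip sentences too similar to ones already kept.
        if all(jaccard(sentences[i], sentences[j]) < redundancy_threshold
               for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    # Return the kept sentences in document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns at most two sentences in their original order, with near-duplicates filtered out. Multiplying the graph score by a position weight is only one way to combine structural information with TextRank; the weighting could equally be folded into the edge weights or the restart term.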

