Dynamic summarization modeling method based on Chinese text mining
  • English title: Research dynamic summarization modeling method based on Chinese text mining
  • Authors: 刘美玲 (LIU Meiling); 王慧强 (WANG Huiqiang); 陈广胜 (CHEN Guangsheng); 于洋 (YU Yang)
  • Keywords: dynamic abstract; modeling; text mining; Chinese abstract; evaluation method; sentence weighting; text understanding; corpus testing
  • Journal code: HEBG
  • Journal (English name): Journal of Harbin Engineering University
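  • Institutions: College of Information and Computer Engineering, Northeast Forestry University; College of Computer Science and Technology, Harbin Engineering University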
  • Publication date: 2018-11-02 14:32
  • Publisher: 哈尔滨工程大学学报 (Journal of Harbin Engineering University)
  • Year: 2019
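  • Volume: 40 (cumulative No. 270)
  • Issue: 04
  • Funding: National Natural Science Foundation of China (61702091); Fundamental Research Funds for the Central Universities (2572018BH06)
  • Language: Chinese
  • Article ID: HEBG201904027
  • Pages: 184-190 (7 pages)
  • CN: 23-1390/U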
Abstract
Chinese text summarization is an important research topic in natural language processing, and the temporal characteristics of understanding-based summaries in particular have attracted wide attention. This study extracts the dynamic features of Chinese text summaries and applies text mining and modeling analysis to capture their temporal characteristics. It constructs a framework for a Chinese dynamic multi-document summarization system and designs algorithms for its key steps, namely sentence weighting, feature extraction, and sentence selection; together these implement a dynamic summarization model. It also proposes an evaluation method for Chinese summarization based on dynamic performance. Experiments show that, in practical text mining, the dynamic summarization technique and its evaluation algorithm produce understandable summaries that extend over time, indicating that the approach is reasonably feasible and of considerable research value.
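The abstract names sentence weighting and sentence selection as key steps but gives no formulas, so the sketch below is only a generic illustration of how a time-aware extractive summarizer can combine them, not the authors' model. It assumes term-frequency sentence weights multiplied by an exponential time decay (the `timestamp` field and `half_life` parameter are hypothetical), a character-level tokenizer standing in for a real Chinese word segmenter, and greedy selection that skips any sentence whose Jaccard overlap with an already-chosen sentence exceeds a hypothetical `max_sim` threshold.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    timestamp: float  # hypothetical: time of the source document


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: one token per Chinese character, Latin words and
    # numbers kept whole. A real system would use a Chinese word segmenter.
    return re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]", text)


def weight_sentences(sentences: list[Sentence], now: float, half_life: float) -> list[float]:
    # Term frequencies over the whole (multi-document) sentence set.
    tf = Counter(tok for s in sentences for tok in tokenize(s.text))
    weights = []
    for s in sentences:
        toks = tokenize(s.text)
        # Content score: average corpus frequency of the sentence's tokens.
        content = sum(tf[t] for t in toks) / len(toks) if toks else 0.0
        # Assumed time decay: weight halves every `half_life` time units.
        decay = 0.5 ** ((now - s.timestamp) / half_life)
        weights.append(content * decay)
    return weights


def jaccard(a: str, b: str) -> float:
    # Jaccard similarity of the two sentences' token sets.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select(sentences: list[Sentence], weights: list[float], k: int, max_sim: float = 0.5) -> list[str]:
    # Greedy selection by descending weight, skipping redundant sentences.
    order = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen: list[int] = []
    for i in order:
        if len(chosen) == k:
            break
        if all(jaccard(sentences[i].text, sentences[j].text) <= max_sim for j in chosen):
            chosen.append(i)
    # Output the summary in chronological order.
    chosen.sort(key=lambda i: sentences[i].timestamp)
    return [sentences[i].text for i in chosen]
```

For example, with the sentences "flood hits city" (time 0), "flood hits city again" (time 9) and "rescue teams arrive" (time 10), `now=10` and `half_life=5`, the oldest sentence is both decayed and redundant with the newer flood report, so `select(..., k=3)` returns only the two recent sentences. The decay keeps the summary current as new documents arrive, while the overlap check keeps it from repeating an earlier report.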