A Document Keyword Extraction Method Based on User Behavior
  • English title: Research on document key words extraction based on user behavior
  • Authors: 王燊 (WANG Shen); 施运梅 (SHI Yunmei)
  • Affiliations: Computer School, Beijing Information Science & Technology University; Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University
  • Keywords: extraction; structure entropy weight method
  • Chinese journal title: BJGY
  • English journal title: Journal of Beijing Information Science & Technology University
  • Publication date: 2018-10-15
  • Publisher: Journal of Beijing Information Science & Technology University (Natural Science Edition)
  • Year: 2018
  • Issue: v.33; No.125
  • Funding: National Key Research and Development Program of China (2018YFB1004100); Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University (ICDDXN006)
  • Language: Chinese
  • Page: BJGY201805009
  • Page count: 5
  • CN: 05
  • ISSN: 11-5866/N
  • Classification number: 48-52
Abstract
Keyword extraction often overlooks words that occur infrequently in a document but are critical to its meaning. To address this problem, this paper proposes a document keyword extraction method based on user behavior. The structure entropy weight method is used to model users' behavior toward a document. During extraction, both the influence of user behavior and the position of candidate keywords in the document are taken into account. Experimental results show that the keywords extracted by the proposed method are more accurate.