Abstract
Keyword extraction methods tend to overlook words that occur infrequently in a document but are nonetheless key to its meaning. To address this problem, this paper proposes a document keyword extraction method based on user behavior. The structure entropy weight method is used to model the behavior of users on a document, and keyword extraction then takes into account both the influence of this user behavior and the position of each keyword in the document. Experimental results show that the keywords extracted by the proposed method have higher accuracy.
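The abstract gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the paper's implementation. It uses a common formulation of the structure entropy weight method: several experts rank the indicators, each rank I is mapped to a membership degree μ(I) = ln(m - I) / ln(m - 1) with m = largest rank + 2, the average membership is discounted by a "cognitive blindness" term measuring disagreement, and the results are normalized into weights. Here those weights are assumed to apply to user-behavior types. The behavior names (`dwell`, `highlight`, `annotate`, `copy`), the function names, and the multiplicative rule in `keyword_score` (TF-IDF × position weight × behavior factor) are hypothetical illustrations.

```python
import math


def rank_membership(rank: int, m: int) -> float:
    # mu(I) = ln(m - I) / ln(m - 1): rank 1 (most important) maps to 1.0,
    # larger (less important) ranks map to smaller membership degrees.
    return math.log(m - rank) / math.log(m - 1)


def structure_entropy_weights(rankings: list[list[int]]) -> list[float]:
    """rankings[i][j]: ordinal rank expert i gives indicator j (1 = most important)."""
    k, n = len(rankings), len(rankings[0])
    m = max(max(row) for row in rankings) + 2  # m = largest rank + 2
    b = [[rank_membership(r, m) for r in row] for row in rankings]
    x = []
    for j in range(n):
        col = [b[i][j] for i in range(k)]
        mean = sum(col) / k  # average recognition degree
        blind = abs((max(col) - mean) + (min(col) - mean)) / 2  # cognitive blindness
        x.append(mean * (1 - blind))  # overall recognition degree
    total = sum(x)
    return [v / total for v in x]  # normalize so the weights sum to 1


# Hypothetical behavior types; the abstract does not list the ones actually used.
BEHAVIORS = ["dwell", "highlight", "annotate", "copy"]


def keyword_score(tfidf: float, position_weight: float,
                  signals: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical scoring rule: TF-IDF x position weight x (1 + weighted behavior signal).

    signals[b] is assumed to lie in [0, 1], e.g. the share of behavior b's
    events that involved this word.
    """
    behavior = sum(weights[b] * signals.get(b, 0.0) for b in weights)
    return tfidf * position_weight * (1.0 + behavior)


# Three hypothetical experts rank the four behavior types.
rankings = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
w = dict(zip(BEHAVIORS, structure_entropy_weights(rankings)))

# A rare word that readers highlighted and annotated can outscore a more
# frequent word that no user interacted with.
rare = keyword_score(0.02, 1.5, {"highlight": 0.6, "annotate": 0.5}, w)
common = keyword_score(0.03, 1.0, {}, w)
```

The multiplicative form is one simple way to let user behavior lift low-frequency words without discarding the frequency and position evidence; an additive or learned combination would be equally plausible given only the abstract.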