基于R-list的Top-K高效用项集挖掘算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:A top-k high utility itemset mining algorithm based on R-list
  • 作者:何登平 ; 何宗浩
  • 英文作者:HE Deng-ping;HE Zong-hao;School of Telecommunications and Information Engineering,Chongqing University of Posts and Telecommunications;Research Center of New Telecommunication Technology Applications,Chongqing University of Posts and Telecommunications;Chongqing Information Technology Designing Company Limited;
  • 关键词:高效用项集 ; 一阶段挖掘 ; 重用链表 ; 数据挖掘 ; Top-K
  • 英文关键词:high utility item set;;one-phase mining;;R-list;;data mining;;top-K
  • 中文刊名:JSJK
  • 英文刊名:Computer Engineering & Science
  • 机构:重庆邮电大学通信与信息工程学院;重庆邮电大学通信新技术应用研究中心;重庆信科设计有限公司;
  • 出版日期:2019-07-15
  • 出版单位:计算机工程与科学
  • 年:2019
  • 期:v.41;No.295
  • 语种:中文;
  • 页:JSJK201907026
  • 页数:7
  • CN:07
  • ISSN:43-1258/TP
  • 分类号:178-184
摘要
针对现有的一阶段Top-K高效用项集挖掘算法挖掘过程中阈值提升慢,迭代时生成大量候选项集造成内存占用过多等问题,提出一种基于重用链表(R-list)的Top-K高效用挖掘算法RHUM。使用一种新的数据结构R-list来存储并快速访问项集信息,无需第2次扫描数据库进行项集挖掘。该算法重用内存以保存候选集信息,结合改进的RSD阈值提升策略对数据进行预处理,期间采用更严格的剪枝参数在递归搜索的过程中同时计算多个项集的效用来缩小搜索空间。在不同类型数据集中的实验结果表明:RHUM算法在内存效率方面均优于其他一阶段算法,且在K值变化时能保持稳定。
        Aiming at the problem that the existing one-phase top-k high utility itemset mining algorithm is slow to raise the threshold and generates a large number of candidate sets, thus occupying too large memory space during the iteration, we propose a top-k high utility mining algorithm RHUM based on reused list(R-list). This algorithm uses a new data structure called R-list to store and quickly access itemset information without having to scan the database a second time for mining. It reuses the memory to save the information of candidate sets, and preprocesses data jointly with the improved RSD threshold increment strategy. During the recursive search process, stricter pruning parameters are used to calculate the effect of multiple item sets simultaneously to narrow the search space. Experimental results on different types of data sets show that the RHUM is superior to other one-phase algorithms in memory efficiency and stable under the change of K value.
引文
[1] Cheung Y L,Fu A W.Mining frequent itemsets without support threshold:With and without item constraints[J].IEEE Transactions on Knowledge & Data Engineering,2004,16(9):1052-1069.
    [2] Dam T L,Li K,Fournier-Viger P,et al.An efficient algorithm for mining top-rank-k frequent patterns[J].Applied Intelligence,2016,45(1):96-111.
    [3] Le T,Vo B.An N-list-based algorithm for mining frequent closed patterns[J].Expert Systems with Applications,2015,42(19):6648-6657.
    [4] Duong Q H,Liao B,Fournier-Viger P,et al.An efficient algorithm for mining the top-k,high utility itemsets,using novel threshold raising and pruning strategies[J].Knowledge-Based Systems,2016,104(C):106-122.
    [5] Fournier-Viger P,Wu C W,Zida S,et al.FHM:Faster high-utility itemset mining using estimated utility co-occurrence pruning[C]//Proc of the 21st International Symposium on Methodologies for Intelligent Systems(ISMIS2014),2014:83-92.
    [6] Tseng V S,Wu C W,Shie B E,et al.UP-Growth:An efficient algorithm for high utility itemset mining[C]//Proc of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2010:253-262.
    [7] Wu C W,Shie B E,Tseng V S,et al.Mining top-K high utility itemsets[C]//Proc of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2012:78-86.
    [8] Ryang H,Yun U.Top-k high utility pattern mining with effective threshold raising strategies[J].Knowledge-Based Systems,2015,76(1):109-126.
    [9] Tseng V S,Wu C W,Fournier-Viger P,et al.Efficient algorithms for mining Top-K high utility itemsets[J].IEEE Transactions on Knowledge & Data Engineering,2015,28(1):54-67.
    [10] Tseng V S,Shie B E,Wu C W,et al.Efficient algorithms for mining high utility itemsets from transactional databases[J].IEEE Transactions on Knowledge & Data Engineering,2013,25(8):1772-1786.
    [11] Lin Shu-kuan,Wang Xiao-cong,Qiao Jian-zhong,et al.A Top-k high utility item set mining method based on the index utility[J].Journal of Northeastern University(Natural Science),2016,37(1):24-28.(in Chinese)
    [12] Fournier-Viger P,Zida S,Lin C W,et al.Efficient closed high-utility itemset mining[C]//Proc of the 31st Annual ACM Symposium on Applied Computing,2016:898-900.
    [11] 林树宽,王晓丛,乔建忠,等.基于索引效用的Top-k高效用项集挖掘方法[J].东北大学学报(自然科学版),2016,37(1):24-28.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700