一种基于Top-K查询的加权频繁项集挖掘算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:A Frequent Itemset Mining Algorithm for Uncertain Data Based on Top-K Queries
  • 作者:赵学健 ; 熊肖肖 ; 张欣慧 ; 孙知信
  • 英文作者:ZHAO Xue-jian;XIONG Xiao-xiao;ZHANG Xin-hui;SUN Zhi-xin;School of Modern Post,Nanjing University of Posts and Telecommunications;School of Internet of Things,Nanjing University of Posts and Telecommunications;
  • 关键词:Top-K ; 加权频繁项集 ; 向下闭包特性 ; 不确定数据 ; 数据挖掘
  • 英文关键词:Top-K;;weighted frequent itemset;;downward closure property;;uncertain data;;data mining
  • 中文刊名:WJFZ
  • 英文刊名:Computer Technology and Development
  • 机构:南京邮电大学现代邮政学院;南京邮电大学物联网学院;
  • 出版日期:2019-03-21 11:08
  • 出版单位:计算机技术与发展
  • 年:2019
  • 期:v.29;No.267
  • 基金:国家自然科学基金(61373135,61672299);; 国家自然青年科学基金(61702281,20140883);; 江苏省基础研究计划(自然科学基金)(BK20140883,BK20140894,BK20150869)
  • 语种:中文;
  • 页:WJFZ201907010
  • 页数:6
  • CN:07
  • ISSN:61-1450/TP
  • 分类号:55-60
摘要
数据挖掘技术在各行各业的决策支持活动中扮演着越来越重要的角色,频繁项集挖掘作为数据挖掘最活跃的研究领域之一,具有广泛的应用。近年来,随着信息采集技术和数据处理技术的快速发展,针对不确定数据的频繁项集挖掘引起广泛的关注。然而,面向不确定数据集的加权频繁项集挖掘,由于项目权重值的引入使得加权频繁项集不再满足向下闭包特性,无法对频繁项集的搜索空间进行压缩,时间效率较低。因此,文中提出一种基于Top-K查询的不确定数据加权频繁项集挖掘算法(top-k frequent itemset mining,TK-FIM),以减少候选加权频繁项集的数量,缩小加权频繁项集的搜索空间,提高搜索效率。最后,在真实数据集和合成数据集上的实验结果表明,TK-FIM算法具有良好的时间性能。
        Data mining plays a increasingly important role in the decision-making support activities of all walks of life. Frequent itemset mining,as one of the most active research field of data mining,has widely prospect in application. In recent years,with the rapid development of information collection technology and data processing technology,the technology of frequent itemset mining for uncertain data has attracted much attention. However,in the process of weighted frequent itemset mining for uncertain data,the introduction of weight makes the weighted frequent itemsets not satisfy the downward closure property any longer. Thus,the searching space of frequent itemsets cannot be reduced according to downward closure property which will result to a low efficiency. In this paper,the TK-FIM(top-k frequent itemset mining) is proposed to narrow the searching space of weighted frequent itemsets and improve the searching efficiency. Finally,the experiment on both synthetic and real-life datasets shows that the TK-FIM algorithm has a excellent time efficiency.
引文
[1] ISHITA R,RATHOD A.Frequent itemset mining in data mining:a survey[J].International Journal of Computer Applications,2016,139(9):15-18.
    [2] 汪金苗,张龙波,邓齐志,等.不确定数据频繁项集挖掘方法综述[J].计算机工程与应用,2011,47(20):121-125.
    [3] AGRAWAL R,SRIKANT R.Fast algorithms for mining association rules[C]//Proceedings of the 20th international conference on very large data bases.San Francisco,CA,USA:[s.n.],1994:487-499.
    [4] HAN Jiawei,PEI Jian,YIN Yiwen,et al.Mining frequent patterns without candidate generation:a frequent-pattern tree approach[J].Data Mining & Knowledge Discovery,2004,8(1):53-87.
    [5] CHUI C K,KAO B,HUNG E.Mining frequent itemsets from uncertain data[C]//Pacific-Asia conference on knowledge discovery and data mining.Berlin:Springer,2007:47-58.
    [6] WANG Liang,CHEUNG W L,CHENG R,et al.Efficient mining of frequent item sets on large uncertain databases[J].IEEE Transactions on Knowledge & Data Engineering,2012,24(12):2170-2183.
    [7] SUN X,LIM L,WANG S.An approximation algorithm of mining frequent itemsets from uncertain dataset[J].International Journal of Advancements in Computing Technology,2012,4(3):42-49.
    [8] DJENOURI Y,COMUZZI M.Combining apriori heuristic and bio-inspired algorithms for solving the frequent itemsets mining problem[J].Information Sciences,2017,420:1-15.
    [9] LEUNG K S,CARMICHAEL C L,HAO B.Efficient mining of frequent patterns from uncertain data[C]//IEEE international conference on data mining workshops.Omaha,NE,USA:IEEE,2007:489-494.
    [10] AGGARWAL C C,LI Yan,WANG Jianyong,et al.Frequent pattern mining with uncertain data[C]//ACM SIGKDD international conference on knowledge discovery and data mining.Paris,France:ACM,2009:29-38.
    [11] LIN C W,HONG T P.A new mining approach for uncertain databases using CUFP trees[J].Expert Systems with Applications,2012,39(4):4084-4093.
    [12] LEUNG K S,TANBEER S K.Fast tree-based mining of frequent itemsets from uncertain data[C]//International conference on database systems for advanced applications.Berlin:Springer,2012:272-287.
    [13] LEUNG K S,TANBEER S K.PUF-tree:a compact tree structure for frequent pattern mining of uncertain data[C]//PAKDD 2013:advances in knowledge discovery and data mining.Berlin:Springer,2013:13-25.
    [14] LEUNG K S,MACKINNON R K,TANBEER S K.Tightening upper bounds to the expected support for uncertain frequent pattern mining[J].Procedia Computer Science,2014,35:328-337.
    [15] LEE G,YUN U.A new efficient approach for mining uncertain frequent patterns using minimum data structure without false positives[J].Future Generation Computer Systems,2017,68:89-110.
    [16] 刘殷雷,刘玉葆,陈程.不确定性数据流上频繁项集挖掘的有效算法[J].计算机研究与发展,2011,48:379-385.
    [17] ABRAHAM S,JOSEPH S.Rare and frequent weighted itemset optimization using homologous transactions:a rule mining approach[C]//International conference on control communication & computing india.Trivandrum,India:IEEE,2015:600-605.
    [18] BARALIS E,CAGLIERO L,CERQUITELLI T,et al.Frequent weighted itemset mining from gene expression data[C]//IEEE international conference on bioinformatics and bioengineering.Chania,Greece:IEEE,2013:1-4.
    [19] LEE G,YUN U,RYANG H,et al.Erasable itemset mining over incremental databases with weight conditions[J].Engineering Applications of Artificial Intelligence,2016,52:213-234.
    [20] LIN C W,GAN W,FOURNIER-VIGER P,et al.Weighted frequent itemset mining over uncertain databases[J].Applied Intelligence,2016,44(1):232-250.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700