摘要
数据挖掘技术在各行各业的决策支持活动中扮演着越来越重要的角色,频繁项集挖掘作为数据挖掘最活跃的研究领域之一,具有广泛的应用。近年来,随着信息采集技术和数据处理技术的快速发展,针对不确定数据的频繁项集挖掘引起广泛的关注。然而,面向不确定数据集的加权频繁项集挖掘,由于项目权重值的引入使得加权频繁项集不再满足向下闭包特性,无法对频繁项集的搜索空间进行压缩,时间效率较低。因此,文中提出一种基于Top-K查询的不确定数据加权频繁项集挖掘算法(top-k frequent itemset mining,TK-FIM),以减少候选加权频繁项集的数量,缩小加权频繁项集的搜索空间,提高搜索效率。最后,在真实数据集和合成数据集上的实验结果表明,TK-FIM算法具有良好的时间性能。
Data mining plays a increasingly important role in the decision-making support activities of all walks of life. Frequent itemset mining,as one of the most active research field of data mining,has widely prospect in application. In recent years,with the rapid development of information collection technology and data processing technology,the technology of frequent itemset mining for uncertain data has attracted much attention. However,in the process of weighted frequent itemset mining for uncertain data,the introduction of weight makes the weighted frequent itemsets not satisfy the downward closure property any longer. Thus,the searching space of frequent itemsets cannot be reduced according to downward closure property which will result to a low efficiency. In this paper,the TK-FIM(top-k frequent itemset mining) is proposed to narrow the searching space of weighted frequent itemsets and improve the searching efficiency. Finally,the experiment on both synthetic and real-life datasets shows that the TK-FIM algorithm has a excellent time efficiency.
引文
[1] ISHITA R,RATHOD A.Frequent itemset mining in data mining:a survey[J].International Journal of Computer Applications,2016,139(9):15-18.
[2] 汪金苗,张龙波,邓齐志,等.不确定数据频繁项集挖掘方法综述[J].计算机工程与应用,2011,47(20):121-125.
[3] AGRAWAL R,SRIKANT R.Fast algorithms for mining association rules[C]//Proceedings of the 20th international conference on very large data bases.San Francisco,CA,USA:[s.n.],1994:487-499.
[4] HAN Jiawei,PEI Jian,YIN Yiwen,et al.Mining frequent patterns without candidate generation:a frequent-pattern tree approach[J].Data Mining & Knowledge Discovery,2004,8(1):53-87.
[5] CHUI C K,KAO B,HUNG E.Mining frequent itemsets from uncertain data[C]//Pacific-Asia conference on knowledge discovery and data mining.Berlin:Springer,2007:47-58.
[6] WANG Liang,CHEUNG W L,CHENG R,et al.Efficient mining of frequent item sets on large uncertain databases[J].IEEE Transactions on Knowledge & Data Engineering,2012,24(12):2170-2183.
[7] SUN X,LIM L,WANG S.An approximation algorithm of mining frequent itemsets from uncertain dataset[J].International Journal of Advancements in Computing Technology,2012,4(3):42-49.
[8] DJENOURI Y,COMUZZI M.Combining apriori heuristic and bio-inspired algorithms for solving the frequent itemsets mining problem[J].Information Sciences,2017,420:1-15.
[9] LEUNG K S,CARMICHAEL C L,HAO B.Efficient mining of frequent patterns from uncertain data[C]//IEEE international conference on data mining workshops.Omaha,NE,USA:IEEE,2007:489-494.
[10] AGGARWAL C C,LI Yan,WANG Jianyong,et al.Frequent pattern mining with uncertain data[C]//ACM SIGKDD international conference on knowledge discovery and data mining.Paris,France:ACM,2009:29-38.
[11] LIN C W,HONG T P.A new mining approach for uncertain databases using CUFP trees[J].Expert Systems with Applications,2012,39(4):4084-4093.
[12] LEUNG K S,TANBEER S K.Fast tree-based mining of frequent itemsets from uncertain data[C]//International conference on database systems for advanced applications.Berlin:Springer,2012:272-287.
[13] LEUNG K S,TANBEER S K.PUF-tree:a compact tree structure for frequent pattern mining of uncertain data[C]//PAKDD 2013:advances in knowledge discovery and data mining.Berlin:Springer,2013:13-25.
[14] LEUNG K S,MACKINNON R K,TANBEER S K.Tightening upper bounds to the expected support for uncertain frequent pattern mining[J].Procedia Computer Science,2014,35:328-337.
[15] LEE G,YUN U.A new efficient approach for mining uncertain frequent patterns using minimum data structure without false positives[J].Future Generation Computer Systems,2017,68:89-110.
[16] 刘殷雷,刘玉葆,陈程.不确定性数据流上频繁项集挖掘的有效算法[J].计算机研究与发展,2011,48:379-385.
[17] ABRAHAM S,JOSEPH S.Rare and frequent weighted itemset optimization using homologous transactions:a rule mining approach[C]//International conference on control communication & computing india.Trivandrum,India:IEEE,2015:600-605.
[18] BARALIS E,CAGLIERO L,CERQUITELLI T,et al.Frequent weighted itemset mining from gene expression data[C]//IEEE international conference on bioinformatics and bioengineering.Chania,Greece:IEEE,2013:1-4.
[19] LEE G,YUN U,RYANG H,et al.Erasable itemset mining over incremental databases with weight conditions[J].Engineering Applications of Artificial Intelligence,2016,52:213-234.
[20] LIN C W,GAN W,FOURNIER-VIGER P,et al.Weighted frequent itemset mining over uncertain databases[J].Applied Intelligence,2016,44(1):232-250.