Abstract
To address the shortcomings of CBM_Apriori, an Apriori algorithm based on a clustered Boolean matrix, this paper proposes CBM_Eclat, an Eclat algorithm based on a clustered Boolean matrix. The algorithm first applies the K-medoids algorithm to the Boolean matrix to obtain the weights and the clustered Boolean matrix. It then converts the clustered Boolean matrix into Tidsets and applies logical intersection ("AND") operations to them. This reduces both the storage needed for the clustered Boolean matrix and the number of candidate itemsets generated, which improves the execution efficiency of the algorithm. A worked example and the algorithm's execution results both demonstrate the feasibility and effectiveness of CBM_Eclat.
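The Tidset conversion and intersection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the details, so the sketch assumes that each weight is the number of original transactions represented by one row of the clustered matrix, and it replaces the K-medoids step with a plain merge of identical rows. The names cluster_rows, to_tidsets, and eclat are hypothetical.

```python
from collections import Counter


def cluster_rows(matrix):
    """Stand-in for the clustering step (NOT K-medoids): merge identical
    rows and use the number of merged rows as the weight (an assumption)."""
    counts = Counter(tuple(row) for row in matrix)
    rows = list(counts)                      # distinct rows, first-seen order
    weights = [counts[row] for row in rows]
    return rows, weights


def to_tidsets(rows):
    """Convert the Boolean matrix to a vertical layout:
    item (column index) -> set of row ids that contain the item."""
    tidsets = {}
    for tid, row in enumerate(rows):
        for item, bit in enumerate(row):
            if bit:
                tidsets.setdefault(item, set()).add(tid)
    return tidsets


def eclat(tidsets, weights, min_support):
    """Depth-first Eclat: extend a prefix by intersecting Tidsets.
    Support is the sum of the weights of the rows in a Tidset."""
    def support(tids):
        return sum(weights[t] for t in tids)

    frequent = {}

    def recurse(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = support(tids)
            extensions = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids   # the logical "AND" step
                if support(shared) >= min_support:
                    extensions.append((other, shared))
            if extensions:
                recurse(itemset, extensions)

    start = [(item, tids) for item, tids in sorted(tidsets.items())
             if support(tids) >= min_support]
    recurse((), start)
    return frequent


# Five transactions over three items; the first two rows are identical.
matrix = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
rows, weights = cluster_rows(matrix)
result = eclat(to_tidsets(rows), weights, min_support=3)
print(result)
```

In this sketch the two identical rows are stored once with weight 2, so the Tidsets hold four row ids instead of five, and an itemset such as (0, 1) is counted by summing the weights of the rows in the intersection rather than by rescanning the matrix.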