用户名: 密码: 验证码:
一种基于邻接表的最大频繁项集挖掘算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:A Maximal Frequent Itemsets Mining Algorithm Based on Adjacency Table
  • 作者:殷茗 ; 王文杰 ; 张煊宇 ; 姜继娇
  • 英文作者:YIN Ming;WANG Wenjie;ZHANG Xuanyu;JIANG Jijiao;Institute of Software and Microelectronics, Northwestern Polytechnical University;Management School, Northwestern Polytechnical University;
  • 关键词:数据挖掘 ; 频繁项集 ; Apriori ; FP-Growth ; FP-Tree
  • 英文关键词:Data mining;;Frequent itemsets;;Apriori;;FP-Growth;;FP-Tree
  • 中文刊名:DZYX
  • 英文刊名:Journal of Electronics & Information Technology
  • 机构:西北工业大学软件与微电子学院;西北工业大学管理学院;
  • 出版日期:2019-08-13
  • 出版单位:电子与信息学报
  • 年:2019
  • 期:v.41
  • 基金:教育部人文与社会科学基金(16YJA630068,18YJA630043);; 航空科学基金(2016ZG53071);; 陕西省自然科学基础研究计划项目(2018JM7008);; 陕西省社会科学基金(2018S28);; 西北工业大学研究生种子基金(ZZ2018222)~~
  • 语种:中文;
  • 页:DZYX201908031
  • 页数:8
  • CN:08
  • ISSN:11-4494/TN
  • 分类号:236-243
摘要
针对Apriori算法与FP-Growth算法在最大频繁项集挖掘过程中存在的运行低效、内存消耗大、难以适应稠密数据集的处理、影响大数据价值挖掘时效等问题,该文提出一种基于邻接表的最大频繁项集挖掘算法。该算法只需遍历数据库一次,同时用哈希表对邻接表进行辅助存储,减小了遍历的空间规模。理论分析与实验结果表明,该算法时间与空间复杂度较低,提高了最大频繁项集挖掘速率,尤其在处理稠密数据集时具有较好的优越性。
        To solve the problems of Apriori algorithm and FP-Growth algorithm in the process of mining the maximal frequent itemsets, which refer to inefficient operation, high memory consumption, difficulty in adapting to the process of dense datasets, and affecting the time-effectiveness of large data value mining, this paper proposes a maximal frequent itemsets mining algorithm based on adjacency table. The algorithm only needs to traverse the database once and adopts the hash table to store the adjacency table, which reduces the memory consumption. Theoretical analysis and experimental results show that the algorithm has lower time and space complexity and improves the mining rate of maximal frequent itemsets, especially when dealing with dense datasets.
引文
[1]WU Xindong,KUMAR V,QUINLAN J R,et al.Top 10algorithms in data mining[J].Knowledge and Information Systems,2008,14(1):1-37.doi:10.1007/s10115-007-0114-2.
    [2]FASIH H and SHAHRAKI M H N.Incremental mining maximal frequent patterns from univariate uncertain data[J].Knowledge-Based Systems,2018,152:40-50.doi:10.1016/j.knosys.2018.04.001.
    [3]易彤,徐宝文,吴方君.一种基于FP树的挖掘关联规则的增量更新算法[J].计算机学报,2004,27(5):703-710.doi:10.3321/j.issn:0254-4164.2004.05.017.YI Tong,XU Baowen,and WU Fangjun.A FP-tree based incremental updating algorithm for mining association rules[J].Chinese Journal of Computers,2004,27(5):703-710.doi:10.3321/j.issn:0254-4164.2004.05.017.
    [4]陈安龙,唐常杰,陶宏才,等.基于极大团和FP-Tree的挖掘关联规则的改进算法[J].软件学报,2004,15(8):1198-1207.doi:10.13328/j.cnki.jos.2004.08.012.CHEN Anlong,TANG Changjie,TAO Hongcai,et al.An improved algorithm based on maximum clique and FP-Tree for mining association rules[J].Journal of Software,2004,15(8):1198-1207.doi:10.13328/j.cnki.jos.2004.08.012.
    [5]BUI H,VO B,NGUYEN H,et al.A weighted N-list-based method for mining frequent weighted itemsets[J].Expert Systems with Applications,2018,96:388-405.doi:10.1016/j.eswa.2017.10.039.
    [6]AGRAWAL R and SRIKANT R.Fast algorithms for mining association rules in large databases[C].The 20thInternational Conference on Very Large Data Bases,San Francisco,1994:487-499.
    [7]APILETTI D,BARALIS E,CERQUITELLI T,et al.Frequent itemsets mining for big data:A comparative analysis[J].Big Data Research,2017,9:67-83.doi:10.1016/j.bdr.2017.06.006.
    [8]肖波,徐前方,蔺志青,等.可信关联规则及其基于极大团的挖掘算法[J].软件学报,2008,19(10):2597-2610.XIAO Bo,XU Qianfang,LIN Zhiqing,et al.Credible association rule and its mining algorithm based on maximum clique[J].Journal of Software,2008,19(10):2597-2610.
    [9]HAN Jiawei,PEI Jian,and YIN Yiwen.Mining frequent patterns without candidate generation[C].The 2000 ACMSIGMOD International Conference on Management of Data,Dallas,2000:1-12.doi:10.1145/342009.335372.
    [10]KARIM R,COCHEZ M,BEYAN O D,et al.Mining maximal frequent patterns in transactional databases and dynamic data streams:A spark-based approach[J].Information Sciences,2018,432:278-300.doi:10.1016/j.ins.2017.11.064.
    [11]吉根林,杨明,宋余庆,等.最大频繁项目集的快速更新[J].计算机学报,2005,28(1):128-135.doi:10.3321/j.issn:0254-4164.2005.01.016.JI Genlin,YANG Ming,SONG Yuqing,et al.Fast updating maximum frequent itemsets[J].Chinese Journal of Computers,2005,28(1):128-135.doi:10.3321/j.issn:0254-4164.2005.01.016.
    [12]宋余庆,朱玉全,孙志挥,等.基于FP-Tree的最大频繁项目集挖掘及更新算法[J].软件学报,2003,14(9):1586-1592.doi:10.13328/j.cnki.jos.2003.09.012.SONG Yuqing,ZHU Yuquan,SUN Zhihui,et al.An algorithm and its updating algorithm based on FP-Tree for mining maximum frequent itemsets[J].Journal of Software,2003,14(9):1586-1592.doi:10.13328/j.cnki.jos.2003.09.012.
    [13]VIJAYARANI S and SHARMILA S.Comparative analysis of association rule mining algorithms[C].2016 International Conference on Inventive Computation Technologies,Coimbatore,2016:1-6.doi:10.1109/INVENTIVE.2016.7830203.
    [14]LIU Li,YU Shuo,WEI Xiang,et al.An improved Aprioribased algorithm for friends recommendation in microblog[J].International Journal of Communication Systems,2018,31(2):e3453.doi:10.1002/dac.3453.
    [15]连志春,伊凤新.一种改进的频繁模式树生长算法[J].应用科技,2008,35(6):47-51.doi:10.3969/j.issn.1009-671X.2008.06.012.LIAM Zhichun and YI Fengxin.An improved frequent pattern tree growth algorithm[J].Applied Science and Technology,2008,35(6):47-51.doi:10.3969/j.issn.1009-671X.2008.06.012.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700