用户名: 密码: 验证码:
滑动窗口中FP-Tree的频繁项集挖掘算法的研究
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Research on Frequent Itemsets Mining Algorithm of FP-tree in Sliding Window
  • 作者:李芬田 ; 王红梅 ; 潘超
  • 英文作者:LI Fen-tian;WANG Hong-mei;PAN Chao;School of Computer Science & Engineering,Changchun University of Technology;
  • 关键词:数据流 ; 滑动窗口 ; 数据挖掘 ; 临界频繁项集
  • 英文关键词:data flow;;sliding window;;data mining;;critical frequent itemsets
  • 中文刊名:XXWX
  • 英文刊名:Journal of Chinese Computer Systems
  • 机构:长春工业大学计算机科学与工程学院;
  • 出版日期:2019-01-15
  • 出版单位:小型微型计算机系统
  • 年:2019
  • 期:v.40
  • 基金:吉林省科技厅发展计划项目(20160415013JH)资助;吉林省科技厅重大科技招标专项(20160203010GX)资助
  • 语种:中文;
  • 页:XXWX201901010
  • 页数:5
  • CN:01
  • ISSN:21-1106/TP
  • 分类号:47-51
摘要
当有大量的事务插入或者删除时,针对p Win算法在窗口滑动阶段反复访问前缀树进行事务的更新; DSFPM算法中DSFPM-Tree中大量的父子之间存在不频繁的关系,因此建立的DSFPM-Tree比较高,特别是在窗口滑动的时候,需要频繁更新DSFPM-Tree带来很大的时间开销等缺点,提出滑动窗口中FP-Tree的频繁项集挖掘算法.算法将数据流分成大小相等的模块来进行挖掘,每个模块均采用上三角矩阵存储,并且设计了一种概要结构NCFP-Tree来存储每个基本窗口中的临界频繁项集,窗口每次滑动一个基本窗口,利用优化的频繁项集挖掘算法,分别把各个基本窗口中的临界频繁项集挖掘出来.用C实现了该算法,实验结果证明了该算法比其他两个算法的时间效率更高,查全率和查准率都优于其它两个算法,具有良好的性能.
        When a large number of transactions are inserted or deleted,the p Win algorithm repeatedly accesses the prefix tree for transaction updating during the windowsliding phase. The DSFPMalgorithm has an infrequent relationship between a large number of father-son relationships in the DSFPM-Tree. Therefore,the established DSFPM-Tree is relatively high. In particular,when the windowis sliding,frequent updating of the DSFPM-Tree requires a lot of time overhead,and the FP-Tree frequent itemsets mining algorithm in the sliding windowis proposed. The algorithm divides the data flowinto modules of equal size for mining. Each module adopts the upper triangular matrix storage,and a synoptic structure NCFP-Tree is designed to store the critical frequent itemsets in each basic window. Each time the windowslides a basic window,the use of optimized frequent itemsets mining algorithm,respectively,each of the basic windowin the critical frequent itemsets mining. The algorithm is implemented in C,and the experimental results showthat the algorithm is more efficient than the other two algorithms in time efficiency,the recall rate and precision are better than the other two algorithms,with good performance.
引文
[1] Ao Fu-jiang,Yan Yue-jin,Huang Jian,et al. Design of data stream frequent pattern mining algorithm[J]. Computer Science,2008,35(3):1-5.
    [2] Li Guo-hui,Chen Hui. Frequent pattern mining in arbitrary sliding time windowof data flow[J]. Journal of Software,2008,19(10):2585-2596.
    [3] Tu Li,Chen Ling,Bao Fang. Mining frequent items in a stream based on sliding window[J]. Journal of Chinese Computer Systems,2012,33(5):940-949.
    [4] Tanbeer S K,Ahmed C F,Jeong B S,et al. Sliding window-based frequent pattern mining over data streams[J]. Information Sciences,2009,179(22):3843-3865.
    [5] Lee G,Yun U,Ryang H,et al. Approximate maximal frequent pattern mining with weight conditions and error tolerance[J]. International Journal of Pattern Recognition and Artificial Intelligence,2016,30(6):1650012-1650054.
    [6] Han J,Pei J,Yin Y. Mining frequent patterns without candidate generation[C]. ACMSigmod Record,ACM,2000,29(2):1-12.
    [7] Manku G S,Motwani R. Approximate frequency counts over data streams[C]. Proceedings of the 28th International Conference on Very Large Data Bases,VLDB Endowment,2002:346-357.
    [8] Leung C K S,Khan Q I. DSTree:a tree structure for the mining of frequent sets from data streams[C]. Data Mining,ICDM'06,Sixth International Conference on. IEEE,2006:928-932.
    [9] Giannella C,Han J,Pei J,et al. Mining frequent patterns in data streams at multiple time granularities[C]. Data Mining:Next Generation Challenges and Future Directions,AAAI/MIT,2003:191-212.
    [10] Lee G,Yun U,Ryu K H. Sliding windowbased weighted maximal frequent pattern mining over data streams[J]. Expert Systems with Applications,2014,41(2):694-708.
    [11] Liu Xue-jun,Xu Hong-bing,Dong Yi-sheng,et al. Mining data flowclosed frequent patterns based on sliding window[J]. Journal of Computer Research and Development,2006,43(10):1738-1743.
    [12] Li Guo-hui,Yang Bing,Hu Dun,et al. Frequent patterns of data flowin mining sliding windows[J]. Journal of Chinese Computer System,2008,29(8):1491-1497.
    [13] Deypir M,Sadreddini MH,Tarahomi M. An efficient sliding windowbased algorithm for adaptive frequent itemset mining over data streams[J]. J. Inf. Sci. Eng.,2013,29(5):1001-1020.
    [14] Aggarwal,Charu C,Jiawei Han,et al. Frequent pattern mining[M]. Springer,2014.
    [15] Lee G,Yun U,Ryu K H. Sliding windowbased weighted maximal frequent pattern mining over data streams[J]. Expert Systems with Applications,2014,41(2):694-708.
    [16] Wang Xin,Liu Fang-ai. An improved multi-stream collaborative frequent itemsets mining algorithm[J]. Journal of Computer Applications,2016,36(7):1988-1992.
    [1]敖富江,颜跃进,黄健,等.数据流频繁模式挖掘算法设计[J].计算机科学,2008,35(3):1-5.
    [2]李国徽,陈辉.挖掘数据流任意滑动时间窗口内频繁模式[J].软件学报,2008,19(10):2585-2596.
    [3]屠莉,陈崚,包芳.挖掘滑动窗口中的数据流频繁项算法[J].小型微型计算机系统,2012,33(5):940-949.
    [11]刘学军,徐宏炳,董逸生,等.基于滑动窗口的数据流闭合频繁模式的挖掘[J].计算机研究与发展,2006,43(10):1738-1743.
    [12]李国徽,杨兵,胡忄享,等.挖掘滑动窗口中的数据流频繁模式[J].小型微型计算机系统,2008,29(8):1491-1497.
    [16]王鑫,刘方爱.改进的多数据流协同频繁项集挖掘算法[J].计算机应用,2016,36(7):1988-1992.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700