一种最有趣的序列模式挖掘算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:A Mining Algorithm of the Most Interesting Sequential Patterns
  • 作者:李涛 ; 张帅弛 ; 张灿
  • 英文作者:LI Tao;ZHANG Shuai-chi;ZHANG Can;School of Electronic & Information Engineering,Nanjing University of Information Science and Technology;
  • 关键词:数据挖掘 ; 序列模式 ; 位图 ; 兴趣度
  • 英文关键词:Data mining;;Sequential pattern;;Bitmap;;Interestingness
  • 中文刊名:JSJZ
  • 英文刊名:Computer Simulation
  • 机构:南京信息工程大学电子与信息工程学院;
  • 出版日期:2019-04-15
  • 出版单位:计算机仿真
  • 年:2019
  • 期:v.36
  • 基金:公益性行业(气象)科研专项(GYHY201306070);; 江苏省高等学校大学生创新创业训练计划项目(201610300031)
  • 语种:中文;
  • 页:JSJZ201904042
  • 页数:7
  • CN:04
  • ISSN:11-3724/TP
  • 分类号:205-210+270
摘要
序列模式挖掘作为数据挖掘的重要课题,在许多实际应用中也是一个重要且具有挑战性的任务。传统的序列模式挖掘算法通常以频度作为兴趣模式的标准且缺乏序列模式的优良扩展,挖掘结果质量不高。现提出一种最有趣的序列模式挖掘(Interesting Sequential Patterns Mining,ISPM)算法,定义一种新的序列模式兴趣度度量方法,同时采用分支定界的搜索方式对所有可能的候选序列进行遍历,并利用相关剪枝策略和位图的数据结构提高挖掘效率。通过手语表达序列、网站点击流、购物篮等数据集验证了算法的有效性。
        Sequential pattern mining has been wildly used in the analysis of data in practical application as an important part of data mining. Traditional sequential pattern mining algorithms usually take frequency as the proxy of interesting patterns,and lack of excellent extension of sequential pattern. The quality of the mining results is not high. This paper proposes an efficient algorithm for mining the most interesting sequential patterns(ISPM). A new method of evaluating interesting sequential pattern was defined and a new branch-and-bound algorithm was taken to travel all possible candidate sequential patterns. We used related pruning strategy and data structure of bitmap to improve mining efficiency. Experiments on sign language utterance,web click stream sequences,shopping basket sequences and other sequential datasets confirmed the effectiveness of the proposed algorithm.
引文
[1] R Agrawal,R Srikant.Mining sequential patterns[C].Proceedings of the Eleventh International Conference on Data Engineering,1995:3-14.
    [2] 王虎,丁世飞.序列模式挖掘研究与发展[J].计算机科学,2009,(12):14-17.
    [3] G I Webb.Filtered-top-k association discovery[J].Wiley Interdisciplinary Reviews:Data mining and knowledge discovery.2011,1(3):183-192.
    [4] F Petitjean,T Li,N Tatti and G I Webb.Skopus:Mining Top-k sequential pattern under leverage[J].Data Mining and Knowledge Discovery.2016,(5):1086-1111.
    [5] G I Webb.OPUS:An efficient admissible algorithm for unordered search[J].Journal of Artificial Intelligence Research.1995,3(1):431-465
    [6] J Ayres,et al.Sequential pattern mining using a bitmap representation[J].Proc of Sigkdd.2002,8(4):429-435.
    [7] 線刚.GSP序列模式挖掘算法设计与实现[D].吉林大学,2013.
    [8] 王斌,黄晓芳,袁平.基于PrefixSpan序列模式挖掘的改进算法[J].西南科技大学学报,2016,(4):68-72.
    [9] P Fournier-Viger,et al.Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information[C].Pacific-Asia Conference on Knowledge Discovery and Data Mining Springer International Publishing,2014:40-52.
    [10] P Tzvetkov,X Yan,and J Han.TSP:Mining Top-k Closed Sequential Patterns[J].Knowledge and Information Systems,2005,7(4):438-457.
    [11] P Fournier-Viger,A Gomariz.TKS:Efficient mining of top-k sequential patterns[J].Advanced Data Mining and Applications,Volume 8346 of the series Lecture Notes in Computer Science,2013:109-120.
    [12] H T Lam,et al.Mining Compressing Sequential Patterns[C].SDM.2012:34–52.
    [13] C Low-Kam,C Raissi,M Kaytoue and J Pei.Mining statistically significant sequential patterns[C].In:2013 IEEE 13th International Conference on Data Mining,Dallas,TX,USA,December 7-10,2013:488-497.
    [14] W G I ebb.Self-sufficient itemsets:An approach to screening potentially interesting associations between items[J].Acm Transactions on Knowledge Discovery from Data,2010,4(1):1-20.
    [15] 宋威,乔阳阳.基于加权序列模式的推荐算法研究[J].计算机工程与学,2015,(7):1399-1404.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700