摘要
序列模式挖掘作为数据挖掘的重要课题,在许多实际应用中也是一个重要且具有挑战性的任务。传统的序列模式挖掘算法通常以频度作为兴趣模式的标准且缺乏序列模式的优良扩展,挖掘结果质量不高。现提出一种最有趣的序列模式挖掘(Interesting Sequential Patterns Mining,ISPM)算法,定义一种新的序列模式兴趣度度量方法,同时采用分支定界的搜索方式对所有可能的候选序列进行遍历,并利用相关剪枝策略和位图的数据结构提高挖掘效率。通过手语表达序列、网站点击流、购物篮等数据集验证了算法的有效性。
Sequential pattern mining has been wildly used in the analysis of data in practical application as an important part of data mining. Traditional sequential pattern mining algorithms usually take frequency as the proxy of interesting patterns,and lack of excellent extension of sequential pattern. The quality of the mining results is not high. This paper proposes an efficient algorithm for mining the most interesting sequential patterns(ISPM). A new method of evaluating interesting sequential pattern was defined and a new branch-and-bound algorithm was taken to travel all possible candidate sequential patterns. We used related pruning strategy and data structure of bitmap to improve mining efficiency. Experiments on sign language utterance,web click stream sequences,shopping basket sequences and other sequential datasets confirmed the effectiveness of the proposed algorithm.
引文
[1] R Agrawal,R Srikant.Mining sequential patterns[C].Proceedings of the Eleventh International Conference on Data Engineering,1995:3-14.
[2] 王虎,丁世飞.序列模式挖掘研究与发展[J].计算机科学,2009,(12):14-17.
[3] G I Webb.Filtered-top-k association discovery[J].Wiley Interdisciplinary Reviews:Data mining and knowledge discovery.2011,1(3):183-192.
[4] F Petitjean,T Li,N Tatti and G I Webb.Skopus:Mining Top-k sequential pattern under leverage[J].Data Mining and Knowledge Discovery.2016,(5):1086-1111.
[5] G I Webb.OPUS:An efficient admissible algorithm for unordered search[J].Journal of Artificial Intelligence Research.1995,3(1):431-465
[6] J Ayres,et al.Sequential pattern mining using a bitmap representation[J].Proc of Sigkdd.2002,8(4):429-435.
[7] 線刚.GSP序列模式挖掘算法设计与实现[D].吉林大学,2013.
[8] 王斌,黄晓芳,袁平.基于PrefixSpan序列模式挖掘的改进算法[J].西南科技大学学报,2016,(4):68-72.
[9] P Fournier-Viger,et al.Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information[C].Pacific-Asia Conference on Knowledge Discovery and Data Mining Springer International Publishing,2014:40-52.
[10] P Tzvetkov,X Yan,and J Han.TSP:Mining Top-k Closed Sequential Patterns[J].Knowledge and Information Systems,2005,7(4):438-457.
[11] P Fournier-Viger,A Gomariz.TKS:Efficient mining of top-k sequential patterns[J].Advanced Data Mining and Applications,Volume 8346 of the series Lecture Notes in Computer Science,2013:109-120.
[12] H T Lam,et al.Mining Compressing Sequential Patterns[C].SDM.2012:34–52.
[13] C Low-Kam,C Raissi,M Kaytoue and J Pei.Mining statistically significant sequential patterns[C].In:2013 IEEE 13th International Conference on Data Mining,Dallas,TX,USA,December 7-10,2013:488-497.
[14] W G I ebb.Self-sufficient itemsets:An approach to screening potentially interesting associations between items[J].Acm Transactions on Knowledge Discovery from Data,2010,4(1):1-20.
[15] 宋威,乔阳阳.基于加权序列模式的推荐算法研究[J].计算机工程与学,2015,(7):1399-1404.