Study on Data Mining of a Class of Temporal Association Rules

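English title: Study on Data Mining of a Temporal Association Rules
Author: Li Shaonian (李少年)
Degree level: Master's
Discipline: Computer Application Technology
Keywords (translated from Chinese): data mining; temporal data; temporal data association rules; interestingness; Markov chain
English keywords: Data Mining ; Temporal data ; Temporal association rule ; Interest measurement ; Markov Chain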
Degree year: 2003
Advisor: Meng Zhiqing (孟志青)
Discipline code: 081203
Degree-granting institution: Xiangtan University (湘潭大学)
Submission date: 2003-04-01

Abstract

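As business activity grows more frequent and enterprises become more thoroughly computerized, ever more information accumulates, most of it in the form of temporal data. Temporal data mining has therefore emerged as a new topic in data mining and has attracted great interest. Research on temporal data mining, in China and abroad, began only recently; mining algorithms for temporal data do not yet perform very well, and many problems remain to be solved. This thesis studies these problems.
     First, we introduce the concepts, techniques, and current state of research in data mining, review the background and progress of temporal data mining, and outline the content of this thesis.
     Next, we introduce the basic concepts and properties of temporal types, temporal factors, and temporal granularity, give a mathematical model of temporal association rules, and present several temporal association rules of practical significance.
     We then discuss two algorithms for mining single-event association rules, one within the same temporal factor and one within a periodic temporal factor, and report the corresponding experimental results. For mining temporal association rules between two events, we propose an incremental algorithm and give its experimental results. We also give an association rule mining algorithm based on interestingness, together with experimental results.
     Finally, we discuss the application of Markov chains to temporal data mining.
     The main results of this thesis are: (1) a mathematical model of temporal association rules; and (2) two algorithms for mining single-event temporal association rules, one within the same temporal factor and one within a periodic temporal factor, together with an incremental algorithm for mining two-event temporal association rules and an interestingness-based algorithm for mining temporal association rules.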
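The abstract does not spell out the thesis's mathematical model, so the following is only a minimal sketch of the general idea of a temporal association rule, under stated assumptions: a rule is evaluated only over transactions whose dates fall inside a given time window, and lift stands in for the interestingness measure. The function name `temporal_rule_stats`, the toy data, and the choice of lift are all hypothetical and are not taken from the thesis.

```python
from datetime import date

# Hypothetical toy data: (transaction date, set of items bought).
transactions = [
    (date(2003, 1, 3), {"tea", "sugar"}),
    (date(2003, 1, 4), {"tea", "sugar", "milk"}),
    (date(2003, 1, 5), {"tea"}),
    (date(2003, 2, 1), {"milk", "bread"}),
    (date(2003, 2, 2), {"tea", "bread"}),
]

def temporal_rule_stats(transactions, antecedent, consequent, start, end):
    """Return (support, confidence, lift) of antecedent -> consequent,
    counted only over transactions dated within [start, end].

    Assumption: lift is used as a stand-in interestingness measure;
    the thesis may define interestingness differently.
    """
    # Keep only the item sets that fall inside the time window.
    window = [items for day, items in transactions if start <= day <= end]
    n = len(window)
    if n == 0:
        return None
    n_a = sum(1 for items in window if antecedent <= items)
    n_c = sum(1 for items in window if consequent <= items)
    n_ac = sum(1 for items in window if (antecedent | consequent) <= items)
    if n_a == 0 or n_c == 0:
        return None  # rule is undefined in this window
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift

# The same rule evaluated in January only and over the whole period.
january = temporal_rule_stats(
    transactions, {"tea"}, {"sugar"}, date(2003, 1, 1), date(2003, 1, 31)
)
overall = temporal_rule_stats(
    transactions, {"tea"}, {"sugar"}, date(2003, 1, 1), date(2003, 2, 28)
)
```

Comparing `january` with `overall` shows how the same rule can have different support and confidence depending on the time window, which is the motivation for attaching temporal information to a rule.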
As business interactions grow more frequent and enterprises become more thoroughly computerized, more and more information is being accumulated, most of which exists in the form of temporal data. Temporal data mining (TDM) has therefore become a field of great interest within data mining. The study of temporal data mining is at an early stage, and the performance of current algorithms for temporal data mining is not yet ideal. Many problems remain to be studied, so we discuss TDM in this thesis.
    First, we discuss the background and development of TDM and describe the research content of this thesis.
    Second, we introduce the basic concepts of temporal type, temporal factor, and temporal granularity, propose a mathematical model of temporal association rules, and describe several temporal association rules of practical significance.
    Third, we discuss algorithms for mining temporal association rules of a single event within the same temporal factor and within a cyclic temporal factor, and analyze the experimental results. To mine temporal association rules between two events, we propose a new incremental algorithm and give its experimental results. We also propose a new association rule mining algorithm based on an interest measure and give its experimental results.
    Finally, we discuss applications of Markov chains in TDM.
    The results obtained in this thesis are: (1) a mathematical model of temporal association rules; (2) an algorithm for a single event within the same temporal factor, an algorithm for a cyclic temporal factor, an incremental algorithm for mining temporal association rules between two events, and an algorithm based on an interest measure.
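The abstract mentions Markov chains only in passing and does not say how they are applied. As a hedged illustration of one common use in temporal data, the sketch below estimates first-order transition probabilities from an observed state sequence and picks the most likely next state. The function names, the state labels, and the sample sequence are assumptions for illustration and do not reproduce the thesis's method.

```python
from collections import Counter, defaultdict

def estimate_transitions(states):
    """Estimate first-order Markov transition probabilities
    from a sequence of observed states (maximum-likelihood counts)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    # Normalize each row of counts into probabilities.
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }

def most_likely_next(transitions, state):
    """Return the most probable successor of `state`, or None if unseen."""
    row = transitions.get(state)
    if not row:
        return None
    return max(row, key=row.get)

# Hypothetical sequence of daily trend labels.
sequence = ["up", "up", "down", "up", "up", "up", "down", "down", "up"]
P = estimate_transitions(sequence)
prediction = most_likely_next(P, "down")
```

Here `P["up"]` holds the estimated probabilities of each state following "up", and `prediction` is the state most often observed after "down" in the sample sequence.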

