Algorithms of Maximal Frequent Itemset Mining and Their Applications

Author: Wang Hui (王卉)
Thesis level: Doctoral
Discipline: Computer Software and Theory
Keywords: data mining; parallel algorithm; intrusion detection; maximal frequent itemset; pruning
Degree year: 2004
Supervisor: Li Qinghua (李庆华)
Discipline code: 081202
Degree-granting institution: Huazhong University of Science and Technology (华中科技大学)
Submission date: 2004-04-01
Chair of the defense committee: Wang Zhenyu (王振宇)

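Abstract

Frequent itemset mining is a fundamental and central problem in data mining, with a wide range of applications. Because it is the most time-consuming part of the mining process, the quality of the mining algorithm directly determines the efficiency and applicability of data mining, and of association mining in particular. Research on algorithms for mining maximal frequent itemsets therefore has significant theoretical and practical value.
    Building on an in-depth study of frequent itemset mining algorithms and their parallelization, this dissertation focuses on algorithms for mining maximal frequent itemsets and their applications. It develops efficient serial and parallel algorithms for mining maximal frequent itemsets and applies maximal frequent itemset mining to intrusion detection.
    Frequent itemset mining is a search problem, and pruning is an important means of improving its efficiency. The pruning strategies used by frequent itemset mining algorithms in the literature fall into three categories: root pruning, frequent extending, and non-extending. Based on an analysis of these traditional strategies, a new pruning strategy, multi-level backtrack pruning, is proposed. After a maximal frequent itemset is found, multi-level backtrack pruning can backtrack up to k levels at once, where k is the length of the maximal frequent itemset just found; in the best case it reduces the number of nodes to be expanded from [...] to [...]. Compared with the level-by-level backtracking used by depth-first search in the literature, it prunes the search space substantially and so improves efficiency.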
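    Mining maximal frequent itemsets is an important branch of frequent itemset mining. After an analysis of existing algorithms and their shortcomings, an improved algorithm for mining maximal frequent itemsets, MinMax (Mining Maximal), is proposed. MinMax uses a vertical database representation and searches the itemset space top-down and depth-first, applying multi-level backtrack pruning, root pruning, frequent extending, and non-extending to cut the search space substantially. The concept of the no-support of a frequent item is introduced, and an appropriate ordering of the frequent items lets each pruning technique work to best effect. With the vertical representation, the support of an itemset is computed by simple set intersections, which avoids repeated scans of the database. Experiments and analysis show that MinMax outperforms comparable current algorithms on dense data with long patterns.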
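The support computation over a vertical layout can be illustrated with a minimal sketch. It is not the MinMax implementation: the function names `to_vertical` and `support` and the toy transactions are assumptions made for illustration, and only the idea of intersecting per-item transaction-id sets comes from the abstract.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Map each item to the set of transaction ids (tidset) that contain it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)

def support(itemset, tidsets):
    """Support of a non-empty itemset: size of the intersection of its items' tidsets."""
    item_tidsets = [tidsets.get(item, set()) for item in itemset]
    return len(set.intersection(*item_tidsets))

# Hypothetical toy database of four transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
]
tidsets = to_vertical(transactions)  # built with a single pass over the data

support_ac = support({"a", "c"}, tidsets)        # transactions 0 and 1
support_abc = support({"a", "b", "c"}, tidsets)  # transaction 0 only
```

Once the tidsets are built, no further pass over the transactions is needed; every support query is answered from the tidsets alone.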
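    Parallel processing is an effective way to improve efficiency. After a study of parallelization strategies for maximal frequent itemset mining, MinMax is parallelized for a distributed-memory architecture, yielding the parallel algorithm P-MinMax (Parallel MinMax). To run MinMax asynchronously and reduce dependencies and waiting among processors, P-MinMax partitions the search into equivalence classes by prefix, weights each class by an exponential function of its length, and assigns classes to processors greedily, exploiting complete containment relations among factor itemsets. Database records are partitioned and replicated according to the needs of each class, so that every processor can compute asynchronously. This achieves good load balance, high pruning efficiency, and little replication of database records, and reduces execution time. Analysis and experiments show that P-MinMax scales well and outperforms existing comparable algorithms.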
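The weighting and greedy assignment described above can be sketched as follows. This is an illustration under stated assumptions, not P-MinMax itself: the weight is taken to be 2 to the power of the class length, the greedy rule is "heaviest class first, to the least-loaded processor", and the containment relation among factor itemsets is ignored. The names `prefix_classes` and `greedy_assign` are hypothetical.

```python
import heapq

def prefix_classes(sorted_items):
    """The class for prefix item i holds the items that follow i in the ordering."""
    return {item: sorted_items[pos + 1:] for pos, item in enumerate(sorted_items)}

def greedy_assign(classes, n_procs):
    """Assign the heaviest classes first, each to the currently least-loaded processor."""
    heap = [(0, p) for p in range(n_procs)]  # entries are (load, processor id)
    assignment = {p: [] for p in range(n_procs)}
    # Assumed weight: 2 ** (class length), the number of candidate subsets in the class.
    by_weight = sorted(classes.items(), key=lambda kv: 2 ** len(kv[1]), reverse=True)
    for prefix, members in by_weight:
        load, p = heapq.heappop(heap)
        assignment[p].append(prefix)
        heapq.heappush(heap, (load + 2 ** len(members), p))
    return assignment

# Hypothetical ordering of five frequent items, split over two processors.
classes = prefix_classes(["a", "b", "c", "d", "e"])
assignment = greedy_assign(classes, n_procs=2)
```

With these five items the class weights are 16, 8, 4, 2 and 1, so one processor receives the class of `a` (load 16) and the other receives the remaining four classes (load 15).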
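    From a data-centric viewpoint, intrusion detection is essentially a data analysis problem. The data used for intrusion detection are host and network audit trails, which record all activity on the system and the network. On this basis, an intrusion detection system model built on maximal frequent itemsets, MMID (Mining Maximal for Intrusion Detection), is proposed. Within the model, a new maximal frequent itemset mining algorithm, MinMax_for_IDS, is designed for the characteristics of intrusion detection. Maximal frequent itemsets mined from training data are used to build models of normal system and user behaviour and models of attacks, and a sliding window checks whether frequent patterns occur that the normal-behaviour model does not cover, which indicates an intrusion. Experiments show that MMID detects attack types that occur frequently within a short time with high speed and accuracy.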
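The sliding-window check can be pictured with a toy sketch. It does not reproduce MinMax_for_IDS or MMID: patterns in the window are counted by brute force up to a small length, and the record fields, thresholds, and the names `covered` and `detect` are assumptions chosen for illustration.

```python
from collections import Counter, deque
from itertools import combinations

def covered(itemset, normal_maximal):
    """An itemset is covered if it is a subset of some maximal itemset of the normal model."""
    return any(itemset <= m for m in normal_maximal)

def detect(records, normal_maximal, window_size, min_count, max_len=2):
    """Yield (position, itemset) for itemsets frequent in the window but not covered."""
    window = deque(maxlen=window_size)  # keeps only the most recent records
    for pos, record in enumerate(records):
        window.append(frozenset(record))
        counts = Counter()
        for rec in window:
            for k in range(1, max_len + 1):
                counts.update(frozenset(c) for c in combinations(sorted(rec), k))
        for itemset, count in counts.items():
            if count >= min_count and not covered(itemset, normal_maximal):
                yield pos, itemset

# Hypothetical normal model and audit records.
normal_maximal = [frozenset({"tcp", "http"}), frozenset({"udp", "dns"})]
records = [{"tcp", "http"}, {"tcp", "syn"}, {"tcp", "syn"}, {"udp", "dns"}]
alerts = list(detect(records, normal_maximal, window_size=3, min_count=2))
```

Here the pattern `{tcp, syn}` becomes frequent once the third record enters the window and is not contained in any normal maximal itemset, so it is reported from position 2 onward.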
Progress in digital data acquisition, distribution, retrieval and storage technology has led to the growth of massive databases. One of the greatest challenges that organizations and individuals face is how to convert their rapidly expanding collections of data into accessible and actionable knowledge. Attempts to overcome this hurdle have brought together researchers from different areas, such as statistics, machine learning, and databases, and have given rise to a new research field, so-called data mining.
    Frequent itemset mining plays a crucial role in many data mining applications. It occurs in the discovery of association rules, strong rules, correlations, multidimensional patterns, and many other important discovery tasks, and it dominates the time complexity of the discovery algorithms.
    Most existing work focuses on mining all frequent itemsets. However, because every subset of a frequent itemset is also frequent, the set of maximal frequent itemsets is much smaller than the set of all frequent itemsets. It has been observed that it suffices to mine only the maximal frequent itemsets instead of every frequent itemset.
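The relationship between maximal and all frequent itemsets can be shown with a small sketch. The function names and the toy collection of frequent itemsets are assumptions made for illustration; the filter is a plain quadratic check, not one of the mining algorithms discussed in the dissertation.

```python
from itertools import combinations

def maximal_itemsets(frequent):
    """Keep only the itemsets that have no proper superset in `frequent`."""
    frequent = [frozenset(s) for s in frequent]
    return [s for s in frequent if not any(s < t for t in frequent)]

def all_frequent_from_maximal(maximal):
    """Every non-empty subset of a maximal frequent itemset is itself frequent."""
    result = set()
    for m in maximal:
        for k in range(1, len(m) + 1):
            result.update(frozenset(c) for c in combinations(sorted(m), k))
    return result

# Hypothetical collection of eight frequent itemsets.
frequent = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}, {"d"}]
maximal = maximal_itemsets(frequent)            # two itemsets: {a, b, c} and {d}
recovered = all_frequent_from_maximal(maximal)  # all eight frequent itemsets again
```

Two maximal itemsets are enough to recover all eight frequent itemsets. The maximal itemsets alone do not carry the supports of their subsets, so those would have to be counted separately if they are needed.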
    Recent research has applied data mining techniques to intrusion detection systems in order to discover new types of attacks. This dissertation presents an in-depth analysis of the data mining methods used for this purpose and addresses some open problems.
    The dissertation focuses on maximal frequent itemset mining and its application. It presents a number of new algorithms for mining maximal frequent itemsets more efficiently, with novel pruning strategies and parallelization solutions, and proposes a system model for intrusion detection based on maximal frequent itemset mining that improves the accuracy and performance of an intrusion detection system.
    The research contributes to both the data mining and the intrusion detection fields.
    Frequent itemset mining is a time-consuming task. Pruning strategies are widely used to improve search efficiency, but most of them still explore redundant nodes. The classical pruning strategies are studied thoroughly and grouped into three main classes: root pruning, frequent extending, and non-extending. Based on this analysis, a novel and powerful pruning strategy, the multi-level backtrack strategy, is presented in this dissertation. Whereas previous pruning strategies backtrack one level at a time, the multi-level backtrack strategy can backtrack up to k levels when a maximal frequent itemset of length k has been found. In the best case, the number of nodes that need to be extended is reduced from [...] to [...].
    Maximal frequent itemset mining is an important branch of frequent itemset mining, but it has not received much attention so far. A novel algorithm for mining maximal frequent itemsets, MinMax (Mining Maximal), is presented in this dissertation. MinMax employs a vertical database layout. Together with a depth-first search strategy, it uses several optimization techniques, namely multi-level backtrack pruning, root pruning, frequent extending, and non-extending pruning, to cut down the search space efficiently.
    A new concept, the no-support of a frequent item, is introduced, and the frequent items are ordered so that the search for maximal frequent itemsets prunes as much as possible.


    The vertical database layout makes it possible to count the support of an itemset by simple set intersections instead of scanning the database repeatedly. The experimental results show that MinMax performs better than previous work, especially on databases dense with long patterns.
    Parallel techniques are widely used to improve the efficiency of mining algorithms. A novel and powerful parallel algorithm, P-MinMax (Parallel MinMax), is proposed in this dissertation. Based on its serial version MinMax, the new algorithm decom

