Research on the EARM Association Rule Mining Algorithm and Its Applications
Abstract
This thesis studies association rule mining and the classic Apriori algorithm in depth and, to address the efficiency bottleneck of Apriori-like algorithms, proposes an efficient association rule mining algorithm called EARM (Efficient Association Rule Mining).
     Apriori-like algorithms typically generate a very large number of candidate itemsets, which places a heavy burden on mining efficiency. They also have to rescan the database at every iteration to confirm the candidates, which is a further burden on the system. In addition, they use support as the only measure for generating frequent itemsets, and so do not account for significance factors such as the actual quantities traded or the fact that different goods yield different profits. EARM addresses these shortcomings, improves the efficiency of association rule mining, and is applied in a CRM system. This thesis studies, analyzes, and verifies the EARM algorithm and its application in detail.
     The thesis first introduces association rule mining, its algorithms, and its history; although association rule mining is a relatively young field, it has developed quickly and found rapid application in practice. It then describes the principles and characteristics of Apriori-like algorithms in detail and, on that basis, proposes the efficient EARM algorithm. Next, it analyzes the principles, structure, and implementation of EARM in detail and compares its execution efficiency with that of other Apriori-like algorithms. Finally, it presents an implementation of EARM in a customer relationship management (CRM) system, carrying the theoretical work through to a practical application.
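The two bottlenecks named in the abstract, candidate-itemset growth and one database scan per iteration, can be seen in a minimal sketch of classic level-wise Apriori. This is not the EARM algorithm, whose details are not given here; the function name `apriori`, the absolute threshold `min_count`, and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Classic level-wise frequent-itemset mining (illustrative sketch).

    Returns (frequent, stats): `frequent` maps each frequent itemset to its
    support count; `stats` holds one (level, candidates, frequent) tuple per
    pass over the data.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct item in the database.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, stats, k = {}, [], 1
    while candidates:
        # One full pass over all transactions per level: the repeated-scan cost.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        stats.append((k, len(candidates), len(level)))
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, stats


# Hypothetical toy data: five transactions over items a, b, c.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent, stats = apriori(toy, min_count=3)
print(stats)  # [(1, 3, 3), (2, 3, 3), (3, 1, 0)]
```

Each tuple in `stats` corresponds to one complete scan of the data, so even this five-transaction example needs three scans. Support count is also the only measure used here: purchase quantity and per-item profit play no role, which is the limitation of support-only mining that the abstract points out.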
