Research on a Model of Association Rule Mining Based on the Quantitative Extended Concept Lattice
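Abstract
Knowledge discovery in databases is a highly active research field that spans artificial intelligence, databases, and related disciplines. Data mining extracts interesting, latent, and usable knowledge from data and presents it in a form that users can understand. Association rule mining is an important branch of data mining; it describes the latent relationships among data items (attributes, variables) in a database.
     A concept lattice represents knowledge through the intension and extension of concepts and through the generalization and specialization relationships between them, which makes it well suited to describing the problem of mining rules from databases. Introducing equivalence relations into the intension of a concept lattice and quantifying its extension yields the quantitative extended concept lattice. This thesis centers on association rule mining based on the quantitative extended concept lattice.
     The main original contributions of this thesis are as follows:
     ① The thesis proposes the ideas, algorithms, and performance analysis for association rule mining based on the quantitative extended concept lattice and for interest-weighted association rule mining on that lattice. The interest-weighted method selects the items whose interest weight exceeds an interest-weight threshold, constructs the quantitative extended concept lattice from them, and then interactively mines the association rules of interest.
     Compared with the Apriori algorithm, the two methods produce exactly the same rules, while the lattice-based method has better time performance, represents rules more intuitively, reduces the search space and computation of the algorithm, and improves the efficiency and accuracy of mining.
     ② The thesis improves traditional market basket analysis. Traditional market basket analysis considers only whether a customer bought an item and ignores the quantity bought, which severely limits it in practice. This thesis considers both whether an item was bought and how many units were bought, introduces the idea of interest weighting into traditional market basket analysis, and proposes a method for obtaining the interest-weight threshold. On the basis of this improved market basket analysis, the association rules mined from the quantitative extended concept lattice are closer to practice and have greater application value.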
Knowledge discovery in databases (KDD) is a rapidly emerging research field that draws on artificial intelligence and database systems. Data mining is the process of extracting interesting, potentially useful, valid, and understandable knowledge from data. Association rule mining is an important branch of data mining; it describes the latent relationships among data items (attributes and variables) in databases.
    A concept lattice represents knowledge through the relationships between the intension and extension of concepts and between their generalization and specialization, so it is well suited to describing association rule mining in databases. The Quantitative Extended Concept Lattice (QECL) is derived from the concept lattice by introducing equivalence relations into its intension and quantifying its extension. This thesis centers on a model of association rule mining based on the QECL.
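    To make the notion of a quantified extension concrete, here is a minimal sketch, not the thesis's algorithm. It enumerates the formal concepts of a small, hypothetical transaction context by brute force and stores only the size of each extension next to its intension. The context data and the function names are assumptions made for illustration, and the equivalence relations on the intension are not modeled.

```python
from itertools import combinations

# Hypothetical toy context: transaction id -> set of items (attributes).
context = {
    1: {"a", "b"},
    2: {"a", "b", "c"},
    3: {"a", "c"},
    4: {"b"},
}

def extent(items, context):
    """Transactions that contain every item in `items`."""
    return frozenset(t for t, attrs in context.items() if items <= attrs)

def intent(objects, context):
    """Items shared by every transaction in `objects` (all items if empty)."""
    shared = set().union(*context.values())
    for t in objects:
        shared &= context[t]
    return frozenset(shared)

def quantified_concepts(context):
    """Map each closed intension to the cardinality of its extension."""
    all_items = sorted(set().union(*context.values()))
    concepts = {}
    # Brute force over all item subsets: exponential, fine only for toy data.
    for r in range(len(all_items) + 1):
        for combo in combinations(all_items, r):
            ext = extent(set(combo), context)
            concepts[intent(ext, context)] = len(ext)
    return concepts

concepts = quantified_concepts(context)
for items, count in sorted(concepts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(items), count)
```

    In this toy context the item c never occurs without a, so {c} alone is not a concept; it closes to {a, c} with an extension of size 2. Keeping counts instead of full extensions is what lets support and confidence be read directly off the lattice nodes.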
    The main original contributions of the thesis are as follows:
    (1) The ideas, algorithms, and performance analysis of association rule mining based on the QECL, and of interest-weighted association rule mining on the QECL, are proposed. The interest-weighted method selects the items whose interest weight exceeds the interest-weight threshold, builds the QECL from them, and then interactively mines association rules according to the user's interests.
    Compared with the Apriori algorithm, the two methods yield identical association rules, but interest-weighted mining on the QECL has better time performance, presents rules more concisely and intuitively, reduces the search space and computation of the algorithm, and thereby improves the efficiency and accuracy of association rule mining.
    (2) Traditional market-basket analysis is improved. Because it records only whether a customer bought an item and ignores the quantity bought, it has serious limitations in practice. This thesis considers both whether an item was bought and how many units were bought, introduces the idea of interest weighting into market-basket analysis, and proposes a method for obtaining the interest-weight threshold. As a result, the association rules mined by the interest-weighted method on the QECL are closer to practice and more useful in application.
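    The abstract does not give the interest-weight formula or say how the threshold is derived, so the following is only a sketch under stated assumptions. It takes an item's weight to be its share of all units purchased, keeps the items above a hand-picked threshold, and then scores pairwise rules by support and confidence. The basket data, the weight definition, the threshold value, and all function names are hypothetical. For brevity the rules are counted directly from the baskets rather than read off a lattice.

```python
from itertools import permutations

# Hypothetical baskets: each maps item -> quantity purchased.
baskets = [
    {"milk": 2, "bread": 1},
    {"milk": 1, "bread": 2, "gum": 1},
    {"milk": 3, "bread": 1},
    {"bread": 1, "gum": 1},
]

def interest_weights(baskets):
    """Assumed weight: an item's share of all units purchased."""
    totals = {}
    for basket in baskets:
        for item, qty in basket.items():
            totals[item] = totals.get(item, 0) + qty
    grand_total = sum(totals.values())
    return {item: qty / grand_total for item, qty in totals.items()}

def select_items(baskets, threshold):
    """Keep only items whose assumed weight exceeds the threshold."""
    weights = interest_weights(baskets)
    return {item for item, w in weights.items() if w > threshold}

def mine_rules(baskets, items, min_conf):
    """Pairwise rules x -> y over the kept items: (x, y, support, confidence)."""
    n = len(baskets)
    rules = []
    for x, y in permutations(sorted(items), 2):
        count_x = sum(1 for b in baskets if x in b)
        count_xy = sum(1 for b in baskets if x in b and y in b)
        if count_x and count_xy / count_x >= min_conf:
            rules.append((x, y, count_xy / n, count_xy / count_x))
    return rules

kept = select_items(baskets, threshold=0.2)      # hypothetical threshold
rules = mine_rules(baskets, kept, min_conf=0.8)  # hypothetical confidence bound
print(kept, rules)
```

    Filtering before mining is the design point: items that fall below the threshold (here, gum) never enter rule generation, which shrinks the search space in the way the abstract describes.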
