Research and Implementation of Data Mining in a Bank Decision Support System

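Author: Zhou Sixin
Thesis level: Master's
Discipline: Computer Application Technology
Keywords: decision support system; data warehouse; data mining
Degree year: 2004
Supervisor: Chen Songqiao
Discipline code: 081203
Degree-granting institution: Central South University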

Abstract

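With foreign banks about to enter the Chinese market, raising the quality of decision-making in state-owned banks, and with it their competitiveness, is of great importance. Against this background, we worked with the Hunan Provincial Branch of the Agricultural Bank of China to develop the Agricultural Bank Decision Support System (ABDSS).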
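     The system uses a four-tier architecture based on the B/S (browser/server) model. The interface tier accesses the web server through JSP; the middle tier, an application server, holds the business logic in the form of JavaBeans; and the bottom tier is a logical data warehouse built on Sybase ASE 12.0.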
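A minimal sketch of the tier separation described above, written in Python for illustration rather than in the JSP/JavaBean stack the system actually uses. It assumes an in-memory SQLite database as a stand-in for the Sybase ASE warehouse and a plain function as a stand-in for a JSP page; the names `WarehouseLayer`, `LoanAnalysisBean`, `render_page` and the `loan` table are hypothetical. Each tier calls only the tier directly beneath it.

```python
import sqlite3
from html import escape


class WarehouseLayer:
    """Bottom tier. Assumption: in-memory SQLite stands in for the Sybase ASE warehouse."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        # Hypothetical table and rows, for illustration only.
        self.conn.execute("CREATE TABLE loan (branch TEXT, amount REAL)")
        self.conn.executemany("INSERT INTO loan VALUES (?, ?)",
                              [("Branch A", 120.0), ("Branch A", 30.0),
                               ("Branch B", 50.0)])

    def query(self, sql: str, params: tuple = ()) -> list:
        return self.conn.execute(sql, params).fetchall()


class LoanAnalysisBean:
    """Middle tier: business logic, playing the role a JavaBean plays on the
    application server. It knows nothing about HTML."""

    def __init__(self, warehouse: WarehouseLayer) -> None:
        self.warehouse = warehouse

    def total_by_branch(self) -> dict:
        rows = self.warehouse.query(
            "SELECT branch, SUM(amount) FROM loan GROUP BY branch ORDER BY branch")
        return dict(rows)


def render_page(bean: LoanAnalysisBean) -> str:
    """Web tier: builds the HTML a JSP page would send to the browser.
    It knows nothing about SQL."""
    items = "".join(f"<li>{escape(branch)}: {total:.1f}</li>"
                    for branch, total in bean.total_by_branch().items())
    return f"<ul>{items}</ul>"
```

Keeping SQL inside the bottom tier and HTML inside the web tier lets the business-logic class be reused by other pages unchanged, which is the usual motivation for this kind of layering.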
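     Chapter 1 surveys current research on decision support systems, data warehouses, and data mining. Chapter 2 presents the overall system architecture. Chapter 3 concentrates on the construction of the logical data warehouse. Chapter 4 describes in detail the mining components used by the system. Chapter 5 summarizes my work during research and development. Chapters 3 and 4 are the core of the thesis.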
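     The data warehouse design starts with the conceptual model: we first analyze the system requirements to identify the subjects of the data warehouse, which are modeled with a star schema. The logical model design then refines the conceptual model by fixing the structure of the dimension tables and fact tables for each subject area. Finally, the physical model design determines, for the system's actual hardware environment, the data storage structures, the indexing strategy, and the placement of the data.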
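The three design stages can be illustrated with a toy subject area. The sketch below assumes SQLite in place of Sybase ASE and uses hypothetical table names (`fact_deposit`, `dim_branch`, `dim_date`, `dim_product`) and made-up rows. It shows a star schema (the conceptual and logical design), indexes on the fact table's foreign keys (one possible physical-design choice), and a typical star-join roll-up query.

```python
import sqlite3

# Hypothetical "deposit" subject area modeled as a star schema: one fact
# table keyed by the surrogate keys of its dimension tables.
SCHEMA = """
CREATE TABLE dim_branch  (branch_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, kind TEXT);
CREATE TABLE fact_deposit (
    branch_id  INTEGER REFERENCES dim_branch(branch_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    balance    REAL  -- additive measure
);
-- Physical design: index the fact-table foreign keys used in star joins.
CREATE INDEX idx_fact_branch ON fact_deposit(branch_id);
CREATE INDEX idx_fact_date   ON fact_deposit(date_id);
"""


def build_demo_warehouse() -> sqlite3.Connection:
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_branch VALUES (?, ?)",
                     [(1, "Branch A"), (2, "Branch B")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2003, 12), (2, 2004, 1)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "demand"), (2, "time")])
    conn.executemany("INSERT INTO fact_deposit VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 100.0), (1, 2, 2, 250.0),
                      (2, 1, 1, 80.0), (2, 2, 1, 40.0)])
    return conn


def balance_by_branch_and_year(conn: sqlite3.Connection) -> list:
    """Star join: the fact table joined to two dimensions, rolled up by branch and year."""
    return conn.execute("""
        SELECT b.name, d.year, SUM(f.balance)
        FROM fact_deposit AS f
        JOIN dim_branch AS b ON f.branch_id = b.branch_id
        JOIN dim_date   AS d ON f.date_id   = d.date_id
        GROUP BY b.name, d.year
        ORDER BY b.name, d.year
    """).fetchall()
```

Calling `balance_by_branch_and_year(build_demo_warehouse())` returns one row per branch and year with the summed balance; adding a further dimension to an analysis only requires one more join against the same fact table.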
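     Tailored to the bank's specific business, the system includes three mining components. The classification component uses NS, a decision-tree classification algorithm that improves on SPRINT in both scalability and mining time. The sequential-pattern mining component uses the AprioriSome algorithm, and the clustering component uses the CLIQUE algorithm.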
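The abstract does not describe NS's own modifications, so the sketch below shows only the split-evaluation step of the underlying SPRINT algorithm (Shafer, Agrawal and Mehta, 1996) for a numeric attribute. The pre-sorted attribute list is scanned once while class histograms for the records below and above each candidate threshold are updated, and each threshold is scored with the weighted gini index. The function names and the sample data are hypothetical.

```python
from collections import Counter


def gini(counts: Counter) -> float:
    """Gini index 1 - sum(p_j^2) of a class histogram."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_numeric_split(attribute_list):
    """attribute_list: (value, class_label) pairs for one numeric attribute.
    Returns (threshold, weighted_gini) for the best test 'value <= threshold',
    or (None, inf) if the attribute has fewer than two distinct values."""
    records = sorted(attribute_list, key=lambda r: r[0])  # SPRINT sorts numeric lists once
    n = len(records)
    below = Counter()                               # class histogram of records scanned so far
    above = Counter(label for _, label in records)  # class histogram of records not yet scanned
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label = records[i]
        below[label] += 1
        above[label] -= 1
        if value == records[i + 1][0]:
            continue  # only split between distinct attribute values
        n_below = i + 1
        score = (n_below / n) * gini(below) + ((n - n_below) / n) * gini(above)
        if score < best[1]:
            best = (value, score)
    return best
```

Because the list is sorted once and each threshold costs only a histogram update, the scan is linear in the number of records per attribute, which is what makes SPRINT-style classifiers scale to large training sets.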
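AprioriSome, from Agrawal and Srikant's work on mining sequential patterns, counts candidate sequences level by level but skips some lengths in its forward pass and counts them in a backward pass, reporting only maximal large sequences. Reproducing that scheduling is beyond a short example; the sketch below shows only the pieces it builds on: testing whether a customer's time-ordered transaction sequence contains a pattern, computing a pattern's support, and keeping only maximal patterns. The product names and helper names are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def contains(customer_seq: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """True if pattern <a1 ... an> is contained in customer_seq, i.e. there are
    indices i1 < ... < in with each a_k a subset of customer_seq[i_k]."""
    pos = 0
    for element in pattern:
        # Greedily match each pattern element to the earliest remaining transaction.
        while pos < len(customer_seq) and not element <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def support(db: List[Sequence[Itemset]], pattern: Sequence[Itemset]) -> float:
    """Fraction of customer sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db) / len(db)


def maximal(patterns: List[Sequence[Itemset]]) -> list:
    """Keep only patterns not contained in another pattern of the list."""
    return [p for p in patterns
            if not any(p != q and contains(q, p) for q in patterns)]
```

Greedy earliest matching is sufficient here: matching each element as early as possible leaves the most transactions available for the remaining elements.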
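CLIQUE (Agrawal et al., 1998) partitions every dimension into equal-width intervals, finds dense units bottom-up starting from one-dimensional subspaces with Apriori-style pruning, then joins connected dense units into clusters and produces minimal cluster descriptions. The sketch below covers only the dense-unit search. It assumes coordinates normalized to [0, 1); its parameters `xi` (intervals per dimension) and `tau` (density threshold) correspond to the ξ and τ of the CLIQUE paper. Function names and sample points are hypothetical.

```python
from itertools import combinations


def unit_of(point, dims, xi):
    """Grid cell of `point` in the subspace spanned by `dims`.
    Each dimension is assumed normalized to [0, 1) and cut into xi equal intervals."""
    return tuple((d, min(int(point[d] * xi), xi - 1)) for d in dims)


def dense_units(points, xi, tau):
    """Bottom-up search for dense units, as in CLIQUE's first step.
    A unit is a tuple of (dimension, interval) pairs sorted by dimension; it is
    dense if it holds more than a fraction tau of all points.
    Returns {k: set of dense k-dimensional units}."""
    n, n_dims = len(points), len(points[0])

    def keep_dense(candidates):
        counts = {}
        for p in points:
            for unit in candidates:
                if unit_of(p, [d for d, _ in unit], xi) == unit:
                    counts[unit] = counts.get(unit, 0) + 1
        return {u for u, c in counts.items() if c / n > tau}

    result = {}
    level = keep_dense({((d, i),) for d in range(n_dims) for i in range(xi)})
    k = 1
    while level:
        result[k] = level
        k += 1
        candidates = set()
        for a, b in combinations(sorted(level), 2):
            # Join two (k-1)-dim units that share their first k-2 pairs but end
            # in different dimensions (Apriori-style candidate generation).
            if a[:-1] == b[:-1] and a[-1][0] != b[-1][0]:
                cand = a + (b[-1],)
                # Monotonicity: prune if any (k-1)-dim projection is not dense.
                if all(sub in level for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        level = keep_dense(candidates)
    return result
```

The pruning step relies on the fact that a unit can only be dense if all of its lower-dimensional projections are dense, which is what lets CLIQUE avoid enumerating every subspace of high-dimensional data.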

