Research on the Application of Data-Warehouse-Based Data Mining Methods in Economic Systems (基于数据仓库的数据挖掘方法在经济系统中的应用研究)

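Author: Wang Xiaohong (王晓红)
Thesis level: Master's
Discipline: Quantitative Economics
Chinese keywords: 数据挖掘 (data mining) ; 数据仓库 (data warehouse) ; 超市 (supermarket) ; 股票 (stock) ; 证券市场 (securities market)
English keywords: Data Mining ; Data Warehouse ; retail trade ; stock ; securities business
Degree year: 2004
Advisor: Gao Hongshen (高洪深)
Discipline code: 020209
Degree-granting institution: North China University of Technology (北方工业大学)
Thesis submission date: 2004-04-20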

Abstract

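With the rapid development of modern information, communication, and computer technology, the scope, depth, and scale of database applications keep expanding. As a result, enterprises, research institutions, and government departments alike, and economic systems in particular (for example, retailing and the securities market), have accumulated large volumes of data over many years. People now face rapidly growing data, and how to put this rich store of data to effective use has become a focus of attention for many information workers. Compared with increasingly mature data management technology and software tools, the data analysis tools people rely on cannot effectively supply decision makers with the knowledge their decisions require, which has produced the peculiar phenomenon of "rich data, poor knowledge". New and effective means of analyzing these data are urgently needed, and data mining methods arose and developed rapidly to meet that need.
    This thesis focuses on the application of data-warehouse-based data mining methods in economic systems, mining data mainly from retailing and the securities market, two data-rich and reasonably representative domains. It examines two algorithms (association rules and decision trees). Data are first collected and a database is built; useful data are then extracted from the database, integrated, and loaded into a data warehouse; on that basis, empirical analysis is carried out on these data to discover regularities.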
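    The work of this thesis is reflected mainly in the following aspects:
    1. The thesis first analyzes the state of data mining research in China and abroad and summarizes overseas applications of data mining in areas such as the Internet/Web and e-commerce. It points out that research in China is still at an early stage: most work concentrates on the design of individual algorithms; some groups have begun software development but remain at the preliminary stage of migrating business data and building data warehouses; comprehensive system integration designs are very rare; and, because core technology is lacking, data mining has not yet been applied in many industries.
    2. The thesis further analyzes the origin, definition, and process of data mining, and discusses data mining methods in depth with reference to the decision tree and association rule methods used here.
    3. The thesis designs and constructs an integrated development environment for data mining, namely a securities market data warehouse (DW). It proposes a data warehouse solution for the securities market and describes the components, information sources, functional design, modeling, and key technologies of the data warehouse system. This is one of the innovations of the thesis.
    4. Building on earlier research, the thesis goes beyond the theoretical study of individual algorithms and applies data mining algorithms concretely to the securities market. It mainly examines association rules. Market quotation and trading data show that stock movements are related to price: for a period low-priced stocks rise, for a period mid-priced stocks rise, and for a period high-priced stocks rise, which indicates a relationship between share price and rise or fall, namely a quantitative association rule. The thesis uncovers the relationship between share price and rise or fall and extracts a rule with optimal confidence. This is another innovation of the thesis.
    5. Because a stock code is analogous to a commodity and is a Boolean variable, the thesis adopts the Apriori algorithm, an association rule algorithm aimed mainly at Boolean variables. Using quotation data, it mines regularities of the form "if a certain stock rises during a certain period, then, with a certain confidence, another stock also rises". This is a further distinctive feature of the thesis.
    6. The thesis combines customer relationship management with data mining to offer constructive suggestions to supermarkets: it uses a decision tree algorithm to segment customers and recommends that supermarkets introduce membership cards, run different promotions for different card types, and provide customers with corresponding services.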
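The Apriori-based mining of Boolean "stock rose" data described in item 5 above can be illustrated with a minimal sketch. It is not the thesis's implementation: the tickers `A`, `B`, `C`, the toy trading days, the thresholds, and the function names `apriori` and `association_rules` are all assumptions made for illustration. Each "transaction" is the set of stocks that rose on one trading day.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level-1 candidates: every individual item that appears anywhere.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for cand in candidates:
            count = sum(1 for basket in baskets if cand <= basket)
            if count / n >= min_support:
                level[cand] = count
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            cand for cand in joined
            if all(frozenset(sub) in level for sub in combinations(cand, k - 1))
        }
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                lhs = frozenset(antecedent)
                # confidence(lhs -> rest) = count(lhs and rest) / count(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found

# Hypothetical data: stocks that rose on each of five trading days.
rising_days = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
    {"A", "B"},
]

frequent = apriori(rising_days, min_support=0.4)
found = association_rules(frequent, min_confidence=0.7)
for lhs, rhs, confidence in found:
    print(sorted(lhs), "->", sorted(rhs), "confidence", confidence)
```

Counts are stored instead of support fractions so that confidence is computed as a ratio of integers, which avoids floating-point surprises when comparing against the threshold.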
With the development of modern information technology and computer technology, the scope and depth of database applications keep growing. Many enterprises and scientific organizations, and economic systems in particular (retail trade, the securities business, etc.), have accumulated a great deal of data over the past years. This proliferation of data provides favorable conditions for establishing data warehouses. Database management systems already handle record modification and querying well, but as data volumes increase rapidly, queries and statistics alone can no longer meet practical needs or reveal the relationships within the data, so knowledge remains scarce. A new, effective means of analyzing data is needed, and data mining has developed to meet that need.
    This paper focuses on data mining methods and their application in economic systems (such as retail trade and the securities business). Two methods (association rules and decision trees) are discussed. First, data are collected and a database is established. Second, useful data are extracted from the database. Finally, the extracted data are analyzed to obtain knowledge about regularities.
    The main contributions are as follows:
    First, the paper analyzes the state of data mining abroad and summarizes its applications in the Internet/Web and other areas. It also describes the domestic situation, in which few industries use data mining to analyze data.
    Second, the paper introduces the definition and process of data mining, with most attention given to association rules and decision trees.
    Third, the paper designs an integrated environment for data mining, a data warehouse for the securities business, and introduces the data warehouse system: its information sources, functions, model, and technology. This is one of the features of the paper.
    Fourth, the paper explains the process of data mining and applies the methods to the securities business. Quantitative association rules and Apriori are used to analyze the relationship between share price and price rises, and a rule with optimal confidence is put forward.
    Lastly, the paper combines CRM with DM and provides some suggestions for the retail trade. Decision trees are used to segment customers.
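The abstract does not say which decision tree algorithm is used to segment customers. As a hedged illustration only, the sketch below shows the entropy-based split selection used by ID3-style trees (Quinlan's information gain); the customer attributes `spend` and `visits`, the `tier` label, the toy records, and the function names are all hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction in `target` obtained by splitting rows on `attribute`."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value of the splitting attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def best_split(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical customer records; "tier" is the membership card level to predict.
customers = [
    {"spend": "high", "visits": "frequent", "tier": "gold"},
    {"spend": "high", "visits": "rare", "tier": "gold"},
    {"spend": "low", "visits": "frequent", "tier": "basic"},
    {"spend": "low", "visits": "rare", "tier": "basic"},
]

root = best_split(customers, ["spend", "visits"], "tier")
print("split on:", root)
```

A full tree would apply `best_split` recursively to each group until the labels in a group are pure or no attributes remain; only the split criterion is shown here.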

