Research on the Application of Data Mining Technology in a Supermarket Data Warehouse
Abstract
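Traditional database management information systems cannot make good use of, or analyze, the large volumes of data accumulated in databases; data mining and data warehouse technology address this problem well. This paper first introduces the background of data mining and data warehousing, including the relationships among data mining, data warehouses, online analytical processing (OLAP), and statistics. It then discusses data mining patterns and data mining process models in detail, focusing on the dynamic clustering algorithm within the clustering pattern. Principal component analysis is used to preprocess the data, and on that basis an improved dynamic clustering algorithm is proposed.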
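The abstract does not spell out the improved dynamic clustering algorithm, so the following is only a minimal sketch of the general idea, assuming a k-means-style reassignment loop in which any point farther than a threshold from its nearest centre is set aside as an outlier. The names `dynamic_cluster`, `init_centres`, and `outlier_dist`, and the toy data, are hypothetical and not taken from the thesis.

```python
import math


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def dynamic_cluster(points, init_centres, outlier_dist, max_iter=100):
    """Reassign points to the nearest centre until the centres stop moving.

    Assumption (not from the source): a point is treated as an outlier when
    its distance to the nearest centre exceeds `outlier_dist`; outliers are
    excluded when the centres are recomputed.
    """
    centres = [tuple(c) for c in init_centres]
    clusters, outliers = [[] for _ in centres], []
    for _ in range(max_iter):
        clusters, outliers = [[] for _ in centres], []
        for p in points:
            d, idx = min((dist(p, c), i) for i, c in enumerate(centres))
            if d > outlier_dist:
                outliers.append(p)
            else:
                clusters[idx].append(p)
        # Keep the old centre if a cluster ends up empty.
        new_centres = [mean(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters, outliers


# Toy data: two tight groups plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
centres, clusters, outliers = dynamic_cluster(
    points, [(0, 0), (10, 10)], outlier_dist=5.0)
```

Excluding distant points from the centre update keeps a single extreme customer record from dragging a cluster centre away from the bulk of its members, which is the quality improvement the abstract attributes to outlier removal.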
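     As an application example, this paper analyzes a supermarket's operational database, models it with a star schema, and constructs a logical model of a data warehouse. Data is then extracted from the operational database and transformed, and the "valuable, clean" data is loaded into the data warehouse, completing its construction. Following the Two Crows data mining process model, data on the types of products customers purchase, their transactions, and their attributes is first collected. This data is then preprocessed with principal component analysis to lower the correlation among variables and reduce the number of variables. An improved dynamic clustering method is then used to build the model, with outliers removed during clustering to improve cluster quality. The result is a customer segmentation model, which is explained in some detail.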
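The star-schema modelling and the extract, transform, and load steps described above can be illustrated with a small sketch. The table names, column names, cleaning rules, and sample rows below are assumptions made for illustration; the abstract does not give the actual schema of the supermarket data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table surrounded by dimension tables (hypothetical layout).
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    amount       REAL
);
""")

conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Customer A", "City X"), (2, "Customer B", "City Y")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Milk", "Dairy"), (2, "Shampoo", "Personal care")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20020101, 2002, 1), (20020102, 2002, 1)])

# Rows as they might arrive from the operational database:
# (customer_key, product_key, date_key, quantity, amount)
raw_sales = [
    (1, 1, 20020101, 2, 7.0),
    (1, 2, 20020101, 1, 12.5),
    (2, 1, 20020102, 3, 10.5),
    (2, 2, 20020102, -1, 12.5),  # invalid quantity, rejected
    (1, 1, 20020102, 1, None),   # missing amount, rejected
]


def is_clean(row):
    """Illustrative cleaning rule: positive quantity and a known amount."""
    _, _, _, quantity, amount = row
    return quantity is not None and quantity > 0 and amount is not None


# Transform (filter) and load only the clean rows into the fact table.
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [row for row in raw_sales if is_clean(row)])

# A typical star-schema query: join the fact table to one dimension.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Keeping descriptive attributes in the dimension tables and only keys and measures in the fact table is what lets queries such as the category total above stay a single join per dimension.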
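     Data mining and data warehousing are closely linked: the data warehouse is a sound foundation for data mining, and data mining allows the data warehouse to play its decision-support role more fully. The seamless integration of data mining and data warehouse systems is therefore a hot topic in the data mining community. As a development trend, it is also discussed further in this paper.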
Traditional database management information systems cannot fully use and analyze the data in large databases, whereas data mining and data warehousing resolve this problem well. This paper first introduces background on data mining and data warehousing, including the relationships among data mining, data warehouses, OLAP, and statistics, and then describes data mining patterns and the data mining process model in detail. It examines the dynamic clustering algorithm, preprocesses data with principal component analysis, and then improves the dynamic clustering algorithm.
    As an application, this paper analyzes the operational database and builds a logical model of the supermarket's data warehouse using a star schema. Data is then extracted from the operational database, and the "valuable and clean" data is loaded into the data warehouse after being transformed by tools or programming languages, which completes the physical model of the supermarket's data warehouse. Following the Two Crows data mining process model, product data, transaction data, and customer demographic data are collected. This data is preprocessed with principal component analysis, which lowers the correlation among variables and reduces the number of variables. A model is built with the improved dynamic clustering method, and the quality of its results is improved by removing outliers. Finally, the results of the model are explained in detail.
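As a rough illustration of how principal component analysis lowers correlation and reduces the number of variables, the sketch below standardizes the columns, forms their correlation matrix, and extracts the leading components by power iteration with deflation. This is a generic textbook procedure, not the preprocessing code used in the paper; the function names and the sample rows are hypothetical, and every column is assumed to be non-constant.

```python
import math
import random


def standardize(rows):
    """Scale each column to zero mean and unit sample standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigen(matrix, iters=500, seed=0):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    d = len(matrix)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                for i in range(d))
    return value, v


def pca(rows, n_components):
    """Return [(eigenvalue, direction), ...] and the projected scores."""
    z = standardize(rows)
    c = covariance(z)  # equals the correlation matrix of the raw data
    d = len(c)
    components = []
    for _ in range(n_components):
        value, vec = top_eigen(c)
        components.append((value, vec))
        # Deflate: remove the component just found before the next pass.
        c = [[c[i][j] - value * vec[i] * vec[j] for j in range(d)]
             for i in range(d)]
    scores = [[sum(x * w for x, w in zip(row, vec)) for _, vec in components]
              for row in z]
    return components, scores


# Toy data: the second variable is exactly twice the first.
rows = [(1, 2, 5), (2, 4, 3), (3, 6, 6), (4, 8, 2), (5, 10, 4)]
components, scores = pca(rows, n_components=2)
explained = [value / len(rows[0]) for value, _ in components]
```

Because two of the three toy variables are perfectly correlated, two components carry essentially all of the variance, so the three original variables can be replaced by two uncorrelated scores before clustering.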
    There is a close connection between data mining and the data warehouse: the data warehouse is an excellent platform for data mining, and data mining in turn helps the data warehouse fulfil its decision-support function. The seamless integration of data mining and data warehousing is therefore a hot topic, and this trend is also discussed further in this paper.
