Research and Implementation of Data Extraction Technology for Data Warehouses
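Abstract
This paper describes the design and implementation of a general-purpose data extraction tool for data warehouses. As society and technology advance, analysis and decision-making have become the lifeline of every industry. Data warehouse technology, with its advantages in data storage and organization, provides strong data support for decision support systems. The software loads source data into the data warehouse after integrating, transforming, cleaning, and optimizing it, which ensures that the warehouse holds high-quality data and lays the foundation for the decision analysis system to work effectively.
     Chapter 1 explains the significance of the project and briefly analyzes data warehouse technology; Chapters 2 through 6 present the design approach and the implementation of the system; the final chapter summarizes the work and looks ahead.
     The software adopts a three-tier architecture and uses COM technology and MTS to develop and manage the middle-tier components. The integration, transformation, cleaning, and optimization modules are each encapsulated as COM components and built as .DLL files, which makes the system easier to upgrade, maintain, and port.
     This paper analyzes a wide variety of data extraction methods and groups them into integration, transformation, and cleaning. It also argues that the data should be optimized, for example by smoothing and normalization, to better support data mining.
     The software supports extraction from most structured and semi-structured data, including various relational databases, Excel spreadsheets, delimited text files, and XML files. Extraction from XML files is one of its distinguishing features: we propose a rule-driven method for converting XML schema data to a relational schema, which is used to extract XML data.
     The system encapsulates a user-defined extraction process as an extraction package (Package), so that it can be defined once and used many times. To improve the execution efficiency of extraction packages, we use Microsoft DTS as the transfer tool, which greatly speeds up data extraction.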
This paper discusses the design and implementation of a general data extraction tool for data warehouses. With the development of society and the advance of technology, analysis and decision-making have become the lifeline of every walk of life. Data warehouses provide strong data support for analysis and decision-making by virtue of their data storage format and data organization structure. This software ensures that the data warehouse receives high-quality data: it reads raw data from the data source and loads it into the data warehouse after integrating, converting, cleaning, and optimizing it.
    The first chapter explains the significance of this work and gives a brief analysis of data warehouse technology. Chapters 2 to 6 introduce the design idea of the system and its implementation; the last chapter summarizes the paper and looks ahead to the future of data warehouse technology.
    The system is designed as a three-tier architecture. We use COM technology to develop the middle-tier components and MTS to manage them. The function modules, such as the integration and conversion modules, are packaged as COM components, which makes the system easier to update, maintain, and port.
    In this paper we group the common data extraction methods into integration, conversion, cleaning, and summarization, and propose data optimization, such as data smoothing and data normalization, in order to support data mining more effectively.
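    The optimization step mentioned above can be illustrated with a minimal Python sketch. It assumes a centered moving average for smoothing and min-max scaling for normalization; the abstract does not say which specific methods the software uses, and the function names here are hypothetical.

```python
from typing import List


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Smooth a series with a centered moving average (assumed method).

    The window shrinks at both ends so the output has the same length.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly into [0, 1]; a constant column maps to zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

    Smoothing damps isolated spikes in a numeric column before loading, and normalization puts columns with different units on a common scale, which many mining algorithms expect.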
    The software can extract data from most structured or semi-structured data sources, such as relational databases, Excel files, delimited text files, and XML documents. Extracting data from XML documents is a distinguishing feature of the software. We propose a rule-driven method for transforming XML schema data into a relational schema, and on this basis implement data extraction from XML documents.
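    To make the idea of a rule-driven XML-to-relational mapping concrete, here is a small Python sketch. It is not the method of the paper, whose rule format is not given in the abstract. The rule dictionary, the element names (`book`, `title`, `price`), and the use of SQLite as a stand-in for the warehouse are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical rule: each <book> element becomes one row of table "book".
# A column source is either "@attr" (an attribute) or a child element path.
RULE = {
    "table": "book",
    "row_path": ".//book",
    "columns": {"id": "@id", "title": "title", "price": "price"},
}


def extract_rows(xml_text, rule):
    """Apply one mapping rule to an XML document and return row tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.findall(rule["row_path"]):
        row = []
        for source in rule["columns"].values():
            if source.startswith("@"):
                row.append(elem.get(source[1:]))     # attribute value or None
            else:
                row.append(elem.findtext(source))    # child text or None
        rows.append(tuple(row))
    return rows


def load_rows(conn, rule, rows):
    """Create the target table if needed and insert the extracted rows."""
    cols = list(rule["columns"])
    conn.execute(f'CREATE TABLE IF NOT EXISTS {rule["table"]} ({", ".join(cols)})')
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f'INSERT INTO {rule["table"]} VALUES ({placeholders})', rows)
    conn.commit()
```

    Because the mapping lives in data rather than in code, a different XML layout only needs a different rule. Missing elements or attributes become NULL in the target table.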
    The system can package an extraction job defined by the user into an 'Extract Package', which can then be run repeatedly. To improve execution efficiency, we adopt MS DTS as the transfer tool, which speeds up data extraction.
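    The "define once, use many times" idea behind an extract package can be sketched as below. This is a plain Python stand-in under stated assumptions: it does not use or imitate the DTS API, and the class `ExtractPackage`, its fields, and the `strip_name` step are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class ExtractPackage:
    """A named, reusable description of one extraction job (hypothetical)."""
    name: str
    source: Callable[[], Iterable[Row]]                 # produces the raw rows
    steps: List[Callable[[Row], Row]] = field(default_factory=list)

    def run(self, sink: List[Row]) -> int:
        """Read each source row, apply the steps in order, append to the sink.

        Returns the number of rows loaded in this run.
        """
        count = 0
        for row in self.source():
            for step in self.steps:
                row = step(row)
            sink.append(row)
            count += 1
        return count


def strip_name(row: Row) -> Row:
    """Example cleaning step: trim whitespace from the 'name' field."""
    return {**row, "name": str(row["name"]).strip()}
```

    A package built once, for example `ExtractPackage("customers", read_customers, [strip_name])` with a user-supplied `read_customers` function, can then be run on every load cycle without redefining the source or the steps.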