Research and Application of Partitioning Techniques in Real-Time Data Warehouse Environments
Abstract
As enterprises demand ever more timely data, traditional data warehouse technology can no longer meet real-time requirements. Real-time data warehousing addresses this by providing enterprises and organizations with real-time or near-real-time information. Because data now arrives in real time, massive volumes are stored every day, which exposes the problem of how to manage data efficiently in a real-time data warehouse environment. Building efficient data management on database management techniques is one way to address this problem.
This thesis targets the characteristics of real-time data warehouses, with the goal of improving their storage and query efficiency. It studies how partitioning contributes to query efficiency in a real-time data warehouse and how incremental data should be stored in partitions. Starting from the characteristics of traditional database schemas, it proposes a partition model based on a partition table, together with a systematic modeling process comprising model construction, schema extraction, and data migration, and argues in detail for the advantages of this partition table in terms of theoretical basis, experimental evidence, generality, and extensibility. For the real-time setting, it proposes an improved horizontal partitioning algorithm and an improved server-side algorithm, and demonstrates in detail how they improve query efficiency. On this basis, a partition engine built around these partitioning techniques is designed and applied in the development of the national marine environment data warehouse.
Experiments show that the proposed partition-table-based partitioning strategy and the dynamic partitioning algorithm deliver superior performance in data storage and querying and achieve the expected goals.
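As a rough illustration of the general idea behind horizontal partitioning of incremental data, the following minimal Python sketch routes incoming rows to monthly partitions, creates a partition the first time a row needs it, and scans only the partitions that overlap a query's date range. It is not the thesis's improved algorithm, which the abstract does not specify. The class name `RangePartitionedTable`, the monthly granularity, and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


class RangePartitionedTable:
    """Toy horizontally partitioned table (hypothetical, one partition per month)."""

    def __init__(self):
        # Maps (year, month) -> list of (day, payload) rows.
        self.partitions = defaultdict(list)

    @staticmethod
    def partition_key(day):
        # Assumed partitioning rule: rows are grouped by calendar month.
        return (day.year, day.month)

    def insert(self, day, payload):
        # An incremental row is routed to its partition; a missing
        # partition is created on first use.
        self.partitions[self.partition_key(day)].append((day, payload))

    def query(self, start, end):
        """Return rows with start <= day <= end and the number of partitions scanned."""
        lo, hi = self.partition_key(start), self.partition_key(end)
        # Partition pruning: skip every partition outside the queried range.
        touched = sorted(k for k in self.partitions if lo <= k <= hi)
        rows = [r for k in touched for r in self.partitions[k]
                if start <= r[0] <= end]
        return rows, len(touched)


# Made-up sample rows, one per month.
table = RangePartitionedTable()
table.insert(date(2008, 1, 15), "row-a")
table.insert(date(2008, 2, 3), "row-b")
table.insert(date(2008, 3, 20), "row-c")

rows, scanned = table.query(date(2008, 2, 1), date(2008, 2, 28))
print(rows, scanned)  # one matching row, one of three partitions scanned
```

In this sketch the query for February touches a single partition instead of all three, which is the kind of saving that motivates partitioning for query efficiency. A production warehouse would normally rely on the database engine's own partitioning features instead of application code.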
