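Research and Implementation of a Heterogeneous Data Access Framework Based on Directed Query Graphs

Original English title: Research and Implementation of Oriented Query Graph-based Heterogeneous Sources Framework
Author: Liu Xingyu (刘星宇)
Degree level: Master's
Discipline: Computer Application Technology
Keywords (translated from Chinese): heterogeneous data access; directed query graph; enhanced bushy tree algorithm; XQuery
Keywords (original English): Heterogeneous Data Access; Oriented Query Graph; Enhanced Bushy Tree; XQuery
Degree year: 2008
Advisor: Yu Shuangyuan (于双元)
Discipline code: 081203
Degree-granting institution: Beijing Jiaotong University (北京交通大学)
Thesis submission date: 2008-05-01
Chair of the defense committee: Xu De (须德)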

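Abstract (translated from the Chinese)

The goal of an information integration system is to build a complex information system by integrating the available resources, computing resources included, and to make the fullest possible use of them. The usual approach is to define a single consistent query interface (a query language and a schema representation) through which queries are issued against multiple data sources.
     Existing query integration systems have several shortcomings: data sources are not organized flexibly enough to meet the needs of some specific applications; there is no intermediate language that can express a query definition spanning a mix of data sources, so optimization reasoning cannot reflect the multi-source nature of such queries; and little work has been done on analyzing distributed factors and the related optimization strategies.
     Building on the traditional database architecture, this thesis proposes and implements a query integration framework for heterogeneous data sources based on a directed query graph. It organizes data sources flexibly as virtual data sources and uses SQL and XQuery to give heterogeneous data sources a uniform access interface, thereby providing uniform querying over distributed heterogeneous data. The thesis also proposes an enhanced bushy tree algorithm for distributed query optimization, makes a preliminary exploration of distributed-factor analysis and the corresponding optimization strategies, and closes with examples of distributed data queries.
     The closing examples show that the heterogeneous data access system proposed and implemented here provides unified, transparent access to heterogeneous data sources and is reasonably extensible; quantitative experiments further show that the system has short access times.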

Abstract (original English version)

The aim of information integration is to build sophisticated systems that use the available information resources, including computing resources, to the fullest extent, pushing costly operations down to the sources as much as possible. A query integration system creates a unified query interface, comprising the query language and the schema that queries are written against, through which users can query multiple sources.
     Current implementations of query integration systems have several shortcomings. First, the organization of data sources cannot meet the requirements of some applications. Second, they lack an intermediate language that expresses query definitions spanning multiple sources, on which optimization reasoning could conveniently be carried out. Third, little research has been done on distributed factors and the related optimization algorithms.
     In this thesis, building on the traditional database framework, we propose and implement a Query Graph-based Heterogeneous Sources Framework. On this architecture we study several aspects of query integration in depth and implement a prototype query integration system. We organize data sources as virtual sources, a flexible structure usable in scenarios that current implementations cannot handle. We use SQL and XQuery as the common query interface to distributed heterogeneous sources. We also define an optimization algorithm, Enhanced Bushy Tree, which limits the search space of the traditional bushy tree and improves the performance of distributed query execution. The thesis further covers the analysis of distributed factors and the corresponding optimization strategies, and closes with examples built on this architecture.
     The closing examples demonstrate the successful design and implementation of our Heterogeneous Sources Framework, and quantitative performance measurements are also provided.
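
The abstract names the Enhanced Bushy Tree algorithm but does not describe it, so the following is only a minimal sketch of the general idea it builds on: enumerating bushy join trees over a query graph while pruning the search space. The pruning rule used here (consider only connected subsets of the query graph, so no cross products are planned), the cost model (sum of intermediate result sizes), and all names such as `best_bushy_plan`, `cards`, and `edges` are illustrative assumptions, not the thesis's method.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """True if `nodes` induces a connected subgraph of the query graph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for edge in edges:
            if cur in edge:
                (other,) = edge - {cur}
                if other in nodes and other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen == nodes


def best_bushy_plan(cards, edges):
    """Dynamic programming over connected subsets of relations.

    cards: {relation name: estimated row count}          (hypothetical input)
    edges: {frozenset({r1, r2}): join selectivity}       (hypothetical input)
    Returns {subset: (cost, rows, plan)}; a plan is a nested tuple of names.
    """
    best = {frozenset([r]): (0.0, float(n), r) for r, n in cards.items()}
    names = sorted(cards)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            subset = frozenset(combo)
            if not is_connected(subset, edges):
                continue  # pruned: joining it would require a cross product
            for k in range(1, size // 2 + 1):
                for left_combo in combinations(combo, k):
                    left = frozenset(left_combo)
                    right = subset - left
                    if left not in best or right not in best:
                        continue  # one side is itself disconnected
                    sel = 1.0
                    for edge, s in edges.items():
                        if edge & left and edge & right:
                            sel *= s  # join predicate crossing the split
                    rows = best[left][1] * best[right][1] * sel
                    cost = best[left][0] + best[right][0] + rows
                    if subset not in best or cost < best[subset][0]:
                        plan = (best[left][2], best[right][2])
                        best[subset] = (cost, rows, plan)
    return best


# A chain-shaped query graph A - B - C - D with made-up statistics.
cards = {"A": 1000, "B": 100, "C": 10, "D": 1000}
edges = {
    frozenset({"A", "B"}): 0.01,
    frozenset({"B", "C"}): 0.1,
    frozenset({"C", "D"}): 0.01,
}
best = best_bushy_plan(cards, edges)
cost, rows, plan = best[frozenset(cards)]
print(plan, cost, rows)
print(len(best), "connected subsets planned, out of", 2 ** len(cards) - 1)
```

For this chain of four relations the connectivity rule leaves 10 of the 15 non-empty subsets to plan; the saving grows quickly for larger, sparser query graphs, which is the kind of search-space reduction a restricted bushy-tree optimizer aims for.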

