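Design and Implementation of a Data Exchange Engine for Heterogeneous Data Sources

Author: Luo Jinqun (罗金群)
Degree level: Master's
Discipline: Computer Software and Theory
Keywords (Chinese): 分布式 ; XML ; 模板 ; 数据交换 ; 异构数据源
Keywords (English): Distributed ; XML ; Template ; Data Exchange ; Heterogeneous data source
Degree year: 2007
Advisor: Chen Qimai (陈启买)
Discipline code: 081202
Degree-granting institution: South China Normal University (华南师范大学)
Thesis submission date: 2007-05-01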

Abstract

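University information management systems commonly run on multiple platforms, use many different databases, and are scattered and disorganized, so building a unified information platform is one of the core topics in digital campus construction. Because heterogeneous data sources differ in structure, data, DBMS, hardware, and network protocols, and are autonomous, building a data exchange engine across multiple data sources is an effective way to integrate and share their data. This thesis adopts the XML data model and, following the idea of directory services, combines P2P with distributed networking for resource management, ultimately forming a data exchange engine for heterogeneous data sources.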
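     XML is a data format defined in an open, self-describing way. Its family of standards includes a document format standard (Schema), a document presentation definition (XSL), a document query standard (XQuery), a document parsing standard (SAX), and a document linking standard (XLink). As a meta-markup language, XML lets markup be customized for different application environments and requirements, and it describes and exchanges data in a uniform, open, text-based form. An XML Schema is itself a well-formed XML document; because it uses XML as its means of description, it has strong descriptive power, extensibility, and ease of processing and maintenance. XQuery is a functional language for querying XML data sets; it is simple, flexible, and easy to understand and implement.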
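As a small illustration of describing data in XML and querying it declaratively, the sketch below parses a document and selects values by a condition. It is not taken from the thesis: the JDK ships no XQuery engine, so XPath (from `javax.xml.xpath`) stands in for XQuery here, and the `students`/`student`/`name`/`dept` element names and the `XmlQuerySketch` class are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlQuerySketch {
    // Hypothetical sample data; the element names are illustrative only.
    static final String SAMPLE =
        "<students>"
      + "<student id=\"s1\"><name>Li</name><dept>CS</dept></student>"
      + "<student id=\"s2\"><name>Wang</name><dept>Math</dept></student>"
      + "<student id=\"s3\"><name>Chen</name><dept>CS</dept></student>"
      + "</students>";

    /** Returns the names of all students in the given department, in document order. */
    static List<String> namesInDept(String xml, String dept) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $dept as an XPath variable instead of concatenating it into the expression.
            xpath.setXPathVariableResolver(q -> "dept".equals(q.getLocalPart()) ? dept : null);
            NodeList nodes = (NodeList) xpath.evaluate(
                "/students/student[dept = $dept]/name", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped so callers need no checked-exception handling.
            throw new IllegalStateException("XML query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesInDept(SAMPLE, "CS")); // prints [Li, Chen]
    }
}
```

An XQuery processor would express the same selection as a FLWOR expression; XPath is used here only because it is available in the standard library.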
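     P2P (peer-to-peer), also called a peer network, lets users connect directly to other computers on the network to share and exchange files. A P2P network consists of physically distributed nodes, all of which are equal (and are called peers); each node has the same responsibilities and capabilities, and the nodes cooperate to complete tasks. Peers connect to one another directly and share information resources without relying on a centralized server. In the P2P model, peers are highly autonomous and free to act as they choose; they are both consumers of information (clients) and providers of information (servers), and they play equal roles in performing computation and in providing and consuming resources.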
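The dual client/server role of a peer can be sketched with an in-memory model. This is only an illustration under stated assumptions, not the thesis's design: the `PeerSketch` class, its methods, and the hop-limited breadth-first lookup are all hypothetical, and no real networking is involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class PeerSketch {
    final String name;
    private final Map<String, String> shared = new HashMap<>();   // resources this peer provides
    private final List<PeerSketch> neighbors = new ArrayList<>(); // direct peer links, no central server

    PeerSketch(String name) { this.name = name; }

    /** Links two peers symmetrically: neither side is "the server". */
    void connect(PeerSketch other) {
        neighbors.add(other);
        other.neighbors.add(this);
    }

    /** Provider role: publish a resource under a key. */
    void share(String key, String value) { shared.put(key, value); }

    /** Consumer role: breadth-first search over peers, at most maxHops links away. */
    Optional<String> lookup(String key, int maxHops) {
        Set<PeerSketch> seen = new HashSet<>();
        seen.add(this);
        List<PeerSketch> frontier = new ArrayList<>();
        frontier.add(this);
        for (int hop = 0; hop <= maxHops && !frontier.isEmpty(); hop++) {
            List<PeerSketch> next = new ArrayList<>();
            for (PeerSketch p : frontier) {
                String value = p.shared.get(key);
                if (value != null) return Optional.of(value);
                for (PeerSketch n : p.neighbors) {
                    if (seen.add(n)) next.add(n); // visit each peer at most once
                }
            }
            frontier = next;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PeerSketch a = new PeerSketch("a"), b = new PeerSketch("b"), c = new PeerSketch("c");
        a.connect(b);
        b.connect(c);
        c.share("grades", "grades@c");
        System.out.println(a.lookup("grades", 2)); // prints Optional[grades@c]
    }
}
```

Every peer exposes both `share` and `lookup`, which mirrors the point that each node is a provider and a consumer at the same time.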
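     On this theoretical basis, the thesis proposes an overall design for the heterogeneous data source data exchange engine, covering the overall architecture, the functional modules, the overall system workflow, and the development environment.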
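     The thesis presents a design model for the data exchange engine. The engine adopts the J2EE architecture, uses Java as the programming language, and defines its data model with XML Schema; a simple implementation model of the engine was developed on this basis. The model provides template customization. Other systems require users to be familiar with a query language and to enter a detailed query statement whenever they submit a query. This system instead lets users select the data they want on a friendly user interface, submit the query, and save it as a template that can be invoked directly for later queries. The query processor provides the data query function: it divides the query process into query normalization, query decomposition, query rewriting, and result synthesis, and the thesis gives implementation algorithms for these stages. Finally, the thesis describes the working principle and main classes of the simple prototype of the engine.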
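The four query stages and the template idea can be sketched as follows. This is a minimal illustration, not the thesis's algorithms: the `QueryPipelineSketch` class, the global-to-local mapping table, the source, table, and column names, and the assumption that every local table exposes a shared `id` key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

final class QueryPipelineSketch {
    /** Where a global field lives (hypothetical: one table per source in this sketch). */
    static final class Mapping {
        final String source, table, column;
        Mapping(String source, String table, String column) {
            this.source = source; this.table = table; this.column = column;
        }
    }

    // Hypothetical global-to-local schema mapping.
    static final Map<String, Mapping> MAPPINGS = Map.of(
        "name",  new Mapping("access",    "Student", "stu_name"),
        "dept",  new Mapping("access",    "Student", "stu_dept"),
        "score", new Mapping("sqlserver", "Grade",   "mark"));

    // Saved templates: a template is a named, normalized list of global fields.
    static final Map<String, List<String>> TEMPLATES = new HashMap<>();

    /** Normalization: trim, lower-case, reject unknown fields, drop duplicates, keep order. */
    static List<String> normalize(List<String> fields) {
        LinkedHashSet<String> out = new LinkedHashSet<>();
        for (String f : fields) {
            String g = f.trim().toLowerCase(Locale.ROOT);
            if (!MAPPINGS.containsKey(g)) throw new IllegalArgumentException("unknown field: " + f);
            out.add(g);
        }
        return new ArrayList<>(out);
    }

    /** Decomposition: group the requested global fields by the source that holds them. */
    static Map<String, List<String>> decompose(List<String> fields) {
        Map<String, List<String>> bySource = new TreeMap<>();
        for (String f : fields) {
            bySource.computeIfAbsent(MAPPINGS.get(f).source, k -> new ArrayList<>()).add(f);
        }
        return bySource;
    }

    /** Rewriting: express one sub-query in its source's local table and column names. */
    static String rewrite(List<String> fields) {
        StringJoiner cols = new StringJoiner(", ");
        for (String f : fields) cols.add(MAPPINGS.get(f).column + " AS " + f);
        return "SELECT id, " + cols + " FROM " + MAPPINGS.get(fields.get(0)).table;
    }

    /** Synthesis: merge partial rows from different sources on the shared id key. */
    static Map<String, Map<String, String>> synthesize(List<Map<String, Map<String, String>>> partials) {
        Map<String, Map<String, String>> merged = new TreeMap<>();
        for (Map<String, Map<String, String>> partial : partials) {
            partial.forEach((id, row) -> merged.computeIfAbsent(id, k -> new TreeMap<>()).putAll(row));
        }
        return merged;
    }

    static void saveTemplate(String name, List<String> fields) {
        TEMPLATES.put(name, normalize(fields));
    }

    /** Builds one local SQL string per source for a saved template. */
    static Map<String, String> plan(String templateName) {
        List<String> fields = TEMPLATES.get(templateName);
        if (fields == null) throw new IllegalArgumentException("no such template: " + templateName);
        Map<String, String> sql = new TreeMap<>();
        decompose(fields).forEach((src, fs) -> sql.put(src, rewrite(fs)));
        return sql;
    }

    public static void main(String[] args) {
        saveTemplate("transcript", Arrays.asList(" Name", "score", "name"));
        System.out.println(plan("transcript"));
    }
}
```

Saving the normalized field list as the template means a later call to `plan` needs only the template name, which is the convenience the template mechanism is meant to give users who do not write query statements themselves.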
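     The thesis builds a simple prototype of the engine and applies it to data exchange among sample databases in Access, SQL Server 2000, and Oracle 9i, presenting a concrete application system and describing its characteristics.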
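     Finally, the thesis summarizes the work and discusses directions for further research.
     The research topic comes from a research fund project of the Guangdong Provincial Department of Education, "Research and Practice on an Informatized Operating Model for University Teaching Management Based on Multi-Campus Operation" (基于多校区办学的高校教学管理信息化运作模式研究与实践).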
Many college information management systems suffer from irregularities such as multiple platforms, a variety of databases, a variety of operating modes, and scattered, disorganized data. Building a unified information platform is one of the core topics in establishing a digital campus. Heterogeneous databases are highly diverse and autonomous because they differ in structure, data, DBMS, hardware, and network protocol. To solve the communication problem among heterogeneous databases, the first and most fundamental task is to set up a heterogeneous database engine for data exchange. This paper builds the engine using the XML data model, following the idea of the directory service, and combining P2P organically with a distributed network.
     XML is a data format defined in an open, self-describing way. It includes a document format standard (Schema), a document presentation definition (XSL), a document query standard (XQuery), a document parsing standard (SAX), and a document linking standard (XLink). As a meta-markup language, XML allows markup to be customized for different application environments and requirements, and it describes and exchanges data in a unified, open, text-based format. An XML Schema is itself a well-formed XML document; by using XML as its means of description, it gains strong descriptive power, extensibility, and ease of processing and maintenance. XQuery is a functional language for querying collections of XML data. It is simple and flexible, and easy to understand and implement.
     P2P (peer-to-peer) is also called a peer network: users can connect directly to other computers on the network to share and exchange files. A P2P network is made up of physically distributed nodes. All nodes are equal. Each node has the same responsibilities and capabilities, and the nodes cooperate on common tasks. Peers connect directly with each other and share information resources without needing a centralized server. In the P2P model, peers have a high degree of autonomy and freedom. They are consumers of information (clients) and also providers of information (servers), and they play the same role in performing computation and in providing and consuming resources.
     The paper introduces the overall design of the heterogeneous database engine for data exchange, including the overall system structure, the system function modules, the overall system workflow, and the development environment.
     The paper gives a model of the data exchange engine. It takes Java as the programming language, uses XML Schema to define the data model, and develops a simple implementation model of the data exchange engine. The model provides a template function. Other systems require the user to be familiar with a query language and to type in a detailed query statement in order to submit a query; this system instead provides a template customization service. The user only needs to select the data to be queried on a friendly user interface, can then submit the query immediately, and can save the query as a template for direct use later. The paper then gives the design and implementation of the query processor. The whole query process is defined as query normalization, query decomposition, query rewriting, and query result synthesis, and the paper gives the implementation algorithms. Finally, it describes the working principle and the main classes of the model.
     The model is applied to data exchange using sample databases in Access, SQL Server, and Oracle. The paper gives a concrete application of the model and introduces its characteristics.
     Finally, the paper summarizes the work and puts forward directions for further research.
