Research and Implementation of Heterogeneous Data Integration
Abstract
The task of heterogeneous data integration is to give users a unified application platform that hides the differences among underlying data sources, so that users can access those heterogeneous sources seamlessly and flexibly.
     This thesis studies the theory and technology of heterogeneous data integration and analyzes existing data integration approaches. Adopting the Mediator/Wrapper approach and combining it with ontology technology, it proposes a data integration middleware architecture called HDSQ (Heterogeneous Data Source Query). The architecture builds on mature XML technology and exploits the ability of ontologies to describe semantics precisely: the global data schema is described in the Web Ontology Language (OWL), which addresses semantic heterogeneity among the data sources. Web Service technology is used to encapsulate the various data sources and invoke them remotely, which makes the architecture better suited to distributed environments.
     Based on this work, a data integration and retrieval system was designed and implemented. The system provides data source configuration, data source wrapping, and query processing, and offers transparent access to various relational databases and XML documents through a unified query interface.
     The main contributions of this thesis are as follows:
     (1) Domain concepts are described with ontologies. A global ontology, local ontologies, and the mappings from the local ontologies to the global ontology are constructed, which resolves semantic conflicts in heterogeneous data integration.
     (2) An architecture for heterogeneous data integration is designed, the function of each module is specified, and a prototype system is implemented.
     (3) Each data source wrapper is encapsulated as a Web Service, which simplifies the invocation mechanism and improves the flexibility and extensibility of the system.
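The Mediator/Wrapper idea summarized above can be illustrated with a minimal Python sketch. It is not the HDSQ implementation: the class names, field names, and sample data are all hypothetical, plain dictionaries stand in for the OWL-based local-to-global mappings, and the wrappers are called in-process rather than as Web Services. It only shows how one wrapper per source can translate local field names into shared global terms so that a mediator can answer a single query over a relational table and an XML document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical local-to-global mappings (local field name -> global term).
# In the thesis these mappings are expressed with ontologies; dicts are a stand-in.
SQL_MAPPING = {"book_title": "title", "writer": "author"}
XML_MAPPING = {"name": "title", "creator": "author"}


class SqlWrapper:
    """Wraps a relational source and returns records keyed by global terms."""

    def __init__(self, conn, table, mapping):
        self.conn, self.table, self.mapping = conn, table, mapping

    def fetch(self):
        columns = list(self.mapping)
        sql = f"SELECT {', '.join(columns)} FROM {self.table}"
        rows = self.conn.execute(sql).fetchall()
        return [
            {self.mapping[col]: value for col, value in zip(columns, row)}
            for row in rows
        ]


class XmlWrapper:
    """Wraps an XML document and returns records keyed by global terms."""

    def __init__(self, xml_text, record_tag, mapping):
        self.root = ET.fromstring(xml_text)
        self.record_tag, self.mapping = record_tag, mapping

    def fetch(self):
        return [
            {glob: rec.findtext(local) for local, glob in self.mapping.items()}
            for rec in self.root.iter(self.record_tag)
        ]


class Mediator:
    """Sends one global query to every wrapper and merges the results."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, **conditions):
        results = []
        for wrapper in self.wrappers:
            for record in wrapper.fetch():
                # Keep records whose global fields equal every condition.
                if all(record.get(k) == v for k, v in conditions.items()):
                    results.append(record)
        return results


def build_demo_mediator():
    """Builds a mediator over two made-up sources with different local schemas."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (book_title TEXT, writer TEXT)")
    conn.execute("INSERT INTO books VALUES ('Data Integration', 'Li')")
    xml_text = (
        "<library>"
        "<item><name>Ontologies</name><creator>Li</creator></item>"
        "<item><name>XML Basics</name><creator>Wang</creator></item>"
        "</library>"
    )
    return Mediator([
        SqlWrapper(conn, "books", SQL_MAPPING),
        XmlWrapper(xml_text, "item", XML_MAPPING),
    ])


if __name__ == "__main__":
    mediator = build_demo_mediator()
    print(mediator.query(author="Li"))
```

Calling `mediator.query(author="Li")` returns matching records from both sources under the shared `title` and `author` keys, even though the table and the XML document use different local names. Filtering in the mediator keeps the sketch short; a real system would more likely push the conditions down to each source.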
