Research on Heterogeneous Data Integration Based on Ontology and XML
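Abstract
With the rapid growth of the Web, resources on the Internet have become ever richer, and the Internet has turned into a vast global information repository. Web resources include not only traditional databases with strict data models, such as relational and object-oriented databases, but also unstructured and semi-structured data, such as large numbers of HTML documents, XML documents, and text data. These distributed data resources were designed mainly to meet the business needs of their owners, and differences in hardware and software platforms and in data models have made them heterogeneous. Heterogeneous data is hard to integrate and share, which makes interoperation between data sources difficult and prevents information from being shared and used effectively, so the sources become "information islands". To make better use of the vast amount of information on the network, there is an urgent need to integrate data that is geographically distributed, autonomously managed, and heterogeneous in schema. The heterogeneous data integration problem has therefore attracted wide attention.
     This thesis first gives a comprehensive analysis of existing data integration approaches and of the theories and technologies related to heterogeneous data integration. It then identifies semantic heterogeneity as the main problem in current heterogeneous data integration. On this basis, it proposes and designs a heterogeneous data integration system model based on ontology and XML, and discusses the key modules of the model. The ontology is introduced specifically to resolve semantic heterogeneity in heterogeneous data integration.
     The main work of this thesis is as follows: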
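     (1) The theories and technologies related to heterogeneous data integration are discussed. Existing data integration methods are analyzed, and semantic heterogeneity is identified as the problem that current data integration most urgently needs to solve.
     (2) Building on a study of the architectures of existing data integration systems, and combining XML, ontology, and Web Services technologies, a heterogeneous data integration model based on ontology and XML is proposed. The functional modules of the model are described in detail, and the key modules are tested.
     (3) XML is used as the intermediate language: the data of each local source is converted to an XML data schema for integration, and local ontologies are built from the XML Schema, which hides the syntactic heterogeneity of the underlying data sources.
     (4) Exploiting the strength of ontologies in describing domain concepts, the global and local ontologies are built in the ontology description language OWL. Mappings between the global ontology and the local ontologies, together with mapping rules between the local ontologies and the data sources, are defined to resolve semantic heterogeneity in data integration.
     (5) The wrapper of each heterogeneous data source is encapsulated as a Web Service, which makes the system loosely coupled, flexible, and easy to extend, so that heterogeneous data sources can be integrated seamlessly.
     (6) XQuery is used as the query language over the global schema, which makes querying XML data straightforward. A global query posed against the global schema (the global ontology) is decomposed into sub-queries expressed in the terms of the local ontologies.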
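Item (3) derives a local ontology from the XML Schema of each source. The abstract does not give the derivation rules, so the following is only a minimal sketch under an assumed rule: each top-level element with a complex type becomes a candidate concept, and its child elements become candidate properties. The schema text and the name `extract_concepts` are hypothetical and are not taken from the thesis.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by ElementTree for XML Schema elements.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical XML Schema for one local data source (illustrative only).
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="price" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def extract_concepts(schema_text):
    """Return {concept: [(property, declared type), ...]}.

    Assumed rule: a top-level element with a complexType is a concept,
    and every element nested inside that complexType is one of its properties.
    """
    root = ET.fromstring(schema_text)
    concepts = {}
    for element in root.findall(XS + "element"):
        complex_type = element.find(XS + "complexType")
        if complex_type is None:
            continue  # simple top-level elements are not treated as concepts here
        properties = [
            (child.get("name"), child.get("type"))
            for child in complex_type.iter(XS + "element")
        ]
        concepts[element.get("name")] = properties
    return concepts


if __name__ == "__main__":
    print(extract_concepts(SCHEMA))
```

The resulting concept and property names would still have to be written out as OWL classes and properties and aligned with the global ontology. That step is not shown here.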
With the rapid development of the Web, the Internet holds more and more resources and has become a large, global information warehouse. Web resources include not only relational databases but also HTML and XML documents. These scattered resources were designed to satisfy individual business needs, and differences in software and hardware platforms have made the data sources heterogeneous. Heterogeneous data sources are hard to share and use, so solving the problem of integrating them has become increasingly important. Traditional data integration does not meet this demand, however, and new approaches to data integration are needed.
     This paper analyzes different data integration methods and the relevant theories and technologies of heterogeneous data integration, and concludes that the semantic heterogeneity of data sources needs to be resolved. To resolve it, a heterogeneous data integration system framework based on Ontology and XML is proposed. The key technologies, XML, Ontology, and Web Services, are discussed. The paper mainly introduces the concepts of Ontology and XML to realize the integration of heterogeneous data.
     The main achievements of this paper are as follows:
     (1) The relevant theories and technologies of heterogeneous data integration are discussed. The advantages of these technologies in heterogeneous data integration are analyzed, leading to the conclusion that the semantic heterogeneity of data sources needs to be resolved.
     (2) Based on XML, Ontology, and Web Services technologies, a heterogeneous data integration system framework based on Ontology and XML is proposed. The functions of its components are designed in detail, and the key modules of the prototype system are tested.
     (3) XML Schema is used to describe the data sources, and the local ontology is built from the XML Schema. XML thereby addresses the problem of syntactic heterogeneity.
     (4) The global ontology, the local ontology, the mapping between the global ontology and the local ontology, and the mapping between the local ontology and the data source are defined and expressed in OWL. An integration scheme based on the mapping between the global ontology and the local ontology is put forward to solve the problem of semantic heterogeneity.
     (5) Wrapping the data sources as Web Services gives the prototype loose coupling, greater flexibility, and greater extensibility. With this method, seamless data integration can be achieved.
     (6) XQuery is used to express the global query, which makes XML easy to query. The global query is then decomposed into sub-queries over the local ontologies.
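The decomposition step in item (6) can be illustrated with a small sketch. It is not the thesis's algorithm, and it uses plain Python dictionaries in place of OWL mappings and XQuery expressions. The mapping table, the source names (`library_db`, `shop_xml`), and the function `decompose` are all hypothetical. The idea shown is only that each term of a global query is rewritten into the local ontology terms of every source that maps it.

```python
# Hypothetical mapping: global ontology property -> {source: local ontology property}.
# All names are illustrative assumptions, not taken from the thesis.
GLOBAL_TO_LOCAL = {
    "Book.title": {"library_db": "Item.name", "shop_xml": "Product.title"},
    "Book.author": {"library_db": "Item.writer", "shop_xml": "Product.creator"},
    "Book.price": {"shop_xml": "Product.price"},
}


def decompose(global_properties):
    """Split a global query, given as a list of global properties, by source.

    Returns {source: [(global property, local property), ...]}.
    Properties with no mapping are collected under the key None.
    """
    sub_queries = {}
    for prop in global_properties:
        targets = GLOBAL_TO_LOCAL.get(prop)
        if not targets:
            sub_queries.setdefault(None, []).append((prop, None))
            continue
        for source, local_prop in targets.items():
            sub_queries.setdefault(source, []).append((prop, local_prop))
    return sub_queries


if __name__ == "__main__":
    for source, terms in decompose(["Book.title", "Book.price"]).items():
        print(source, terms)
```

In a system of the kind the abstract describes, each per-source list would presumably be turned into a sub-query and sent to that source's wrapper Web Service, with the results merged under the global terms.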
