Research on Uncertain Schema Matching for the Deep Web Based on Domain Ontology
Abstract
As Internet technology has developed, network information resources have multiplied, and how to make use of them has drawn the attention of ordinary users and academic researchers alike. By distribution and location, Web information resources can be divided into two parts, the Surface Web and the Deep Web. Traditional search engines can retrieve only Surface Web information. They cannot effectively crawl Deep Web databases, which hold more information, of higher quality, with a narrower topical focus and a stronger structure.
     Deep Web information integration is an important means of using Deep Web information resources effectively. Query interface integration is the core of information integration research and plays an important linking role between the earlier and later stages of integration. Current research on query interface integration has several problems: Chinese semantic computation is not accurate enough, schema matching methods for query interfaces are complicated, time complexity is high, and the uncertainty of schema matching receives little consideration. To address these shortcomings, this paper proposes a query interface integration method based on domain ontology. It is a holistic matching method that removes the efficiency bottleneck of traditional pairwise matching and greatly simplifies the matching process. The paper also proposes a selection criterion for uncertain matchings, which opens a new line of approach for research on uncertain matching. The main work and contributions of this paper are as follows:
     (1) This paper introduces the fundamentals of ontology and analyzes the structure of a domain ontology. Following domain ontology construction methods, and drawing on the attributes and instance features of Deep Web query interfaces in the tourism domain, it builds a query-interface-oriented tourism domain ontology encoded in OWL 2, a more standardized and more expressive ontology language.
     (2) Building on an in-depth study and analysis of traditional schema matching techniques, this paper proposes a schema matching method for Deep Web query interfaces based on domain ontology. The method matches a large number of query interfaces in a given domain holistically and is far more efficient than traditional pairwise matching. It makes full use of the semantic relations between ontology concepts, so that query interfaces are understood at the semantic level.
     (3) For similarity computation, the most important problem in schema matching, this paper proposes an improved attribute similarity measure. The measure targets schema matching in Chinese query interface integration. Taking into account the patterns and characteristics of attribute names in Chinese query interfaces, it refines the attribute similarity formula on top of HowNet-based Chinese semantic similarity computation. Experiments show that the formula greatly improves the accuracy of the computation.
     (4) For the evaluation of uncertain schema matchings, this paper proposes judging the credibility of an attribute match from attribute position, and gives a formula that quantifies this credibility, which helps in selecting more reasonable matching results.
     (5) This paper implements an ontology-based query interface integration system comprising an ontology management module, a query interface preprocessing module, a similarity computation module, a schema matching generation module, and a query interface integration module. The implemented system is used to evaluate and validate the key techniques and algorithms of this paper, and it provides a platform for collecting experimental data.
     Finally, experiments are set up on the system platform and their results are analyzed and evaluated. They verify the performance characteristics of the ontology-based schema matching method and the accuracy of the improved attribute similarity measure.
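The holistic matching idea in contribution (2) can be illustrated with a small sketch. It is not the paper's algorithm: the toy ontology, the 0.6 threshold, the function names, and the use of Python's `difflib.SequenceMatcher` as a stand-in for a HowNet-based semantic similarity are all assumptions made for illustration. The point it shows is that each attribute is compared against the ontology once, so attributes from different interfaces are matched through a shared concept rather than through interface-to-interface comparisons.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical toy ontology: concept -> known labels (not the paper's ontology).
ONTOLOGY = {
    "departure_city": ["departure city", "from", "origin"],
    "destination": ["destination", "to", "arrival city"],
    "departure_date": ["departure date", "leave on", "start date"],
}


def label_similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1]; a semantic measure would go here."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_concept(attribute: str, threshold: float = 0.6):
    """Return (concept, score) for the closest concept, or None below threshold."""
    scored = (
        (concept, max(label_similarity(attribute, label) for label in labels))
        for concept, labels in ONTOLOGY.items()
    )
    concept, score = max(scored, key=lambda pair: pair[1])
    return (concept, score) if score >= threshold else None


def holistic_match(interfaces: dict[str, list[str]]) -> dict[str, list[tuple[str, str]]]:
    """Group the attributes of all interfaces by the ontology concept they map to."""
    groups = defaultdict(list)
    for interface, attributes in interfaces.items():
        for attribute in attributes:
            hit = best_concept(attribute)
            if hit is not None:
                groups[hit[0]].append((interface, attribute))
    return dict(groups)


# Two hypothetical query interfaces with their attribute labels.
interfaces = {
    "site_a": ["From", "To", "Departure date"],
    "site_b": ["Origin", "Destination", "Keyword"],
}
groups = holistic_match(interfaces)
for concept, members in groups.items():
    print(concept, members)
```

In this sketch, "From" on one site and "Origin" on the other end up in the same group because both map to the same concept, while "Keyword" stays unmatched because no concept label is close enough. Adding a further interface adds only its own attribute-to-ontology comparisons.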