Research on Key Technologies of Ontology-Based Deep Web Information Integration

Original title: 基于本体的Deep Web信息集成关键技术研究
Author: Fang Wei (方巍)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords: Deep Web ; Information Integration ; Ontology ; Knowledge Representation ; Data Source Discovery ; Data Source Selection ; Information Extraction ; Ontology Mapping
Degree year: 2009
Advisor: Cui Zhiming (崔志明)
Discipline code: 081203
Degree-granting institution: Soochow University (苏州大学)
Submission date: 2009-05-01

Abstract

With the rapid development of the World Wide Web (WWW), the Web, and the Deep Web in particular, holds a vast and varied body of high-value information that continues to grow at a remarkable rate. Information on the Deep Web is heterogeneous, autonomous, and dynamic, and these characteristics mean that traditional structured information integration methods can no longer meet users' needs. To let users access high-value Deep Web information quickly and accurately, ontology-based Deep Web information integration has become a pressing research problem with both theoretical significance and broad application prospects.
     After an in-depth analysis of the current state and development trends of Deep Web information integration research, and building on the earlier work of our research group, this dissertation proposes an ontology-based Deep Web information integration scheme. The scheme comprises a dynamic fuzzy description logic method for representing uncertain Deep Web knowledge, a data source discovery technique based on maximum entropy and ontology, a data source selection method based on a quality estimation model, information extraction based on synchronous annotation across multiple data sources, and a fuzzy ontology mapping method for Deep Web semantic integration. The main research work and contributions are as follows:
     (1) A complete and accurate ontology is a prerequisite for ontology-based Deep Web information integration. This dissertation semi-automatically constructs a Deep Web domain ontology from the characteristics of the Deep Web. To address the problem of representing uncertain knowledge that arises during Deep Web ontology learning and ontology mapping, it proposes a dynamic fuzzy description logic method (DFDLs) for uncertain Deep Web knowledge representation, which compensates for the weakness of traditional description logics in representing uncertain knowledge.
     (2) To address the dynamic nature and sparse distribution of Deep Web data sources, a Deep Web data source discovery method based on a maximum entropy classifier and a domain ontology is proposed. The method first uses the maximum entropy classifier to identify Deep Web query interfaces automatically, and then uses an ontology-based Deep Web focused crawler to discover data sources. The crawler concentrates on links that are likely to lead to Deep Web entry pages and so avoids visiting and downloading unnecessary pages.
     (3) The quality of a Deep Web data source can be assessed through its quality of service. This dissertation proposes a domain-ontology-based quality estimation model for Deep Web data sources and applies it to data source selection. With this model, the data sources that best match the user's needs can be selected, lowering query cost and raising efficiency.
     (4) To address missing interface schemas and result schemas during information extraction, a synchronous annotation method across multiple data sources is proposed. Domain ontology knowledge is learned efficiently from a set of Deep Web interface schemas and result schemas, and synchronous annotation across data sources is achieved through instance queries against the ontology. The method has been applied successfully to extraction from complex Deep Web result pages.
     (5) To address uncertain schema matching in ontology-based Deep Web information integration, the schema matching problem is recast as an ontology mapping problem and a fuzzy ontology mapping framework is proposed. The framework applies several ontology mapping strategies, describes ontology features from multiple aspects and perspectives, uncovers as many candidate mapping relations as possible, and expresses the mapping process in fuzzy terms. The method offers an effective and general automatic mapping strategy for ontology-based Deep Web information integration.
     (6) Design of a Deep Web semantic integration prototype system. Based on the key technologies studied and on practical application requirements, a prototype system was designed and implemented. It provides data source discovery, data source selection, information extraction, and semantic integration. Practical use indicates that the system has some practical value.
     This research was supported by the National Natural Science Foundation of China project "Research on Logical Models for Incomplete Knowledge Processing for the Deep Web" (grant No. 60673092), the Jiangsu Province High-Technology Research Program project "Research on Key Technologies for Deep Web Search and Mining" (grant No. BG2005019), the Jiangsu Province Graduate Research Innovation Program for Higher Education project "Research on Ontology-Based Deep Web Data Source Discovery and Selection Techniques" (grant No. CX08B-099Z), and the 2008 Soochow University Excellent Doctoral Dissertation Topic Selection Program (document No. 苏大研字[2008]22号).
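
As a rough illustration of the idea in contribution (5), the sketch below combines two mapping strategies into a single degree in [0, 1] and keeps the attribute pairs whose degree reaches a threshold. It is not the framework from the dissertation: the two strategies (label similarity and instance overlap), the equal weights, the threshold, and the sample schemas are all assumptions chosen only to make the example concrete.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Strategy 1 (assumed): string similarity of two attribute labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_similarity(xs: set, ys: set) -> float:
    """Strategy 2 (assumed): Jaccard overlap of sample instance values, in [0, 1]."""
    if not xs and not ys:
        return 0.0
    return len(xs & ys) / len(xs | ys)


def mapping_degree(src, tgt, w_name=0.5, w_inst=0.5):
    """Weighted average of both strategies; the weights are illustrative only."""
    (src_name, src_vals), (tgt_name, tgt_vals) = src, tgt
    return (w_name * name_similarity(src_name, tgt_name)
            + w_inst * instance_similarity(src_vals, tgt_vals))


def fuzzy_mappings(source, target, threshold=0.5):
    """Return (source label, target label, degree) for pairs at or above threshold."""
    result = []
    for s_name, s_vals in source.items():
        for t_name, t_vals in target.items():
            degree = mapping_degree((s_name, s_vals), (t_name, t_vals))
            if degree >= threshold:
                result.append((s_name, t_name, round(degree, 3)))
    # Strongest candidate mappings first.
    return sorted(result, key=lambda m: -m[2])


# Hypothetical schemas: attribute label -> sample instance values.
source = {"Author": {"Tolkien", "Austen"}, "Title": {"Emma", "The Hobbit"}}
target = {"Writer": {"Austen", "Tolkien", "Orwell"}, "Book title": {"Emma", "1984"}}

mappings = fuzzy_mappings(source, target, threshold=0.4)
for s_name, t_name, degree in mappings:
    print(f"{s_name} -> {t_name}: degree {degree}")
```

Keeping a graded degree instead of a yes/no decision means that a pair with dissimilar labels, such as "Author" and "Writer", can still surface as a candidate mapping when its instance values overlap strongly.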

