With the development of Internet and Web technology, the WWW has become an enormous information repository. However, with traditional search engines, people cannot easily find the precise information they need. Web information extraction technology emerged against this background.
     Web information extraction has been studied extensively. The main approaches are based on natural language processing, wrapper induction, HTML structure, and ontologies. Ontology-based information extraction relies mainly on the descriptive information of the data itself and depends less on the Web page; an ontology supplies domain concepts and relations in a form machines can understand, and it supports expressive reasoning. Using an ontology in information extraction has further advantages. First, an ontology provides a rich, predefined vocabulary that serves as a stable conceptual interface to data sources and is independent of the data schema. Second, the knowledge represented in the ontology is sufficient to support the conversion of all relevant information sources. Third, an ontology supports consistency management and the identification of inconsistent data.
     Based on the analysis above and the actual needs of our project, this paper proposes an ontology-based method of Web information extraction for the tourism domain, and designs and implements Tourism_IESystem, a prototype platform for extracting tourism information about Guangxi. The main work of this paper is as follows:
     (1) The main methods of domain ontology construction are analyzed and compared. On balance, the tourism ontology is constructed using the method proposed by Mike Uschold and Michael Gruninger. During construction, this paper studies the relations between concepts, the concept hierarchy, concept equivalence, property restrictions, and instance equivalence.
     (2) The Pellet reasoner is introduced and the SHOIQ(D) tableau reasoning algorithm is described. The algorithm is then applied to reasoning over the tourism domain ontology, including checks of ontology consistency, concept subsumption, concept satisfiability, property restrictions, and instances. Finally, an ontology parser built with Jena is described; it extracts the ontology's concepts, keywords, relations, and instances and stores them in a database.
     (3) Building on ontology reasoning and parsing, the paper first describes, based on how a Web page is converted into a DOM tree, an algorithm for extracting the page's text content that uses the keywords of the tourism ontology to locate the information region of the page. Second, it describes Chinese word segmentation using the ICTCLAS segmentation tool together with a tourism domain vocabulary, and analyzes stop-word filtering. Finally, it describes the extraction rules, which are constructed from the semantic features of properties combined with triples.
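     To make the checks listed in step (2) concrete, the following is a minimal Python sketch of concept subsumption and concept satisfiability over told subclass and disjointness axioms. It is not the SHOIQ(D) tableau algorithm and does not use Pellet or Jena; it only follows subclass links transitively. The class `ToyOntology` and the concept names (`Hotel`, `Accommodation`, `ScenicSpot`, and so on) are hypothetical and do not come from the thesis.

```python
from collections import defaultdict


class ToyOntology:
    """Told subclass and disjointness axioms only (no tableau reasoning)."""

    def __init__(self):
        self.parents = defaultdict(set)  # concept -> direct superconcepts
        self.disjoint = set()            # frozenset({a, b}) pairs

    def add_subclass(self, child, parent):
        self.parents[child].add(parent)

    def add_disjoint(self, a, b):
        self.disjoint.add(frozenset((a, b)))

    def ancestors(self, concept):
        """Every concept reachable via subclass axioms, including itself."""
        seen, stack = {concept}, [concept]
        while stack:
            for parent in self.parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def subsumes(self, general, specific):
        """True if `specific` is a (possibly indirect) subclass of `general`."""
        return general in self.ancestors(specific)

    def satisfiable(self, concept):
        """False if the concept inherits from two concepts declared disjoint."""
        inherited = self.ancestors(concept)
        return not any(pair <= inherited for pair in self.disjoint)


# Hypothetical tourism concepts, invented for illustration.
onto = ToyOntology()
onto.add_subclass("Hotel", "Accommodation")
onto.add_subclass("Accommodation", "TourismResource")
onto.add_subclass("ScenicSpot", "TourismResource")
onto.add_disjoint("Accommodation", "ScenicSpot")
onto.add_subclass("HotelAndSpot", "Hotel")       # deliberately inconsistent
onto.add_subclass("HotelAndSpot", "ScenicSpot")

print(onto.subsumes("TourismResource", "Hotel"))
print(onto.satisfiable("HotelAndSpot"))
```

     A description logic reasoner also handles property restrictions, nominals, and datatypes, which this sketch ignores; it only illustrates what a subsumption or satisfiability query asks.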
     Finally, based on the key technologies studied in this paper, Tourism_IESystem, a prototype platform for extracting tourism information about Guangxi, is implemented. The performance of the information extraction system is validated using Web pages from tourism sites as experimental data. The results show that the proposed method is technically feasible and has practical application value.
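     The extraction flow of step (3) can be sketched in a few lines of Python, under several loudly stated assumptions. The sketch approximates the DOM tree by flat block-level text regions, uses a simple regular-expression tokenizer as a stand-in for ICTCLAS, and encodes one hypothetical property rule. The sample page, the keywords, the stop words, the property name `ticketPrice`, and all function names are invented for illustration and are not taken from Tourism_IESystem.

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "td", "p", "li"}   # assumed region boundaries
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Collect the text of each block-level element as a candidate region."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished text regions
        self._buf = []     # text of the region being read
        self._skip = 0     # depth inside script/style

    def _flush(self):
        text = " ".join(" ".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def locate_region(html, keywords):
    """Return the region with the most ontology keyword hits, or ''."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()

    def score(block):
        low = block.lower()
        return sum(low.count(k.lower()) for k in keywords)

    best = max(collector.blocks, key=score, default="")
    return best if best and score(best) > 0 else ""


def tokens(text, stop_words):
    """Stand-in tokenizer (not ICTCLAS) followed by stop-word filtering."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in stop_words]


def extract_triples(subject, text, rules, stop_words):
    """Emit (subject, property, value) when a property's cue word appears."""
    words = tokens(text, stop_words)
    triples = []
    for prop, (cues, value_re) in rules.items():
        match = value_re.search(text)
        if match and any(cue in words for cue in cues):
            triples.append((subject, prop, match.group()))
    return triples


# Invented sample data; a real run would read keywords from the ontology.
PAGE = """<html><body><div>Home | Login | About</div>
<div>Example Park is a scenic spot. The ticket price is 30 yuan.</div>
<script>var price = 1;</script></body></html>"""
KEYWORDS = ["scenic spot", "ticket", "price"]
STOP_WORDS = {"is", "a", "the"}
RULES = {"ticketPrice": (("ticket", "price"), re.compile(r"\d+(?:\.\d+)?"))}

region = locate_region(PAGE, KEYWORDS)
triples = extract_triples("Example Park", region, RULES, STOP_WORDS)
print(region)
print(triples)
```

     Scoring regions by keyword hits keeps navigation bars and scripts out of the rule-matching stage, which is the role the abstract assigns to ontology keywords; how the actual system scores regions or formulates rules is not specified in the text above.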
