Design and Implementation of an Ontology-Based Vertical Search System
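Abstract
With the rapid development of information technology, the network has become the main means by which people obtain information; while it brings convenience, it also brings confusion. Information on the Internet is vast, voluminous, and loosely organized, and people often spend a great deal of time searching for a specific topic in order to get their work done. Information retrieval has therefore become an indispensable network service in work and daily life. As its use has spread, a large body of latent demand has driven rapid progress in information query technology. The performance of its main supporting tool, the search engine, strongly affects how quickly people can find useful information, so how to optimize search engine performance has attracted considerable attention in this field.
     The sheer volume of information on the Web has caused "information overload", and retrieving information quickly and accurately has become an urgent problem. Drawing on software engineering, this thesis analyzes the current state and shortcomings of search engines in China and abroad, studies the Semantic Web and ontologies and their application to information retrieval, and examines vertical search engine design theory in depth. A vertical search engine finely mines, classifies, and filters the content of a specific professional domain or industry, yielding an optimized information retrieval model. It excludes items unrelated to what the user is searching for, raises retrieval precision, and locates information more exactly, so within a specific domain or industry vertical search can be expected to deliver better service and to be better received by users.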
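     The focus of this thesis is an ontology-based information retrieval framework built on vertical search engine theory, with a topic-oriented search engine information extraction system designed around specific algorithms. Based on this analysis, the thesis proposes an ontology-based personalized search engine design that avoids problems common in current search engines. It draws on knowledge of the user's personal interests, consults a shared ontology knowledge base to analyze queries semantically, and filters the query results, achieving a basic form of personalized search.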
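The idea of analyzing a query against a shared ontology and then filtering results by user interest can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the thesis's actual algorithm, so the class `OntologyQueryFilter`, the toy `ONTOLOGY` map, and the substring-based matching are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OntologyQueryFilter {
    // Hypothetical shared ontology: a concept mapped to its narrower terms.
    static final Map<String, Set<String>> ONTOLOGY = Map.of(
        "camera", Set.of("dslr", "mirrorless"),
        "lens", Set.of("prime", "zoom"));

    // Semantic analysis, reduced here to expanding the query with ontology terms.
    static Set<String> expand(String query) {
        String q = query.toLowerCase();
        Set<String> terms = new TreeSet<>();
        terms.add(q);
        terms.addAll(ONTOLOGY.getOrDefault(q, Set.of()));
        return terms;
    }

    // Keep results that mention an expanded term and, if interests are given,
    // at least one of the user's interest terms (assumed lowercase).
    static List<String> filter(List<String> results, Set<String> terms, Set<String> interests) {
        List<String> kept = new ArrayList<>();
        for (String result : results) {
            String text = result.toLowerCase();
            boolean onTopic = terms.stream().anyMatch(text::contains);
            boolean ofInterest = interests.isEmpty()
                || interests.stream().anyMatch(text::contains);
            if (onTopic && ofInterest) {
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> results = List.of(
            "DSLR buying guide", "DSLR repair manual", "Garden tools");
        System.out.println(filter(results, expand("camera"), Set.of("repair")));
    }
}
```

A real system would replace the substring checks with proper tokenization and would load the ontology from a knowledge base rather than a hard-coded map.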
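     The target system consists of page acquisition, web page preprocessing, topic information extraction, and query content matching. The framework is built in a Java environment and developed with the MVC pattern and Struts; the ontology is built with the Protégé ontology modeling tool and described in XML. Starting from practical application, the system classifies content by the topic being searched, crawls relevant pages from the Web more effectively, then organizes and consolidates them, extracts the useful information, and presents it to the user in a suitable form. Application results show that the search engine information extraction system developed in this thesis greatly improves the accuracy and timeliness of the information obtained.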
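The four stages named above (page acquisition, preprocessing, topic information extraction, query matching) might be wired together along these lines. This is a minimal sketch, not the thesis's implementation: the class `VerticalSearchPipeline` and its methods are hypothetical, page acquisition is replaced by an in-memory map of URL to HTML, and topic extraction is reduced to a keyword test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VerticalSearchPipeline {
    // Preprocessing: strip tags, collapse whitespace, lowercase.
    static String preprocess(String html) {
        return html.replaceAll("<[^>]*>", " ")
                   .replaceAll("\\s+", " ")
                   .trim()
                   .toLowerCase();
    }

    // Topic extraction, simplified: a page is on topic if it mentions any topic term.
    static boolean isOnTopic(String text, Set<String> topicTerms) {
        for (String term : topicTerms) {
            if (text.contains(term.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    // Query matching: URLs of on-topic pages whose text contains the query.
    // The pages map stands in for the page acquisition stage (no network access).
    static List<String> search(Map<String, String> pages, Set<String> topicTerms, String query) {
        String q = query.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            String text = preprocess(page.getValue());
            if (isOnTopic(text, topicTerms) && text.contains(q)) {
                hits.add(page.getKey());
            }
        }
        Collections.sort(hits); // deterministic order for display
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> pages = Map.of(
            "a.html", "<h1>Ontology tools</h1><p>Protege editor</p>",
            "b.html", "<p>Cooking editor tips</p>",
            "c.html", "<p>Semantic web ontology</p>");
        System.out.println(search(pages, Set.of("ontology"), "editor"));
    }
}
```

Filtering by topic before matching the query is what distinguishes the vertical approach here: off-topic pages never reach the matching stage, whatever the query.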
With the explosive development of information technology, the network has become the primary way for people to obtain information. It brings convenience but also confusion. Because information on the Internet is vast and unsystematic, finding useful information often costs people a great deal of time. As a result, information retrieval has become a necessary tool. As its use has spread, information query technology has evolved rapidly. The performance of the search engine has a large influence on how quickly useful information can be found on the Internet.
     The massive amount of information on the Internet causes "information overload", and searching for information quickly and accurately is becoming an urgent problem. Aimed at a given domain or industry, a vertical search engine analyzes content in depth, classifies it finely, and filters it by rule, discarding information unrelated to the search. It therefore achieves higher precision and locates information more exactly. Vertical search can thus be expected to perform better in domain-specific search and to be more popular there than other search engines.
     After analyzing the current state and deficiencies of domestic and foreign search engines, the basics of the Semantic Web and ontology theory, and examples of their application, the paper proposes an ontology-based information search framework from the standpoint of practical application: it crawls relevant web pages more effectively according to the topic of the search content, extracts useful information after trimming irrelevant material, and then presents the selected information to the user. The key point of the paper is the design of the information extraction module of a topic-oriented search engine system using specific algorithms; the module is composed of page acquisition, page preprocessing, topic information extraction, and query content matching. Practical use shows that the ontology-based search engine with information extraction designed in the paper greatly improves the accuracy and speed of obtaining information.
     The system frame is built in a Java environment and developed with MVC and Struts; the ontology is built with the Protégé ontology modeling tool and described in XML. Starting from practical application, the system classifies content by search topic, crawls relevant pages from the Web more effectively, gathers the useful information, and finally presents the search results to the user in a suitable form.
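As an illustration of describing an ontology in XML and reading it from Java, the sketch below parses a small concept hierarchy with the JDK's built-in DOM parser. The XML layout (`<concept name="..." parent="..."/>`), the sample concepts, and the class `XmlOntologyLoader` are assumptions made for this example; the abstract does not state the schema the thesis used or what Protégé exported.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmlOntologyLoader {
    // Hypothetical layout: each <concept> names itself and, optionally, its parent.
    static final String SAMPLE =
        "<ontology>"
        + "<concept name=\"Device\"/>"
        + "<concept name=\"Camera\" parent=\"Device\"/>"
        + "<concept name=\"DSLR\" parent=\"Camera\"/>"
        + "</ontology>";

    // Parse the XML into a concept -> parent map ("" means no parent).
    static Map<String, String> loadParents(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("concept");
            Map<String, String> parents = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element concept = (Element) nodes.item(i);
                parents.put(concept.getAttribute("name"), concept.getAttribute("parent"));
            }
            return parents;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse ontology XML", ex);
        }
    }

    // Walk up the hierarchy; assumes the parent links contain no cycles.
    static boolean isA(Map<String, String> parents, String concept, String ancestor) {
        String current = concept;
        while (current != null && !current.isEmpty()) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = parents.get(current);
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> parents = loadParents(SAMPLE);
        System.out.println(isA(parents, "DSLR", "Device"));
    }
}
```

A subsumption check like `isA` is one simple way a search component could decide that a page about a narrower concept is relevant to a query on a broader one.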
