Research and Implementation of Ontology-Based Text Content Relevance
Abstract
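In today's Internet era, users can obtain large amounts of information quickly and conveniently, but not all of that information is relevant to their needs, and the irrelevant content greatly reduces the efficiency with which users find what they are looking for. Exploring new retrieval models that further improve retrieval relevance, so that users can obtain the information they need quickly and accurately, is therefore an inevitable direction for information retrieval research. Ontology, a key technology of the Semantic Web, has been a research hotspot in recent years. It provides a well-structured hierarchy of concepts and supports logical reasoning; by modeling domain knowledge, it expresses semantic knowledge that machines can understand and thereby enables content-based retrieval. Applying ontologies can therefore have a considerable positive effect on retrieval relevance.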
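The paragraph above credits ontologies with a concept hierarchy and support for logical reasoning. The simplest instance of both is transitive "is-a" inference: if A is-a B and B is-a C, a reasoner can conclude A is-a C without that fact being stated. The Java sketch below shows only this idea, using plain collections rather than any ontology toolkit; the class `ConceptHierarchy` and the example concepts are hypothetical and are not taken from the thesis.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy ontology: concept names and is-a edges are illustrative only.
class ConceptHierarchy {
    // Asserted "is-a" edges: concept -> its immediate parent concepts.
    private final Map<String, Set<String>> parents = new HashMap<>();

    // Records the assertion "child is-a parent".
    void addIsA(String child, String parent) {
        parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    // Infers every ancestor by following is-a edges transitively.
    // result.add(...) returns false for concepts already seen, which keeps
    // the walk finite even if the asserted edges contain a cycle.
    Set<String> ancestors(String concept) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(parents.getOrDefault(concept, Set.of()));
        while (!pending.isEmpty()) {
            String c = pending.pop();
            if (result.add(c)) {
                pending.addAll(parents.getOrDefault(c, Set.of()));
            }
        }
        return result;
    }

    // True if "concept is-a ancestor" follows from the asserted edges.
    boolean isA(String concept, String ancestor) {
        return ancestors(concept).contains(ancestor);
    }

    public static void main(String[] args) {
        ConceptHierarchy h = new ConceptHierarchy();
        h.addIsA("Laptop", "Computer");
        h.addIsA("Computer", "ElectronicDevice");
        // Never asserted directly; inferred through Computer.
        System.out.println(h.isA("Laptop", "ElectronicDevice")); // prints true
    }
}
```

A real ontology language such as OWL supports far richer inference (property restrictions, equivalence, disjointness); the transitive walk here covers only the subclass hierarchy.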
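     By investigating ontology and relevance and combining the two, this thesis designs an ontology-based semantic query expansion algorithm. The algorithm semantically expands the query terms entered by the user in order to improve both recall and precision. The thesis also proposes a multi-factor weighted document ranking algorithm that orders documents by relevance, substantially improving the relevance of the retrieval results to the query terms.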
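The abstract does not give the expansion rules, so the following is only a minimal sketch of what ontology-based query expansion commonly looks like, under explicit assumptions: each query term is expanded with its synonyms and its direct sub-concepts, and each added term receives a lower weight than the original. The class name `QueryExpander` and the weights 0.9 and 0.6 are hypothetical choices for illustration, not values from the thesis.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical ontology-based query expansion; the weights are assumptions.
class QueryExpander {
    static final double ORIGINAL_WEIGHT = 1.0;
    static final double SYNONYM_WEIGHT = 0.9;   // assumed
    static final double SUBCLASS_WEIGHT = 0.6;  // assumed

    private final Map<String, Set<String>> synonyms = new HashMap<>();
    private final Map<String, Set<String>> subclasses = new HashMap<>();

    // Synonymy is symmetric, so it is recorded in both directions.
    void addSynonym(String a, String b) {
        synonyms.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        synonyms.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Records that `child` is a narrower concept of `parent`.
    void addSubclass(String parent, String child) {
        subclasses.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    // Returns expanded terms with weights. A term reachable by several
    // routes (e.g. both typed by the user and a synonym) keeps its highest weight.
    Map<String, Double> expand(List<String> queryTerms) {
        Map<String, Double> expanded = new LinkedHashMap<>();
        for (String term : queryTerms) {
            expanded.merge(term, ORIGINAL_WEIGHT, Math::max);
            for (String s : synonyms.getOrDefault(term, Set.of())) {
                expanded.merge(s, SYNONYM_WEIGHT, Math::max);
            }
            for (String c : subclasses.getOrDefault(term, Set.of())) {
                expanded.merge(c, SUBCLASS_WEIGHT, Math::max);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        QueryExpander qe = new QueryExpander();
        qe.addSynonym("computer", "PC");
        qe.addSubclass("computer", "laptop");
        System.out.println(qe.expand(List.of("computer"))); // prints {computer=1.0, PC=0.9, laptop=0.6}
    }
}
```

Expanding downward (to narrower concepts) rather than upward is a deliberate choice in this sketch: a document about laptops is usually relevant to a query for "computer", while adding a broad parent such as "device" tends to pull in noise and hurt precision.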
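     Building on the ontology-based semantic query expansion algorithm, an ontology-based search system was designed and implemented in Java on the Eclipse platform. It consists of six parts: a user interface, a query request processing module, an ontology processing module, a network resource preprocessing module, a retrieval module, and a query result processing module. Experiments show that, with all modules working together, the system returns fairly accurate query results.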
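The abstract names the six modules but not their interfaces. As a rough illustration of how they could fit together, here is a hypothetical Java wiring in which each module is reduced to one static method. Everything below is an assumption made for the sketch: the ontology is a plain term-to-related-terms map, resource preprocessing builds a simple inverted index, and result processing orders documents by the number of matched terms.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical pipeline; each method stands in for one module named in the abstract.
class SearchPipeline {
    // Network resource preprocessing module: inverted index (term -> document ids).
    static Map<String, Set<Integer>> preprocess(List<String> documents) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < documents.size(); id++) {
            for (String token : documents.get(id).toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Query request processing module: normalizes raw user input into terms.
    static List<String> parseQuery(String rawQuery) {
        return Arrays.stream(rawQuery.toLowerCase().split("\\W+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Ontology processing module: adds related terms from a (simplified) ontology map.
    static Set<String> expand(List<String> terms, Map<String, List<String>> ontology) {
        Set<String> out = new LinkedHashSet<>(terms);
        for (String t : terms) {
            out.addAll(ontology.getOrDefault(t, List.of()));
        }
        return out;
    }

    // Retrieval module: counts how many expanded terms each document contains.
    static Map<Integer, Integer> retrieve(Set<String> terms, Map<String, Set<Integer>> index) {
        Map<Integer, Integer> hits = new HashMap<>();
        for (String t : terms) {
            for (int id : index.getOrDefault(t, Set.of())) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        return hits;
    }

    // Query result processing module: more matches first, ties broken by document id.
    static List<Integer> rank(Map<Integer, Integer> hits) {
        return hits.entrySet().stream()
                .sorted(Map.Entry.<Integer, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // User interface module: here simply the entry point that chains the other five.
    static List<Integer> search(String rawQuery, List<String> docs, Map<String, List<String>> ontology) {
        return rank(retrieve(expand(parseQuery(rawQuery), ontology), preprocess(docs)));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("ontology based retrieval", "relevance ranking of documents",
                "ontology and relevance", "semantic web");
        Map<String, List<String>> ontology = Map.of("ontology", List.of("semantic"));
        System.out.println(search("Ontology, relevance!", docs, ontology)); // prints [2, 0, 1, 3]
    }
}
```

In the example run, document 3 ("semantic web") is returned only because the ontology module added "semantic" to the query, which is the kind of recall gain query expansion is meant to provide.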
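     By expanding query terms through an ontology knowledge base and ranking documents with the multi-factor weighted ranking algorithm, this work improves the relevance of search results to the query terms as well as recall and precision. The search system it presents, which uses query expansion and relevance ranking to improve retrieval performance, offers a useful reference for building intelligent information retrieval systems.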
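The abstract does not list the factors used by the multi-factor weighted ranking algorithm. One common way to realize such a scheme is a linear combination, score(d) = w1·f1(d) + w2·f2(d) + ... + wk·fk(d), where each factor fk is normalized to [0, 1]. The Java sketch below assumes three hypothetical factors (a content match weighted by the expansion weights, a title match, and a precomputed static quality score) and assumed weights of 0.5, 0.3 and 0.2. None of these are the thesis's actual factors or weights.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical multi-factor weighted ranking; factors and weights are assumptions.
class WeightedRanker {
    static final class Doc {
        final String id, title, body;
        final double quality; // assumed precomputed quality score in [0, 1]

        Doc(String id, String title, String body, double quality) {
            this.id = id;
            this.title = title;
            this.body = body;
            this.quality = quality;
        }
    }

    // Assumed weights; they sum to 1, so the final score stays in [0, 1].
    static final double W_CONTENT = 0.5, W_TITLE = 0.3, W_QUALITY = 0.2;

    private static List<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Content factor: sum of expansion weights of matching body tokens, divided by body length.
    static double contentFactor(Doc d, Map<String, Double> terms) {
        List<String> toks = tokens(d.body);
        if (toks.isEmpty()) return 0.0;
        double sum = 0.0;
        for (String t : toks) sum += terms.getOrDefault(t, 0.0);
        return sum / toks.size();
    }

    // Title factor: highest expansion weight of any query term found in the title.
    static double titleFactor(Doc d, Map<String, Double> terms) {
        double best = 0.0;
        for (String t : tokens(d.title)) best = Math.max(best, terms.getOrDefault(t, 0.0));
        return best;
    }

    // Linear combination of the three factors.
    static double score(Doc d, Map<String, Double> terms) {
        return W_CONTENT * contentFactor(d, terms)
                + W_TITLE * titleFactor(d, terms)
                + W_QUALITY * d.quality;
    }

    // Returns document ids, highest score first.
    static List<String> rank(List<Doc> docs, Map<String, Double> terms) {
        return docs.stream()
                .sorted(Comparator.comparingDouble((Doc d) -> score(d, terms)).reversed())
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> terms = Map.of("ontology", 1.0, "semantic", 0.9);
        List<Doc> docs = List.of(
                new Doc("A", "ontology retrieval", "ontology ontology search", 0.1),
                new Doc("B", "cooking", "recipes and ontology", 0.9),
                new Doc("C", "misc", "nothing relevant here", 0.5));
        System.out.println(rank(docs, terms)); // prints [A, B, C]
    }
}
```

In the example, document B has the highest quality score but only a passing body match, so A (about 0.65) still outranks B (about 0.35): content and title together carry 80% of the assumed weight.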
In the Internet era, users can easily obtain abundant information, but much of it is not related to their needs. It is therefore necessary to look for new retrieval models that improve relevance and help people overcome information overload. Ontology is a key technology of the Semantic Web and has been a research hotspot in recent years; it provides a concept hierarchy and supports logical reasoning. By modeling domain knowledge, it conveys semantic knowledge that machines can readily understand and enables content-based retrieval. The application of ontologies can therefore have a considerable positive impact on information retrieval.
     Drawing on a body of related work on ontology and relevance, this thesis combines the two technologies and designs a semantic query expansion algorithm based on a domain ontology. The algorithm expands the query terms entered by the user in order to improve recall and precision. The thesis also ranks query results with a multi-factor weighted ranking algorithm, which substantially improves the relevance between query terms and retrieval results.
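Recall and precision, the two measures the algorithm aims to improve, have standard set-based definitions for a single query: precision = |relevant ∩ retrieved| / |retrieved| and recall = |relevant ∩ retrieved| / |relevant|. The small Java helper below computes both. The class name `RetrievalMetrics` is hypothetical, and returning 0 for an empty denominator is a convention chosen for the sketch, not something the abstract specifies.

```java
import java.util.HashSet;
import java.util.Set;

// Standard set-based precision and recall for one query (hypothetical helper class).
final class RetrievalMetrics {
    private RetrievalMetrics() {}

    // Number of retrieved documents that are also relevant: |relevant ∩ retrieved|.
    private static int hits(Set<String> retrieved, Set<String> relevant) {
        Set<String> both = new HashSet<>(retrieved);
        both.retainAll(relevant);
        return both.size();
    }

    // precision = |relevant ∩ retrieved| / |retrieved|; 0 when nothing is retrieved (convention).
    static double precision(Set<String> retrieved, Set<String> relevant) {
        return retrieved.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / retrieved.size();
    }

    // recall = |relevant ∩ retrieved| / |relevant|; 0 when no document is relevant (convention).
    static double recall(Set<String> retrieved, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        Set<String> retrieved = Set.of("d1", "d2", "d3", "d4");
        Set<String> relevant = Set.of("d1", "d3", "d5");
        System.out.println("precision = " + precision(retrieved, relevant));
        System.out.println("recall    = " + recall(retrieved, relevant));
    }
}
```

In the example, two of the four retrieved documents are relevant, so precision is 2/4 = 0.5, and two of the three relevant documents were found, so recall is 2/3 (about 0.67). Query expansion usually aims to raise recall by retrieving documents such as d5 that use related wording, while weighted ranking aims to keep the relevant documents near the top so that precision over the first results does not suffer.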
     Based on the semantic expansion algorithm, an information retrieval system was designed and built in Java on the Eclipse platform. It consists of six parts: a user interface, a query processing module, an ontology processing module, an Internet resource preprocessing module, a search module, and a query result processing module. Experiments show that, with the six modules working together, the system returns accurate results.
     This thesis expands the initial query terms using an ontology knowledge base and applies a multi-factor weighted ranking algorithm to improve the recall and precision of the search system. It demonstrates an information retrieval system whose performance is improved by semantic expansion and relevance ranking, providing a useful reference for building intelligent information retrieval systems.