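Study on Semantic Retrieval Based on Forestry Scientific Data

Original title: 基于林业科学数据的语义检索研究
Author: Zhang Naijing (张乃静)
Thesis level: Doctoral
Discipline: Forest Management
Keywords: Forestry Scientific Data ; Ontology ; Semantic Retrieval ; Query Expansion ; Semantic Web
Degree year: 2013
Supervisor: Ju Hongbo (鞠洪波)
Discipline code: 090704
Degree-granting institution: Chinese Academy of Forestry (中国林业科学研究院)
Thesis submission date: 2013-05-01
Chair of the defense committee: Wang Guangxing (王广兴)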

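Abstract

With advances in technology and changes in thinking, the Web has become one of the main sources from which people obtain information, and the volume of information it carries is growing explosively. While it gives people a great deal of information, it also makes it hard to retrieve exactly the information needed. Giving the Web semantic information, treating it as a knowledge-based platform for resource sharing, and letting people obtain information more conveniently and quickly is an inevitable trend in the Web's development.
     The Scientific Data Sharing Project is an important part of building the national science and technology innovation system, and also an important component of the basic-conditions platform for China's science and technology development. Forestry scientific data sharing is one part of that project. Its portal, the Forestry Scientific Data Center (林业科学数据中心), has been deepened and extended continuously over more than ten years of construction and service; its reach keeps widening and its data volume keeps growing. Faced with so much forestry scientific data, helping users find what they need more quickly and conveniently is a goal the platform keeps pursuing. To address the problems of traditional information retrieval, this dissertation attempts to mine, from a semantic perspective, the information and regularities hidden behind the data, in order to give users a higher-quality data service.
     Semantic information retrieval is a new technology that combines traditional information retrieval methods with domain-ontology knowledge management, data mining, and natural language processing. This dissertation studies ontology-based semantic information retrieval in depth. Taking the forestry scientific data ontology as its basis, it proposes a semantic information retrieval model for forestry scientific data and, from a system perspective, analyzes and studies the main techniques: the ontology knowledge model, semantic preprocessing of documents, semantic query expansion, and semantic retrieval. The main contents and conclusions are as follows:
     (1) Guided by the theory and techniques of ontology construction, an ontology model of forestry scientific data was built. The selection of the concept set, the main relations among core concepts, and the properties and the relations among properties are described in detail. This provides an important foundation for semantic information retrieval based on the forestry scientific data ontology.
     (2) The Semantic Web framework was studied, and methods for maintaining, storing, reasoning over, and querying the forestry scientific data ontology knowledge model were described and analyzed. Comparative study showed that TDB persistent storage of the ontology is more efficient than a relational database; in the experiments the former stored the ontology up to 60 times more efficiently than the latter. Likewise, combining Jena and Pellet reasoning to infer statement triples over the forestry scientific data ontology was more than 10% more efficient than using either reasoner alone.
     (3) Semantic preprocessing of documents was studied. Through analysis of the existing forestry scientific data, a domain dictionary of more than 70,000 specialist terms was built, which improved word-segmentation accuracy. The feature weights of terms in documents were represented in a vector space; a set of feature concepts was extracted from the forestry scientific data ontology and used as cluster centers; with cosine similarity as the distance function, documents were clustered using an improved k-means model; and inverted-index methods for the clustered documents were analyzed. Experiments showed that the clustering results were 81.4% correct.
     (4) A semantic query expansion method was proposed. User queries are divided into three cases for analysis and processing: single keywords, multiple keywords, and interrogative sentences. Single-keyword queries are expanded using an improved semantic similarity; multi-keyword queries use a query expansion method combining semantic reasoning with semantic similarity; for interrogative sentences, a query expansion method combining syntactic analysis with semantic reasoning is proposed on an exploratory basis. These semantic query expansion methods are the core of the semantic information retrieval implementation.
     (5) Building on the theory and research described above, a semantic information retrieval system for forestry scientific data was designed and developed with the Semantic Web framework, implementing semantic querying of information. It was compared experimentally with a traditional retrieval model based on keyword matching. The results show that the semantic retrieval method built in this dissertation outperforms the traditional retrieval method in both recall and precision.
     Research on semantic information retrieval has not only important theoretical value but also practical application value. Centered on the eight major categories of data currently held by the Forestry Scientific Data Center, this dissertation studies and explores semantic retrieval of forestry scientific data in depth. Through research on ontology theory, a forestry scientific data ontology was built, creating the conditions for sharing and reusing knowledge models in the forestry domain. A general method for using an ontology to implement semantic retrieval of forestry scientific data was also explored. On that basis, and in combination with network computing technology, a semantic retrieval system for forestry scientific data was designed, developed, and evaluated, providing a theoretical basis and technical support for sharing massive forestry scientific data at the semantic level. The implementation of the semantic retrieval system also offers a new approach to forestry scientific data sharing and can serve as a reference for related research on other data sharing platforms.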
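
Item (3) describes clustering documents in a vector space, with concept sets taken from the ontology as the initial cluster centers and cosine similarity as the distance function. The abstract does not say how the k-means model was improved, so the following is only a minimal sketch of plain seeded k-means with cosine similarity. The function names, the seed vectors, and the four toy documents are all hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def seeded_kmeans(docs, seeds, max_iter=20):
    """Assign each document to its most similar center, starting from the seeds.

    Returns a list with the cluster index of every document.
    """
    centers = [dict(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centers)), key=lambda k: cosine(d, centers[k]))
            for d in docs
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [d for d, lab in zip(docs, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = mean_vector(members)
    return labels

# Hypothetical seed concepts and documents (term -> weight).
seeds = [{"forest": 1.0, "tree": 1.0}, {"soil": 1.0, "nutrient": 1.0}]
docs = [
    {"forest": 2.0, "tree": 1.0, "inventory": 1.0},
    {"soil": 3.0, "nutrient": 1.0},
    {"tree": 1.0, "inventory": 2.0},
    {"nutrient": 2.0, "soil": 1.0, "moisture": 1.0},
]
labels = seeded_kmeans(docs, seeds)
print(labels)
```

Starting from concept-derived seeds rather than random centers makes the result deterministic and ties each cluster to a named concept, which is presumably why the thesis draws its centers from the ontology.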

English abstract

As technology advances and ideas change, the Web has become one of the main sources of information today. The amount of information on the Web is increasing dramatically; the Web brings people a great deal of information, but it has also made it difficult to search accurately for the information needed. Endowing the Web with semantic information and using it as a knowledge-based resource-sharing platform, so that people can easily obtain relevant knowledge and information through it, is a clear trend.
     The scientific data sharing project is an important part of the construction of the national scientific and technological innovation system, as well as of the basic platform for national scientific and technological development. Sharing of forestry scientific data is one part of this project. As its portal site, the Scientific Data Center of Forestry has been greatly developed and expanded during more than 10 years of construction and service. More and more fields have been added and the amount of data has increased greatly. Faced with such a large amount of forestry scientific data, making the search for information faster and easier has become the goal in developing such a system. Focusing on the existing problems in traditional information retrieval, we tried to unearth the information and rules behind the data in terms of semantics, so as to provide users with a high-quality service.
     Semantic information retrieval is a new technology that combines traditional information retrieval with ontology knowledge management, data mining and natural language processing. In this dissertation, ontology-based semantic information retrieval was studied and a semantic information retrieval model for forestry scientific data was proposed. The critical technologies, namely the ontology knowledge model, document semantic pre-processing, semantic query expansion and semantic retrieval, were analyzed systematically. The significant findings are:
     (1) The ontology model of forestry scientific data was developed based on the theory and technology of ontology construction. The selection of the concept set and the relationships among the core concepts are described in detail, so as to provide an important foundation for the semantic information retrieval of forestry data.
     (2) The semantic web framework was studied, and the methods of maintaining, storing, reasoning over and querying the ontology knowledge model of forestry scientific data were analyzed and explored. Results showed that, in comparison with a relational database, ontology-based TDB persistent storage was more efficient, by a factor of up to 60. In addition, when Jena and Pellet reasoning were combined for statement-triple reasoning over the forestry scientific data ontology, efficiency was more than 10% higher than when using Jena or Pellet separately.
     (3) Document semantic pre-processing was studied. To increase the accuracy of word segmentation, a domain dictionary of over 70,000 professional terms was compiled through analysis of current forestry scientific data. The feature weights of terms in the documents were expressed using a vector space. Feature concept sets were extracted from the forestry scientific data ontology and used as cluster centers. Document clustering was carried out using an improved k-means model, with cosine similarity as the distance measure. Finally, an inverted-index method for the clustered documents was explored. Results showed that the accuracy of clustering was 81.4%.
     (4) A semantic query expansion method was put forward. In this method, queries are first classified into three categories: single keywords, multiple keywords and question sentences. For single keywords, a modified semantic similarity is used for query expansion. For multiple keywords, semantic reasoning and semantic similarity are applied together. For question sentences, syntactic analysis and semantic reasoning are combined. These semantic query expansion methods are critical for semantic information retrieval.
     (5) Based on the above research, a semantic information retrieval system for forestry scientific data was developed using the semantic web framework, and semantic querying of information was realized. The system was compared with traditional retrieval models based on keyword matching. Results showed that the semantic retrieval method developed in this study performed better than the traditional retrieval methods in terms of both recall and precision.
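
Item (4) says that a single-keyword query is expanded with a modified semantic similarity computed over the ontology. The abstract does not define that measure, so the sketch below substitutes a simple path-based similarity, 1 / (1 + number of is-a edges between two concepts), purely for illustration. The concept hierarchy `PARENT`, the function names, and the threshold value are assumptions, not part of the thesis.

```python
from collections import deque

# Hypothetical is-a hierarchy (child -> parent), standing in for the ontology.
PARENT = {
    "conifer": "tree",
    "broadleaf": "tree",
    "pine": "conifer",
    "spruce": "conifer",
    "oak": "broadleaf",
    "tree": "plant",
}

def neighbours(concept):
    """Concepts exactly one is-a edge away: the children, then the parent."""
    result = [child for child, parent in PARENT.items() if parent == concept]
    if concept in PARENT:
        result.append(PARENT[concept])
    return result

def path_lengths(start):
    """Breadth-first search giving the shortest edge count to each concept."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

def expand(keyword, threshold=0.5):
    """Return (concept, similarity) pairs at or above the threshold.

    Similarity is 1 / (1 + path length); results are sorted best first,
    with ties broken alphabetically.
    """
    scored = [(c, 1.0 / (1.0 + d)) for c, d in path_lengths(keyword).items()]
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda cs: (-cs[1], cs[0]))

expansion = expand("conifer")
print(expansion)
```

The similarity scores can also be reused as term weights when the expanded query is matched against the index, so that the original keyword still counts for more than the concepts added around it.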
     Research on semantic information retrieval is both theoretically and practically important. This dissertation focused on studying and exploring the semantic retrieval of forestry scientific data, using the eight categories of forestry data that currently exist in the Forestry Science Data Center. Based on ontology theory, the forestry scientific data ontology was built, which provides the potential for sharing and reusing knowledge models in the forestry domain. A general method for semantic retrieval based on the forestry scientific data ontology was also explored. Moreover, in combination with network computing technology, a semantic retrieval system for forestry scientific data was designed, developed and evaluated. This provides a theoretical basis and technical support for sharing massive forestry scientific data at the semantic level. The realization of the semantic retrieval system offers a new approach to forestry scientific data sharing and can also serve as a reference for other data sharing platforms.
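
The comparison in item (5) rests on the two standard retrieval measures: precision, the fraction of retrieved documents that are relevant, and recall, the fraction of relevant documents that are retrieved. A minimal sketch of how the two could be computed for one query follows. The document identifiers and both result lists are made up for illustration and are not results from the thesis.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    Either value is 0.0 when its denominator is empty.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical relevance judgments and result lists for a single query.
relevant = {"d1", "d2", "d3", "d4"}
keyword_run = ["d1", "d5", "d6"]          # keyword-matching baseline
semantic_run = ["d1", "d2", "d3", "d7"]   # after semantic query expansion

for name, run in [("keyword", keyword_run), ("semantic", semantic_run)]:
    p, r = precision_recall(run, relevant)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In a real evaluation these values would be averaged over a set of test queries with relevance judgments, rather than read off a single query.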
