Research on Personalized Document Retrieval Technology (个性化文献检索技术研究)

Author: Yang Weizhong (杨卫忠)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: Personalization; Information Retrieval; User Interest Model; Query Expansion; Relevance Feedback
Degree year: 2007
Advisor: Li Zhanli (李占利)
Discipline code: 081203
Degree-granting institution: Xi'an University of Science and Technology (西安科技大学)
Thesis submission date: 2007-01-06

Abstract

Information retrieval is the process of organizing and storing information in a defined way and then finding information according to users' needs. With the development of information technology and the worldwide spread of the Internet, traditional retrieval tools no longer meet people's needs, and people increasingly look for the information they need online. Among the available retrieval tools, search engines are relatively widely used, but current search engines are designed mainly for generality and do not reflect the information needs of individual users. Personalized information services can meet those individual retrieval needs effectively.

Most current retrieval systems answer queries from keywords entered by the user, yet there is often a large semantic gap between the user's actual need and the query keywords. Narrowing this gap is the key to user-oriented personalized information services. This thesis applies query expansion and presents an adaptive model that modifies the weights of query keywords and adds and deletes keywords. The model optimizes the query through the weight adjustment factors α, β, γ, and λ so that it better matches the user's actual need, which improves retrieval precision.

To address the low precision and recall of existing search engines in document retrieval, this thesis designs a personalized document retrieval system based on a user interest model. The main idea is as follows. At registration, users fill in basic personal information, including interest categories and query keywords, and the system builds an initial user interest model for each user from this information. After a user logs in successfully, the system proactively recommends the latest retrieval results, and the user may also continue to query as needed. Users can browse, download, and otherwise act on the documents that interest them. Based on this feedback behavior, the system performs query expansion and automatically updates the user interest model so that the model reflects the user's current interests, further improving retrieval precision and recall.
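The abstract names the weight adjustment factors α, β, γ, and λ but does not define them, so the following is only a minimal sketch of one possible reading and not the thesis's actual model. It assumes a Rocchio-style relevance-feedback update in which α scales the original keyword weights, β scales the contribution of documents the user showed interest in, γ scales the penalty from documents the user ignored, and λ is a threshold below which a keyword is deleted from the query. The function name `expand_query`, the default values, and the dictionary representation of queries and documents are all hypothetical.

```python
from collections import defaultdict


def expand_query(query, relevant_docs, nonrelevant_docs,
                 alpha=1.0, beta=0.75, gamma=0.15, lam=0.1):
    """Rocchio-style update of a weighted keyword query (illustrative only).

    query:            dict mapping keyword -> weight
    relevant_docs:    list of dicts (term -> weight) the user browsed or downloaded
    nonrelevant_docs: list of dicts (term -> weight) the user ignored
    alpha, beta, gamma, lam: assumed roles, see the lead-in paragraph
    """
    updated = defaultdict(float)

    # Modify: rescale the weights of the original query keywords.
    for term, weight in query.items():
        updated[term] += alpha * weight

    # Add: pull in terms from positively rated documents (averaged).
    for doc in relevant_docs:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant_docs)

    # Penalize: push down terms from negatively rated documents (averaged).
    for doc in nonrelevant_docs:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(nonrelevant_docs)

    # Delete: drop keywords whose weight fell below the threshold lam.
    return {term: w for term, w in updated.items() if w >= lam}


if __name__ == "__main__":
    query = {"retrieval": 1.0, "java": 0.2}
    relevant = [{"retrieval": 0.5, "feedback": 0.8}]
    nonrelevant = [{"java": 1.0}]
    print(expand_query(query, relevant, nonrelevant))
```

In this example the weight of "retrieval" is raised, "feedback" is added as a new keyword, and "java" is deleted because its adjusted weight falls below the threshold. The resulting weighted keyword set could also serve as a simple stand-in for a user interest model that is refreshed after each round of feedback.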
