Research on Multi-Agent-Based Personalized Information Retrieval Techniques
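Abstract
With the explosive growth of information on the Web, finding the information one wants among vast online resources has become increasingly important. Existing general-purpose search engines meet users' needs to some extent, but they completely ignore users' interests and backgrounds. Personalized information retrieval emerged to address exactly this need: it provides different users with personalized search results. This thesis discusses the main problems in personalized information retrieval, including methods for acquiring and describing user interests and ways of combining the user-interest model with the retrieval model.
     The thesis first discusses and analyzes the main current information retrieval models. Then, addressing the acquisition and description of user interests, it proposes a method for building a user-interest model based on co-occurring terms. To account for differences in individual users' behavior, it further proposes a cache-based method for updating the user-interest model. Next, based on the characteristics of existing information retrieval systems, it designs a way of combining the interest model with the retrieval system: personalized expansion of user queries. Finally, it describes the system's overall design, implementation, and test results. The thesis focuses on and implements three components: information retrieval models, user-interest modeling, and user query expansion.
     Using these techniques, the thesis presents an initial implementation of a multi-agent-based personalized information retrieval system and designs corresponding experiments. The experiments show that, relative to a general-purpose retrieval system, the system raises the average precision of the top 10 results by 16.25 percentage points and improves the mean reciprocal rank by up to 15.2%, giving users results that better match their current needs. The experiments also show that the system provides good personalized service for users with multiple interests.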
With the explosion of information accessible via the Internet, retrieving useful information quickly from huge volumes of web data is becoming increasingly important. Current general-purpose search engines satisfy users' search needs to some extent. However, they completely ignore differences in users' interests and backgrounds. Personalized information retrieval techniques have emerged to solve this problem by providing personalized search results.
     This thesis discusses several important problems in personalized information retrieval, such as the acquisition and description of user interests, user-interest modeling, and the integration of the user-interest model with the retrieval model.
     First, current information retrieval models are reviewed and analyzed. Then a user-interest modeling method based on term co-occurrence is presented. To adapt to differences in individual users' behavior, a cache-based model updating method is proposed. Next, a personalized query expansion method is designed to integrate the interest model with information retrieval systems. Finally, the design, implementation, and testing of the multi-agent-based personalized IR system are described. This thesis focuses mainly on three techniques: information retrieval modeling, user-interest modeling, and query expansion.
     Based on these techniques, a multi-agent-based personalized information retrieval system is built. Experiments show that, compared with a general-purpose retrieval system, precision over the top 10 results improves by 16.25 percentage points and the mean reciprocal rank of the top 10 results improves by up to 15.2%. Users obtain results that better meet their current needs. The experimental results also show that the system provides good personalized service for users with multiple interests.
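The two reported measures, precision over the top 10 results and mean reciprocal rank, can be illustrated with a minimal Python sketch. It uses the common textbook definitions. The abstract does not give the thesis's exact evaluation protocol, so the function names, the toy rankings, and the gain helpers below are all assumptions made for illustration and not the author's code or data.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant result within the top k, or 0.0 if none."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean over per-query scores."""
    return sum(values) / len(values)


def absolute_gain(baseline: float, system: float) -> float:
    """Difference in score; multiplied by 100 it reads as percentage points."""
    return system - baseline


def relative_gain(baseline: float, system: float) -> float:
    """Difference as a fraction of the baseline; multiplied by 100 it reads as percent."""
    return (system - baseline) / baseline


# Hypothetical toy data: (ranked result list, set of relevant documents) per query.
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d5", "d9", "d4", "d8"], {"d5"}),
]

# k=4 only because the toy lists are short; the thesis reports k=10.
mean_p_at_4 = mean([precision_at_k(r, rel, k=4) for r, rel in runs])  # 0.375
mrr = mean([reciprocal_rank(r, rel, k=4) for r, rel in runs])         # 0.75
```

The two gain helpers show why the unit matters when reading the reported figures: moving from a hypothetical baseline of 0.5 to 0.625 is a gain of 12.5 percentage points but a relative improvement of 25%.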
