Research on User Interest Models in Personalized Search Engines
Abstract
As the volume of online resources keeps growing and information is updated ever faster, problems such as information redundancy and mixed topics have emerged, and it has become increasingly difficult for people to find the information they want efficiently. Search engines that provide personalized services improve retrieval efficiency and have long been an active research topic. Personalized information service means returning to each user the information that user is interested in and offering different service modes to different users.
     This thesis studies modeling techniques for personalized search and combines the SPRINT classification algorithm with Agent technology in the modeling process, improving the speed and accuracy of modeling so that the interest model more closely matches the user's actual preferences. The model analyzes the operations a user is about to perform, predicts where the user's interests lie, and optimizes the user's query, with the ultimate goal of improving the efficiency of information retrieval.
     The thesis presents the process of building the interest model. First, the data structures are designed. Building a user interest model with SPRINT requires analyzing and learning from the source data needed for the model; based on the execution requirements of the algorithm, three tables with specific structures are designed to store the data needed at different stages of modeling. Second, the SPRINT-based information extraction process is studied. How the algorithm supports the construction of the interest model, and how each step extracts and learns from the data in the knowledge base, is the main focus of this thesis. Finally, the interest model is built. The thesis proposes a construction method and a structure for the interest model: data mining algorithms are applied throughout the modeling process, an interest tree is constructed from the user's link operations, and the interest model is extracted from it.
     The ultimate aim of this research is to quickly build an accurate interest model that closely matches user needs. Such a model provides different personalized services according to the interest orientations of different users, and helps users who are overwhelmed by large, disorganized web resources find the information they want efficiently and accurately.
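As a rough illustration of the kind of step SPRINT performs when it evaluates a continuous attribute, the sketch below scans a presorted attribute list once and picks the threshold with the lowest weighted Gini index. The abstract does not give the thesis's table layouts or features, so the attribute (page dwell time in seconds), the class labels, and the function names here are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(counts, total):
    """Gini index of a class histogram; 0.0 means the partition is pure."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(attribute_list):
    """Return (threshold, weighted_gini) for the best binary split.

    attribute_list holds (value, class_label) pairs for one attribute.
    Returns (None, inf) when all values are equal and no split exists.
    """
    records = sorted(attribute_list)  # SPRINT presorts each attribute list once
    n = len(records)
    below = Counter()                                # histogram left of the split
    above = Counter(label for _, label in records)   # histogram right of the split
    best_gini, best_threshold = float("inf"), None

    for i in range(n - 1):
        value, label = records[i]
        below[label] += 1
        above[label] -= 1
        next_value = records[i + 1][0]
        if value == next_value:
            continue  # a threshold cannot separate equal values
        n_below, n_above = i + 1, n - (i + 1)
        split_gini = (n_below / n) * gini(below, n_below) \
            + (n_above / n) * gini(above, n_above)
        if split_gini < best_gini:
            best_gini = split_gini
            best_threshold = (value + next_value) / 2  # midpoint candidate

    return best_threshold, best_gini


# Hypothetical data: dwell time in seconds, and whether the user showed interest.
samples = [(5, "no"), (8, "no"), (12, "no"), (30, "yes"), (45, "yes"), (60, "yes")]
threshold, score = best_numeric_split(samples)
print(threshold, score)
```

On this toy data the two classes separate cleanly, so the chosen threshold falls midway between 12 and 30. A full SPRINT implementation would repeat this scan for every attribute at every tree node and then partition the attribute lists accordingly; that part is omitted here.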
