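Research and Application of a User Interest Model for RSS Personalized Information Services

English title (as recorded): User Interest Model of RSS Personalized Information Service Research and Application
Author: Guo Lijun (郭力军)
Thesis level: Master's
Discipline: Computer Application Technology
Keywords: RSS; personalized information service; user interest model; vector space model (VSM); agricultural informatization
Degree year: 2010
Supervisor: Zhu Qunxiong (朱群雄)
Discipline code: 081203
Degree-granting institution: Beijing University of Chemical Technology (北京化工大学)
Submission date: 2010-05-26
Chair of the defense committee: Zhao Ying (赵英)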

Abstract

The rapid growth of the Internet has brought explosive growth in information resources, and people must spend a great deal of time and effort filtering the information they need out of a large and disorderly mass. As the problems of "getting lost in information" and "information overload" worsen, personalized information services are urgently needed.

Search engines are currently the main tool for finding information, but they return many results of uneven quality that are hard to screen, they require specialized retrieval knowledge and experience that ordinary users often lack, and the information retrieved tends to lag behind its publication. RSS (Really Simple Syndication) is a simple way of aggregating information; with RSS, information can be pushed to users, giving them the latest information conveniently and promptly. Current RSS information services, however, deliver the same content to every user and lack personalization.

User interest modeling is the foundation and core of RSS personalized information services. In research on user interest models built from RSS data sources, the existing model is a three-layer tree consisting of a user model root node, information categories, and user interest subclasses. This thesis improves the representation of that model and its method for calculating keyword weights, which now takes the date into account. It also labels each user interest as long-term or short-term and applies a different update mechanism to each. Experiments show that the improved model is more personalized and updates user interests more promptly and accurately. Applying the model in the agricultural information domain demonstrates the practical value of the research.
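
The abstract describes the model only in outline, so the following is a minimal sketch of what a three-layer tree with long-term and short-term interests might look like, not the thesis's actual design. The class names, the `update` function, the half-life values, and the exponential decay rule are all assumptions made for illustration; the abstract says only that the keyword weight depends on the date and that the two kinds of interest are updated differently.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class InterestSubclass:
    """Leaf layer: one user interest, stored as keyword -> weight."""
    name: str
    long_term: bool = False  # label: long-term or short-term interest
    keywords: Dict[str, float] = field(default_factory=dict)
    updated: Optional[date] = None  # date of the most recent update


@dataclass
class Category:
    """Middle layer: an information category."""
    name: str
    interests: Dict[str, InterestSubclass] = field(default_factory=dict)


@dataclass
class UserModel:
    """Root layer: one root node per user."""
    user_id: str
    categories: Dict[str, Category] = field(default_factory=dict)


# Hypothetical half-lives in days, keyed by the long_term flag.
# The source gives no concrete values or decay formula.
HALF_LIFE_DAYS = {True: 90.0, False: 7.0}


def update(interest: InterestSubclass,
           term_counts: Dict[str, int],
           today: date) -> None:
    """Fold the keywords of a newly read item into one interest.

    Existing weights are first aged by the time elapsed since the last
    update (short-term interests fade faster under the assumed
    half-lives), then the normalized term frequencies of the new item
    are added.
    """
    if interest.updated is not None:
        age_days = (today - interest.updated).days
        factor = 0.5 ** (age_days / HALF_LIFE_DAYS[interest.long_term])
        for word in interest.keywords:
            interest.keywords[word] *= factor
    total = sum(term_counts.values())
    for word, count in term_counts.items():
        interest.keywords[word] = interest.keywords.get(word, 0.0) + count / total
    interest.updated = today
```

Under these assumptions, a short-term interest that receives no new matching items loses half of each keyword weight every seven days, while a long-term interest with the same history keeps most of its weight over the same period.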

