Implementation of Slope One and Its Improved Algorithm Based on Hadoop
Abstract
A recommender system is an intelligent system that analyzes a user's personal preferences, such as information about items the user has browsed or purchased, and recommends items the user is likely to enjoy. To some extent it helps people find the content they want in a vast amount of information. The core of a recommender system is personalized recommendation technology, and the more mature techniques today are based mainly on collaborative filtering. However, because user interests are unstable and fuzzy, these methods still cannot capture user preferences well, which degrades the quality of the recommendations.
     Compared with traditional collaborative filtering based on users' item ratings, the Slope One algorithm is simple and efficient. However, it depends on a large number of user ratings for the item being predicted; if no users, or only a few, have rated that item, it runs into the "cold start" problem. In addition, Slope One considers only the similarity of ratings across different users and ignores each user's individual rating habits. Both limitations may affect the predicted ratings. To address them, item content similarity is introduced, covering two factors: the semantic similarity of the keywords that describe the items and the similarity of item types. These similarities are used to measure how close items are to one another and, combined with the user's ratings of other items, yield a Slope One algorithm based on item content similarity.
     Finally, an implementation of Slope One and its improved variant is designed on the Hadoop platform using the MapReduce distributed programming model, and experiments are run on the standard MovieLens data set. The results show that the prediction performance of Slope One improves as the number of user rating records in the data set grows. They also show that the new algorithm, which adds the item content similarity factor, can to some extent mitigate the loss of prediction accuracy that the original algorithm may suffer.
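The abstract describes Slope One and its MapReduce formulation only in words, so the following is a minimal sketch of how the baseline (weighted) Slope One predictor might be split into a map step and a reduce step. It is plain in-process Python, not the thesis's Hadoop code: the function names `map_user`, `reduce_pairs` and `predict` and the toy ratings are assumptions made for illustration, and the content-similarity variant is not sketched because the abstract gives no formula for it.

```python
from collections import defaultdict
from itertools import permutations

# Toy ratings (hypothetical, not MovieLens): user -> {item: rating}.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 2.0},
    "bob": {"A": 3.0, "B": 4.0},
    "carol": {"B": 2.0, "C": 5.0},
}


def map_user(user_ratings):
    """Map step: for one user, emit ((item_j, item_i), r_j - r_i) per ordered item pair."""
    for (j, rj), (i, ri) in permutations(user_ratings.items(), 2):
        yield (j, i), rj - ri


def reduce_pairs(emitted):
    """Reduce step: per item pair, return (average difference, number of co-rating users)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, diff in emitted:
        sums[key] += diff
        counts[key] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


def predict(user_ratings, target, deviations):
    """Weighted Slope One: average of (dev(target, i) + r_i), weighted by co-rating counts."""
    numerator = 0.0
    denominator = 0
    for i, ri in user_ratings.items():
        if i == target or (target, i) not in deviations:
            continue
        dev, count = deviations[(target, i)]
        numerator += (dev + ri) * count
        denominator += count
    # With no co-rated items there is nothing to average: the cold-start case.
    return numerator / denominator if denominator else None


# Run the "job": map over every user, then reduce all emitted pairs.
emitted = (pair for user in ratings.values() for pair in map_user(user))
deviations = reduce_pairs(emitted)

print(predict(ratings["carol"], "A", deviations))  # about 4.33
print(predict(ratings["carol"], "D", deviations))  # None: nobody has rated item D
```

On a real cluster the map output would be shuffled by item pair before reduction. Here a single dictionary plays that role, which keeps the sketch short but does not scale. The `None` result for item D mirrors the cold-start case the abstract mentions: with no co-ratings for the target item, the predictor has nothing to average.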
