Research and Implementation of a Personalized News Recommendation System Based on Collaborative Filtering
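Abstract
With the rapid development of the Internet, information has grown explosively, and users have moved from an era of information scarcity into an era of information overload, in which the sheer volume of information prevents users from finding what they need. To help Internet users locate information quickly, researchers have proposed many approaches: portal sites, which act as relatively specialized information sources; directories, which sort popular websites into categories; and search engines, which return the needed information from a few keywords. User needs go further than this, however: users often have no clear idea of what information they are looking for. Because personalized recommendation can filter out large amounts of content that users are not interested in and help them discover content they may like, it has been widely adopted. After its success in e-commerce, personalized recommendation has gradually been applied to other fields, such as personalized news recommendation. The arrival of the big data era also offers a good opportunity for personalized news reading.
     Personalized news recommendation has made considerable progress in theoretical research, but many problems remain to be solved, including scalability, timeliness, cold start, and data sparsity. This thesis therefore focuses on an efficient and scalable personalized news recommendation system. Its main contributions are:
     1. A new similarity measure that combines behavior similarity and content similarity. It solves the problem that traditional similarity measures are inaccurate or cannot be computed, and it addresses the data sparsity problem in collaborative filtering recommendation.
     2. A new scalable clustering method suited to personalized news recommendation, which changes how cluster centers are selected and how distance is measured, greatly improving the scalability of the news recommendation system.
     3. A time factor incorporated into both the similarity calculation stage and the final recommendation stage, which ensures the timeliness of the recommended news.
     4. An implementation of the entire collaborative filtering news recommendation system on the MapReduce model, so that the system runs in parallel, scales much better, and meets the personalized recommendation needs of massive numbers of news items and users.
     5. Experiments on the clustering method and the personalized news recommendation method to determine the relevant parameters, and functional testing of the final personalized news recommendation system based on collaborative filtering to verify its functions.
     The thesis first analyzes the current state of research on personalized recommendation and the Hadoop cloud computing platform, then describes the proposed clustering method for personalized news recommendation and the personalized recommendation algorithm based on multidimensional similarity, and finally presents the news recommendation system implemented on the MapReduce model together with detailed testing and evaluation results.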
With the rapid development of the Internet, the amount of information has grown rapidly, and users have gradually moved from an era of information scarcity into an era of information overload. In this situation, however, users often cannot find the information they need. To help network users find the information they need conveniently, researchers have put forward many methods: the portal site, a professional information source; the classification catalogue, which groups popular web sites into categories; and the search engine, through which users can find the necessary information simply by typing some keywords. However, users' requirements go beyond that, because they often do not know what information they need. Personalized recommendation systems are designed to filter out the large amount of content users are not interested in and to help them find content they may like, and they have been widely used. Following its great success in electronic commerce, personalized recommendation has gradually begun to penetrate the news field. As the Internet steps into the big data age, personalized news reading also gains a good opportunity for development.
     Personalized news recommendation systems have made considerable progress in theoretical study, but there are still many problems to be solved: scalability, timeliness, cold start, data sparseness and so on. This thesis therefore focuses on an efficient, scalable personalized news recommendation system. The work of this thesis is mainly reflected in the following aspects.
     1. Puts forward a new similarity measure which combines behavior similarity and content similarity. The new similarity measure solves the problem that traditional similarity calculation methods are inaccurate or cannot be computed, alleviating data sparseness.
     2. Proposes a new scalable clustering method for personalized news recommendation that changes the center point selection method and the distance metric, greatly improving the scalability of the news recommendation system.
     3. Incorporates a time factor into the similarity calculation stage and the final recommendation stage of the personalized news recommendation system, ensuring the timeliness of the recommended news.
     4. Implements the collaborative filtering recommendation system on the MapReduce model so that the system runs in parallel. This improves the system's scalability and lets it meet the recommendation demands of massive numbers of news items and users.
     5. Runs experiments on the clustering method and the personalized news recommendation method to determine the relevant parameters, and performs functional tests on the news recommendation system to verify its functions.
     This thesis analyzes current personalized recommendation technology and the Hadoop cloud computing platform. For the specific field of personalized news recommendation, it proposes a new scalable clustering method and a new similarity measure and verifies the effectiveness of the algorithms. On this basis, we design and implement the personalized news recommendation system using the MapReduce model. Finally, we give detailed testing and evaluation results.
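     The abstract does not state the formulas behind the combined similarity or the time factor. As a rough illustration only, the Python sketch below assumes that behavior similarity and content similarity are both cosine similarities over sparse vectors, that they are blended with a hypothetical weight alpha, and that timeliness is modeled with a hypothetical exponential half-life decay. None of these choices, names, or default values comes from the thesis; they only show how a content term can keep the similarity non-zero when two news items share no readers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no defined direction
    return dot / (norm_a * norm_b)


def hybrid_similarity(readers_a, readers_b, terms_a, terms_b, alpha=0.5):
    """Blend behavior similarity (who read the item) with content
    similarity (which terms the item contains).

    alpha is an assumed mixing weight, not a value from the thesis.
    """
    behavior = cosine(readers_a, readers_b)
    content = cosine(terms_a, terms_b)
    return alpha * behavior + (1.0 - alpha) * content


def time_decay(age_hours, half_life_hours=24.0):
    """Assumed exponential decay: the weight halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def recommendation_score(similarity, age_hours, half_life_hours=24.0):
    """Down-weight older news so that fresher items rank higher."""
    return similarity * time_decay(age_hours, half_life_hours)


# Two items with no readers in common: behavior similarity is zero,
# but shared terms still give a usable similarity.
readers_a = {"u1": 1.0}
readers_b = {"u2": 1.0}
terms_a = {"hadoop": 1.0, "news": 1.0}
terms_b = {"news": 1.0}

sim = hybrid_similarity(readers_a, readers_b, terms_a, terms_b)
fresh = recommendation_score(sim, age_hours=0.0)
day_old = recommendation_score(sim, age_hours=24.0)
```

     With these assumptions, the day-old item receives half the score of the fresh one. A real system would tune the weight and the half-life experimentally, as the abstract says was done for the thesis's own parameters.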