Research on Collaborative Filtering-Based Recommendation Algorithms
Abstract
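Web 2.0 technology has brought the Internet into a new era in which users play an increasingly active role. Users no longer merely receive information passively; they create it and use Web 2.0 platforms to interact and share with other users. As the number of Internet users grows rapidly, this user-centric mode of information production has caused explosive growth of online information, and people face an increasingly severe "information overload" problem: they cannot quickly and accurately locate the information they need within massive amounts of data. Current techniques for addressing information overload fall into two categories: information retrieval, represented by search engines, and information filtering, represented by recommender systems. The most important difference between them is that the quality of the information a user obtains from a search engine depends heavily on how precisely the user describes the information need, whereas a recommender system requires no explicit statement of need. Instead, it starts from the user's historical behavior and data, builds models to mine the user's needs and interests, and on that basis filters the items the user is likely to find interesting out of the massive body of information. Recommender systems are therefore especially important when users' needs are not clearly defined.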
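     To date, many recommendation algorithms have been proposed, and collaborative filtering is the most widely applied and most effective among them. Although collaborative filtering has been successfully deployed in many commercial recommender systems, problems such as data sparsity and cold start remain to be solved. With the rapid development of the Internet, social media represented by microblogs have proliferated, and user-centric social networking sites generate massive amounts of data related to user interests; how to use these data effectively to improve the performance of recommendation algorithms has become an important research area. To address these key problems, this dissertation carries out research in the following aspects.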
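     First, similarity models in collaborative filtering. Computing user (item) similarity is the most critical step in memory-based collaborative filtering, and both the asymmetry between positive and negative ratings and data sparsity make traditional similarity models inaccurate, which in turn degrades recommendation accuracy. To address these two problems, this dissertation proposes a user similarity model based on variable weights and a penalty function. Experimental results show that the proposed algorithm effectively alleviates both problems and thereby improves recommendation accuracy.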
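     Second, collaborative filtering that incorporates social network information. Rich social network information brings new opportunities to recommender systems but also poses greater challenges; how to mine massive social network information effectively to improve recommendation accuracy is the core problem of social recommender system research. Using real social network information from Tencent Weibo users, this dissertation builds an effective user similarity model, combines it with a user similarity model based on the rating matrix, and proposes a collaborative filtering algorithm that incorporates social network information. Experimental results show that incorporating social network information clearly alleviates the data sparsity problem and significantly improves recommendation accuracy.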
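     Third, a fused collaborative filtering algorithm combining user-based and item-based methods. Depending on their underlying assumptions, collaborative filtering algorithms can be divided into user-based and item-based methods. This dissertation studies the essential differences in recommendation performance and behavior between the two and, building on their respective strengths and weaknesses, proposes a model-fusion algorithm that combines user-based and item-based collaborative filtering. Experimental results show that the user-based method is better at recommending popular items while the item-based method is better at long-tail recommendation, and that the proposed fusion algorithm effectively alleviates data sparsity and improves accuracy.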
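     Fourth, global and local model fusion in collaborative filtering. Many effective collaborative filtering algorithms exist (for example, memory-based versus model-based methods, and user-based versus item-based methods), and each has its own strengths and weaknesses. This dissertation argues that different methods suit different users (items) to different degrees. On this basis, it uses machine learning to automatically discover how well each user (item) suits each method and performs local model fusion. Experimental results show that the local fusion model achieves higher recommendation accuracy than the global fusion model.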
The rapid development of Web 2.0 technology has sparked a new revolution on the Internet. Users now play a new role: they take the initiative to generate information instead of simply receiving it from the web. With the rapid growth of the user population, the user-centric mode of information generation has led to exponential growth of the information available on the Internet, which causes the information overload problem. The information overload problem refers to the situation in which people cannot quickly and accurately locate the information they need. Current technologies for solving the information overload problem can be classified into two categories: information retrieval, represented by search engines, and information filtering, represented by recommender systems. The most important difference between them is that search engines require queries formulated by the user, while recommender systems require none. The quality of search-engine results therefore depends on how well users describe their information needs, whereas recommender systems filter out the information a user is interested in by exploiting the user's profile data and historical activities (watching, listening, buying, etc.). Recommender systems can therefore play a very important role when users cannot state their information needs precisely.
     Many recommendation algorithms have been proposed by both academia and industry, and collaborative filtering is one of the most effective. Collaborative filtering has been successfully applied in many commercial recommender systems, but issues such as the data sparsity problem and the cold-start problem remain to be solved. With the rapid rise of social media, user-centric social networking websites generate vast amounts of data that may reflect users' interests, and how to leverage these data to improve the performance of recommendation algorithms has become a very active research area. In view of these key issues, this dissertation studies the following aspects.
     First, research on the similarity model of collaborative filtering. User/item similarity calculation is the most critical issue in memory-based collaborative filtering algorithms. The sparsity of the rating matrix and the imbalance between negative and positive ratings cause inaccurate similarity computation and thus limit recommendation quality. In this dissertation, we introduce a weighting scheme and a penalty function to address these issues. Experimental results show that the improved similarity model can significantly improve recommendation accuracy.
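The abstract does not give the exact form of the weighting scheme or the penalty function. As a hedged illustration of the general idea only, the Python sketch below computes Pearson similarity over co-rated items and then applies the well-known significance-weighting penalty `min(n, gamma) / gamma`, which shrinks similarities supported by few co-rated items toward zero. The penalty, the `gamma` parameter, the toy `ratings` data, and all function names are assumptions, not the dissertation's model, and the positive/negative rating weighting is not attempted. A mean-centred, user-based prediction shows how the similarity feeds the recommendation step.

```python
from math import sqrt
from typing import Callable, Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def _mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings: Ratings, u: str, v: str) -> float:
    """Pearson correlation over the items rated by both u and v (0.0 if undefined)."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mu = _mean(ratings[u][i] for i in common)
    mv = _mean(ratings[v][i] for i in common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mv) ** 2 for i in common)
    )
    return num / den if den else 0.0


def penalised_similarity(ratings: Ratings, u: str, v: str, gamma: int = 50) -> float:
    """Assumed penalty (significance weighting): scale Pearson by
    min(n_common, gamma) / gamma so sparse overlaps count for less."""
    n_common = len(ratings[u].keys() & ratings[v].keys())
    return pearson(ratings, u, v) * min(n_common, gamma) / gamma


def predict(ratings: Ratings, u: str, item: str,
            sim: Callable[[Ratings, str, str], float], k: int = 20) -> float:
    """Mean-centred weighted average over the k most similar users who rated `item`."""
    base = _mean(ratings[u].values())
    neighbours = [(sim(ratings, u, v), v) for v in ratings
                  if v != u and item in ratings[v]]
    neighbours = sorted(neighbours, key=lambda t: t[0], reverse=True)[:k]
    num = sum(s * (ratings[v][item] - _mean(ratings[v].values())) for s, v in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return base + num / den if den else base


# Hypothetical toy data for illustration only.
ratings: Ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 5},
    "carol": {"a": 1, "b": 5, "d": 2},
}
```

With `gamma=3`, Carol's similarity to Alice (only two co-rated items) is scaled by 2/3 while Bob's (three co-rated items) is unchanged, so Bob's opinion carries more weight in the prediction for item `d`.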
     Second, integrating social information into collaborative filtering. Rich social information brings great opportunities for recommender systems. How to effectively leverage abundant social network information to improve recommendation accuracy is the core issue of research on social recommender systems. In this dissertation, we build a user similarity model based on the real social network information of Tencent Weibo (microblogging) users and effectively combine this social-information-based similarity model with a rating-based similarity model. Experimental results show that the proposed approach effectively eases the data sparsity problem and improves recommendation quality.
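The abstract does not say how the social similarity is computed or how the two similarity models are combined. A minimal sketch, assuming that social similarity is the Jaccard overlap of the accounts two users follow and that the combination is a linear blend controlled by a hypothetical weight `alpha`; the follow-graph representation and all names here are illustrative assumptions, not the dissertation's model.

```python
from typing import Dict, Set

Follows = Dict[str, Set[str]]  # user -> set of accounts the user follows


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def social_similarity(follows: Follows, u: str, v: str) -> float:
    """Assumed social similarity: overlap of the accounts u and v follow."""
    return jaccard(follows.get(u, set()), follows.get(v, set()))


def fused_similarity(rating_sim: float, follows: Follows, u: str, v: str,
                     alpha: float = 0.5) -> float:
    """Hypothetical linear blend of a rating-based similarity and the social one.
    alpha = 1.0 uses ratings only; alpha = 0.0 uses the follow graph only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rating_sim + (1.0 - alpha) * social_similarity(follows, u, v)


# Hypothetical follow lists for illustration only.
follows: Follows = {
    "u1": {"c1", "c2", "c3"},
    "u2": {"c2", "c3", "c4"},
}
```

The design point is that two users with no co-rated items (a rating similarity of 0 under sparsity) can still receive a nonzero fused similarity from the social term, which is one way such a blend can ease the sparsity problem.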
     Third, combining user-based and item-based collaborative filtering algorithms using stacked regression. Collaborative filtering algorithms can be classified into user-based and item-based methods according to their underlying assumptions. In this dissertation, we study the advantages and disadvantages of both methods and propose a two-level machine learning framework, based on stacked regression, that effectively combines them. Experimental results show that the proposed framework effectively eases the data sparsity problem and improves recommendation quality.
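In stacked regression, the level-0 models (here, user-based and item-based collaborative filtering) produce predictions on held-out ratings, and a level-1 model learns how to weight them. The sketch below is a simplified, hypothetical level-1 step: it fits two weights by ordinary least squares via the 2x2 normal equations, with no intercept. Breiman's formulation additionally constrains the weights to be non-negative, and the dissertation's actual level-1 learner and features are not described in the abstract, so treat the names and choices here as assumptions.

```python
from typing import Sequence, Tuple


def fit_stacking_weights(p_user: Sequence[float], p_item: Sequence[float],
                         y: Sequence[float]) -> Tuple[float, float]:
    """Level-1 fit: (w_u, w_i) minimising sum_j (y_j - w_u*p_user_j - w_i*p_item_j)^2.

    p_user / p_item must be level-0 predictions made on ratings the level-0
    models were NOT trained on (e.g. cross-validation folds); y are the true ratings.
    """
    # Entries of the normal equations [[a, b], [b, c]] @ [w_u, w_i] = [d, e].
    a = sum(x * x for x in p_user)
    b = sum(x * z for x, z in zip(p_user, p_item))
    c = sum(z * z for z in p_item)
    d = sum(x * t for x, t in zip(p_user, y))
    e = sum(z * t for z, t in zip(p_item, y))
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("level-0 predictions are collinear; weights are not identifiable")
    # Cramer's rule for the 2x2 system.
    return (d * c - b * e) / det, (a * e - b * d) / det


def stacked_predict(weights: Tuple[float, float], p_user: float, p_item: float) -> float:
    """Combine one user-based and one item-based prediction with the learned weights."""
    w_u, w_i = weights
    return w_u * p_user + w_i * p_item
```

Because the same pair of weights is applied to every user and item, this is a global combination.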
     Fourth, research on global and local model combining. In this dissertation, we argue that different users and items have different preferences between user-based and item-based methods. Based on this view, we use machine learning algorithms to automatically discover each user's and item's preference between the two methods and use this preference information to combine the predictions locally. Experimental results show that the local combining model performs significantly better than the global combining model in the literature.
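The abstract does not specify which machine learning algorithm discovers each user's or item's preference. As a hypothetical stand-in, the sketch below learns a per-user mixing weight `beta` in [0, 1] by closed-form least squares on that user's held-out ratings, falling back to a single global `beta` when a user has too few records. Per-item weights, the `min_records` threshold, and all names are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Record = Tuple[float, float, float]  # (user-based prediction, item-based prediction, true rating)


def fit_beta(records: List[Record]) -> float:
    """Least-squares beta for y ~ beta*p_user + (1 - beta)*p_item, clipped to [0, 1].

    Rewriting the residual as (y - p_item) - beta*(p_user - p_item) gives the
    closed form beta = sum((y - p_item)(p_user - p_item)) / sum((p_user - p_item)^2).
    """
    num = sum((y - pi) * (pu - pi) for pu, pi, y in records)
    den = sum((pu - pi) ** 2 for pu, pi, _ in records)
    if den == 0:
        return 0.5  # both methods agree everywhere; any beta fits equally well
    return min(1.0, max(0.0, num / den))


def fit_local_betas(held_out: Dict[str, List[Record]],
                    min_records: int = 5) -> Tuple[float, Dict[str, float]]:
    """Return (global beta, per-user betas); sparse users reuse the global beta."""
    all_records = [r for recs in held_out.values() for r in recs]
    global_beta = fit_beta(all_records)
    local = {u: fit_beta(recs) if len(recs) >= min_records else global_beta
             for u, recs in held_out.items()}
    return global_beta, local


def predict_local(local: Dict[str, float], global_beta: float,
                  user: str, p_user: float, p_item: float) -> float:
    """Blend the two predictions with the user's own beta (global beta if unseen)."""
    beta = local.get(user, global_beta)
    return beta * p_user + (1.0 - beta) * p_item
```

With toy data in which one user's true ratings always match the user-based prediction and another's always match the item-based one, the global fit settles halfway (`beta = 0.5`) while the local fit assigns `1.0` and `0.0` respectively, which illustrates why a local combination can outperform a global one.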