Research on Recommendation Algorithms Based on Network Structure

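English title: The Research of Recommendation Algorithm Based on Network
Author: Du Han (杜晗)
Degree level: Master's
Discipline: Computer Science and Technology
Chinese keywords: 推荐算法 (recommendation algorithm) ; 二分图 (bipartite graph) ; 并行化 (parallelization)
English keywords: recommendation algorithm ; bipartite network ; parallelization
Degree year: 2013
Advisor: Yang Juan (杨娟)
Discipline code: 0812
Degree-granting institution: Beijing University of Posts and Telecommunications (北京邮电大学)
Thesis submission date: 2012-12-01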

Abstract

With the rapid development of Internet technology and its applications, the number of servers connected to the Internet has grown quickly and sharing information has become increasingly convenient. At the same time, the volume of data on the Internet has grown exponentially, which raises the problem of storing and managing massive data sets. Faced with this much data, users find it very difficult to locate the products or information they are interested in. Traditional search algorithms return the same results to every user for a given set of keywords; they cannot mine individual preferences or tailor the search service to each user. The amount of information grows on a large scale while its utilization stays low, a phenomenon known as information overload. One of the most effective remedies for information overload is a personalized recommendation system. Such a system aims to reduce the time users spend searching for information online by "pushing" information to them. It works from a user's past behavior, such as browsing records, click records, and purchase history, applies algorithms to infer the user's preferences or needs, predicts the information, products, or services the user is likely to be interested in, and recommends them.

This thesis first gives a general overview of recommendation systems, then surveys the state of research on network-based recommendation algorithms and describes the recommendation algorithm based on bipartite networks in detail. Building on that algorithm, it makes the following three contributions:
     (1) Using the Hadoop platform as the distributed computing framework and underlying storage, with MongoDB as auxiliary storage, the bipartite-network resource-allocation algorithm is recast in the MapReduce model. A parallelized version of the resource-allocation recommendation algorithm is designed and implemented, and several groups of experiments are run to validate it. The results show that the efficiency and scalability of the algorithm meet the expected requirements.
     (2) This thesis analyzes the interest drift that can occur while users choose products online, and how the products a user selects after such a drift may affect the recommendation results. Based on this analysis, it takes the time of each selection into account and proposes an improved recommendation algorithm that adds time weights to the bipartite-network resource-allocation algorithm. The algorithm is implemented and evaluated for recommendation accuracy and recommendation-list diversity.
     (3) The bipartite-network resource-allocation algorithm encodes users' selections of products in the edges of the bipartite graph but ignores how much a user likes each product. This thesis attaches users' ratings of products to the graph as edge weights, modifies the way resource is allocated in the algorithm accordingly, designs a recommendation algorithm on that basis, and validates it with data from publicly available online datasets.
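
The abstract does not spell out the baseline bipartite resource-allocation algorithm that all three contributions build on. The following is a minimal single-machine sketch of the commonly described two-step scheme on an unweighted user-item graph. It is not the thesis's implementation: the function name `recommend`, the `user_items` dictionary layout, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def recommend(user_items, target, top_n=3):
    """Rank items the target user has not selected by two-step resource allocation.

    user_items: dict mapping each user to the set of items that user selected
    (a hypothetical layout; each user-item pair is one edge of the bipartite graph).
    """
    # Build the item -> users view of the same bipartite graph.
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # Step 1 (items -> users): every item the target selected starts with one
    # unit of resource and splits it equally among the users linked to it.
    user_resource = defaultdict(float)
    for item in user_items[target]:
        share = 1.0 / len(item_users[item])
        for user in item_users[item]:
            user_resource[user] += share

    # Step 2 (users -> items): every user splits the resource received in
    # step 1 equally among the items that user selected.
    item_resource = defaultdict(float)
    for user, resource in user_resource.items():
        share = resource / len(user_items[user])
        for item in user_items[user]:
            item_resource[item] += share

    # Keep only items the target has not selected, highest resource first
    # (ties broken by item name so the order is deterministic).
    candidates = [(item, score) for item, score in item_resource.items()
                  if item not in user_items[target]]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top_n]


# Toy data: three users and four items.
toy = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"a", "c", "d"}}
ranking = recommend(toy, "u1")
```

Each step is a sum grouped by key, first by user and then by item, which is the kind of computation a MapReduce job expresses. How the thesis partitions the work across jobs, and the exact form of its time weights and rating weights, is not stated in the abstract, so the sketch leaves them out.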

