Abstract
Collaborative filtering is one of the key techniques for coping with information overload, but its predictions are often inaccurate. After analyzing the Spark framework and describing the shortcomings of the Slope One algorithm, this paper proposes an improved Slope One algorithm that takes the similarity between items and users into account, and implements it on the Spark platform. Experiments show that the improved algorithm predicts ratings more accurately than the original. The Spark implementation runs in parallel, and Speedup and Sizeup measurements indicate that it parallelizes and scales well, which improves the algorithm's efficiency.
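For readers unfamiliar with the baseline, the following is a minimal sketch of the standard weighted Slope One predictor. It is not the improved variant proposed here, since the abstract does not specify how item and user similarity enter the computation, and it runs on a single machine in plain Python rather than on Spark. The function names `build_deviations` and `predict` and the toy ratings are illustrative assumptions.

```python
from collections import defaultdict


def build_deviations(ratings):
    """Compute average pairwise rating deviations.

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns (dev, count), where dev[j][i] is the mean of (r_uj - r_ui)
    over users u who rated both j and i, and count[j][i] is the number
    of such users.
    """
    diff_sum = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i == j:
                    continue
                diff_sum[j][i] += r_j - r_i
                count[j][i] += 1
    dev = {
        j: {i: diff_sum[j][i] / count[j][i] for i in diff_sum[j]}
        for j in diff_sum
    }
    return dev, count


def predict(ratings, dev, count, user, target):
    """Weighted Slope One prediction of `user`'s rating for `target`.

    Each item the user has rated contributes (deviation + rating),
    weighted by how many users co-rated it with the target.
    Returns None when no co-rated item exists.
    """
    numerator = 0.0
    denominator = 0
    for i, r_i in ratings[user].items():
        c = count.get(target, {}).get(i, 0)
        if c == 0:
            continue
        numerator += (dev[target][i] + r_i) * c
        denominator += c
    return numerator / denominator if denominator else None


# Toy data (hypothetical, for illustration only).
toy_ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, count = build_deviations(toy_ratings)
lucy_a = predict(toy_ratings, dev, count, "lucy", "A")  # 13/3, about 4.33
```

In this toy example, item A is rated 0.5 higher than B on average (two co-raters) and 3 higher than C (one co-rater), so Lucy's predicted rating for A is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3. A parallel version would distribute the pairwise deviation step, which dominates the cost.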