A User Rating Prediction Model Based on the XGBoost Algorithm and Its Application
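  • English title: Predicting User Ratings with XGBoost Algorithm
  • Authors: Yang Guijun (杨贵军); Xu Xue (徐雪); Zhao Fuqiang (赵富强)
  • Keywords: Rating Prediction; XGBoost Algorithm; LDA Topic Model; Text Feature Extraction; User Reviews
  • Journal code: XDTQ
  • Journal: Data Analysis and Knowledge Discovery
  • Affiliations: China Center of Economics and Statistics Research, Tianjin University of Finance and Economics; Institute of Polytechnic, Tianjin University of Finance and Economics
  • Publication date: 2019-01-25
  • Publisher: Data Analysis and Knowledge Discovery (数据分析与知识发现)
  • Year: 2019
  • Issue: v.3; No.25
  • Funding: One of the research outputs of the National Natural Science Foundation of China General Program "Design and Analysis of Drop-the-Losers Two-Stage Adaptive Clinical Trials" (No. 11471239); the National Social Science Fund of China Youth Project "Methods for Assessing the Credibility of Sensitive Information in Social Media" (No. 18CTJ008); and the National Statistical Science Research Program Key Project "Sensitive Information Identification and Emergency Prediction in Web Social Networks" (No. 2017LZ05)
  • Language: Chinese
  • Page: XDTQ201901016
  • Page count: 9
  • CN: 01
  • ISSN: 10-1478/G2
  • Classification code: 122-130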
Abstract
[Objective] This study builds a rating prediction model from online user reviews and uses it to analyze consumer behavior. [Methods] A Latent Dirichlet Allocation (LDA) model quantifies each user review as a topic feature vector, which serves as the explanatory variable; the user rating is the response variable. The prediction model is built with the eXtreme Gradient Boosting (XGBoost) algorithm, with sample perturbation and attribute perturbation added to generate multiple models that are combined into an ensemble. [Results] Rating predictions for user reviews from a domestic automobile portal show that the model captures users' preferences for automobiles well. Compared with logistic regression and random forest, its prediction accuracy is higher by 13.73% and 0.64%, respectively, and it is computationally efficient. [Limitations] Data from other sources were not incorporated to describe user behavior more comprehensively. [Conclusions] Quantifying user reviews as topic feature vectors and applying the XGBoost algorithm predicts user ratings accurately and efficiently.
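The ensemble step named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the perturbations were applied, so the code assumes bootstrap resampling of rows (sample perturbation) and a random subset of topic columns per model (attribute perturbation), combined by majority vote. A nearest-centroid classifier stands in for XGBoost so that the example runs on the standard library alone, and all function names and the toy topic vectors are hypothetical. In the XGBoost library itself, the `subsample` and `colsample_bytree` parameters provide comparable row and column subsampling inside the booster.

```python
import random
from collections import Counter


def fit_centroids(X, y, cols):
    """Stand-in base learner (NOT XGBoost): per-class mean over the chosen columns."""
    sums, counts = {}, Counter()
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for j, c in enumerate(cols):
            acc[j] += row[c]
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroids(centroids, row, cols):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    def dist(center):
        return sum((row[c] - center[j]) ** 2 for j, c in enumerate(cols))
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))


def fit_ensemble(X, y, n_models=15, n_cols=2, seed=0):
    """Train n_models base learners, each on perturbed rows and columns."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    models = []
    for _ in range(n_models):
        # Sample perturbation: bootstrap the rows with replacement.
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Attribute perturbation: keep a random subset of topic features.
        cols = sorted(rng.sample(range(n_features), n_cols))
        Xb = [X[i] for i in rows]
        yb = [y[i] for i in rows]
        models.append((fit_centroids(Xb, yb, cols), cols))
    return models


def predict_ensemble(models, row):
    """Majority vote over the base learners; ties go to the smaller label."""
    votes = Counter(predict_centroids(c, row, cols) for c, cols in models)
    return min(votes, key=lambda label: (-votes[label], label))


# Hypothetical toy data: each row is a 3-topic proportion vector for one review,
# and each label is the rating given with that review.
X = [
    [0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10], [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80], [0.10, 0.20, 0.70], [0.05, 0.15, 0.80], [0.20, 0.10, 0.70],
]
y = [5, 5, 5, 5, 1, 1, 1, 1]

models = fit_ensemble(X, y)
print(predict_ensemble(models, [0.85, 0.10, 0.05]))
print(predict_ensemble(models, [0.05, 0.10, 0.85]))
```

In a fuller pipeline, the rows of `X` would come from an LDA model fitted on the review text, and the base learner would be replaced by a gradient-boosted tree model.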
