Abstract
The growth of mobile location-based services has caused the online and offline transaction data of Internet merchants to increase rapidly. Mining the user behavior hidden in this massive transaction data and using it to support intelligent decision-making is an important problem that Internet merchants face in their operations. To address it, we propose a customer flow prediction model for Internet merchants that fuses a historical mean model with boosting trees: the boosting trees improve prediction accuracy, while the historical mean model captures the dependence of customer flow on time. The core idea is to first use the boosting tree models XGBoost and GBDT, together with the historical mean model, to predict each merchant's average and total sales over the past three weeks, and then to construct a formula for the fusion weight coefficients of the boosting tree and historical mean models. We implemented the method on a sales data set covering 2,000 Internet merchants and compared it with a time-series weighted regression model. The two methods gave similar predictions, which suggests that the way the proposed method accounts for the time factor is reasonable. In addition, prediction accuracy improved significantly as the training set grew.
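The fusion step can be illustrated with a minimal sketch. The abstract does not state the fusion weight formula, so the code below assumes a single convex weight `w` chosen by grid search to minimize mean absolute error on one held-out week; this is an illustrative stand-in, not the paper's formula. All function names are hypothetical, daily counts are assumed, and the boosting tree forecast is passed in as a plain list rather than produced by XGBoost or GBDT.

```python
from statistics import mean


def historical_mean_forecast(daily_counts, weeks=3):
    """Forecast the next 7 days as the per-weekday mean of the last `weeks` weeks."""
    window = daily_counts[-7 * weeks:]
    if len(window) < 7 * weeks:
        raise ValueError("need at least `weeks` full weeks of history")
    # The same weekday recurs every 7 positions, so window[d::7] is one weekday.
    return [mean(window[d::7]) for d in range(7)]


def blend(tree_pred, hist_pred, w):
    """Convex combination (assumed form): w * tree + (1 - w) * historical mean."""
    return [w * t + (1 - w) * h for t, h in zip(tree_pred, hist_pred)]


def mae(pred, actual):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(p - a) for p, a in zip(pred, actual))


def fit_weight(tree_pred, hist_pred, actual, steps=20):
    """Grid-search w in [0, 1] that minimizes MAE on a held-out week."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mae(blend(tree_pred, hist_pred, w), actual))
```

In use, `historical_mean_forecast` would be applied to a merchant's recent history, a boosting tree model would supply `tree_pred` for the same seven days, and `fit_weight` would pick the blend weight on a week whose true counts are known. A single global weight is the simplest choice; per-merchant or per-weekday weights are equally plausible readings of the abstract.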