A Passenger Flow Prediction Model Combining Historical Means and Boosting Trees
  • English title: A Passenger Flow Predication Model Combining History Means and Boosting Tree
  • Authors: 白智远; 温从威; 杨锦浩; 陈智; 吕品
  • Authors (English): BAI Zhi-yuan; WEN Cong-wei; YANG Jin-hao; CHEN Zhi; LYU Pin (School of Electronics and Information, Shanghai Dianji University)
  • Keywords: history mean; boosting tree; time series weighting regression; Internet business; passenger flow
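  • Journal code: WJFZ
  • Journal (English title): Computer Technology and Development
  • Affiliation: School of Electronics and Information, Shanghai Dianji University
  • Publication date: 2018-12-20 15:15
  • Publisher: 计算机技术与发展 (Computer Technology and Development)
  • Year: 2019
  • Volume and cumulative issue number: v.29; No.264
  • Funding: 2017 Shanghai Undergraduate Science and Technology Innovation Project (A1-5701-17-009-02-54); Shanghai Educational Science Research Project (C17014/17AR04)
  • Language: Chinese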
  • CNKI article ID: WJFZ201904043
  • Number of pages: 4
  • Issue: 04
  • CN: 61-1450/TP
  • Pages: 218-221
Abstract
The growth of mobile location services has caused the "online and offline" transaction data of Internet merchants to increase sharply. Mining the user behavior hidden in this massive transaction data to support intelligent decision-making is an important problem that Internet merchants face in their operations. To address it, this paper proposes a passenger flow prediction model for Internet merchants that fuses a historical mean model with boosting trees: the boosting trees improve prediction accuracy, while the historical mean model captures the dependence of passenger flow on time. The core idea of the fusion is first to use the boosting trees XGBoost and GBDT, together with the historical mean model, to predict each merchant's average and total sales over the past three weeks, and then to construct a formula for the weight coefficients that combine the boosting tree and historical mean models. The method was implemented on a sales dataset of 2,000 Internet merchants and compared with a time series weighted regression model. The two methods produced similar predictions, which indicates that the way the proposed method accounts for the time factor is sound. Moreover, the prediction accuracy of the model improves significantly as the training set grows.
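The abstract describes the fusion only at a high level and does not reproduce the weight coefficient formula, so the following is a minimal Python sketch under stated assumptions, not the authors' method. It assumes daily passenger counts with a weekly cycle, takes the historical mean forecast as the same-weekday mean over the last three weeks, treats the boosting-tree predictions (XGBoost or GBDT in the paper) as plain lists produced elsewhere, and uses a hypothetical inverse-validation-error rule for the fusion weight. The names `historical_mean_forecast`, `mse`, `fusion_weight`, and `fuse` are illustrative.

```python
from statistics import mean


def historical_mean_forecast(history, season=7, weeks=3):
    """Forecast the next `season` days: each day is the mean of the same
    weekday over the last `weeks` weeks (assumes daily data, weekly cycle)."""
    if len(history) < season * weeks:
        raise ValueError("need at least season * weeks observations")
    recent = history[-season * weeks:]
    # recent[d::season] collects the values sharing weekday position d;
    # because len(recent) is a multiple of season, position d of the
    # forecast falls on the same weekday as recent[d].
    return [mean(recent[d::season]) for d in range(season)]


def mse(pred, actual):
    """Mean squared error between two equal-length sequences."""
    return mean((p - a) ** 2 for p, a in zip(pred, actual))


def fusion_weight(boost_val, hist_val, actual_val):
    """Hypothetical weight on the boosting model: inverse-MSE weighting on a
    validation window, i.e. (1/e_boost) / (1/e_boost + 1/e_hist)."""
    e_boost = mse(boost_val, actual_val)
    e_hist = mse(hist_val, actual_val)
    if e_boost + e_hist == 0:
        return 0.5  # both models are exact on validation: split evenly
    return e_hist / (e_boost + e_hist)


def fuse(boost_pred, hist_pred, w):
    """Convex combination: w * boosting + (1 - w) * historical mean."""
    return [w * b + (1 - w) * h for b, h in zip(boost_pred, hist_pred)]


if __name__ == "__main__":
    history = [12, 15, 14, 16, 20, 30, 28] * 3  # three weeks of daily flow
    hist = historical_mean_forecast(history)
    boost = [13, 16, 15, 17, 21, 29, 27]  # stand-in for XGBoost/GBDT output
    w = fusion_weight(boost_val=[10, 12], hist_val=[8, 12], actual_val=[10, 11])
    print(round(w, 3), fuse(boost, hist, w))
```

Inverse-error weighting is one common way to set combination weights: it shifts weight toward whichever component fit the validation window better. The paper's own weight coefficient formula may differ.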
