Spark-Based Analysis of the Spatio-temporal Characteristics of Taxi Passenger Trips
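  • English title: Analysis and Research of Spatio-temporal Characteristics of Taxi Passengers' Trip Based on Spark
  • Authors: 李雷孝; 周成栋; 高静
  • Authors (English): LI Lei-xiao; ZHOU Cheng-dong; GAO Jing; College of Data Science and Application, Inner Mongolia University of Technology; Inner Mongolia Autonomous Region Engineering & Technology Research Center of Big Data Based Software Service; College of Computer and Information Engineering, Inner Mongolia Agricultural University
  • Keywords: Spark; K-means algorithm; hot-spot areas; time distribution; spatio-temporal characteristics analysis
  • English keywords: Spark; K-means algorithm; hot spot; temporal distribution; spatio-temporal characteristics analysis
  • Chinese journal title: NMGD
  • English journal title: Journal of Inner Mongolia University of Technology (Natural Science Edition)
  • Institutions: College of Data Science and Application, Inner Mongolia University of Technology; Inner Mongolia Autonomous Region Engineering & Technology Research Center of Big Data Based Software Service; College of Computer and Information Engineering, Inner Mongolia Agricultural University
  • Publication date: 2019-04-15
  • Publisher: Journal of Inner Mongolia University of Technology (Natural Science Edition)
  • Year: 2019
  • Issue: v.38; No.128
  • Funding: National Natural Science Foundation of China (61462070)
  • Language: Chinese
  • Pages: NMGD201902009
  • Page count: 11
  • CN: 02
  • ISSN: 15-1060/T
  • Classification number: 52-62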
Abstract
As the volume of taxi GPS trajectory data keeps growing, efficiently mining the spatio-temporal characteristics of taxi passengers' trips from massive trajectory data has become a research hotspot in data processing and analysis. This paper studies the K-means algorithm and the Spark platform and designs Spark-based methods for mining the hot-spot areas of taxi passenger trips, for mining the temporal distribution of taxi usage, and for analyzing spatio-temporal characteristics. These methods yield the spatio-temporal patterns of passengers' busy travel periods and areas. Experimental results show that, compared with the traditional SPSS software, the method analyzes the spatio-temporal distribution patterns of taxi passenger trips more quickly.
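The two analyses named in the abstract, hot-spot mining with K-means and a time-of-day distribution of taxi usage, can be illustrated with a small sketch. This is not the authors' Spark implementation: it is plain single-machine Python, the pickup records are hypothetical, the function names `kmeans` and `pickups_per_hour` are invented for this example, and treating longitude/latitude as planar coordinates is a simplifying assumption.

```python
from collections import Counter
from datetime import datetime
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means on (lon, lat) tuples; returns the k centroids."""
    # Assumption: seed with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (Euclidean distance).
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def pickups_per_hour(timestamps):
    """Count pickups by hour of day."""
    return Counter(
        datetime.strptime(t, "%Y-%m-%d %H:%M:%S").hour for t in timestamps
    )


# Hypothetical pickup records: (timestamp, longitude, latitude).
pickups = [
    ("2019-04-15 08:05:00", 111.65, 40.82),
    ("2019-04-15 08:40:00", 111.66, 40.83),
    ("2019-04-15 18:10:00", 111.75, 40.85),
    ("2019-04-15 18:30:00", 111.76, 40.84),
    ("2019-04-15 08:55:00", 111.64, 40.81),
]

hot_spots = kmeans([(lon, lat) for _, lon, lat in pickups], k=2)
hourly = pickups_per_hour(t for t, _, _ in pickups)
```

Here `hot_spots` holds the two cluster centres and `hourly` maps each hour of the day to its pickup count. On a Spark cluster the same two steps would run as distributed jobs over the full trajectory data set instead of over an in-memory list.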
