A Car-following Control Algorithm Based on Deep Reinforcement Learning (基于深度强化学习的车辆跟驰控制)
  • Authors: ZHU Bing (朱冰); JIANG Yuan-de (蒋渊德); ZHAO Jian (赵健); CHEN Hong (陈虹); DENG Wei-wen (邓伟文)
  • Keywords: automotive engineering; car-following control; deep reinforcement learning; adaptive cruise control; driver's uncertainty; Gaussian process
  • Journal: China Journal of Highway and Transport (中国公路学报)
  • Journal Code: ZGGL
  • Affiliations: State Key Laboratory of Automotive Simulation and Control, Jilin University; School of Transportation Science and Engineering, Beihang University
  • Publication Date: 2019-06-15
  • Year: 2019
  • Issue: 2019(06); v.32, No.190
  • Funding: National Key Research and Development Program of China (2016YFB0100904); National Natural Science Foundation of China (51775235); Key Scientific and Technological R&D Project of the Jilin Province Science and Technology Development Plan (20180201056GX); Science and Technology R&D Project of the Jilin Provincial Development and Reform Commission (2019C036-6)
  • Language: Chinese
  • Record No.: ZGGL201906006
  • Pages: 57-64 (8 pages)
  • CN: 61-1313/U
  • ISSN: 1001-7372
Abstract
        Longitudinal acceleration decisions in car-following are directly determined by the state of the preceding vehicle, whose motion uncertainty makes control difficult because the target vehicle's future state is hard to predict. To address the problem that the performance of adaptive cruise control can deteriorate when this uncertainty is ignored, a car-following control strategy based on deep reinforcement learning that accounts for the stochastic motion of the preceding vehicle is proposed. A real-vehicle driving-data-acquisition platform was built, drivers were recruited for on-road car-following tests, and a database of real human-driving data was established. Under the assumption that a vehicle's future acceleration decision is mainly influenced by the vehicles ahead, a longitudinal control architecture for the host vehicle was built on a two-predecessor following structure, with recorded vehicles in the database treated as target vehicles 1# and 2#. Using this database, a stochastic process model of the preceding vehicle's longitudinal acceleration was established with a Gaussian process algorithm, giving a probabilistic model of the distribution of the target vehicle's motion states. Car-following was then formulated as a Markov decision process under a specified reward function, and deep reinforcement learning was introduced to solve the host-vehicle control problem: a control policy was trained with proximal policy optimization through iterative, interactive learning against the stochastic process model, yielding a longitudinal control policy for car-following environments with motion uncertainty. Finally, the strategy was tested on the real driving data. The results show that the policy establishes a mapping from the states of the host vehicle and the two preceding vehicles to the longitudinal control command, accounts for the randomness of the preceding vehicle's motion during iterative learning, requires no additional probabilistic prediction of the preceding vehicle's motion online, and enables the host vehicle to follow the preceding vehicle stably at low computational cost.
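The abstract describes three concrete steps: fitting a Gaussian process to logged leader motion, casting car-following as a Markov decision process, and training the policy with proximal policy optimization. The paper's code is not given here, so the following is only a minimal sketch of the first step under stated assumptions: the features (leader speed and current acceleration), the kernel, the synthetic training data, and the helper name sample_next_acceleration are all illustrative choices, not the authors'.

```python
# Sketch (not the paper's code): fit a Gaussian process to leader data so
# that, given the current motion state, the next longitudinal acceleration
# is drawn from a predicted distribution instead of a point estimate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical training data standing in for the real driving database:
# X = [speed m/s, current acceleration m/s^2], y = acceleration at t+1.
rng = np.random.default_rng(0)
X = rng.uniform([0.0, -3.0], [30.0, 2.0], size=(500, 2))
y = 0.8 * X[:, 1] - 0.01 * X[:, 0] + 0.1 * rng.standard_normal(500)

kernel = RBF(length_scale=[5.0, 0.5]) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

def sample_next_acceleration(speed, accel):
    """Sample the leader's next acceleration from the GP posterior;
    this sampling is what makes the simulated leader stochastic."""
    mean, std = gp.predict([[speed, accel]], return_std=True)
    return rng.normal(mean[0], std[0])
```

The second and third steps can be sketched together: a small gym-style environment whose leader is driven by the GP sampler above, trained with an off-the-shelf PPO implementation (stable-baselines3) rather than the authors' own learner. Note the simplifications: the paper uses a two-predecessor following structure, while this sketch keeps a single leader, and the state definition, action bounds, step size, and reward weights are assumed values for illustration.

```python
# Sketch (assumptions throughout): car-following as an MDP trained with PPO.
# Requires sample_next_acceleration from the previous sketch.
import numpy as np
import gymnasium as gym
from stable_baselines3 import PPO

DT, DESIRED_GAP = 0.1, 25.0  # assumed step size [s] and target gap [m]

class CarFollowingEnv(gym.Env):
    """State: [gap, relative speed, host speed]; action: host acceleration.
    The leader's acceleration is redrawn from the GP model every step."""
    observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(3,))
    action_space = gym.spaces.Box(-3.0, 2.0, shape=(1,))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.gap, self.v_lead, self.v_host, self.a_lead = 25.0, 15.0, 15.0, 0.0
        return self._obs(), {}

    def step(self, action):
        # Stochastic leader: sample its next acceleration from the GP posterior.
        self.a_lead = sample_next_acceleration(self.v_lead, self.a_lead)
        self.v_lead = max(0.0, self.v_lead + self.a_lead * DT)
        self.v_host = max(0.0, self.v_host + float(action[0]) * DT)
        self.gap += (self.v_lead - self.v_host) * DT
        # Assumed reward: track the desired gap, penalize harsh control;
        # the episode terminates on collision.
        reward = -abs(self.gap - DESIRED_GAP) - 0.1 * float(action[0]) ** 2
        terminated = self.gap <= 0.0
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return np.array([self.gap, self.v_lead - self.v_host, self.v_host],
                        dtype=np.float32)

# Iterative, interactive learning against the stochastic leader model.
model = PPO("MlpPolicy", CarFollowingEnv(), verbose=0)
model.learn(total_timesteps=200_000)
```

Because the leader's randomness is baked into training, the learned policy needs no separate online probabilistic prediction of the preceding vehicle, which is the low-computation property the abstract claims.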
References
[1] LI S,LI K,RAJAMANI R,et al.Model Predictive Multi-objective Vehicular Adaptive Cruise Control [J].IEEE Transactions on Control Systems Technology,2011,19 (3):556-566.
    [2] 王建强,杨波,张德兆,等.基于双模式执行器的商用车自适应巡航控制系统[J].中国公路学报,2011,24(3):104-112.WANG Jian-qiang,YANG Bo,ZHANG De-zhao,et al.Adaptive Cruise Control System of Commercial Vehicle Based on Dual-mode Actuators [J].China Journal of Highway and Transport,2011,24 (3):104-112.
    [3] 朱敏,陈慧岩.考虑车间反应时距的汽车自适应巡航控制策略[J].机械工程学报,2017,53(24):144-150.ZHU Min,CHEN Hui-yan.Strategy for Vehicle Adaptive Cruise Control Considering the Reaction Headway [J].Journal of Mechanical Engineering,2017,53 (24):144-150.
    [4] GAO Z,WANG J,HU H,et al.Control Mode Switching Strategy for ACC Based on Intuitionistic Fuzzy Set Multi-attribute Decision Making Method [J].Journal of Intelligent & Fuzzy Systems,2016,31 (6):2967-2974.
    [5] JONSSON J,JANSSON Z.Fuel Optimized Predictive Following in Low Speed Conditions [J].IFAC Proceedings Volumes,2004,37 (22):119-124.
    [6] EBEN L S,LI K,WANG J.Economy-oriented Vehicle Adaptive Cruise Control with Coordinating Multiple Objectives Function [J].Vehicle System Dynamics,2013,51 (1):1-17.
    [7] ZHU B,JIANG Y,ZHAO J,et al.Typical-driving-style-oriented Personalized Adaptive Cruise Control Design Based on Human Driving Data [J].Transportation Research Part C,2019,100:274-288.
    [8] BRACKSTONE M,SULTAN B,MCDONALD M.Motorway Driver Behaviour:Studies on Car Following [J].Transportation Research Part F,2002,5 (1):31-46.
    [9] 王雪松,朱美新.基于自然驾驶数据的中国驾驶人城市快速路跟驰模型标定与验证[J].中国公路学报,2018,31(9):129-138.WANG Xue-song,ZHU Mei-xin.Calibration and Validation of Car-following Models on Urban Expressways for Chinese Drivers Using Naturalistic Driving Data [J].China Journal of Highway and Transport,2018,31 (9):129-138.
    [10] MOON S,YI K.Human Driving Data-based Design of a Vehicle Adaptive Cruise Control Algorithm [J].Vehicle System Dynamics,2008,46 (8):661-690.
    [11] MORTON J,WHEELER T A,KOCHENDERFER M J.Analysis of Recurrent Neural Networks for Probabilistic Modeling of Driver Behavior [J].IEEE Transactions on Intelligent Transportation Systems,2016,18 (5):1289-1298.
    [12] JIANG Y,DENG W,WANG J,et al.Studies on Drivers’ Driving Styles Based on Inverse Reinforcement Learning [R].SAE Technical Paper 2018-01-0612,2018.
    [13] RAJAMANI R,PHANOMCHOENG G,PIYABONGKARN D,et al.Algorithms for Real-time Estimation of Individual Wheel Tire-road Friction Coefficients [J].IEEE/ASME Transactions on Mechatronics,2011,17 (6):1183-1195.
    [14] AOUDE G S.Threat Assessment for Safe Navigation in Environments with Uncertainty in Predictability [D].Cambridge:Massachusetts Institute of Technology,2011.
    [15] MOSER D,WASCHL H,KIRCHSTEIGER H,et al.Cooperative Adaptive Cruise Control Applying Stochastic Linear Model Predictive Control Strategies [C] // IEEE.2015 European Control Conference (ECC).New York:IEEE,2015:3383-3388.
    [16] KAMAL M A S,MUKAI M,MURATA J,et al.Ecological Driving Based on Preceding Vehicle Prediction Using MPC [J].IFAC Proceedings Volumes,2011,44 (1):3843-3848.
    [17] 钱立军,荆红娟,邱利宏.基于随机模型预测控制的四驱混合动力汽车能量管理[J].中国机械工程,2018,29(11):1342-1348.QIAN Li-jun,JING Hong-juan,QIU Li-hong.Energy Management of a 4WD HEV Based on SMPC [J].China Mechanical Engineering,2018,29 (11):1342-1348.
    [18] LIU B,LI L,WANG X,et al.Hybrid Electric Vehicle Downshifting Strategy Based on Stochastic Dynamic Programming During Regenerative Braking Process [J].IEEE Transactions on Vehicular Technology,2018,67 (6):4716-4727.
    [19] SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the Game of Go Without Human Knowledge [J].Nature,2017,550 (7676):354-359.
    [20] ZAMBALDI V,RAPOSO D,SANTORO A,et al.Relational Deep Reinforcement Learning [EB/OL].[2018-10-18].https://arxiv.org/pdf/1806.01830.pdf.
    [21] DESJARDINS C,CHAIB-DRAA B.Cooperative Adaptive Cruise Control:A Reinforcement Learning Approach [J].IEEE Transactions on Intelligent Transportation Systems,2011,12 (4):1248-1260.
    [22] WANG J,XU X,LIU D,et al.Self-learning Cruise Control Using Kernel-based Least Squares Policy Iteration [J].IEEE Transactions on Control Systems Technology,2014,22 (3):1078-1087.
    [23] 朱冰,蒋渊德,邓伟文,等.基于KL散度的驾驶员驾驶习性非监督聚类[J].汽车工程,2018,40(11):1317-1323.ZHU Bing,JIANG Yuan-de,DENG Wei-wen,et al.Unsupervised Clustering of Driving Styles Based on KL Divergence [J].Automotive Engineering,2018,40 (11):1317-1323.
    [24] PARIOTA L.Driving Behaviour for ADAS:Theoretical and Experimental Analyses [D].Naples:University of Naples,2013.
    [25] RASMUSSEN C E.Gaussian Processes in Machine Learning [C] // Springer.Summer School on Machine Learning.Berlin:Springer,2003:63-71.
    [26] SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal Policy Optimization Algorithms [EB/OL].[2018-10-18].https://arxiv.org/pdf/1707.06347.pdf.
    [27] HEESS N,SRIRAM S,LEMMON J,et al.Emergence of Locomotion Behaviours in Rich Environments [EB/OL].[2018-10-18].https://arxiv.org/pdf/1707.02286.pdf.
