Research on Coordinated Traffic Signal Control for Multiple Intersections Based on Q-Learning

English title: Based Q-learning in Traffic Signal Control of Multiple Intersection
Author: Wei Qinping (韦钦平)
Degree level: Master's
Discipline: Transportation Planning and Management
Keywords: Offline Q-learning ; Traffic Signal Control ; Green Time Optimization ; Offset Optimization ; Traffic Simulation Platform ; VISSIM Interface Techniques ; MATLAB Interface Techniques
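Degree year: 2012
Advisors: Shen Wen (沈文) ; Lu Shoufeng (卢守峰)
Discipline code: 082303
Degree-granting institution: Changsha University of Science and Technology (长沙理工大学)
Thesis submission date: 2012-03-23
Defense committee chair: Huang Helai (黄合来)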

Abstract

Traffic problems have become a major bottleneck constraining urban economic development, and relieving congestion is urgent, yet limited urban space and other economic and environmental factors hinder the expansion of transport infrastructure. Applying advanced technologies such as artificial intelligence and computer simulation to urban traffic congestion has therefore become a research focus in traffic signal control.
     After briefly reviewing the state of development of signal control and Q-learning theory, this thesis focuses on applying Q-learning to traffic signal control. The first part studies green-time optimization based on Q-learning. For signal timing optimization at a single intersection under both fixed-cycle and variable-cycle modes, reward functions are constructed with equal degree of saturation and minimum delay as the optimization objectives, and offline Q-learning models are established for the two objectives. Numerical examples are implemented in VBA and MATLAB; the structure of the solutions and the distribution of the optimal solutions of the four offline Q-learning models are analyzed, and the applicability of offline Q-learning optimization to intersection signal control is discussed. Finally, the optimal solutions are applied to real-time traffic control in VISSIM and compared with the classical Webster algorithm; the results indicate that the Q-learning green-time optimization algorithm clearly outperforms it.
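As a rough illustration of the green-time idea, the following is a minimal sketch of tabular Q-learning with an equal-saturation reward at a two-phase, fixed-cycle intersection. It is not the thesis's model: the state (phase-1 green time), the three actions, the traffic volumes, and all parameter values are assumptions chosen only to make the example small and runnable.

```python
import random

# All numbers below are illustrative assumptions, not values from the thesis.
CYCLE = 60.0             # s, fixed cycle length
LOST_TIME = 10.0         # s, total lost time per cycle
SAT_FLOW = 1800.0        # veh/h, saturation flow of each phase
FLOWS = (600.0, 400.0)   # veh/h, critical flows of phase 1 and phase 2
GREENS = list(range(10, 42, 2))  # candidate phase-1 green times, 10..40 s
ACTIONS = (-1, 0, 1)     # shorten, keep, or lengthen phase-1 green by one step


def saturation_gap(g1):
    """Absolute difference between the two phases' degrees of saturation."""
    g2 = CYCLE - LOST_TIME - g1
    x1 = FLOWS[0] * CYCLE / (SAT_FLOW * g1)
    x2 = FLOWS[1] * CYCLE / (SAT_FLOW * g2)
    return abs(x1 - x2)


def step(state, action):
    """Apply an action, clip to the feasible range, return (next_state, reward)."""
    nxt = min(max(state + action, 0), len(GREENS) - 1)
    return nxt, -saturation_gap(GREENS[nxt])


def train(episodes=3000, horizon=40, alpha=0.5, gamma=0.9, epsilon=0.5, seed=0):
    """Offline training: epsilon-greedy episodes on the analytic model above."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in GREENS]
    for _ in range(episodes):
        s = rng.randrange(len(GREENS))
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            s2, r = step(s, ACTIONS[a])
            # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_green(q, start_green, steps=30):
    """Follow the learned greedy policy and return the final phase-1 green."""
    s = GREENS.index(start_green)
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        s, _ = step(s, ACTIONS[a])
    return GREENS[s]
```

Under these assumed flows the two degrees of saturation are equal when phase 1 receives 30 s of the 50 s of available green, so calling `greedy_green(train(), 10)` should settle on that split. A delay-based reward would replace `saturation_gap` with a delay estimate or a simulated measurement.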
     The second part studies offset optimization for multiple intersections based on Q-learning. For multiple intersections under fixed-cycle mode, an offline Q-learning model with minimum delay as the optimization objective is built in VBA and MATLAB on an integrated VISSIM-Excel VBA-MATLAB simulation platform. The optimal solution is then applied to real-time traffic control in VISSIM and compared with MAXBAND; the results indicate that the Q-learning offset optimization algorithm clearly outperforms MAXBAND.
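To make the minimum-delay offset objective concrete, here is a small sketch for two signals on a one-way arterial. It is only a stand-in: the thesis measures delay in VISSIM and searches with Q-learning, whereas this example uses a hypothetical deterministic platoon model and an exhaustive search over whole-second offsets. Every constant and function name is an assumption.

```python
# Illustrative assumptions only; none of these values come from the thesis.
CYCLE = 60        # s, common cycle length
GREEN = 30        # s, downstream green for the arterial
TRAVEL_TIME = 25  # s, travel time between the two stop lines
PLATOON = 10      # vehicles released at the start of upstream green
HEADWAY = 2       # s, discharge headway within the platoon


def wait_at_downstream(arrival, offset):
    """Seconds a vehicle waits if downstream green starts at offset + n * CYCLE."""
    phase = (arrival - offset) % CYCLE  # time since the latest green start
    return 0 if phase < GREEN else CYCLE - phase


def total_delay(offset):
    """Total waiting time of one platoon (queue discharge is ignored)."""
    arrivals = [TRAVEL_TIME + k * HEADWAY for k in range(PLATOON)]
    return sum(wait_at_downstream(t, offset) for t in arrivals)


def best_offset():
    """Exhaustive search over whole-second offsets; ties go to the smallest."""
    return min(range(CYCLE), key=total_delay)
```

With these assumed values any offset from 14 s to 25 s lets the whole platoon pass on green, while an offset of 0 s stops seven of the ten vehicles. In a Q-learning formulation, the negative of such a delay measure would serve as the reward for each offset adjustment.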
     To provide the experimental conditions for the algorithm research, the thesis also studies the integrated VISSIM-Excel VBA-MATLAB simulation platform. It combines VISSIM's reliable microscopic traffic flow simulation, Excel VBA's programming efficiency and data communication capability, and MATLAB's ability to implement complex intelligent traffic control algorithms; a real-time traffic control simulation platform is built through the VISSIM-Excel VBA and Excel VBA-MATLAB interface techniques. Finally, the research is summarized and problems requiring further study are identified.

