Traffic congestion has become a major bottleneck restricting urban economic development, and relieving it is urgent. However, limited urban space, together with economic and environmental constraints, hinders the expansion of transport infrastructure. Applying artificial intelligence, computer simulation, and other advanced technologies to relieve urban congestion has therefore become a focus of research on traffic signal control.
     After a brief overview of the development of signal control and of Q-learning theory, this thesis focuses on applying Q-learning to traffic signal control. The first part is green time optimization based on Q-learning. Signal timing at a single intersection is studied in two modes, fixed cycle and variable cycle. Reward functions are constructed with saturation and minimum delay as the optimization objectives, and offline Q-learning models are established for both objectives. Using examples programmed in VBA and Matlab, the thesis analyzes the structure of the solutions and the distribution of the optimal solutions of the four resulting offline Q-learning models, and discusses the applicability of the offline Q-learning optimization model to intersection signal control. The optimal solutions are then applied to real-time traffic control in VISSIM and compared with the classical Webster algorithm; the results show that the Q-learning green time optimization algorithm clearly outperforms it.
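As an illustration of offline Q-learning for green time in fixed-cycle mode, the following is a minimal sketch and not the thesis's model. The state set (candidate phase-1 greens), the three actions, the two-phase layout, and all flow and timing values are assumptions made for this example; the reward is the negative of the larger degree of saturation of the two phases, which is one plausible reading of the saturation objective.

```python
import random

# All values below are hypothetical and chosen only for illustration.
GREENS = [10, 15, 20, 25, 30, 35, 40]   # candidate phase-1 effective greens (s)
CYCLE, TOTAL_GREEN = 60.0, 50.0         # fixed cycle and total effective green (s)
FLOWS = (600.0, 400.0)                  # demand of phase 1 and phase 2 (veh/h)
SAT_FLOW = 1800.0                       # saturation flow of each phase (veh/h)
ACTIONS = (-1, 0, 1)                    # shorten, keep, or lengthen phase-1 green

def max_saturation(g1):
    """Largest degree of saturation x = q*C / (s*g) over the two phases."""
    g2 = TOTAL_GREEN - g1
    x1 = FLOWS[0] * CYCLE / (SAT_FLOW * g1)
    x2 = FLOWS[1] * CYCLE / (SAT_FLOW * g2)
    return max(x1, x2)

def step(state, action):
    """Move one grid step (clipped at the ends); reward is -max saturation."""
    nxt = min(max(state + ACTIONS[action], 0), len(GREENS) - 1)
    return nxt, -max_saturation(GREENS[nxt])

def train(episodes=2000, horizon=25, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in GREENS]
    for _ in range(episodes):
        s = rng.randrange(len(GREENS))
        for _ in range(horizon):
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            s2, r = step(s, a)
            # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

def greedy_green(q, start):
    """Follow the learned greedy policy from `start` and return the final green."""
    s = start
    for _ in range(len(GREENS)):
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        s = step(s, a)[0]
    return GREENS[s]
```

With these made-up flows the two phases are equally saturated at a 30 s phase-1 green, so `greedy_green(train(), start)` should settle on 30 from any starting index. A delay-based reward could be substituted by replacing `max_saturation` with a delay estimate.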
     The second part is phase difference (offset) optimization across multiple intersections based on Q-learning. The thesis studies phase difference optimization in fixed-cycle mode, using the integrated VISSIM-Excel VBA-Matlab simulation platform as the technical platform. An offline Q-learning model with minimum delay as the optimization objective is programmed in VBA and Matlab. The optimal solution is then applied to real-time traffic control in VISSIM and compared with MAXBAND; the results indicate that the Q-learning phase difference optimization algorithm clearly outperforms it as well.
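To make the minimum-delay objective for phase difference concrete, the sketch below evaluates how the average wait of a platoon at a downstream signal depends on the offset. It is a simplified illustration under stated assumptions, not the thesis's model: queues and platoon dispersion are ignored, and the travel time, platoon size, headway, green, and cycle values are hypothetical. A learner such as the offline Q-learning model described above could use the negative of this delay as its reward.

```python
def wait_time(arrival, offset, green, cycle):
    """Seconds a vehicle arriving at `arrival` waits for the downstream green.

    The downstream green starts at `offset` (relative to the upstream green
    start at t = 0) and lasts `green` seconds in every cycle.
    """
    phase = (arrival - offset) % cycle      # time since downstream green began
    return 0.0 if phase < green else cycle - phase

def platoon_delay(offset, travel_time=22.0, platoon=12, headway=2.0,
                  green=30.0, cycle=60.0):
    """Average wait (s/veh) of a platoon released upstream at t = 0.

    All default values are hypothetical; queues and dispersion are ignored.
    """
    arrivals = [travel_time + i * headway for i in range(platoon)]
    return sum(wait_time(t, offset, green, cycle) for t in arrivals) / platoon

def best_offset(step=2):
    """Grid search over offsets in [0, 60) s; returns the first minimizer."""
    return min(range(0, 60, step), key=platoon_delay)
```

Under these assumptions an offset equal to the link travel time (22 s) lets the whole platoon pass without stopping, while an offset half a cycle later makes every vehicle wait. Several neighboring offsets can tie at zero delay, so `best_offset` returns only the first of them.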
     To provide experimental conditions for these algorithms, the thesis also develops an integrated VISSIM-Excel VBA-Matlab simulation platform. It combines the reliable microscopic traffic flow simulation of VISSIM, the programming efficiency and data communication capability of Excel VBA, and the capacity of Matlab to run complex intelligent traffic control algorithms. Through VISSIM-Excel VBA interface technology, a real-time traffic control simulation platform is built. Finally, the thesis summarizes the work and identifies problems that need further study.
