Application of Reinforcement Learning to Coplanar Two-Aircraft Air Combat
Abstract
As the principal tool for pursuit-evasion dynamic game problems, differential game theory has matured considerably over nearly fifty years of development, yet it is still some distance from practical application. This is mainly because differential game theory originates from optimal control theory: it requires a precise mathematical model, and its solution runs into nonlinear two-point boundary value problems and singular-surface problems.
In recent years, with the rise of artificial intelligence, many researchers at home and abroad have worked on introducing intelligent control into the study of differential games. Achieving intelligent guidance inevitably involves the automatic extraction and use of knowledge. As a machine learning method, reinforcement learning can automate the knowledge-acquisition process and broaden the range of knowledge resources that can be obtained.
This thesis studies the dynamic game problem of coplanar two-aircraft air combat. By combining reinforcement learning with differential games, it avoids the difficulties of the traditional control-theoretic approach, which seeks an optimal analytical solution from a precise mathematical model of the controlled plant and a performance index. Air-combat policy rules are built from human fuzzy reasoning to discretize the state space, reduce the size of the action space, and improve the efficiency of network learning.
To address the "curse of dimensionality" of conventional reinforcement learning and the structural credit-assignment problem in learning, the thesis approximates the Q-learning value function with a BP neural network.
The simulation experiments take many practical factors into account and use realistic aerodynamic parameters. The results verify the effectiveness of the proposed method and indicate that combining reinforcement learning with differential game theory and applying it to air-combat problems is a promising direction.
The thesis first analyzes the importance of two-aircraft combat and the development of its research methods, and presents the rationale and overall framework of the design. Chapter 2 introduces the characteristics, history, and algorithms of reinforcement learning. Chapter 3 designs Q-learning-based intelligent air-combat guidance control and gives the air-combat policy rules. Chapter 4 presents simulation experiments on the constant-speed and variable-speed models of horizontal-plane two-aircraft air combat and analyzes the results.
As the main tool for pursuit-evasion games, differential game theory has developed considerably over nearly fifty years, but its results remain difficult to apply to real air combat. Most analytical studies require a precise mathematical model and involve solving nonlinear two-point boundary value problems and singular surfaces, which arise from the necessary conditions of game optimality, so accurate solutions are generally unattainable.
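For background, the two-point boundary value problem mentioned above comes from the standard necessary conditions of a zero-sum differential game (a textbook formulation, not reproduced from the thesis):

    \dot{x} = f(x,u,v), \qquad J = \Phi(x(t_f)) + \int_{t_0}^{t_f} L(x,u,v)\,dt
    H(x,\lambda,u,v) = L(x,u,v) + \lambda^{T} f(x,u,v)
    \dot{\lambda} = -\partial H/\partial x, \qquad u^{*} = \arg\min_{u} H, \qquad v^{*} = \arg\max_{v} H
    x(t_0) = x_0, \qquad \lambda(t_f) = \partial\Phi/\partial x\big|_{t_f}

Because the state is specified at the initial time and the costate at the terminal time, these equations must be solved as a nonlinear two-point boundary value problem, and singular surfaces appear where the min-max operation does not select a unique control.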
With the development of artificial intelligence, many recent studies have been devoted to combining intelligent control with differential game theory. Realizing intelligent guidance necessarily involves the automatic acquisition and utilization of knowledge. As a machine learning method, reinforcement learning not only automates knowledge acquisition but also expands the range of knowledge resources that can be obtained.
By combining reinforcement learning with differential game theory, the coplanar air combat problem between two aircraft is analyzed. This method avoids solving the tedious two-point boundary value problem derived from optimal control theory. Air-combat policy rules built from human fuzzy reasoning discretize the state space, reduce the action space, and improve the learning efficiency of the neural network.
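As an illustration of such rule-based discretization (a minimal sketch; the angle and range thresholds, state names, and variable names are hypothetical, not the thesis's actual rules), the relative geometry of the two aircraft can be mapped onto a small set of discrete combat situations:

    import math

    def discretize_situation(rel_range, aspect_angle, antenna_train_angle):
        """Map continuous relative geometry onto a coarse discrete combat state.

        rel_range           -- distance between the two aircraft [m]
        aspect_angle        -- angle between the target's tail and the line of sight [rad]
        antenna_train_angle -- angle between own velocity and the line of sight [rad]
        All thresholds are illustrative placeholders.
        """
        if rel_range < 500.0:
            range_bin = "close"
        elif rel_range < 3000.0:
            range_bin = "medium"
        else:
            range_bin = "far"

        # Behind the target and roughly pointing at it: offensive posture
        if abs(aspect_angle) < math.radians(30) and abs(antenna_train_angle) < math.radians(30):
            posture = "offensive"
        # Target behind own aircraft: defensive posture
        elif abs(aspect_angle) > math.radians(150) and abs(antenna_train_angle) > math.radians(150):
            posture = "defensive"
        else:
            posture = "neutral"

        return (range_bin, posture)

Each discrete situation can then be limited to a few admissible maneuver commands, which is what keeps the action space of the learning problem small.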
Approximating the reinforcement-learning value function with a neural network is studied. This approach addresses the "curse of dimensionality" of the reinforcement-learning algorithm and the structural credit-assignment problem in learning.
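A minimal sketch of this idea (generic Q-learning with a one-hidden-layer BP network as function approximator; the layer sizes, learning rate, and discount factor are assumptions rather than values from the thesis):

    import numpy as np

    class QNetwork:
        """One-hidden-layer BP network approximating Q(s, a) over a discrete action set."""

        def __init__(self, n_state, n_action, n_hidden=32, lr=0.01):
            self.lr = lr
            self.W1 = np.random.randn(n_state, n_hidden) * 0.1
            self.b1 = np.zeros(n_hidden)
            self.W2 = np.random.randn(n_hidden, n_action) * 0.1
            self.b2 = np.zeros(n_action)

        def forward(self, s):
            self.h = np.tanh(s @ self.W1 + self.b1)
            return self.h @ self.W2 + self.b2            # Q-values for every action

        def update(self, s, a, td_target):
            q = self.forward(s)
            err = np.zeros_like(q)
            err[a] = q[a] - td_target                    # error only on the chosen action
            # Backpropagate the squared TD error through both layers
            grad_h = (err @ self.W2.T) * (1.0 - self.h ** 2)
            self.W2 -= self.lr * np.outer(self.h, err)
            self.b2 -= self.lr * err
            self.W1 -= self.lr * np.outer(s, grad_h)
            self.b1 -= self.lr * grad_h

    def q_learning_step(net, s, a, reward, s_next, gamma=0.95):
        """One TD(0) backup: target = r + gamma * max_a' Q(s', a')."""
        td_target = reward + gamma * np.max(net.forward(s_next))
        net.update(s, a, td_target)

Because the network generalizes across neighbouring states, it avoids storing a table whose size grows exponentially with the state dimension, and backpropagation distributes the temporal-difference error over the weights, which is one way of handling the structural credit-assignment problem.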
The simulations take many practical conditions into account and use realistic aerodynamic data. The results demonstrate the validity of applying reinforcement-learning-based differential games to coplanar air combat.
This paper is outlined as follows. The importance of two-aircraft combat and the development of its research methods are analyzed first, and the overall structure and rationale of the design are presented. Chapter 2 introduces the characteristics, history, and algorithms of reinforcement learning. Chapter 3 gives the air-combat policy rules and discusses the intelligent guidance problem of air combat with reinforcement learning. Chapter 4 presents numerical simulations of the horizontal two-aircraft combat dynamics model for the constant-speed and variable-speed cases and analyzes the results.
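For orientation, a horizontal-plane point-mass model of the kind typically used for such simulations is sketched below (a generic formulation; the load-factor turn law and the thrust/drag terms are placeholders, not the thesis's aerodynamic data):

    import math

    G = 9.81  # gravitational acceleration [m/s^2]

    def step_aircraft(x, y, psi, v, n_load, thrust, drag, mass, dt, constant_speed=True):
        """Advance a horizontal-plane point-mass aircraft by one time step.

        psi    -- heading angle [rad]
        n_load -- commanded horizontal load factor (turn command)
        In the constant-speed case the thrust and drag terms are ignored.
        """
        x += v * math.cos(psi) * dt
        y += v * math.sin(psi) * dt
        psi += (G * n_load / v) * dt              # turn rate set by load factor and speed
        if not constant_speed:
            v += ((thrust - drag) / mass) * dt    # variable-speed case: longitudinal dynamics
        return x, y, psi, v

In a simulation of this kind, both aircraft would be propagated with such a model at each decision step while the learned policy selects the maneuver commands.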
