Policy Improvements for Probabilistic Pursuit-Evasion Game
  • Authors: Dong Jun Kwak (1)
    H. Jin Kim (1)
  • Keywords: Pursuit-evasion game; Probabilistic game; Multiple robots; Reinforcement learning; Particle swarm optimization
  • Journal: Journal of Intelligent and Robotic Systems
  • Publication year: 2014
  • Publication date: June 2014
  • Volume: 74
  • Issue: 3-4
  • Pages: 709-724
  • Affiliations: Dong Jun Kwak (1)
    H. Jin Kim (1)

    1. School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea
  • ISSN: 1573-0409
Abstract
This paper focuses on a pursuit-evasion game (PEG) involving two teams: pursuers trying to minimize the time required to capture evaders, and evaders trying to maximize the capture time by escaping the pursuers. We propose a hybrid pursuit policy for a probabilistic PEG that combines the merits of the local-max and global-max pursuit policies proposed in previous literature. We also propose a method to find optimal pursuit and evasion policies for the two competing parties. To this end, we employ an episodic parameter optimization (EPO) algorithm to learn good values for the weighting parameters of the hybrid pursuit policy and an intelligent evasion policy. The EPO algorithm runs over numerous repeated simulation episodes of the PEG: the reward of each episode is updated using reinforcement learning, and the optimal weighting parameters are selected using particle swarm optimization. We analyze the trend of the optimal parameter values with respect to the number of pursuers and evaders. The proposed strategy is validated both in simulations and in experiments with small ground robots.
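
The abstract describes two mechanisms: a hybrid pursuit policy that blends local-max and global-max behavior through a weighting parameter, and an EPO loop in which episode rewards drive a particle swarm search over that parameter. The sketch below is a minimal, hypothetical Python illustration of this structure on a grid-world evader-probability map; the move-scoring rule, the PSO coefficients, and the stand-in episode cost are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def manhattan(a, b):
    """Grid distance used to score candidate moves."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def hybrid_pursuit_step(pos, prob_map, w, sense_radius=2):
    """One greedy step of a hybrid pursuit policy on an evader-presence
    probability map. w in [0, 1] trades off the local-max target (best cell
    within the sensing window) against the global-max target (best cell on
    the whole map)."""
    rows, cols = prob_map.shape
    global_target = np.unravel_index(np.argmax(prob_map), prob_map.shape)

    # Local-max target: highest-probability cell inside the sensing window.
    r0, r1 = max(pos[0] - sense_radius, 0), min(pos[0] + sense_radius + 1, rows)
    c0, c1 = max(pos[1] - sense_radius, 0), min(pos[1] + sense_radius + 1, cols)
    lr, lc = np.unravel_index(np.argmax(prob_map[r0:r1, c0:c1]), (r1 - r0, c1 - c0))
    local_target = (r0 + lr, c0 + lc)

    # Pick the neighbouring cell that best serves the weighted combination.
    best, best_score = pos, -np.inf
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]:
        nxt = (pos[0] + dr, pos[1] + dc)
        if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
            continue
        score = -(w * manhattan(nxt, local_target)
                  + (1.0 - w) * manhattan(nxt, global_target))
        if score > best_score:
            best, best_score = nxt, score
    return best

def episodic_parameter_optimization(run_episode, n_particles=10, n_iters=30, seed=0):
    """Toy EPO loop: each PSO particle is a candidate weight w, and its
    fitness is the capture time returned by run_episode(w)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 1.0, n_particles)     # particle positions
    v = np.zeros(n_particles)                  # particle velocities
    pbest, pbest_cost = w.copy(), np.full(n_particles, np.inf)
    gbest, gbest_cost = w[0], np.inf
    for _ in range(n_iters):
        for i in range(n_particles):
            cost = run_episode(w[i])           # e.g. capture time of one simulated PEG episode
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = w[i], cost
            if cost < gbest_cost:
                gbest, gbest_cost = w[i], cost
        # Standard PSO velocity/position update (coefficients are illustrative).
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = 0.7 * v + 1.5 * r1 * (pbest - w) + 1.5 * r2 * (gbest - w)
        w = np.clip(w + v, 0.0, 1.0)
    return gbest, gbest_cost

if __name__ == "__main__":
    # Stand-in episode: a noisy convex cost with a minimum near w = 0.6, used
    # only to exercise the optimizer; a real episode would simulate the
    # pursuers (using hybrid_pursuit_step) against the evaders.
    rng = np.random.default_rng(1)
    fake_episode = lambda w: (w - 0.6) ** 2 + 0.01 * rng.standard_normal()
    best_w, best_cost = episodic_parameter_optimization(fake_episode)
    print(f"best weight ~ {best_w:.2f}, cost ~ {best_cost:.3f}")
```

In this reading, reinforcement learning enters through the per-episode cost signal that updates each particle's best-so-far estimate, while particle swarm optimization supplies the search over the weighting parameter; how the paper couples the two in detail is not stated in the abstract.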
