Abstract
The path integral method originates in stochastic optimal control. It is a numerical iterative method that solves optimal control problems for continuous nonlinear systems, converges quickly, and does not rely on a system model. This paper applies a policy improvement method based on path integral reinforcement learning to the target-directed locomotion of a snake-like robot. Path integral reinforcement learning is used to learn the parameters of the robot's serpentine gait equation. In simulation, the learned gait lets the robot avoid obstacles and reach the target point. Using the prior knowledge gained in simulation, the robot can also complete the same task quickly in a real environment. Experimental results verify the validity of the proposed method.
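To make the idea of learning gait parameters by path integral policy improvement concrete, the following is a minimal sketch and not the paper's implementation. It assumes the commonly used serpenoid gait form (amplitude alpha, temporal frequency omega, phase offset beta, steering bias gamma), which the abstract does not spell out. It replaces the robot rollout with a hypothetical quadratic cost `toy_cost`, and it uses a simplified episodic update (an exponentiated-cost weighted average of parameter perturbations) rather than the full time-indexed path integral update. All function names, the target values, and the settings `sigma` and `lam` are illustrative assumptions.

```python
import math
import random


def serpenoid_angle(i, t, alpha, omega, beta, gamma):
    """Angle of joint i at time t under an ASSUMED serpenoid gait form."""
    return alpha * math.sin(omega * t + i * beta) + gamma


def toy_cost(theta):
    """HYPOTHETICAL stand-in for a rollout cost (e.g. distance to target
    plus obstacle penalties). Here: squared distance to made-up parameters."""
    target = (0.6, 2.0, 0.5, 0.0)  # illustrative values only
    return sum((p - q) ** 2 for p, q in zip(theta, target))


def pi2_update(theta, cost_fn, rng, n_rollouts=20, sigma=0.1, lam=10.0):
    """One simplified path-integral-style update of the parameter vector."""
    # Sample exploration noise for each rollout and evaluate the perturbed parameters.
    noises = [[rng.gauss(0.0, sigma) for _ in theta] for _ in range(n_rollouts)]
    costs = [cost_fn([p + e for p, e in zip(theta, eps)]) for eps in noises]
    # Normalize costs to [0, 1] so that lam has a scale-free meaning.
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0
    # Low-cost rollouts receive exponentially larger weights.
    weights = [math.exp(-lam * (c - lo) / span) for c in costs]
    total = sum(weights)
    # Move the parameters by the weighted average of the sampled noise.
    return [
        p + sum(w * eps[k] for w, eps in zip(weights, noises)) / total
        for k, p in enumerate(theta)
    ]


def learn(n_iters=100, seed=0):
    """Run repeated updates from an arbitrary initial guess."""
    rng = random.Random(seed)
    theta = [0.2, 1.0, 0.1, 0.3]  # alpha, omega, beta, gamma (initial guess)
    history = [toy_cost(theta)]
    for _ in range(n_iters):
        theta = pi2_update(theta, toy_cost, rng)
        history.append(toy_cost(theta))
    return theta, history
```

Calling `learn()` returns the final parameter vector and the cost after each iteration. On a real robot or simulator, `toy_cost` would be replaced by a rollout that executes the gait and scores the resulting trajectory, which is the only place the system enters, so no system model is needed.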