Abstract
Traditional reinforcement learning methods for motion planning, and for robot obstacle avoidance in particular, are prone to overestimation and adapt poorly to complex environments. To address these shortcomings, a new model based on deep reinforcement learning is proposed to improve the obstacle-avoidance performance of robots. The model combines the dueling network architecture with Q-learning, a classical reinforcement learning algorithm, and uses two independently trained dueling networks to process environmental data and predict action values. The output layer produces a state value and an action advantage separately, and the two are combined into the final action value. The model can handle high-dimensional data, allowing it to adapt to complex and changeable environments, and it outputs advantageous actions for the robot to select, yielding a higher cumulative reward. Experimental results show that the proposed model effectively improves the obstacle-avoidance performance of a robot.
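The combination step described above, merging a state value V(s) with per-action advantages A(s, a) into action values Q(s, a), is commonly written as Q(s, a) = V(s) + (A(s, a) − mean over actions of A). Subtracting the mean advantage keeps V and A identifiable. The following is a minimal NumPy sketch of just this combination step under that standard formulation; the function name and the example values are illustrative placeholders, not the authors' implementation:

```python
import numpy as np

def dueling_combine(state_value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a)
    into action values Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))."""
    advantages = np.asarray(advantages, dtype=float)
    return state_value + (advantages - advantages.mean())

# Example: one state, four discrete actions (e.g. movement directions).
V = 2.0
A = [0.5, -0.5, 1.0, -1.0]
Q = dueling_combine(V, A)

# The robot selects the "advantageous action": the index with the largest Q.
best_action = int(np.argmax(Q))
```

Because the mean advantage is subtracted, the mean of the resulting Q values equals V(s), which is what makes the state-value and advantage streams separately identifiable.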