Abstract
Traditional reinforcement learning methods for motion planning, and for robot obstacle avoidance in particular, are prone to overestimation and adapt poorly to complex environments. To address these shortcomings, a new model based on deep reinforcement learning is proposed to improve the obstacle-avoidance performance of robots. The model combines the dueling network architecture with Q-learning, a classical reinforcement learning algorithm, and uses two independently trained dueling networks to process environmental data and predict action values. The output layer produces a state value and an action advantage separately, and the two are combined into the final action value. The model can handle high-dimensional data, allowing it to adapt to complex and changeable environments, and it outputs advantageous actions for the robot to select, yielding a higher cumulative reward. Experimental results show that the proposed model effectively improves the obstacle-avoidance performance of a robot.
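The combination step described above, merging a state value V(s) with per-action advantages A(s, a) into action values Q(s, a), is commonly written as Q(s, a) = V(s) + (A(s, a) − mean over actions of A). Subtracting the mean advantage keeps V and A identifiable. The following is a minimal NumPy sketch of just this combination step under that standard formulation; the function name and the example values are illustrative placeholders, not the authors' implementation:

```python
import numpy as np

def dueling_combine(state_value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a)
    into action values Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))."""
    advantages = np.asarray(advantages, dtype=float)
    return state_value + (advantages - advantages.mean())

# Example: one state, four discrete actions (e.g. movement directions).
V = 2.0
A = [0.5, -0.5, 1.0, -1.0]
Q = dueling_combine(V, A)

# The robot selects the "advantageous action": the index with the largest Q.
best_action = int(np.argmax(Q))
```

Because the mean advantage is subtracted, the mean of the resulting Q values equals V(s), which is what makes the state-value and advantage streams separately identifiable.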