Research on Gait Control of a Quadruped Robot Based on the Proximal Policy Optimization Algorithm
  • English title: On Gait Control of Quadruped Robot Based on Proximal Policy Optimization Algorithm
  • Authors: ZHANG Haoyu (张浩昱); XIONG Kai (熊凯)
  • Keywords: deep reinforcement learning; proximal policy optimization; robot control
  • Journal code: KJKZ
  • Journal: Aerospace Control and Application
  • Affiliations: Beijing Institute of Control Engineering; Science and Technology on Space Intelligent Control Laboratory
  • Publication date: 2019-06-15
  • Published by: Aerospace Control and Application (空间控制技术与应用)
  • Year: 2019
  • Issue: v.45; No.264
  • Funding: Beijing Natural Science Foundation (4162070); National Natural Science Foundation of China (61573059)
  • Language: Chinese
  • Page: KJKZ201903008
  • Page count: 6
  • CN: 03
  • ISSN: 11-5664/V
  • Classification number: 56-61
Abstract
Gait control of legged robots is a difficult problem in robotics research, and applying reinforcement learning so that the robot learns its control policy autonomously offers a promising approach. A quadruped robot simulation platform is built on the Robot Operating System (ROS), the proximal policy optimization algorithm is applied to quadruped gait control, and its results are compared with those of other deep reinforcement learning algorithms. Simulation results show that the proximal policy optimization algorithm achieves better training performance in practical applications.
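The abstract does not describe the algorithm itself. As a rough illustration only, the sketch below computes the clipped surrogate objective commonly associated with proximal policy optimization. The function name `ppo_clip_objective`, the default `clip_eps=0.2`, and the sample numbers are assumptions made for this example; they are not taken from the paper, and the authors' actual implementation may differ.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    new_log_probs / old_log_probs: log-probabilities of the sampled actions
    under the updated policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range (0.2 is an assumed, commonly used value).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Hypothetical two-sample batch: one action became more likely, one less likely.
new_lp = [math.log(0.6), math.log(0.2)]
old_lp = [math.log(0.4), math.log(0.4)]
adv = [1.0, -1.0]
objective = ppo_clip_objective(new_lp, old_lp, adv)
print(objective)  # approximately 0.2
```

In this toy batch the first ratio (about 1.5) is clipped to 1.2, so the sample contributes 1.2 instead of 1.5. The second ratio (about 0.5) has a negative advantage, so the pessimistic minimum selects the clipped term, -0.8. Taking the minimum in this way removes the incentive to move the policy far from the one that generated the data in a single update.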
