Introduction of Fixed Mode States into Online Profit Sharing and Its Application to Waist Trajectory Generation of Biped Robot

Details

Authors: Seiya Kuroda (1)
Kazuteru Miyazaki (2) teru@niad.ac.jp
Hiroaki Kobayashi (3)
Keywords: Reinforcement Learning; Exploitation-oriented Learning; Profit Sharing; Improved PARP; biped robot
Series title: Lecture Notes in Computer Science
Publication date: 2012
Publication year: 2012
Journal code: 99_0302-9743
Category: cp
Volume: 7188
Issue: 1
Pages: 297-308
Data source: sp

Abstract

In reinforcement learning of long-term tasks, learning efficiency may deteriorate when an agent's probabilistic action selection causes too many mistakes before it reaches the goal. We propose a new type of state, the fixed mode state: a normal state shifts to fixed mode once it has received sufficient reward, after which its action is chosen by a greedy strategy. This eliminates randomness in action selection for that state and increases learning efficiency. We start by proposing an algorithm that combines penalty avoiding rational policy making with online profit sharing with fixed mode states. We then discuss the target system and the design of the learning controller. In simulation, the learning task is to stabilize biped walking by using the learning controller to modify the robot's waist trajectory. Finally, we discuss the simulation results and the effectiveness of our proposal.
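
The fixed mode idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. The class name `FixedModeSelector`, the `threshold` parameter standing in for "sufficient reward", the weight-proportional (roulette) selection for normal states, and the additive weight update are all assumptions made for illustration, and rewards are assumed to be non-negative.

```python
import random
from collections import defaultdict


class FixedModeSelector:
    """Toy selector: stochastic in normal states, greedy in fixed mode states.

    All names and the update rule are hypothetical; they are not taken
    from the paper.
    """

    def __init__(self, actions, threshold=10.0, initial_weight=1.0, seed=None):
        self.actions = list(actions)
        # Assumed stand-in for "sufficient reward".
        self.threshold = threshold
        # Per-state action weights, created lazily with a uniform start.
        self.weights = defaultdict(
            lambda: {a: initial_weight for a in self.actions}
        )
        # Total reward credited to each state so far.
        self.earned = defaultdict(float)
        self.rng = random.Random(seed)

    def is_fixed(self, state):
        """A state is in fixed mode once its credited reward reaches the threshold."""
        return self.earned[state] >= self.threshold

    def select(self, state):
        w = self.weights[state]
        if self.is_fixed(state):
            # Fixed mode: greedy choice, no randomness.
            return max(self.actions, key=lambda a: w[a])
        # Normal state: sample an action in proportion to its weight.
        return self.rng.choices(
            self.actions, weights=[w[a] for a in self.actions]
        )[0]

    def reinforce(self, state, action, reward):
        """Credit a non-negative reward to a (state, action) pair."""
        self.weights[state][action] += reward
        self.earned[state] += reward


selector = FixedModeSelector(["left", "right"], threshold=5.0, seed=0)
selector.reinforce("s0", "right", 6.0)  # pushes "s0" past the threshold
chosen = selector.select("s0")          # greedy from now on in "s0"
```

In this sketch, a state that has not yet reached the threshold keeps exploring, while a state that has reached it always returns its highest-weight action, which is the behavior the abstract attributes to fixed mode states.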
