Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs
Abstract
Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades between actions that contribute to the agent's knowledge and actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive.

In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show it performs well in a variety of problems. We use policy queries, in which we ask an expert for the correct action, to infer the consequences of a potential pitfall without experiencing its effects. More importantly for human-robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.
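To make the criterion concrete, the following is a minimal Python sketch (not the authors' implementation) of Bayes-risk action selection: each action is scored by its expected loss relative to each sampled model's own best action, averaged under the posterior over models, and the agent falls back to a policy query when even the least risky action is too costly. The names `q_tables`, `weights`, `QUERY_THRESHOLD`, and `ask_expert` are illustrative assumptions, as is representing each sampled POMDP by a precomputed action-value table at its current belief.

```python
# Minimal sketch of Bayes-risk action selection under model uncertainty.
# Assumptions (not from the paper): each posterior sample of the POMDP is
# summarized by a Q-table over actions at its current belief, and a fixed
# threshold decides when to issue a policy query.

QUERY_THRESHOLD = -0.05  # assumed risk level that triggers a policy query

def bayes_risk(action, actions, q_tables, weights):
    """Expected loss of `action` relative to each sampled model's best
    action, weighted by the posterior over models (always <= 0)."""
    return sum(w * (q[action] - max(q[a] for a in actions))
               for q, w in zip(q_tables, weights))

def select_action(actions, q_tables, weights, ask_expert):
    """Act greedily with respect to Bayes risk; fall back to asking an
    expert for the correct action when every action risks too much."""
    risks = {a: bayes_risk(a, actions, q_tables, weights) for a in actions}
    best = max(risks, key=risks.get)  # least negative risk
    if risks[best] < QUERY_THRESHOLD:
        return ask_expert()           # policy query instead of acting
    return best

# Toy usage: two sampled models that disagree about the action "left",
# so no action is safe enough and the agent queries the expert.
actions = ["left", "right"]
q_tables = [{"left": 1.0, "right": 0.2}, {"left": -0.8, "right": 0.3}]
weights = [0.5, 0.5]
print(select_action(actions, q_tables, weights, lambda: "expert: right"))
```

Note the design choice this illustrates: because the expert's answer reveals the optimal action under the true model, each query constrains the reward model directly, which is how reward values can remain unspecified.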
