In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show that it performs well in a variety of problems. We use policy queries, in which we ask an expert for the correct action, to infer the consequences of a potential pitfall without experiencing its effects. More importantly for human-robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.
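The core idea of minimizing immediate Bayes risk can be illustrated with a minimal sketch. Here the posterior over uncertain models is represented by samples, each giving a hypothetical reward for every action; the Bayes risk of an action is its expected regret relative to the per-sample optimal action. The setup below (Gaussian reward samples, `bayes_risk` helper) is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior over reward models: each row is one sampled model,
# giving a reward value for each of the three candidate actions.
n_actions = 3
reward_samples = rng.normal(loc=[1.0, 0.5, 0.2], scale=0.5,
                            size=(100, n_actions))

def bayes_risk(samples):
    """Risk of each action = expected shortfall vs. the per-sample best action."""
    best = samples.max(axis=1, keepdims=True)  # optimal value under each model
    loss = best - samples                      # regret of each action per model
    return loss.mean(axis=0)                   # expectation over posterior samples

risk = bayes_risk(reward_samples)
best_action = int(np.argmin(risk))             # act to minimize immediate Bayes risk
```

When the risk of even the best action is large, the agent is uncertain which action is correct, which is precisely the situation where issuing a policy query to an expert is worthwhile.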