Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems

Author: Kyriakos G. Vamvoudakis (kyriakos@ece.ucsb.edu)
Keywords: Q-learning; Nash games; Uncertain systems; Model-free formulation
Journal: Automatica
Publication year: 2015
Published: November 2015
Volume: 61
Issue: Complete
Pages: 274-281
Full-text size: 509 K

Abstract

This work proposes a novel Q-learning algorithm to solve non-zero sum Nash games of linear time-invariant systems with N players (control inputs) and centralized uncertain/unknown dynamics. We first formulate the Q-function of each player as a parametrization of the state and all the control inputs (players). An integral reinforcement learning approach is used to develop a model-free structure of N actors/N critics that estimates the parameters of the N coupled Q-functions online, while also guaranteeing closed-loop stability and convergence of the control policies to a Nash equilibrium. A fourth-order simulation example with five players is presented to show the efficacy of the proposed approach.