Adaptive PID Controller Based on Asynchronous Advantage Actor-Critic Learning
  • Chinese title: 基于异步优势执行器评价器的自适应PID控制
  • Authors: Duan Youxiang (段友祥); Ren Hui (任辉); Sun Qifeng (孙歧峰); Yan Yanan (闫亚男)
  • Affiliation: College of Computer & Communication Engineering, China University of Petroleum (East China)
  • Keywords: deep reinforcement learning; asynchronous advantage actor-critic; adaptive PID control
  • Journal: Computer Measurement & Control (计算机测量与控制); CNKI journal code: JZCK
  • Publication date: 2019-02-25
  • Year: 2019
  • Volume/Issue: v.27; No.245
  • Funding: National Science and Technology Major Projects of the 13th Five-Year Plan (2017ZX05009-001, 2016ZX05011-002); Fundamental Research Funds for the Central Universities (18CX02020A)
  • Language: Chinese
  • Article ID: JZCK201902016
  • Page count: 5
  • Issue: 02
  • CN: 11-4762/TP
  • Pages: 76-79+84
Abstract
Adaptive PID control, which overcomes the inability of conventional PID controllers to tune their own parameters, has become a research hotspot in the control field. This paper proposes a new adaptive PID controller based on the Asynchronous Advantage Actor-Critic (A3C) algorithm. Exploiting the multi-threaded asynchronous learning of the A3C architecture, the controller trains multiple Actor-Critic (AC) agents in parallel; each agent uses multilayer feedforward neural networks to approximate the policy function and the value function, allowing it to search the continuous action space for the optimal parameter-tuning policy and thereby achieve the best control performance. A comparative analysis against several existing adaptive PID controllers shows that the proposed method converges quickly and has strong self-adaptability.
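To make the controller structure concrete, below is a minimal, self-contained Python sketch (not the authors' code) of the idea described in the abstract: a small feedforward "actor" network maps a tracking-error state to continuous PID gains, which then drive an ordinary PID update on a toy first-order plant. The network sizes, state features, plant model, and exploration noise are all illustrative assumptions, and the full A3C machinery (multiple worker threads sharing a global network, advantage-based gradient updates to the actor and critic) is omitted.

import numpy as np

rng = np.random.default_rng(0)

class PolicyNet:
    """Two-layer feedforward net (the 'actor'): error state -> PID gains."""
    def __init__(self, n_in=3, n_hidden=16, n_out=3):
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, s):
        h = np.tanh(self.W1 @ s + self.b1)
        # Softplus keeps the proposed gains positive.
        return np.log1p(np.exp(self.W2 @ h + self.b2))

def pid_step(gains, e, e_prev, e_int, dt):
    """One PID evaluation with externally supplied gains (Kp, Ki, Kd)."""
    kp, ki, kd = gains
    e_int = e_int + e * dt
    de = (e - e_prev) / dt
    return kp * e + ki * e_int + kd * de, e_int

# Toy first-order plant y' = (-y + u) / tau, tracking a unit step setpoint.
actor = PolicyNet()
dt, tau = 0.01, 0.5
y, e_prev, e_int = 0.0, 0.0, 0.0
for step in range(500):
    e = 1.0 - y                               # tracking error
    state = np.array([e, e - e_prev, e_int])  # P-, D-, I-like features
    gains = actor.forward(state) + 0.05 * rng.normal(size=3)  # exploration noise
    gains = np.maximum(gains, 0.0)            # keep gains non-negative after noise
    u, e_int = pid_step(gains, e, e_prev, e_int, dt)
    y += (-y + u) / tau * dt                  # Euler step of the plant
    e_prev = e
print(f"final output y = {y:.3f} (setpoint 1.0)")

In a full A3C implementation, the exploration noise would come from the policy's learned Gaussian, and each worker thread would accumulate advantage-weighted gradients from episodes like this loop and apply them asynchronously to the shared global actor and critic networks.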
References
    [1] Adel T, Abdelkader C. A particle swarm optimization approach for optimum design of PID controller for nonlinear systems [A]. International Conference on Electrical Engineering and Software Applications [C]. IEEE, 2013: 1-4.
    [2] Savran A. A multivariable predictive fuzzy PID control system [J]. Applied Soft Computing, 2013, 13(5): 2658-2667.
    [3] Xie Chaojie, Bao Hong, Du Jingli, et al. Application of a new membership function in nonlinear variable-gain fuzzy PID control [J]. Information and Control, 2014, 43(3): 264-269.
    [4] Li Caocang, Zhang Cuifang. Adaptive PID control based on minimal resource allocation network [J]. Application Research of Computers, 2015, 32(1): 167-169.
    [5] Wang X S, Cheng Y H, Wei S. A proposal of adaptive PID controller based on reinforcement learning [J]. Journal of China University of Mining & Technology, 2007, 17(1): 40-44.
    [6] Chang Junlin, Li Yapeng, Ma Xiaoping, et al. Optimization design of PID controller based on improved differential evolution algorithm [J]. Control Engineering of China, 2010, 17(6): 807-810.
    [7] Shang X Y, Ji T Y, Li M S, et al. Parameter optimization of PID controllers by reinforcement learning [A]. Computer Science and Electronic Engineering Conference [C]. IEEE, 2013: 77-81.
    [8] Chen Xuesong, Yang Yimin. Adaptive PID control based on actor-critic learning [J]. Control Theory & Applications, 2011, 28(8): 1187-1192.
    [9] Wang Z, Bapst V, Heess N, et al. Sample efficient actor-critic with experience replay [J]. arXiv preprint arXiv:1611.01224, 2016.
    [10] Mnih V, Badia A P, Mirza M, et al. Asynchronous methods for deep reinforcement learning [A]. International Conference on Machine Learning [C]. 2016: 1928-1937.
    [11] Liu Quan, Zhai Jianwei, Zhang Zongchang, et al. A survey of deep reinforcement learning [J]. 2017, 40(1).
    [12] Qin Rui, Zeng Shuai, Li Juanjuan, et al. Parallel enterprise resource planning based on deep reinforcement learning [J]. Acta Automatica Sinica, 2017, 43(9): 1588-1596.
    [13] Wen Bo, Meng Lingjun, Zhang Xiaochun, et al. Design of automatic water temperature controller based on incremental PID algorithm [J]. Instrument Technique and Sensor, 2015(12): 113-116.
    [14] Liu Zhibin, Zeng Xiaoqin, Liu Huiyi, et al. Two-layer heuristic reinforcement learning method based on BP neural networks [J]. Journal of Computer Research and Development, 2015, 52(3): 579-587.
    [15] Seijen H, Sutton R. True online TD(lambda) [A]. International Conference on Machine Learning [C]. 2014: 692-700.