A Review of Robot Manipulation Skills Learning Methods (机器人操作技能学习方法综述)
  • English title: A Review of Robot Manipulation Skills Learning Methods
  • Authors: 刘乃军 (LIU Nai-Jun); 鲁涛 (LU Tao); 蔡莹皓 (CAI Ying-Hao); 王硕 (WANG Shuo)
  • Affiliations: State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences
  • Keywords: robots; manipulation skills; reinforcement learning; imitation learning; few-data learning (机器人; 操作技能; 强化学习; 示教学习; 小数据学习)
  • Journal: 自动化学报 (Acta Automatica Sinica), journal code MOTO
  • Online publication date: 2018-12-17
  • Year: 2019; Volume: 45; Issue: 03
  • Pages: 16-28 (13 pages)
  • CN: 11-2109/TP
  • Funding: Supported by the National Natural Science Foundation of China (U1713222, 61773378, 61703401) and the Beijing Municipal Science and Technology Plan (2171100000817009)
  • Language: Chinese
  • Database record ID: MOTO201903002
Abstract
Building robot manipulation skill learning systems with a degree of autonomous decision-making and learning ability, by combining artificial intelligence techniques with robotics, has gradually become an important branch of robotics research. This paper introduces the main methods and the latest results in robot manipulation skill learning. According to how the training data are used, the methods are divided into three categories: reinforcement learning based methods, learning-from-demonstration based methods, and few-data learning based methods. On this basis, research achievements of recent years are reviewed and analyzed, and finally future research directions for robot manipulation skill learning are listed.
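The three-way split above is organized around how training data are produced and consumed. The toy Python sketch below is an illustration added for clarity only, not material from the surveyed paper; all names (rl_update, bc_update, few_data_update, the linear toy policy) are hypothetical. It contrasts a reinforcement-learning update driven solely by a reward signal from the robot's own trials, a learning-from-demonstration update driven by expert state-action pairs, and a few-data update that adapts previously learned parameters from a handful of samples.

    # Illustrative sketch only (not from the surveyed paper): how the three method
    # families consume training data, using a toy linear policy action = theta . state.
    import numpy as np

    rng = np.random.default_rng(0)

    # 1) Reinforcement learning: data come from the robot's own trial and error;
    #    parameters follow a reward signal (here a finite-difference gradient estimate).
    def rl_update(theta, reward_fn, lr=0.1, eps=1e-2):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            d = np.zeros_like(theta)
            d[i] = eps
            grad[i] = (reward_fn(theta + d) - reward_fn(theta - d)) / (2 * eps)
        return theta + lr * grad  # ascend the estimated reward gradient

    # 2) Learning from demonstration: data are expert state-action pairs;
    #    parameters follow a supervised (behaviour-cloning) regression loss.
    def bc_update(theta, states, expert_actions, lr=0.1):
        preds = states @ theta
        grad = states.T @ (preds - expert_actions) / len(states)
        return theta - lr * grad  # descend the imitation loss

    # 3) Few-data learning: only a handful of task-specific samples are available,
    #    so learning starts from parameters transferred or meta-learned on related tasks.
    def few_data_update(theta_prior, few_states, few_actions, lr=0.5):
        return bc_update(theta_prior.copy(), few_states, few_actions, lr=lr)

    # Toy usage on synthetic data.
    states = rng.normal(size=(32, 4))
    expert = states @ np.array([1.0, -0.5, 0.2, 0.0])
    reward = lambda th: -float(np.mean((states @ th - expert) ** 2))  # toy reward
    theta = np.zeros(4)
    theta = rl_update(theta, reward)
    theta = bc_update(theta, states, expert)
    theta = few_data_update(theta, states[:3], expert[:3])
    print(theta)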
References
1 Goldberg K. Editorial: "One Robot is robotics, ten robots is automation". IEEE Transactions on Automation Science and Engineering, 2016, 13(4):1418-1419
    2 Tan Min, Wang Shuo. Research progress on robotics. Acta Automatica Sinica, 2013, 39(7):963-972(谭民,王硕.机器人技术研究进展.自动化学报,2013, 39(7):963-972)
    3 Rozo L, Jaquier N, Calinon S, Caldwell D G. Learning manipulability ellipsoids for task compatibility in robot manipulation. In:Proceedings of the 30th International Conference on Intelligent Robots and Systems(IROS). Vancouver,Canada:IEEE, 2017. 3183-3189
    4 Siciliano B, Khatib O. Springer Handbook of Robotics.Berlin:Springer, 2016. 357-398
    5 Connell J H, Mahadevan S. Robot Learning. Boston:Springer, 1993. 1-17
    6 Dang H, Allen P K. Robot learning of everyday object manipulations via human demonstration. In:Proceedings of the 23rd IEEE International Conference on Intelligent Robots and Systems(IROS). Taipei, China:IEEE, 2010.1284-1289
    7 Gu S X, Holly E, Lillicrap T, Levine S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In:Proceedings of the 35th IEEE International Conference on Robotics and Automation(ICRA).Singapore, Singapore:IEEE, 2017. 3389-3396
    8 Li D Y, Ma G F, He W, Zhang W, Li C J, Ge S S. Distributed coordinated tracking control of multiple Euler-Lagrange systems by state and output feedback. IET Control Theory and Applications, 2017, 11(14):2213-2221
    9 Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. arXiv:1509.02971, 2015.
    10 Heess N, Dhruva T B, Sriram S, Lemmon J, Merel J, Wayne G, et al. Emergence of locomotion behaviours in rich environments. arXiv:1707.02286, 2017.
    11 Levine S, Abbeel P. Learning neural network policies with guided policy search under unknown dynamics. In:Proceedings of the 28th Advances in Neural Information Processing Systems(NIPS). Montreal, Canada:NIPS Press, 2014. 1071-1079
    12 Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 2016, 17(1):1334-1373
    13 Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv:1707.06347, 2017.
    14 Al-Shedivat M, Bansal T, Burda Y, Sutskever I, Mordatch I,Abbeel P. Continuous adaptation via meta-learning in nonstationary and competitive environments. In:Proceedings of the 6th International conference on Learning Representations(ICLR). Vancouver, Canada:ICLR, 2018.
    15 Levine S, Pastor P, Krizhevsky A, Quillen D. Learning hand-eye coordination for robotic grasping with large-scale data collection. In:Proceedings of the 25th International Symposium on Experimental Robotics. Cham:Springer, 2016. 173-184
    16 Calinon S. Robot learning with task-parameterized generative models. Robotics Research. Cham:Springer, 2018. 111-126
    17 Billard A, Grollman D. Robot learning by demonstration. Scholarpedia, 2013, 8(12):3824
    18 Wiering M, van Otterlo M. Reinforcement Learning:State-of-the-Art. Berlin:Springer-Verlag, 2015. 79-100
    19 Sutton R S, Barto A G. Reinforcement Learning:An Introduction(Second edition). Cambridge:MIT Press, 2018.
    20 Bellman R. On the theory of dynamic programming. Proceedings of the National Academy of Sciences of the United States of America, 1952, 38(8):716-719
    21 Lioutikov R, Paraschos A, Peters J, Neumann G. Sample-based information-theoretic stochastic optimal control. In:Proceedings of the 32nd IEEE International Conference on Robotics and Automation(ICRA). Hong Kong, China:IEEE, 2014. 3896-3902
    22 Schenck C, Tompson J, Fox D, Levine S. Learning robotic manipulation of granular media. In:Proceedings of the 1st Conference on Robot Learning(CORL). Mountain View,USA:CORL, 2017.
    23 Hester T, Quinlan M, Stone P. Generalized model learning for reinforcement learning on a humanoid robot. In:Proceedings of the 28th IEEE International Conference on Robotics and Automation(ICRA). Alaska, USA:IEEE,2010. 2369-2374
    24 Kocsis L, Szepesvari C. Bandit based Monte-Carlo planning.In:Proceedings of the 2006 European Conference on Machine Learning. Berlin, Germany:Springer, 2006. 282-293
    25 Hasselt H, Mahmood A R, Sutton R S. Off-policy TD(λ)with a true online equivalence. In:Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence. Quebec City, Canada:UAI, 2014.
    26 Park K H, Kim Y J, Kim J H. Modular Q-learning based multi-agent cooperation for robot soccer. Robotics and Autonomous Systems, 2001, 35(2):109-122
    27 Ramachandran D, Gupta R. Smoothed Sarsa:reinforcement learning for robot delivery tasks. In:Proceedings of the 27th IEEE International Conference on Robotics and Automation(ICRA). Kobe, Japan:IEEE, 2009. 2125-2132
    28 Konidaris G, Kuindersma S, Grupen R, Barto A. Autonomous skill acquisition on a mobile manipulator. In:Proceedings of the 25th AAAI Conference on Artificial Intelligence(AAAI). San Francisco, California, USA:AAAI Press,2011. 1468-1473
    29 Konidaris G, Kuindersma S, Barto A G, Grupen R A. Constructing skill trees for reinforcement learning agents from demonstration trajectories. In:Proceedings of the 24th Advances in Neural Information Processing Systems(NIPS). Vancouver, Canada:NIPS Press, 2010. 1162-1170
    30 Asada M, Noda S, Tawaratsumida S, Hosoda K. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 1996, 23(2-3):279-303
    31 Kroemer O B, Detry R, Piater J, Peters J. Combining active learning and reactive control for robot grasping. Robotics and Autonomous Systems, 2010, 58(9):1105-1116
    32 Gass S I, Fu M C. Encyclopedia of Operations Research and Management Science. Boston, MA:Springer, 2013. 326-333
    33 Iruthayarajan M W, Baskar S. Covariance matrix adaptation evolution strategy based design of centralized PID controller. Expert Systems with Applications, 2010, 37(8):5775-5781
    34 Endo G, Morimoto J, Matsubara T, Nakanishi J, Cheng G.Learning CPG-based biped locomotion with a policy gradient method:application to a humanoid robot. The International Journal of Robotics Research, 2008, 27(2):213-228
    35 Peters J, Schaal S. Reinforcement learning of motor skills with policy gradients. Neural Networks, 2008, 21(4):682-697
    36 Deisenroth M P, Rasmussen C E, Fox D. Learning to control a low-cost manipulator using data-efficient reinforcement learning. Robotics:Science and Systems VII. Cambridge:MIT Press, 2011. 57-64
    37 Deisenroth M P, Rasmussen C E. PILCO:a model-based and data-efficient approach to policy search. In:Proceedings of the 28th International Conference on Machine Learning(ICML). Washington, USA:Omnipress, 2011. 465-472
    38 Deisenroth M P, Neumann G, Peters J. A survey on policy search for robotics. Foundations and Trends in Robotics,2013, 2(1-2):1-142
    39 Zhao Dong-Bin, Shao Kun, Zhu Yuan-Heng, Li Dong, Chen Ya-Ran, Wang Hai-Tao, et al. Review of deep reinforcement learning and discussions on the development of computer Go. Control Theory and Applications, 2016, 33(6):701-717(赵冬斌,邵坤,朱圆恒,李栋,陈亚冉,王海涛,等.深度强化学习综述:兼论计算机围棋的发展.控制理论与应用,2016, 33(6):701-717)
    40 Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540):529-533
    41 Silver D, Huang A, Maddison C, Guez A, Sifre L, van den Driessche G, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587):484-489
    42 Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of Go without human knowledge. Nature, 2017, 550(7676):354-359
    43 van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In:Proceedings of the 30th AAAI Conference on Artificial Intelligence. Arizona, USA:AAAI Press, 2016. 2094-2100
    44 Wang Z Y, Schaul T, Hessel M, van Hasselt H, Lanctot M, de Freitas N. Dueling network architectures for deep reinforcement learning. In:Proceedings of the 33rd International Conference on Machine Learning. New York City,USA:JMLR, 2016. 1995-2003
    45 Hausknecht M, Stone P. Deep recurrent Q-learning for partially observable MDPs. In:Proceedings of the 29th AAAI Conference on Artificial Intelligence. Texas, USA:AAAI Press, 2015
    46 Zhang F Y, Leitner J, Milford M, Upcroft B, Corke P. Towards vision-based deep reinforcement learning for robotic motion control. arXiv:1511.03791, 2015.
    47 Zhang F Y, Leitner J, Milford M, Corke P. Modular deep Q networks for Sim-to-real transfer of visuo-motor policies.arXiv:1610.06781, 2016.
    48 Gu S X, Lillicrap T, Sutskever I, Levine S. Continuous deep Q-learning with model-based acceleration. In:Proceedings of the 33rd International Conference on Machine Learning(ICML). New York City, USA:JMLR, 2016. 2829-2838
    49 Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M. Deterministic policy gradient algorithms. In:Proceedings of the 31st International Conference on Machine Learning(ICML). Beijing, China:JMLR, 2014. 387-395
    50 Schulman J, Levine S, Moritz P, Jordan M, Abbeel P. Trust region policy optimization. In:Proceedings of the 32nd International Conference on Machine Learning(ICML). Lille,France:JMLR., 2015. 1889-1897
    51 Mnih V, Badia A P, Mirza M, Graves A, Lillicrap T, Harley T, et al. Asynchronous methods for deep reinforcement learning. In:Proceedings of the 33rd International Conference on Machine Learning(ICML). New York City, USA:JMLR, 2016. 1928-1937
    52 Levine S, Koltun V. Guided policy search. In:Proceedings of the 30th International Conference on Machine Learning(ICML). Atlanta, USA:JMLR, 2013. 1-9
    53 Levine S, Koltun V. Learning complex neural network policies with trajectory optimization. In:Proceedings of the 31st International Conference on Machine Learning(ICML). Beijing, China:JMLR, 2014. 829-837
    54 Malekzadeh M, Queißer J, Steil J J. Imitation learning for a continuum trunk robot. In:Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning(ESANN). Bruges, Belgium:ESANN, 2017.
    55 Ross S, Gordon G J, Bagnell D. A reduction of imitation learning and structured prediction to no-regret online learning. In:Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, USA:JMLR, 2011. 627-635
    56 Ng A Y, Russell S J. Algorithms for inverse reinforcement learning. In:Proceedings of the 17th International Conference on Machine Learning(ICML). Stanford, USA:Morgan Kaufmann Publishers Inc., 2000. 663-670
    57 Zhou Zhi-Hua. Machine Learning. Beijing:Tsinghua University Press, 2016.(周志华.机器学习.北京:清华大学出版社,2016.)
    58 Takeda T, Hirata Y, Kosuge K. Dance step estimation method based on HMM for dance partner robot. IEEE Transactions on Industrial Electronics, 2007, 54(2):699-706
    59 Calinon S, Guenter F, Billard A. On learning, representing,and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man and Cybernetics, Part B(Cybernetics), 2007, 37(2):286-298
    60 Calinon S, Billard A. Incremental learning of gestures by imitation in a humanoid robot. In:Proceedings of the 2nd ACM/IEEE International Conference on Human-robot Interaction. Arlington, VA, USA:IEEE, 2007. 255-262
    61 Rahmatizadeh R, Abolghasemi P, Behal A, Boloni L. From virtual demonstration to real-world manipulation using LSTM and MDN. arXiv:1603.03833, 2016.
    62 Calinon S, D'Halluin F, Sauser E L, Caldwell D G, Billard A G. Learning and reproduction of gestures by imitation. IEEE Robotics and Automation Magazine, 2010, 17(2):44-54
    63 Zhang T H, McCarthy Z, Jow O, Lee D, Chen X, Goldberg K, et al. Deep imitation learning for complex manipulation tasks from virtual reality teleoperation. In:Proceedings of the 36th International Conference on Robotics and Automation(ICRA). Brisbane, Australia:IEEE, 2018.
    64 Abbeel P, Ng A Y. Apprenticeship learning via inverse reinforcement learning. In:Proceedings of the 21st International Conference on Machine Learning(ICML). Alberta, Canada:ACM, 2004.
    65 Ratliff N D, Bagnell J A, Zinkevich M A. Maximum margin planning. In:Proceedings of the 23rd International Conference on Machine Learning(ICML). Pennsylvania, USA:ACM, 2006. 729-736
    66 Ziebart B D, Maas A, Bagnell J A, Dey A K. Maximum entropy inverse reinforcement learning. In:Proceedings of the 23rd AAAI Conference on Artificial Intelligence(AAAI).Illinois, USA:AAAI Press, 2008. 1433-1438
    67 Levine S, Popović Z, Koltun V. Nonlinear inverse reinforcement learning with Gaussian processes. In:Proceedings of the 24th International Conference on Neural Information Processing Systems(NIPS). Granada, Spain:Curran Associates, 2011. 19-27
    68 Ratliff N D, Bradley D M, Bagnell J A, Chestnutt J E. Boosting structured prediction for imitation learning.In:Proceedings of the 19th Advances in Neural Information Processing Systems(NIPS). British Columbia, Canada:Curran Associates, 2006. 1153-1160
    69 Xia C, El Kamel A. Neural inverse reinforcement learning in autonomous navigation. Robotics and Autonomous Systems, 2016, 84:1-14
    70 Wulfmeier M, Ondruska P, Posner I. Maximum entropy deep inverse reinforcement learning. arXiv:1507.04888, 2015.
    71 Finn C, Levine S, Abbeel P. Guided cost learning:deep inverse optimal control via policy optimization. In:Proceedings of the 33rd International Conference on Machine Learning(ICML). New York City, USA:JMLR, 2016. 49-58
    72 Ho J, Ermon S. Generative adversarial imitation learning.In:Proceedings of the 30th Advances in Neural Information Processing Systems(NIPS). Barcelona, Spain:Curran Associates, 2016. 4565-4573
    73 Peng X B, Abbeel P, Levine S, van de Panne M. DeepMimic:example-guided deep reinforcement learning of physics-based character skills. arXiv:1804.02717, 2018.
    74 Zhu Y K, Wang Z Y, Merel J, Rusu A, Erez T, Cabi S, et al.Reinforcement and imitation learning for diverse visuomotor skills. arXiv:1802.09564, 2018.
    75 Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, et al. Deep Q-learning from demonstrations. In:Proceedings of the 32nd AAAI Conference on Artificial Intelligence(AAAI). Louisiana, USA:AAAI Press, 2018.
    76 Lemke C, Budka M, Gabrys B. Metalearning:a survey of trends and technologies. Artificial Intelligence Review. 2015,44(1):117-130
    77 Pan S J, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10):1345-1359
    78 Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T.Deep domain confusion:maximizing for domain invariance.arXiv:1412.3474, 2014.
    79 Shi Z Y, Siva P, Xiang T. Transfer learning by ranking for weakly supervised object annotation. arXiv:1705.00873,2017.
    80 Gupta A, Devin C, Liu Y X, Abbeel P, Levine S. Learning invariant feature spaces to transfer skills with reinforcement learning. In:Proceedings of the 5th International Conference on Learning Representations(ICLR). Toulon, France:ICLR, 2017.
    81 Stadie B C, Abbeel P, Sutskever I. Third-person imitation learning. In:Proceedings of the 5th International Conference on Learning Representations(ICLR). Toulon, France:ICLR, 2017.
    82 Ammar H B, Eaton E, Ruvolo P, Taylor M E. Online multitask learning for policy gradient methods. In:Proceedings of the 31st International Conference on Machine Learning(ICML). Beijing, China:JMLR, 2014. 1206-1214
    83 Gupta A, Devin C, Liu Y X, Abbeel P, Levine S. Learning invariant feature spaces to transfer skills with reinforcement learning. arXiv:1703.02949, 2017.
    84 Tzeng E, Devin C, Hoffman J, Finn C, Peng X C, Levine S, et al. Towards adapting deep visuomotor representations from simulated to real environments. arXiv:1511.07111,2015.
    85 Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching networks for one shot learning. In:Proceedings of the 30th Advances in Neural Information Processing Systems(NIPS). Barcelona, Spain:Curran Associates, 2016.3630-3638
    86 Santoro A, Bartunov S, Botvinick M, Wierstra D, Lillicrap T.Meta-learning with memory-augmented neural networks.In:Proceedings of the 33rd International Conference on Machine Learning(ICML). New York City, USA:JMLR, 2016.1842-1850
    87 Ravi S, Larochelle H. Optimization as a model for few-shot learning. In:Proceedings of the 5th International Conference on Learning Representations(ICLR). Toulon, France:ICLR, 2017.
    88 Edwards H, Storkey A. Towards a neural statistician. In:Proceedings of the 5th International Conference on Learning Representations(ICLR). Toulon, France:ICLR, 2017.
    89 Rezende D, Mohamed S, Danihelka I, Gregor K, Wierstra D.One-shot generalization in deep generative models. In:Proceedings of the 33rd International Conference on Machine Learning(ICML). New York City, USA:JMLR, 2016.
    90 Duan Y, Schulman J, Chen X, Bartlett P L, Sutskever I, Abbeel P. RL^2:fast reinforcement learning via slow reinforcement learning. arXiv:1611.02779, 2016.
    91 Wang J X, Kurth-Nelson Z, Tirumala D, Soyer H, Leibo J Z, Munos R, et al. Learning to reinforcement learn. arXiv:1611.05763, 2016.
    92 Duan Y, Andrychowicz M, Stadie B C, Ho J, Schneider J,Sutskever I, et al. One-shot imitation learning. arXiv:1703.07326, 2017.
    93 Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv:1703.03400,2017.
    94 Xu D F, Nair S, Zhu Y K, Gao J L, Garg A, Li F F, et al. Neural task programming:learning to generalize across hierarchical tasks. arXiv:1710.01813, 2017.
    95 Reed S, de Freitas N. Neural programmer-interpreters.arXiv:1511.06279, 2015.
    96 Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P. Domain randomization for transferring deep neural networks from simulation to the real world. In:Proceedings of the 30th International Conference on Intelligent Robots and Systems(IROS). Vancouver, Canada:IEEE, 2017. 23-30
