Research on Behavior Learning Methods for Intelligent Robots with Cognitive Ability
Abstract
Behavior learning is one of the key techniques in intelligent robot design. Current robot behavior-learning methods are limited to learning reflexive behaviors: the knowledge-representation structure of the task is specified by a human designer in advance, parameters are tuned from training samples, and any change of task requires reprogramming. Systems with only this kind of behavior learning lack cognitive ability and cannot give rise to complex intelligent behavior. Research on robot systems with cognitive ability has become an important direction in robotics, closely related to cognitive psychology, cognitive science, and ethology.
     This thesis focuses on the cognitive mechanisms of robots and analyses in depth why a cognitive model matters for the development of robot intelligence. An architecture for intelligent robots with cognitive ability is presented, and the knowledge representation and learning methods of the cognitive model are studied in depth. Finally, these results are applied to achieve spatial cognition of the environment, from which multi-task planning behavior emerges in a bottom-up way. The main contributions are as follows:
     Firstly, robot architectural paradigms are reclassified from the viewpoint of how intelligence arises. The new classification not only covers the traditional paradigms but also completes the cognitive levels of intelligent robots, distinguishes the intelligence grades of robot systems, and clarifies the place of cognitive ability within architectural paradigms. On this basis, the thesis presents an architecture for intelligent robots with cognitive ability. The architecture learns autonomously: only the basic reflexive behaviors need to be supplied, and all higher cognitive abilities are acquired through autonomous learning rather than reprogramming. Its modules depend on one another and learn simultaneously, giving the system real-time learning ability.
     Secondly, the self-organized extraction of environmental features is studied. "Active perception behaviors" and "sensory-motor coordination" are used to acquire the features. A design method for active neurons based on change detection and activation intensity is given, and a growing dynamic self-organizing feature map (GDSOM) is used to extract and recognize landmarks in a self-organized way. Experiments show that this landmark extraction and recognition method needs neither precise positioning control nor a metric sensor model, is robust and computationally fast, and effectively handles the "perceptual variability" problem, laying a foundation for cognitive ability.
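The growing-map idea behind GDSOM can be sketched with a minimal growing self-organizing map: a new node is recruited whenever no existing node is activated strongly enough by the input, so the number of landmark categories grows with experience. The class below is an illustrative sketch only; the growth threshold, the single-winner update, and the absence of a neighborhood function are simplifying assumptions, not the thesis's actual GDSOM design.

```python
import numpy as np

class GrowingSOM:
    """Minimal growing self-organizing map: a new node is recruited
    whenever no existing node matches the input closely enough, so the
    set of landmark categories grows with experience."""

    def __init__(self, dim, grow_threshold=0.5, lr=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.nodes = [rng.random(dim)]        # weight vectors, one per category
        self.grow_threshold = grow_threshold  # mismatch distance that triggers growth
        self.lr = lr                          # learning rate for the winning node

    def best_match(self, x):
        """Return (index, distance) of the node closest to input x."""
        dists = [float(np.linalg.norm(x - w)) for w in self.nodes]
        i = int(np.argmin(dists))
        return i, dists[i]

    def train_step(self, x):
        """Present one feature vector; grow on novelty, adapt on familiarity."""
        i, d = self.best_match(x)
        if d > self.grow_threshold:
            # novel input: allocate a new node centred on it
            self.nodes.append(np.array(x, dtype=float))
            return len(self.nodes) - 1
        # familiar input: pull the winner toward it
        self.nodes[i] = self.nodes[i] + self.lr * (x - self.nodes[i])
        return i
```

Feeding feature vectors from two distinct landmarks yields two distinct nodes, while nearby features of the same landmark keep activating the same node, which is the recognition behavior.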
     Thirdly, knowledge representation and learning methods for spatio-temporal experience are studied. A cognitive mathematical model, the observation-driven Markov decision process (ODMDP), is discussed and a corresponding solution strategy is proposed. Drawing on the properties of biological neurons, a new biological neural-network model, the spatio-temporal associative memory network (STAMN), is proposed. The network learns states and actions incrementally and solves the state-localization problem of the ODMDP. STAMN is applied to achieve spatial cognition of the environment; experiments show that the network can solve the simultaneous localization and mapping (SLAM) problem in looped environments.
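The state-localization problem that STAMN addresses can be illustrated symbolically: when the same observation occurs at several places, as in a looped environment, a single observation cannot identify the state, but a short observation-action history can. The sketch below is a plain graph-based stand-in, not the STAMN network itself; the state ids, the deterministic transition table, and the `localize` routine are all illustrative assumptions.

```python
class SpatialMemory:
    """Graph memory over discrete states: each state carries an observation
    label, and transitions record which action links which states.
    localize() returns the states consistent with a whole observation-action
    history, disambiguating observations that occur at more than one place."""

    def __init__(self):
        self.obs_of = {}   # state id -> observation label
        self.trans = {}    # (state id, action) -> next state id
        self.next_id = 0

    def add_state(self, obs):
        s = self.next_id
        self.next_id += 1
        self.obs_of[s] = obs
        return s

    def add_transition(self, s, action, s_next):
        self.trans[(s, action)] = s_next

    def localize(self, history):
        """history alternates observations and actions:
        [obs0, act0, obs1, act1, ..., obsK]."""
        obs, acts = history[0::2], history[1::2]
        # start from every state whose label matches the first observation
        candidates = {s for s, o in self.obs_of.items() if o == obs[0]}
        # keep only candidates whose transitions reproduce the history
        for a, o in zip(acts, obs[1:]):
            candidates = {self.trans[(s, a)] for s in candidates
                          if (s, a) in self.trans
                          and self.obs_of[self.trans[(s, a)]] == o}
        return candidates
```

In a four-state loop where two states share an observation label, the label alone localizes to two candidate states, while one preceding observation-action pair pins the state down uniquely.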
     Finally, reinforcement learning methods with cognitive ability are studied. For the multi-task learning problems robots face, a reinforcement learning model with cognitive ability is proposed, together with a Sarsa algorithm with k-step memory and k-step prediction ((k-M)(k-P) Sarsa) suited to multi-task learning. The model solves the policy-learning problem of the ODMDP and converges quickly. Maze-environment experiments verify the effectiveness of the intelligent robot's multi-task learning.
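The k-step-memory half of the idea can be sketched with tabular Sarsa whose "state" is the tuple of the last k observations, so the learned policy can condition on recent history rather than the current observation alone. This is a generic sketch under stated assumptions: the episode loop, step cap, and ε-greedy policy are ordinary textbook choices, and the k-step prediction component of the thesis's (k-M)(k-P) Sarsa is omitted.

```python
import random
from collections import defaultdict, deque

def k_memory_sarsa(env_step, env_reset, actions, k=2, episodes=500,
                   max_steps=200, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Sarsa whose state is the tuple of the last k observations.
    env_reset() returns the first observation; env_step(a) returns
    (observation, reward, done)."""
    rng = random.Random(seed)
    Q = defaultdict(float)

    def policy(s):
        # epsilon-greedy over the memory-augmented state
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        mem = deque([env_reset()], maxlen=k)   # k-step observation memory
        s = tuple(mem)
        a = policy(s)
        for _ in range(max_steps):
            obs, r, done = env_step(a)
            mem.append(obs)
            s2 = tuple(mem)
            a2 = policy(s2)
            # standard on-policy Sarsa backup on the memory-augmented state
            target = r if done else r + gamma * Q[(s2, a2)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if done:
                break
            s, a = s2, a2
    return Q
```

On a small corridor with a goal at the right end, the learned values at the start state come to prefer moving right, showing that the memory-augmented state table supports ordinary Sarsa convergence.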