Research on Real-Time Planning Systems in Dynamic Uncertain Environments
Abstract
Planning, as a very important and common form of intelligent behavior and capability, is a major field of artificial intelligence research and one of the earliest problems to receive attention. Planning in dynamic uncertain environments, being closer to the real world and therefore of higher practical value, has become a focus and hot spot of current planning research.
     This thesis first analyzes the main characteristics of dynamic uncertain environments, including:
     ■ Dynamics: the state of the environment changes constantly. It changes not only under the influence of the agent itself, but also under the influence of other agents and other factors in the environment.
     ■ Limited knowledge of the agent: in general, an agent cannot possess all knowledge of the environment, cannot know every factor that may cause the environment to change, and cannot know everything about other agents. An agent can hold only part of this knowledge, and may even be completely ignorant of some aspects.
     ■ Uncertainty of the agent's actions: when an agent executes an action in the environment, the outcome is uncertain and cannot be predicted precisely in advance.
     ■ Partiality of the agent's observations: in general, an agent's observation of the environment is incomplete. At any given moment, the agent can observe only part of the environment.
     ■ Uncertainty of the agent's observations: the observations an agent obtains from the environment are generally inaccurate, and sometimes even wrong.
     The thesis then surveys existing planning systems with respect to their ability to adapt to such dynamic uncertain environments, and analyzes the strengths and weaknesses of each.
     Based on this analysis, the main contribution of the thesis is POMDPRS, a planning system for dynamic uncertain environments that combines PRS with decision-theoretic planning. Two improvements for raising decision-making efficiency are also discussed. The main work is as follows:
     1) POMDPRS, a planning system for dynamic uncertain environments, is proposed; its basic model and a formal description are given. POMDPRS retains the continuous-planning mechanism of PRS to cope with the dynamics of the environment, and uses a probability distribution over the environment's state space as the agent's belief to cope with its uncertainty, thus satisfying both requirements.
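The belief maintenance described above is the standard discrete POMDP filter: predict through the transition model, weight by the observation likelihood, then renormalize. A minimal sketch, where the dictionary-based tables `T`, `O` and the state names are hypothetical toy models, not the thesis's formal definitions:

```python
def belief_update(belief, action, obs, T, O, states):
    """One exact belief update: b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s)."""
    new_belief = {}
    for s2 in states:
        # Predict: probability mass flowing into s2 under the action model.
        predicted = sum(belief[s] * T[s][action][s2] for s in states)
        # Correct: weight by the likelihood of the received observation.
        new_belief[s2] = O[s2][action][obs] * predicted
    z = sum(new_belief.values())  # normalizing constant
    return {s: p / z for s, p in new_belief.items()}
```

The double sum over states in the prediction step is what makes exact updating expensive on large state spaces, which is the cost that the factored and Monte Carlo variants attack.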
     2) The application of factored state representation in POMDPRS is described, and a formal description of the factored variant, FPOMDPRS, is given. POMDPRS uses a probability distribution over the state space as the agent's belief and updates it according to the actions the agent executes and the observations it receives. In many cases, however, the state space is very large, so belief updating is too expensive to meet the system's real-time requirements. The factored approach partitions the environment attributes that make up a state according to their mutual dependencies, representing a state as a collection of sub-states and thereby replacing one large state space with several smaller ones. The belief accordingly becomes a collection of probability distributions over these sub-state spaces; belief updating processes each of them separately, cutting the time cost of maintaining the belief distribution.
     3) The application of Monte Carlo filtering in POMDPRS is described, and a formal description of the Monte Carlo variant, MCPOMDPRS, is given. Monte Carlo filtering is another way to reduce the cost of belief updating. It represents the whole distribution by a finite set of concrete values (samples) drawn from it, and updates this sample set with the SIR method according to actions and observations. The cost of belief updating then depends on the size of the sample set, so it can be controlled by controlling the sample size.
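The SIR (sampling/importance resampling) step replaces the exact sum with a fixed-size sample set: propagate each sample through the action model, weight it by the observation likelihood, and resample. A sketch with hypothetical model callables (`sample_next` draws s' ~ T(· | s, a); `obs_prob` returns O(o | s)):

```python
import random

def sir_belief_update(particles, action, obs, sample_next, obs_prob, rng=random):
    # 1. Propagate each sample through the stochastic action model.
    proposed = [sample_next(s, action, rng) for s in particles]
    # 2. Importance-weight each sample by the observation likelihood.
    weights = [obs_prob(obs, s) for s in proposed]
    # 3. Resample with replacement in proportion to the weights (SIR).
    return rng.choices(proposed, weights=weights, k=len(particles))
```

Because every step touches exactly `len(particles)` samples, the update cost is fixed by the sample-set size regardless of how large the underlying state space is.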
     Factoring and Monte Carlo filtering can be combined in POMDPRS: the state is factored first, and the Monte Carlo method is then applied to those sub-state spaces that are still large, further improving the efficiency of belief updating. Finally, the thesis describes P-DOG, a robot decision and control system running on physical robots that combines FPOMDPRS and MCPOMDPRS, and presents experimental results verifying the feasibility of POMDPRS and its variants.
As a very important and familiar intelligent activity and capability, planning is one of the key fields of artificial intelligence research, and one of the earliest to receive attention. Planning in dynamic nondeterministic environments has become a focus and hot spot of research, since such environments are closer to reality and work on them is of greater practical value.
     In this thesis, the main characteristics of dynamic nondeterministic environments are analyzed first. They are:
     Dynamic environment - The environment keeps changing. It is affected not only by the agent itself but also by other agents and other factors in the environment.
     Limited knowledge of agents - In general, no agent has complete knowledge of the environment it lives in. It cannot know every factor that can affect the environment, nor can it know other agents completely. An agent can hold only part of this knowledge, and may even be utterly ignorant of some aspects.
     Nondeterministic results of actions - An agent can perform actions in the environment, but the results of these actions are nondeterministic and cannot be predicted in advance.
     Partial observation - In general, an agent's observations of the environment are partial. At any time, an agent can observe only part of the environment.
     Nondeterministic observation - An agent's observations of the environment may be inaccurate or even completely false.
     Then, existing planning systems are surveyed with respect to their ability to adapt to dynamic nondeterministic environments, and their respective advantages and shortcomings are pointed out.
     Beyond this analysis, the main work of this thesis is to introduce POMDPRS, a new planning system better adapted to dynamic nondeterministic environments, based on PRS and decision-theoretic planning. The thesis also discusses two approaches to improving the efficiency of decision making. The details are as follows:
     1) POMDPRS, a new planning system better adapted to dynamic nondeterministic environments, is introduced; its basic model and formal description are provided. POMDPRS retains the continuous planning mechanism of PRS to cope with the dynamics of the environment, and represents the agent's belief as a probability distribution over the state space to cope with nondeterminism. POMDPRS thus satisfies both requirements.
     2) The factored representation of states in POMDPRS is introduced, and the formal description of the factored variant, FPOMDPRS, is provided. POMDPRS represents the agent's belief as a distribution over the state space and updates it with the actions the agent performs and the observations it receives. Unfortunately, in many cases the state space is very large, which makes belief updating so time-consuming that it cannot meet the real-time requirements of the system. The factored approach divides the environment properties that make up a state into groups according to their dependency relations, turning one large state space into several smaller sub-state spaces. The belief is likewise transformed into several distributions over these sub-state spaces; belief updating is performed on each sub-state space independently, and its time cost is reduced.
     3) The Monte Carlo filter in POMDPRS is introduced, and the formal description of the Monte Carlo variant, MCPOMDPRS, is provided. The Monte Carlo filter is another way to reduce the time cost of belief updating. It uses a finite number of values (known as samples) to represent the whole distribution, and updates this sample set with the SIR method according to actions and observations. The time cost then depends on the size of the sample set, so it can be bounded by controlling the sample size.
     The factored representation and the Monte Carlo filter can be combined in POMDPRS: the state is factored first, and the Monte Carlo filter is then applied to the sub-state spaces that are still large. In this way, the efficiency of belief updating can be improved further. The thesis finally describes P-DOG, a robot control system running on real robots that combines FPOMDPRS and MCPOMDPRS; the experiments on the real robot demonstrate the feasibility of POMDPRS and its variants.
