Research on Planning Problems Based on Markov Decision Theory
Abstract
In recent years, agent and multi-agent planning has become a new research hotspot in artificial intelligence, with broad application prospects. This thesis studies the Markov decision process and its related theories, discusses the problems these decision theories face when they meet real-world applications and the corresponding solutions, and finally gives a theoretical analysis and improvement of a class of basic decision-making algorithms.
     The work mainly involves the following aspects:
     (1) A fairly systematic study of several basic decision models and algorithms related to agent and multi-agent planning under uncertainty. On the model side, the starting point is the basic Markov decision process; adding observation uncertainty gives the partially observable Markov decision process; adding multi-agent cooperation or competition further gives the decentralized partially observable Markov decision process and the partially observable stochastic game, respectively. On the algorithm side, for each of these models a comparative analysis is carried out along two lines: backward iteration and forward search. Finally, the semi-Markov decision process and the Option theory of temporal abstraction are discussed; they are the foundation for designing hierarchical planning frameworks and algorithms.
     (2) RoboCup simulation 2D provides a standard testbed for studying multi-agent planning in large-scale uncertain environments. After the necessary description of the platform, the design of several sub-problems that overall planning must handle in this near-real-world application is analyzed; modeling and analyzing these problems with existing Markov decision theory gives the platform a more general research significance.
     (3) Option theory corresponds to the concept of temporal abstraction and offers a hierarchical-planning direction for bringing Markov decision theory closer to real-world applications. For planning problems in large-scale multi-agent systems with observation uncertainty, such as RoboCup simulation 2D, a new decision-making framework based on dynamic behavior generators is given on top of the partially observable stochastic game model, combining policy heuristics, belief-state compression, factored representation, and Option theory; on this basis, a real-time heuristic search algorithm aimed at quickly finding feasible solutions is designed. Finally, the practical effect of the framework and algorithm is tested on the simulation 2D standard platform.
     (4) In Option-based hierarchical planning, the efficiency of solving sub-policies in large-scale problems is also crucial. Real-time dynamic programming (RTDP) is a relatively recent class of methods for solving Markov decision processes; besides its efficiency advantage, it is easily cast in an anytime fashion. RTDP-style algorithms combine heuristic search with value iteration, and their core design issues are the branch-selection strategy and the convergence criterion: the branch-selection strategy determines the convergence speed of value iteration, and the convergence criterion is used to decide the optimality of the solution. By analyzing and exploiting upper and lower bounds on the heuristic function, a new convergence criterion, called the optimal-action criterion, and a branch-selection strategy better suited to real-time algorithms are given. The optimal-action criterion lets the agent identify earlier an optimal action of the current state that meets the required precision and execute it immediately, while the new branch-selection strategy accelerates the satisfaction of this criterion. On this basis, a bounded incremental real-time dynamic programming algorithm (BI-RTDP) is designed. In experiments on two typical simulated real-time environments, BI-RTDP shows better real-time performance than existing related algorithms. Finally, a study of the asynchronous value-iteration mechanism improves the algorithm's ability to handle cycles in the search graph, and its offline solution performance is shown.
In recent years, agent and multi-agent planning have become new research hotspots in the field of artificial intelligence, with broad application prospects. This thesis studies the Markov decision process and its related theories, examines the issues these theories face when brought to real-world applications and their solutions, and presents theoretical analysis and improvements of a class of basic Markov decision-making algorithms.
     The research in this thesis mainly covers the following topics:
     (1) A systematic study of the basic models and algorithms related to agent and multi-agent planning under uncertainty. Among the models, the starting point is the Markov decision process; the partially observable Markov decision process then takes observation uncertainty into account; the decentralized partially observable Markov decision process further adds multi-agent cooperation, and the partially observable stochastic game adds multi-agent competition. Among the algorithms, a survey and comparative analysis are carried out for each model along two lines: backward iteration and forward search. Finally, the semi-Markov decision process and the Option theory of temporal abstraction are discussed; they form the foundation for designing hierarchical planning frameworks and algorithms.
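The backward-iteration family is easiest to see on the plain Markov decision process, where classical value iteration repeatedly applies the Bellman backup over all states. A minimal sketch in Python (the two-state toy MDP and its numbers are purely illustrative, not models used in the thesis):

```python
# Value iteration on a tiny finite MDP (illustrative only):
# V(s) <- max_a sum_{s'} P(s'|s,a) * [R(s,a,s') + gamma * V(s')]

# transitions[s][a] = list of (probability, next_state, reward)
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma, eps = 0.95, 1e-6

V = {s: 0.0 for s in transitions}
while True:
    delta = 0.0
    for s, acts in transitions.items():
        best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                   for outs in acts.values())
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < eps:      # Bellman residual below threshold -> values have converged
        break

# Greedy policy extraction from the converged value function.
policy = {s: max(acts, key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in acts[a]))
          for s, acts in transitions.items()}
print(V, policy)
```

POMDP, Dec-POMDP and POSG solvers build on the same backup but operate over belief states or joint policies, which is what makes exact backward iteration for them so much more expensive.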
     (2) RoboCup simulation 2D serves as a standard platform for research on multi-agent planning in large-scale uncertain environments. After the necessary explanation of the platform itself, the design methods for several sub-problems that overall planning must handle in this kind of real-world application are addressed and analyzed. By modeling these problems with existing Markov decision theory, the platform is given more general research significance.
     (3) Option theory embodies the concept of temporal abstraction and can be applied to hierarchical planning, which gives Markov decision theory a research direction for engaging more with real-world applications. For large-scale multi-agent planning problems with observation uncertainty, such as RoboCup simulation 2D, a new decision-making framework built around behavior generators is proposed on the basis of the partially observable stochastic game model, combining policy heuristics, belief-state compression, factored representation, and Option theory; a new real-time heuristic search algorithm aimed at quickly finding a feasible solution online is designed on top of it. Finally, the practical effect of these methods is tested on RoboCup simulation 2D as a standard platform.
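For concreteness, an option in the sense of Sutton, Precup and Singh (1999) is a triple of initiation set, internal policy, and termination condition, and the behavior generators of the proposed framework produce such temporally extended behaviors dynamically. Below is a generic sketch of that triple and of running an option to termination; the `step` environment hook is an assumption for illustration, and this is not the thesis's framework itself:

```python
# Generic Option = (initiation set I, internal policy pi, termination beta),
# following Sutton, Precup & Singh (1999).  Illustrative sketch only; the
# environment hook `step(state, action) -> (next_state, reward)` is assumed.
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    name: str
    initiation: Set[int]                  # states in which the option may be invoked
    policy: Callable[[int], str]          # primitive action chosen while the option runs
    termination: Callable[[int], float]   # probability of terminating in a given state

def run_option(option: Option, state: int, step):
    """Execute the option's internal policy until its termination condition fires."""
    total_reward = 0.0
    while True:
        state, reward = step(state, option.policy(state))
        total_reward += reward
        if random.random() < option.termination(state):
            return state, total_reward
```

A hierarchical planner then reasons over such options rather than primitive actions, which is the temporal abstraction that keeps the search depth manageable in a domain the size of simulation 2D.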
     (4) When Option-based hierarchical planning is applied to large-scale problems, the efficiency of solving the sub-problems is also crucial. Real-time dynamic programming (RTDP) combines heuristic search and value iteration to solve Markov decision problems. The core issues in designing such an algorithm are the branch-selection strategy and the convergence criterion: the branch strategy influences the convergence speed of value iteration, while the convergence criterion is used to determine the optimality of the solution. Several typical convergence criteria are compared and analyzed, after which a new one, named the optimal-action criterion, and a corresponding branch strategy are proposed on the basis of upper and lower bounds. This criterion guarantees that the agent can act earlier in a real-time decision process while an optimal policy with sufficient precision is still obtained; it can be proven that, under certain conditions, an optimal policy of arbitrary precision can be obtained with this incremental method. With these techniques, a Bounded Incremental Real-Time Dynamic Programming (BI-RTDP) algorithm is designed. In experiments on two typical real-time simulation systems, BI-RTDP outperforms other state-of-the-art RTDP algorithms. Finally, a study of the asynchronous value-iteration mechanism of RTDP improves its ability to handle cycles in the search graph, and offline experimental results are shown.
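To make the role of the two bounds concrete, the sketch below follows the general bounded-RTDP pattern (in the spirit of McMahan et al., 2005): a lower and an upper bound on the optimal cost-to-go are backed up along a simulated trial, the successor with the largest remaining bound gap is explored next, and the gap at the start state serves as the convergence test. The `mdp` interface (`actions`, `cost`, `successors`, `is_goal`) and pre-initialized bound tables are assumptions for illustration; this is not the thesis's BI-RTDP or its optimal-action criterion.

```python
# Bounded RTDP-style trial for a cost-minimizing MDP (illustrative sketch).
# Invariant maintained by the backups: V_lo(s) <= V*(s) <= V_hi(s).
# V_lo / V_hi are dicts assumed initialized from admissible lower / upper heuristics.
import random

def q(V, mdp, s, a):
    """One-step lookahead under bound V: c(s,a) + sum_s' P(s'|s,a) * V(s')."""
    return mdp.cost(s, a) + sum(p * V[s2] for s2, p in mdp.successors(s, a))

def brtdp_trial(mdp, V_lo, V_hi, s0, eps=1e-3, max_depth=200):
    """One trial from s0; both bound tables are updated in place."""
    s, depth = s0, 0
    while not mdp.is_goal(s) and depth < max_depth:
        a = min(mdp.actions(s), key=lambda b: q(V_lo, mdp, s, b))  # greedy on lower bound
        V_lo[s] = q(V_lo, mdp, s, a)                               # Bellman backup, lower bound
        V_hi[s] = min(q(V_hi, mdp, s, b) for b in mdp.actions(s))  # Bellman backup, upper bound
        succ = list(mdp.successors(s, a))
        gaps = [p * (V_hi[s2] - V_lo[s2]) for s2, p in succ]
        if sum(gaps) < eps:          # successors are already tightly bounded
            break
        s = random.choices([s2 for s2, _ in succ], weights=gaps)[0]
        depth += 1

def converged(V_lo, V_hi, s0, eps=1e-3):
    """Stop when the bound gap at the start state is below the precision target."""
    return V_hi[s0] - V_lo[s0] < eps
```

A test in the same spirit as an optimal-action criterion can be read off these bounds: once one action's upper-bound Q-value at the current state is no larger than every other action's lower-bound Q-value, that action is provably optimal there and can be executed before the values fully converge.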
