Research on Decision-Making Algorithms Based on Artificial Neural Networks
Abstract
Intelligence includes the ability to make the right decisions in a given situation in order to achieve a particular goal. Until now, most intelligent systems have only been able to reproduce a specific reasoning process; very few have been able to find their own way of thinking, and none has been built entirely from neural networks. This thesis introduces a new intelligent agent capable of intelligent behavior: it adapts to its environment and makes its own decisions in order to achieve a predetermined objective. Thus, when confronted with strategic situations (situations in which one must make the right decisions in order to achieve a given goal), such an agent is able to adapt itself and find its own optimal strategy.
     In most cases, effective decision-making in strategic situations requires a nonlinear mapping between the stimulus and the appropriate decision. This kind of mapping can be provided by Artificial Neural Networks. In this thesis we therefore use Artificial Neural Networks as decision makers, and we demonstrate that, when well designed, they are capable of intelligent behavior in complex situations, such as competitive situations against other intelligent agents.
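To make the idea concrete, here is a minimal sketch of such a decision network in Python, assuming an illustrative four-input, eight-hidden-unit layout (the thesis's actual architecture is not reproduced here): a situation vector, such as the opponent's recent moves, is mapped through a nonlinear hidden layer to a single continuous decision.

```python
import numpy as np

# Hypothetical decision network: maps a vector of situation parameters
# (e.g., the opponent's recent moves) to one continuous decision value.
rng = np.random.default_rng(0)

N_IN, N_HID = 4, 8                        # illustrative layer sizes
W1 = rng.normal(0, 0.5, (N_HID, N_IN))    # input-to-hidden weights
b1 = np.zeros(N_HID)
W2 = rng.normal(0, 0.5, N_HID)            # hidden-to-output weights
b2 = 0.0

def decide(situation):
    """Nonlinear mapping from situation parameters to a decision in (0, 1)."""
    h = np.tanh(W1 @ situation + b1)          # nonlinear hidden layer
    return 1 / (1 + np.exp(-(W2 @ h + b2)))   # e.g., probability of cooperating

# Example: the opponent's last four moves, encoded as 1 = cooperate, 0 = defect.
print(decide(np.array([1.0, 1.0, 0.0, 1.0])))
```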
     The Artificial Neural Network designed in this thesis benefits from a new architecture, which is introduced, explained, and then tested in order to assess the ability of such an intelligent agent to make decisions as humans do. Once its architecture has been introduced, the Artificial Neural Network must, like every other neural network, evolve according to a learning rule in order to converge to good decision-making. The new learning rule introduced in this thesis is inspired by the human learning process and consists of a new stochastic unsupervised reinforcement-learning rule using Back-Propagation. Its effectiveness is also demonstrated mathematically. Furthermore, unlike most reinforcement-learning rules, it is designed so that it can be used even with continuous outputs, which makes it valuable for many different real-life applications.
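The exact learning rule is developed in the thesis itself and is not reproduced in this abstract; the sketch below, continuing the code above, only illustrates the general shape of a stochastic reinforcement update with a continuous output, in the spirit of REINFORCE-style rules and Gullapalli's stochastic real-valued units: Gaussian noise perturbs the network's output to explore, and the scalar reward then scales an ordinary back-propagation step toward the explored action.

```python
# Sketch of one stochastic reinforcement-learning step with a continuous
# output (illustrative only; not the thesis's actual rule). Rewarded
# perturbations pull the network output toward the explored action.
SIGMA, LR = 0.1, 0.05                     # exploration width, learning rate

def reinforce_step(situation, reward_fn):
    global W1, b1, W2, b2
    # Forward pass, kept explicit so we can back-propagate by hand.
    h = np.tanh(W1 @ situation + b1)
    mu = 1 / (1 + np.exp(-(W2 @ h + b2)))
    action = np.clip(mu + rng.normal(0, SIGMA), 0.0, 1.0)   # explore
    r = reward_fn(action)                 # scalar reinforcement signal
    # Reward-weighted pseudo-target: move mu toward the explored action in
    # proportion to the reward (a negative reward pushes mu away from it).
    delta_out = r * (action - mu) * mu * (1 - mu)   # sigmoid derivative
    delta_hid = delta_out * W2 * (1 - h ** 2)       # tanh derivative
    W2 += LR * delta_out * h              # ordinary back-propagation step
    b2 += LR * delta_out
    W1 += LR * np.outer(delta_hid, situation)
    b1 += LR * delta_hid
    return action, r

# Toy usage: a reward in [-1, 1] that favors higher (more cooperative) outputs.
for _ in range(200):
    reinforce_step(rng.random(N_IN), lambda a: 2 * a - 1)
```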
     Finally, to validate the architecture and the human-inspired reinforcement learning introduced here, the Human-Like Artificial Neural Network is tested and shown to be able to evolve and to make decisions as humans do. The frameworks used for these tests are mathematical models of real-world situations such as those provided by Game Theory, in particular the Iterated Prisoner's Dilemma, which has been used several times in recent years to test new models in artificial intelligence. Game Theory thus provides us with a framework that validates the design of our Human-Like Artificial Neural Network and of the new reinforcement-learning rule we designed, and it allows us to demonstrate that Artificial Neural Networks can be used to build machines that are capable of intelligent behavior.
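For reference, such an Iterated Prisoner's Dilemma test bench can be sketched as follows, continuing the code above. The payoffs are the standard T=5, R=3, P=1, S=0; the Tit-for-Tat opponent and the four-move memory window are illustrative assumptions, not the thesis's actual tournament setup.

```python
# Minimal Iterated Prisoner's Dilemma loop with standard payoffs.
# Moves are encoded as 1 = cooperate, 0 = defect.
PAYOFF = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}   # (me, opp) -> my payoff

def play_ipd(decide_fn, rounds=100, memory=4):
    """Play `rounds` of IPD against Tit-for-Tat; `memory` must equal N_IN."""
    history_me, history_opp, total = [1] * memory, [1] * memory, 0
    for _ in range(rounds):
        # The agent sees the opponent's last `memory` moves and outputs a
        # cooperation probability, which we threshold into a move.
        me = int(decide_fn(np.array(history_opp[-memory:], float)) > 0.5)
        opp = history_me[-1]              # Tit-for-Tat copies my last move
        total += PAYOFF[(me, opp)]
        history_me.append(me)
        history_opp.append(opp)
    return total

print(play_ipd(decide))
```

A learning agent would interleave updates like reinforce_step with such games, using the per-round payoff as the reward signal.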
