Research on Algorithms in Computer Go
Abstract
Game playing is an important research topic in artificial intelligence (AI), and the development of AI owes much to the study of games. As one of the main subjects of game research, board games have largely been solved satisfactorily, with the sole exception of Go: the best Go programs today still play below the level of a beginning human player. The strength of Go programs is hard to improve because the search space of Go is enormous, and computers struggle both to handle its fuzzy concepts and to support effective learning algorithms. Go is thus an excellent environment for testing the state of AI, and raising the strength of Go programs is a major open problem in the field. At the same time, developing a Go program that plays at the level of human players would also further our understanding of human cognition. The study of Computer Go therefore has both theoretical significance and practical value.
     We first survey the state of Computer Go research at home and abroad in three areas: basic algorithms, search algorithms, and learning algorithms, and introduce several Computer Go programs. The search algorithms used in Computer Go mainly include minimax, alpha-beta, fail-soft, negamax, NegaScout, and MTD(f). The learning algorithms and theoretical foundations involved include combinatorial game theory, mathematical morphology, Monte-Carlo methods, fuzzy learning, divide and conquer, reinforcement learning, genetic algorithms, neural networks, support vector machines, Bayesian pattern classification, explanation-based generalization, parallel algorithms, and so on. We identify the main shortcomings of current research as imperfect representation of board positions, incomplete middle-game strategy, and immature learning algorithms.
     We then briefly review the theoretical foundations of this study, including mathematical morphology, finite state machines, linear models, perceptrons, and genetic algorithms.
     Next, we present the player thinking model, basic algorithms, search algorithms, and learning algorithms proposed in this study, together with the corresponding experimental results. Specifically, our main contributions are the following:
     1. We propose a complete model of a Go player's thinking. The model is built on the following foundations: territorial concepts of "land", "sea", and "air"; a hierarchical representation of board positions; the collection and classification of a large number of Go terms; the extraction of target concepts and the construction of a target graph; a set of target-selection principles and move attributes; and an analysis of the concept of playing style. The model is distinguished by its completeness and by the comprehensiveness of its classification of Go terms, target-selection principles, and move attributes.
     2. We design a hierarchical position representation, a stone-group clustering algorithm, and an area-partition algorithm, all based on mathematical morphology. Sharing a unified theoretical foundation, these algorithms are computationally simple, and experiments show that they work well. The existing theory of mathematical morphology can be used to design further meaningful heuristics.
     3. We design the PEMIS pattern-encoding method. It encodes a pattern by its adjacency, row-column, and contour features, and its outstanding advantage is invariance under black-white (colour) symmetry, rotation and reflection symmetry, and translation symmetry of the pattern. Experiments show that the encoding performs well. Among the basic algorithms, we also design an incremental move-generation algorithm.
     4. We design a compound-target search algorithm. We regard a compound target as a two-dimensional vector of simplex targets combined by AND/OR relations. The advantage of the compound-target search algorithm is that its basic functions can be composed from those of the simplex-target search algorithm. We also compare the performance of the classic search algorithms.
     5. We design learning algorithms for PEMIS pattern and joseki libraries. Experiments confirm their effectiveness, and the learned libraries occupy relatively little storage. We also design a Zobrist-based joseki-library learning algorithm, which experiments likewise show to be effective. Among the learning algorithms, we further design a learning-from-demonstration algorithm for shape and liberty terms and a genetic learning algorithm for the playing-style model.
     6. We develop ShoutGo, a Computer Go program built around this player thinking model, implementing all of the algorithms above. In ShoutGo, a player owns a pattern library and a joseki library and has an individual playing style. After clustering the stone groups and partitioning the board into areas, the player guesses a target, guided by the target-selection principles and focused on the opponent's last move; guided again by those principles and by its playing style, it generates a specific target, then searches for moves directed by that target and by the moves recommended from its pattern and joseki libraries, and finally selects a particular move according to the move attributes. If the target fails or no feasible move exists, the player re-partitions the areas or generates another target by other decision principles, until a suitable move is found. Throughout this process, the pattern and joseki libraries influence which moves are recommended, while the playing style influences the transitions between targets.
     Finally, we discuss the evaluation of the player thinking model; the relation between the incremental and scanning move-generation algorithms; the application of mathematical morphology to the basic algorithms; the effect of ko and seki on search; the relation between the characteristics of search trees and psychological factors; the estimation of search time; the position-evaluation function; the learnability of target search; and the modelling of playing style. We also discuss possible applications of machine learning methods to Computer Go and propose plans for further research.
     As a branch of AI, Computer Go research has a natural connection with psychology. Throughout this study we have emphasized a human-player-centred principle, striving for a close fit between the player thinking model and the actual thinking of human players, and for the refinement of its learning algorithms. Our future work will focus on learning algorithms: only when a program can learn continually, as a human player does, will Computer Go have real hope.
Game playing is one of the important subjects of Artificial Intelligence (AI), and the development of AI owes much to the study of games. Board games, one of the main subjects of game research, have been studied thoroughly, with the sole exception of Go: the best Go programs today still play below the level of a beginning human player. Because the search space of Go is so large, and because computers handle its fuzzy concepts poorly, the strength of Computer Go programs is hard to improve. Go is an excellent environment for assessing the state of AI, and improving the strength of Go programs is an open problem in the field. At the same time, developing a Go program that plays at human level would help us understand human cognition. The study of Computer Go is therefore important both in theory and in practice.
     First, we survey the worldwide research on Computer Go from three aspects: basic algorithms, search algorithms, and learning algorithms, and we introduce several Computer Go programs. The search algorithms used in Computer Go chiefly include minimax, alpha-beta, fail-soft, negamax, NegaScout, and MTD(f). The learning algorithms and theories involved include combinatorial game theory, mathematical morphology, Monte-Carlo methods, fuzzy learning, divide and conquer, reinforcement learning, genetic algorithms, neural networks, support vector machines, Bayesian pattern classification, explanation-based generalization, parallel algorithms, and so on. The main deficiencies of current research lie in imperfect position representation, incomplete middle-game strategy, and immature learning algorithms. We then briefly introduce the theoretical background of this study: mathematical morphology, finite state machines, linear models, perceptrons, and genetic algorithms.
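Of the classic search algorithms listed above, negamax with alpha-beta pruning is the common core; fail-soft, NegaScout, and MTD(f) are refinements of it. A minimal sketch on a toy game tree (the tree representation is our own illustration, not Go-specific code from the thesis; leaves are integer scores from the first player's viewpoint):

```python
def negamax(tree, alpha, beta, color):
    """Negamax with alpha-beta pruning on a toy game tree.

    tree: either a leaf score (int) or a list of child subtrees.
    color: +1 if the maximizing player is to move, -1 otherwise.
    Returns the best achievable score for the side to move.
    """
    if isinstance(tree, int):
        return color * tree
    best = float("-inf")
    for child in tree:
        # The opponent's score is the negation of ours, with the window flipped.
        score = -negamax(child, -beta, -alpha, -color)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # cut-off: the opponent will never allow this line
            break
    return best
```

For example, the tree `[[3, 5], [2, 9]]` has minimax value `max(min(3, 5), min(2, 9)) = 3`, which `negamax(tree, float("-inf"), float("inf"), 1)` returns while pruning branches the opponent would avoid.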
     After that, we present the player thinking model, basic algorithms, search algorithms, and learning algorithms of this study, together with the experimental results for these algorithms. In detail, the major contributions of this study are:
     1. We put forward an integrated player thinking model. We do this after proposing territorial concepts of "land", "sea", and "air", introducing a hierarchical representation of positions, generalizing and classifying a large number of Go terms, extracting target notions, building a target graph, summarizing target-selection principles and move attributes, and analysing the notion of playing style. The model is characterized by its completeness and by the comprehensiveness of its classification of Go terms, target-selection principles, and move attributes.
     2. We design a hierarchical position representation, a stone-group clustering algorithm, and an area-partition algorithm, all based on mathematical morphology. Resting on a unified theory, these algorithms are simple, and experiments show that they work well. More meaningful heuristic tactics can be derived from existing mathematical morphology methods.
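The morphological operators underlying such algorithms are dilation and erosion. The toy sketch below (our own illustration, not the thesis's algorithms) applies them to a set of occupied intersections of an n×n board with a 4-neighbour structuring element; dilation followed by erosion (a "closing") hints at how gaps between nearby stones could be absorbed into one area:

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-neighbour structuring element

def dilate(points, n):
    """Binary dilation: add every on-board neighbour of the set."""
    out = set(points)
    for (x, y) in points:
        for dx, dy in NEIGHBOURS:
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < n:
                out.add((nx, ny))
    return out

def erode(points, n):
    """Binary erosion: keep only points whose on-board neighbours all lie in the set."""
    out = set()
    for (x, y) in points:
        if all((x + dx, y + dy) in points
               for dx, dy in NEIGHBOURS
               if 0 <= x + dx < n and 0 <= y + dy < n):
            out.add((x, y))
    return out

def closing(points, n, k=1):
    """k dilations then k erosions: a crude sketch of delimiting a group's area."""
    s = set(points)
    for _ in range(k):
        s = dilate(s, n)
    for _ in range(k):
        s = erode(s, n)
    return s
```

On a 5×5 board, `dilate({(2, 2)}, 5)` yields the plus-shaped set of five points, and eroding it recovers `{(2, 2)}`; larger `k` lets the closing bridge wider gaps between stones.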
     3. We design PEMIS, a pattern-encoding method independent of symmetry. It encodes a pattern by its adjacency, row-column, and contour features, while remaining invariant under black-white symmetry, rotation and reflection symmetry, and translation symmetry of the pattern. Experiments show that the method works well. In addition, we design an incremental move-generation algorithm among the basic algorithms.
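The exact PEMIS encoding is particular to this thesis, but its headline property, invariance under colour swap, the eight board symmetries, and translation, can be illustrated generically by a canonical-form encoding (a sketch of ours, not PEMIS itself):

```python
def canonical(stones):
    """Symmetry-invariant key for a small Go pattern.

    stones: iterable of (x, y, c) with colour c in {+1, -1}. Returns a tuple
    that is identical for all rotated, reflected, colour-swapped, and
    translated variants of the pattern, by taking the minimum over them.
    """
    def normalize(pts):
        # Translation invariance: shift the pattern so its bounding box
        # starts at the origin, then sort for a deterministic tuple.
        mx = min(x for x, y, c in pts)
        my = min(y for x, y, c in pts)
        return tuple(sorted((x - mx, y - my, c) for x, y, c in pts))

    variants = []
    for swap in (1, -1):                          # black-white symmetry
        pts = [(x, y, c * swap) for x, y, c in stones]
        for _ in range(4):                        # four rotations...
            pts = [(y, -x, c) for x, y, c in pts]
            variants.append(normalize(pts))
            # ...each paired with a reflection: all 8 dihedral symmetries
            variants.append(normalize([(-x, y, c) for x, y, c in pts]))
    return min(variants)
```

Any two patterns related by a board symmetry, a colour swap, or a shift then map to the same key, so a pattern library indexed by `canonical` stores each shape only once.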
     4. We design a compound-target search algorithm. We regard a compound target as a two-dimensional vector of simplex targets combined by AND or OR logic. Its advantage is that its basic functions can be composed from the basic functions of the simplex-target search algorithm. Furthermore, we compare the performance of the classic search algorithms.
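The composition idea can be sketched abstractly: if each simplex-target search reports whether its target is achieved in a position, a compound target's check is assembled from those simplex checks with AND/OR combinators. The simplex searchers below are hypothetical stand-ins (a toy position dictionary, not the thesis's search functions):

```python
def and_target(*targets):
    """Compound target achieved only if every simplex target is achieved."""
    return lambda position: all(t(position) for t in targets)

def or_target(*targets):
    """Compound target achieved if any simplex target is achieved."""
    return lambda position: any(t(position) for t in targets)

# Hypothetical simplex targets over a toy "position" dictionary:
capture_a = lambda pos: pos.get("a_liberties", 9) == 0      # group A captured
connect_bc = lambda pos: pos.get("b_c_connected", False)    # groups B, C joined

# A compound target composed from the simplex ones:
capture_or_connect = or_target(capture_a, connect_bc)
```

The same combinators nest, so arbitrarily structured AND/OR compound targets reduce to the basic simplex functions, which is the advantage claimed for the algorithm above.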
     5. We design a pattern/joseki library learning algorithm based on PEMIS. Experiments show that it works well and that the learned libraries occupy little storage. We also design a Zobrist-based joseki-library learning algorithm, which likewise works well. Moreover, we design a teaching-based learning algorithm for shape and liberty terms and a genetic learning algorithm for the playing-style model.
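Zobrist hashing, presumably the basis of the Zobrist joseki-library algorithm above, assigns each (intersection, colour) pair a fixed random bitstring and keys a position by the XOR of the bitstrings of its stones; since XOR is its own inverse, the key can be updated incrementally as stones are added or removed. A minimal sketch (the fixed seed is ours, for reproducibility only):

```python
import random

N = 19
random.seed(42)  # fixed seed so the table is reproducible in this sketch
TABLE = {(x, y, c): random.getrandbits(64)
         for x in range(N) for y in range(N) for c in (1, -1)}

def zobrist_key(stones):
    """Position key: XOR of the per-stone bitstrings; order-independent."""
    key = 0
    for stone in stones:  # stone = (x, y, colour) with colour in {+1, -1}
        key ^= TABLE[stone]
    return key
```

Placing a stone XORs one table entry into the key, and removing it XORs the same entry out, so looking up a position in a joseki library costs a dictionary probe rather than a full board comparison.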
     6. We develop a Computer Go program named ShoutGo whose core is the player thinking model, and we implement all of the algorithms above. In ShoutGo, each player owns a pattern library, a joseki library, and an individual playing style. After clustering the stone groups and partitioning the board into areas, the player guesses a target, guided by the target-selection principles and focused on the opponent's last stone. Guided again by those principles and by its style, the player generates a specific target, then searches for moves directed by that target and by the recommendations of its pattern and joseki libraries, and finally selects a particular move according to the move attributes. If the target fails, or no feasible move exists, the player re-partitions the areas or generates other targets by other decision principles, until a suitable move is found. Throughout this process, the pattern and joseki libraries influence which moves are recommended, while the playing-style model influences the switching between targets.
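The decision cycle described above can be skeletonized as a loop. Every helper here is a trivial placeholder standing in for a ShoutGo module (clustering, area partition, target generation, pattern/joseki lookup); only the control flow mirrors the description:

```python
# --- placeholder helpers (hypothetical; ShoutGo's real modules are not public) ---
def cluster_groups(position):          return [position.get("groups", [])]
def partition_areas(position, groups): return ["whole board"]
def guess_target(position, areas, style):
    return position.get("urgent_target")          # focus on opponent's last stone
def search_moves(position, target):
    return position.get("moves", {}).get(target, [])  # pattern/joseki suggestions
def select_by_attributes(candidates, style):
    return candidates[0]                          # pick by move attributes
def next_target(position, areas, style):
    return "develop"                              # style drives target switching

def decide_move(position, style, max_targets=10):
    """ShoutGo-style decision cycle: cluster, partition, then iterate targets
    until a target yields a feasible move; return None (i.e. pass) otherwise."""
    groups = cluster_groups(position)
    areas = partition_areas(position, groups)
    target = guess_target(position, areas, style)
    for _ in range(max_targets):
        candidates = search_moves(position, target)
        if candidates:
            return select_by_attributes(candidates, style)
        # Target failed or no feasible move: switch targets and retry.
        target = next_target(position, areas, style)
    return None
```

With these stubs, a position offering a move for the guessed target returns it immediately; one offering moves only for a fallback target returns those after a switch; one offering nothing returns `None`, i.e. a pass.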
     Finally, we discuss the evaluation of the player thinking model, the relation between the incremental and scanning move-generation algorithms, the application of mathematical morphology to the basic algorithms, the effect of ko and seki on search, the relation between the characteristics of search trees and psychological factors, the estimation of search time, the position-evaluation function, the learnability of target search, and the modelling of playing style. We also discuss possible applications of machine learning methods to Computer Go and put forward plans for further study.
     As a branch of AI, Computer Go is naturally related to psychology. In this study we especially emphasize the principle of taking human players as the model, the fidelity of the thinking model to human players' actual thinking, and the refinement of the learning algorithms. Our future work will focus on learning algorithms: only when a Computer Go program can learn continually, just as a human player does, will Computer Go be promising.
