基于认知科学的计算机围棋博弈问题的研究

英文题名：Study on Problems for Computer Go Based on Cognitive Science
作者：余磊
论文级别：博士
学科专业名称：通信与信息系统
中文关键词：认知科学 ; 计算机围棋 ; 厚势价值量化 ; 形势判断 ; 获胜概率
英文关键词：cognitive science ; computer Go ; quantization of the strong group's value ; positional analysis ; winning probability
学位年度：2011
导师：刘锦高
学科代码：081001
学位授予单位：华东师范大学
论文提交日期：2010-09-01

摘要

认知科学是20世纪世界科学标志性的新兴研究门类,它作为探究人脑工作机制的前沿性尖端学科,已经引起了全世界科学家的广泛关注。由于在视觉表现形式上具有抽象、复杂的特性,计算机围棋正在成为认知科学研究的重要方向和工具。
     近年来,研究者大多采用Monte Carlo (MC)、Upper Confidence bounds applied to Trees (UC)、All Moves As First (AMAF)等不包含任何围棋知识的统计算法解决围棋对弈问题。本文则以不同的角度,通过研究围棋棋手的知识结构、表述形式、思维模型等,提出了一系列基于模拟人类棋手思考方式,量化棋手模糊判断结果的计算机围棋解决方案,不但有助于围棋程序棋力的提高,更有助于提升对人类认知能力的理解,促进认知科学的研究、发展,具有重要的创新意义和实用价值。
     论文具体的创新内容如下：
     1.对棋块的气划分等级。根据分级结果,可以判断棋块的安全程度,确定捕获目标,产生候选着点并对其排序。通过Memory-enhanced Test Driver (MTD(f))算法对候选点依次进行搜索,寻找出正确的捕获棋步。实验结果表明其效果较好,可以应用于计算机围棋实战中。这也为第三章计算厚势价值时确定有效子数做了必要的准备。
     2.对厚势价值进行量化。与传统的计算完全控制点的数量不同,本文提出了厚势的影响有如控制概率在二维空间的弥散,所有空点被控制的概率总和即为此厚势价值的思想,并为此设计了一种棋子影响函数,建立了一种计算厚势价值的数学模型；尝试将厚势的价值分为基本值和附加值两部分,进而动态地调整不同棋力下对厚势价值的不同判断,模拟了人类棋手对厚势的感觉；利用简单遗传算法对模型参数进行分级优化,获得了各个棋力层次下的厚势价值量化模型。
     实验结果表明该模型达到了较高精度,可以应用于计算机围棋序盘、中盘、收官等模块中,为第四章量化棋局形势的程序设计提供了基础。
     3.提出了一种对棋局形势进行量化评价的方法,采用获胜概率表征量化结果。通过模拟人类棋手判断形势的思考方式,以领先目数和棋局进展程度作为获胜概率的计算依据,并结合多级种群竞争消亡算法对模型参数进行了优化。当围棋程序的棋力发生变化时,只需相应调整模型中的参数,因此模型可以适用于不同棋力层次,具有较强的移植性和较高的普适性,在实验中取得了较好的效果,为第五章程序选择、确定最佳棋步奠定了基础。
     这种根据围棋知识构建模型、利用遗传算法确定参数的方法也可以应用于其他计算机博弈问题的解决。
     4.提出了一种计算机围棋中盘着手策略,包括棋步产生、评估和确定的方法。通过计算实地价值、战略价值、棋形价值和后续价值,对候选棋步进行初始评估。根据目的性差异,将棋步分为进攻和防御两部分；并结合获胜概率,计算攻防力度调整权值,从而动态调整攻击和防御棋步的评估值,寻找出最佳棋步。此方法既模拟了人类棋手在落子前的思考过程,又发挥了计算机擅于运算的特长,将动态分析、静态搜索和知识库的应用相结合,体现出一定层次的智能。
     5.开发了一个计算机围棋博弈系统CognitiveGo,将上述内容整合实现。每当落子前,CognitiveGo先根据模式库寻找一些可能的着点,继而根据自身棋力,对双方的实地、厚势、棋块强弱进行判断,并结合判断结果调整下一步棋的目标方向。在此过程中,模式库影响候选棋步的推荐,形势判断则影响目标之间的转换。
     综上所述,本文主要采用模拟人类棋手思考过程,建立相应模型的方式对计算机围棋相关问题进行研究,其研究方法和成果对于提升计算博弈智能,促进认知科学发展具有现实的应用价值和理论意义。
Cognitive science is a symbolic emerging research field of world science in the 20th century. As an exploratory subject focusing on the working of human brain, it has aroused wide attention from all around the world. Computer Go has become an important tool in cognitive scientific research due to its visual abstractness and complexity.
     In recent years, researchers have been solving Go game problems by using the algorithms including no Go knowledge such as MC, UCT, AMAF, etc. This paper presents a series of Go game resolutions based on the simulation of human players' thinking method and quantization of the general judgment by analyzing the knowledge structure, expression style and thinking mode of Go players. It is not only helpful for improving the level of the Go program, but also for promoting the cognitive ability of the human beings, which has innovative significance and utilitarian value in the research and development of cognitive science.
     The innovative thinking is as follows:
     1. Set levels for the groups’liberties. According to the levels, the groups’safeness can be judged, the target can be fixed and the candidate moves can be generated and sorted. Search the candidate moves in sequence by MTD(f) algorithm to find the best capture move. The result shows the method has good effect and can be applied in the real game of computer Go, which lays groundwork for defining effective number of stones when calculating the strong group's value.
     2. Quantize the value of the strong groups. Instead of the traditional way of calculating the controlled points, this paper presents a theory that the influence of the strong groups resembles the diffusion of control probability in the two-dimensional space and the sum of the control probability of the unoccupied points equals to the value of the strong group. An influence function and a mathematical model are established to prove it. We attempt to divide the value of the strong groups into the basic value and the added value, and then further adjust the judgments of the value of the strong groups and simulate the sense of human players toward the strong groups. Finally, the simple genetic algorithm is used to optimize the model's parameter by level and the strong group quantization models of different Go levels are obtained.
     The experimental result shows the model is of high accuracy so that it can be applied in starting game, middle game and end game of the computer Go and provide the basis for the static evaluation program and founds the basis for the subsequent program design of quantizing the game situation.
     3. Based on the above model, a quantization method is presented to evaluate the situation and winning probability is used to show the results of quantization. The parameters are optimized by simulating human player's thinking method, taking the leading points and game process as the calculating basis of winning probability and combining species compete-die out algorithms. The parameters in the model can be adjusted according to different Go levels so that the model can successfully cope with various Go levels, which boasts good portability and universality and obtains satisfying results in the experiment.
     The method which establishes model by Go knowledge and defines parameters by genetic algorithm can also be applied in other computer games. Thus it lays the foundation for the subsequent programs to select and define the best move.
     4. A middle game strategy in computer Go including methods of generating, evaluating and defining moves is presented. The candidate moves are initially evaluated by assessing the values of territory, strategy, shape and influence. The moves are divided into attacking and defending moves according to the difference of the purposes. Combining with winning probability, the strength of attack and defense is calculated and the weight value is adjusted, and then the weight values of attacking and defending moves are dynamically adjusted so that the best move is obtained. This method does not only simulate the thinking process of human players but also makes full play of the calculating advantage of the computer to combine the dynamic analysis, static search and knowledge base, which shows the intelligence of the computer.
     5. CognitiveGo, a computer Go game system is developed by integrating the above. CognitiveGo searches some possible moves according to the pattern library before playing each move, and then make further judgments about the territory, strong groups and group strength of both sides according to its Go level. After that, it adjusts the subject direction of the next move by the judgment result. During the process, the pattern library has influence on the recommendation of the candidate moves while the transformation of the subject is influence by the positional analysis.
     In conclusion, this paper researches on the computer Go problem mainly by establishing the model based on simulating the thinking process of the human players. The research method and results has academic and practical value in promoting computer game intelligence and the development of cognitive science.

引文

1 David N. Levy, eds. Computer Games. New York:Springer New York Inc,1988. 335-365.
    2 Bart Eisenberg. Kasparov. Deep Blue and the Games Machines Play. Pacific Connection,1997,8.
    3 H. Remus. Simulation of a learning machine for playing Go. In:Proceedings of the International Federation of Information Processing Congress(IFIP), North-Holland, Amsterdam,1963.
    4 Albert L. Zobrist. A model of visual organization for the game of GO. In: Proceedings of the Spring Joint Computer Conference. Spring Joint Computer Conference. Boston:AFIPS Press,Montvale,NJ,1969.103-112.
    5 Albert L.Zobrist. A New Hashing Method with Application for Game Playing, Technical report 88,University of Wisconsin, April 197O.Reprinted in ICCA Journal, 1970,13:69-73
    6 Albert L. Zobrist. Feature extractions and representation for pattern recognition and the game of Go:[PhD thesis]. Madison:Graduate School of the University of Wisconsin,1970.
    7 Walter Reitman, Bruce Wilcox, R. Nado. Goals and plans in a program for playing Go.In:Proc.29thACM Conference. Now York:ACM,1974:123-127.
    8 Walter Reitman, Bruce Wilcox. Modeling tactical analysis and problem solving in Go. In:Proceedings of the Tenth Annual Pittsburg Conference on Modeling and Simulation. Pittsburg,1979.2133-2148.
    9 Walter Reitman, Bruce Wilcox. The Structure and performance of the Interim.2 Go program. In:Proc. of the 6th International Joint Conference on Artificial Intelligence. IJCAI, Tokyo 1979.711-719.
    10 Walter Reitman, Bruce Wilcox. Pattern recognition and pattern-directed inference in a program for playing Go. In:D. Waterman, F. Hayes-Roth, eds. Pattern-Directed Inference Systems. Academic Press,1978.503-523.
    11 M Muller. Decomposition Search:A Combinatorial Games Approach to Game Tree Search, with Applications to Solving Go Endgames. In:Proceedings IJCAI'99.1999,1:578-583.
    12 M Muller. Not like other games-why tree search in Go is different. In: Proceedings of Fifth Joint Conference on Information Sciences(JCIS 2000),2000: 974-977.
    13 Muller. Games theories and Computer Go. In:Proc. of the Go and Computer Science Workshop(GCSW′93), INRIA, Sophia-Antipolis,1993.
    14 Muller. Computer Go as a sum of local games:An application of combinatorial game theory:[PhD thesis]. Zurich:Swiss Federal Institute of Technology,1995.
    15 M Mueller. Race to Capture:Analyzing Semeai in Go. In:Game Programming Workshop in Japan'99,1999,99(14):61-68.
    16 K. Chen. Group identification in Computer Go. In:D. Levy, D. Beal, eds. Heuristic Programming in Artificial Intelligence:the First Computer Olympiad, Chichester:Ellis Horwood,1989.195-210.
    17 K. Chen, Z. Chen. Static analysis of life and death in the game of Go. Information Sciences,1999.113-134.
    18 Chen, K. The move decision process of Go Intellect. In:David Erbach, eds. Computer Go,1990,14:9-17.
    19 Chen, K. Attack and defense. In:H. J. Van den Herik, L. V. Allis, eds. Heuristic Programming in Artificial Intelligence 3-The Third Computer Olympiad. Chichester:Ellis Horwood,1992.146-156.
    20 K. Chen. Some practical techniques for global search in Go. ICGA Journal,2000, 23:67-74.
    21 M. Reiss. E-mail sent in January 1995 to the computer Go mailing list. http://www.cs.uoregon.edu/-richard/computer-go.
    22 Chen Z.E-mail sent in May 1997 to the computer Go mailing list. http://www.cs.uoregon.edu/-richard/computer-go.
    23陈志行.电脑围棋小洞天.中山大学出版社,2000.
    24 David Fotland. The program G2.Computer Go,1986,1:10-16.
    25 David Fotland. Computer Go design issues. E-mail sent in October 1996 to the computer Go mailing list. http://www.cs.uoregon.edu/-richard/computer-go.
    26 Fotland, D. Knowledge representation in The Many Faces of Go. Fotland, D. Report posted on internet newsgroup rec.games.go,1993.
    27 GNUGO Documentation. http://www.gnu.org/software/gnugo/devel.html.
    28 Sylvain Gelly, Yizao Wang, Remi Munos, and Olivier Teytaud. Modification of UCT with patterns in Monte-Carlo Go. Technical Report RR-6062, INRIA,2006.
    29 Franck de Groot. Moyo Go Studio. http://www.moyogo.com/,2007.
    30 B. Bouzy. Modification cognitive in the game of Go [PhD thesis]. France: University Paris,1995.
    31 B. Bouzy. Incremental updating of objects in Indigo. In:Proceedings of the Fourth Game Programming Workshop in Japan,1997.
    32 Atsushi Yoshikawa, Takuya Kojima, Yasuki Saito. Relations between skill and the use of terms-an analysis of protocols of the game of Go. In:H.J.van den Herik, H. Iida, eds. Computers and Games,1999, No.1558 in LNCS:282-299.
    33 R. Langston. Perception in Go as a problem in AI. Computer Go,1988,6.
    34 Nobuo Araki, Kazuhiro Yoshida, Yoshimasa Tsuruoka, and Jun'ichi Tsujii. Move prediction in Go with the maximum entropy method. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Games,2007.
    35 Remi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. In P. Ciancarini and H. J. van den Herik, editors, Proceedings of the 5th International Conference on Computer and Games, Turin, Italy,2006.
    36 Bruno Bouzy. History and territory heuristics for Monte-Carlo Go. New Mathematics and Natural Computation,2(2):1-8,2006.
    37 Sylvain Gelly, Yizao Wang, Remi Munos, and Olivier Teytaud. Modification of UCT with patterns in Monte-Carlo Go. Technical Report RR-6062, INRIA,2006.
    38 Gelly, S., Wang, Y, Munos, R., & Teytaud, O. (2006).Modification of UCT with patterns in Monte-CarloGo (Technical Report 6062). INRIA.
    39 Coulom, R. (2006). Efficient selectivity and backup operators in Monte-Carlo tree search.5th Interna-tional Conference on Computer and Games,2006-05-29. Turin, Italy.
    40 Kocsis, L., & Szepesvari, C. (2006). Bandit basedMonte-Carlo planning.15th European Conference on Machine Learning (pp.282-293).
    41 Gelly, S., Silver, D.:Combining online and offline knowledge in uct. In: ICML'07:Proceedings of the 24th international conference on Machine learning, New York,NY, USA, ACM Press (2007) 273-280.
    42 Wang, Y, & Gelly, S. (2007). Modifications of UCT and sequence-like simulations for Monte-CarloGo. IEEE Symposium on Computational Intelli-gence and Games, Honolulu, Hawaii (pp.175-182).
    43 Wang, Y., Gelly, S.:Modifications of UCT and sequence-like simulations for Monte-Carlo Go. In:IEEE Symposium on Computational Intelligence and Games, Honolulu, Hawaii. (2007) 175-182.
    44陈志行.电脑围棋小洞天[M].广州：中山大学出版社,2000.38-39.
    45 Keh Hsun Chen. Computer Go:Knowledge, Search and Move Decision[J]. ICGA Journal,2001,24(4):203-215.
    46 BREUKER DM, UITERWIJK JWHM, HERIK HJVD. Replacement Schemes for Transposition Tables[J]. ICCA Journal,1994,17(4):183-193.
    47 BREUKER DM, UITERWIJK JWHM, HERIK HJVD. Replacement Schemes for Transposition Tables[J]. ICCA Journal,1994,17(4):183-193.
    48 AKL SC, NEWBORN MM. The principle continuation and the killer heuristic[A]. ACM Annual Conference Proceedings[C]. Seattle:ACM,1977.466-473.
    49 SCHAEFFER J. The History Heuristic and the Performance of Alpha-Beta Enhancements[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989,11(11):1203-1212.
    50小林光一.围棋吃子技巧[M].安徽：安徽科学技术出版社,1987.
    51 Ryder, J. Heuristic analysis of large trees as generated in the game of Go. [Phd thesis]. Department of Computer Science, Standford University,1971.
    52 Sylvain Gelly and Yizao Wang. "Exploration exploitation in Go:UCT for Monte-Carlo Go", December 2006.
    53 Sylvain Gelly and David Silver. "Combining online and offline knowledge in UCT." In International Conference on Machine Learning, ICML 2007,2007.
    54 Remi Coulom. "Efficient selectivity and backup operators in Monte-Carlo tree search." Submitted to CG 2006,2006.
    55 Remi Coulom. "Computing Elo ratings of move patterns in the game of Go." Draft, submitted to ICGA Computer Games Workshop 2007,2007.
    56石田芳夫.围棋形势判断基础.四川：成都棋苑编辑委员会,1985.pp：27.
    57安倍吉辉.厚味计算法.中国奥林匹克出版社,1990.pp：108.
    58黄希文.阶梯围棋教室.辽宁：辽宁科学技术出版社,2006.pp：548.
    59 B. Bouzy, T. Cazenave. Computer Go:An AI-Oriented Survey. Artificial Intelligence, October 2001,.132(1):39-103.
    60 Albert L. Zobrist. A model of visual organization for the game of GO. In: Proceedings of the Spring Joint Computer Conference. Spring Joint Computer Conference. Boston:AFIPS Press, Montvale, NJ,1969.103-112.
    61 Albert L. Zobrist. A New Hashing Method with Application for Game Playing, Technical report 88, University of Wisconsin, April 1970.Reprinted in ICCA Journal,1970,13:69-73.
    62 Albert L. Zobrist. Feature extractions and representation for pattern recognition and the game of Go:[PhD thesis]. Madison:Graduate School of the University of Wisconsin,1970.
    63 Jon Ryder. Heuristic analysis of large trees as generated in the game of Go:[PhD thesis]. Department of Computer Science, Stanford University,1971.
    64 K. Chen. Computer Go:Knowledge, Search, and Move Decision, ICGA Journal, Vol.24, No.4,203-215 (2001)
    65 http://www.gnu.org/software/gnugo/devel.html.
    66 K. Chen. Group identification in Computer Go. In:D. Levy, D. Beal. Heuristic Programming in Artificial Intelligence:the First Computer Olympiad, Chichester: Ellis Horwood,1989.195-210.
    67 Lei Yu, Jingao Liu. Improved Winning Probability Model in Go Based on Strong Group Quantization and Multi-level Species Compete-Die Out Algorithms. Proc. of 2010 the 2nd International Conference on Future Computer and Communication (ICFCC 2010).
    68 Zhengwei. Zhou, Shijing. Yan. Development Situation of Computer Go. Communications of the Institute of Information and Computing Machinery, Vol.1, No.2, April.2007, pp:23-30.
    69戴春妮.高光谱显微图像的特征提取与分类方法及其应用研究[D].[博士论文].华东师范大学,2009.
    70李国勇.最优控制理论及参数优化.国防工业出版社.2006.
    71张润楚.实验设计与分析及参数优化.中国统计出版社.2003.
    72石田芳夫.围棋定式大全.四川：蜀蓉棋艺出版社.1988.
    73高川秀格.秀格棋话(三).北京：围棋天地,2007,5.pp：73.
    74杭天鹏.变化中的势.北京：围棋天地,2009,18.pp：89.
    75黄希文.阶梯围棋教室.(从业余3段到业余6段).辽宁：辽宁科学技术出版社.2002.pp：543.
    76 B. Bruno. The Move Decision Strategy of Go[J]. ICGA Journal,2003,26(1): 372-378.
    77 L. Kishimoto, Correct and efficient search algorithms in the presence of repetitions[D].[Phd thesis]. Albertson College,2005.
    78 LEVY B. Heuristic Programming in Artificial Intelligence. London:Academic Press,1989:195-210.
    79李月,李昂.本因坊秀策全集.四川：蜀蓉棋艺出版社.2004.
    80吴清源.吴清源对局集.四川：蜀蓉棋艺出版社.1999.
    81陈兆峰.李昌镐对局集.四川：蜀蓉棋艺出版社.1996.
    82陈志行.手谈Ⅲ.北京：围棋天地,2007.
    83余磊,刘锦高.一种围棋中盘问题的计算机求解方法.华东师范大学学报(自然科学版).2008,1(1)：89-93.
    84李纲.中盘战术.湖北：湖北人民出版社.1989.pp：1.
    85 Lei. Yu, Chunni. Dai, Xiaojun. Zhang. Solving Winning Probability Problems in Go," Proc. of 2010 the 2nd International Conference on Advanced Computer Control(ICACC 2010). pp:477-480.
    86黄希文.阶梯围棋教室.(从入门到业余初段).辽宁：辽宁科学技术出版社.2002.pp：578.
    87戴春妮,姚蒙.一种新的进化计算算法模型——种群竞争消亡算法[J].计算机应用,2005,25(1)：224-225.
    88 Yao Meng, Dai Chun-ni, Pei Min, et al. The species compete-die out(SCD) algorithms model for evolutionary computation[C]. Proc. of Genetic and Evolutionary Computation Conference. Washington:GECCO-2005,2005: 158-160.
    89 Yao Meng, Dai Chun-ni, Li Ming, et al. A species compete-die out(SCD) algorithm model for improving the performances of evolutionary computation in greenhouse[C]. Proc. of the,Fourth International Conference on Machine Learning and Cybernetics. Guangzhou:ICMLC 2005,2005:96-98.
    90 Dai Chun-ni, Yao Meng, Xie Zhu-jie. Parameter optimization for growth model of greenhouse crop using genetic algorithms[J]. Applied Soft Computing,2009, 9(1):12-19.
    91 CONWAY J H. On Numbers and Games[M]. London:Academic Press,1976.
    92 MULLER C. Decomposition search:A combinatorial games approach to game tree search, with applications to solving Go endgames[J]. IJCAI Journal,1999, 20(3):578-583.
    93 KEH H C. Soft decomposition search in the game of Go[J]. ICGA Journal, 2002,22(4):461-464
    94 MARKUS E. Evaluation in Go by a neural network using soft segmentation[R], Graz:Proceedings of the 10th Advances in Computer Games Conference,2003.
    95 MARTIN M, MARKUS E. Temperature discovery search[R]. San Jose: Proceedings of the 19th National Conference on Artificial Intelligence,2004.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700