Research on the Learning Mechanism of Scientific and Technological Users Based on Reinforcement Learning Models
Abstract
This thesis is a sub-project of the National Natural Science Foundation of China (NSFC) project "Research of the Learning Behavior of Scientific and Technological Database Users in the Process of Information Searching" (No. 70773054). Its core task is to uncover the underlying mechanism of the learning behavior of scientific and technological literature database users through fitting experiments with reinforcement learning models.
     The thesis first analyzes the functions of the search methods offered by scientific and technological literature databases and proposes a classification framework for them. On that basis, it analyzes the symbolic representation system of these search methods using symbol theory.
     Next, drawing on cognitive psychology, it analyzes the cognitive mechanism behind the symbol system of search methods and the learning mechanism behind cognitive change, and proposes a framework of the factors that influence changes in the choice of search-method symbols. Drawing on the theory of learning in games, it also compares the mechanism by which users learn to choose search methods with the mechanism of learning in games.
     It then gives an in-depth theoretical analysis of two important reinforcement learning models, the BM model and the basic RE model, together with their modified versions, examining each model's background, theoretical assumptions, basic principles, validation methods, experimental fit, points of modification, and applicability.
     Finally, six of these models are selected to explain and quantify the learning rules in users' adjustment of their search-method choices, and a controlled laboratory experiment is used to assess empirically how well the models fit and predict. The results indicate that it is feasible to explore the learning mechanism of scientific and technological users with reinforcement learning models.
     The theoretical and experimental research in this thesis is the first of its kind in information science, which is significant for the further development of subsequent research.
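The abstract names the two model families but gives no formulas. As a rough illustration only, the sketch below shows one common textbook form of each update rule, assuming that BM and RE refer to the Bush-Mosteller and Roth-Erev models. The function names, the `alpha` learning rate, the `stimulus` scaling to [-1, 1], and the proportional rescaling of the unchosen options are all assumptions made for this example; they are not taken from the thesis and may differ from the six models it fits.

```python
def bush_mosteller_update(probs, chosen, stimulus, alpha=0.2):
    """One Bush-Mosteller style step on a list of choice probabilities.

    probs    : probabilities over search methods, summing to 1
    chosen   : index of the method just used
    stimulus : outcome scaled to [-1, 1] (assumption: > 0 means success)
    alpha    : learning rate in (0, 1]; 0.2 is a hypothetical value
    """
    p = probs[chosen]
    if stimulus >= 0:
        new_p = p + alpha * stimulus * (1.0 - p)   # move toward 1
    else:
        new_p = p + alpha * stimulus * p           # move toward 0
    rest = 1.0 - p
    updated = []
    for j, pj in enumerate(probs):
        if j == chosen:
            updated.append(new_p)
        elif rest > 0:
            # Assumption: unchosen options keep their relative shares.
            updated.append(pj * (1.0 - new_p) / rest)
        else:
            # All mass was on the chosen option: split the remainder evenly.
            updated.append((1.0 - new_p) / (len(probs) - 1))
    return updated


def roth_erev_update(propensities, chosen, payoff):
    """Basic Roth-Erev step: add the non-negative payoff to the chosen propensity."""
    updated = list(propensities)
    updated[chosen] += payoff
    return updated


def roth_erev_probs(propensities):
    """Choice probabilities are proportional to the propensities."""
    total = sum(propensities)
    return [q / total for q in propensities]


if __name__ == "__main__":
    # Two hypothetical search methods, e.g. basic search and advanced search.
    print(bush_mosteller_update([0.5, 0.5], chosen=0, stimulus=1.0))
    q = roth_erev_update([1.0, 1.0], chosen=1, payoff=2.0)
    print(roth_erev_probs(q))
```

The main difference the sketch brings out is that the Bush-Mosteller rule updates probabilities directly, while the basic Roth-Erev rule accumulates propensities, so its probabilities change more slowly as experience builds up.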