Research on Approximate POMDP Algorithms and Their Application to Optimizing Treatment Regimes in Traditional Chinese Medicine
Abstract
Sequential decision-making is a problem people face constantly in production and daily life, and it is an active research topic in artificial intelligence and control. The Partially Observable Markov Decision Process (POMDP) is a probabilistic model for sequential decision problems in uncertain environments. Exact value iteration for POMDPs uses dynamic programming to update the value function over the entire belief space and therefore cannot solve realistic large-scale POMDP problems, so research on approximate POMDP algorithms has significant theoretical and practical value. In recent years, point-based value iteration has become the mainstream approach to approximate POMDP solving. Point-based algorithms back up the value function only at a small set of reachable belief states; how to choose the belief states used for backups and in what order to back them up are therefore the two key issues for these methods. Existing algorithms still fall short in both respects, and better belief-selection methods are an important route to faster convergence of the POMDP value function; this is one of the main topics of this dissertation.
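For concreteness, the belief update that point-based methods repeatedly apply when exploring reachable beliefs is the standard Bayesian filter b' = τ(b, a, o). The following is a minimal NumPy sketch of that update; the array layout and names are our own illustrative convention, not the dissertation's code.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayesian belief update b' = tau(b, a, o) for a discrete POMDP.

    b : (S,)      current belief over hidden states
    T : (A, S, S) transition model, T[a, s, s2] = P(s2 | s, a)
    O : (A, S, Z) observation model, O[a, s2, o] = P(o | s2, a)
    """
    predicted = b @ T[a]                   # predict: sum_s b[s] * T[a, s, s2]
    unnormalized = predicted * O[a, :, o]  # correct: weight by P(o | s2, a)
    total = unnormalized.sum()
    # If the observation has zero likelihood under the model, fall back
    # to the prediction instead of dividing by zero.
    return predicted if total == 0.0 else unnormalized / total
```

Every reachable belief that a point-based solver considers is generated by chaining this update from the initial belief.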
In medicine, dynamic treatment regime planning is a multi-stage decision problem under uncertainty, and dynamic sequential intervention is the basic way chronic diseases are treated in clinical Traditional Chinese Medicine (TCM). The patient-centered treatment principle and the individual styles of physicians make TCM sequential intervention produce highly diverse treatment plans. Clinical experts typically try to identify effective plans from such large volumes of clinical data collected without external controls, gradually consolidating them into stable, effective experiential knowledge; however, forming effective regimes through traditional experience summarization is a slow process. Discovering good dynamic sequential treatment regimes in large-scale, complex, multidimensional clinical data has therefore become an important task for forming effective clinical regimens and a key issue in the clinical evaluation of syndrome differentiation and treatment. To address it, this dissertation proposes modeling TCM clinical observational data with a POMDP and discovering optimized dynamic treatment regimes from large-scale real-world clinical data, providing a practical tool for dynamic treatment planning and efficacy evaluation in TCM.
The main contributions of this dissertation are as follows:
1. We systematically review the theory and methods of recent point-based value iteration algorithms for approximate POMDP solving, with an in-depth analysis of the two key issues of belief selection and backup ordering. This review forms the basis for the rest of the dissertation.
2. We propose UBBS, a belief-selection algorithm based on the uncertainty of belief states. Each time UBBS expands the belief set, it prefers reachable beliefs that have low uncertainty and whose 1-norm distance to the already-selected set exceeds a threshold. We measure a belief's uncertainty in two ways: by its entropy in the information-theoretic sense, and by the gap between its largest and smallest components. Experiments show that UBBS reaches a value function comparable to other algorithms' while using fewer belief states (both measures and the distance test are sketched in code after this list).
3. We propose SHP-VI, an algorithm that generates the belief trajectories used for POMDP value-function backups from a shortest Hamiltonian path. SHP-VI is a trial-based value iteration method: it computes an action sequence with an approximation algorithm for the shortest Hamiltonian path, simulates the agent's interaction with the environment along that sequence to explore the belief space and obtain a belief trajectory, and then backs the value function up along the trajectory in reverse. Experiments show that SHP-VI substantially accelerates the computation of belief trajectories compared with other trial-based methods and reduces the number of iterations needed to reach the optimal value function (the structure of one trial is sketched after this list).
4. Discovering good dynamic sequential treatment regimes in large-scale, complex, multidimensional clinical data is a key issue in the clinical evaluation of syndrome differentiation and treatment. To address it, we propose a method that discovers optimized dynamic treatment regimes with a POMDP model. This is the first time POMDPs have been applied to treatment planning in TCM, and all model parameters are computed from real-world clinical data. We take the symptoms and signs directly observable by TCM physicians as observation variables, and derive health states from the clinical data by K-means clustering with the number of states restricted to a suitable range. State transition probabilities and the observation function are estimated from large-scale clinical data, and the immediate efficacy of a treatment is scored as a weighted sum of symptom improvements. In the experiments we built a POMDP from clinical data on the TCM treatment of type 2 diabetes and solved it with the PBVI and UBBS algorithms to identify optimized prescription regimes. The results show that the POMDP model can mine good sequential treatment plans from clinical data and provide reference knowledge for forming and clinically validating effective dynamic interventions in TCM; they also confirm the effectiveness of our UBBS algorithm on a real-world problem (a parameter-estimation sketch follows this list).
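To illustrate contribution 2, here is a minimal sketch of the two uncertainty measures and the 1-norm distance test described above; the ranking scheme and the function names are our assumptions about one plausible realization of UBBS, not its published implementation.

```python
import numpy as np

def entropy_uncertainty(b):
    """Shannon entropy of a belief; higher means more uncertain."""
    p = b[b > 0]
    return -np.sum(p * np.log(p))

def gap_uncertainty(b):
    """Gap-based measure: a large spread between the largest and
    smallest components means a peaked (certain) belief, so the gap
    is negated to rank in the same direction as entropy."""
    return -(b.max() - b.min())

def next_belief(candidates, selected, measure, dist_threshold):
    """Among reachable candidate beliefs, pick the least uncertain one
    whose 1-norm distance to every already-selected belief exceeds
    dist_threshold; return None if no candidate qualifies."""
    for b in sorted(candidates, key=measure):
        if all(np.abs(b - s).sum() > dist_threshold for s in selected):
            return b
    return None
```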
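The structure of one SHP-VI trial (contribution 3) can be sketched as follows, with the Hamiltonian-path solver, the simulator, and the point-based backup abstracted as given callables; this illustrates the trial loop under those assumptions, not the dissertation's code.

```python
def shp_vi_trial(b0, action_sequence, step, backup):
    """One trial-based sweep in the style of SHP-VI.

    action_sequence : actions derived from an approximate shortest
                      Hamiltonian path over the state space
    step(b, a)      : simulates one interaction and returns the next belief
    backup(b)       : performs a point-based Bellman backup at belief b
    """
    trajectory = [b0]
    for a in action_sequence:              # forward: explore belief space
        trajectory.append(step(trajectory[-1], a))
    for b in reversed(trajectory):         # backward: back up later beliefs
        backup(b)                          # first, so each backup sees
                                           # already-improved values
```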
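For contribution 4, the following sketch shows how health states, transition probabilities, and a weighted symptom-improvement reward could be estimated by clustering and counting. The record format (one (symptoms, treatment, next-symptoms) triple per visit), the Laplace smoothing, and all names are our assumptions; the observation model can be estimated from the same counts in the same way.

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_pomdp(visits, n_states, symptom_weights):
    """Illustrative POMDP parameter estimation from longitudinal data.

    visits : list of (symptoms, treatment_id, next_symptoms) triples,
             with symptoms as numeric vectors and 0-based treatment ids
    Health states come from K-means over symptom vectors; transitions
    are smoothed counts; the reward of a treatment in a state is its
    mean weighted symptom improvement.
    """
    w = np.asarray(symptom_weights, dtype=float)
    X = np.array([v[0] for v in visits] + [v[2] for v in visits])
    km = KMeans(n_clusters=n_states, n_init=10).fit(X)
    n = len(visits)
    s_before, s_after = km.labels_[:n], km.labels_[n:]
    n_actions = max(v[1] for v in visits) + 1
    T = np.ones((n_actions, n_states, n_states))  # Laplace smoothing
    R = np.zeros((n_actions, n_states))
    cnt = np.zeros((n_actions, n_states))
    for (sym, a, sym_next), si, sj in zip(visits, s_before, s_after):
        T[a, si, sj] += 1
        R[a, si] += w @ (np.asarray(sym) - np.asarray(sym_next))
        cnt[a, si] += 1
    T /= T.sum(axis=2, keepdims=True)             # normalize transitions
    R /= np.maximum(cnt, 1)                       # mean improvement
    return km, T, R
```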
