A New Algorithm for Learning Bayesian Network Parameters from Incomplete Data
Abstract
The real world contains vast amounts of data, so how to process these data and discover knowledge in them is a problem of practical significance that urgently needs to be solved. With the development of information technology, data mining techniques are being applied ever more widely in practice, and Bayesian networks, as a powerful knowledge-representation formalism and probabilistic reasoning model for uncertain environments, are a strong tool for data mining.
    A Bayesian network is an effective knowledge-representation and probabilistic-inference model under uncertainty, and a popular graphical tool for decision analysis. In recent years, researchers have studied how to learn Bayesian networks directly from data and have applied them to data mining. Although data-mining techniques based on Bayesian networks are still being refined, they have already achieved remarkable results on a number of data-modeling problems.
    Bayesian network learning comprises two major problems: parameter learning and structure learning. Incomplete data are ubiquitous in the real world, so learning the parameters and structure of a Bayesian network from incomplete data is a highly practical and valuable problem. In particular, exact parameter learning from incomplete data is very difficult, and all existing algorithms for this problem resort to approximation. Because these algorithms require many iterations, they are inefficient on large data sets and consume considerable system resources.
    This thesis presents, for the first time, a new parameter-learning algorithm, BCL, based on the consistency of learning; it can be used to learn the parameters of a Bayesian network from an incomplete data set.
    The new algorithm takes as its theoretical basis the asymptotic normality of consistent Bayesian learning. Hu Zhenyu's master's thesis derives the following conclusion: if the regularity conditions hold, then as the sample size $n \to \infty$, the posterior distribution of the parameter converges with probability 1 to a normal distribution, $p(\theta \mid D_n) \to \mathcal{N}\big(\hat{\theta}_n,\, (n I(\hat{\theta}_n))^{-1}\big)$, where $\theta$ denotes the parameter.
    This result tells us that as the number of observed samples tends to infinity, the parameter θ learned by the Bayesian method tends to a normal distribution. Since the distributional form of the parameter is thereby determined, it can be used to estimate the parameter's value directly.
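    To make this concrete, here is a standard illustrative instance, not taken from the thesis: a single Bernoulli parameter θ with a Beta(α, β) prior and s successes in n observations. The posterior is itself Beta and is asymptotically normal:

    \[
      p(\theta \mid D_n) = \mathrm{Beta}(\alpha + s,\ \beta + n - s)
      \;\longrightarrow\;
      \mathcal{N}\!\left(\hat{\theta}_n,\ \frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}\right),
      \qquad \hat{\theta}_n = \frac{s}{n}, \quad n \to \infty.
    \]

    For large n the parameter can thus be read off directly from the (approximately normal) posterior, with no iterative refinement of the kind EM or Gibbs sampling performs.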
    Since the algorithm works on an incomplete data set, the way the data set is completed has a large effect on the accuracy of the results, so this problem must be handled first. Here we apply the Bayesian Heuristic Approach (BHA), attempting to inject the influence of prior information into the data-completion process. We proceed as follows: first, a preliminary estimate of the parameter θ is computed from the available complete samples; then each missing entry is completed according to its conditional probability given the observed entries under this estimate, $P(x_{\text{miss}} \mid x_{\text{obs}}, \hat{\theta})$, which completes the given data sample set.
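    The following minimal Python sketch illustrates this completion step on a toy two-node binary network X → Y. The data layout and the most-probable-value fill rule are illustrative assumptions, not the thesis's actual procedure:

    # A minimal sketch (assumed, not the thesis code): complete missing
    # values in a toy two-node binary network X -> Y using a CPT
    # estimated from the complete records only.
    records = [(1, 1), (1, 0), (0, 0), (1, 1), (0, None), (1, None)]  # None marks a missing Y

    # Initial estimate of P(Y=1 | X=x) from the complete records.
    complete = [r for r in records if None not in r]
    p_y1_given_x = {
        x: sum(1 for xi, yi in complete if xi == x and yi == 1)
           / max(1, sum(1 for xi, yi in complete if xi == x))
        for x in (0, 1)
    }

    # Repair: fill each missing Y with its most probable value under the
    # conditional distribution P(Y | X=x, theta_hat).
    repaired = [
        (x, y if y is not None else int(p_y1_given_x[x] >= 0.5))
        for x, y in records
    ]
    print(repaired)  # -> [(1, 1), (1, 0), (0, 0), (1, 1), (0, 0), (1, 1)]

    A softer variant would add fractional counts p and 1 − p for the two possible completions instead of committing to one value, which stays closer to the expected-count spirit of the repair step.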
    As stated above, the algorithm has two key points: (1) how to complete the data set well, and (2) which approximation method the main body of the algorithm uses to estimate the parameters. Based on this analysis we propose a new parameter-learning algorithm, the BCL algorithm, realized in the following steps (a code sketch follows the list):
    Step 1: extract the relatively complete samples from the incomplete data set and estimate a plausible parameter vector from them, i.e., compute initial values for the normally distributed parameters directly from this local data.
    Step 2: with these initial parameters in hand, complete the remaining incomplete records, so that the probabilistically best-matching parameter vector can be estimated.
    Step 3: from the completed data, approximate the final values by the method of moments.
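    The three steps amount to a single pass over the data, in contrast to EM's repeated sweeps. The Python sketch below is an assumed rendering of the outline above, with hypothetical helper names:

    from collections import Counter

    def relative_frequencies(records):
        # Moment estimate: empirical frequency of each complete configuration.
        n = len(records)
        return {r: c / n for r, c in Counter(records).items()}

    def bcl_learn(records, impute):
        # Step 1: initial parameters from the relatively complete records.
        complete = [r for r in records if None not in r]
        theta0 = relative_frequencies(complete)
        # Step 2: complete the remaining incomplete records under theta0.
        repaired = [impute(r, theta0) if None in r else r for r in records]
        # Step 3: one moment estimate over the completed data; unlike EM or
        # Gibbs sampling, no iteration between completion and estimation.
        return relative_frequencies(repaired)

    Here `impute` is any completion rule of the kind sketched earlier; the one-pass structure is what yields the lower time cost claimed for large samples.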
    In the experimental stage we applied the BCL algorithm and two traditional algorithms, Gibbs sampling and EM, to parameter learning on two classic Bayesian networks, the Asia network and the Alarm network (both have been used successfully in medical expert systems), and compared the error rates of the results and the running times. The experiments show that on small samples our algorithm is more accurate at a comparable time cost, while on large samples, at comparable accuracy, its time cost is markedly lower than that of the other two algorithms.
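    A harness of roughly the following shape suffices to collect the two quantities compared; the function names are hypothetical, and the thesis's actual Asia/Alarm setup is not reproduced here:

    import time

    def benchmark(learners, dataset, true_params):
        """Compare learners by parameter error and wall-clock time.

        learners:    {name: function mapping dataset -> {entry: probability}}
        true_params: the generating network's CPT entries, {entry: probability}
        """
        results = {}
        for name, learn in learners.items():
            start = time.perf_counter()
            estimate = learn(dataset)
            elapsed = time.perf_counter() - start
            # Error rate here: mean absolute deviation of learned CPT entries
            # from the true ones (one of several reasonable choices).
            err = sum(abs(estimate.get(k, 0.0) - v)
                      for k, v in true_params.items()) / len(true_params)
            results[name] = {"error": err, "seconds": elapsed}
        return results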
    This work pushes the theory and algorithms of Bayesian networks, viewed as a data-mining technique, a step forward.
References
[1] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.
[2] F.V. Jensen. An Introduction to Bayesian Networks. UCL Press, 1998.
[3] D. Heckerman. Bayesian networks for data mining. Data Mining and Knowledge Discovery, 1:79-119, 1997.
[4] Wu Xizhi. Modern Bayesian Statistics. China Statistics Press, 1st edition, October 2000, p. 4. (in Chinese)
[5] D. Heckerman. A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, Advanced Technology Division, 1995.
[6] D. Heckerman, D. Geiger, and D.M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243, 1995.
[7] G.F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309-347, 1992.
[8] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific Publishing Company, New Jersey, 1989.
[9] D. Heckerman, D. Geiger, and D.M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. In KDD Workshop, pages 85-94, 1994.
[10] R.M. Fung and S.L. Crawford. Constructor: A system for the induction of probabilistic models. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), vol. 2, pages 762-769. MIT Press, 1990.
[11] J. Cheng, D.A. Bell, and W. Liu. Learning Bayesian networks from data: An efficient approach based on information theory. Technical report, Dept. of Computing Science, University of Alberta, 1998.
[12] J. Cheng, D.A. Bell, and W. Liu. Learning belief networks from data: An information theory based approach. In Proceedings of the Sixth ACM International Conference on Information and Knowledge Management, 1997.
[13] P. Spirtes and C. Glymour. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9(1):62-73, 1991.
[14] K. Murphy. Learning Bayes net structure from sparse data sets. Technical report, Computer Science Division, University of California, Berkeley, 2001.
[15] P. Spirtes, C. Glymour, and R. Scheines. Causality from probability. In G. McKee (ed.), Evolving Knowledge in Natural and Artificial Intelligence, pages 181-199. London: Pitman, 1990.
[16] D. Heckerman. Bayesian networks for knowledge representation and learning. In Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. MIT Press, 1995.
[17] Max Welling. Approximate inference. Meeting on Unsupervised Learning, 2000.
[18] D. MacKay. Introduction to Monte Carlo methods. In M. Jordan, editor, Learning in Graphical Models, pages 175-204. MIT Press, 1996.
[19] D. Heckerman and D. Geiger. Learning Bayesian networks: A unification for discrete and Gaussian domains. In Proceedings of the 11th International Conference on Uncertainty in Artificial Intelligence, 1995.
[20] S.L. Lauritzen. The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 19:191-201, 1995.
[21] G.F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42(2-3):393-405, 1990.
[22] R. Dechter. Bucket elimination: A unifying framework for probabilistic inference. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Portland, OR, pages 211-219, 1996.
[23] S.L. Lauritzen and D.J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 1988.
[24] G.R. Shafer and P.P. Shenoy. Probability propagation. Annals of Mathematics and Artificial Intelligence, 2:327-352, 1990.
[25] F.V. Jensen, S.L. Lauritzen, and K.G. Olesen. Bayesian updating in causal probabilistic networks by local computations. Computational Statistics Quarterly, pages 269-282, 1990.
[26] G. Shafer and P. Shenoy. Probability propagation. Annals of Mathematics and Artificial Intelligence, 2:327-352, 1990.
[27] M. Henrion. Search-based methods to bound diagnostic probabilities in very large belief networks. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, 1991.
[28] David Poole. Probabilistic conflicts in a search algorithm for estimating posterior probabilities in Bayesian networks. Artificial Intelligence, 88(1-2):69-100, 1996.
[29] R.M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, University of Toronto, 1993.
[30] Lin Shimin, Hu Zhenyu, and Lu Yuchang. Consistent Bayesian learning and its prior independence. Journal of Nanjing University (Natural Science), 2002(3). (in Chinese)
[31] D. Freedman. On the asymptotic behavior of Bayes estimates in the discrete case. Annals of Mathematical Statistics, 34:1386-1403, 1963.
[32] D. Freedman. On the asymptotic behavior of Bayes estimates in the discrete case II. Annals of Mathematical Statistics, 36:454-456, 1965.
[33] Hu Zhenyu. Research on the Prior Distribution in Bayesian Learning. Master's thesis, 2001, p. 14. (in Chinese)
[34] Zhengyu Hu, Shimin Lin, and Yuchang Lu. Asymptotic normality of the posterior in consistent Bayesian learning. In Proceedings of the Second International Conference on Machine Learning and Cybernetics (ICMLC 2003). To appear.
[35] Lin Shimin, Tian Fengzhan, and Lu Yuchang. Bayesian learning, Bayesian networks and data mining. Computer Science, 2000, 27(10):69. (in Chinese)
[36] Lin Shimin, Wang Shuangcheng, and Lu Yuchang. Computational learning mechanism of Bayesian approach and problem solving. Journal of Tsinghua University, 2000, 40(9):61-64. (in Chinese)
[37] Shimin Lin, Zhengyu Hu, and Yuchang Lu. Bayesian heuristic approach for learning Bayesian networks. Asian Journal of Information Technology, 2(1):13-17, 2003.
[38] www.ai.mit.edu/~murphyk/Software/BNT/bnt.html
[39] R. Chen, K. Sivakumar, and H. Kargupta. Collective mining of Bayesian networks from distributed heterogeneous data. Knowledge and Information Systems, 2002.
