Application of a PCA-Based Bayesian Network Construction Algorithm
Abstract
A Bayesian network is a graphical model that represents the probability distribution over a set of variables: a directed acyclic graph with a conditional probability distribution attached to each node. It offers a natural way to represent causal information and to discover latent relationships in data, and it rests on a solid mathematical foundation. Its graphical representation, local and distributed learning mechanism, and intuitive inference make it well suited to expressing and analyzing uncertain and probabilistic phenomena, and to reasoning effectively from incomplete, imprecise, or uncertain knowledge or information. For these reasons it has become one of the most effective models for representing and reasoning about uncertain knowledge. How to learn a Bayesian network from real-world data with effective methods and algorithms, and how to express accurately the valuable information contained in the data, remain active and difficult research questions. This paper learns Bayesian network structure with an information-theoretic method. Because the computational efficiency of that method falls as the node set grows, PCA is applied to the data set (after its missing values have been filled in by Gibbs sampling) to reduce its dimensionality, which reduces the number of nodes and improves the efficiency of the learning algorithm. The main work is as follows:
     1. Continuous or mixed data are discretized by fuzzy clustering, and the data set is reduced in dimensionality with principal component analysis (PCA) to cut the number of nodes.
     2. Missing values in the data set are filled in by Gibbs sampling, and the Bayesian network structure is learned from the data with the information-theoretic method.
     3. Classification experiments verify the accuracy and efficiency of the PCA-based Bayesian network classifier. Bayesian data fusion is then applied to energy-consumption and material-consumption data from ethylene production at different production scales or with different technologies; the results offer some reference value for evaluating energy and material consumption levels in ethylene production.
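The abstract does not spell out the information-theoretic test it relies on. As a rough illustration only, the sketch below computes the standard empirical mutual information I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))] between discretized variables and keeps the pairs whose score exceeds a threshold as candidate edges. The function names `mutual_information` and `draft_edges`, the threshold `epsilon`, and the toy data are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def draft_edges(columns, epsilon=0.01):
    """Return (a, b, score) for variable pairs with score > epsilon, strongest first.

    `epsilon` is a hypothetical cut-off chosen for this sketch.
    """
    scored = []
    for a, b in combinations(sorted(columns), 2):
        score = mutual_information(columns[a], columns[b])
        if score > epsilon:
            scored.append((a, b, score))
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Toy discretized data (hypothetical): B copies A, C is independent of both.
data = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}
edges = draft_edges(data)
```

On this toy data only the pair (A, B) survives the threshold. A full structure learner would go further, for example by applying conditional independence tests to add or remove edges and by orienting them; and because the pairwise scan grows quadratically with the number of variables, reducing the node count beforehand, as the abstract does with PCA, directly shrinks this step.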
