Research on Several Problems in Bayesian Network-Based Data Mining
Abstract
Bayesian networks have attracted wide attention from researchers and practitioners owing to their rich probabilistic expressiveness, flexible inference, ability to incorporate prior knowledge, and solid theoretical foundation, and a large number of data mining methods based on Bayesian network models have emerged.
     Against the background of several open research problems in Bayesian network-based data mining, this thesis studies the classification problem, the clustering problem, and the concept drift problem in incremental learning, focusing on solutions based on Bayesian networks.
     The main contributions of this thesis are as follows:
     1) On the basis of comparing and analyzing four classical Bayesian network classifiers, an instance-selection-based ensemble algorithm of Bayesian network classifiers is proposed. The method takes the k nearest neighbors of the current test instance in the training set as a validation set, determines the weight of each classifier from its classification accuracy on this validation set, and combines the results by weighted voting to improve classification accuracy;
     2) For the concept drift problem in incremental learning, an adaptive ensemble learning algorithm, AMCE, is proposed, in which the weight of each classifier can be adjusted independently so as to enhance adaptability, and a pruning strategy is used to remove redundant individual classifiers and improve the generalization performance of the ensemble; an orientation-based selective ensemble algorithm, OSEN, is further proposed to reduce the number of individual classifiers participating in the ensemble and improve its generalization performance;
     in addition, a genetic algorithm is used to select a subset of individual classifiers from the current ensemble, so as to reduce the generalization error of the ensemble;
     3) For the Naive Bayes clustering problem, a hybrid Naive Bayes clustering algorithm based on discrete particle swarm optimization, HDPSO, is proposed. The algorithm has good global search ability and hybridizes the EM algorithm to perform local search on individual particles, thereby accelerating convergence.
     Extensive experiments verify the effectiveness and practical value of the proposed algorithms. This work is expected to help advance the data mining field in China; in particular, the study of classification methods for concept-drifting data is distinctive and of considerable theoretical and practical value.
With the development of computer technology and the application of computer networks, the amount of data generated every day is increasing at an unprecedented speed. How to manage and use these data, and how to transform them into useful information and knowledge for decision-making, has become an important problem. Data mining arose in response to this demand. In short, data mining is the process of searching for useful hidden knowledge stored in large databases. It uses statistics and artificial intelligence techniques to handle and analyze data, detect hidden knowledge, and construct different models according to the practical problem, thus providing a reference basis for decision-making.
     Bayesian networks, as a key technology for dealing with uncertainty, have received growing attention from researchers and application developers because of the following features: first, they have rich power to express probabilistic knowledge; second, they can incorporate prior knowledge and support incremental learning; third, they have a solid mathematical foundation. As a result, a large number of Bayesian network-based data mining methods have emerged. Although Bayesian networks perform well in model diagnosis and other areas, several issues remain open. For example, learning an unconstrained Bayesian network structure is NP-hard, and handling data streams that exhibit concept drift is also a problem.
     The main results and contributions include classification and clustering algorithms based on Bayesian networks, as well as dynamic ensemble integration approaches for handling concept drift. The main achievements are as follows:
     (1) The task of classification is to assign one of a set of predefined classes to instances described by a set of attribute (feature) values. Classification is a very active research topic in machine learning and data mining and can be applied to a wide variety of domains: pattern recognition, forecasting, medical diagnosis, fault diagnosis, loan applications, user modeling, biomedical informatics, etc. A great many techniques have been developed, based on logical methods (decision trees, rule-based techniques), perceptrons, statistics (Bayesian networks, SVMs), and instances (lazy learning).
     In recent years many researchers have paid attention to Bayesian networks, which are effective for classification. Naive Bayes is a simple kind of Bayesian network with a strong assumption of independence among attribute nodes given the class node.
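     Under this independence assumption, the Naive Bayes posterior factorizes in the standard way, and the predicted class maximizes the resulting product:

```latex
% Standard Naive Bayes decision rule under the attribute-independence assumption
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
c^{*} = \arg\max_{c} \, P(c) \prod_{i=1}^{n} P(x_i \mid c).
```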
     This thesis empirically evaluates algorithms for learning four types of Bayesian network classifiers: NB, TAN, BAN, and GBN, where the latter two are constructed using the BN structure learning algorithm DSBNL. These learners were implemented and tested on classification problems, and factors that influence classifier accuracy, such as the threshold ε and the training sample size, were analyzed. The experiments show that the larger the data set, the fewer the prediction errors, and that for some data sets the algorithm cannot reach its best accuracy with a fixed threshold. The analysis of the experimental results suggests two ways to improve the predictive performance of BN classifiers. First, the threshold is selected on the basis of prediction accuracy to avoid overfitting and underfitting when constructing classification models. Second, an example-selection-based weighted integration algorithm combining NB, TAN, and BAN is proposed, and it is demonstrated empirically that this integration algorithm works as expected. Collectively, these results argue that BN classifiers deserve more attention in the machine learning and data mining communities.
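     To make the example-selection-based integration concrete, the following is a minimal Python sketch: the k nearest training neighbours of the test instance form a local validation set, each base classifier (e.g. the trained NB, TAN, and BAN models) is weighted by its accuracy on that set, and a weighted vote decides the label. The Euclidean distance, the scikit-learn-style predict interface, and the tie handling are illustrative assumptions rather than details taken from the thesis.

```python
import numpy as np
from collections import defaultdict

def knn_weighted_vote(x, X_train, y_train, classifiers, k=10):
    """Weight each base classifier by its accuracy on the k nearest training
    neighbours of the test instance, then combine by weighted voting (sketch)."""
    # Local validation set: k nearest neighbours of x in the training data
    dists = np.linalg.norm(X_train - x, axis=1)             # Euclidean distance (assumed)
    nn_idx = np.argsort(dists)[:k]
    X_val, y_val = X_train[nn_idx], y_train[nn_idx]

    votes = defaultdict(float)
    for clf in classifiers:                                   # e.g. trained NB, TAN, BAN
        weight = float(np.mean(clf.predict(X_val) == y_val))  # local accuracy as weight
        votes[clf.predict(x.reshape(1, -1))[0]] += weight     # weighted vote
    return max(votes, key=votes.get)
```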
     (2) More and more data in the real world are dynamic rather than static, and the patterns uncovered in the data may change as time goes by. Incremental learning ability therefore becomes important for data mining algorithms. Concept drift is a common problem in incremental learning. In order to adapt to changes in the concept, the adaptability of the system must be improved, but the stability of the algorithm then declines; conversely, improving the stability of the system so that it makes better use of historical information and knowledge degrades its adaptability. In consequence, it is hard to learn correctly when concept drift occurs.
     In the real world, concepts are often not stable but change with time, which is known as the problem of concept drift. Concept drift complicates the task of learning and requires solutions different from commonly used batch learning techniques. We consider synthetically generated concept drifts and an example of concept drift from the area of antibiotic resistance. Among the most popular and effective approaches to handling concept drift is ensemble learning, where a set of concept descriptions built on data blocks corresponding to different time intervals is maintained, and the final prediction is the aggregated prediction of the ensemble members.
     At present, the problem of concept drift has received considerable attention. There are two approaches to handling it: the instance-based approach and the ensemble learning approach, and the literature shows that ensemble learning is the most promising. In this thesis, we propose an ensemble learning algorithm AMCE (Adaptive Multiple Classifiers Ensemble) that adaptively adjusts the weights of classifiers. In order to improve adaptability, the algorithm assigns a weight-adjustment parameter to each classifier and dynamically adjusts these parameters during online learning to maximize performance. A pruning method based on KL divergence is also used to eliminate redundant classifiers. Experimental results show that the proposed algorithm improves predictive accuracy and speed, reduces memory usage, and is effective in tracking concept drift in the presence of noise.
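     The abstract does not give the exact AMCE update rule or the KL-divergence pruning criterion, so the sketch below only illustrates the general idea of per-classifier adaptive weights with pruning of decayed members; the multiplicative penalty and the weight threshold are assumptions.

```python
import numpy as np

class AdaptiveEnsembleSketch:
    """Per-classifier adaptive weighting for drifting streams (illustrative only;
    the penalty factor and pruning threshold are assumed, not the AMCE rules)."""
    def __init__(self, classifiers, beta=0.8, min_weight=0.05):
        self.classifiers = list(classifiers)
        self.weights = np.ones(len(self.classifiers))
        self.beta = beta                 # penalty applied on a mistake (assumed)
        self.min_weight = min_weight     # pruning threshold (assumed)

    def predict(self, x):
        votes = {}
        for clf, w in zip(self.classifiers, self.weights):
            label = clf.predict(x.reshape(1, -1))[0]
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)

    def update(self, x, y_true):
        # Penalize members that err on the newest labelled example, then
        # prune members whose weight has decayed below the threshold.
        for i, clf in enumerate(self.classifiers):
            if clf.predict(x.reshape(1, -1))[0] != y_true:
                self.weights[i] *= self.beta
        keep = self.weights >= self.min_weight
        self.classifiers = [c for c, k in zip(self.classifiers, keep) if k]
        self.weights = self.weights[keep]
```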
     In data streams, concepts are often not stable but change with time. This thesis proposes a selective integration algorithm, OSEN (Orientation based Selected ENsemble), for handling concept-drifting data streams. The algorithm selects a near-optimal subset of base classifiers based on the output of each base classifier on a validation dataset. Experiments with synthetic data sets simulating abrupt (SEA) and gradual (Hyperplane) concept drift demonstrate that selective integration of classifiers built over small time intervals or fixed-size data blocks can be significantly better than majority voting and weighted voting, which are currently the most commonly used integration techniques for handling concept drift with ensembles. The working mechanism of OSEN is also explained through the error-ambiguity decomposition: based on the experiments, OSEN improves generalization ability by reducing the average generalization error of the base classifiers constituting the ensemble.
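     The error-ambiguity decomposition mentioned above is the standard result of Krogh and Vedelsby for weighted ensembles under squared error: the ensemble error equals the weighted average error of the members minus their weighted average ambiguity, so a selection that lowers the members' average error without destroying their diversity lowers the ensemble error.

```latex
% Error-ambiguity decomposition (Krogh & Vedelsby), squared-error setting
\bar{f}(x) = \sum_i w_i f_i(x), \qquad \sum_i w_i = 1, \; w_i \ge 0,
\qquad
E = \bar{E} - \bar{A},
\quad
\bar{E} = \sum_i w_i E_i,
\quad
\bar{A} = \sum_i w_i \int p(x)\, \bigl( f_i(x) - \bar{f}(x) \bigr)^2 \, dx .
```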
     This thesis also proposes a selective integration algorithm, DGASEN (Dynamic GASEN), for handling concept-drifting data streams. The algorithm uses a genetic algorithm to select a near-optimal subset of base classifiers based on the output of each base classifier on a validation dataset. Experiments show that DGASEN improves generalization ability.
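     Since DGASEN is described as a dynamic variant of GASEN, the sketch below shows only a GASEN-style core: a genetic algorithm evolves a selection mask over the base classifiers so that majority voting of the selected subset minimizes error on a validation block. The population size, one-point crossover, mutation rate, and the way the validation block would be refreshed as the stream evolves are all assumptions.

```python
import numpy as np

def ga_select(preds_val, y_val, pop=30, gens=50, p_mut=0.05, seed=0):
    """Evolve a 0/1 mask over base classifiers to minimize the majority-vote
    error on a validation block (GASEN-style sketch, not the DGASEN algorithm).
    preds_val: (n_classifiers, n_val) integer class predictions per classifier."""
    rng = np.random.default_rng(seed)
    n_clf, n_val = preds_val.shape
    n_classes = int(preds_val.max()) + 1

    def error(mask):
        if mask.sum() == 0:
            return 1.0                                    # empty subsets are worst
        sub = preds_val[mask.astype(bool)]
        vote = np.array([np.bincount(sub[:, j], minlength=n_classes).argmax()
                         for j in range(n_val)])          # majority vote per example
        return float(np.mean(vote != y_val))

    population = rng.integers(0, 2, size=(pop, n_clf))
    for _ in range(gens):
        fitness = np.array([1.0 - error(ind) for ind in population]) + 1e-9
        parents = population[rng.choice(pop, size=pop, p=fitness / fitness.sum())]
        cuts = rng.integers(1, n_clf, size=pop)           # one-point crossover
        children = np.array([np.r_[parents[i, :c], parents[(i + 1) % pop, c:]]
                             for i, c in enumerate(cuts)])
        flip = rng.random(children.shape) < p_mut         # bit-flip mutation
        population = np.where(flip, 1 - children, children)
    return min(population, key=error).astype(bool)        # selected-subset mask
```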
     (3) Clustering is the unsupervised classification of patterns into clusters: unlike classification (supervised learning), no a priori labeling of patterns is available to use in categorizing others and inferring the cluster structure of the whole data set. The clustering problem has been addressed in many contexts and by researchers in many disciplines, which reflects its broad appeal and usefulness as one of the steps in data analysis. Many algorithms have been proposed, such as model-based, distance-based, density-based, and deviation-based algorithms. This thesis concentrates on model-based algorithms, among which the most frequently used approaches are mixture density models and Bayesian networks.
     This thesis describes how to apply the DPSO method effectively to parameter estimation for NB clustering. The experimental results show that the DPSO algorithm converges much more slowly than the EM algorithm but obtains a better global optimum, while the EM method is sensitive to its initial solution and easily falls into local optima. Because DPSO and EM each have their own drawbacks, the EM algorithm is introduced into the traditional DPSO method as a local search procedure to improve its efficiency: EM enables every particle to find the local optimum in its current search space, and this local search improves the performance of the swarm.
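     The abstract states that EM serves as the per-particle local search inside HDPSO. The sketch below shows only that EM refinement step for Naive Bayes clustering, i.e. a mixture of independent discrete attributes; the Laplace smoothing constants and the parameter layout are assumptions, and the discrete PSO update itself is not reproduced here.

```python
import numpy as np

def nb_em_step(X, pi, theta):
    """One EM iteration for Naive Bayes clustering (mixture of independent
    discrete attributes); in the hybrid scheme this is the local-search step
    applied to a single particle's current solution.
    X: (N, D) integer-coded attributes; pi: (K,) mixing weights;
    theta: list of D arrays, theta[d][k, v] = P(x_d = v | cluster k)."""
    N, D = X.shape
    K = pi.shape[0]

    # E-step: responsibilities computed in log space for numerical stability
    log_r = np.tile(np.log(pi), (N, 1))
    for d in range(D):
        log_r += np.log(theta[d][:, X[:, d]]).T
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step with Laplace smoothing (smoothing constant 1.0 is an assumption)
    pi_new = (r.sum(axis=0) + 1.0) / (N + K)
    theta_new = []
    for d in range(D):
        V = theta[d].shape[1]
        counts = np.zeros((K, V))
        for v in range(V):
            counts[:, v] = r[X[:, d] == v].sum(axis=0)
        theta_new.append((counts + 1.0) / (counts.sum(axis=1, keepdims=True) + V))
    return pi_new, theta_new
```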
     The research results of this thesis are expected to enrich and advance the study of clustering and classification algorithms based on Bayesian networks, as well as algorithms for solving the concept drift problem, in both theoretical and technological aspects.