Research on Machine Learning Methods Based on Mutual Information and Prior Information
Abstract
This dissertation studies machine learning methods based on mutual information and prior knowledge. For pattern recognition, it investigates mutual-information-based model selection for classifiers and proposes the normalized mutual information (NI) learning criterion; the nonlinear relations between NI and other classification criteria (accuracy, precision, recall, ROC curves, P-R curves) are analyzed for both binary and multi-class problems, and the statistical characteristics of the criterion are studied through the problem of kernel selection in support vector machines. For regression, the dissertation studies generalized constraint neural networks (GCNN), which associate a neural network with partially known relationships: prior knowledge is used to construct problem-specific network models and thereby increase the "transparency" of neural networks. The main contributions are the following:
     ① For pattern recognition, mutual-information-based model selection is studied. The normalized mutual information (NI) learning criterion is proposed; its nonlinear relations with other classification criteria (accuracy, precision, recall, ROC curves, P-R curves) are derived and analyzed for binary and multi-class problems, and its application characteristics and limitations are given a preliminary explanation. We point out that learning (classification or clustering) under information-theoretic criteria is the process of transforming disordered data (labels or features) into ordered data, with the quality of the transformation measured by entropy. Although uncertainty (entropy) offers the classifier designer useful information distinct from traditional performance criteria, the criterion still has limitations in practice: in particular, it is not a consistent, monotonic function of the traditional performance measures, so those measures are still needed as auxiliary computations when designing and selecting classifiers. (A minimal sketch of computing NI from a confusion matrix follows this list.)
     ② Taking kernel selection in support vector machines as an example, the NI criterion is studied with statistical methods. Synthetic experiments and case studies on meteorological data show that different model-evaluation criteria do disagree, but statistical methods can extract regularities from their disagreement. Different statistical methods also disagree among themselves, and this disagreement affects model evaluation more strongly than the choice of evaluation criterion does. As a comprehensive index, the NI criterion can to some extent compensate for the shortcomings of any single criterion. Model selection and model evaluation should therefore weigh multiple evaluation criteria on the basis of multiple statistical methods. (A cross-validated kernel-comparison sketch follows this list.)
     ③ The research progress on the "black-box" problem of artificial neural networks is surveyed, and a multi-level framework for classifying "transparency" methods is proposed. For regression, model construction based on prior information is studied: the basic ways of coupling generalized constraint neural networks with partially known relationships are discussed, in particular the two most common forms, the superposition (additive) model and the multiplication (product) model. The conditions under which GCNN outperforms conventional neural networks in learning performance, and its application characteristics, are analyzed preliminarily. (A sketch of the two coupling forms follows this list.)
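As an illustration of the entropy-based view in ①, the sketch below computes a normalized mutual information score between the true labels T and the predicted labels Y of a binary classifier from its confusion matrix. The normalization by H(T) used here is one common convention and is our assumption; the thesis's exact NI definition is not reproduced on this page.

```python
import numpy as np

def normalized_mutual_information(confusion):
    """NI = I(T;Y) / H(T), computed from a confusion matrix.

    Rows index the true classes T, columns the predicted classes Y.
    Normalizing by H(T) is one common convention; the thesis may
    define NI with a different normalization.
    """
    p = confusion / confusion.sum()            # joint distribution p(t, y)
    pt = p.sum(axis=1, keepdims=True)          # marginal p(t)
    py = p.sum(axis=0, keepdims=True)          # marginal p(y)
    nz = p > 0                                 # 0 * log 0 is taken as 0
    mi = np.sum(p[nz] * np.log2(p[nz] / (pt @ py)[nz]))   # I(T;Y)
    ht = -np.sum(pt[pt > 0] * np.log2(pt[pt > 0]))        # H(T)
    return mi / ht

# A binary confusion matrix [[TN, FP], [FN, TP]]:
cm = np.array([[40.0, 10.0], [5.0, 45.0]])
print(normalized_mutual_information(cm))  # accuracy is 0.85; NI stays well below 1
```

Sweeping such matrices at a fixed accuracy exhibits the nonlinearity noted in ①: confusion matrices with the same accuracy but differently distributed errors yield different NI values.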
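In the spirit of ②, here is a hedged sketch of comparing SVM kernels under several evaluation criteria using cross-validated fold statistics. The dataset and kernel list are illustrative stand-ins, and scikit-learn's normalized_mutual_info_score (whose normalization may differ from the thesis's NI) substitutes for the proposed criterion; none of this reproduces the thesis's protocol on synthetic and meteorological data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, normalized_mutual_info_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Illustrative stand-in data; the thesis used synthetic and weather data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# scikit-learn's NMI stands in for the thesis's NI criterion.
scorers = {
    "accuracy": "accuracy",
    "precision": "precision",
    "recall": "recall",
    "NI": make_scorer(normalized_mutual_info_score),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, gamma="scale")
    # Report mean +/- std over folds: the statistical treatment of the
    # fold scores can matter more than which criterion is chosen.
    cells = []
    for name, scoring in scorers.items():
        s = cross_val_score(clf, X, y, cv=cv, scoring=scoring)
        cells.append(f"{name}={s.mean():.3f}±{s.std():.3f}")
    print(kernel.ljust(6), "  ".join(cells))
```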
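To make the two coupling forms in ③ concrete, a minimal sketch under stated assumptions: a known partial relationship g(x) is combined with a network term NN(x) either additively, f(x) = g(x) + NN(x), or multiplicatively, f(x) = g(x) · NN(x). The choice of g, the random-feature "network", and the least-squares fit are illustrative stand-ins, not the thesis's GCNN formulation or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed partially known relationship g(x); a hypothetical example,
# standing in for whatever domain knowledge is actually available.
def g(x):
    return np.sin(x)

x = np.linspace(0.0, 6.0, 200)
y = np.sin(x) + 0.3 * x              # truth = g(x) plus an unknown trend

# A minimal "network": a fixed random hidden layer plus a linear readout
# fitted by least squares -- an extreme-learning-machine-style stand-in
# for a trained MLP, kept small so the sketch stays self-contained.
H = np.tanh(np.outer(x, rng.normal(size=20)) + rng.normal(size=20))
gx = g(x)

# Superposition model: f(x) = g(x) + NN(x); the net learns the residual.
w_add = np.linalg.lstsq(H, y - gx, rcond=None)[0]
f_add = gx + H @ w_add

# Multiplication model: f(x) = g(x) * NN(x); the net learns the ratio.
# Only sensible where g(x) is bounded away from zero, hence the mask.
safe = np.abs(gx) > 0.2
w_mul = np.linalg.lstsq(H[safe], y[safe] / gx[safe], rcond=None)[0]
f_mul = gx * (H @ w_mul)

print("superposition RMSE: ", np.sqrt(np.mean((f_add - y) ** 2)))
print("multiplication RMSE:", np.sqrt(np.mean((f_mul[safe] - y[safe]) ** 2)))
```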
