A Localized Generalization Error Model for Multilayer Perceptron Neural Networks
Abstract
Multilayer perceptron neural networks (MLPNNs) are widely used in pattern recognition, function approximation, risk prediction, control, and other fields, and generalization capability is the key criterion for judging whether an MLPNN has been trained successfully. An MLPNN extracts "knowledge" from training samples to realize a mapping from the input space to the output space; the trained MLPNN classifier is then used to classify newly arriving, unseen samples.
     Existing approaches to evaluating the generalization capability of an MLPNN fall into two main categories: analytical models and cross-validation. Analytical models provide a mathematical means of assessing the generalization capability of an MLPNN, whereas cross-validation estimates it experimentally. These approaches share several drawbacks: they cannot distinguish between networks that have the same number of hidden neurons but different weight values, they ignore how different unseen samples may be from the training samples, and they are time-consuming on large datasets. In practice, for a particular classification problem it is unreasonable to expect a trained MLPNN classifier to correctly recognize unseen samples that differ greatly from the training samples. This motivates the localized generalization error model for MLPNNs studied in this thesis.
     The localized generalization error model uses unseen samples that are "similar" to the training samples to derive an upper bound on the generalization error of the trained network. An unseen sample is considered similar to a training sample if the difference between its feature values and those of the training sample is smaller than a given real value Q. The model consists of the training error, a stochastic sensitivity measure, and constants determined by the given training set. Within this model, the localized generalization error is minimized when the best trade-off between the training error and the stochastic sensitivity measure is achieved.
     In this thesis, the localized generalization error model is used for architecture selection of MLPNNs, i.e., for a given classification problem, the number of hidden neurons is chosen so that the resulting MLPNN has the best generalization capability. Simulation experiments on 15 UCI datasets show that architecture selection based on the localized generalization error model outperforms several existing methods. Finally, the model is applied to an image annotation problem, and the experimental results indicate that the approach has promising application prospects.
Multilayer Perceptron Neural Networks (MLPNNs) have a wide range of applications, such as pattern recognition, function approximation, risk prediction, and control. The generalization capability of an MLPNN is the major criterion for judging whether its training has been successful, because the ultimate goal of MLPNN learning is to extract the input-output mapping of a given classification problem from a set of training samples and then recognize future unseen samples correctly.
     There are two major approaches for estimating the generalization capability of an MLPNN: analytical models and cross-validation (CV). Analytical models provide a mathematical tool for analyzing the generalization capability of an MLPNN, while CV methods estimate it empirically. The major drawbacks of existing methods are that they cannot distinguish between MLPNNs with the same number of hidden neurons but different weight values, they ignore how different unseen samples are from the training samples, and they are time-consuming for large datasets. In practice, when one trains an MLPNN for a given classification problem, one should not expect it to correctly recognize unseen samples that are very different from the training samples. This motivates the Localized Generalization Error Model (L-GEM) for MLPNNs proposed in this thesis.
     The L-GEM provides an analytical upper bound on the generalization error for unseen samples that are similar to the training samples. An unseen sample is considered similar to a training sample if the difference between its input feature values and those of the training sample is smaller than a pre-selected real value Q (i.e., it is local in the input space). The L-GEM consists of the training error, the stochastic sensitivity measure, and constants computed from the given training dataset. It shows that an MLPNN with minimum localized generalization error must achieve the best balance between its training error and its stochastic sensitivity measure.
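For reference, a sketch of the bound in LaTeX follows. It reproduces the commonly cited form of the L-GEM bound from the L-GEM literature (e.g., the RBF formulation of Yeung et al., IEEE Trans. Neural Networks, 2007); the exact expressions for the constants A and ε used for MLPNNs in this thesis may differ, so they are only described, not derived, here.

```latex
% Q-neighbourhood of a training sample x_b: all inputs whose per-feature
% difference from x_b is at most Q.
\[
  S_Q(x_b) \;=\; \bigl\{\, x \;\bigm|\; x = x_b + \Delta x,\ |\Delta x_i| \le Q \ \forall i \,\bigr\}
\]
% L-GEM upper bound on the generalization error R_{SM}(Q) for unseen samples
% inside the union of these neighbourhoods: R_emp is the training mean squared
% error, E_{S_Q}[(\Delta y)^2] is the stochastic sensitivity measure, A is a
% constant determined by the range of the target outputs, and \varepsilon is a
% confidence term that shrinks as the number of training samples grows.
\[
  R_{SM}(Q) \;\le\;
  \Bigl(\sqrt{R_{\mathrm{emp}}} \;+\; \sqrt{E_{S_Q}\!\bigl[(\Delta y)^2\bigr]} \;+\; A\Bigr)^{2}
  \;+\; \varepsilon
  \;=\; R^{*}_{SM}(Q)
\]
```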
     In this thesis, the L-GEM is adopted for architecture selection of MLPNNs: for a given classification problem, it is used to find the number of hidden neurons that yields the best generalization capability. Experimental results on 15 UCI datasets show that MLPNNs built using L-GEM outperform those built with several existing architecture selection methods. The L-GEM architecture selection method is also applied to an image annotation problem, and the experimental results show that it has promising application prospects.
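As an illustration of how the bound can drive architecture selection, the following Python sketch trains candidate MLPs with different hidden-layer sizes and keeps the one with the smallest L-GEM-style bound. It is only a sketch under stated assumptions: `make_net`, the Monte Carlo sensitivity estimate, and the default constant `A` are introduced here for illustration (the thesis derives the stochastic sensitivity of an MLPNN analytically rather than by sampling), and the ε confidence term is omitted because it is identical for every candidate.

```python
import numpy as np

def stochastic_sensitivity(net, X_train, Q, n_perturb=50, rng=None):
    """Monte Carlo estimate of E_{S_Q}[(Delta y)^2]: mean squared change of the
    network output when each training input is perturbed uniformly within the
    Q-neighbourhood (|delta x_i| <= Q)."""
    rng = rng or np.random.default_rng(0)
    y_ref = net.predict(X_train)
    sq_diffs = []
    for _ in range(n_perturb):
        delta = rng.uniform(-Q, Q, size=X_train.shape)
        sq_diffs.append((net.predict(X_train + delta) - y_ref) ** 2)
    return float(np.mean(sq_diffs))

def select_hidden_neurons(make_net, X_train, y_train, candidates, Q, A=1.0):
    """Return the hidden-layer size whose trained MLP minimises the
    (sqrt(R_emp) + sqrt(sensitivity) + A)^2 part of the L-GEM bound."""
    best_h, best_bound = None, np.inf
    for h in candidates:
        net = make_net(h)          # hypothetical factory: builds an MLP with h hidden neurons
        net.fit(X_train, y_train)
        r_emp = float(np.mean((net.predict(X_train) - y_train) ** 2))
        sens = stochastic_sensitivity(net, X_train, Q)
        bound = (np.sqrt(r_emp) + np.sqrt(sens) + A) ** 2
        if bound < best_bound:
            best_h, best_bound = h, bound
    return best_h, best_bound
```

Any estimator exposing `fit`/`predict` can be plugged in as `make_net`, for example a factory returning `sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(h,))`.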
