Some Results on the Approximation Capability of RBF and MLP Neural Networks
Abstract
Neural network theory and methods have developed very rapidly over the past decade and more, with applications in engineering, computer science, physics, biology, economics, management and other scientific fields. Neural networks are used for cluster analysis, intelligent control, pattern recognition, optimization and so on. Many of these problems, however, reduce to the problem of approximating functions by neural networks. Mathematically, this can be interpreted as representing multivariate functions by superpositions of univariate functions, which is also Hilbert's thirteenth problem.
     Based on the nonlinear approximation properties of neural networks, this thesis studies the approximation capability of radial basis function (RBF) neural networks, including function approximation, strong approximation (approximation of a compact set of functions) and operator approximation.
     That is, we study the density of the function family F_1 = {Σ_{i=1}^N c_i g(λ_i(x - y_i))} in C(K) (or L^p(K)), the density problem on compact sets in C(K) (or L^p(K)), and the approximation of operators T: L^{p_1}(K_1) → L^{p_2}(K_2). Here c_i, λ_i ∈ R, x, y_i ∈ R^n, i = 1, 2, ..., N, K, K_1, K_2 ⊂ R^n are arbitrary compact sets, 1 ≤ p, p_1, p_2 < ∞, and the activation function g is usually taken to be the Gaussian function. The family F_1 is the usual mathematical expression of an RBF neural network.
     This thesis also studies the capability of general feedforward networks to approximate functions on compact sets in a complete linear metric space. That is, if H is a complete linear metric space consisting of all functions with the norm ‖·‖_H and V ⊂ H is a compact set, we study whether the family of network functions F_2 = {F_2(x) = Σ_j λ_j g(τ_j(x))} can approximate every function in V. Here Σ_j λ_j g(τ_j(x)) is the output for the input x, λ_j is the weight from the j-th hidden unit to the output unit, and g is the activation function; τ_j(x) is the input to the j-th hidden unit, determined by the input layer and the weights between the input layer and the j-th hidden unit. Different types of feedforward networks correspond to different mathematical forms of τ_j(x).
     This thesis is organized as follows:
     Chapter 1 reviews background material on neural networks, including the approximation results for feedforward neural networks obtained over the past decade or so.
     Chapter 2 introduces the functional analysis and theory of distributions needed in this thesis, for example the relationship between the spaces of test functions and the spaces of distributions, the supports of test functions and of distributions, and the convolution of test functions with distributions.
     Chapter 3 mainly discusses the approximation problems of RBF neural networks, including general function approximation, strong approximation and operator approximation. These results generalize the approximation results for RBF networks in [1-5] and provide a useful theoretical basis for studying the approximation capability of RBF neural networks.
     Chapter 4 studies the strong approximation problem for general feedforward neural networks and gives strong approximation results for concrete network types such as MLP networks. It shows that, for a feedforward neural network with one hidden layer, the number of hidden units and the input-to-hidden weights can be fixed in advance, and any function in a given family of functions can then be approximated by choosing only suitable hidden-to-output weights.
Neural network theory and methods have developed rapidly in the past two decades and have been applied in diverse areas such as engineering, computer science, physics, biology, economics and management. Much research in this area can be converted into problems of approximating multivariate functions by superpositions of the neuron activation function of the network. In mathematical terms, these problems ask under what conditions multivariate functions can be represented by superpositions of univariate functions, which is also Hilbert's thirteenth problem.
     In this thesis, the nonlinear approximation property of neural networks with one hidden layer is investigated and the approximation capability of radial basis function (RBF) neural networks is analyzed theoretically, including the approximation of any given function, the approximation of a compact set of functions, and the system identification capability of RBF neural networks. In other words, we ask under what conditions the family of functions F_1 = {Σ_{i=1}^N c_i g(λ_i(x - y_i))} can approximate any given function, a compact set of functions, or any given operator T: L^{p_1}(K_1) → L^{p_2}(K_2), where c_i, λ_i ∈ R, x, y_i ∈ R^n, i = 1, 2, ..., N, K, K_1, K_2 ⊂ R^n are compact sets, 1 ≤ p, p_1, p_2 < ∞, and the activation function g is typically the Gaussian function; such a network is called a radial basis function neural network.
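     As a minimal illustration of the form of F_1 (not taken from the thesis), the Python sketch below evaluates an RBF network F_1(x) = Σ_{i=1}^N c_i g(λ_i(x - y_i)) with the Gaussian activation g(z) = exp(-‖z‖^2); the parameter values, array shapes and names are illustrative assumptions only.

    import numpy as np

    def gaussian(z):
        # Gaussian activation g(z) = exp(-||z||^2), taken along the last axis
        return np.exp(-np.sum(z ** 2, axis=-1))

    def rbf_network(x, centers, scales, coeffs):
        # Evaluate F_1(x) = sum_i c_i * g(lambda_i * (x - y_i)) at each row of x.
        # z has shape (num_points, N, n): lambda_i * (x - y_i) for every point/center pair
        z = scales[None, :, None] * (x[:, None, :] - centers[None, :, :])
        return gaussian(z) @ coeffs  # shape (num_points,)

    rng = np.random.default_rng(0)
    n, N = 2, 50                                   # input dimension, number of hidden units
    x = rng.uniform(-1.0, 1.0, size=(200, n))      # sample points in a compact set K
    centers = rng.uniform(-1.0, 1.0, size=(N, n))  # centers y_i
    scales = rng.uniform(0.5, 3.0, size=N)         # scalar factors lambda_i
    coeffs = rng.normal(size=N)                    # coefficients c_i
    print(rbf_network(x, centers, scales, coeffs).shape)  # (200,)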
     Moreover, the approximation capability of feedforward neural networks to a compact set of functions is also considered in this thesis. We use F_2 = {F_2(x) = Σ_j λ_j g(τ_j(x))} to denote the family of network functions, where F_2(x) is the output of the network for the input x, λ_j is the weight between the output neuron and the j-th hidden neuron, and g is the activation function. τ_j(x) is the input value to the j-th hidden neuron, which is determined by the input neurons and the weights between the input layer and the j-th hidden neuron. To elaborate, we shall prove the following: if a family of feedforward neural networks with one hidden layer is dense in H, a metric linear space of functions, then given a compact set V ⊂ H and an error bound ε, one can choose and fix the number of hidden neurons and the weights between the input and hidden layers, such that in order to approximate any function f ∈ V with accuracy ε, one only has to further choose suitable weights between the hidden and output layers.
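     The flavor of this statement can be illustrated numerically. The sketch below is only a toy experiment under added assumptions (MLP-type hidden units τ_j(x) = w_j·x + b_j with randomly chosen, fixed parameters, the tanh activation, and a least-squares fit of the output weights), not the construction used in the thesis; it shows the idea of keeping the hidden layer fixed and re-choosing only the hidden-to-output weights λ_j for each target function.

    import numpy as np

    rng = np.random.default_rng(1)

    # Fixed in advance: the number of hidden units and the input-to-hidden parameters.
    # Here tau_j(x) = w_j . x + b_j (MLP-type hidden units); only the output weights
    # lambda_j are re-chosen for each target function.
    m_hidden, n = 200, 1
    W = rng.normal(size=(m_hidden, n))             # input-to-hidden weights (fixed)
    b = rng.uniform(-3.0, 3.0, size=m_hidden)      # hidden biases (fixed)
    g = np.tanh                                    # activation function

    def hidden_outputs(x):
        # g(tau_j(x)) for all hidden units j; x has shape (num_points, n)
        return g(x @ W.T + b)                      # (num_points, m_hidden)

    # Approximate two different target functions on the compact set [-1, 1],
    # changing only the hidden-to-output weights lambda_j.
    x = np.linspace(-1.0, 1.0, 400).reshape(-1, 1)
    H = hidden_outputs(x)
    for f in (lambda t: np.sin(3 * t), lambda t: np.abs(t)):
        y = f(x).ravel()
        lam, *_ = np.linalg.lstsq(H, y, rcond=None)    # choose lambda_j by least squares
        print("max error on [-1, 1]:", float(np.max(np.abs(H @ lam - y))))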
     This thesis is organized as follows:
     Some background information about feedforward neural networks (FNNs) is reviewed and some well-known approximation results are introduced in Chapter 1.
     Some elementary notions and fundamental properties of distributions are introduced in Chapter 2, including the relationship between the spaces of test functions and the spaces of distributions, supports of distributions, distributions as derivatives, convolutions and so on.
     The third chapter mainly deals with the approximation capability of RBF neural networks, including the approximation of any given function, of a compact set of functions and of any given operator. These results improve on some recent results such as those in [1-5].
     The approximation capability of feedforward neural networks to a compact set of functions is investigated in Chapter 4. We follow a general approach that covers the existing results and gives some new results in this respect. A few examples of straightforward applications of this result to RBF, MLP and other neural networks in metric linear spaces such as L^p(K) and C(K) are then presented. Some of these results have been proved (cf. [1, 2, 6, 7]) in particular settings of the problems, while the others are, to the best of our knowledge, new.
References
[1] Chen T, Chen H. Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks. IEEE Trans. on Neural Networks, 1995, 6(4):904-910.
    [2] Jiang C H. Approximation problems in neural networks. Chinese Annals of Mathematics (Series A), 1998, 19A(3):295-300. (in Chinese)
    [3] Pinkus A. TDI-subspaces of C(R^d) and some density problems from neural networks. Journal of Approximation Theory, 1996, 85:269-287.
    [4] Park J, Sandberg I W. Universal approximation using radial-basis-function networks. Neural Computation, 1991, 3:246-257.
    [5] Park J, Sandberg I W. Approximation and radial-basis-function networks. Neural Computation, 1993, 5:305-316.
    [6] Chen T. Approximation problems of neural networks and their application in system identification. Science in China (Series A), 1994, 37(4):414-421.
    [7] Jiang C H, Guo H B. Neural network approximation and system identification in L^p(R^n). Chinese Annals of Mathematics (Series A), 2000, 21A(4):417-422. (in Chinese)
    [8] Rumelhart D E, McClelland J L, The PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge: MIT Press, 1986.
    [9] Xu B Z, Zhang B L, Wei G. Neural Network Theory and Applications. Guangzhou: South China University of Technology Press, 1994. (in Chinese)
    [10] Hagan M T, Demuth H B, Beale M H. Neural Network Design (Chinese translation by Dai K et al.). Beijing: China Machine Press, 2002.
    [11] Shen S Y. Neural Network System Theory and Its Applications. Beijing: Science Press, 1998. (in Chinese)
    [12] Haykin S. Neural networks: a comprehensive foundation. New York: Macmillan, 1994.
    [13] Wu Y Q, Zhu Z D. Progress of threshold selection methods in image processing over thirty years (1962-1992) (I). Journal of Data Acquisition and Processing, 1993, 8(3):37-45. (in Chinese)
    [14] Wu Y Q, Zhu Z D. Progress of threshold selection methods in image processing over thirty years (1962-1992) (II). Journal of Data Acquisition and Processing, 1993, 8(4):29-43. (in Chinese)
    [15] Teh C H, Chin R T. On image analysis by the methods of moments. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1988, 10(4):496-513.
    [16] Guo L B, Wu W. Neural network recognition of cross points in two-dimensional images. Journal of Dalian University of Technology, 2003, 43:548-550. (in Chinese)
    [17] Kong J, Wu W, Zhao W H. A neural network method for recognizing mathematical symbols. Acta Scientiarum Naturalium Universitatis Jilinensis, 2001, 3:11-16. (in Chinese)
    [18] Hou L, Wu W, Zhu B, Li F. A segmentation method for merged characters using self-organizing map neural networks. Journal of Information and Computational Science, 2006, 3(2):219-226.
    [19] Kong J, Wu W, Zhao W H. A neural network method for recognizing mathematical symbols. Acta Scientiarum Naturalium Universitatis Jilinensis, 2001, 3:11-16. (in Chinese)
    [20] Wu W, Chen W Q, Liu B. Predicting stock market rises and falls with BP neural networks. Journal of Dalian University of Technology, 2001, 41(1):9-15. (in Chinese)
    [21] Zhang Y L, Wu W. A preliminary study on using BP neural networks to spot dark-horse stocks. Operations Research and Management Science, 2004, 13(2):123-130. (in Chinese)
    [22] Yan P F, Zhang C S. Artificial Neural Networks and Simulated Evolutionary Computation. Beijing: Tsinghua University Press, 2000. (in Chinese)
    [23] Zhang J Y, Xu J. Binary Feedforward Artificial Neural Networks. Xi'an: Xidian University Press, 2001. (in Chinese)
    [24] Wu W. Neural Network Computation. Beijing: Higher Education Press, 2003. (in Chinese)
    [25] Hua Q, Bai S. A Concise Dictionary of Mathematicians. Shanghai: Zhishi Press, 1987. (in Chinese)
    [26] Eves H. Milestones in the History of Mathematics (Chinese translation by Ouyang J). Beijing: Beijing Science and Technology Press, 1990.
    [27] Cybenko G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 1989, 2(4):303-314.
    [28] Hartman E J, Keeler J D, Kowalski J M. Layered neural networks with gaussian hidden units as universal approximations. Neural Computation, 1990, 2:210-215.
    [29] Chen T, Chen H, Liu R. Approximation capability in C(R~n) by multilayer feedforward networks and related problems. IEEE Trans. on Neural Networks, 1995, 6(1):25-30.
    [30] Chen T, Chen H. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Trans. on Neural Networks, 1995, 6(4):911-917.
    [31] Chen T P. Approximation problems of neural networks and their application in system identification. Science in China (Series A), 1994, 24(1):1-7. (in Chinese)
    [32] Chen T, Chen H. Universal approximation capability to functions of RBF neural networks with arbitrary activation functions. Circuits, Systems and Signal Processing, 1996, 15(5):671-683.
    [33] Li C M, Chen T P. Approximation problems in sigma-pi neural networks. Chinese Science Bulletin, 1996, 41(8):683-684. (in Chinese)
    [34] Jiang C H, Chen T P. Approximation problems of translation-invariant subspaces in the Sobolev space W_2^m(R^d). Chinese Annals of Mathematics (Series A), 1999, 20A(4):499-504. (in Chinese)
    [35] Jiang C H, Chen T P. Density of combinations of translates and dilates of a single function in W_2^m(R^d). Acta Mathematica Sinica, 1999, 42(3):495-500. (in Chinese)
    [36] Leshno M, Lin V Ya, Pinkus A, Schocken S. Multilayer feedforward networks with a non-polynomial activation function can approximate any function. Neural Networks, 1993, 6:861-867.
    [37] Li X. On simultaneous approximations by radial basis function neural networks. Applied Mathematics and Computation, 1998, 95:75-89.
    [38] Li X. Note on constructing near tight wavelet frames by neural networks. Proceedings of SPIE Conference on Wavelet Applications in Signal and Image Processing IV, 1996.
    [39] Liao Y, Fang S, Nuttle H L W. Relaxed conditions for radial-basis function networks to be universal approximators. Neural Networks, 2003, 16:1019-1028.
    [40] Luo Y, Shen S. L^p approximation of sigma-pi neural networks. IEEE Trans. on Neural Networks, 2000, 11(6):1485-1489.
    [41] Mhaskar H N, Micchelli C A. Approximation by superposition of sigmoidal and radial basis functions. Advances in Applied Mathematics, 1992, 13:350-370.
    [42] Wu W, Feng G, Li X. Training multilayer perceptrons via minimization of sum of ridge functions. Advances in Computational Mathematics, 2002, 17:331-347.
    [43] Back A D, Chen T. Universal approximation of multiple nonlinear operators by neural networks. Neural Computation, 2002, 14:2561-2566.
    [44] Wu W, Feng G R, Li Z X et al. Convergence of an online gradient method for BP neural networks. IEEE Trans. on Neural Networks, 2005, 16:533-540.
    [45] Huang G, Saratchandran P, Sundararajan N. A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Trans. on Neural Networks, 2005, 16(1):57-67.
    [46] Jiang C H, Li N, Xu Y Q. Approximation problems related to neural networks. Applied Mathematics: A Journal of Chinese Universities, 1997, 14(1):117-122. (in Chinese)
    [47] Luo Y H, Shen S Y. On approximation problems in sigma-pi neural networks. Applied Mathematics: A Journal of Chinese Universities (Series A), 2000, 15(1):107-112. (in Chinese)
    [48] Attali J G, Pages G. Approximations by a multilayer perceptron: a new approach. Neural Networks, 1997, 10(6):1069-1081.
    [49] Ito Y. Approximation of continuous functions on R^d by linear combinations of shifted rotations of a sigmoidal function without scaling. Neural Networks, 1992, 5(5):105-115.
    [50] Wu W, et al. Recent developments on convergence of online gradient methods for neural network training. Lecture Notes in Computer Science, 2004, 3174:235-238.
    [51] Georgiou G M, Koutsougeras C. Complex domain backpropagation. IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, 1992, 39(5):330-334.
    [52] Kim T, Adali T. Approximation by fully complex multilayer perceptrons. Neural Computation, 2003, 15:1641-1666.
    [53] Kim T, Adali T. Fully-complex multilayer perceptron for nonlinear signal processing. Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 2002, 32:29-43.
    
    [54] Lin V Y, Pinkus A. Fundamentality of ridge functions. Journal of Approximation Theory, 1993,75:295-311.
    [55] Petrushev P P. Approximation by ridge functions and neural networks. SIAM Journal on Mathematical Analysis, 1998, 30(1):155-189.
    
    [56] Buhmann M D, Pinkus A. Identifying linear combinations of ridge functions. Advances in Applied Mathematics, 1999, 22(1):103-118.
    [57] Back A, Chen T. Approximation of hybrid systems by neural networks. Proceedings of the 1997 International Conference on Neural Information Processing (ICONIP'97). Berlin: Springer-Verlag, 1998.
    [58] Sandberg I W. Notes on weighted norms and network approximation of functionals. IEEE Trans. on Circuits and Systems-I: Fundamental Theory and Applications, 1996, 43(7):600-601.
    [59] Sandberg I W, Xu L. Network approximation of input-output maps and functionals. Circuits Systems Signal Processing, 1996, 15(6):711-725.
    [60] Stinchcombe M B. Neural network approximation of continuous functionals and continuous functions on compactifications. Neural Networks, 1999, 12(3):467-477.
    [61] Sandberg I W. Approximation theorem for discrete-time systems. IEEE Trans. on Circuits and Systems, 1991,38(5):564-566.
    [62] Narendra K S, Parthasarathy K. Identification and control of dynamic systems using neural networks. IEEE Trans. on Neural Networks, 1990,1:4-27.
    [63] Narendra K S, Parthasarathy K. Gradient methods for optimization of dynamical systems containing neural networks. IEEE Trans. on Neural Networks, 1991, 2:252-262.
    [64] Ito Y. Activation functions defined on higher-dimensional spaces for approximation on compact sets with and without scaling. Neural Computation, 2003,15:2199-2226.
    [65] Ito Y, Saito K. Superposition of linearly independent functions and finite mapping by neural networks. Math. Scientist, 1996, 21:27-33.
    [66] Friedlander F G, Joshi I. Introduction to the Theory of Distributions. Cambridge, UK: Cambridge University Press, 1982.
    [67] Friedman A. Generalized Functions and Partial Differential Equations. Englewood Cliffs, NJ: Prentice-Hall, 1963.
    [68] Gel'fand I M, Shilov G E. Generalized Functions, Vol. 2: Spaces of Fundamental and Generalized Functions. New York: Academic Press, 1968.
    [69] Rudin W. Functional Analysis. New York: McGraw-Hill, 1987.
    [70] Hu S G. Functions of a Real Variable. Beijing: Higher Education Press and Springer, 1999. (in Chinese)
    [71] Barros-Neto J. An Introduction to the Theory of Distributions (Chinese translation by Ouyang G Z and Zhu X Y). Shanghai: Shanghai Scientific and Technical Publishers, 1981.
    [72] Xia D X, Wu Z R, Yan S Z, Shu W C. Theory of Functions of a Real Variable and Functional Analysis. Beijing: People's Education Press, 1979. (in Chinese)
    [73] Rudin W. Real and complex analysis. New York: McGraw-Hill, 1974.
    [74] Lyusternik L A, Sobolev V I. Elements of Functional Analysis (Chinese translation by Yang C R). Beijing: Science Press, 1985.
    [75] Hewitt E, Stromberg K. Real and abstract analysis. New York: Springer-Verlag, 1975.
    [76] Jones D S. The Theory of Generalised Functions. Cambridge, UK: Cambridge University Press, 1982.
    [77] Colombeau J F. New Generalized Functions and Multiplication of Distributions. Amsterdam: North-Holland, 1984.
    [78] Zhou J C. Fourier Series and the Theory of Generalized Functions. Beijing: Science Press, 1983. (in Chinese)
    [79] Qi M Y, Wu F T. Generalized Functions and Equations of Mathematical Physics. Beijing: Higher Education Press, 1999. (in Chinese)
    [80] Cheng M D, Deng D G, Long R L. Real Analysis. Beijing: Higher Education Press, 1993. (in Chinese)
    [81] Zhang G Q, Lin Y Q. Lectures on Functional Analysis. Beijing: Peking University Press, 1987. (in Chinese)
    [82] Royden H L. Real Analysis. New York: Macmillan, 1988.
    [83] Xia D X, Yan S Z. Real Variable Functions and Foundations of Applied Functional Analysis. Shanghai: Shanghai Scientific and Technical Publishers, 1987. (in Chinese)
    [84] Zheng W X, Wang S W. Outline of Real Variable Functions and Functional Analysis. Beijing: Higher Education Press, 1980. (in Chinese)
    [85] Chen J L. Outline of Modern Mathematical Analysis. Beijing: Tsinghua University Press, 1987. (in Chinese)
