Optimized Kernel Methods
Abstract
Research on data-based machine learning began in the early 1960s, and the field has since made great progress. The support vector machine proposed by Vapnik and his colleagues, built on statistical learning theory, combines statistical learning theory with kernel techniques and effectively controls the capacity of the hypothesis function set, making it a general-purpose learning machine. The success of the support vector machine spurred the emergence and rapid development of new kernel machines, producing many excellent learning machines such as the kernel Fisher discriminant, kernel principal component analysis, and the relevance vector machine. More recently, Pascal Vincent et al. proposed another highly effective learning machine, kernel matching pursuit, whose performance is comparable to that of the classical support vector machine while yielding a sparser solution. These learning machines have been applied successfully in pattern recognition, regression estimation, function approximation, and related fields. This dissertation covers five topics: pre-extraction of support vectors, adaptive control of the sparsity of the support vector machine, construction of Mercer kernel functions, fuzzy kernel matching pursuit, and the kernel matching pursuit ensemble. The main contributions are:
     1. A support vector pre-extraction method based on vector projection is proposed. Existing training algorithms for support vector machine classification optimize over the non-support vectors as well as the support vectors, which adds a large amount of unnecessary computation. The proposed method extracts from the given samples a set of boundary vectors that contains the support vectors and uses it as the new training set. With a suitably chosen pre-extraction parameter, the boundary vector set contains all of the support vectors, so the method greatly reduces the number of training samples and speeds up SVM training while leaving the classification performance of the support vector machine unchanged.
     2. An adaptive control strategy for improving the sparsity of the support vector solution is proposed. The decision (response) speed of a support vector machine is determined by the number of support vectors; when the decision function contains a large number of them, testing becomes very slow. When a trained SVM is applied to an online (real-time) problem, its decision speed often cannot meet the requirements of the task because the SVM solution is not sparse enough. This dissertation proposes an adaptive simplification strategy that reduces the complexity of the SVM solution according to the recognition requirements of the specific task: it removes as many support vectors as possible while still meeting the required detection performance, thereby increasing the online detection speed of the SVM.
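As a rough illustration (a generic reduced-set formulation rather than the dissertation's exact criterion), simplification can be phrased as approximating the full SVM expansion over N_s support vectors by a much shorter expansion over N_z vectors chosen to keep the feature-space approximation error small:

    f(x) = \sum_{i=1}^{N_s} \alpha_i y_i\, k(x_i, x) + b
    \;\approx\;
    \tilde{f}(x) = \sum_{j=1}^{N_z} \beta_j\, k(z_j, x) + b, \qquad N_z \ll N_s,

    \min_{\{z_j,\,\beta_j\}} \Bigl\| \sum_{i=1}^{N_s} \alpha_i y_i\, \phi(x_i) - \sum_{j=1}^{N_z} \beta_j\, \phi(z_j) \Bigr\|^2,

where \phi is the feature map induced by the kernel k and N_z is fixed by the required detection performance.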
     3. Two admissible Mercer kernel functions are constructed: a wavelet kernel and a multiresolution kernel. The kernel functions commonly used in kernel machines do not form a complete basis of the feature space, so the decision function of the learning machine cannot approximate an arbitrary target function in the feature space to arbitrary precision; in most cases it is only an approximation of the target function. Wavelet bases not only have good time-scale (time-frequency, time-space) multiresolution properties but also good approximation and denoising capabilities. We therefore construct wavelet and multiresolution kernel functions, prove that they are admissible Mercer kernels, and apply them successfully in the kernel matching pursuit learning machine.
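For reference, one widely used admissible wavelet kernel takes the translation-invariant product form (the kernels constructed in this dissertation may differ in detail):

    K(x, x') = \prod_{j=1}^{d} h\!\left(\frac{x_j - x'_j}{a}\right),
    \qquad
    h(u) = \cos(1.75\,u)\, \exp\!\left(-\frac{u^2}{2}\right), \quad a > 0,

where h is the mother wavelet and a is a dilation parameter; kernels of this product form can be shown to satisfy Mercer's condition.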
     4. As an extension of kernel machines, a fuzzy kernel matching pursuit learning machine is proposed. Practical applications often require detecting targets in imbalanced data or detecting specific target classes, and such detection is usually difficult: conventional learning machines treat all training data equally and cannot focus on a designated class or on particular pieces of information, yet effective recognition of exactly this information is often the key part of the task. This dissertation proposes the fuzzy kernel matching pursuit (fuzzy KMP) learning machine, which assigns different weight (importance) factors to the collected data in advance according to the requirements of the task, so that the learning machine trains to different degrees on samples of different importance and finally produces a decision rule oriented toward the specified targets.
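Assuming the usual squared-loss form of kernel matching pursuit, the weighting idea amounts to replacing the plain residual by a weighted one (a sketch of the criterion, with s_i denoting the importance factor of sample x_i and x_{g_k} the selected basis centres):

    \min_{\beta_1,\dots,\beta_n}\; \sum_{i=1}^{N} s_i \Bigl( y_i - \sum_{k=1}^{n} \beta_k\, k(x_{g_k}, x_i) \Bigr)^{2},

so that samples with large s_i dominate both the greedy choice of basis functions and the fitted coefficients.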
     5. A kernel matching pursuit classifier ensemble is established. In practical engineering, a single learning machine usually cannot reach the desired performance when high recognition accuracy is required. Ensemble methods offer another route to better performance: a recognition problem is divided into several subtasks, a learning machine is trained on each, and the trained machines are then combined by some strategy to produce the final decision. The ensemble strategy also addresses another, even more important problem: training on large-scale data. When the collected data set is very large, limits on computer storage and speed make it impossible for a learning machine to process the massive data directly. With the ensemble strategy, the original training data is first split into smaller sub-problems, these sub-problems are processed separately, and the final decision is obtained by combining the results. The advantage of the ensemble strategy is that it further improves the generalization ability of the system without losing any of the information contained in the original data.
Research on data-based machine learning began in the early 1960s and has made great progress over the past forty years. In the 1990s, Vapnik and his group completed statistical learning theory and constructed a general and effective learning algorithm, the Support Vector Machine (SVM). The SVM combines the advantages of statistical learning theory and the kernel method and effectively controls the capacity of the hypothesis function set, which leads directly to good generalization performance. Inspired mainly by the successful use of kernel functions in the SVM, many novel kernel machines appeared during this period, such as the Kernel Fisher Discriminant (KFD), Kernel Principal Component Analysis (KPCA), and the Relevance Vector Machine (RVM). More recently, researchers proposed another successful kernel-based learning machine, Kernel Matching Pursuit (KMP), which reaches nearly the same performance as the classical SVM while producing a sparser solution. All of these learning machines have been applied successfully to pattern recognition, regression estimation, function approximation, density estimation, and related problems. From the viewpoint of optimizing kernel methods, this dissertation consists of the following five parts:
     1. We propose a method to pre-extract support vectors before training a support vector machine. Training an SVM amounts to solving a linearly constrained quadratic programming (QP) problem in as many variables as there are data points, which is time-consuming and becomes challenging once the number of data points exceeds a few thousand. It is also well known that the proportion of support vectors (SVs) is small in many practical situations and that the SVM decision depends only on these support vectors and not on the remaining data, so pre-extracting the SVs has become a task of interest in the SVM field. This dissertation introduces a method for pre-extracting SVs based on vector projection and the geometric characteristics of SVs; it greatly reduces the number of training samples and speeds up SVM learning while preserving the generalization performance of the SVM.
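A minimal sketch of a projection-based pre-selection rule is given below. It projects every sample onto the direction joining the two class centres and keeps, for each class, the fraction of samples lying closest to the boundary; the function name, the keep_ratio parameter, and the use of class centres are illustrative assumptions, not the dissertation's exact algorithm.

    import numpy as np

    def preselect_boundary_vectors(X, y, keep_ratio=0.3):
        """Keep candidate support vectors by projection onto the centre-to-centre direction.

        X : (n, d) samples; y : (n,) labels in {+1, -1};
        keep_ratio plays the role of the pre-selection parameter.
        """
        mp = X[y == 1].mean(axis=0)          # centre of the positive class
        mn = X[y == -1].mean(axis=0)         # centre of the negative class
        w = mp - mn
        w = w / np.linalg.norm(w)            # unit direction joining the centres
        proj = X @ w                         # 1-D projection of every sample

        keep = np.zeros(len(X), dtype=bool)
        for label, sign in ((1, 1.0), (-1, -1.0)):
            idx = np.where(y == label)[0]
            order = np.argsort(sign * proj[idx])          # boundary-most samples first
            n_keep = max(1, int(keep_ratio * len(idx)))
            keep[idx[order[:n_keep]]] = True
        return keep

The returned mask can then shrink the training set, e.g. X[mask], y[mask], before running the standard SVM solver.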
     2. To improve the sparsity of the SVM, an adaptive simplification strategy for the SVM solution is proposed. An SVM is usually slow in the test phase because of the number of support vectors, which is a serious limitation for real-time applications, so simplifying the SVM solution becomes a key problem when the SVM is used in online tasks. To overcome this problem, we propose an adaptive algorithm named feature vector selection (FVS) that selects feature vectors from the support vector solution; it is based on the vector correlation principle and a greedy algorithm. Moreover, the number of selected feature vectors can be controlled directly by the task requirements, so the trade-off between generalization and complexity can be controlled adaptively.
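The sketch below illustrates the "vector correlation plus greedy selection" idea under a simple assumption: the reduced expansion is grown one kernel column at a time, picking the column most correlated with the current residual of the full decision values and refitting by least squares. The function name and stopping rule are hypothetical; the dissertation's FVS algorithm may use a different selection criterion.

    import numpy as np

    def select_feature_vectors(K, f_target, n_max, tol=1e-3):
        """Greedy feature-vector selection by correlation with the residual.

        K        : (n, n) kernel matrix over the support vectors
        f_target : (n,) decision values of the full SVM on those points
        n_max    : maximum number of feature vectors (set by the task requirement)
        tol      : stop once the relative residual falls below this level
        """
        selected, beta = [], None
        residual = f_target.astype(float).copy()
        norms = np.linalg.norm(K, axis=0) + 1e-12
        for _ in range(n_max):
            scores = np.abs(K.T @ residual) / norms      # correlation with each column
            scores[selected] = -np.inf                   # never pick a column twice
            selected.append(int(np.argmax(scores)))
            beta, *_ = np.linalg.lstsq(K[:, selected], f_target, rcond=None)
            residual = f_target - K[:, selected] @ beta  # refit, then update residual
            if np.linalg.norm(residual) <= tol * np.linalg.norm(f_target):
                break
        return selected, beta

Because n_max is an explicit argument, the trade-off between decision speed (few feature vectors) and accuracy (small residual) can be set directly from the task requirements, which is the point of the adaptive control described above.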
     3. Two efficient admissible Mercer kernels are constructed in this dissertation. The kernel functions commonly used in kernel learning machines do not form a complete orthogonal basis of the Hilbert space, which limits the accuracy of the learning machines. Wavelets have been used successfully in signal processing because of their excellent approximation properties, and a wavelet family forms a complete orthogonal basis of the Hilbert space. To exploit these advantages, we construct a wavelet Mercer kernel and a multiresolution (MRA) Mercer kernel. Machines equipped with these two kernels make up for the defects of the traditional kernel functions and improve the performance of the learning machine.
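A minimal code-form sketch of such a kernel, assuming the translation-invariant product construction mentioned earlier (the function name and default dilation are illustrative):

    import numpy as np

    def wavelet_kernel(X1, X2, a=1.0):
        """Gram matrix of a product-form wavelet kernel.

        K(x, x') = prod_j h((x_j - x'_j) / a), with mother wavelet
        h(u) = cos(1.75 u) * exp(-u^2 / 2) and dilation a > 0.
        """
        diff = (X1[:, None, :] - X2[None, :, :]) / a     # (n1, n2, d) coordinate differences
        h = np.cos(1.75 * diff) * np.exp(-0.5 * diff ** 2)
        return np.prod(h, axis=2)                        # product over the d coordinates

Since only the Gram matrix changes, such a kernel can be dropped into an SVM or a kernel matching pursuit learner without altering the rest of the training procedure.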
     4. A new machine, the fuzzy kernel matching pursuit, is proposed. In practice, two recognition problems are especially important: identifying a special target, such as detecting a cancer or a hostile aircraft, and recognizing a minority target within a large data set. The conventional KMP machine treats all training data identically and minimizes the error over the whole data set, so it does not perform well on such designated patterns. To solve these problems, we propose an effective machine, the fuzzy kernel matching pursuit (FKMP), which attaches a fuzzy factor to each training sample so that different samples contribute differently to the learned decision. The fuzzy factors can be chosen in advance according to the requirements of the task. As a result, the detection of these special patterns is handled well.
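A minimal sketch of the weighting idea, assuming a squared-loss matching pursuit over the kernel columns (the function name, the per-step weighted least-squares update, and the stopping rule are illustrative assumptions):

    import numpy as np

    def fuzzy_kmp_fit(K, y, s, n_terms):
        """Greedy fuzzy kernel matching pursuit with per-sample importance factors.

        K       : (n, n) kernel matrix; column j is k(x_j, .) evaluated on the data
        y       : (n,) targets, e.g. +1 / -1 labels
        s       : (n,) fuzzy importance factors, larger = more important sample
        n_terms : number of basis functions to select
        """
        residual = y.astype(float).copy()
        chosen, coefs = [], []
        for _ in range(n_terms):
            best_j, best_gain, best_beta = -1, -np.inf, 0.0
            for j in range(K.shape[1]):
                g = K[:, j]
                denom = np.sum(s * g * g)
                if denom <= 0.0:
                    continue
                beta = np.sum(s * residual * g) / denom   # 1-D weighted least squares
                gain = beta * np.sum(s * residual * g)    # drop in weighted squared error
                if gain > best_gain:
                    best_j, best_gain, best_beta = j, gain, beta
            if best_j < 0:
                break
            chosen.append(best_j)
            coefs.append(best_beta)
            residual -= best_beta * K[:, best_j]
        return chosen, coefs

Each greedy step reduces the weighted residual sum_i s_i (y_i - f(x_i))^2, so heavily weighted samples dominate both the choice of atoms and the coefficients, which is the behaviour the fuzzy factors are meant to induce.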
     5. Large-scale problems must be solved quickly while still achieving high performance. Current computers can handle complex scientific computation, but they become inadequate once the scale of the problem exceeds a certain limit, and processing large-scale problems is an important issue in real engineering applications. To deal with such problems, we combine the ensemble strategy with kernel matching pursuit and construct the kernel matching pursuit classifier ensemble. The ensemble system first divides the original data into several small sub-problems, then processes these sub-problems individually, and finally aggregates all of the predictions into a combined prediction. The greatest advantage of this strategy is that the system loses none of the information contained in the original data while greatly improving recognition performance.
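A minimal sketch of the partition-train-vote strategy described above; fit_base is a stand-in for any base learner (for instance a wrapper around the fuzzy KMP sketch given earlier), and the function name and majority-vote rule are illustrative assumptions rather than the dissertation's exact aggregation scheme.

    import numpy as np

    def train_kmp_ensemble(X, y, fit_base, n_parts=5, seed=0):
        """Split the data, train one base learner per part, predict by majority vote.

        fit_base : callable(X_sub, y_sub) -> predict function returning +/-1 labels
        n_parts  : number of sub-problems the large training set is split into
        """
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))                     # shuffle, then split disjointly
        members = [fit_base(X[part], y[part])
                   for part in np.array_split(idx, n_parts)]

        def predict(X_new):
            votes = np.stack([m(X_new) for m in members]) # (n_parts, n_new) of +/-1 votes
            return np.sign(votes.sum(axis=0) + 1e-12)     # majority vote, ties -> +1
        return predict

Because every sample appears in exactly one sub-problem, no information from the original data is discarded; the sub-problems are simply small enough for the base learner to handle.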
