Research on Feature Extraction and Classification Methods Based on Subspace Analysis
Abstract
Feature extraction and classification are two central topics in pattern recognition; their task is to identify an individual's class from the effective information contained in sample images. Using algebraic and statistical methods as tools and building on subspace learning, this dissertation proposes new feature extraction and classification methods, compares them with current mainstream approaches, and verifies their effectiveness. The main work is as follows:
     (1) A sparse Fisher linear discriminant analysis (SFLDA) algorithm is proposed. Exploiting the equivalence between Fisher linear discriminant analysis (FLDA) and a least squares regression problem for two-class recognition, sparse Fisher discriminant projections are obtained by solving an L1-regularized least squares problem. These sparse projections show, at the variable level, which variables play the core role in discrimination and what physical meaning they carry, giving a deeper understanding of the data. Moreover, because the projections come from a least squares problem rather than a generalized eigen-equation, the computational cost is greatly reduced, and sparse discriminant vectors also require less storage than dense ones.
     (2) A local graph embedding discriminant analysis algorithm is proposed for recognition with a single training sample per person. Starting from the limitations of the single-sample setting, two strategies are combined: a mean filter with a 2x2 window generates imitated training samples, easing the shortage of training data; and graph embedding characterizes the local, rather than global, structure of the data. The resulting algorithm avoids the "small sample size problem" and substantially improves the recognition performance and stability of the system.
     (3) A de-correlated locality preserving projection (RLPP) algorithm is proposed. Building on locality preserving projections (LPP), a recursive procedure obtains the de-correlated discriminant projections one by one. Unlike the existing uncorrelated LPP (ULPP), RLPP solves for de-correlated projections from a different angle; the method is simple yet effective, and the same recursion can be borrowed to develop de-correlated versions of other feature extractors.
     (4) A unified framework is proposed that designs the feature extractor and the classifier under one shared metric. Starting from an effective classifier and its classification metric, a matched feature extraction algorithm is derived. Taking the regularized K-local hyperplane distance nearest neighbor classifier (RHKNN) as an example, we develop the RHKNN-oriented local discriminant analysis (HOLDA). As a recognition system built within this framework, RHKNN+HOLDA clearly improves overall recognition performance, since features extracted under the classifier's own metric are well suited to that classifier.
     (5) A fuzzy similar neighbor classifier (FSNC) is proposed. Introducing fuzzy set theory and starting from the similarity between samples, FSNC quantifies the class membership of an unknown test sample and classifies according to the quantified result. The similar neighbors and their similarities are found automatically through nonnegative sparse representation, which largely removes the negative influence of hand-tuned choices and makes the classification results more trustworthy.
     (6) A regularized linear regression classifier in a kernel Hilbert space (LASSO-KRC) is proposed. It improves the linear regression classifier (LRC) in two respects: an L1-norm regularization is imposed on the original metric, making the regularized metric more reliable and clearly improving classification performance; and the regularized LRC is extended to the nonlinear case, so that samples which are linearly inseparable in the original space become more separable in the kernel Hilbert space. Because the L1-norm constraint leaves no explicit projection function, the kernelization is nontrivial; using the kernel trick and basic calculus, we complete the kernel extension of the L1-constrained least squares problem, and the same route lets other L1-regularized least squares optimizations develop their own kernel versions.
     Short illustrative code sketches of the six contributions are given below.
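Sketch for contribution (1): a minimal illustration, assuming scikit-learn, of recovering a sparse Fisher direction from an L1-regularized least squares fit on class-coded targets in a two-class problem. The data, the target coding, and the penalty weight alpha=0.1 are illustrative choices, not the dissertation's exact formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))               # 100 samples, 50 variables
y = np.repeat([0, 1], 50)                    # two-class labels

# For two classes, ordinary least squares on class-coded targets recovers
# the Fisher direction up to scale; the added L1 penalty then zeroes out
# the variables that contribute little to the discrimination.
n, n1 = len(y), int(y.sum())
t = np.where(y == 1, n / n1, -n / (n - n1))  # class-indicator coding

w = Lasso(alpha=0.1).fit(X, t).coef_         # sparse discriminant vector
print("variables driving the decision:", np.flatnonzero(w))
```

Because w comes from a quadratic program rather than a generalized eigenproblem, no eigen-decomposition is required, which is the cost argument made in (1).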
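Sketch for contribution (2), covering only the sample-augmentation step: a 2x2 mean filter produces an imitated training image from the single available sample per person. SciPy's uniform_filter stands in for the mean filter; the 112x92 image size and random pixel data are placeholders.

```python
import numpy as np
from scipy.ndimage import uniform_filter

face = np.random.rand(112, 92)               # the single training image
imitated = uniform_filter(face, size=2)      # 2x2 mean-filtered copy

# The original plus the smoothed copy give two training samples per class,
# easing the single-sample limitation before the graph-embedding step,
# which then models the local (rather than global) data structure.
training_pair = np.stack([face, imitated])
print(training_pair.shape)                   # (2, 112, 92)
```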
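Sketch for contribution (3): a generic reading of the recursive de-correlation idea, in which each new projection is made St-uncorrelated with all previous ones and the data are deflated before the next solve. solve_projection stands in for one LPP eigen-step and is hypothetical; the demo uses a principal-direction solver only so the uncorrelatedness check can be printed.

```python
import numpy as np

def decorrelated_projections(X, solve_projection, k):
    """Find k projections one by one, each St-uncorrelated with the rest."""
    St = np.cov(X, rowvar=False)                  # total scatter (fixed)
    W = []
    for _ in range(k):
        w = solve_projection(X)                   # one LPP-style direction
        for wi in W:                              # Gram-Schmidt in the
            w = w - (w @ St @ wi) / (wi @ St @ wi) * wi   # St metric
        W.append(w / np.linalg.norm(w))
        X = X - np.outer(X @ W[-1], W[-1])        # deflate for next solve
    return np.array(W).T

# Demo with a stand-in solver (leading principal direction).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
pc1 = lambda Z: np.linalg.svd(Z - Z.mean(0), full_matrices=False)[2][0]
W = decorrelated_projections(X, pc1, k=3)
St = np.cov(X, rowvar=False)
print(np.round(W.T @ St @ W, 4))                  # off-diagonals ~ 0
```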
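Sketch for contribution (4), classifier side only: a hedged implementation of a regularized K-local hyperplane distance nearest neighbor decision, assigning the query to the class whose K-neighbor affine hull reconstructs it with the smallest regularized residual. K, lam, and the toy data are illustrative; the matched extractor HOLDA is not reproduced here.

```python
import numpy as np

def rhknn_predict(x, X_train, y_train, K=5, lam=0.1):
    """Assign x to the class whose K-neighbor local hyperplane is nearest."""
    best_cls, best_dist = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        idx = np.argsort(np.linalg.norm(Xc - x, axis=1))[:K]
        N, mu = Xc[idx], Xc[idx].mean(axis=0)
        V = (N - mu).T                        # directions spanning the hull
        # Regularized coordinates: alpha = (V'V + lam*I)^-1 V'(x - mu).
        alpha = np.linalg.solve(V.T @ V + lam * np.eye(len(idx)),
                                V.T @ (x - mu))
        dist = np.linalg.norm(x - mu - V @ alpha)
        if dist < best_dist:
            best_cls, best_dist = c, dist
    return best_cls

# Toy usage on two Gaussian blobs.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(4, 1, (30, 4))])
y = np.repeat([0, 1], 30)
print(rhknn_predict(np.full(4, 3.5), X, y))   # expected: class 1
```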
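Sketch for contribution (5): one plausible reading of FSNC, assuming SciPy's nnls as the nonnegative sparse coder. The query is coded nonnegatively over the training set, the per-class coefficient mass serves as the "similarity", and normalizing it gives fuzzy class memberships; the decision is the largest membership. The membership formula is an assumption, not the dissertation's exact quantization.

```python
import numpy as np
from scipy.optimize import nnls

def fuzzy_memberships(x, X_train, y_train):
    """Fuzzy class memberships of x from a nonnegative sparse coding."""
    coef, _ = nnls(X_train.T, x)                  # columns = training samples
    classes = np.unique(y_train)
    sims = np.array([coef[y_train == c].sum() for c in classes])
    return dict(zip(classes, sims / sims.sum()))  # memberships sum to 1

# Toy usage: the query sits in class 1's blob.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (20, 6)), rng.normal(5, 1, (20, 6))])
y = np.repeat([0, 1], 20)
print(fuzzy_memberships(rng.normal(5, 1, 6), X, y))
```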
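Sketch for contribution (6): a hedged sketch of LASSO-KRC in which, per class, the query's kernel-space regression coefficients are found under an L1 penalty and the class with the smallest kernel-space residual wins. The plain ISTA loop is an illustrative stand-in for the dissertation's solver; the RBF kernel, gamma, lam, and step count are assumptions.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lasso_krc_predict(x, X_train, y_train, lam=0.01, steps=200):
    best_cls, best_res = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        K = rbf(Xc, Xc)                        # class Gram matrix
        kx = rbf(Xc, x[None, :])[:, 0]         # kernel vector k(X_c, x)
        alpha = np.zeros(len(Xc))
        L = 2.0 * np.linalg.norm(K, 2)         # Lipschitz constant
        for _ in range(steps):                 # ISTA on the kernelized LASSO
            g = 2.0 * (K @ alpha - kx)         # grad of ||phi(x)-Phi_c a||^2
            z = alpha - g / L
            alpha = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        # Residual ||phi(x) - Phi_c alpha||^2 up to the constant k(x, x),
        # which is the same for every class and can be dropped.
        res = alpha @ K @ alpha - 2.0 * alpha @ kx
        if res < best_res:
            best_cls, best_res = c, res
    return best_cls

# Toy usage on two blobs.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (25, 3)), rng.normal(3, 1, (25, 3))])
y = np.repeat([0, 1], 25)
print(lasso_krc_predict(rng.normal(3, 1, 3), X, y))  # expected: class 1
```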
