Research on Graph-Based Dimensionality Reduction Techniques and Their Applications
Abstract
The emergence of high-dimensional data poses a major challenge to pattern recognition, and dimensionality reduction has become an important means of handling such data and overcoming the "curse of dimensionality". Research shows that most dimensionality reduction methods can be reduced to graph construction plus an embedding manner. However, many typical existing algorithms rely on an artificially predefined neighborhood graph, e.g., Locality Preserving Projections (LPP) and its variants. Although such algorithms perform well on many practical problems, they suffer from issues such as neighborhood-parameter selection, sensitivity to noise, insufficient discriminative power, and the inability to naturally incorporate domain priors. This thesis studies dimensionality reduction from the perspective of graph construction and optimization. The main contributions are:
     (1) A re-examination and evaluation of globality- and locality-preserving (dimensionality reduction) strategies. Taking several typical locality-preserving algorithms as examples and comparing them with global methods, we obtain a series of new insights (in particular, shortcomings of the locality-preserving strategy). We then analyze the underlying reasons from the viewpoint of graph construction and give concrete improvement strategies and suggestions. On one hand, this clarifies recent misunderstandings of local and global preserving strategies and provides a basis for model selection; on the other hand, it shows that existing locality-preserving methods leave much room for improvement, which becomes an important motivation for this thesis.
     (2) The first introduction of sparse representation into graph construction, yielding the Sparsity Preserving Projections (SPP) algorithm. Because the graph is built with a global strategy, SPP to some extent overcomes the difficulty of neighborhood-parameter selection in locality-preserving methods; the "neighbors" implicit in SPP are obtained automatically by solving an l1-minimization problem, remedying the defect of local graph construction, which ignores the data distribution and assigns every sample the same neighborhood size. Moreover, benefiting from the natural discriminative power of sparse representation, SPP outperforms locality-preserving methods such as LPP on problems like face recognition.
     (3) The Sparsity Preserving Discriminant Analysis (SPDA) algorithm, applied to face recognition with a single labeled training image per person. SPDA is not only a semi-supervised extension of SPP; it further unifies sparse graph construction within a Bayesian learning framework, so that prior knowledge can be naturally incorporated into graph construction. In addition, sparse graph construction is accelerated by an ensemble strategy, yielding the ensemble SPDA (enSPDA) algorithm. Experiments show that the proposed algorithms are not only more effective than traditional semi-supervised discriminant analysis methods (e.g., SDA) but also require fewer unlabeled samples.
     (4) The Soft Locality Preserving Projections (SLPP) method. In traditional locality-preserving techniques the neighborhood graph plays a crucial role, but its construction is manually specified and independent of the subsequent dimensionality reduction step. In view of this, we propose SLPP on the basis of LPP, integrating graph construction and projection learning into a single objective function. Through alternating optimization, graph learning becomes simple, efficient, and tractable, and an analytical, principled graph-update formula is obtained. Experiments on standard data sets demonstrate the effectiveness of SLPP.
     (5) A unified framework for simultaneous dimensionality reduction and graph learning. Inspired by SLPP, we propose a framework for jointly learning the projection and updating the graph, whose idea applies to almost all graph-based dimensionality reduction techniques. To verify the framework's feasibility, we extend the classical LPP under it and propose the Self-dependent Locality Preserving Projections (SdLPP) algorithm, validating its effectiveness on data visualization, clustering, and classification.
The high dimensionality of data is one of the main challenges faced by current pattern recognition techniques. Dimensionality reduction (DR) has become an important tool for handling high-dimensional data and overcoming the "curse of dimensionality". Recent research has shown that most DR algorithms can generally be reduced to graph construction plus an embedding manner. However, many existing graph-based DR methods rely on an artificially pre-defined neighborhood graph, e.g., the locality preserving projections algorithm and its variants. Despite their success in many practical applications, those algorithms usually suffer from limitations such as neighborhood-parameter selection, sensitivity to noise, insufficient discriminative power, and the difficulty of incorporating prior knowledge. This thesis studies graph construction and optimization, especially for the dimensionality reduction task. The main contributions include:
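As a concrete illustration of the "graph construction plus embedding" view, a minimal LPP-style pipeline can be sketched as below. This is a generic sketch with a kNN heat-kernel graph and randomly generated data; the parameter choices are illustrative, not the exact formulation used in the thesis:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, k=5, t=1.0, d=2):
    """Locality Preserving Projections sketch.
    X: (n_samples, n_features). Returns an (n_features, d) projection."""
    n = X.shape[0]
    # --- graph construction: symmetric kNN graph with heat-kernel weights ---
    dist2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(dist2[i])[1:k + 1]      # k nearest neighbors, skip self
        W[i, idx] = np.exp(-dist2[i, idx] / t)
    W = np.maximum(W, W.T)                       # symmetrize
    # --- embedding: solve X^T L X a = lambda X^T D X a ---
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])  # small ridge for stability
    _, vecs = eigh(A, B)                         # ascending eigenvalues
    return vecs[:, :d]                           # smallest eigenvectors

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
P = lpp(X, k=5, t=2.0, d=2)
Y = X @ P                                        # low-dimensional embedding
print(Y.shape)  # (60, 2)
```

Note that the graph (kNN size `k`, heat-kernel width `t`) is fixed before the eigenproblem is solved; this separation is exactly what the later contributions of the thesis revisit.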
     (1) A revisit of globality- and locality-preserving strategies. In particular, we compare several popular locality-oriented DR algorithms with classical globality-oriented ones and obtain a series of new insights (especially some undesirable characteristics of the locality preserving strategy). Furthermore, we analyze the reasons for the empirical results from the viewpoint of graph construction, and provide specific tricks and suggestions to address the issues. On one hand, these studies clarify current misunderstandings of locality-oriented methods; on the other hand, they show that there is large room to improve current DR methods, which becomes an important motivation for the work that follows.
     (2) The first attempt to construct the graph by sparse representation, and the design of the Sparsity Preserving Projections (SPP) algorithm. By virtue of a global strategy for graph construction, SPP alleviates to a certain extent the difficulty of neighborhood-parameter selection in locality preserving methods; SPP captures the "neighbors" automatically through an l1-minimization problem, instead of an artificial predefinition that assigns the same neighborhood size to every sample. In addition, SPP benefits from the natural discriminative power of sparse representation, and thus achieves better performance than some popular locality preserving DR algorithms in face recognition applications.
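The sparse graph construction behind SPP can be sketched as follows: each sample is coded over all the remaining samples by an l1-regularized regression, and the nonzero coefficients define the graph weights, so each sample's "neighborhood" emerges automatically with its own size. Lasso is used here as a common stand-in for the thesis's l1-minimization problem, and the data and regularization strength are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_graph(X, alpha=0.05):
    """Sparsity-induced affinity matrix: row i holds the l1 code of
    sample i over the remaining samples. X: (n_samples, n_features)."""
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # represent x_i as a sparse combination of the other samples
        lasso = Lasso(alpha=alpha, max_iter=10000)
        lasso.fit(X[others].T, X[i])     # dictionary: one column per sample
        S[i, others] = lasso.coef_
    return S

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
S = sparse_graph(X)
# the "neighbors" of each sample are selected automatically:
# most coefficients are exactly zero, and neighborhood sizes differ per row
print((np.abs(S) > 1e-8).sum(axis=1))
```

The resulting matrix `S` plays the role of the predefined kNN affinity matrix in LPP-style methods, after which the usual graph-embedding step can be applied unchanged.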
     (3) The Sparsity Preserving Discriminant Analysis (SPDA) method and its application to face recognition with a single labeled training image per person. SPDA not only extends SPP to a semi-supervised version, but also unifies sparse graph construction under a Bayesian learning framework, which facilitates the incorporation of prior knowledge. In addition, we speed up SPDA by an ensemble strategy and design the ensemble SPDA algorithm. Experiments on publicly available data sets show that the proposed algorithms achieve better performance than competitors such as SDA, and tend to work well resorting to only very few extra unlabeled samples.
     (4) The Soft Locality Preserving Projections (SLPP) method. It is well known that the graph plays an important role in typical locality-based DR techniques. However, the graph construction in these methods relies on an artificial predefinition and is independent of the subsequent DR step. To address this issue, we design the SLPP algorithm based on LPP, which integrates graph construction and projection learning into a single objective function. Through alternating iterative optimization, we obtain a principled way of constructing the graph. The feasibility and effectiveness of SLPP are verified on several standard data sets with promising results.
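The alternating scheme can be sketched generically: fix the graph and solve the LPP generalized eigenproblem for the projection, then re-estimate the graph from the projected data. A heat-kernel re-weighting in the projected space is used below purely as a placeholder update (the thesis derives an analytical, principled update formula), and the data are synthetic:

```python
import numpy as np
from scipy.linalg import eigh

def slpp_sketch(X, d=2, t=1.0, n_iter=5):
    """Alternate between projection learning and graph updating.
    X: (n_samples, n_features). Returns projection P and graph W."""
    n, m = X.shape
    W = np.ones((n, n)) - np.eye(n)              # start fully connected
    P = None
    for _ in range(n_iter):
        # step 1: given W, solve the LPP eigenproblem for the projection P
        D = np.diag(W.sum(axis=1))
        L = D - W
        A = X.T @ L @ X
        B = X.T @ D @ X + 1e-6 * np.eye(m)       # small ridge for stability
        _, vecs = eigh(A, B)
        P = vecs[:, :d]
        # step 2: given P, re-estimate the graph in the projected space
        Y = X @ P
        dist2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
        W = np.exp(-dist2 / t)                   # placeholder soft update
        np.fill_diagonal(W, 0.0)
    return P, W

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 6))
P, W = slpp_sketch(X)
print(P.shape, W.shape)  # (6, 2) (40, 40)
```

The key design choice mirrored here is that the graph is no longer fixed up front: each pass of step 2 adapts the edge weights to the current low-dimensional representation, which is what makes the objective self-contained.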
     (5) A unified framework for simultaneous dimensionality reduction and graph learning. Motivated by SLPP, we establish a framework for simultaneous dimensionality reduction and graph updating, which is very general and applicable to most current graph-based DR techniques. To verify its feasibility and effectiveness, we further extend the classical LPP and develop the Self-dependent LPP (SdLPP) algorithm. The effectiveness of the proposed algorithm is validated by experiments on data visualization, clustering, and face recognition on publicly available data sets.
