Several Problems in Dimensionality Reduction
Abstract
Dimensionality reduction (DR) is one of the most important problems in machine learning. This thesis makes four main contributions to the field:
     (1) Manifold learning is a family of unsupervised, nonlinear DR methods that has attracted much attention in recent years, yet the diversity of the algorithms makes it hard to find a proper taxonomy and sound evaluation criteria for them. We propose a taxonomy based on the design of the algorithms, dividing them into distance-preserving mappings, graph embeddings and statistical methods, and we discuss the pros and cons common to each category. We then evaluate the algorithms from several further aspects: the computational complexity of the common algorithms; the relationship between spectra and dimensionality; the impact of noise on each category; the consequences of "holes" in the parameter space; locality preserving rates, which detect whether the topological structure of the manifold is destroyed (a minimal sketch of this criterion appears after this list); and magnification factors, principal spreading directions and related quantitative criteria, which let us explore more detailed properties of the mapping. As an application to face recognition, these analyses let us choose a suitable manifold learning algorithm for DR, and it yields a higher recognition rate than traditional linear DR methods.
     (2) As an important branch of manifold learning, graph embedding algorithms (cf. Chapter 2) can be parametrized (via linearization and kernelization) to form a complete DR framework, but the O(N³) cost of optimizing the kernelized models makes them impractical for large-scale applications. We propose an approximation algorithm that obtains exemplars from k-means initialized with affinity propagation (AP); because it controls the quantization error better, it approximates the Gram matrix more accurately than other methods using the same number of exemplars (a Nyström-style sketch of the exemplar idea appears after this list). We analyze how well different parts of the spectrum are approximated, and our experiments verify that different applications demand different parts of the spectrum. We also derive an error bound for the approximation of the mapping itself and prove that it is likewise controlled by the quantization error; compared with approximating the Gram matrix, approximating the mapping directly has a more intuitive interpretation for PKLR and graph embedding, and in our graph embedding experiments it attains better solutions with fewer parameters. Our earlier work compared several linearized graph embedding algorithms; with the approximation algorithm we can now compare their kernelized counterparts on large-scale problems, from which we draw some interesting conclusions, e.g. graph embeddings that seek minimal eigenvalues suffer under kernel functions with fast-decreasing spectra, and local models can be imitated with local kernels instead.
     (3) We propose a supervised DR algorithm based on recently developed kernel independence criteria (the HSIC estimator we build on is sketched after this list); our analysis shows that it can be viewed as an approximation to sufficient dimension reduction (SDR) algorithms such as KDR. Whereas each KDR iteration takes O(N³) time, our algorithm needs only O(N²) time plus one multiplication of N×N matrices, a considerably lower computational cost. We discuss potential weaknesses of our model on simulated data, but in most experiments on real data it gives results comparable to KDR. We also discuss how the HSIC statistic can be used to determine an upper bound on the dimension of the SDR projection subspace, a question for which most of the literature offers no satisfactory solution. A further analysis reveals the relationship between these algorithms and graph embedding: graph embedding provides reasonable initial values for them and thus cuts down the amount of random search. To incorporate unsupervised information into these models, we add a Laplacian smoother to the original model; experiments show that it outperforms the purely supervised model when the data are projected into low-dimensional spaces. Finally, we point to solving unsupervised DR and CCA problems with these algorithms as a potential direction for future research.
     (4) In many practical problems the ordinal relationships in the data are very important, because they reveal how the data are distributed on the underlying manifold; in our experiments, preserving them also improves the generalization ability of the learned classifiers. We are the first to separate these problems from traditional classification tasks, naming the setting tendency learning. We compare tendency learning with other well-known learning problems, e.g. classification models the separating surface, whereas tendency learning models the transitions between states. A careful comparison of two linear models, SVM and PKLR, shows that the latter is more convenient for modelling tendency learning; this leads to a DAG-regularized PKLR model whose constraints are non-convex, so we give an algorithm that solves it with CCCP (a generic CCCP iteration is sketched after this list). To validate the idea we run experiments on two simulated and two real data sets; the results show that, especially when few samples are labeled, the DAG-regularized tendency learning model generalizes better than supervised and semi-supervised models.
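For contribution (1), the following is a minimal sketch of the locality preserving rate: the average overlap between each point's k-nearest-neighbor set before and after embedding. The function name, the swiss-roll data and the choice of Isomap as the embedder are illustrative assumptions, not the thesis implementation.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap
from sklearn.neighbors import NearestNeighbors

def locality_preserving_rate(X_high, X_low, k=10):
    """Average fraction of each point's k nearest neighbors in the input
    space that are still among its k nearest neighbors in the embedding;
    1.0 means the local topology of the manifold is fully preserved."""
    # Ask for k+1 neighbors because each point is its own nearest neighbor.
    idx_high = (NearestNeighbors(n_neighbors=k + 1).fit(X_high)
                .kneighbors(X_high, return_distance=False)[:, 1:])
    idx_low = (NearestNeighbors(n_neighbors=k + 1).fit(X_low)
               .kneighbors(X_low, return_distance=False)[:, 1:])
    overlap = [len(set(a) & set(b)) / k for a, b in zip(idx_high, idx_low)]
    return float(np.mean(overlap))

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(locality_preserving_rate(X, Y, k=10))  # close to 1.0 for a good map
```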
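For contribution (2), here is a minimal sketch that reads the exemplar idea as a Nyström-style low-rank approximation of the Gram matrix: affinity propagation supplies initial exemplars, k-means refines them to reduce the quantization error, and the refined exemplars form the Nyström factors. The RBF kernel, the seed-padding rule and the Frobenius error measure are illustrative choices, not the thesis algorithm.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans
from sklearn.metrics.pairwise import rbf_kernel

def ap_kmeans_exemplars(X, m, seed=0):
    """Exemplars from k-means initialized with affinity propagation (AP);
    a smaller quantization error gives a better Gram approximation."""
    centers = AffinityPropagation(random_state=seed).fit(X).cluster_centers_
    rng = np.random.default_rng(seed)
    if len(centers) >= m:                       # subsample AP exemplars
        init = centers[rng.choice(len(centers), m, replace=False)]
    else:                                       # pad with random data points
        pad = X[rng.choice(len(X), m - len(centers), replace=False)]
        init = np.vstack([centers, pad])
    return KMeans(n_clusters=m, init=init, n_init=1).fit(X).cluster_centers_

def nystrom_gram(X, Z, gamma=0.5):
    """Nystrom approximation K ~ C pinv(W) C^T built from exemplars Z."""
    C = rbf_kernel(X, Z, gamma=gamma)           # N x m cross kernel
    W = rbf_kernel(Z, Z, gamma=gamma)           # m x m exemplar kernel
    return C @ np.linalg.pinv(W) @ C.T

X = np.random.default_rng(0).normal(size=(500, 10))
K = rbf_kernel(X, X, gamma=0.5)
K_hat = nystrom_gram(X, ap_kmeans_exemplars(X, m=50))
print(np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro"))
```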
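For contribution (3), the independence criterion in question is the Hilbert-Schmidt Independence Criterion (HSIC) of Gretton et al.; below is a minimal sketch of its standard biased estimator, tr(KHLH)/(N-1)², whose cost is dominated by a few N×N matrix products, which is where the O(N²)-plus-one-matrix-multiplication figure quoted above comes from. The RBF kernels and bandwidths are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def hsic_biased(X, Y, gamma_x=1.0, gamma_y=1.0):
    """Biased empirical HSIC, tr(K H L H) / (N-1)^2, with H the centering
    matrix; close to zero when X and Y are empirically independent and
    larger when they are strongly dependent."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma=gamma_x)         # Gram matrix of X
    L = rbf_kernel(Y, Y, gamma=gamma_y)         # Gram matrix of Y
    H = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    # Only N x N products appear: O(N^2) memory plus matrix multiplications.
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

# Supervised-DR flavoured check: the feature the labels depend on should
# score a larger HSIC with the labels than an irrelevant feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.sign(X[:, :1])                           # labels depend on feature 0
print(hsic_biased(X[:, :1], y), hsic_biased(X[:, 1:2], y))
```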
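For contribution (4), CCCP (the concave-convex procedure of Yuille and Rangarajan) minimizes a difference of convex functions u - v by repeatedly linearizing v and solving the resulting convex surrogate. The sketch below is the generic textbook iteration on a toy objective; the scipy-based solver and the example function are illustrative, not the DAG-regularized PKLR solver itself.

```python
import numpy as np
from scipy.optimize import minimize

def cccp(u, grad_v, x0, n_iter=100, tol=1e-8):
    """Concave-convex procedure for min_x u(x) - v(x), u and v convex:
    replace v by its linearization at the current iterate and solve the
    resulting convex surrogate; the true objective never increases."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad_v(x)                           # gradient of the concave part
        x_new = minimize(lambda z: u(z) - g @ z, x).x
        if np.linalg.norm(x_new - x) < tol:     # stop when iterates settle
            break
        x = x_new
    return x

# Toy usage: minimize x^4 - x^2 (u = x^4, v = x^2, both convex);
# CCCP converges to a local minimum at +/- 1/sqrt(2), about 0.707.
print(cccp(lambda z: np.sum(z ** 4), lambda z: 2 * z, x0=[3.0]))
```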
