Research on Metric Learning Methods Fusing Global and Local Information
Abstract
Metric learning is an important and fundamental problem in machine learning. A distance function measures the similarity between sample points and therefore strongly affects the performance of most machine learning algorithms, such as k-nearest neighbor classification, radial basis function network classification, support vector machine classification, and k-means clustering. Owing to the efficiency and scalability of linear metric learning (which extends to nonlinear metrics through kernel methods), current research concentrates on the linear (Mahalanobis) metric learning problem. To improve classification performance and adapt to multimodal data distributions, fusing global and local information in Mahalanobis distance learning is a valuable and challenging topic. At the same time, with the rapid development of the Internet and the information industry, massive data must be mined and exploited, so efficiency is another pressing issue for metric learning. This thesis systematically studies two problems in metric learning: 1) fusing global and local information without introducing balancing weights; and 2) reducing the computational complexity. The following three stages of results were obtained.
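For reference, every method discussed below parameterizes the squared Mahalanobis distance d_M(x, y)^2 = (x - y)^T M (x - y) with a positive semidefinite matrix M; learning M from labeled data is the core problem. A minimal NumPy sketch (names are illustrative, not from the thesis):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y).

    M must be symmetric positive semidefinite for this to define a valid
    (pseudo-)metric; metric learning chooses M from labeled data.
    """
    diff = x - y
    return float(diff @ M @ diff)

# With M = I this reduces to the squared Euclidean distance.
x, y = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(mahalanobis_sq(x, y, np.eye(2)))  # 5.0
```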
     Stage 1: Discriminating-class collapsing for globality- and locality-preserving projections. Metric Learning by Collapsing Classes (MCML) [5] is a widely used Mahalanobis metric learning algorithm that aims to collapse all points sharing the same label together under the learned metric matrix. To remedy MCML's loss of local information, this part proposes a metric learning algorithm that unifies the class-collapsing idea, locality preservation, and discriminative power, thereby effectively fusing global and local information into the learned Mahalanobis distance without introducing balancing weights. More importantly, the proposed algorithm is a convex problem, so it can be solved by a first-order gradient descent method without being trapped in local minima. To further reduce the running time, some computationally intensive steps of the algorithm are mapped onto a parallel platform, graphics processing units (GPUs). Classification and visualization results on benchmark data sets verify the reliability and effectiveness of the proposed algorithm.
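To make the collapsing idea concrete, the sketch below implements an MCML-style objective in the spirit of [5] (an illustration only, not the thesis's unified algorithm): the cross-entropy between an ideal "collapsed" distribution p0(j|i), uniform over points sharing i's label, and the stochastic neighbor distribution induced by the metric M:

```python
import numpy as np

def collapsing_loss(X, labels, M):
    """MCML-style loss; assumes every class has at least two points.

    p_M(j|i) = exp(-d_M(x_i, x_j)^2) / sum_{k != i} exp(-d_M(x_i, x_k)^2);
    for numerical robustness a real implementation would use logsumexp.
    """
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]             # all pairwise x_i - x_j
    d2 = np.einsum('ijk,kl,ijl->ij', diff, M, diff)  # squared Mahalanobis
    np.fill_diagonal(d2, np.inf)                     # exclude j == i
    logp = -d2 - np.log(np.exp(-d2).sum(axis=1, keepdims=True))
    same = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    p0 = same / same.sum(axis=1, keepdims=True)      # ideal distribution
    return -np.sum(p0[same] * logp[same])
```

Because such losses are convex in M, a projected first-order method (gradient step, then projection onto the PSD cone by truncating negative eigenvalues) applies, and the dense pairwise computations above are exactly the kind of step that maps well to GPUs.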
     Stage 2: Dependence-maximization-based metric learning. Although the metric learning algorithm proposed in the first stage effectively fuses global and local information, its objective function is complicated and its derivative is expensive to compute. In the second stage we therefore propose a statistics-based Mahalanobis metric learning framework, referred to as "Dependence Maximization based Metric Learning" (DMML). The contributions of this part include:
     · It effectively fuses global and local information into the Mahalanobis distance without introducing balancing weights.
     · Unlike classical dependence measures such as mutual information and Pearson's χ² test, this part adopts measures computed in reproducing kernel Hilbert spaces (RKHSs), so that no estimation of, or assumption about, the data distributions is required.
     · Under this framework, two concrete learning algorithms are derived by employing different kernel-based criteria. Both are convex optimization problems whose objective derivatives are cheap to compute, so they can be solved efficiently by a first-order gradient descent method. Classification, visualization, and retrieval experiments on benchmark data sets demonstrate the effectiveness and the different application scopes of the two algorithms.
     Stage 3: Information-geometry-based metric learning. Although the metric learning algorithms proposed in the first two stages are convex optimization problems, both must be solved iteratively by gradient descent. Unlike most existing metric learning algorithms, Information Geometry Metric Learning (IGML) [24] finds a closed-form solution without solving a semidefinite program. In the third stage, building on information geometry theory, we propose two algorithms that address two limitations of IGML. (1) The time complexity of IGML is O(d^3 + nd^2), where n is the number of training samples and d is the data dimensionality. Under a low-rank assumption, this part proposes a metric learning algorithm, EIGML, that reduces the complexity to O(nd), greatly improving performance on high-dimensional data sets. (2) IGML is not applicable to singular kernel matrices, and it loses the local information of the data. This part proposes a metric learning algorithm, SIGML, that extends IGML to singular kernel matrices while fusing both local and global information. We emphasize that both proposed algorithms admit closed-form solutions and can be optimized efficiently. Experimental results verify their effectiveness.
     Summary: Of the three stages, the information-geometry-based algorithm SIGML proposed last subsumes the first two stages' idea of fusing global and local information, and it admits a closed-form solution, avoiding the parameter and step-size tuning required by iterative solvers. Among globality-preserving algorithms, the proposed EIGML greatly reduces the computational complexity, enabling metric learning on large-scale, high-dimensional data.
Metric learning is important and fundamental in machine learning. A distance metric provides a measure of dissimilarity between different points and significantly influences the performance of many machine learning algorithms, such as k-nearest neighbor classification, support vector machines, radial basis function networks, and k-means clustering. Due to the efficiency and scalability of linear metric learning, most effort has been spent on learning a Mahalanobis distance from labeled training data. To improve the classification performance and adapt to multimodal data distributions, incorporating the geometric information (i.e., locality) with the label information (i.e., globality) is particularly valuable and challenging. Therefore, in this thesis, our specific concerns are: 1) incorporating globality and locality in the Mahalanobis distance without optimizing balancing weight(s); 2) reducing the computational complexity. The following three stages of research results were obtained.
     The First Stage: Discriminating Classes Collapsing for Globality and Locality Preserving Projections. As a widely used metric learning method, Metric Learning by Collapsing Classes (MCML) [5] aims to find a distance metric that collapses all the points in the same class while maintaining the separation between different classes. This part combines the ideas behind locality preserving, discriminating power, and MCML in a unified method. The proposed algorithm is convex and incorporates the globality and locality information without balancing weight(s). To further decrease the running time, some computationally intensive steps of the proposed method are mapped to a GPU architecture. Experimental results demonstrate the effectiveness of the proposed method.
     The Second Stage: Dependence Maximization based Metric Learning. The method proposed in the first stage has a complex objective function whose derivative is costly to compute. Therefore, this part proposes a general Mahalanobis distance learning framework, referred to as "Dependence Maximization based Metric Learning" (DMML), in a statistical setting. The main contributions of this part include:
     · DMML effectively incorporates two sources of information (i.e., globality and locality) in the Mahalanobis distance without optimizing balancing weight(s).
     · Distinguished from classical dependence measuring criteria (e.g., Mutual Information and Pearson's χ² test), DMML uses criteria computed in RKHSs to avoid estimation of, or assumptions about, the data distributions. Many existing kernel-based criteria can be incorporated into DMML to tackle the dependence measurement problem.
     · Under the DMML framework, two methods are proposed by employing the Hilbert-Schmidt Independence Criterion (HSIC) [8] and generalized Distance Covariance [28], respectively (a minimal HSIC sketch follows this list). They are formulated as convex programs and can be efficiently optimized by a first-order gradient procedure.
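For concreteness, the biased empirical HSIC estimator of Gretton et al. [8] is tr(KHLH)/(n-1)^2, where K and L are kernel matrices on the two variables and H = I - (1/n)11^T centers them; a minimal sketch (how DMML instantiates K and L is specified in the thesis body, not here):

```python
import numpy as np

def hsic_biased(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2 (Gretton et al. [8]).

    K, L: n x n kernel matrices on the two variables whose dependence is
    measured; larger values indicate stronger dependence.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2
```

In a dependence-maximization view of supervised learning [9, 10], K would typically be computed on the (linearly transformed) inputs and L on the labels, with the transformation chosen to maximize the estimator.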
     The Third Stage: Efficient and Scalable Information Geometry Metric Learning. Although the methods proposed in the first two stages are convex problems, they are optimized by a gradient descent method. In contrast to most existing metric learning methods, Information Geometry Metric Learning (IGML) [24] can find a closed-form solution. This part proposes two novel distance metric learning algorithms to alleviate its limitations. (1) The proposed method EIGML reduces the computational complexity of IGML from O(d^3 + nd^2) to O(nd). (2) The objective of IGML becomes infinite for singular kernel matrices; moreover, the geometric information of the data is lost in IGML. The proposed method SIGML extends IGML to singular kernel matrices and preserves both locality and globality. We emphasize that these two methods find closed-form solutions, leading to efficient optimization.
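As a generic illustration of where a low-rank parameterization saves work (the flavor of saving EIGML exploits; the actual derivation rests on the information-geometric formulation of [24] and is not reproduced here): with a full d x d metric M, one squared distance costs O(d^2), whereas a factored metric M = W W^T with W of size d x r costs only O(rd):

```python
import numpy as np

d, r = 10_000, 20                  # high dimension, small rank
rng = np.random.default_rng(0)
W = rng.standard_normal((d, r))    # factor of a low-rank metric M = W W^T
x, y = rng.standard_normal(d), rng.standard_normal(d)

# The full-metric route would materialize M (d x d, ~800 MB here) and pay
# O(d^2) per distance. The low-rank route projects the difference first:
z = W.T @ (x - y)                  # O(r d)
dist_sq = float(z @ z)             # equals (x - y)^T W W^T (x - y)
```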
     Summary: The method SIGML proposed in the third stage subsumes the incorporation of globality and locality pursued in the first two stages. SIGML finds a closed-form solution and avoids the parameter tuning required by iterative solvers. As a globality-preserving metric learning method, EIGML greatly reduces the computational complexity and can be applied to large-scale, high-dimensional data.
References
[1]J. Chen, Z. Zhao, J. Ye, and H. Liu. Nonlinear adaptive distance metric learning for clustering. In Proceedings of the 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 123-132, ACM,2007.
    [2]J.B. Tenenbaum, V.d. Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science,290(5500):2319-2323,2000.
    [3]R. Jin, S. Wang, and Y. Zhou. Regularized distance metric learning: theory and algorithm. In Proceedings of the 23rd Annual Conference on Advances in Neural Information Processing Systems, pages 862-870,2009.
    [4]E.P. Xing, A.Y. Ng, M.I. Jordan, and S.J. Russell. Distance metric learning, with application to clustering with side-information. In Proceedings of the 16th Annual Conference on Advances in Neural Information Processing Systems, pages 505-512,2002.
    [5]A. Globerson and S. Roweis. Metric learning by collapsing classes. In Proceedings of the 20th Annual Conference on Advances in Neural Information Processing Systems, pages 451-458,2006.
    [6]J.V. Davis, B. Kulis, P. Jain, S. Sra, and I.S. Dhillon. Information-theoretic metric learning. In Proceedings of the 24th Annual International Conference on Machine Learning, pages 209-216, ACM,2007.
    [7]NVIDIA. Compute Unified Device Architecture Programming Guide. NVIDIA Corporation,2007.
    [8]A. Gretton, O. Bousquet, A.J. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In Proceedings of the 16th International Conference on Algorithmic Learning Theory, pages 63-67,2005.
    [9]L. Song, A. Smola, A. Gretton, and K.M. Borgwardt. A dependence maximization view of clustering. In Proceedings of the 24th Annual International Conference on Machine Learning, pages 815-822, ACM,2007.
    [10]L. Song, A. Smola, A. Gretton, and K.M. Borgwardt. Supervised feature selection via dependence estimation. In Proceedings of the 24th Annual International Conference on Machine Learning, pages 823-830, ACM,2007.
    [11]M. Wang, F. Sha, and M.I. Jordan. Unsupervised kernel dimension reduction. In Proceedings of the 24th Annual Conference on Advances in Neural Information Processing Systems, pages 2379-2387, 2010.
    [12]R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Human Genetics, 7(2):179-188,1936.
    [13]I. Jolliffe. Principal Component Analysis. Springer-Verlag,1986.
    [14]X. He and P. Niyogi. Locality preserving projections. In Proceedings of the 17th Annual Conference on Advances in Neural Information Processing Systems, pages 153-160,2003.
    [15]J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood components analysis. In Proceedings of the 19th Annual Conference on Advances in Neural Information Processing Systems, pages 513-520,2005.
    [16]K.Q. Weinberger, J. Blitzer, and L.K. Saul. Distance metric learning for large margin nearest neighbor classification. In Proceedings of the 20th Annual Conference on Advances in Neural Information Processing Systems, pages 1473-1480,2006.
    [17]L. Yang, R. Jin, R. Sukthankar, and Y. Liu. An efficient algorithm for local distance metric learning. In Proceedings of the 21st National Conference on Artificial Intelligence, pages 543-548,2006.
    [18]M. Sugiyama. Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. Journal of Machine Learning Research,8:1027-1061,2007.
    [19]S.C.H. Hoi, W. Liu, and S.-F. Chang. Semi-supervised distance metric learning for collaborative image retrieval. In Proceedings of the 21st IEEE Conference on Computer Vision and Pattern Recognition, pages 1-7,2008.
    [20]G.Q. Zhong, K.Z. Huang, and C.L. Liu. Low rank metric learning with manifold regularization. In Proceedings of the 11th IEEE International Conference on Data Mining, pages 1266-1271,2011.
    [21]S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323-2326,2000.
    [22]M. Wang, B. Liu, J. Tang, and X.-S. Hua. Metric learning with feature decomposition for image categorization. Neurocomputing,73:1562-1569,2010.
    [23]A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research,6:937-965,2005.
    [24]S. Wang and R. Jin. An information geometry approach for distance metric learning. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, pages 591-598,2009.
    [25]N. Cristianini, J. Kandola, A. Elisseeff, and J. Shawe-Taylor. On kernel target alignment. In Proceedings of the 16th Annual Conference on Advances in Neural Information Processing Systems, pages 367-373,2002.
    [26]C. Berg, J.P.R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer, New York,1984.
    [27]D. Sejdinovic, A. Gretton, B. Sriperumbudur, and K. Fukumizu. Hypothesis testing using pairwise distances and associated kernels. In Proceedings of the 29th International Conference on Machine Learning, pages 1111-1118,2012.
    [28]G. Székely, M. Rizzo, and N.K. Bakirov. Measuring and testing dependence by correlation of distances. Annals of Statistics,35(6):2769-2794,2007.
    [29]R.S. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input spaces. In Proceedings of the 19th International Conference on Machine Learning, pages 315-322,2002.
    [30]A. Smola and R. Kondor. Kernels and regularization on graphs. In Proceedings of the 16th Annual Conference on Learning Theory, pages 144-158,2003.
    [31]A. Oliva and A. Torralba. Modeling the shape of the scene:a holistic representation of the spatial envelope. International Journal of Computer Vision,42:145-175,2001.
    [32]Y. Zhang and Z.H. Zhou. Multi-label dimensionality reduction via dependence maximization. ACM Transactions on Knowledge Discovery from Data,4:1-21,2010.
    [33]R. Lyons. Distance covariance in metric spaces. The Annals of Probability,41(5):3284-3305,2013.
    [34]A. Frome, Y. Singer, and J. Malik. Image retrieval and classification using local distance functions. In Proceedings of the 21st Annual Conference on Advances in Neural Information Processing Systems, pages 417-424,2007.
    [35]J. Yu, D. Tao, and M. Wang. Adaptive hypergraph learning and its application in image classification. IEEE Transactions on Image Processing,21:3262-3272,2012.
    [36]L. Yang, R. Jin, L. Mummert, R. Sukthankar, A. Goode, B. Zheng, S.C. Hoi, and M. Satyanarayanan. A boosting framework for visuality-preserving distance metric learning and its application to medical image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence,32(1):30-44,2010.
    [37]S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, 2004.
    [38]D.P. Bertsekas. On the Goldstein-Levitin-Polyak gradient projection method. IEEE Transactions on Automatic Control,21(2):174-184,1976.
    [39]G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning,42(3):287-320, 2001.
    [40]K. Fukumizu, F. R. Bach, and M. I. Jordan. Kernel dimension reduction in regression. The Annals of Statistics,37:316-327,2009.
    [41]J. Nilsson, F. Sha, and M. I. Jordan. Regression on manifolds using kernel dimension reduction. In Proceedings of the 24th International Conference on Machine Learning, pages 697-704,2007.
    [42]A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research,6:2075-2129,2005.
    [43]F.R. Bach and M.I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research,3:1-48,2003.
    [44]B. Kulis. Metric learning:a survey. Foundations & Trends in Machine Learning,5(4):287-364,2012.
    [45]B. Liu. Research on the Theory and Applications of Distance Metric Learning. Hefei: University of Science and Technology of China,2010. (in Chinese)
    [46]B. Cao, X. Ni, J. T. Sun, G. Wang, and Q. Yang. Distance metric learning under covariate shift. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1204-1210, 2011.
    [47]P. Comon. Independent component analysis - a new concept? Signal Processing,36:287-314,1994.
    [48]X. He, D. Cai, S. Yan, and H. Zhang. Neighborhood preserving embedding. In Proceedings of the 10th International Conference on Computer Vision, pages 1208-1213,2005.
    [49]H.-T. Chen, H.-W. Chang, and T.-L. Liu. Local discriminant embedding and its variants. In Proceedings of the 18th IEEE Conference on Computer Vision and Pattern Recognition, pages 846-853,2005.
    [50]L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In Proceedings of the 18th Annual Conference on Advances in Neural Information Processing Systems, pages 1601-1608,2004.
    [51]J. Goldberger and G. Hinton. Neighbourhood components analysis. In Proceedings of the 19th Annual Conference on Advances in Neural Information Processing Systems, pages 513-520, 2005.
    [52]T. Yeh, T. Chen, Y. Chen, and W. Shih. Efficient parallel algorithm for nonlinear dimensionality reduction on GPU. In Proceedings of the 6th International Conference on Granular Computing, pages 592-597,2010.
    [53]S. Lahabar and P. Narayanan. Singular value decomposition on GPU using CUDA. In Proceedings of the 23rd International Symposium on Parallel & Distributed Processing, pages 1-10,2009.
    [54]M. Andrecut. Parallel GPU implementation of iterative PCA algorithms. Journal of Computational Biology,16:1593-1599,2009.
    [55]C. Lessig and P. Bientinesi. On parallelizing the MRRR algorithm for data-parallel coprocessors. In Proceedings of the 7th International Conference on Parallel Processing and Applied Mathematics, pages 396-402,2010.
    [56]P. Jain, B. Kulis, J. V. Davis, and I. S. Dhillon. Metric and kernel learning using a linear transformation. Journal of Machine Learning Research,13:519-547,2012.
    [57]K. Tsuda, S. Akaho, K. Asai, and C. Williams. The EM algorithm for kernel matrix completion with auxiliary data. Journal of Machine Learning Research,4:67-81,2003.
    [58]S. Amari and H. Nagaoka. Methods of Information Geometry. Oxford University Press,2000.
    [59]L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics,7:200-217,1967.
    [60]D. A. Harville. Matrix Algebra From a Statistician's Perspective. Springer Berlin,2008.
    [61]J. V. Davis and I. S. Dhillon. Structured metric learning for high dimensional problems. In Proceedings of the 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 195-203,2008.
    [62]S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the Association for Information Science and Technology, 41:391-407,1990.
    [63]B. Kulis, A. Surendran, and J. Platt. Fast low-rank semidefinite programming for embedding and clustering. In Proceedings of the 11th International Conference on Artificial Intelligence and Statistics,2007.
    [64]B. Kulis, M. A. Sustik, and I. S. Dhillon. Low-rank kernel learning with Bregman matrix divergences. Journal of Machine Learning Research,10:341-376,2009.
    [65]C. K. I. Williams. On a connection between kernel PCA and metric multidimensional scaling. Machine Learning,46(1-3):11-19,2002.
    [66]J. B. Tenenbaum, V. D. Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science,290(5500):2319-2323,2000.
    [67]M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation,15(6):1373-1396,2003.
    [68]D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature,401(6755):788-791,1999.
    [69]S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Trans. Pattern Analysis and Machine Intelligence,29:40-51,2007.
    [70]L. Wu. Visual Language Analysis: From Low-Level Visual Feature Representation to Semantic Distance Learning. Hefei: University of Science and Technology of China,2010. (in Chinese)
    [71]J. Zhang. Research on Image Classification Methods Based on Distance Metric Learning. Shanghai: Fudan University,2010. (in Chinese)
    [72]W. Zhang. Research on Feature Transformation Algorithms Based on the k-Nearest Neighbor Classification Criterion. Shanghai: Fudan University,2007. (in Chinese)
    [73]S. Li. Research on Features and Similarity Measures. Hefei: University of Science and Technology of China,2010. (in Chinese)
    [74]Z. J. Zha, T. Mei, M. Wang, Z. F. Wang, and X. S. Hua. Robust distance metric learning with auxiliary knowledge. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 1327-1332,2009.
    [75]C. Xiong, D. Johnson, R. Xu, and J. J. Corso. Random forests for metric learning with implicit pairwise position dependence. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 958-966,2012.
    [76]Y. Ying and P. Li. Distance metric learning with eigenvalue optimization. Journal of Machine Learning Research,13:1-26,2012.
    [77]D. Yeung and H. Chang. A kernel approach for semi-supervised metric learning. IEEE Transactions on Neural Networks,18(1):141-149,2007.
    [78]Q. Chen. Research on the Theory and Algorithms of Distance Metric Learning: Kernel Regression, Large Margin Nearest Neighbor, and Fisher Linear Discriminant. Shanghai: East China Normal University,2009. (in Chinese)
    [79]E. Hartman, J. D. Keeler, and J. M. Kowalski. Layered neural networks with Gaussian hidden units as universal approximations. Neural Computation,2:210-215,1990.
    [80]V. Vapnik. The Nature of Statistical Learning Theory. Berlin: Springer-Verlag,1995.
    [81]T. M. Cover. Estimation by the nearest neighbor rule. IEEE Transactions on Information Theory, 14(1):50-55,1968.
    [82]J. MacQueen. Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability,1:281-296,1967.
    [83]E. Forgey. Cluster analysis of multivariate data:efficiency vs. interpretability of classification. Biometrics,21:768,1965.
    [84]F. Yin and C.-L. Liu. Handwritten Chinese text line segmentation by clustering with distance metric learning. Pattern Recognition,42(12):3146-3157,2009.
    [85]K. Huang, R. Jin, Z. Xu, and C.-L. Liu. Robust metric learning with smooth optimization. In Proceedings of the 26th International Conference on Uncertainty in Artificial Intelligence, pages 244-251,2010.
    [86]P. Yang, K. Huang, and C.-L. Liu. Geometry preserving multi-task metric learning. Machine Learning,92(1):133-175,2013.
    [87]S. Xiang, F. Nie, and C. Zhang. Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognition,41(12):3600-3612,2008.
    [88]J. Lee and C. Zhang. Classification of gene-expression data:The manifold based metric learning way. Pattern Recognition,39(12):2450-2463,2006.
    [89]Y. Zhang, C. Zhang, and D. Zhang. Distance metric learning by knowledge embedding. Pattern Recognition,37(1):161-163,2004.
    [90]S. Xiang, F. Nie, C. Zhang, and C. Zhang. Spline embedding for nonlinear dimensionality reduction. In Proceedings of the 17th European Conference on Machine Learning, pages 825-832,2006.
    [91]F. Nie, D. Xu, X. Li, and S. Xiang. Semisupervised dimensionality reduction and classification through virtual label regression. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 41(3):675-685,2011.
    [92]Y. Song, F. Nie, C. Zhang, and S. Xiang. A unified framework for semi-supervised dimensionality reduction. Pattern Recognition,41(9):2789-2799,2008.
    [93]S. Xiang, F. Nie, C. Zhang, and C. Zhang. Nonlinear dimensionality reduction with local spline embedding. IEEE Transactions on Knowledge and Data Engineering,21(9):1285-1298,2009.
    [94]R. He, B.-G. Hu, W.S. Zheng, and X.W. Kong. Robust principal component analysis based on maximum correntropy criterion. IEEE Transactions on Image Processing,20:1485-1494,2011.
    [95]S.H. Yang, H.Y. Zha, S. Zhou, and B.-G. Hu. Variational graph embedding for globally and locally consistent feature extraction. In Proceedings of the 20th European Conference on Machine Learning, pages 538-553,2009.
    [96]X.T. Yuan and B.-G. Hu. Robust feature extraction via information theoretic learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1193-1200,2009.
    [97]B.-G. Hu, R. He, and X.T. Yuan. Information-theoretic measures for objective evaluation of classifications. Acta Automatica Sinica,38(7):1160-1173,2012.
    [98]J. Yu, D. Liu, D. Tao, and H.-S. Seah. On combining multiple features for cartoon character retrieval and clip synthesis. IEEE Transactions on Systems, Man, and Cybernetics, Part B,42:1413-1427, 2012.
    [99]J. Yu, M. Wang, and D. Tao. Semisupervised multiview distance metric learning for cartoon synthesis. IEEE Transactions on Image Processing,21:4636-4648,2012.
    [100]J. Yu, D. Liu, D. Tao, and H.-S. Seah. Complex object correspondence construction in two-dimensional animation. IEEE Transactions on Image Processing,20:3257-3269,2011.
    [101]J. Yu, D. Tao, Y. Rui, and J. Cheng. Pairwise constraints based multiview features fusion for scene classification. Pattern Recognition,46:483-496,2013.
    [102]M.S. Baghshah and S.B. Shouraki. Learning low-rank kernel matrices for constrained clustering. Neurocomputing,74:2201-2211,2011.
    [103]T. M. Cover and J. A. Thomas. Elements of Information Theory,2nd ed. New York, NY, USA: Wiley,2006.
    [104]S. Kullback and R.A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79-86,1951.
    [105]K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research,10:207-244,2009.
    [106]A. Frank and A. Asuncion. UCI machine learning repository, http://www.ics.uci.edu.
    [107]G. Griffin, A. Holub, and P. Perona. Caltech-256 object category data set. Technical Report UCB/CSD-04-1366, California Inst. of Technology,2007.
    [108]A. Farhadi, I. Endres, and D. Hoiem. Attribute-centric recognition for cross-category generalization. In Proceedings of the 23rd IEEE Conference on Computer Vision and Pattern Recognition, pages 2352-2359,2010.
    [109]A. Farhadi, I. Endres, D. Hoiem, and D.A. Forsyth. Describing objects by their attributes. In Proceedings of the 22nd IEEE Conference on Computer Vision and Pattern Recognition, pages 1778-1785,2009.
