Research and Applications of Semi-supervised Manifold Learning Algorithms
Abstract
As a nonlinear dimensionality reduction technique, manifold learning can uncover the intrinsic structure of complex data sets and provide a basis for further processing. Several mature manifold learning algorithms have been developed and successfully applied in fields such as pattern recognition and machine vision.
     Most existing manifold learning algorithms are unsupervised and make no use of prior information about the samples. When prior information is available for part of the samples, it can be exploited during training to improve classification performance, extending an ordinary learning algorithm into its semi-supervised counterpart.
     This thesis studies semi-supervised extensions of manifold learning. After reviewing and analyzing existing approaches, we propose a semi-supervised Laplacian Eigenmaps (SS-LE) algorithm based on the classical Laplacian Eigenmaps (LE) method. By exploiting the known information of a small number of samples, SS-LE substantially improves the accuracy of the computed low-dimensional embedding coordinates. We also compare SS-LE with semi-supervised locally linear embedding (SS-LLE) in terms of computational complexity and accuracy: as the neighborhood size k grows, the computational cost of SS-LE stays far below that of SS-LLE while its accuracy drops only slightly, and even with a small k SS-LE already approaches the best accuracy achieved by SS-LLE. Finally, experiments on synthetic and real-world data validate SS-LE in dimensionality reduction, face recognition, visualization, and video object tracking, with the expected results.
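The abstract does not reproduce the SS-LE derivation, but its core idea, fixing the known low-dimensional coordinates of a few samples and solving the Laplacian Eigenmaps objective for the remaining ones, can be illustrated with a minimal sketch. The function name, the heat-kernel weighting, and the parameters below are illustrative choices, not taken from the thesis; the sketch also assumes the neighborhood graph is connected and that each component contains at least one sample with known coordinates, so that the sub-Laplacian over the unknown points is invertible.

```python
import numpy as np
from scipy.spatial.distance import cdist


def ss_laplacian_eigenmaps(X, known_idx, Y_known, k=8, sigma=1.0):
    """Sketch of a semi-supervised Laplacian Eigenmaps step.

    X         : (n, D) high-dimensional samples
    known_idx : indices of samples whose low-dimensional coordinates are given
    Y_known   : (m, d) prior low-dimensional coordinates of those samples
    k         : number of nearest neighbours for the graph (assumed parameter)
    sigma     : heat-kernel width for the edge weights (assumed parameter)
    """
    n = X.shape[0]
    D2 = cdist(X, X, 'sqeuclidean')

    # k-nearest-neighbour adjacency with heat-kernel weights, symmetrised
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]          # skip the point itself
        W[i, nbrs] = np.exp(-D2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)

    # unnormalised graph Laplacian L = D - W
    L = np.diag(W.sum(axis=1)) - W

    # split the points into those with known and unknown embedding coordinates
    known_idx = np.asarray(known_idx)
    unknown_idx = np.setdiff1d(np.arange(n), known_idx)

    # minimising tr(Y^T L Y) with Y fixed on the labelled points leads to the
    # linear system  L_uu Y_u = -L_ul Y_l  for the unknown coordinates
    L_uu = L[np.ix_(unknown_idx, unknown_idx)]
    L_ul = L[np.ix_(unknown_idx, known_idx)]
    Y_unknown = np.linalg.solve(L_uu, -L_ul @ Y_known)

    # assemble the full embedding
    Y = np.zeros((n, Y_known.shape[1]))
    Y[known_idx] = Y_known
    Y[unknown_idx] = Y_unknown
    return Y
```

In this formulation the known coordinates act as boundary conditions, so the unknown coordinates follow from a single sparse linear solve rather than the generalized eigenproblem required by unsupervised LE; with no labelled samples the problem falls back to the ordinary LE objective.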
