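     2. The classical PCA and KPCA algorithms are both modeled in the least-squares sense, so their solutions lack sufficient robustness. Even a small number of outliers mixed into the data can strongly bias the principal component directions they compute. To address this problem, this thesis proposes a robust nonlinear dimensionality reduction algorithm, IRobust KPCA. The algorithm implicitly identifies and suppresses outliers in the data and can learn an accurate nonlinear subspace. Because it updates its computation iteratively, the algorithm also has a potential advantage for incremental learning. Comparative experiments against the standard KPCA algorithm demonstrate its effectiveness and robustness.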
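     4. Based on the idea of manifold regularization, a semi-supervised learning algorithm for multi-class problems, MLapRLS, is proposed. MLapRLS applies a multivariate regression model to classification and builds a nearest-neighbor graph over all samples to estimate the geometric structure of the whole data space, which serves as a regularization term on the regression objective. In this algorithm, the role of the unlabeled samples is to help estimate the local geometric structure of the data space and thus to obtain more effective discriminant vectors. Experimental results on the Extended YaleB and PIE face databases demonstrate the effectiveness of the algorithm.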
Many pattern recognition and data mining problems, such as face recognition, digital image recognition and data visualization, involve data in very high-dimensional spaces. High feature dimensionality not only increases the computational cost of algorithms, but also introduces redundancy and obscures the intrinsic structure of the data. Dimensionality reduction is an effective tool for dealing with this problem: it helps reveal the essential structure of the input data and makes it possible to accomplish the desired learning tasks at low computational cost. As a result, research on dimensionality reduction has long been important in the related scientific fields.
     This thesis focuses on the theory and methods of dimensionality reduction for high-dimensional data, as well as related applications in face recognition. The main contents and contributions are as follows:
     1. The characteristics and advantages of existing dimensionality reduction algorithms are summarized from two perspectives, global statistics-based and local geometry-based. The internal relations among the various algorithms are also analyzed.
     2. The classical PCA and KPCA algorithms are both formulated in the least-squares sense and therefore lack robustness: even a small number of outliers in the input data can noticeably bias the principal directions they compute. To deal with this problem, we propose a robust nonlinear principal component analysis technique called IRobust KPCA. The algorithm effectively suppresses the effect of outliers and produces an accurate nonlinear subspace. In addition, because IRobust KPCA computes its solution iteratively, it has the potential to be extended to an incremental learning version. Comparative experiments against standard KPCA demonstrate the effectiveness and robustness of IRobust KPCA.
     3. Focusing on dimensionality reduction for manifold learning and pattern classification on high-dimensional data, we propose a new supervised dimensionality reduction algorithm. The classical LDA method considers only the global statistical information of the samples and tends to fail on nonlinearly distributed data, whereas manifold learning algorithms have shown great power in discovering the intrinsic structure of high-dimensional data. We therefore adopt the locality-preserving idea and develop a new algorithm called Sub-manifold Discriminant Analysis (SMDA). SMDA finds low-dimensional embeddings of the input data by maximizing the sub-manifold margin while maintaining the neighborhood relations among samples. In addition, an optimized process of intrinsic structure discovery is adopted to avoid the limitations of existing locality-preserving methods. Experimental results on the Yale and UMIST face databases demonstrate the effectiveness of SMDA and its superiority over the popular PCA, LDA, LPP and MFA algorithms.
     4. Building on the semi-supervised learning framework of manifold regularization, we propose a method called MLapRLS. In MLapRLS, a nearest-neighbor graph is first constructed to model the intrinsic geometric structure of the data space, and the graph structure is then incorporated into the objective function of multivariate linear regression as a regularization term. Aiming to extract effective features for the semi-supervised multi-class problem, MLapRLS makes use of both the limited labeled samples and the large number of unlabeled samples. Experimental results on the Extended YaleB and PIE face databases demonstrate the effectiveness of MLapRLS.
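The graph-regularized regression described in item 4 can be illustrated with a small sketch. This is not the thesis's implementation, whose exact objective is not given in the abstract. It is a generic linear Laplacian-regularized least-squares model under several assumptions: one-hot class targets, a symmetric k-nearest-neighbor graph with binary weights, no bias term, and a closed-form solution. The function names and the parameters `gamma_a`, `gamma_i` and `k` are hypothetical.

```python
import numpy as np


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetric binary kNN graph (assumed weighting)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a sample is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest neighbors of each sample
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                            # symmetrize the adjacency matrix
    return np.diag(W.sum(axis=1)) - W


def fit_mlaprls_sketch(X, y_labeled, labeled_idx, n_classes,
                       gamma_a=1e-2, gamma_i=1e-1, k=5):
    """Minimize ||Y_l - X_l W||^2 + gamma_a ||W||^2 + gamma_i tr(W^T X^T L X W).

    X holds labeled and unlabeled samples row-wise; only rows in labeled_idx have labels.
    """
    X = np.asarray(X, dtype=float)
    labeled_idx = np.asarray(labeled_idx)
    X_l = X[labeled_idx]
    Y_l = np.eye(n_classes)[np.asarray(y_labeled)]    # one-hot multivariate targets
    L = knn_graph_laplacian(X, k)                     # graph built over all samples
    d = X.shape[1]
    A = X_l.T @ X_l + gamma_a * np.eye(d) + gamma_i * (X.T @ L @ X)
    return np.linalg.solve(A, X_l.T @ Y_l)            # closed-form minimizer, shape (d, n_classes)


def predict_labels(X, W):
    """Assign each sample to the class with the largest regression output."""
    return np.argmax(np.asarray(X, dtype=float) @ W, axis=1)
```

In this sketch the unlabeled samples enter only through the Laplacian term, which penalizes projections that separate neighboring samples. This mirrors the role the abstract assigns to unlabeled data, although the weighting scheme and the solver used in the thesis may differ.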
