Large Margin Feature Selection in Group Sparse Subspaces
Abstract
With the rapid development of information technology and biotechnology, large volumes of high-dimensional data are being produced. Feature selection, as an effective analysis method for high-dimensional data, has attracted increasing attention from researchers. Feature selection refers to choosing, from a given feature set, a subset that represents the original data well. The quality of feature selection on high-dimensional data directly affects the final result of the subsequent learning algorithm. Therefore, feature selection for high-dimensional data is also an important topic in machine learning research.
     This dissertation takes high-dimensional data as its research object and, in view of practical problems in this field, studies feature selection methods for high-dimensional data. The main contents and contributions are as follows:
     ① A locally linear method is adopted to effectively represent the structure of nonlinear high-dimensional data. Any complex problem can be decomposed into multiple locally linear problems. Based on this idea, this dissertation uses each sample's nearest neighbor of the same class and nearest neighbor of a different class to locally and linearly approximate the structure of high-dimensional nonlinear data. The method is simple, efficient, and intuitive.
     ② A large margin feature selection model in a group sparse subspace (GSLM) is established. The distance from a sample to its nearest neighbor of a different class and the distance to its nearest neighbor of the same class are projected into the subspace; their difference is taken as the sample's margin in the subspace, and summing these margins over all samples gives the total margin. Maximizing this margin makes the projected distance to the different-class nearest neighbor as large as possible and the distance to the same-class nearest neighbor as small as possible. Maximizing the margin therefore preserves the nearest-neighbor information in the subspace as much as possible, so that effective features are selected. To interpret the features selected through the subspace more reasonably, to give the model strong robustness against interference, and to avoid destroying the existing probability distribution of the samples, the L2,1 norm is introduced into the objective function as a regularization term. An efficient algorithm is proposed for this objective function, which obtains a locally optimal solution. Experiments verify that the model selects good features and that the corresponding algorithm is highly efficient.
     ③ A Trace Ratio-based large margin feature selection model in a group sparse subspace (TR-GSLM) is proposed. The quotient of the distance from a sample to its nearest neighbor of a different class divided by the distance to its nearest neighbor of the same class is taken as the margin. The larger this margin, the larger the distance to the different-class nearest neighbor and the smaller the distance to the same-class nearest neighbor. Maximizing this margin therefore also preserves the nearest-neighbor information in the subspace, so that effective features are selected. For the non-convex objective function, a new iterative algorithm is proposed that obtains the globally optimal solution, and a proof of its convergence is given. Experiments verify that the features selected by this model are better than those of GSLM, although its solving algorithm is less efficient.
     ④ An enhanced Trace Ratio-based large margin feature selection model in a group sparse subspace (ETR-GSLM) is proposed. A variable substitution is used to improve the efficiency of the TR-GSLM algorithm; the objective function is constructed accordingly and a corresponding solving algorithm is proposed. Although the objective function is highly complex and non-convex, the algorithm still obtains the globally optimal solution. Since the solving procedure requires the sample margin matrix to remain positive definite, a modified Cholesky algorithm is adopted to guarantee its positive definiteness. Finally, convergence of the algorithm is proven. Experiments verify that the features selected by this model are better than those of the previous two models, and that the algorithm is more efficient than that of TR-GSLM.
     Extensive experiments verify that the three proposed algorithms achieve better classification accuracy than related algorithms, and that they are insensitive to the kernel function parameter and the regularization parameter. In terms of running time, the proposed GSLM algorithm has an advantage.
With the rapid development of information technology, biotechnology and Internet technology, high-dimensional data are becoming increasingly common. Feature selection is an effective analysis method for high-dimensional data and has attracted more and more researchers. Feature selection is to find, in a given feature set, a subset that represents the original data well. The result of the subsequent learning algorithm is directly affected by the feature selection performed on high-dimensional data.
     In this dissertation, we investigate feature selection methods and apply them to high-dimensional data. The main contents and contributions include:
     ① We use a locally linear method to represent the structure of nonlinear high-dimensional data. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones. We take each sample's nearest neighbor with the same label and its nearest neighbor with a different label to approximate the nonlinear high-dimensional data structure. The method is very simple, efficient and intuitive.
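     As a concrete illustration of this locally linear construction, the short Python sketch below computes, for every sample, the same-label nearest neighbor (near-hit) and the different-label nearest neighbor (near-miss). The function name and implementation details are ours for illustration and are not taken from the dissertation.

import numpy as np

def nearest_hit_miss(X, y):
    """For each sample, find the nearest neighbor with the same label (near-hit)
    and the nearest neighbor with a different label (near-miss).
    Returns two integer index arrays of length n."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)            # a sample is never its own neighbor
    hit = np.empty(n, dtype=int)
    miss = np.empty(n, dtype=int)
    for i in range(n):
        same = (y == y[i])
        same[i] = False                     # exclude the sample itself
        diff = ~(y == y[i])
        # map the argmin over the masked subset back to a global index
        hit[i] = np.where(same)[0][np.argmin(d2[i, same])]
        miss[i] = np.where(diff)[0][np.argmin(d2[i, diff])]
    return hit, miss

     (The sketch assumes every class contains at least two samples; otherwise a near-hit does not exist.)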
     ② We establish a large margin model of group sparse subspace for feature selection (GSLM). Each sample's local nearest neighbors are projected into the subspace, and the margin is defined there: the distance to the nearest neighbor with a different label and the distance to the nearest neighbor with the same label are projected into the subspace, their difference is taken as the sample's margin, and the sum of these margins over all samples is the total margin. Maximizing the margin makes the projected distance to the different-label nearest neighbor as large as possible and the distance to the same-label nearest neighbor as small as possible, which preserves the nearest-neighbor information and allows useful features to be selected. To interpret the selected features more reasonably and to make the model robust, we incorporate the L2,1 norm into the objective function. We propose an efficient algorithm to solve this objective function; it obtains a locally optimal solution. Comprehensive experiments verify that the model selects good features and that the corresponding solving algorithm is very efficient.
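     The abstract does not reproduce the GSLM objective itself. A hedged reconstruction consistent with the description above, where W is the projection matrix, NH(x_i) and NM(x_i) denote the same-label and different-label nearest neighbors, and λ is the regularization parameter, would read:

\max_{W}\;\sum_{i=1}^{n}\Big(\big\|W^{\top}\big(x_i-\mathrm{NM}(x_i)\big)\big\|_2-\big\|W^{\top}\big(x_i-\mathrm{NH}(x_i)\big)\big\|_2\Big)\;-\;\lambda\,\|W\|_{2,1},\qquad \|W\|_{2,1}=\sum_{j=1}^{d}\Big(\sum_{k}W_{jk}^{2}\Big)^{1/2}.

     The L2,1 regularizer drives whole rows of W toward zero, so the features corresponding to the surviving rows are the selected ones.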
     ③ We establish a large margin model of Trace Ratio-group sparse subspace for feature selection (TR-GSLM). This method takes the distance to the nearest neighbor with a different label divided by the distance to the nearest neighbor with the same label as the margin: the larger this margin, the larger the distance to the different-label nearest neighbor and the smaller the distance to the same-label nearest neighbor. We use the trace ratio to build the objective function and then incorporate the L2,1 norm into it. For the non-convex objective function, we propose a novel algorithm that obtains the globally optimal solution, and the convergence of the algorithm is proven. Comprehensive experiments verify that the model selects better features than GSLM, but its algorithm is less efficient.
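     Again as a hedged sketch rather than the dissertation's exact formulation: with scatter-like matrices built from the different-label and same-label nearest-neighbor differences, a trace-ratio objective of the kind described above can be written as

\max_{W}\;\frac{\operatorname{tr}\big(W^{\top}S_{b}W\big)}{\operatorname{tr}\big(W^{\top}S_{w}W\big)}\;-\;\lambda\,\|W\|_{2,1},\qquad S_{b}=\sum_{i}\big(x_i-\mathrm{NM}(x_i)\big)\big(x_i-\mathrm{NM}(x_i)\big)^{\top},\quad S_{w}=\sum_{i}\big(x_i-\mathrm{NH}(x_i)\big)\big(x_i-\mathrm{NH}(x_i)\big)^{\top}.

     Maximizing the ratio enlarges projected different-label distances relative to same-label distances; how exactly the L2,1 term is combined with the ratio is our assumption, not stated in the abstract.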
     ④ We establish an enhanced large margin model of Trace Ratio-group sparse subspace for feature selection (ETR-GSLM). We use variable substitution to improve the efficiency of the TR-GSLM algorithm. Although the objective function is extremely complex and non-convex, the algorithm can still obtain the globally optimal solution. To guarantee the positive definiteness of the sample margin matrix, we adopt a modified Cholesky method to correct indefinite matrices. Provided the margin matrix is positive definite, the algorithm converges.
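     The dissertation relies on a modified Cholesky algorithm to keep the margin matrix positive definite. As an illustration of that kind of correction, the sketch below uses the simpler "Cholesky with an added multiple of the identity" variant; it shows the idea but is not the exact method used in the dissertation.

import numpy as np

def make_positive_definite(M, beta=1e-3, max_tries=60):
    """Return M + tau*I with tau chosen so that a Cholesky factorization succeeds,
    i.e. the matrix is numerically positive definite."""
    M = (M + M.T) / 2.0                         # symmetrize first
    min_diag = np.min(np.diag(M))
    tau = 0.0 if min_diag > 0 else beta - min_diag
    n = M.shape[0]
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(M + tau * np.eye(n))
            return M + tau * np.eye(n)          # factorization succeeded
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, beta)          # increase the diagonal shift
    raise np.linalg.LinAlgError("could not make the matrix positive definite")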
     Comprehensive experiments are conducted to compare the proposed algorithms with five other state-of-the-art algorithms: RFS, SPFS, mRMR, TR and LLFS. Our algorithms achieve better performance than these five algorithms. The proposed algorithms are not sensitive to the kernel function parameters or the regularization parameters. Compared with LLFS and TR, the proposed GSLM algorithm has competitive performance with significantly faster computation time.
