Linear Discriminant Analysis of High-dimensional Data Using Random Matrix Theory (基于随机矩阵理论的高维数据线性判别分析方法)
  • Title (English): Linear Discriminant Analysis of High-dimensional Data Using Random Matrix Theory
  • Authors: LIU Peng (刘鹏); YE Bin (叶宾)
  • Keywords: linear discriminant analysis; high-dimensional data; random matrix theory; classification; covariance matrix
  • Journal code: JSJA
  • Journal (English title): Computer Science
  • Institution: School of Information and Control Engineering, China University of Mining and Technology
  • Publication date: 2019-06-15
  • Publisher: Computer Science (计算机科学)
  • Year: 2019
  • Volume: 46
  • Funding: Xuzhou Applied Basic Research Program (KC18069)
  • Language: Chinese
  • Article ID: JSJA2019S1091
  • Page count: 4
  • Issue: S1
  • CN: 50-1075/TP
  • Pages: 433-436
Abstract
Linear discriminant analysis (LDA) is a widely used model-based classification method in machine learning and data mining. Although it performs well in many practical applications, it performs poorly on high-dimensional data. The reason is that when the number of variables p is close to or larger than the number of samples n, the sample covariance matrix is no longer a good estimator of the population covariance matrix, so the values of the linear discriminant function become heavily biased. This paper proposes a regularization method for high-dimensional classifiers based on random matrix theory. First, the high-dimensional covariance matrix is consistently estimated using random matrix theory, with the rotation-invariant estimator when p ≤ n and eigenvalue clipping when p > n. Second, the estimated covariance matrix is used to compute the values of the discriminant function. Classification experiments on simulated data sets and three microarray data sets show that the proposed method applies to a wider range of high-dimensional settings and yields higher classification accuracy than existing competitors.
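The two steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not give their formulas, so the code assumes a common form of eigenvalue clipping in which eigenvalues of the sample correlation matrix at or below the Marchenko-Pastur upper edge (1 + sqrt(p/n))^2 are replaced by their average. The function names (clipped_covariance, fit_lda, predict_lda) and the synthetic data are hypothetical, the rotation-invariant estimator for p ≤ n is not shown, and the features are assumed to be non-constant.

```python
import numpy as np


def clipped_covariance(X):
    """Eigenvalue-clipped covariance estimate (illustrative assumption).

    X has shape (n, p). Eigenvalues of the sample correlation matrix at or
    below the Marchenko-Pastur upper edge are treated as noise and replaced
    by their mean, which leaves the trace of the correlation matrix unchanged.
    """
    n, p = X.shape
    std = X.std(axis=0, ddof=1)
    corr = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    edge = (1.0 + np.sqrt(p / n)) ** 2
    noise = eigval <= edge
    cleaned = eigval.copy()
    if noise.any():
        cleaned[noise] = eigval[noise].mean()
    # Rebuild V diag(cleaned) V^T, then rescale back to covariance units.
    corr_clean = (eigvec * cleaned) @ eigvec.T
    return corr_clean * np.outer(std, std)


def fit_lda(X, y, cov_estimator=clipped_covariance):
    """Estimate class means, priors, and a regularized precision matrix."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    # Pool the classes by removing each sample's own class mean.
    centered = X - means[np.searchsorted(classes, y)]
    precision = np.linalg.inv(cov_estimator(centered))
    return classes, means, priors, precision


def predict_lda(model, X):
    """Assign each row of X to the class with the largest discriminant value."""
    classes, means, priors, precision = model
    # delta_k(x) = x' S^-1 mu_k - 0.5 * mu_k' S^-1 mu_k + log(pi_k)
    proj = means @ precision
    scores = X @ proj.T - 0.5 * np.sum(proj * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]


# Synthetic two-class example with more variables than samples (p > n).
rng = np.random.default_rng(0)
n_per_class, p = 30, 100
y = np.repeat([0, 1], n_per_class)
X = rng.standard_normal((2 * n_per_class, p))
X[y == 1, :10] += 1.0  # class 1 is shifted in the first 10 variables

model = fit_lda(X, y)
train_accuracy = np.mean(predict_lda(model, X) == y)
print("training accuracy:", train_accuracy)
```

With p > n the raw sample covariance is singular and cannot be inverted, whereas the clipped estimate stays positive definite as long as at least one nonzero eigenvalue falls below the edge, which is why the discriminant values can still be computed in this setting.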