Application of the Principal Factor Approximation Method in Variable Selection
  • English title: Application of Principal Factor Approximation Method in Variable Selection
  • Authors: 许健 ; 崔靓然 ; 李雅芝 ; 张祎璠
  • Authors (English): XU Jian; CUI Liangran; LI Yazhi; ZHANG Yifan; School of Information Science and Technology, Hunan Agricultural University
  • Keywords: variable selection; principal factor approximation; partial least squares; variable collinearity
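  • Chinese journal title: YYSF
  • English journal title: Journal of Hunan Institute of Science and Technology (Natural Sciences)
  • Institution: School of Information Science and Technology, Hunan Agricultural University
  • Publication date: 2019-03-15
  • Publisher: Journal of Hunan Institute of Science and Technology (Natural Sciences)
  • Year: 2019
  • Issue: v.32; No.101
  • Funding: Hunan Agricultural University Youth Natural Science Fund (16QN11)
  • Language: Chinese
  • Pages: YYSF201901003
  • Page count: 6
  • CN: 01
  • ISSN: 43-1421/N
  • Classification number: 12-16+63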
Abstract
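When the number of variables in a data set far exceeds the number of samples, collinearity among the variables becomes especially pronounced. Partial least squares, a latent variable method, transforms the original variables through linear combinations into a few new latent variables that are used to model and explain the response variable, but the complex collinearity among the variables makes variable selection very difficult. This paper uses the principal factor approximation method to separate out the collinearity information among the original variables before performing variable selection. Simulation studies show that the principal factor approximation method can effectively improve the accuracy of variable selection.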
        Collinearity among variables becomes particularly acute when the variables far outnumber the samples. As a latent variable method, partial least squares (PLS) transforms the original variables into a few new factors by linear combination, which are then used to model and interpret the response variable. However, the complex correlation structure of the sample data makes variable selection a difficult task. In this paper, we introduce a principal factor approximation (PFA) method to directly eliminate the effect of sample correlation on the observed values of the regression coefficients. Simulation studies were performed under three typical sample data correlation structures, and the results showed that PFA and PLS perform comparably well.
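The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of factor adjustment, not the authors' procedure. It assumes that a vector of per-variable statistics `z` and their correlation matrix `corr` are already available, approximates the shared dependence by the top `k` eigen-components of `corr`, estimates the factor values by ordinary least squares on the statistics with the smallest magnitudes, and subtracts the fitted common part. The function name `factor_adjust`, the parameter `keep_frac`, the threshold of 3.0, and the noise-free toy data are all assumptions made for illustration.

```python
import numpy as np

def factor_adjust(z, corr, k, keep_frac=0.9):
    """Subtract the part of z explained by the top-k principal factors of corr."""
    # eigh returns eigenvalues in ascending order; pick the k largest.
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:k]
    # Factor loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
    # Estimate factor values from the statistics with the smallest |z|,
    # which are the ones most likely to carry no real signal.
    m = max(k + 1, int(keep_frac * len(z)))
    small = np.argsort(np.abs(z))[:m]
    w, *_ = np.linalg.lstsq(loadings[small], z[small], rcond=None)
    return z - loadings @ w, w

# Toy, noise-free example (hypothetical numbers): 50 statistics that share
# one common factor, of which only three carry a true signal.
p, c = 50, 0.8
signs = np.where(np.arange(p) % 2 == 0, 1.0, -1.0)
corr = c**2 * np.outer(signs, signs) + (1 - c**2) * np.eye(p)

signal = np.zeros(p)
signal[[3, 17, 42]] = [-6.0, -5.0, 7.0]
z = signal + 4.0 * signs          # the common factor shifts every statistic

adjusted, w = factor_adjust(z, corr, k=1)
naive = np.flatnonzero(np.abs(z) > 3.0)            # selection without adjustment
selected = np.flatnonzero(np.abs(adjusted) > 3.0)  # selection after adjustment
print(len(naive), selected)
```

In this constructed case the unadjusted statistics all exceed the threshold because of the shared factor, while the adjusted ones isolate the three signal variables. Real data contain noise and signals that are not conveniently larger than the common component, so a robust regression and a data-driven choice of `k` would likely be needed in practice.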
References
[1] Pearson K. Mathematical Contributions to the Theory of Evolution: On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs[J]. Proceedings of the Royal Society of London (1854-1905), 1896, 60(1):489~98
[2] Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2008, 70(5):849~911
[3] Trygg J, Wold S. Orthogonal projections to latent structures (O-PLS)[J]. Journal of Chemometrics, 2002, 16(3):119~28
[4] Wold S, Sjöström M, Eriksson L. PLS-regression: a basic tool of chemometrics[J]. Chemometrics and Intelligent Laboratory Systems, 2001, 58(2):109~30
[5] Centner V, Massart D L, de Noord O E, et al. Elimination of uninformative variables for multivariate calibration[J]. Analytical Chemistry, 1996, 68(21):3851~8
[6] Cai W S, Li Y K, Shao X G. A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra[J]. Chemometrics and Intelligent Laboratory Systems, 2008, 90(2):188~94
[7] Fernandez Pierna J A, Abbas O, Baeten V, et al. A Backward Variable Selection method for PLS regression (BVSPLS)[J]. Analytica Chimica Acta, 2009, 642(1-2):89~93
[8] Hoskuldsson A. Variable and subset selection in PLS regression[J]. Chemometrics and Intelligent Laboratory Systems, 2001, 55(1-2):23~38
[9] Andersen C M, Bro R. Variable selection in regression: a tutorial[J]. Journal of Chemometrics, 2010, 24(11-12):728~37
[10] Leek J T, Storey J D. A general framework for multiple testing dependence[J]. Proceedings of the National Academy of Sciences of the United States of America, 2008, 105(48):18718~23
[11] Fan J, Han X, Gu W. Estimating False Discovery Proportion Under Arbitrary Covariance Dependence[J]. Journal of the American Statistical Association, 2012, 107(499):1019~35
