Greedy Variable Screening in Ultra-high Dimensional Partially Linear Varying Coefficient Models
  • English title: Greedy Variable Screening in Ultra-high Dimensional Partially Linear Varying Coefficient Models
  • Authors: LI Yujie; LI Gaorong
  • Affiliations: College of Applied Sciences, Beijing University of Technology; Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology
  • Keywords: forward regression; partially linear varying coefficient model; variable screening; screening consistency property; ultra-high dimensional
  • Journal code: BJGD
  • Journal: Journal of Beijing University of Technology
  • Publication date: 2018-06-06 11:11
  • Publisher: Journal of Beijing University of Technology
  • Year: 2018
  • Issue: v.44
  • Funding: National Natural Science Foundation of China (11471029)
  • Language: Chinese
  • Page: BJGD201809012
  • Number of pages: 10
  • CN: 09
  • ISSN: 11-2286/T
  • Classification number: 87-96
Abstract
This paper considers ultra-high dimensional partially linear varying coefficient models in which the dimension of the covariates in the linear part grows exponentially with the sample size. Because ultra-high dimensional covariates are correlated with one another, a greedy profile forward regression (GPFR) method is proposed to screen the covariates of the linear part. Under certain regularity conditions, the GPFR method is proven to have the screening consistency property. The GPFR procedure produces a sequence of nested models; to decide whether a candidate predictor should enter the model, the extended Bayesian information criterion (EBIC) is used to select the "best" candidate model. The finite-sample performance of the GPFR algorithm is assessed through simulation studies and a real data analysis. The results show that GPFR has a clear advantage when the predictors are highly correlated and the signal-to-noise ratio is low.
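The idea of building a nested sequence of models by forward regression and then picking one with EBIC can be illustrated with a minimal sketch. This is not the GPFR method of the paper: it is plain forward regression in an ordinary linear model, with no profile step for varying coefficients and no greedy modification. The function name `forward_regression_ebic`, the simulated data, and the particular EBIC form log(RSS/n) + k(log n + 2 log p)/n are assumptions made here for illustration only.

```python
import numpy as np

def forward_regression_ebic(X, y, max_steps):
    """Plain forward regression with EBIC model choice (illustrative only)."""
    n, p = X.shape
    selected = []    # predictor indices in the order they enter the model
    ebic_path = []   # EBIC value of each nested model along the path
    for _ in range(max_steps):
        best_j, best_rss = None, np.inf
        # Try every predictor not yet selected and keep the one that
        # gives the smallest residual sum of squares when added.
        for j in range(p):
            if j in selected:
                continue
            cols = X[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            rss = float(np.sum((y - cols @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        k = len(selected)
        # Assumed EBIC form: log(RSS/n) + k * (log n + 2 log p) / n
        ebic_path.append(np.log(best_rss / n) + k * (np.log(n) + 2 * np.log(p)) / n)
    # The chosen model is the nested model with the smallest EBIC.
    best_size = int(np.argmin(ebic_path)) + 1
    return selected[:best_size], selected, ebic_path

# Simulated example (hypothetical data): only predictors 0, 1, 2 matter.
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.standard_normal(n)

chosen, path, ebic = forward_regression_ebic(X, y, max_steps=10)
print("chosen model:", sorted(chosen))
print("entry order:", path)
```

The predictors in this sketch are independent, which is the easy case; the abstract's claim concerns the harder setting of highly correlated predictors and low signal-to-noise ratio, which this toy example does not reproduce.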
