Abstract
This paper considers ultra-high dimensional partially linear varying coefficient models in which the dimension of the covariates in the linear part grows exponentially with the sample size. Because ultra-high dimensional covariates are correlated with one another, a greedy profile forward regression (GPFR) method is proposed to screen the covariates in the linear part. Under certain regularity conditions, the proposed GPFR method is proven to be screening consistent. GPFR produces a sequence of nested models, so the extended Bayesian information criterion (EBIC) is used to decide whether a candidate predictor enters the model and to select the "best" model. The finite-sample performance of GPFR is assessed through simulation studies and a real data analysis. The results show that GPFR has a clear advantage when the predictors are highly correlated and the signal-to-noise ratio is low.
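The idea of building a nested sequence of models and then choosing one by EBIC can be illustrated with a minimal sketch. It is not the GPFR algorithm of the paper: it assumes an ordinary linear model, so it omits the profile step that removes the varying coefficient part and the greedy handling of correlated covariates. The function name `forward_regression_ebic`, its arguments, and the EBIC form log(RSS/n) + |M|(log n + 2 log p)/n, which is the form commonly used with forward regression screening, are assumptions made for illustration.

```python
import numpy as np


def forward_regression_ebic(X, y, max_steps):
    """Plain forward regression with EBIC selection (illustrative sketch only).

    Assumes a linear model with centered data; no profile step, no greedy grouping.
    Returns the EBIC-minimizing model and the full nested path.
    """
    n, p = X.shape
    selected = []   # indices chosen so far, in order of entry
    path = []       # nested models with their EBIC values

    for _ in range(max_steps):
        best_j, best_rss = None, np.inf
        # Try each remaining covariate and keep the one with the smallest RSS.
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ beta
            rss = float(resid @ resid)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        # EBIC of the current nested model (assumed form, see lead-in).
        size = len(selected)
        ebic = np.log(best_rss / n) + size * (np.log(n) + 2.0 * np.log(p)) / n
        path.append((list(selected), ebic))

    # The "best" model is the nested model with the smallest EBIC.
    best_model = min(path, key=lambda item: item[1])[0]
    return best_model, path
```

Given a design matrix `X` of shape `(n, p)` and a response `y`, `forward_regression_ebic(X, y, max_steps=6)` returns the chosen index set together with the whole path, so the EBIC curve along the nested models can be inspected.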
References
[1]FAN J Q,LV J C.Sure independence screening for ultrahigh dimensional feature space[J].Journal of the Royal Statistical Society:Series B,2008,70(5):849-911.
[2]FAN J Q,SONG R.Sure independence screening in generalized linear models with NP-dimensionality[J].The Annals of Statistics,2010,38(6):3567-3604.
[3]ZHU L P,LI L X,LI R Z,et al.Model-free feature screening for ultrahigh-dimensional data[J].Journal of the American Statistical Association,2011,106(496):1464-1475.
[4]LI G R,PENG H,ZHANG J,et al.Robust rank correlation based screening[J].The Annals of Statistics,2012,40(3):1846-1877.
[5]LI R Z,ZHONG W,ZHU L P.Feature screening via distance correlation learning[J].Journal of the American Statistical Association,2012,107(499):1129-1139.
[6]FAN J Q,MA Y B,DAI W.Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models[J].Journal of the American Statistical Association,2014,109(507):1270-1284.
[7]WANG H S.Forward regression for ultra-high dimensional variable screening[J].Journal of the American Statistical Association,2009,104(488):1512-1524.
[8]LIANG H,WANG H S,TSAI C L.Profile forward regression for ultrahigh dimensional variable screening in semiparametric partially linear models[J].Statistica Sinica,2012,22(2):531-554.
[9]CHENG M Y,HONDA T,ZHANG J T.Forward variable selection for sparse ultra-high dimensional varying coefficient models[J].Journal of the American Statistical Association,2016,111(515):1209-1221.
[10]LI Y J,LI G R,LIAN H,et al.Profile forward regression screening for ultra-high dimensional semiparametric varying coefficient partially linear models[J].Journal of Multivariate Analysis,2017,155:133-150.
[11]CHENG M Y,FENG S Y,LI G R,et al.Greedy forward regression for variable screening[J].Australian & New Zealand Journal of Statistics,2018,60(1):20-42.
[12]FAN J Q,HUANG T.Profile likelihood inferences on semiparametric varying-coefficient partially linear models[J].Bernoulli,2005,11(6):1031-1057.
[13]LAM C,FAN J Q.Profile-kernel likelihood inference with diverging number of parameters[J].The Annals of Statistics,2008,36(5):2232-2260.
[14]LI R Z,LIANG H.Variable selection in semiparametric regression modeling[J].The Annals of Statistics,2008,36(1):261-286.
[15]ZHAO P X,XUE L G.Variable selection for semiparametric varying coefficient partially linear models[J].Statistics & Probability Letters,2009,79(20):2148-2157.
[16]WANG H X,ZHU Z Y,ZHOU J H.Quantile regression in partially linear varying coefficient models[J].The Annals of Statistics,2009,37(6):3841-3866.
[17]KAI B,LI R Z,ZOU H.New efficient estimation and variable selection methods for semiparametric varying coefficient partially linear models[J].The Annals of Statistics,2011,39(1):305-332.
[18]HONG Z P,HU Y,LIAN H.Variable selection for high-dimensional varying coefficient partially linear models via nonconcave penalty[J].Metrika,2013,76(7):887-908.
[19]LI G R,XUE L G,LIAN H.Semi-varying coefficient models with a diverging number of components[J].Journal of Multivariate Analysis,2011,102:1166-1174.
[20]XUE L G.Modern statistical models[M].Beijing:Science Press,2012:275-300.
[21]FAN J Q,GIJBELS I.Local polynomial modelling and its applications[M].London:Chapman & Hall,1996:57-105.
[22]VOTAVOVA H,DOSTALOVA MERKEROVA M,FEJGLOVA K,et al.Transcriptome alterations in maternal and fetal cells induced by tobacco smoke[J].Placenta,2011,32(10):763-770.
[23]DUDOIT S,FRIDLYAND J,SPEED T P.Comparison of discrimination methods for the classification of tumors using gene expression data[J].Journal of the American Statistical Association,2002,97(457):77-87.
[24]GILLMAN M,RIFAS-SHIMAN S,BERKEY C,et al.Maternal gestational diabetes,birth weight,and adolescent obesity[J].Pediatrics,2003,111(3):221-226.