Robust Estimation and Feature Screening for Some Nonparametric and Semiparametric Models
Abstract
The interest in nonparametric and semiparametric modeling has grown quickly within the last decades, and there is a large literature investigating estimation methods for nonparametric and semiparametric regression. Nonparametric models maximize flexibility and minimize the risk of model misspecification. However, the price of this flexibility can be high for several reasons. First, estimation precision decreases rapidly as the dimension of the predictors increases, i.e., the curse of dimensionality, an unavoidable problem in nonparametric estimation. Second, it is difficult to integrate discrete predictors into a nonparametric specification. Third, it is hard to graph and interpret the estimated function in the multidimensional case. Semiparametric models offer a compromise between the flexibility of nonparametric models and the interpretability of parametric models: they make assumptions about functional form that are stronger than those of nonparametric models but less restrictive than those of parametric models, thereby reducing (though not eliminating) the possibility of specification error. Most existing estimation procedures are built on least squares, which is not robust and requires a finite second moment of the random error. On the other hand, with the rapid growth of data-collection capabilities, high and ultrahigh dimensional data frequently arise in social life and scientific research; variable selection and feature screening have therefore become another active research area in statistics. In this dissertation, we study robust estimation and robust feature screening methods for nonparametric and semiparametric models, respectively, to further develop the related methods and theory.
     Chapter 2 studies the general nonparametric regression model Y=m(T)+σ(T)ε, where Y is the response variable, T is a scalar covariate independent of the random error ε, E(ε)=0, and var(ε)=1. Assume that m(·) is smooth and σ(·) is positive. Kai, Li and Zou (2010) proposed the local composite quantile regression (LCQR) method for this model and proved that LCQR can significantly improve the estimation efficiency of local least squares (LLS) for non-normal symmetric error distributions, while losing only a small amount of efficiency for normal errors. However, the assumption of symmetric random errors is an indispensable prerequisite for the consistency of LCQR: without it, the LCQR estimate is no longer consistent. In practice the error density is generally unknown, so the symmetry assumption is strong. We therefore put forward a unified method called weighted local composite quantile regression (WLCQR) that yields asymptotically unbiased estimation of m(·) for general random errors, both asymmetric and symmetric. For any given point t0, we construct the WLCQR estimator of m(t0) by combining the initial estimators {âk, k=1,...,q} computed by the LCQR method of Kai, Li and Zou (2010) at the uniform quantile levels {τk=k/(q+1), k=1,...,q} with possibly different weights {ωk, k=1,...,q}. Denote by F^{-1}(·) the quantile function of the error ε. The WLCQR estimator of m(t0) is then defined as m̂(t0) = ∑_{k=1}^{q} ωk âk, where the weight vector ω=(ω1,ω2,...,ωq)^T satisfies the conditions ∑_{k=1}^{q} ωk = 1 and ∑_{k=1}^{q} ωk F^{-1}(τk) = 0.
     The condition ∑_{k=1}^{q} ωk F^{-1}(τk) = 0 eliminates the bias term in the asymptotic representation caused by asymmetric random errors, which guarantees that m̂(t0) is consistent and asymptotically unbiased. We then establish the asymptotic bias, asymptotic variance and asymptotic normality of m̂(t0).
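To make the construction concrete, the LCQR initial estimators minimize a kernel-weighted sum of check losses at the q quantile levels with a shared local slope. The following is a minimal sketch of this fit at a single point; the simulation design, bandwidth, optimizer, and the equal-weight combination at the end are illustrative choices, not the dissertation's own settings:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, q, t0, h = 400, 5, 0.5, 0.2
t = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=n)   # m(t) = sin(2*pi*t)

tau = np.arange(1, q + 1) / (q + 1)                    # uniform quantile levels
K = np.maximum(0.75 * (1 - ((t - t0) / h) ** 2), 0)    # Epanechnikov kernel weights

def lcqr_loss(theta):
    """Composite check loss: a_1..a_q are quantile intercepts, b a shared slope."""
    a, b = theta[:q], theta[q]
    r = y[:, None] - a[None, :] - b * (t - t0)[:, None]   # n x q residual matrix
    rho = r * (tau[None, :] - (r < 0))                    # check loss rho_tau(r)
    return np.sum(K[:, None] * rho)

theta0 = np.r_[np.full(q, y[K > 0].mean()), 0.0]          # crude local starting values
theta = minimize(lcqr_loss, theta0, method="Powell").x
a_hat = theta[:q]
m_hat = a_hat.mean()   # equal-weight combination; WLCQR replaces this with weights w_k
print(round(m_hat, 3))
```

With symmetric errors the equal-weight average of the quantile intercepts is already consistent for m(t0) = sin(π) = 0; WLCQR generalizes this by choosing unequal weights so that the combination stays consistent for asymmetric errors as well.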
     Furthermore, owing to the non-uniqueness of the weight vector ω, we derive the theoretically optimal weight vector ω* by minimizing the asymptotic variance, and consequently obtain the optimal estimate m̂*(t0) of m(t0) together with its minimum asymptotic variance. For symmetric errors, we compare the optimal estimate m̂*(t0) with both the LLS estimator and the LCQR estimator in terms of asymptotic relative efficiency. Monte Carlo simulations and a real data analysis further illustrate the theoretical findings.
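When the error distribution is known, minimizing the asymptotic variance subject to the two weight constraints is an equality-constrained quadratic program with a closed-form solution. A minimal sketch, assuming the standard CQR asymptotic covariance S[k,l] = (min(τk,τl) − τkτl)/(f(F^{-1}(τk)) f(F^{-1}(τl))); the abstract does not state the variance expression explicitly, so this matrix and the Lagrange-multiplier solution are standard results rather than the dissertation's own formulas:

```python
import numpy as np
from scipy.stats import norm

def optimal_wlcqr_weights(q, quantile_fn, density_fn):
    """Minimize w^T S w subject to sum(w) = 1 and sum(w * F^{-1}(tau_k)) = 0.

    Closed form via Lagrange multipliers: w* = S^{-1} A^T (A S^{-1} A^T)^{-1} b,
    where the rows of A encode the two linear constraints.
    """
    tau = np.arange(1, q + 1) / (q + 1)      # uniform quantile levels
    xi = quantile_fn(tau)                    # F^{-1}(tau_k)
    f = density_fn(xi)                       # f(F^{-1}(tau_k))
    S = (np.minimum.outer(tau, tau) - np.outer(tau, tau)) / np.outer(f, f)
    A = np.vstack([np.ones(q), xi])          # constraint matrix: A w = b
    b = np.array([1.0, 0.0])
    Sinv_At = np.linalg.solve(S, A.T)
    w = Sinv_At @ np.linalg.solve(A @ Sinv_At, b)
    return tau, w

# standard normal errors: by symmetry the bias constraint costs nothing
tau, w = optimal_wlcqr_weights(9, norm.ppf, norm.pdf)
print(np.round(w, 4))
```

For a symmetric distribution such as the normal, the optimal weight vector comes out symmetric in k, which is consistent with WLCQR reducing to a symmetric weighting in that case.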
     Chapter 3 examines the following varying-coefficient partially linear model Y=X^Tα(U)+Z^Tβ+ε,
     where α(U)={α1(U),...,αd1(U)}^T is a d1×1 vector of unknown smooth coefficient functions and β=(β1,...,βd2)^T is a d2×1 vector of unknown true parameters. Assume U is univariate and the random error ε is independent of the covariates {U,X,Z}, with E(ε)=0. For any given point u0, we develop a robust estimation procedure for this model via local rank regression. The model involves both nonparametric and parametric components, which should be estimated at nonparametric and parametric rates of convergence, respectively. Motivated by Kai, Li and Zou (2011), we propose a three-stage estimation procedure to carry out the local rank idea. In the first stage, we employ a local linear rank technique to derive initial estimates of β and α(u0). In the second stage, we use global rank regression to improve the convergence rate of the initial estimate of β. In the third stage, we use the local linear rank method again to improve the efficiency of the initial estimate of α(u0). We can then establish the asymptotic normality of the local rank estimate β̂LR of the parametric part and the local rank estimate α̂LR(u0) of the nonparametric part. Next we calculate the asymptotic relative efficiency (ARE) of the local rank method with respect to the local least squares method and find that local rank provides a highly efficient and robust alternative: it is highly efficient across a wide class of non-normal error distributions and loses only a small amount of efficiency for normal errors. Moreover, it is proved that the loss in efficiency is at most 11.1% for estimating the coefficient functions and no greater than 13.6% for estimating the parametric components.
Moreover, Monte Carlo simulations and a real data example are conducted to examine the finite sample performance, and the numerical results are consistent with our theoretical conclusions.
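The rank regressions in all three stages replace the squared-error loss by a Wilcoxon-type rank dispersion of residuals. A minimal sketch for a single linear slope, using Jaeckel's dispersion, which is invariant to intercept shifts and therefore identifies only slope parameters; this illustrates the rank loss itself, not the three-stage procedure, and the data design is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
beta_true = 2.0
# heavy-tailed t_2 errors: a setting where rank methods beat least squares
y = beta_true * x + rng.standard_t(df=2, size=n)

def wilcoxon_dispersion(beta, x, y):
    """Jaeckel's rank dispersion with Wilcoxon scores, equivalent (up to a
    constant) to the pairwise loss sum_{i<j} |e_i - e_j|."""
    e = y - beta * x
    r = np.argsort(np.argsort(e)) + 1                   # ranks of residuals
    scores = np.sqrt(12) * (r / (len(e) + 1) - 0.5)     # Wilcoxon scores
    return np.sum(scores * e)

beta_rank = minimize(wilcoxon_dispersion, [0.0], args=(x, y),
                     method="Nelder-Mead").x[0]
beta_ols = (x @ y) / (x @ x)                            # least squares, for contrast
print(round(beta_rank, 3), round(beta_ols, 3))
```

The dispersion is convex and piecewise linear in the slope, so a derivative-free minimizer suffices here; in the dissertation's setting the same loss is localized with kernel weights (stages one and three) or applied globally (stage two).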
     In Chapter 4, we focus on feature ranking and screening methods for ultrahigh dimensional models. Most existing methods assume a specific model structure and depend heavily on the belief that the working model is close to the underlying true model. Zhu, Li, Li and Zhu (2011) introduced a feature screening procedure called sure independent ranking and screening (SIRS) under a very general model framework. SIRS does not impose a specific model structure on the regression function and thus covers a wide variety of commonly used parametric and semiparametric models. However, SIRS may miss some active predictors in certain cases; Chapter 4 gives concrete examples. To improve SIRS, we first use the "local" information flow of the predictors to define a new marginal utility criterion, and then propose a nonparametric ranking and screening (NRS) method. NRS requires no assumption on the model structure, and its marginal utility is defined as ψk = E[Ψ²(Xk, Y)], k=1,...,p.
     Here w(xk) is a weight function entering Ψ that satisfies w(xk) ≥ 0 and E[w(Xk)] = 1; a simple choice in practice is w(xk) = 2E[I(Xk ≤ xk)]. Under certain regularity conditions we establish the screening consistency of NRS. Moreover, we take the correlations among active predictors into account and incorporate them into the ranking and screening procedure, making the nonparametric feature screening more comprehensive and more widely applicable. By examining various types of regression models in the simulation study, we find that the new method performs uniformly and significantly better than existing feature screening methods.
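For comparison, the SIRS utility of Zhu, Li, Li and Zhu (2011) that NRS refines can be estimated directly from data. Since the inner function Ψ of the NRS criterion is not fully recoverable from this abstract, the sketch below implements the SIRS baseline rather than NRS itself, with an illustrative two-predictor model:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 1000
X = rng.normal(size=(n, p))
# model-free target: only X_0 and X_1 are active
y = 2.0 * X[:, 0] + np.exp(X[:, 1]) + rng.normal(size=n)

def sirs_utilities(X, y):
    """Sample version of the SIRS marginal utility of Zhu et al. (2011):
    omega_k = E[ {E(X_k * 1{Y < y})}^2 ], with each predictor standardized."""
    Xs = (X - X.mean(0)) / X.std(0)
    ind = (y[:, None] < y[None, :]).astype(float)   # ind[i, j] = 1{y_i < y_j}
    inner = Xs.T @ ind / len(y)                     # estimate of E[X_k 1{Y < y_j}]
    return (inner ** 2).mean(axis=1)                # average over the y_j grid

omega = sirs_utilities(X, y)
top = np.argsort(omega)[::-1][:int(len(y) / np.log(len(y)))]   # keep top n/log(n)
print(sorted(top[:5]))   # the active indices 0 and 1 should rank near the top
```

Ranking all p utilities and retaining the top n/log(n) predictors is the usual screening rule; the cases where this baseline misses active predictors are exactly what motivates the "local" information criterion of Chapter 4.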
References
[1]Buhlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
    [2]Bradic, J., Fan, J. and Wang, W. (2011). Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. J. R. Stat. Soc. Ser. B Stat. Methodol, 73,325-349.
    [3]Candes, E. and Tao, T. (2007). The Dantzig selector:Statistical estimation when p is much larger than n (with discussion). Ann. Statist,35,2313-2404.
    [4]Cho, H. and Fryzlewicz, P. (2012). High dimensional variable selection via tilting. J. R. Stat. Soc. Ser. B Stat. Methodol.,74,593-622.
    [5]David, H. (1998). Early sample measures of variability. Statist. Sci.,13,368-377.
    [6]Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87, 998-1004.
    [7]Fan, J. and Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. Ann. Statist,20,2008-2036.
    [8]Fan, J., Farmen, M. and Gijbels, I. (1998). Local maximum likelihood estimation and inference. J. R. Stat. Soc. Ser. B Stat. Methodol., 60, 591-608.
    [9]Fan, J., Feng, Y. and Song, R. (2011). Nonparametric independence screening in sparse ultra-high dimensional additive models. J. Amer. Statist. Assoc.,106, 544-557.
    [10]Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. London:Chapman and Hall.
    [11]Fan, J., Hu, T. C. and Truong, Y. K. (1994). Robust non-parametric function estimation. Scand. J. Statist., 21, 433-446.
    [12]Fan, J. and Huang, T. (2005). Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli, 11, 1031-1057.
    [13]Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc., 96, 1348-1360.
    [14]Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol,70,849-911.
    [15]Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimen-sional feature space. Statist. Sinica,20,101-148.
    [16]Fan, J., Samworth, R. and Wu, Y. (2009). Ultrahigh dimensional feature selection: beyond the linear model. J. Mach. Learn. Res.,10,1829-1853.
    [17]Fan, J. and Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. Ann. Statist,38,3567-3604.
    [18]Fan, J. and Zhang, W. (1999). Statistical estimation in varying-coefficient models. Ann. Statist., 27, 1491-1518.
    [19]Fan, J. and Zhang, W. (2000). Simultaneous confidence bands and hypothesis testing in varying coefficient models. Scand. J. Statist., 27, 715-731.
    [20]Feng, L., Zou, C. and Wang, Z. (2012). Local Walsh-average regression. J. Multi-variate Anal.,106,36-48.
    [21]Frank I. E. and Friedman J. H. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics,35,109-148.
    [22]Gray, H. L. and Schucany, W. R. (1972). The Generalized Jackknife Statistic. New York: Marcel Dekker.
    [23]Hall, P. and Li, K. (1993). On almost linearity of low dimensional projection from high dimensional data. Ann. Statist.,21,867-889.
    [24]Hettmansperger, T. and McKean, J. (2011). Robust Nonparametric Statistical Methods,2nd. New York:Chapman-Hall.
    [25]Hodges, J. and Lehmann, E. (1956). The efficiency of some nonparametric com-petitors of the t-test. Ann. Math. Statist.,27,324-335.
    [26]Huang, J., Horowitz, J. and Ma, S. (2008). Asymptotic properties of bridge esti-mators in sparse high-dimensional regression models. Ann. Statist,36,587-613.
    [27]Hunter, D. and Lange, K.(2000). Quantile regression via an MM algorithm. J. Comput. Graph. Statist,9,60-77.
    [28]Jia, J. and Yu, B. (2010). Model selection consistency of the Elastic Net when p> n. Statist. Sinica,20,595-611.
    [29]Kai, B., Li, R. and Zou, H. (2010). Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J. R. Stat. Soc. Ser. B Stat. Methodol., 72, 49-69.
    [30]Kai, B., Li, R. and Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann. Statist., 39, 305-332.
    [31]Kim, M. (2007). Quantile Regression With Varying Coefficients. Ann. Statist,35, 92-108.
    [32]Knight, K. (1998). Limiting distributions for L1 regression estimators under gen-eral conditions. Ann. Statist,26,755-770.
    [33]Koenker, R. (2005) Quantile Regression. Cambridge:Cambridge University Press.
    [34]Koenker, R. and Bassett, G. (1978) Regression quantiles. Econometrica,46,33-50.
    [35]Li, G., Peng, H., Zhang, J. and Zhu, L. (2012). Robust rank correlation based screening. Ann. Statist.,40.1846-1877.
    [36]Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. J. Amer. Statist. Assoc,107,1129-1139.
    [37]Lin, L. and Li, F. (2008). Stable and bias-corrected estimation for nonparametric regression models. J. Nonparametr. Statist,20,283-303.
    [38]Lin, L., Sun, J. and Zhu, L. (2012). Nonparametric feature screening. Revised.
    [39]Luo, X., Stefanski, L. and Boos, D. (2006). Tuning variable selection procedure by adding noise. Technometrics, 48, 165-175.
    [40]Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist., 33, 1065-1076.
    [41]Pollard, D. (1991). Asymptotics for least absolute deviation regression estimators. Econometric Theory,7,186-199.
    [42]Ruppert, D., Sheather, S. and Wand, M. (1995). An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc,90,1257-1270.
    [43]Ruppert, D., Wand, M. and Carroll, R. (2003). Semiparametric Regression. Cam-bridge Univ. Press, Cambridge.
    [44]Ruppert, D., Wand, M., Holst, U. and Hossjer, O. (1997). Local polynomial variance-function estimation. Technometrics, 39, 262-273.
    [45]Shang, S., Zou, C. and Wang, Z. (2012). Local walsh-average regression for semi-parametric varying-coefficient models. Statist. Probab. Lett.,82,1815-1822.
    [46]Silverman, B.(1986). Density Estimation. Chapman and Hall, London.
    [47]Speckman, P. (1988). Kernel smoothing in partial linear models. J. R. Stat. Soc. Ser. B Stat. Methodol., 50,413-436.
    [48]Sun, J., Gai, Y. and Lin, L. (2013). Weighted local linear composite quantile estimation for the case of general error distributions. J. Statist. Plann. Inference, 143, 1049-1063.
    [49]Sun, J. and Lin, L. (2013). Local rank estimation for varying-coefficient partially linear models. Submitted.
    [50]Tibshirani R. (1996). Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Ser. B Stat. Methodol.,58,267-288.
    [51]Tong, T. and Wang, Y. (2005). Estimating residual variance in nonparametric regression using least squares. Biometrika, 92, 821-830.
    [52]Wang, L., Kai, B. and Li, R. (2009). Local rank inference for varying coefficient models. J. Amer. Statist. Assoc., 104, 1631-1645.
    [53]Welsh, A. (1996). Robust estimation of smooth regression and spread functions and their derivatives. Statist. Sinica,6,347-366.
    [54]Xia, Y., Zhang, W. and Tong, H. (2004). Efficient estimation for semivarying-coefficient models. Biometrika,91,661-681.
    [55]Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol,68,49-67.
    [56]Zhang, C. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist,38,894-942.
    [57]Zhang, W. and Lee, S. (2000). Variable bandwidth selection in varying-coefficient models, J. Multivariate Anal.,74,116-134.
    [58]Zhang, W., Lee, S. and Song, X. (2002). Local polynomial fitting in semivarying coefficient model. J. Multivariate Anal.,82.166-188.
    [59]Zhao, P. and Yu, B. (2006). On model selection consistency of LASSO. J. Mach. Learn. Res.,7,2541-2563.
    [60]Zhu, L. P., Li, L. X., Li, R. Z. and Zhu, L. X. (2011). Model-free feature screening for ultrahigh dimensional data. J. Amer. Statist. Assoc.,106,1464-1475.
    [61]Zou, H. (2006). The adaptive LASSO and its oracle properties. J. Amer. Statist. Assoc.,101,1418-1429.
    [62]Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol,67,301-320.
    [63]Zou, H. and Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. Ann. Statist.,36,1108-1126.
