Consistent Statistical Inference for High-Dimensional Linear Models and Partially Linear Models
Abstract
In the initial stage of a modeling procedure, a large number of predictors are usually introduced to form a full model. When the model is applied to real problems, however, so many predictors and such a large model demand heavy computation and resource occupation, inflate the variance and mean squared error (MSE) of the estimators, and may even make the numerical algorithm unstable or cause it to abort abnormally, which directly degrades parameter estimation and model prediction. In order to simplify the model and enhance predictability, some less significant variables are removed, so a reduced model (or restricted model) is formed. Under certain regularity conditions, the estimators of the parameters remaining in the reduced model are consistent. However, when consistency of parameter estimation is specifically required in modeling with variable selection, the sparsity and oracle properties (e.g., of the SCAD estimator) hold only in the pointwise sense, which means that such an estimator does not have a good global property. Furthermore, in the case of restricted-model (RM) estimation or post-model-selection estimation, if some significant variables are unfortunately removed, variables whose coefficients are nearly zero but which still influence the response, then the restricted model (or submodel) is misspecified and the commonly used estimators of the remaining parameters converge to pseudo parameters rather than the true ones. Even if the selected submodel is only locally misspecified, the commonly used estimators of the parameters in the submodel are still inconsistent.
     On the other hand, in some applications only some of the variables are of interest, because these variables can be properly controlled. The remaining variables may not be emphasized, may be hard to control precisely, or may influence the response in a way that is not yet well understood. If the full model is applied, the same problems arise and the model may be misspecified. If the response is regressed only on the variables of interest, the submodel is highly biased and so is the corresponding estimator. It is more reasonable to form a semiparametric model in which the variables of interest constitute the parametric part and the other variables the nonparametric part, but the nonparametric part then faces the curse of dimensionality; in particular, when the dimension of the remaining variables is high, the curse of dimensionality ruins estimation and prediction. Because of the limits of our knowledge, such semiparametric situations arise more and more often in applications.
     All of the above indicates that consistent parameter estimation and confidence region construction remain difficult in modeling processes that involve variable selection or prespecified variables. This thesis mainly studies estimation methods for the parameters of interest, and the corresponding model prediction, in these settings. For the linear regression model Y = β^T X + γ^T Z + ε, suppose that only the parameter β and the covariate X are of interest. Chapter 2 first proposes a semiparametric method to adjust the biased submodel Y = β^T X + η. By finding a proper direction τ, an adjusted model Y = β^T X + g(τ^T Z) + ζ is constructed, where g(τ^T Z) = E(Y − β^T X | τ^T Z) = γ^T E(Z | τ^T Z), so that the submodel becomes partially conditionally unbiased, in the sense that the conditional bias vanishes on a subset region W of the covariates. The nonparametric term g(τ^T Z) is estimated by a univariate kernel method, in which K(·) is a kernel function and h is a bandwidth depending on the sample size n. Substituting ĝ(τ^T Z) into the adjusted model yields an estimator β̂_A of β, which is proved to be √n-consistent on the subset W. Furthermore, based on the idea of preliminary-test (PT) estimation with an F-test, an adjusted PT estimator β̂_APT, which also relies on the full model, is obtained. In the third section of Chapter 2, a confidence region for β is constructed by the empirical likelihood method based on the adjusted model. With the method of Chapter 2, the curse of dimensionality is successfully avoided through a single univariate nonparametric function, and the inference is insensitive to the direction τ in the nonparametric term, so the new method is robust to variable selection. No matter how large the bias of the submodel, simulation results show that the new parameter estimator and confidence region outperform those of existing methods in the sense of consistency.
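     To make the kernel-adjustment step concrete, here is a minimal Python sketch, assuming a Gaussian kernel and a profile least-squares form in which the smoothed index is partialled out of both X and Y; the helper names nw_smoother and adjusted_beta are hypothetical, and the thesis's exact estimator (and its choice of the direction τ) may differ.

```python
import numpy as np

def nw_smoother(t, h):
    """Nadaraya-Watson smoothing matrix for a univariate index t (Gaussian kernel)."""
    d = (t[:, None] - t[None, :]) / h
    K = np.exp(-0.5 * d ** 2)
    return K / K.sum(axis=1, keepdims=True)

def adjusted_beta(X, Z, Y, tau, h):
    """Estimate beta in the adjusted model Y = beta'X + g(tau'Z) + error.
    Removing the kernel-smoothed index from X and Y eliminates the bias that
    the omitted term gamma'Z induces along the direction tau."""
    S = nw_smoother(Z @ tau, h)                # smoother on the index tau'Z
    X_adj, Y_adj = X - S @ X, Y - S @ Y        # partial g(tau'Z) out of X and Y
    beta, *_ = np.linalg.lstsq(X_adj, Y_adj, rcond=None)
    return beta
```

Since only the one-dimensional index τ^T Z is smoothed, the bandwidth h can be chosen by standard univariate rules, which is how the construction sidesteps the curse of dimensionality.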
     Although the adjustment method proposed in Chapter 2 can markedly reduce the estimation bias of the submodel, its theoretical results hold only on a subset W of the covariates' support. So in Chapter 3, based on the submodel Y = β^T X + η with parameter of interest β, a globally unbiased working model is constructed under the assumption E(Z) = 0. The main idea is to first decompose the covariate Z into independent components Z^(1), …, Z^(q), and then exploit the correlation between the covariate X and these components. For each component Z^(l) correlated with X, a univariate nonparametric term g_l(Z^(l)) = E(Y − β^T X | Z^(l)) = γ^T E(Z | Z^(l)) is added into the submodel, so that the bias of the submodel is reduced by a multi-step adjustment.
     (1) When the covariate Z is normally distributed, principal component regression (PCR) is applied; then g_l(Z^(l)) = α_l Z^(l), the adjusted model is actually a linear model, and Z^(l) is the l-th principal component of Z;
     (2) Otherwise, the independent component analysis (ICA) method is adopted and Z^(l) is the l-th independent component of Z. Based on this adjusted model, each nonparametric term g_l(Z^(l)) is estimated by a univariate kernel method, in which K(·) is a kernel function and h_l is a bandwidth depending on the sample size n. Substituting ĝ_l(Z^(l)) into the adjusted model yields an estimator β̂_A of β. It is proved that β̂_A is consistent globally, on the whole space of the covariates X and Z, and that it is asymptotically normal. Because the added nonparametric terms g_l(Z^(l)) are independent of one another, the heavy computation faced by the backfitting method for general additive models is avoided, and the computation error is fairly small when the number K of adjustment terms is not large. When Z is normally distributed, the corresponding results can be obtained by applying least squares directly to the linear adjusted model; a sketch of the general procedure is given below.
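     The following sketch illustrates the multi-step adjustment, assuming scikit-learn's FastICA for the non-normal case and reusing the hypothetical nw_smoother above; the screening of components correlated with X is simplified here to a plain correlation threshold, which is an assumption rather than the thesis's rule.

```python
import numpy as np
from sklearn.decomposition import FastICA

def multistep_adjusted_beta(X, Z, Y, h, n_comp, screen=0.05):
    """Extract (approximately) independent components of Z (PCA would do for
    normal Z), then partial each retained component out of X and Y with a
    one-dimensional kernel smoother.  Because the components are mutually
    independent, the terms can be adjusted one at a time, with no backfitting."""
    C = FastICA(n_components=n_comp, random_state=0).fit_transform(Z)
    X_adj, Y_adj = X.copy(), np.asarray(Y, dtype=float).copy()
    for l in range(n_comp):
        corr = np.corrcoef(C[:, l], X, rowvar=False)[0, 1:]
        if np.max(np.abs(corr)) < screen:      # component unrelated to X: skip
            continue
        S = nw_smoother(C[:, l], h)            # one-dimensional term g_l
        X_adj, Y_adj = X_adj - S @ X_adj, Y_adj - S @ Y_adj
    beta, *_ = np.linalg.lstsq(X_adj, Y_adj, rcond=None)
    return beta
```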
     When the number K of adjustment terms is large, or even close to the original dimension q of the covariate Z, the method proposed in Chapter 3 incurs a large computation error and loses its advantage. In Chapter 4, more generally, a two-stage remodeling and estimation method is therefore proposed for the sparse partially linear model Y = β^T X + γ^T Z + f(U) + ε, in which the parameter β is of interest and the parameter γ is sparse. For simplicity, this chapter assumes that U is univariate and E(Z) = 0; in fact, f(·) can be extended to an additive structure when U is multidimensional. In detail, in the first stage, using the correlation between the independent components Z^(j) of Z and the covariates (X, U), a globally unbiased model is reconstructed by a multi-step adjustment as in Chapter 3. In the second stage, the adjusted model is further reduced under the sparsity condition by the semiparametric variable selection method of Zhao and Xue (2009). Specifically, each nonparametric term g_j(Z^(j)) and the nonparametric term f(U) are approximated by truncated orthogonal series expansions, and the parameters β, θ_j and v are then estimated by the group SCAD method, where p_λ(·) is the SCAD penalty function, satisfying a > 2, ω > 0 and p_λ(0) = 0. Let M_n = {1 ≤ j ≤ K_0 : θ_j ≠ 0} and K_n = |M_n|; for simplicity, suppose that M_n = {1, 2, …, K_n}. Denote g_j(Z^(j)) = E(γ^T Z | Z^(j)), j = 1, …, K_n, and ζ_{K_n} = Y − β^T X − g_1(Z^(1)) − ⋯ − g_{K_n}(Z^(K_n)) − f(U); the reduced model is then obtained. After the two-stage remodeling, the final model is sufficiently simplified and globally conditionally unbiased. The theoretical results establish the convergence rates of the parametric estimator β̂ and of the nonparametric estimators ĝ_l and f̂, and prove that β̂ is asymptotically normal. Because variable selection relies heavily on the sparsity of the parameter, if a variable selection method were applied directly to the partially linear model, some variables irrelevant to X but with nonzero coefficients might be selected into the model, which could harm the efficiency and stability of the estimator of β.
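     As a sketch of the second stage, here is the penalized criterion under the standard SCAD penalty of Fan and Li (2001), applied to the norm of each group of series coefficients; the basis matrices, the extra weight ω of the thesis's group version, and the optimizer are left abstract, so this only illustrates the shape of the objective.

```python
import numpy as np

def scad(t, lam, a=3.7):
    """Standard SCAD penalty p_lambda(t) for t >= 0, with a > 2 (a = 3.7 is
    the usual default); Chapter 4's group SCAD applies it to each group norm."""
    t = np.abs(t)
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
                 lam ** 2 * (a + 1) / 2))

def group_scad_objective(Y, X, bases, beta, thetas, lam):
    """Penalized least squares for the series-expanded adjusted model:
    bases[j] is the basis matrix approximating the j-th nonparametric term
    (with f(U) entering as one more group), thetas[j] its coefficient vector."""
    fit = X @ beta + sum(B @ th for B, th in zip(bases, thetas))
    rss = 0.5 * np.sum((Y - fit) ** 2)
    pen = sum(float(scad(np.linalg.norm(th), lam)) for th in thetas)
    return rss + pen
```

Minimizing such an objective drives whole coefficient groups θ_j to zero, which is what removes the redundant adjustment terms g_j(Z^(j)) and yields the reduced model.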
     In Chapter 5, for the high-dimensional linear model Y = β^T X + γ^T Z + ε with normal covariates and error term, the tilted-variable method proposed by Cho and Fryzlewicz (2012) and the relaxed-projection method proposed by Zhang and Zhang (2012) are combined to remodel the biased submodel Y = β^T X + η, in which β is the parameter of interest. If γ^T E(Z | X) ≠ 0, then E(η | X = x) is a nonzero function. So the biased submodel is first adjusted with the method of Cho and Fryzlewicz (2012): the components of Z correlated with X, denoted Z_Cx, are added into the submodel, and an adjusted model is obtained whose error is ζ = ε + Σ_{k∈J\Cx} γ_k Z^(k) with J = {1, 2, …, q}. Then the tilted variables U_0 corresponding to the design matrix X are computed from the projection Π_Zcx onto the space spanned by Z_Cx.
     (1) If none of the tilted variables has too small a length, the parameter β can be estimated directly from the tilted variables U_0 and the adjusted model, and the resulting estimator is proved to be consistent under mild conditions;
     (2) Otherwise, the projection needs to be relaxed by the method of Zhang and Zhang (2012). In detail, the tilted variables U with relaxed projection are defined through a penalized criterion in which d = |Cx|, tr(V) denotes the trace of the matrix V, λ is the penalty parameter, and θ satisfies an accompanying relaxation constraint. Applying the tilted variables U to the adjusted model yields a linear estimator β̂_L. Because the projection is relaxed, the bias of β̂_L must be corrected. Suppose that (β̂^(init), γ̂^(init)) is an initial estimator of the full-model parameters (β, γ) satisfying suitable conditions; a new bias-corrected estimator β̂_U of β is then constructed, and finally a confidence interval for β can also be built from this point estimator. The theoretical results show that the point estimator β̂_U is consistent and that the coverage rate of the confidence interval is guaranteed; a sketch of the tilting step is given below.
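     Here is a minimal sketch of the tilting step in case (1), assuming the projection is computed by least squares on Z_Cx; the relaxed projection and the bias correction of case (2) follow Zhang and Zhang (2012) and are only indicated in the comments.

```python
import numpy as np

def tilted_beta(X, Z_cx, Y):
    """Case (1): tilt the columns of X by projecting out span(Z_cx), then use
    the tilted variables U0 as instruments for beta.  If some column of U0 is
    too short, case (2) applies instead: replace the exact projection by a
    relaxed one (tuned by a penalty lambda) and correct the bias of the
    resulting linear estimator using initial full-model estimates
    (beta_init, gamma_init); both names here are hypothetical."""
    coef, *_ = np.linalg.lstsq(Z_cx, X, rcond=None)   # regress X on Z_cx
    U0 = X - Z_cx @ coef                              # tilted variables (I - Pi)X
    beta = np.linalg.solve(U0.T @ X, U0.T @ Y)        # instrument-type estimator
    return U0, beta
```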
References
[1]Acharya, D.P., Panda, G. (2008). A review of independent component analysis techniques and their applications. IETE Tech. Rev.,25(6),320-332.
    [2]Adragni, K.P., Cook, R.D. (2009). Sufficient dimension reduction and prediction in regression. Phil. Trans. R. Soc. A.,367,4385-4405.
    [3]Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proc. 2nd International Symposium on Information Theory (B.N. Petrov and F. Csáki, eds.),267-281. Akadémiai Kiadó, Budapest.
    [4]Anderson, T.W. (2003). An introduction to multivariate statistical analysis (3rd edition). John Wiley & Sons.
    [5]Barrios, E.B., Lansangan, J.R.G. (2010). Sparse principal component regression. University of the Philippines Diliman (working paper).
    [6]Bates, D.M., Watts, D.G. (1988). Nonlinear regression analysis and its application. John Wiley & Sons.
    [7]Breiman, L., Friedman, J.H. (1985). Estimating optimal transformations for multiple regression and correlations (with discussion). J. Amer. Statist. Assoc.,80,580-619.
    [8]Breiman, L. (1996). Heuristics of instability and stabilization in model selection. Ann. Statist.,24,2350-2382.
    [9]Bühlmann, P., van de Geer, S. (2011). Statistics for high-dimensional data:methods, theory and applications. Springer-Verlag Berlin Heidelberg.
    [10]Buja, A., Hastie, T.J., Tibshirani, R. (1989). Linear smoothers and additive models (with discussion). Ann. Statist.,17,453-555.
    [11]Cai, T., Liu, W.D. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc.,106,672-684.
    [12]Candes, E., Tao, T. (2007). The Dantzig selector:statistical estimation when p is much larger than n. Ann. Statist.,35,2313-2351.
    [13]Chamberlain, G. (1992). Efficiency bounds for semiparametric regression. Econometrica,60,567-596.
    [14]Chen, A.Y., Bickel, P.J. (2005). Consistent independent component analysis and prewhitening. IEEE Trans. Sig. Proc.,53,3625-3632.
    [15]Chen, A.Y., Bickel, P.J. (2006). Efficient independent component analysis. Ann. Statist.,34,2825-2855.
    [16]Chen, H. (1988). Convergence rates for parametric components in a partially linear model. Ann. Statist.,16,136-146.
    [17]Cho, H., Fryzlewicz, P. (2012). High dimensional variable selection via tilting. J. Roy. Statist. Soc. B,74(3),593-622.
    [18]Chen, X., Zou, C., Cook, R.D. (2010). Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Statist.,38,3696-3723.
    [19]Claeskens, G., Carroll, R.J. (2007). An asymptotic theory for model selection inference in general semiparametric problems. Biometrika,94,1-17.
    [20]Cook, R.D., Forzani, L. (2009). Likelihood-based sufficient dimension reduction. J. Amer. Statist. Assoc.,104,197-208.
    [21]Cook, R.D., Ni, L. (2005). Sufficient dimension reduction via inverse regression:A minimum discrepancy approach. J. Amer. Statist. Assoc.,100,410-428.
    [22]Craven, P., Wahba, G. (1979). Smoothing noisy data with spline functions:estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik,31,377-403.
    [23]Deng, G.H., Liang, H. (2010). Model averaging for semiparametric additive partial linear models. Science China Mathematics,53,1363-1376.
    [24]Dicker, L., Lin, X. (2009). A large sample analysis of the Dantzig selector and extensions, (manuscript).
    [25]Draper, N.R., Smith, H. (1998). Applied regression analysis (Wiley Series in Probability and Statistics). Wiley-Interscience.
    [26]Efromovich, S. (1999). Nonparametric curve estimation:methods, theory, and applications. Springer-Verlag New York.
    [27]Ehsanes Saleh, A.K.Md. (2006). Theory of preliminary test and Stein-type estimation with applications. John Wiley & Sons, Inc.
    [28]Fan, J., Feng, Y., Song, R. (2011). Nonparametric independence screening in sparse ultra-high dimensional additive models. J. Amer. Statist. Assoc.,106,544-557.
    [29]Fan, J., Gijbels, I. (1996). Local polynomial modelling and its applications. Chapman and Hall, London.
    [30]Fan, J., Hardle, W., Mammen, E. (1998). Direct estimation of low-dimensional components in additive models. Ann. Statist.,26,943-971.
    [31]Fan, J., Jiang, J. (2005). Nonparametric inferences for additive models. J. Amer. Statist. Assoc.,100,890-907.
    [32]Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc.,96,1348-1360.
    [33]Fan, J., Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. Roy. Statist. Soc. B,70,849-911.
    [34]Fan, J., Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica,20,101-148.
    [35]Fan, J., Lv, J. (2011). Properties of non-concave penalized likelihood with NP-dimensionality. IEEE Trans. Inf. Theory,57(8),5467-5484.
    [36]Fan, J., Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist.,32,928-961.
    [37]Fan, Y., Li, Q. (2003). A kernel-based method for estimating additive partially linear models. Statistica Sinica,13,739-762.
    [38]Friedman, J.H., Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc.,76,817-823.
    [39]Fu, W.J. (1998). Penalized regressions:the Bridge versus the lasso. J. Comp. Graph. Stat.,7,397-416.
    [40]Gai, Y., Lin, L., Wang, X.(2011). Consistent inference for biased sub-model of high-dimensional partially linear model. J. Statist. Plan. Infer.,141(5),1888-1898.
    [41]Gao, J.T., Zhao, L.C. (1992). Adaptive estimation in partially linear models. Science in China, Ser. A,22(8),791-803.
    [42]Glad, I.K. (1998). Parametrically guided non-parametric regression. Scand. J. Statist.,25,649-668.
    [43]Hall, A.R., Inoue, A. (2003). The large sample behavior of the generalized method of moments estimator in misspecified models. J. Econometrics,114,361-394.
    [44]Hardle, W., Hall, P., Ichimura, H. (1993). Optimal smoothing in single-index models. Ann. Statist.,21,157-178.
    [45]Hardle, W., Liang, H., Gao, J.T. (2000). Partially linear models. Physica Verlag.
    [46]Hardle, W., Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist.,21(4),1926-1947.
    [47]Hart, J.D. (1997). Nonparametric smoothing and lack-of-fit tests. Springer-Verlag, New York, Inc.
    [48]Harville, D. (1976). Extension of the Gauss-Markov theorem to include the estimation of random effects. Ann. Statist.,4(2),384-395.
    [49]Hastie, T.J., Tibshirani, R. (1993). Varying-coefficient models. J. Roy. Statist. Soc. B,55,757-796.
    [50]Hjort, N.L., Claeskens, G. (2003). Frequentist model average estimators (with discussion). J. Amer. Statist. Assoc.,98,879-899.
    [51]Hjort, N.L., Glad, I.K. (1995). Nonparametric density estimation with a parametric start. Ann. Statist.,23,882-904.
    [52]Hjort, N.L., Jones, M.C. (1996). Locally parametric nonparametric density estimation. Ann. Statist.,24,1619-1647.
    [53]Hyvärinen, A., Karhunen, J., Oja, E. (2001). Independent component analysis. John Wiley & Sons.
    [54]Hyvärinen, A., Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation,9(7),1483-1492.
    [55]James, G.M., Radchenko, P. (2009). A generalized Dantzig selector with shrinkage tuning. Biometrika,96,323-337.
    [56]Johnstone, I.M., Lu, A.Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc.,104,682-693.
    [57]Kariya, T., Kurata, H. (2004). Generalized least squares. John Wiley & Sons.
    [58]Kim, Y., Choi, H., Oh, H.S. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc.,103,1665-1673.
    [59]Kim, Y., Kwon, S. (2012). Global optimality of nonconvex penalized estimators. Biometrika,99(2),315-325.
    [60]Kitamura, Y., Tripathi, G., Ahn, H. (2004). Empirical likelihood-based inference in conditional moment restriction models. Econometrica,72,1667-1714.
    [61]Kutner, M.H., Nachtsheim, C., Neter, J. (2004). Applied linear regression models. McGraw-Hill New York, NY.
    [62]Leeb, H. (2009). Conditional predictive inference post model selection. Ann. Statist.,37,2838-2876.
    [63]Leeb, H., Poetscher, B.M. (2005). Model selection and inference:facts and fiction. Econometric Theory,21,21-59.
    [64]Leeb, H., Poetscher, B.M. (2008). Sparse estimators and the oracle property, or the return to the Hodges' estimator. J. Econometrics,142,201-211.
    [65]Lewis, T., Odell, P. (1966). A generalization of the Gauss-Markov theorem. J. Amer. Statist. Assoc.,61,1063-1066.
    [66]Li, G., Zou, G.H., Zhang, X.Y. (2012). A review of high-dimensional model selection methods. 数理统计与管理,31(4),640-658.
    [67]Li, L. (2007). Sparse sufficient dimension reduction. Biometrika,94,603-613.
    [68]Li, L., Zhu, L., Zhu, L. (2011). Inference on the primary parameter of interest with the aid of dimension reduction estimation. J. Roy. Statist. Soc. B,73(1),59-80.
    [69]Li, Q. (2000). Efficient estimation of additive partially linear models. International Economic Review,41(4),1073-1092.
    [70]Li, R., Zhong, W., Zhu, L. (2012). Feature screening via distance correlation learning. J. Amer. Statist. Assoc.,107(499),1129-1139.
    [71]Lin, L., Cui, X., Zhu, L. (2008). An adaptive two-stage estimation method for additive models. Scand. J. Statist.,36,248-269.
    [72]Lin, L., Li, F. (2008). Stable and bias-corrected estimation for nonparametric regression models. J. Nonparametr. Stat.,20,283-303.
    [73]Lin, L., Zeng, Y., Zhu, L. (2008). A semiparametric estimation approach for biased sub-models of high-dimensional linear regression models, (manuscript).
    [74]Lin, L., Zhu, L., Gai, Y. (2012). Quasi-instrumental variable-based inference for high-dimensional non-sparse models, (manuscript).
    [75]Lv, J., Fan, Y. (2009). A unified approach to model selection and sparse recovery using regularized least squares. Ann. Statist.,37,3498-3528.
    [76]Mallows, C.L. (1973). Some comments on Cp. Technometrics,15,661-675.
    [77]Miller, A.J. (2002). Subset selection in regression (2nd edition). Chapman and Hall/ CRC Press, London and New York.
    [78]Montgomery, D.C., Peck, E.A., Vining, G.G. (2001). Introduction to linear regression analysis. Wiley New York.
    [79]Naito, K. (2004). Semiparametric density estimation by local L2 fitting. Ann. Statist., 32,1162-1191.
    [80]Opsomer, J.D., Ruppert, D. (1999). A root-n consistent backfitting estimator for semiparametric additive modeling. J. Comput. Graph. Statist.,8,715-732.
    [81]Owen, A. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika,75,237-249.
    [82]Owen, A. (1990). Empirical likelihood ratio confidence regions. Ann. Statist.,18,90-120.
    [83]Owen, A. (1991). Empirical likelihood for linear models. Ann. Statist.,19,1725-1747.
    [84]Pfeffermann, D. (1984). On extensions of the Gauss-Markov theorem to the case of stochastic regression coefficients. J. Roy. Statist. Soc. B,46(1),139-148.
    [85]Qin, J., Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist.,22,300-325.
    [86]Ruppert, D., Sheather, S.J., Wand, M.P. (1995). An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc.,90,1257-1270.
    [87]Rütimann, P., Bühlmann, P. (2009). High dimensional sparse covariance estimation via directed acyclic graphs. Electronic J. Statist.,3,1133-1160.
    [88]Salinelli, E. (2009). Nonlinear principal components, Ⅱ:Characterization of normal distributions. J. Multi. Anal.,100,652-660.
    [89]Samarov, A., Tsybakov, A. (2004). Nonparametric independent component analysis. Bernoulli,10,565-582.
    [90]Schennach, S.M. (2007). Point estimation with exponentially tilted empirical likelihood. Ann. Statist.,35,634-672.
    [91]Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist.,6,461-464.
    [92]Seber, G.A.F., Wild, C.J. (2003). Nonlinear regression. John Wiley & Sons.
    [93]Sen, P.K. (1979). Asymptotic properties of maximum likelihood estimators based on conditional specification. Ann. Statist.,5,1019-1033.
    [94]Sen, P.K., Ehsanes Saleh, A.K.M. (1987). On preliminary test and shrinkage M-estimation in linear models. Ann. Statist.,15,1580-1592.
    [95]Severini, T.A. (1998). Some properties of inferences in misspecified linear models. Statist. Probab. Lett.,40,149-153.
    [96]Shen, D., Shen, H.P., Marron, J.S. (2011). Consistency of sparse PCA in high dimension, low sample size contexts. arXiv:1104.4289v1, [math.ST].
    [97]Shen, X.T., Huang, H.C., Ye, J. (2004). Inference after model selection. J. Amer. Statist. Assoc.,99,751-762.
    [98]Shi, J., Lau, T.S. (2000). Empirical likelihood for partially linear models. J. Multi. Anal.,72(1),132-148.
    [99]Simas Filho, E.F., Seixas, J.M. (2007). Nonlinear independent component analysis: theoretical review and applications. Learning and Nonlinear Models,5(2),99-120.
    [100]Speckman, P. (1988). Kernel smoothing in partial linear models. J. Roy. Statist. Soc. B,50,413-436.
    [101]Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions (with discussions). J. Roy. Statist. Soc. B,36,111-147.
    [102]Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist.,10,1040-1053.
    [103]Sun, T., Zhang, C. (2012). Scaled sparse linear regression. arXiv:1104.4595v2, [stat.ML], 21 Jun. 2012.
    [104]Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. B,58,267-288.
    [105]van der Vaart, A.W. (2000). Asymptotic statistics. Cambridge University Press.
    [106]Wang, H., Li, G., Jiang, G. (2007). Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J. Bus. Econom. Statist.,25,347-355.
    [107]Wang, L., Chen, G., Li, H. (2007). Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics,23,1486-1494.
    [108]Wang, L., Li, H., Huang, J. (2008). Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Amer. Statist. Assoc., 103,1556-1569.
    [109]Wang, L., Li, H., Tsai, C.L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika,94,553-568.
    [110]Wang, Q., Jing, B. (2003). Empirical likelihood for partially linear models. Ann. Inst. Statist. Math.,55,585-595.
    [111]Wu, S., Harris, T.J., McAuley, K.B. (2007). The use of simplified or misspecified models:linear case. Can. J. Chem. Eng.,85,386-398.
    [112]Ye, F., Zhang, C. (2010). Rate minimaxity of the lasso and Dantzig selector for the ℓq loss in ℓr balls. J. Machine Learning Research,11,3519-3540.
    [113]Yuan, M., Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. B,68,49-67.
    [114]Yuan, M., Ekici, A., Lu, Z., Monteiro, R. (2007). Dimension reduction and coefficient estimation in multivariate linear regression. J. Roy. Statist. Soc. B,69,329-346.
    [115]Zhang, C., Huang, J. (2008). The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann. Statist.,36(4),1567-1594.
    [116]Zhang, C., Zhang, S. (2012). Confidence intervals for low-dimensional parameters with high-dimensional data. arXiv:1110.2563v2, [stat.ME], 2 Nov. 2012.
    [117]Zhang, P. (1992). Inference after variable selection in linear regression models. Biometrika,79,741-746.
    [118]Zhao, P., Xue, L. (2009). Variable selection for semiparametric varying coefficient partially linear models. Statist. Probab. Lett.,79,2148-2157.
    [119]Zhou, Z., Jiang, R., Qian, W. (2011). Variable selection for additive partially linear models with measurement error. Metrika,74(2),185-202.
    [120]Zhu, L., Xue, L. (2006). Empirical likelihood confidence regions in a partially linear single-index model. J. Roy. Statist. Soc. B,68,549-570.
    [121]Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. B,67,301-320.
    [122]Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc., 101,1418-1429.
