Empirical Likelihood and Penalized Empirical Likelihood Inference for Several Semiparametric Models with Censored Data
Abstract
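Regression analysis is an important area of statistical research. This thesis applies the empirical likelihood method to several commonly used regression models in which the response variable is randomly censored, and carries out statistical inference on the model parameters. Censored data are an important type of statistical data that arise frequently in scientific research and practical problems in medicine, reliability engineering, finance and insurance, environmental science, and other fields.
     For regression models with a randomly censored response, standard methods of regression analysis such as least squares cannot be applied directly, so how to analyse regression models statistically under censoring requires careful study. Research on regression models with a randomly censored response is therefore of considerable importance.
     The empirical likelihood method, proposed by Owen (1988), is a nonparametric statistical method. Unlike the traditional normal-approximation approach to constructing confidence regions for parameters, it does not require estimating the asymptotic variance of the estimator, which is one of its advantages. This matters particularly for randomly censored regression models, where the asymptotic variance of the parameter estimators is complicated to compute. This thesis studies parameter estimation in regression models with randomly censored data, gives the empirical likelihood ratio statistics, and proves that their asymptotic distribution is χ². This avoids estimating the asymptotic variance when constructing confidence regions and improves the accuracy of the inference.
     On the other hand, variable selection is currently one of the active topics in regression analysis. An effective variable selection method retains the significant variables, removes redundant ones, and improves the prediction accuracy of the model. Tibshirani (1996) proposed the LASSO penalty, a coefficient-shrinkage approach that is computationally cheaper and more stable than traditional subset selection. Coefficient shrinkage has since attracted great attention in the statistics community: a range of variable selection methods based on penalty functions have been proposed and shown to possess the Oracle property. This thesis uses penalized empirical likelihood, which combines shrinkage-based variable selection with empirical likelihood, to study variable selection and parameter estimation in the Cox proportional hazards model. The main contents are as follows: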
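     Chapter 2 studies parameter estimation in a nonlinear semiparametric regression model when the response variable is randomly right-censored. An empirical log-likelihood ratio statistic and an adjusted empirical log-likelihood ratio statistic for the unknown parameters are given, and under certain conditions these statistics are shown to be asymptotically χ² distributed, which allows confidence regions for the unknown parameters to be constructed. The least squares estimator of the unknown parameters is also given and its asymptotic properties are proved. Simulation results indicate that empirical likelihood outperforms least squares in both the coverage probability and the precision of the confidence regions.
     Chapter 3 studies parameter estimation in a nonlinear semiparametric regression model when the response variable is randomly right-censored and the covariate in the nonparametric part is measured with error. An empirical log-likelihood ratio statistic and an adjusted empirical log-likelihood ratio statistic for the unknown parameters are given, and under certain conditions these statistics are shown to be asymptotically χ² distributed, which allows confidence regions for the unknown parameters to be constructed. The least squares estimator of the unknown parameters is also given and its asymptotic properties are proved. Simulation results indicate that empirical likelihood outperforms least squares in both the coverage probability and the precision of the confidence regions.
     Chapter 4 studies estimation of the parametric part of a semiparametric varying-coefficient partially linear errors-in-variables (EV) model when the response variable is randomly right-censored. An empirical log-likelihood ratio statistic for the unknown parameters is constructed and shown to be asymptotically χ² distributed, and this result can be used to construct confidence regions for the unknown parameters. In simulations with finite samples, the confidence intervals produced by empirical likelihood and by the normal approximation are compared in terms of interval length and coverage probability.
     Chapter 5 uses penalized empirical likelihood to study variable selection in the Cox proportional hazards model. With the Bridge penalty function and under certain conditions, the Oracle property of penalized empirical likelihood is discussed, a penalized empirical likelihood ratio statistic for the regression coefficients is defined, and it is shown to follow an asymptotic chi-square distribution. Simulation studies indicate that the Bridge penalized empirical likelihood method performs well.
     Chapter 6 studies the computation of a class of compound distributions in actuarial science, in which the claim number variable belongs to a fairly broad family of distributions and the claim amount variable has a mixed-type distribution. The recursive equation satisfied by the compound distribution is given first and is then applied to excess-of-loss reinsurance to obtain the corresponding recursive equation. Finally, some concrete examples and numerical results are given.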
Regression analysis is an important area of statistical research. This thesis applies the empirical likelihood method to several commonly used regression models in which the response variable is randomly censored, and carries out statistical inference on the parameters of those models. Censored data are an important type of statistical data that arise in research and practice in fields such as medicine, reliability engineering, finance and insurance, and environmental science.
     For regression models in which the response variable is randomly censored, standard methods of regression analysis such as least squares cannot be applied directly. How to analyse such models statistically in the presence of censored data therefore needs to be examined in depth, and the study of regression models with a randomly censored response is of great significance.
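One common device for making least squares usable under random right censoring is the synthetic-data transformation of Koul, Susarla and Van Ryzin (1981), which appears in the reference list. Whether the thesis uses exactly this transformation is an assumption; the abstract does not say. The Python sketch below replaces each observed pair (Z, δ), with Z = min(Y, C) and δ = 1{Y ≤ C}, by δ·Z / (1 − G(Z−)), where G is the censoring distribution estimated by Kaplan-Meier. The function names are hypothetical and ties among the observed times are assumed absent.

```python
import numpy as np


def censoring_survival_left(z, delta):
    """Kaplan-Meier estimate of P(C >= z_i) at every observed time z_i.

    z     : observed times min(Y_i, C_i)
    delta : 1 if the response was observed (Y_i <= C_i), 0 if censored
    Assumes no ties among the observed times (illustrative simplification).
    """
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(z)
    order = np.argsort(z, kind="stable")
    at_risk = n - np.arange(n)             # subjects still at risk at each ordered time
    censored = 1 - delta[order]            # a censoring is the "event" when estimating G
    factors = 1.0 - censored / at_risk     # product-limit factors
    surv_after = np.cumprod(factors)       # estimate of P(C > z_(i))
    surv_before = np.concatenate(([1.0], surv_after[:-1]))  # left limit, P(C >= z_(i))
    out = np.empty(n)
    out[order] = surv_before
    return out


def synthetic_response(z, delta):
    """Synthetic response delta * z / (1 - G(z-)) for right-censored data."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    return delta * z / censoring_survival_left(z, delta)
```

Under independent censoring the synthetic response has the same conditional mean as the uncensored response, so it can be passed to an ordinary regression routine or to estimating equations in place of the unobserved Y.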
     The empirical likelihood method, proposed by Owen (1988), is a nonparametric statistical method. In contrast to the traditional normal-approximation approach to constructing confidence regions for parameters, it does not require estimating the asymptotic variance of the estimator, which is one of its advantages. Moreover, in randomly censored regression models the expression for the asymptotic variance of the parameter estimators is complicated, which makes empirical likelihood all the more useful there.
     This thesis studies parameter estimation in regression models with a randomly censored response variable, gives the empirical likelihood ratio statistics, and proves that their asymptotic distribution is χ². This avoids estimating the asymptotic variance when constructing empirical likelihood confidence regions for the parameters and improves the accuracy of the inference.
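To make the idea concrete, here is a minimal Python sketch of Owen's empirical log-likelihood ratio in its simplest setting, a scalar mean with uncensored data. It is not the thesis's statistic for censored regression; it only illustrates how a confidence region is obtained from a χ² quantile without any variance estimate. The function names `el_log_ratio` and `el_accepts` and the choice of SciPy's `brentq` root finder are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2


def el_log_ratio(x, mu):
    """Return -2 log R(mu), the empirical log-likelihood ratio for a scalar mean."""
    z = np.asarray(x, dtype=float) - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf  # mu is not inside the convex hull of the data
    # The Lagrange multiplier lam solves sum(z_i / (1 + lam * z_i)) = 0
    # on the interval where every weight 1 / (n * (1 + lam * z_i)) is positive.
    shrink = 1.0 - 1e-10
    lo = (-1.0 / z.max()) * shrink
    hi = (-1.0 / z.min()) * shrink
    lam = brentq(lambda t: np.sum(z / (1.0 + t * z)), lo, hi)
    return 2.0 * np.sum(np.log1p(lam * z))


def el_accepts(x, mu, level=0.95):
    """True if mu lies inside the empirical likelihood confidence region."""
    return el_log_ratio(x, mu) <= chi2.ppf(level, df=1)
```

The confidence region is the set of values `mu` for which `el_accepts` returns `True`. Its shape is determined by the data alone, which is the practical benefit the abstract refers to.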
     On the other hand, variable selection is currently one of the active topics in regression analysis. Effective variable selection methods retain the significant variables and eliminate redundant ones, which improves the prediction accuracy of the model. Tibshirani (1996) proposed the LASSO penalty, a coefficient-shrinkage method that is computationally cheaper and more stable than traditional subset selection. Coefficient shrinkage has since attracted great attention from statisticians, and a number of penalty-based variable selection methods have been proposed and shown to possess the Oracle property.
     This thesis studies variable selection and parameter estimation in the Cox proportional hazards model using penalized empirical likelihood, which combines coefficient shrinkage with the empirical likelihood method.
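For orientation, the objects involved can be written out as follows. This is a generic sketch in the spirit of Tibshirani (1996), Frank and Friedman (1993) and Tang and Leng (2010), all of which are in the reference list. The estimating function $g_i(\beta)$, the multiplier $\nu$, and the tuning constants $\tau$ and $\gamma$ are notation assumed here; the abstract does not give the thesis's exact formulation for the Cox model.

```latex
% LASSO: least squares with an L1 penalty on the coefficients
\hat\beta_{\mathrm{LASSO}}
  = \arg\min_{\beta}\; \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2
    + \tau \sum_{j=1}^{p} |\beta_j| .

% Bridge penalty: gamma = 1 gives the LASSO, gamma = 2 gives ridge regression
p_{\tau}(|\beta_j|) = \tau\, |\beta_j|^{\gamma}, \qquad \gamma > 0 .

% Empirical log-likelihood ratio for estimating functions g_i(beta),
% where nu = nu(beta) solves sum_i g_i(beta) / (1 + nu^T g_i(beta)) = 0
\ell(\beta) = \sum_{i=1}^{n} \log\bigl\{1 + \nu^{\top} g_i(\beta)\bigr\} .

% Penalized empirical likelihood: minimize over beta
\ell_{p}(\beta) = \ell(\beta) + n \sum_{j=1}^{p} p_{\tau}(|\beta_j|) .
```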
     The main contents of the thesis are organized into the following chapters.
     The second chapter investigates parameter estimation in a nonlinear semiparametric regression model when the response variable is randomly right-censored. It constructs an empirical log-likelihood ratio statistic and an adjusted empirical log-likelihood ratio statistic for the unknown parameters, proves that these statistics are asymptotically χ² distributed under certain conditions, and uses this result to construct confidence regions for the unknown parameters. In addition, the chapter constructs the least squares estimator of the unknown parameters and proves its asymptotic properties. Simulation results show that the empirical likelihood method outperforms the least squares method in both the coverage probability and the precision of the confidence regions.
     The third chapter investigates parameter estimation in a nonlinear semiparametric regression model when the response variable is randomly right-censored and the nonparametric covariate is measured with error. An empirical log-likelihood ratio statistic for the unknown parametric components is proposed and proved to be asymptotically χ² distributed under the null hypothesis, and this result can be used to construct confidence regions for the unknown parameters. In addition, the least squares estimator of the unknown parameters is constructed and its asymptotic properties are proved. Simulation results show that the empirical likelihood method outperforms the least squares method in both the coverage probability and the precision of the confidence regions.
     The fourth chapter investigates estimation of the parametric part of semiparametric varying-coefficient partially linear errors-in-variables models when the response variable is randomly right-censored. An empirical log-likelihood ratio statistic for the unknown parametric components is proposed and proved to be asymptotically χ² distributed under the null hypothesis, and this result can be used to construct confidence regions for the unknown parameters. In simulations with finite samples, the confidence regions constructed by the empirical likelihood method and by the normal approximation method are compared in terms of interval length and coverage probability.
     The fifth chapter studies variable selection in the Cox proportional hazards model by the penalized empirical likelihood method, using the Bridge penalty function. The Oracle property of penalized empirical likelihood is discussed under certain conditions: the nonzero coefficients are selected with probability tending to 1, and the estimators of the nonzero coefficients are asymptotically normal. A penalized empirical likelihood ratio for the regression coefficients is defined and proved to be asymptotically χ² distributed. Simulations and a real data example show that the proposed Bridge penalized empirical likelihood method performs satisfactorily.
     The sixth chapter investigates the computation of a class of compound distributions in actuarial science. The claim number variable belongs to a fairly broad family of distributions, and the claim amount follows a mixed-type distribution. First, the recursive equation satisfied by the compound distribution is derived. Second, it is applied to an excess-of-loss reinsurance treaty to obtain the corresponding recursive equation. Finally, some concrete examples and numerical results are given.
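The kind of recursion studied in the sixth chapter can be illustrated with the classical special case of Panjer (1981), also in the reference list. The Python sketch below assumes a Poisson claim number and claim amounts on the non-negative integers, which is much narrower than the distribution family and mixed-type severities treated in the thesis. The function names and the integer retention level are hypothetical.

```python
import math


def panjer_poisson(lam, severity_pmf, s_max):
    """Probabilities P(S = 0), ..., P(S = s_max) of a compound Poisson sum.

    lam          : Poisson mean of the claim number N
    severity_pmf : severity_pmf[j] = P(X = j) for claim amounts j = 0, 1, 2, ...
    """
    f = list(severity_pmf) + [0.0] * max(0, s_max + 1 - len(severity_pmf))
    g = [math.exp(-lam * (1.0 - f[0]))]   # P(S = 0)
    for s in range(1, s_max + 1):
        # Panjer recursion for the Poisson case (a = 0, b = lam)
        total = sum(j * f[j] * g[s - j] for j in range(1, s + 1))
        g.append(lam / s * total)
    return g


def excess_of_loss_pmf(severity_pmf, retention):
    """Distribution of the reinsurer's payment max(X - retention, 0)."""
    head = sum(severity_pmf[: retention + 1])   # claims at or below the retention pay 0
    return [head] + list(severity_pmf[retention + 1:])
```

Calling `panjer_poisson(lam, excess_of_loss_pmf(f, d), s_max)` then gives the distribution of the reinsurer's aggregate payment under an excess-of-loss treaty with retention `d`, again only within this simplified setting.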
References
[1]Owen A B. Empirical likelihood ratio confidence intervals for a single functional[J]. Biometrika,1988,75(2):237-249.
    [2]Miller A. Subset Selection in Regression[M]. London:Chapman and Hall,1994.
    [3]Tibshirani R. Regression Shrinkage and Selection via the Lasso [J]. Journal of The Royal Statistical Society, Series B,1996,58:267-288.
    [4]Fan J, Li R. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties[J]. Journal of the American Statistical Association,2001,96:1348-1360.
    [5]Zou H. The adaptive Lasso and its Oracle properties[J]. Journal of the American Statistical Association,2006,101:1418-1429.
    [6]Zou H, Hastie T. Regularization and variable selection via the elastic net[J]. Journal of the Royal Statistical Society, Series B,2005,67(2):301-320.
    [7]Candes E, Tao T. The Dantzig selector:statistical estimation when p is much larger than n[J]. The Annals of Statistics,2007,35(6):2313-2351.
    [8]Tang C Y, Leng C L. Penalized high-dimensional empirical likelihood[J]. Biometrika,2010,97(4):905-920.
    [9]Tibshirani R. The lasso method for variable selection in the Cox model[J]. Statistics in Medicine,1997,16:385-395.
    [10]Fan J, Li R. Variable selection for Cox's proportional hazards model and frailty model[J]. The Annals of Statistics,2002,30:74-99.
    [11]Zhang H H, Lu W B. Adaptive Lasso for Cox's proportional hazards model[J]. Biometrika,2007,94(3):691-703.
    [12]Buckley J, James I. Linear regression with censored data[J]. Biometrika,1979,66:429-436.
    [13]Koul H, Susarla V, Van Ryzin J. Regression analysis with randomly right-censored data[J]. The Annals of Statistics,1981,9:1276-1288.
    [14]Owen A B. Empirical likelihood ratio confidence regions[J]. The Annals of Statistics,1990,18:90-120.
    [15]Cox D R. Regression models and life-tables (with discussion)[J]. Journal of the Royal Statistical Society, Series B,1972,34:187
    [16]Frank I E, Friedman J H. A statistical view of some chemometrics regression tools (with discussion)[J]. Technometrics,1993,35(1):109-148.
    [17]Owen A B. Empirical likelihood for linear models[J]. The Annals of Statistics,1991,19:1725-1747.
    [18]Qin J, Lawless J. Empirical likelihood and general estimating equations[J]. The Annals of Statistics,1994,22:300-325.
    [19]Kolaczyk E D. Empirical likelihood for generalized linear models[J]. Statistica Sinica,1994,4:199-218.
    [20]Kitamura Y. Empirical Likelihood Methods with Weakly Dependent Processes[J]. The Annals of Statistics,1997,25:2084-2102.
    [21]Wang Q H, Jing B Y. Empirical likelihood for partial linear model with fixed designs[J]. Statistics & Probability Letters,1999,41:425-433.
    [22]Chen S X, Qin Y S. Empirical likelihood confidence interval for a local linear smoother[J]. Biometrika,2000,87:946-953.
    [23]Whang Y J. Smoothed empirical likelihood methods for quantile regression models[J]. Econometric Theory,2006,22(2):173-205.
    [24]Xue L G, Zhu L X. Empirical likelihood for single-index model[J]. Journal of Multivariate Analysis, 2006,97:1295-1312.
    [25]Zhu L X, Xue L G. Empirical likelihood confidence region for partially linear single-index model[J]. Journal of the Royal Statistical Society, Series B,2006,68:549-570.
    [26]You J H, Zhou Y. Empirical likelihood for semi-parametric varying coefficient partially linear regression models[J]. Statistics & Probability Letters,2006,76:412-422.
    [27]Huang Z S, Zhang R Q. Empirical Likelihood for Nonparametric Parts in Semiparametric Varying-Coefficient Partially Linear Models[J]. Statistics & Probability Letters,2009,79:1798-1808.
    [28]Zhu L X, Lin L, Cui X, Li G R. Bias-corrected empirical likelihood in a multi-link semiparametric model[J]. Journal of Multivariate Analysis,2010,101:850-868.
    [29]Thomas D R, Grunkemeier G L. Confidence interval estimation for survival probabilities for censored data[J]. Journal of the American Statistical Association,1975:865-871.
    [30]Qin G S, Jing B Y. Empirical likelihood for censored linear regression models[J]. Scandinavian Journal of Statistics,2001a:661-673.
    [31]Qin G, Tsao M. Empirical likelihood inference for median regression models for censored survival data[J]. Journal of Multivariate Analysis,2003,85:416-430.
    [32]Wang Q H, Li G. Empirical likelihood semiparametric regression analysis under random censorship[J]. Journal of Multivariate Analysis,2002,83:469-486.
    [33]Qin G S, Jing B Y. Empirical likelihood for the Cox regression model under random censorship[J]. Communications in Statistics-Simulation and Computation,2001b:79-90.
    [34]Lu W, Liang Y. Empirical likelihood inference for linear transformation models[J]. Journal of Multivariate Analysis,2006,97:1586-1599.
    [35]Akaike H. Information theory and an extension of the maximum likelihood principle[C]//Petrov B N, Csaki F. Proceedings of the Second International Symposium on Information Theory. Budapest,1973:267-281.
    [36]Schwarz G. Estimating the dimension of a model[J]. The Annals of Statistics,1978,6:461-464.
    [37]Mallows C L. Some comments on Cp[J]. Technometrics,1973,15:661-675.
    [38]Variyath A M, Chen J H, Abraham B. Empirical likelihood based variable selection[J]. Journal of Statistical Planning and Inference,2010,140(4):971-981.
    [39]Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression[J]. The Annals of Statistics,2004,32:407-499.
    [40]Fan J Q, Li R Z. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis[J]. Journal of the American Statistical Association,2004,99:710-723.
    [41]Li R Z, Liang H. Variable selection in semiparametric regression modeling[J]. The Annals of Statistics,2008,36(1):261-286.
    [42]Wang L, Liu X, Liang H, Carroll R J. Estimation and variable selection for generalized additive partial linear models[J]. The Annals of Statistics,2011,39(4):1827-1851.
    [43]Kim Y, Choi H, Oh H S. Smoothly clipped absolute deviation on high dimensions[J]. Journal of the American Statistical Association,2008,103:1665-1673.
    [44]Xue L, Qu A. Variable selection in high-dimensional varying-coefficient models with global optimality[J]. Journal of Machine Learning Research,2012,13:1973-1998.
    [45]Yang Y P, Xue L G, Cheng W H. Variable selection for partially linear models with randomly censored data[J]. Communications in Statistics-Simulation and Computation,2010,39(8):1577-1589.
    [46]Fan J Q, Feng Y, Wu Y C. High-dimensional variable selection for Cox's proportional hazards model[J]. Institute of Mathematical Statistics,2010,6:70-87.
    [47]Li G, Wang Q H. Empirical likelihood regression analysis for right censored data[J]. Statistica Sinica,2003,13(1):51-68.
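    [48]Wang Q H, Li G. Empirical likelihood semiparametric regression analysis under random censorship[J]. Journal of Multivariate Analysis,2002,83(2):469-486.
    [49]Liu Q, Xue L G, Chen F. Empirical likelihood confidence regions for parameters in partially linear EV models with censored data[J]. Acta Mathematica Sinica (Chinese Series),2009,52(3):549-560. (in Chinese)
    [50]Chen F, Li G R, Feng S Y. Empirical likelihood inference for nonlinear regression models with right-censored data[J]. Acta Mathematicae Applicatae Sinica,2010,33(1):130-141. (in Chinese)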
    [51]Kaplan E L, Meier P. Nonparametric estimation from incomplete observations [J]. Journal of the American Statistical Association,1958,53(3):457-481.
    [52]Rao J N, Scott A J. The analysis of categorical data from complex sample surveys:chi-squared tests for goodness of fit and independence in two-way tables [J]. Journal of the American Statistical Association,1981,76(2):221-230.
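    [53]Feng S Y, Xue L G, Fan C H. Empirical likelihood confidence regions for parameters in nonlinear semiparametric regression models[J]. Acta Mathematica Scientia,2009,29A(5):1338-1349. (in Chinese)
    [54]Hong S Y, Cheng P. Kernel smoothing methods in partially linear models under random censorship[J]. Chinese Journal of Applied Probability and Statistics,1995,16A(4):441-453. (in Chinese)
    [55]Qin G S. Kernel smoothing methods in partially linear models under random censorship[J]. Chinese Annals of Mathematics,1995,16A(4):441-453. (in Chinese)
    [56]Wang Q H, Zheng Z K. Asymptotic properties for the semiparametric regression model with randomly censored data[J]. Science in China, Ser. A,1997,40:945-957.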
    [57]Lai T L, Ying Z, Zhang Z K. Asymptotic normality of a class of adaptive statistics with applications to synthetic data methods for censored regression[J]. Journal of Multivariate Analysis,1995,52(2):259-279.
    [58]Zhou M. Asymptotic Normality of the Synthetic Estimator for Censored Survival Data [J]. The Annals of Statistics,1992,20(3):1002-1021.
    [59]Stute W,Wang J L. The strong law under random censorship[J]. The Annals of Statistics,1993, 21(1):146-156.
    [60]Liang H. An application of the Bernstein's inequality[J]. Econometric Theory,1999,152:905-906.
    [61]Liang H. Asymptotic normality of parametric part in partly linear models with measurement error in the nonparametric part[J]. Journal of Statistical Planning and Inference,2000,86(1):51-62.
    [62]Feng S Y, Xue L G. Maximum empirical likelihood estimation for nonlinear semiparametric EV models[J]. Acta Mathematica Scientia,2012,32A(2):1-14. (in Chinese)
    [63]Härdle W, Liang H, Gao J T. Partially linear models[M]. Heidelberg:Physica-Verlag,2000.
    [64]Fan J, Truong Y K. Nonparametric regression with errors in variables[J]. The Annals of Statistics, 1993,21:1900-1925.
    [65]Fan J, Huang T. Profile likelihood inferences on semi-parametric varying-coefficient partially linear models [J]. Bernoulli,2005,11(6):1031-1057.
    [66]Xia Y, Li W K. On the estimation and testing of functional-coefficient linear models [J]. Statistica Sinica,1999,9:737-757.
    [67]Zhang W. Local polynomial fitting in semivarying coefficient model[J]. Journal of Multivariate Analysis,2002,82:166-188.
    [68]Xia Y, Zhang W, Tong H. Efficient estimation semi-parametric varying-coefficient models [J]. Biometrika,2004,91:661-681.
    [69]Shi G, Lau T S. Empirical likelihood for partially linear models [J]. Journal of Multivariate Analysis, 2000,72(1):132-148.
    [70]You J H, Chen G M. Estimation of a semi-parametric varying-coefficient partially linear errors-in-variables model[J]. Journal of Multivariate Analysis,2006,97:324-341.
    [71]Liang H, Härdle W, Carroll R J. Estimation in a semi-parametric partially linear errors-in-variables models[J]. The Annals of Statistics,1999,27:1519-1533.
    [72]Hu X M, Wang Z Z, Zhao Z W. Empirical likelihood for semiparametric varying-coefficient partially linear errors-in-variables models[J]. Statistics & Probability Letters,2009,79:1044-1052.
    [73]Wang X L, Li G R, Lin L. Empirical likelihood inference for semi-parametric varying-coefficient partially linear EV models [J]. Metrika,2011,73:171-185.
    [74]Srinivasan C, Zhou M. Linear regression with censoring [J]. Journal of Multivariate Analysis, 1991,49:179-201.
    [75]Cox D R. Partial likelihood [J]. Biometrika,1975,62:269-276.
    [76]Owen A B. Empirical Likelihood[M]. Florida:Chapman and Hall/CRC,2001.
    [77]Fleming T, Harrington D. Counting Processes and Survival Analysis[M]. New York:Wiley,1991.
    [78]Fan J, Lv J. Sure independence screening for ultra-high dimensional feature space[J]. Journal of the Royal Statistical Society, Series B,2008,70:849-911.
    [79]Antoniadis A, Fryzlewicz P, Letué F. The Dantzig selector in Cox's proportional hazards model[J]. Scandinavian Journal of Statistics,2010,37:531-552.
    [80]Van der Vaart A W. Asymptotic Statistics[M]. Cambridge University Press,1998.
    [81]Panjer H H. Recursive evaluation of a family of compound distributions [J]. ASTIN Bulletin,1981,12:22-26.
    [82]Wang R M, Yuen K C, and Zhu L X. On the distributions of two classes of multiple dependent aggregate claims[J]. Acta Mathematicae Applicatae Sinica (English Series),2008,24:655-668.
    [83]Zhou J, Yang J P, Cheng S H, and Cheng Q S. Recursive equations of compound distributions for bivariate mixed-type severity distribution[J]. Advances in Mathematics,2005,34:54-72.
    [84]Kaas R, Goovaerts M, Dhaene J, and Denuit M. Modern Actuarial Risk Theory (2nd ed) [M]. Kluwer Academic Publishers, Boston,2008.
    [85]Yang J P, Cheng S H, and Wu Q. Recursive equations for compound distribution with the severity distribution of the mixed type [J]. Science in China Ser.A,2005,48:594-609.
    [86]Panjer H H and Wang S. On the stability of recursive formulas [J]. ASTIN Bulletin,1993,23:227-258.
    [87]Sundt B. On some extensions of Panjer's class of counting distributions[J]. ASTIN Bulletin,1992,22:61-80.
    [88]Panjer H H and Wang S. Computational aspects of Sundt's generalized class[J]. ASTIN Bulletin,1995,25:5-17.
    [89]Rolski T, Schmidli H, Schmidt V, and Teugels J. Stochastic Processes for Insurance and Finance[M]. John Wiley, Chichester,1999.
    [90]Panjer H H and Willmot G E. Insurance Risk Models [M]. Society of Actuaries, Chicago,1992.
    [91]Ruohonen M. On a model for the claim number process [J]. ASTIN Bulletin,1988,18:57-68.
    [92]Murat M and Szynal D. On moments of counting distributions satisfying the kth-order recursion and their compound distributions[J]. Journal of Mathematical Sciences,1998,92:4038-4043.
