Research on Two Classes of Regression Models Based on Spline Functions
Abstract
Compared with parametric methods, nonparametric methods build models more flexibly from the observed data, which reduces the risk of modeling bias and helps in choosing an appropriate parametric model. Spline smoothing, one of the most commonly used nonparametric methods, is simple to implement and fast to compute. In this thesis, we use spline smoothing to study three problems arising in the nonparametric time series regression model and in the linear regression model with an interval-censored continuous covariate.
     In Chapter 2, we apply polynomial spline smoothing to estimate the conditional variance function in nonparametric time series regression and to construct simultaneous confidence bands for it. First, we obtain a spline estimator of the conditional mean function. Then we estimate the conditional variance function by applying polynomial spline smoothing to the residuals. Under a weak α-mixing condition, we derive the uniform convergence rate of the estimator. We further construct simultaneous confidence bands for the conditional variance function using piecewise constant and piecewise linear splines. Such bands can be used to test whether the conditional variance function has a given parametric form, for instance whether conditional heteroscedasticity is present or whether the conditional variance is a quadratic function. The simulations cover not only the case in which both the conditional mean and the conditional variance functions are smooth, but also cases in which one of them is not smooth. As an application, we apply the proposed method to daily data of the S&P 500 index. The simulation and application results show that the method gives good estimates and is faster than the commonly used local polynomial smoothing method.
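The two-step estimator described above can be illustrated with a minimal sketch. It assumes the simplest case, a piecewise constant spline on equally spaced knots, for which the least-squares fit reduces to the average of the responses within each knot interval. The function names, the range [lo, hi] and the number of intervals n_bins are illustrative assumptions rather than the thesis's notation, and the sketch leaves out knot selection and the confidence bands.

```python
def bin_index(x, lo, hi, n_bins):
    """Index of the equal-width knot interval of [lo, hi] that contains x."""
    j = int((x - lo) / (hi - lo) * n_bins)
    return min(max(j, 0), n_bins - 1)  # clamp so that x == hi lands in the last interval


def constant_spline_fit(xs, ys, lo, hi, n_bins):
    """Piecewise constant least-squares fit: the mean of ys in each interval."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        j = bin_index(x, lo, hi, n_bins)
        sums[j] += y
        counts[j] += 1
    # An interval without observations has no estimate; mark it as NaN.
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def two_step_variance(xs, ys, lo, hi, n_bins):
    """Step 1: fit the conditional mean. Step 2: fit the squared residuals."""
    mean_coef = constant_spline_fit(xs, ys, lo, hi, n_bins)
    sq_resid = [
        (y - mean_coef[bin_index(x, lo, hi, n_bins)]) ** 2
        for x, y in zip(xs, ys)
    ]
    var_coef = constant_spline_fit(xs, sq_resid, lo, hi, n_bins)
    return mean_coef, var_coef


# Toy data (made up for illustration): two intervals on [0, 1].
mean_coef, var_coef = two_step_variance(
    [0.1, 0.2, 0.6, 0.7], [1.0, 3.0, 10.0, 14.0], 0.0, 1.0, 2
)
print(mean_coef, var_coef)
```

For a time series, xs would typically hold lagged values such as y_{t-1} and ys the current values y_t; that pairing is also an assumption of this sketch.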
     In Chapter 3, we use polynomial splines to detect jumps in the conditional mean function of a nonparametric time series regression model. The number, locations and magnitudes of the jumps are all assumed unknown. First, we obtain a spline estimator of the conditional mean function. Then, based on the maximal difference of the spline estimator between neighboring knots, we construct test statistics for the existence of jumps and derive their limiting distributions under the null hypothesis that the conditional mean function is continuous. We then use the test statistics to identify the knots adjacent to the true jumps, take these knots as the estimated jump locations, and refit the data with additional spline basis functions to estimate the jump magnitudes. In the simulation section, we analyze over 500 replications the frequency with which jumps are detected, the frequency of detecting a wrong number of jumps, and the coverage frequency of the true jumps.
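The detection step can be sketched in the same simplified setting of a piecewise constant spline on equally spaced knots. The statistic below is the largest absolute difference between fitted values on neighboring knot intervals. The threshold is a user-supplied placeholder: the critical value that the thesis derives from the limiting distribution is not reproduced here. The jump size is read off as the raw difference of neighboring fits, which is a simplification of the refitting step described above. All names are illustrative assumptions.

```python
def constant_spline_fit(xs, ys, lo, hi, n_bins):
    """Piecewise constant least-squares fit: the mean of ys in each interval."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        j = min(max(int((x - lo) / (hi - lo) * n_bins), 0), n_bins - 1)
        sums[j] += y
        counts[j] += 1
    if 0 in counts:
        raise ValueError("every knot interval needs at least one observation")
    return [s / c for s, c in zip(sums, counts)]


def detect_jumps(xs, ys, lo, hi, n_bins, threshold):
    """Return (statistic, jumps); requires n_bins >= 2.

    statistic: maximal absolute difference between neighboring fitted values.
    jumps: (knot, difference) pairs whose absolute difference exceeds
           `threshold` (a hypothetical cut-off supplied by the caller).
    """
    fit = constant_spline_fit(xs, ys, lo, hi, n_bins)
    width = (hi - lo) / n_bins
    diffs = [fit[j + 1] - fit[j] for j in range(n_bins - 1)]
    statistic = max(abs(d) for d in diffs)
    # The interior knot between intervals j and j + 1 stands in for the jump location.
    jumps = [
        (lo + (j + 1) * width, d)
        for j, d in enumerate(diffs)
        if abs(d) > threshold
    ]
    return statistic, jumps


# Toy data (made up for illustration): a step of size 2 at x = 0.5.
xs = [0.05, 0.15, 0.3, 0.4, 0.55, 0.65, 0.8, 0.9]
ys = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
print(detect_jumps(xs, ys, 0.0, 1.0, 4, threshold=1.0))
```

In practice the noise in the fitted values has to be accounted for when choosing the cut-off, which is the role of the limiting distribution in the chapter.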
     In Chapter 4, we study parameter estimation in a linear regression model with an interval-censored continuous covariate. The linear regression model with an interval-censored covariate was proposed by Gomez, Espinal and Lagakos [9] in 2003, motivated by an AIDS clinical trial. The censored covariate studied by Gomez et al. [9] is a discrete random variable. To estimate the regression coefficients, Gomez et al. [9] developed a likelihood approach together with a two-step conditional algorithm for maximizing the likelihood. However, their iterative algorithm is complex and time-consuming, and, as Gomez et al. [9] themselves point out, it does not apply when the censored covariate is continuous. In this chapter, we address this open problem of estimating the regression coefficients when the censored covariate is a continuous random variable, and propose a fast estimation method. First, based on the logspline model of Stone [54] and Kooperberg and Stone [55], we estimate the density and distribution functions of the censored covariate. Next, we use a conditional expectation to construct a variable that replaces the censored covariate. Finally, we apply ordinary least squares to the linear regression model with the imputed covariate to obtain the estimated regression coefficients. In the simulations, we compare our imputation method with midpoint imputation and with a semiparametric Bayesian method. Extensive simulations show that, when the width of the censoring interval varies, our method gives more accurate estimates with smaller MSEs, and that it is more than 100 times faster than the semiparametric Bayesian method.
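The imputation idea can be sketched as follows, under several loudly stated assumptions. The callable `density` is a hypothetical stand-in for the logspline density estimate, which is not implemented here. The conditional expectation is taken given the censoring interval only, E[Z | L < Z <= R], computed as the integral of z f(z) over (L, R] divided by the integral of f(z) over the same interval; the abstract does not spell out the conditioning set, so this is a guess at the simplest version. The regression has a single covariate, and all names are illustrative.

```python
def conditional_mean(density, left, right, n_grid=1000):
    """Approximate E[Z | left < Z <= right] under `density` by the midpoint rule."""
    h = (right - left) / n_grid
    num = 0.0  # approximates the integral of z * f(z), up to the common factor h
    den = 0.0  # approximates the integral of f(z), up to the common factor h
    for k in range(n_grid):
        z = left + (k + 0.5) * h
        w = density(z)
        num += z * w
        den += w
    return num / den


def ols_simple(zs, ys):
    """Ordinary least squares for y = intercept + slope * z."""
    n = len(zs)
    z_bar = sum(zs) / n
    y_bar = sum(ys) / n
    szz = sum((z - z_bar) ** 2 for z in zs)
    szy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    slope = szy / szz
    return y_bar - slope * z_bar, slope


def impute_and_fit(intervals, ys, density):
    """Replace each censoring interval by its conditional mean, then run OLS."""
    imputed = [conditional_mean(density, left, right) for left, right in intervals]
    return ols_simple(imputed, ys)


# Toy data (made up for illustration). With a flat density the conditional
# mean is the interval midpoint, so this call reduces to midpoint imputation.
intervals = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
ys = [3.0, 7.0, 11.0]
print(impute_and_fit(intervals, ys, lambda z: 1.0))
```

With a non-flat density estimate the imputed value moves away from the midpoint toward the region where the covariate is more likely to lie, which is where this approach differs from midpoint imputation.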
     In Chapter 5, we summarize the preceding three chapters, analyze the strengths and weaknesses of the proposed methods, and suggest directions for further research.
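References
[1]X.R. Chen, S.G. Wang (1987). Modern regression analysis. Hefei: Anhui Education Press, p1-7 (in Chinese)
    [2]S.G. Wang (1999). Linear statistical models: linear regression and analysis of variance. Beijing: Higher Education Press, p1-6 (in Chinese)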
    [3]J. Fan, Q. Yao (2003). Nonlinear time series:nonparametric and parametric methods. Springer, New York
    [4]J. Z. Huang, H. Shen (2004). Functional coefficient regression models for nonlinear time series:a polynomial spline approach. Scand. J. Statist.,31:pp515-534
    [5]J. Z. Huang, L. Yang (2004). Identification of nonlinear additive autoregression models. J. R. Statist. Soc. Ser. B,66:pp463-477
    [6]L. Wang, L. Yang (2007). Spline-Backfitted Kernel Smoothing of nonlinear additive autoregression model. The Annals of Statistics,35:pp2474-2503
    [7]E.C.M. Hui, C.K.W. Yu, W.C. Ip (2010). Jump point detection for real estate investment success. Physica A,389:pp1055-1064
    [8]J.Y. Koo (1997). Spline estimation of discontinuous regression functions. Journal of Computational and Graphical Statistics,6:pp266-284
    [9]G. Gomez, A. Espinal, and S. W. Lagakos (2003). Inference for a linear regression model with an interval-censored covariate. Statistics in Medicine,22:pp409-425
    [10]R. F. Engle (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica,50:pp987-1007
    [11]C. J. Stone (1985). Additive regression and other nonparametric models. The Annals of Statistics,13:pp689-705
    [12]J. Fan, I. Gijbels (1996). Local polynomial modelling and its applications. Chapman and Hall, London.
    [13]J. Z. Huang (2003). Local asymptotics for polynomial spline regression. The Annals of Statistics,31:pp1600-1635
    [14]W. Härdle, A. Tsybakov (1997). Local polynomial estimators of the volatility function in nonparametric autoregression. Journal of Econometrics,81:pp223-242
    [15]J. Fan, Q. Yao (1998). Efficient estimation of conditional variance functions in stochastic regression. Biometrika,85:pp645-660
    [16]Z.B. Zhao, W.B. Wu (2008). Confidence bands in nonparametric time series regression. The Annals of Statistics,36:pp1854-1878
    [17]G.J. Johnston (1982). Probabilities of maximal deviations for nonparametric regression function estimates. J. Multivariate Anal.,12:pp402-414
    [18]W. Härdle (1989). Asymptotic maximal deviation of M-smoothers. J. Multivariate Anal.,29:pp163-179
    [19]J. Sun, C.R. Loader (1994). Simultaneous confidence bands for linear regression and smoothing. Ann. Statist.,22:pp1328-1345
    [20]R.L. Eubank, P.L. Speckman (1993). Confidence bands in nonparametric regression. J. Amer. Statist. Assoc.,88:pp1287-1301
    [21]W.B. Wu, Z. Zhao (2007). Inference of trends in time series. J. Roy. Statist. Soc. Ser. B,69:pp391-410
    [22]J.Z. Huang, L. Yang (2004). Identification of non-linear additive autoregressive models. Journal of the Royal Statistical Society, Series B,66:pp463-477
    [23]L. Xue, L. Yang (2006). Additive coefficient modeling via polynomial spline. Statistica Sinica,16:pp1423-1446
    [24]R.F. Engle, J.G. Rangel (2005). The spline GARCH model for unconditional volatility and its global macroeconomic causes. Czech National Bank Working Paper Series #13.
    [25]F. Audrino, P. Buhlmann (2009). Splines for financial volatility. Journal of the Royal Statistical Society, Series B,71:pp655-670
    [26]D. Qiu, Q. Shao, L. Yang (2013). Efficient inference for autoregressive coefficients in the presence of trends. Journal of Multivariate Analysis,114:pp40-53
    [27]X. Wu, Z. Tian, H. Wang (2009). Polynomial spline estimation for nonparametric (auto-)regressive models. Studia Scientiarum Mathematicarum Hungarica,46:pp515-538
    [28]J. Wang, L. Yang (2009). Polynomial spline confidence bands for regression curves. Statistica Sinica,19:pp325-342
    [29]Q. Song, L. Yang (2009). Spline confidence bands for variance function. Journal of Nonparametric Statistics,21:pp589-609
    [30]L. Wang, L. Yang (2010). Simultaneous confidence bands for time-series prediction function. Journal of Nonparametric Statistics,22:pp999-1018
    [31]H.G. Müller (1992). Change-points in nonparametric regression analysis. The Annals of Statistics,20:pp737-761
    [32]P.H. Qiu (1994). Estimation of the number of jumps of the jump regression functions. Communications in Statistics-Theory and Methods,23:pp2141-2155
    [33]P.H. Qiu, B. Yandell (1998). A local polynomial jump detection algorithm in nonparametric regression. Technometrics,40:pp141-152
    [34]J.H. Shiau (1987). A note on MSE coverage intervals in a partial spline model. Communications in Statistics-Theory and Methods,16:pp1851-1866
    [35]S.J. Ma, L.J. Yang (2011). A jump-detecting procedure based on spline estimation. Journal of Nonparametric Statistics,23:pp67-81
    [36]P.H. Qiu, C. Asano, X. Li (1991). Estimation of jump regression function. Bulletin of Informatics and Cybernetics,24:pp197-212
    [37]J.S. Wu, C.K. Chu (1993). Kernel type estimators of jump points and values of a regression function. The Annals of Statistics,21:pp1545-1566
    [38]H.G. Müller, U. Stadtmüller (1999). Discontinuous versus smooth regression. The Annals of Statistics,27:pp299-337
    [39]P.H. Qiu (1991). Estimation of a kind of jump regression functions. Systems Science and Mathematical Sciences,4:pp1-13
    [40]P.H. Qiu (2003). A Jump-preserving curve fitting procedure based on local piecewise-linear kernel estimation. Journal of Nonparametric Statistics,15:pp437-453
    [41]P.H. Qiu (2005). Image Processing and Jump Regression Analysis. Wiley, New York.
    [42]A.W. Bowman, A. Pope, B. Ismail (2006). Detecting discontinuities in nonparametric regression curves and surfaces. Statistics and Computing,16:pp377-390
    [43]I. Gijbels, A. Lambert, P.H. Qiu (2007). Jump-preserving regression and smoothing using local linear fitting:a compromise. Annals of the Institute of Statistical Mathematics,59:pp235-272
    [44]J. Joo, P.H. Qiu (2009). Jump detection in a regression curve and its derivative. Technometrics,51:pp289-305
    [45]Z.Y. Lin, D. Li, J. Chen (2008). Change point estimators by local polynomial fits under a dependence assumption. Journal of Multivariate Analysis,99:pp2339-2355
    [46]H. Wong, W. Ip, Y. Li (2001). Detection of jumps by wavelets in a heteroscedastic autoregressive model. Statistics & Probability Letters,52:pp365-372
    [47]G.M. Chen, Y.K. Choi, Y. Zhou (2008). Detections of changes in return by a wavelet smoother with conditional heteroscedastic volatility. Journal of Econometrics,143:pp227-262
    [48]E.C.M. Hui, C.K.W. Yu, W.C. Ip (2010). Jump point detection for real estate investment success. Physica A,389:pp1055-1064
    [49]Y. Zhou, A.T.K. Wan, S.Y. Xie, X.J. Wang (2010). Wavelet analysis of change-points in a non-parametric regression with heteroscedastic variance. Journal of Econometrics,159:pp183-201
    [50]X.L. Kang, W.J. Braun, J.E. Stafford (2011). Local regression when the responses are interval-censored. Journal of Statistical Computation and Simulation,81:pp1247-1279
    [51]Q.X. He (2007). Parameter estimation for linear models with an interval-censored covariate. Mathematica Applicata,20:pp427-432 (in Chinese)
    [52]B.W. Turnbull (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society:Series B,38:pp290-295
    [53]E.L. Kaplan, P. Meier (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association,53:pp457-481
    [54]C.J. Stone (1990). Large-sample inference for log-spline models. The Annals of Statistics,18:pp717-741
    [55]C. Kooperberg, C.J. Stone (1992). Logspline density estimation for censored data. Journal of Computational and Graphical Statistics,1:pp301-328
    [56]C. De Boor (2001). A practical guide to splines. Springer, New York.
    [57]J.C. Sergeant, D. Firth (2006). Relative index of inequality:definition, estimation, and inference. Biostatistics,7:pp213-224
    [58]Y. Zhang, L. Hua, J. Huang (2010). A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scandinavian Journal of Statistics,37:pp338-354
    [59]A.C. Yavuz, P. Lambert (2011). Smooth estimation of survival functions and hazard ratios from interval-censored data using Bayesian penalized B-splines. Statistics in Medicine,30:pp75-90
    [60]J. Schmee, G. Hahn (1979). A simple method for regression analysis with censored data. Technometrics,21:pp417-432
    [61]R. Miller (1976). Least squares regression with censored data. Biometrika,63:pp449-464
    [62]J. Buckley, L. James (1979). Linear regression with censored data. Biometrika,66:pp429-436
    [63]R.M. Gulick, X.J. Hu, S.A. Fiscus, et al. (2000). Randomized study of saquinavir with ritonavir or nelfinavir together with delavirdine, adefovir, or both in human immunodeficiency virus-infected adults with virologic failure on indinavir:AIDS clinical trials group study 359. Journal of Infectious Diseases,182:pp1375-1384
    [64]K. Langohr, G. Gomez (2005). Likelihood maximization using web-based optimization tools:a short tutorial. The American Statistician,59:pp192-202
    [65]M.L. Calle, G. Gomez (2005). A semiparametric hierarchical method for a regression model with an interval-censored covariate. Australian & New Zealand Journal of Statistics,47:pp351-364
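    [66]D. Blackwell, J.B. MacQueen (1973). Ferguson distributions via Pólya urn schemes. Ann. Statist.,1:pp353-355
    [67]J.J. Yang (2008). Parameter estimation for regression models with an interval-censored covariate. Master's thesis, East China Normal University (in Chinese)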
    [68]J.C. Lindsey, L.M. Ryan (1998). Tutorial in biostatistics:methods for interval-censored data. Statistics in Medicine,17:pp219-238
    [69]J.K. Lindsey (1998). A study of interval censoring in parametric regression models. Lifetime Data Analysis,4:pp329-354
    [70]E.T. Lee, J. Wang (2003). Statistical methods for survival data analysis. 3rd edn. John Wiley & Sons Inc., New York.
    [71]T.R. Fleming, D.Y. Lin (2000). Survival analysis in clinical trials:past developments and future directions. Biometrics,56:pp971-983
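    [72]Z.K. Zheng, B.J. Ding (2004). On the estimation of the distribution function for interval data. Chinese Journal of Applied Probability and Statistics,20:pp119-125 (in Chinese)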
    [73]I.J. Schoenberg (1946). Contributions to the problem of approximation of equidistant data by analytic functions, Parts A and B. Quart. Appl. Math.,4:pp45-99 and 112-141
    [74]L.L. Schumaker (1981). Spline functions. Wiley, New York.
    [75]Y.Q. Lu (2003). Research and analysis of varying-coefficient models. PhD thesis, East China Normal University (in Chinese)
    [76]R.A. DeVore, G.G. Lorentz (1993). Constructive approximation. Springer-Verlag, New York.
    [77]D. Bosq (1998). Nonparametric statistics for stochastic processes. Springer, New York.
    [78]S. Demko (1977). Inverses of band matrices and local convergence of spline projections. SIAM Journal on Numerical Analysis,14:pp616-619
    [79]P. Hall, R.J. Carroll (1989). Variance function estimation in regression:the effect of estimating the mean. Journal of the Royal Statistical Society, Series B,51:pp3-14
    [80]L. Wang, L.D. Brown, T.T. Cai, M. Levine (2008). Effect of mean on variance function estimation in nonparametric regression. The Annals of Statistics,36:pp646-664
    [81]B.W. Silverman (1986). Density estimation for statistics and data analysis. Chapman and Hall, London.
    [82]R.A. Johnson, D.W. Wichern (1992). Applied Multivariate Statistical Analysis. Prentice Hall, Englewood Cliffs.
    [83]R.B. Alley, J. Marotzke, W.D. Nordhaus, J.T. Overpeck, D.M. Peteet, R. Pielke, R.T. Pierrehumbert, P.B. Rhines, T.F. Stocker, L.D. Talley, J.M. Wallace (2003). Abrupt climate change. Science,299:pp2005-2010
    [84]M.A. Ivanov, S.N. Evtimov (2010). 1963: The break point of the northern hemisphere temperature trend during the twentieth century. International Journal of Climatology,30:pp1738-1746
    [85]I. Matyasovszky (2011). Detecting abrupt climate changes on different time scales. Theoretical and Applied Climatology,105:pp445-454
    [86]National Bureau of Economic Research (2011). US business cycle expansions and contractions. www.nber.org/cycles/cyclesmain.html#announcements.
    [87]R. Oller, G. Gomez, M.L. Calle (2004). Interval censoring:model characterizations for the validity of the simplified likelihood. Canadian Journal of Statistics,32:pp315-326
