Application of Robust Multiple Linear Regression in Geographic Data Processing
Abstract
Multiple linear regression is a mathematical method commonly used to build geostatistical analysis models. Statisticians have pointed out that gross errors occur in roughly 1% to 10% of the data collected in production practice and scientific experiments. To weaken or eliminate the influence of gross errors on parameter estimation, G. E. P. Box proposed the concept of robust estimation in 1953. Robust estimation theory is built on the actual distribution of the observed data rather than on an idealized distribution: given that gross errors are unavoidable, an appropriate estimation method is chosen so that the parameter estimates are affected by gross errors as little as possible and the best estimates under the normal model are obtained. Robust multiple linear regression can effectively eliminate or weaken the influence of gross errors on parameter estimation, but the range of gross errors that can be eliminated depends on the robust estimation method itself and on the number of observations in the specific problem. Using simulation experiments, this paper identifies the comparatively more effective robust estimation methods for multiple linear regression, determines the range of gross errors that these methods eliminate or weaken, and determines the minimum number of observations they need to eliminate gross errors completely.
This paper proposes an approach, with a concrete calculation procedure, for determining the range of gross errors that a robust estimation method can eliminate. Taking multiple linear regression with two to five independent variables as examples, simulation experiments (1000 runs) are used to compare 13 commonly used robust estimation methods with respect to the range of gross errors (up to 8.0σ0) that they eliminate. The results show that the L1 method, the Geman-McClure method, the IGGIII scheme, and the Danish method are the comparatively more effective methods among the 13. When the observations contain one gross error, the minimum numbers of observations with which two-, three-, four-, and five-variable linear regressions completely eliminate the influence of gross errors of 3.0σ0 to 8.0σ0 are 7, 8, 10, and 11, respectively. When the observations contain two gross errors simultaneously, the corresponding minimum numbers of observations are 10, 12, 15, and 17, respectively.
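As a rough illustration of how a robust estimator suppresses a single gross error, the sketch below fits a two-variable regression by iteratively reweighted least squares (IRLS) with the standard Geman-McClure weight function. It is not the paper's simulation procedure: the design matrix, the true coefficients, the noise-free observations, the known σ0 = 1, and the names `geman_mcclure_weight` and `irls` are all assumptions made for the example.

```python
import numpy as np

def geman_mcclure_weight(u):
    """IRLS weight of the Geman-McClure loss; u is the residual divided by sigma0."""
    return 1.0 / (1.0 + u**2) ** 2

def irls(X, y, sigma0=1.0, n_iter=20):
    """Iteratively reweighted least squares, started from the ordinary LS fit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = geman_mcclure_weight((y - X @ beta) / sigma0)
        sw = np.sqrt(w)
        # Weighted LS step: scale each row of X and y by the square root of its weight.
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

# Hypothetical design: intercept plus two regressors, 10 observations.
x1 = np.arange(10.0)
x2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0])
X = np.column_stack([np.ones(10), x1, x2])
beta_true = np.array([1.0, 0.5, -0.3])

y = X @ beta_true   # noise-free, so only the gross error disturbs the fit
y[4] += 8.0         # one gross error of 8*sigma0 (sigma0 = 1 assumed)

beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
beta_rob = irls(X, y)

# Largest distortion of the fitted values caused by the gross error.
ls_err = np.max(np.abs(X @ (beta_ls - beta_true)))
rob_err = np.max(np.abs(X @ (beta_rob - beta_true)))
print(f"max fitted-value error: LS={ls_err:.3f}, IRLS={rob_err:.4f}")
```

Repeating such a fit over many random noise draws, gross-error sizes, and observation counts would be one way to approximate the kind of comparison the paper describes, although the paper's own criterion for "completely eliminated" is not stated in the abstract.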
Simple linear regression is one of the most widely used parameter estimation methods. This paper proposes arranging the independent variable of a simple linear regression by applying a bidirectional golden section to an arithmetic progression, which raises the redundancy components (redundancy numbers) of the observations at the two endpoints and narrows the differences in redundancy components among the observations. Without increasing the number of observations or changing their precision, this improves the ability of robust estimation methods to eliminate or weaken gross errors.
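To see why the endpoints matter, the sketch below computes the redundancy number of each observation in a simple linear regression whose independent variable follows an arithmetic progression. It assumes independent observations of equal weight, for which the redundancy number is r_i = 1 - h_ii with h_ii = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The function name `redundancy_numbers` and the values of `x` are illustrative; the paper's golden-section layout is not specified in the abstract and is not reproduced here.

```python
import numpy as np

def redundancy_numbers(x):
    """Redundancy numbers r_i = 1 - h_ii for the model y = a + b*x with equal weights."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    h = 1.0 / x.size + d**2 / np.sum(d**2)   # diagonal of the hat matrix
    return 1.0 - h

x = np.arange(1.0, 11.0)                     # arithmetic progression 1, 2, ..., 10
r = redundancy_numbers(x)
print(np.round(r, 3))
```

For this layout the two endpoints have the smallest redundancy numbers (about 0.655, against about 0.897 near the middle), and the values sum to n - 2 = 8. A gross error at an endpoint is therefore reflected least in its own residual, which is the imbalance a modified layout of the independent variable would aim to reduce.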
The coefficients of a multiple linear regression are usually solved by least squares, but in practice another situation arises: multicollinearity among the independent variables, which often affects parameter estimation. To address this in a variety of practical problems, Hoerl, Massy, Webster, and Stein proposed ridge estimation, principal component estimation, latent root estimation, and shrinkage estimation of the regression coefficients, respectively, to weaken the influence of multicollinearity. This paper also summarizes the commonly used methods for diagnosing multicollinearity and removing its influence. Common diagnostics include the tolerance, the variance inflation factor, and the eigenvalues (latent roots); common remedies include ridge regression, principal component regression, and partial least squares estimation.
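The diagnostics and one of the remedies named above can be sketched in a few lines. The example uses the textbook definitions: the variance inflation factors are the diagonal of the inverse correlation matrix, the tolerance is their reciprocal, and the ridge estimate solves (XᵀX + kI)β = Xᵀy on centered data. The synthetic data, the ridge parameter `k = 1.0`, and the function names `vif` and `ridge` are assumptions for illustration only.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each regressor (columns of X, no intercept column)."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def ridge(X, y, k):
    """Ridge estimate of the slopes on centered data; k is the ridge parameter."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + k * np.eye(X.shape[1]), Xc.T @ yc)

# Synthetic data: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.05 * rng.normal(size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + x3 + 0.1 * rng.normal(size=50)

vifs = vif(X)
tolerance = 1.0 / vifs
eigenvalues = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))

beta_ols = ridge(X, y, k=0.0)    # k = 0 reduces to ordinary least squares
beta_ridge = ridge(X, y, k=1.0)  # hypothetical choice of k

print("VIF:        ", np.round(vifs, 1))
print("tolerance:  ", np.round(tolerance, 4))
print("eigenvalues:", np.round(eigenvalues, 4))
print("OLS slopes: ", np.round(beta_ols, 3))
print("ridge slopes:", np.round(beta_ridge, 3))
```

Large VIF values (equivalently, tolerances near zero) for `x1` and `x2` and a near-zero smallest eigenvalue all flag the same near-linear dependence. The ridge slopes have a smaller norm than the least squares slopes, trading a small bias for reduced variance.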
