Bias-variance Error Decomposition for Data-driven Geospatial Modeling.

Details

Author: Gao, Jing
Degree: Ph.D.
Year: 2013
Keywords: bias-variance error decomposition; modeling errors
Advisor: Burt, James E. (advisor); Zhu, A-Xing (committee member); Burnicki, Amy C. (committee member); Newton, Michael A. (committee member); Shavlik, Jude W. (committee member)
Institution: The University of Wisconsin
Department: Geography
ISBN: 9781303393488
CBH: 3594669
Country: USA
Language: English
FileSize: 4351770
Pages: 108

Abstract

Careful model evaluation is essential when using data-driven geospatial models. A useful evaluation should help the analyst 1) understand and 2) improve model performance. Commonly used error analysis methods provide limited help for accomplishing these goals. Hence, I propose to use the bias-variance (BV) error decomposition in geospatial modeling. This approach decomposes the expected model error into bias (systematic error), variance (model sensitivity to variations in training data), and noise (unavoidable error). Originating in statistics and machine learning, BV analysis has proven useful for achieving the aforementioned two goals of model evaluation, and it has been used to compare different error metrics. However, it has not been tested for geospatial models. This research investigates the BV decomposition of three error types relevant for geospatial modeling (squared error, absolute error, and categorical error), through both analytical inquiry and case studies. It is the first research to analytically derive the BV decomposition for absolute errors, the first to explore the usefulness of BV decomposition for geospatial models, and the first to investigate the implications of using different error definitions in geospatial model evaluation. My results showed that the benefits of BV analysis demonstrated in statistics and machine learning apply to geospatial modeling. Additionally, the BV decomposition can reveal new insights about the modeled geospatial process; mapping bias can help identify and delineate model spatial non-stationarity; and mapping variance can help predict the effects of ensemble methods and guide training sample collection. All of these may assist the development of model improvement strategies. Further, squared, absolute, and categorical errors can potentially lead to different model evaluation conclusions for the same model.
In practice, it can be beneficial to use both squared and zero-one errors for geospatial classification models, but for geospatial regression models a careful choice between squared and absolute errors is recommended. Interestingly, in my results, the widely accepted bias/variance tradeoff did not always emerge, but the cause of this phenomenon is not clear. Finally, the geospatial context also deepened the theoretical understanding of data-driven modeling in general, especially regarding the effects of ensemble methods and effective model training.
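
To make the decomposition described in the abstract concrete, here is a minimal sketch for the squared-error case only, where the expected error at a location is commonly written as noise + bias^2 + variance. It is not the dissertation's procedure: the one-dimensional synthetic process, the polynomial model, and all names (`bv_decompose_squared`, `true_fn`, `noise_sd`, `degree`) are assumptions made for illustration. The sketch refits the model on many independently drawn training sets and summarizes the predictions at fixed test locations.

```python
import numpy as np

def bv_decompose_squared(true_fn, noise_sd, degree, n_train=30, n_sets=200, seed=0):
    """Estimate per-location bias^2, variance, and noise for squared error.

    Hypothetical setup: a known 1-D process `true_fn` on [0, 1] observed with
    Gaussian noise, modeled by a polynomial of the given degree.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, 50)          # fixed evaluation locations
    preds = np.empty((n_sets, x_test.size))

    for i in range(n_sets):
        # Draw a fresh training sample and refit the model.
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_fn(x) + rng.normal(0.0, noise_sd, n_train)
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x_test)

    mean_pred = preds.mean(axis=0)
    bias_sq = (mean_pred - true_fn(x_test)) ** 2  # systematic error
    variance = preds.var(axis=0)                  # sensitivity to training data
    noise = np.full_like(x_test, noise_sd ** 2)   # unavoidable error (known here)
    return x_test, bias_sq, variance, noise

def wave(x):
    # Synthetic stand-in for the modeled process.
    return np.sin(2.0 * np.pi * x)

x_lo, bias_lo, var_lo, noise_lo = bv_decompose_squared(wave, noise_sd=0.3, degree=1)
x_hi, bias_hi, var_hi, noise_hi = bv_decompose_squared(wave, noise_sd=0.3, degree=5)
print("degree 1: mean bias^2 =", bias_lo.mean(), " mean variance =", var_lo.mean())
print("degree 5: mean bias^2 =", bias_hi.mean(), " mean variance =", var_hi.mean())
```

Because `bias_sq` and `variance` are returned per test location, they could be plotted against location, which is the one-dimensional analogue of the bias and variance maps mentioned in the abstract. With real data the noise term is generally unknown and would have to be estimated or folded into the bias term.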
