Model selection in multivariate adaptive regression splines (MARS) using information complexity as the fitness function
  • Authors: Elcin Kartal Koc ; Hamparsum Bozdogan
  • Keywords: Model selection ; Multivariate adaptive regression splines (MARS) ; Nonparametric regression ; Information complexity
  • Journal: Machine Learning
  • Publication year: 2015
  • Publication date: October 2015
  • Volume: 101
  • Issue: 1-3
  • Pages: 35-58
  • Full-text size: 1,395 KB
  • Author affiliations: Elcin Kartal Koc (1)
    Hamparsum Bozdogan (1)

    1. Department of Statistics, Operations, and Management Science, The University of Tennessee, Knoxville, TN, 37996, USA
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Automation and Robotics
    Computing Methodologies
    Simulation and Modeling
    Language Translation and Linguistics
  • Publisher: Springer Netherlands
  • ISSN: 1573-0565
Abstract
This paper introduces the information-theoretic measure of complexity (ICOMP) criterion for model selection in multivariate adaptive regression splines (MARS), to trade off efficiently between how well the model fits the data and the model complexity. MARS is a popular nonparametric regression technique used to study the nonlinear relationship between a response variable and a set of predictors, with piecewise linear or cubic splines as basis functions. A critical step in determining the form of the nonparametric regression model in the MARS strategy is evaluating a portfolio of submodels to select the best submodel, with the appropriate number of knots, over a subset of predictors. In ordinary regression modeling, when the model contains a large number of predictor variables and there is no precise information about the exact functional relationships among them, many model selection criteria still overfit. In this paper, to find the simplest model that balances overfitting against underfitting, ICOMP is proposed as a powerful model selection criterion for MARS modeling. Here, model complexity is measured by the interdependency of the parameter estimates as well as by the number of free parameters in the model. We develop ICOMP and study its performance in MARS modeling alongside several of the most popular model selection criteria, namely Akaike's information criterion, Schwarz's Bayesian information criterion, and generalized cross-validation, for selecting the best subset models. We provide two Monte Carlo simulation examples and a real benchmark example to demonstrate the utility and versatility of the proposed model selection approach in determining the best functional form of the predictive model.
Our numerical examples show that ICOMP provides a general model selection criterion that gives insight into the interdependencies and/or correlational structure between the parameter estimates in the selected model. This new approach is also applicable to many complex statistical modeling problems.
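
The comparison of criteria described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Gaussian errors, uses the maximal-entropy complexity C1(Σ) = (s/2) log(tr(Σ)/s) − (1/2) log|Σ| applied to the estimated inverse Fisher information of a linear model in the basis functions, and replaces the MARS knot penalty in GCV with the plain column count. The function names (`hinge`, `c1_complexity`, `criteria`) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def hinge(x, knot, sign=1.0):
    """Piecewise linear basis function max(0, sign * (x - knot))."""
    return np.maximum(0.0, sign * (x - knot))

def c1_complexity(cov):
    """C1(cov) = (s/2) * log(trace/s) - (1/2) * log det(cov); zero when cov = c * I."""
    s = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet

def criteria(X, y):
    """AIC, BIC, a simplified GCV, and an ICOMP-style score for y ~ X under Gaussian errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / n                              # maximum likelihood variance estimate
    neg2loglik = n * np.log(2 * np.pi * sigma2) + n
    p = k + 1                                     # coefficients plus the error variance
    # Assumption: effective number of parameters in GCV taken as k (no knot penalty).
    gcv = (rss / n) / (1.0 - k / n) ** 2
    # Assumption: estimated inverse Fisher information, block diagonal in (beta, sigma2).
    cov = np.zeros((p, p))
    cov[:k, :k] = sigma2 * np.linalg.inv(X.T @ X)
    cov[k, k] = 2.0 * sigma2 ** 2 / n
    return {
        "rss": rss,
        "aic": neg2loglik + 2 * p,
        "bic": neg2loglik + p * np.log(n),
        "gcv": gcv,
        "icomp": neg2loglik + 2.0 * c1_complexity(cov),
    }

# Simulated data with one true knot at 0.5 (hypothetical example).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = 1.0 + 2.0 * hinge(x, 0.5) + rng.normal(scale=0.1, size=200)

# Small submodel: intercept plus the true hinge.
X_small = np.column_stack([np.ones_like(x), hinge(x, 0.5)])
# Large submodel: intercept plus hinges at nine knots, including 0.5.
knots = np.linspace(0.1, 0.9, 9)
X_large = np.column_stack([np.ones_like(x)] + [hinge(x, t) for t in knots])

small = criteria(X_small, y)
large = criteria(X_large, y)
for name in ("aic", "bic", "gcv", "icomp"):
    print(name, round(small[name], 3), round(large[name], 3))
```

For every criterion, lower is better. The large submodel always attains a residual sum of squares no greater than the small one because the two are nested, so the penalties decide the ranking. The C1 term grows when the coefficient estimates are strongly correlated or have very unequal variances, which is the sense in which complexity depends on the interdependency of the parameter estimates and not only on their number.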
