Two-sample homogeneity tests based on divergence measures
  • Authors: Max Wornowizki; Roland Fried
  • Keywords: Nonparametric two-sample test; Semiparametric two-sample test; Density ratio estimation; Kullback-Leibler divergence; Hellinger distance; Permutation test
  • Journal: Computational Statistics
  • Publication year: 2016
  • Publication date: March 2016
  • Volume: 31
  • Issue: 1
  • Pages: 291–313
  • Full-text size: 596 KB
  • References: Ali SM, Silvey SD (1966) A general class of coefficients of divergence of one distribution from another. J R Stat Soc (B) 28:131–142
    Alin A, Kurt S (2008) Ordinary and penalized minimum power-divergence estimators in two-way contingency tables. Comput Stat 23:455–468
    Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
    Basu A, Lindsay BG (1994) Minimum disparity estimation for continuous models: efficiency, distributions and robustness. Ann Inst Stat Math 46(4):683–705
    Basu A, Harris IR, Hjort NL, Jones MC (1998) Robust and efficient estimation by minimising a density power divergence. Biometrika 85:549–559
    Beran R (1977) Minimum Hellinger distance estimates for parametric models. Ann Stat 3:445–463
    Bischl B, Lang M, Mersmann O (2013) BatchExperiments: statistical experiments on batch computing clusters. R package version 1.0-968, http://CRAN.R-project.org/package=BatchExperiments/
    Cardot H, Prchal L, Sarda P (2007) No effect and lack-of-fit permutation tests for functional regression. Comput Stat 22:371–390
    D’Addario M, Kopczynski D, Baumbach JI, Rahmann S (2014) A modular computational framework for automated peak extraction from ion mobility spectra. BMC Bioinform 15:25–36
    Fisher RA (1935) The design of experiments. Oliver and Boyd, Edinburgh
    Green PJ, Silverman BW (1994) Nonparametric regression and generalized linear models: a roughness penalty approach. CRC Monogr Stat Appl Probab (Book 58), Chapman and Hall, New York
    Govindarajulu Z (2007) Nonparametric inference. World Scientific Pub Co, Singapore
    Kim JS, Scott CD (2012) Robust kernel density estimation. J Mach Learn Res 13(1):2529–2565
    Kanamori T, Suzuki T, Sugiyama M (2012) F-divergence estimation and two-sample homogeneity test under semiparametric density-ratio models. IEEE Trans Inf Theory 58:708–720
    Kopczynski D, Baumbach JI, Rahmann S (2012) Peak modeling for ion mobility spectrometry measurements. In: Proceedings of the 20th European signal processing conference (EUSIPCO 2012), pp. 1801–1805
    Lee ET, Desu MM, Gehan EA (1975) A Monte Carlo study of the power of some two-sample tests. Biometrika 62:425–432
    Lee S, Na O (2005) Test for parameter change based on the estimator minimizing density-based divergence measures. Ann Inst Stat Math 57:553–573
    Liese F, Miescke KJ (2008) Statistical decision theory: estimation, testing, and selection. Springer Series in Statistics, Berlin
    Lindsay BG (1994) Efficiency versus robustness: the case for minimum Hellinger distance and related methods. Ann Stat 22:1081–1114
    Nelder JA, Mead R (1965) A simple algorithm for function minimization. Comput J 7:308–313
    Qin J (1998) Inferences for case control and semiparametric two-sample density ratio models. Biometrika 85:619–630
    R Development Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org
    Seghouane AK, Amari SI (2007) The AIC criterion and symmetrizing the Kullback-Leibler divergence. IEEE Trans Neural Netw 18:97–104
    Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc (B) 53:683–690
    Sohn S, Jung BC, Jhun M (2012) Permutation tests using least distance estimator in the multivariate regression model. Comput Stat 27:191–201
    Sugiyama M, Kanamori T, Suzuki T, Hido S, Sese J, Takeuchi I, Wei L (2009) A density-ratio framework for statistical data processing. IPSJ Trans Comput Vis Appl 1:183–208
    Turlach BA (1993) Bandwidth selection in kernel density estimation: a review. Université catholique de Louvain
    Zeileis A, Hothorn T (2013) A toolbox of permutation tests for structural change. Stat Pap 54:931–954
    Zhu Y, Wu J, Lu X (2013) Minimum Hellinger distance estimation for a two-sample semiparametric cure rate model with censored survival data. Comput Stat 28:2495–2518
  • Author affiliations: Max Wornowizki (1)
    Roland Fried (1)

    1. Department of Statistics, Technische Universität Dortmund, 44221, Dortmund, Germany
  • Journal category: Mathematics and Statistics
  • Journal subjects: Mathematics
    Statistics
    Probability and Statistics in Computer Science
    Probability Theory and Stochastic Processes
    Economic Theory
  • Publisher: Physica Verlag, An Imprint of Springer-Verlag GmbH
  • ISSN:1613-9658
Abstract
The concept of f-divergences introduced by Ali and Silvey (J R Stat Soc (B) 28:131–142, 1966) provides a rich set of distance-like measures between pairs of distributions. Divergences do not focus on certain moments of random variables, but rather consider discrepancies between the corresponding probability density functions. Thus, two-sample tests based on these measures can detect arbitrary alternatives when testing the equality of the distributions. We treat the problem of divergence estimation as well as the subsequent testing for the homogeneity of two samples. In particular, we propose a nonparametric estimator for f-divergences in the case of continuous distributions, which is based on kernel density estimation and spline smoothing. As we show in extensive simulations, the new method performs stably and quite well in comparison to several existing non- and semiparametric divergence estimators. Furthermore, we tackle the two-sample homogeneity problem using permutation tests based on various divergence estimators. The methods are compared to an asymptotic divergence test as well as to several traditional parametric and nonparametric procedures under different distributional assumptions and alternatives in simulations. It turns out that divergence-based methods detect discrepancies between distributions more often than traditional methods if the distributions do not differ in location only. The findings are illustrated on ion mobility spectrometry data.
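To make the idea of a divergence-based permutation test concrete, the following is a minimal sketch and not the authors' estimator: it uses a plain Gaussian kernel density plug-in without the spline smoothing step mentioned in the abstract. The squared Hellinger distance is taken here as H^2 = 1/2 * integral of (sqrt(p) - sqrt(q))^2, one common convention. The function names, the rule-of-thumb bandwidth, the integration grid, and the number of permutations are all assumptions chosen for illustration, and the data are assumed to be continuous (no constant samples).

```python
import math
import random
import statistics


def gaussian_kde(sample, grid):
    """Evaluate a Gaussian kernel density estimate of `sample` on `grid`."""
    n = len(sample)
    # Rule-of-thumb bandwidth (an assumption; the abstract does not specify one).
    h = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - s) / h) ** 2) for s in sample)
            for g in grid]


def hellinger_statistic(x, y, grid_size=100):
    """Plug-in estimate of the squared Hellinger distance between two samples."""
    pooled = list(x) + list(y)
    spread = max(pooled) - min(pooled)
    # Extend the grid beyond the data so the kernel tails are covered.
    lo = min(pooled) - 0.25 * spread
    hi = max(pooled) + 0.25 * spread
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    p = gaussian_kde(x, grid)
    q = gaussian_kde(y, grid)
    # Riemann-sum approximation of 1/2 * integral (sqrt(p) - sqrt(q))^2.
    return 0.5 * step * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                            for a, b in zip(p, q))


def permutation_test(x, y, statistic=hellinger_statistic, n_perm=99, seed=0):
    """Return (observed statistic, permutation p-value) for H0: same distribution."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        if statistic(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # The add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Same location, different scale: a setting where location tests struggle.
    x = [rng.gauss(0.0, 1.0) for _ in range(40)]
    y = [rng.gauss(0.0, 3.0) for _ in range(40)]
    stat, p_value = permutation_test(x, y)
    print(f"squared Hellinger estimate: {stat:.3f}, p-value: {p_value:.3f}")
```

The permutation scheme only needs the statistic to be computable on any split of the pooled data, so a different divergence estimate (for example a Kullback-Leibler plug-in) could be passed through the `statistic` argument without changing the test itself.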