Sampling Based on Stochastic Optimization
Abstract
Sampling methods are widely used in practice, so sampling theory has attracted considerable attention from statisticians. In particular, when the probability density function of a distribution is given, several mature sampling methods are available, such as the mixture method, the rejection method, MCMC methods, and the quantile method. However, most of the existing literature describes how to implement these algorithms individually, while systematic comparison of their relative merits has not been investigated extensively.
     Based on the concept of kernel density estimation, this paper first defines an index called the "L_2 distance", which measures how far the kernel density estimate of a sample deviates from the target density function. Taking this distance as the criterion for sample quality, the paper then proposes a sampling method that uses a stochastic optimization algorithm to minimize it. Finally, taking a class of distributions with complex density functions as an example, the paper compares the mixture method, the rejection method, MCMC, the quantile method, and the stochastic optimization method. Samples of different sizes are drawn in repeated experiments, and the methods are evaluated by the mean and standard deviation of the L_2 distance across trials, together with the running time and the k-th moments of the samples drawn. The results show that, for both one-dimensional and multi-dimensional sampling problems, the stochastic optimization algorithm takes slightly longer to run than the other methods but is stable and highly accurate, and it is clearly better in the overall evaluation.
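The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: measure the L_2 distance between a target density f and the kernel density estimate of a sample, then reduce that distance by random search. Everything specific here is an assumption made for illustration and is not taken from the thesis: the two-component normal target, the Gaussian kernel with fixed bandwidth h, the grid-based integration, the single-point perturbation step, and the names target_pdf, kde, l2_distance and optimize_sample.

```python
import numpy as np


def target_pdf(x):
    """Hypothetical target: equal-weight mixture of N(-2, 0.5^2) and N(1.5, 1)."""
    def normal(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return 0.5 * normal(x, -2.0, 0.5) + 0.5 * normal(x, 1.5, 1.0)


def kde(sample, grid, h):
    """Gaussian kernel density estimate of `sample`, evaluated on `grid`."""
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))


def l2_distance(sample, grid, f_grid, h):
    """Riemann-sum approximation of sqrt(integral of (f - f_hat)^2 dx)."""
    dx = grid[1] - grid[0]
    diff = f_grid - kde(sample, grid, h)
    return np.sqrt(np.sum(diff ** 2) * dx)


def optimize_sample(n=50, iters=1000, step=0.5, h=0.4, seed=0):
    """Random search: perturb one point at a time, keep the move if the distance drops."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-6.0, 7.0, 261)          # integration grid (assumed range)
    f_grid = target_pdf(grid)
    sample = rng.uniform(-5.0, 6.0, size=n)     # crude starting sample
    start = best = l2_distance(sample, grid, f_grid, h)
    for _ in range(iters):
        i = rng.integers(n)                     # pick one sample point at random
        candidate = sample.copy()
        candidate[i] += rng.normal(0.0, step)   # propose a random move for it
        d = l2_distance(candidate, grid, f_grid, h)
        if d < best:                            # accept only improvements
            sample, best = candidate, d
    return sample, start, best


if __name__ == "__main__":
    sample, start, best = optimize_sample()
    print(f"L2 distance before: {start:.4f}, after: {best:.4f}")
```

Because moves are accepted only when they lower the distance, the final distance can never exceed the starting one. The trade-off noted in the abstract also shows up in the sketch: every proposal needs a fresh density estimate, so the search costs more than a direct sampler.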
