Research on Some Problems of Data Processing under Small Sample

Chinese title: 小子样下数据处理的若干问题研究
Author: Liang Wu
Degree level: Master's
Major: Applied Mathematics
Keywords: Small sample; Data processing; Bootstrap; Least squares support vector machine; Particle swarm optimization
Degree year: 2008
Supervisor: Wang Jianzhou
Discipline code: 070104
Degree-granting institution: Lanzhou University
Submission date: 2008-05-01

Abstract

Traditional data processing assumes a large sample, and the forecasting methods built on it, such as time-series forecasting and artificial neural networks, have theoretical performance guarantees only when the sample is large. In most practical situations, however, the number of samples is very limited, sometimes tiny, so many of these methods fail to give satisfactory results. In addition, most time-series forecasting methods contain no nonlinear terms, and the solutions found by artificial neural networks easily get trapped in local optima. These shortcomings severely limit the practical use of these methods, which is why small-sample forecasting has long been a difficult problem in statistics. Grey prediction theory handles small-sample forecasting comparatively well, but a grey model cannot be used when it fails the model test, that is, when the small-error probability and the posterior-error ratio satisfy P ≤ 0.7, C ≥ 0.65. The support vector machine (SVM) was introduced to solve nonlinear function estimation problems. It follows the structural risk minimization principle rather than empirical risk minimization, and its training problem is a convex quadratic program, so the solution it finds is the global optimum. It therefore copes well with small-sample, nonlinear and high-dimensional problems. The least squares support vector machine (LS-SVM) is a variant of the SVM that replaces the inequality constraints of the SVM with equality constraints and uses the sum of squared errors as the empirical loss on the training set, which turns training into the solution of a system of linear equations. The method is designed specifically for small samples, handles nonlinearity, and has an algorithmic complexity that does not depend on the dimension of the samples.
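The claim that LS-SVM training reduces to a linear system can be made concrete. With equality constraints and a squared-error loss, the optimality conditions become the linear system [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y], and the fitted function is f(x) = sum_i alpha_i K(x, x_i) + b. The sketch below solves that system in pure Python. It assumes a Gaussian (RBF) kernel and hand-picked hyperparameters `gamma` and `sigma`, and builds forecasting inputs by a hypothetical lag embedding. This record does not state the kernel, parameter values or input construction used in the thesis, and the toy series is illustrative only.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2)) for equal-length sequences."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (M[r][n] - s) / M[r][r]
    return u


def lssvm_fit(X, y, gamma, sigma):
    """Train LS-SVM regression by solving [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]  # first row encodes sum(alpha) = 0
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    u = solve_linear(A, [0.0] + list(y))
    return u[0], u[1:]  # bias b, multipliers alpha


def lssvm_predict(X_train, b, alpha, sigma, x):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return b + sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X_train))


if __name__ == "__main__":
    # Hypothetical toy series and lag-2 embedding (not data from the thesis).
    series = [1.0, 1.3, 1.2, 1.6, 1.5, 1.9, 1.8, 2.2]
    lag = 2
    X = [series[t - lag:t] for t in range(lag, len(series))]
    y = series[lag:]
    b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=1.0)
    print("one-step-ahead forecast:", lssvm_predict(X, b, alpha, 1.0, series[-lag:]))
```

Because every training point gets its own multiplier, the system grows with the number of samples rather than with the input dimension, which is why the approach suits short series. In practice `gamma` and `sigma` would be tuned, for example by cross-validation.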
     The main contributions of this thesis are as follows:
     (1) Small-sample forecasting has long been a difficult problem in statistics. By forecasting peak load and the incidence of infectious diseases, this thesis compares several methods and identifies the best-performing forecasting method for each of the two problems.
     (2) The LS-SVM method is applied for the first time to forecasting the incidence of infectious diseases from a small sample. A comparison with grey prediction confirms that the method is effective and more advanced for this task.
     (3) A method is proposed that uses the particle swarm optimization (PSO) algorithm to optimize the parameters and the input set of the grey forecasting model. Simulation shows a clear improvement in forecasting accuracy.
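Contribution (3) can be illustrated with a sketch of one plausible setup, which is an assumption rather than the thesis's method: a standard global-best PSO searches the GM(1,1) parameters (a, b) directly, minimizing the mean absolute percentage error (MAPE) of the fitted values, and starts one particle at the classical least-squares estimate. The inertia weight and acceleration constants (`w=0.7`, `c1=c2=1.5`), the search box and the fitness function are all hypothetical choices, and input-set optimization is not covered. The sketch also computes the posterior-error ratio C and small-error probability P used in the model test (the abstract rejects a model when P ≤ 0.7, C ≥ 0.65).

```python
import math
import random


def gm11_least_squares(x0):
    """Classical GM(1,1): estimate (a, b) in x0(k) + a*z1(k) = b by least squares."""
    x1, s = [], 0.0
    for v in x0:  # first-order accumulated series x1
        s += v
        x1.append(s)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]  # background values
    y = x0[1:]
    m = len(z)
    sz, szz = sum(z), sum(v * v for v in z)
    sy, szy = sum(y), sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)  # regression slope equals -a
    intercept = (sy - slope * sz) / m                  # intercept equals b
    return -slope, intercept


def gm11_fitted(x0, a, b):
    """Fitted x0 values from x1_hat(k+1) = (x0(1) - b/a) exp(-a k) + b/a."""
    if abs(a) < 1e-9:
        return None
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(len(x0))]
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, len(x0))]


def mape(x0, a, b):
    """Mean absolute percentage error of the fitted values for k = 2..n."""
    fitted = gm11_fitted(x0, a, b)
    if fitted is None:
        return float("inf")
    return sum(abs(f - x) / abs(x) for x, f in zip(x0[1:], fitted[1:])) / (len(x0) - 1)


def posterior_check(x0, fitted):
    """Posterior-error ratio C = S2 / S1 and small-error probability P."""
    def std(vals):
        mu = sum(vals) / len(vals)
        return math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
    e = [x - f for x, f in zip(x0[1:], fitted[1:])]  # residuals for k = 2..n
    s1, s2 = std(x0), std(e)
    e_mean = sum(e) / len(e)
    C = s2 / s1
    P = sum(1 for v in e if abs(v - e_mean) < 0.6745 * s1) / len(e)
    return C, P


def pso_gm11(x0, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over (a, b) minimizing MAPE; returns (a, b, best MAPE)."""
    rng = random.Random(seed)
    a0, b0 = gm11_least_squares(x0)
    lo = [a0 - 0.5, b0 - abs(b0)]  # hypothetical search box around the LS estimate
    hi = [a0 + 0.5, b0 + abs(b0)]
    pos = [[a0, b0]] + [[rng.uniform(lo[d], hi[d]) for d in range(2)]
                        for _ in range(n_particles - 1)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [mape(x0, *p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo[d]), hi[d])
            f = mape(x0, *pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest[0], gbest[1], gbest_f


if __name__ == "__main__":
    x0 = [2.87, 3.28, 3.34, 3.62, 3.90, 4.02]  # illustrative toy series only
    a, b, err = pso_gm11(x0)
    C, P = posterior_check(x0, gm11_fitted(x0, a, b))
    print(f"a={a:.4f} b={b:.4f} MAPE={err:.4%} C={C:.3f} P={P:.2f}")
```

Seeding one particle at the least-squares estimate means the returned MAPE can never be worse than the classical GM(1,1) fit under this fitness function.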

