Research on Neural Network Ensemble Prediction Modeling of Typhoon Intensity Based on Nonlinear Kernel Principal Components
Abstract
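Based on typhoon intensity data for the South China Sea from 1980 to 2008, published by the China Meteorological Administration, this thesis studies a new approach to objective typhoon intensity prediction modeling. To address the nonlinear and time-varying nature of typhoon intensity, it borrows the ensemble prediction idea of numerical weather prediction and combines neural networks, a genetic algorithm, and nonlinear kernel principal component analysis.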
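     The basic neural network used in building this new ensemble prediction model for typhoon intensity is a feed-forward network, which offers adaptive learning, nonlinear mapping, and other useful properties. In practical prediction modeling, however, the network structure is difficult to determine objectively and the model is prone to over-fitting. The genetic algorithm, a global optimization algorithm based on natural selection and natural genetics, has been widely used in artificial intelligence in recent years. To improve the generalization ability of the neural network, the thesis uses the global search capability of the genetic algorithm to optimize both the network structure and the connection weights. Three genetic operators (selection, crossover, and mutation) exchange information among the individuals of the neural network population and continually produce new, fitter populations. All individuals of the final generation serve as the ensemble members; each member is given the same weight, and the members' forecasts are combined to form the new neural network ensemble prediction model for typhoon intensity.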
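The evolutionary procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the network structure is fixed here (the thesis also evolves the structure), the data are synthetic, and the names `predict`, `evolve`, `ensemble_predict` and all parameter values are hypothetical.

```python
import numpy as np

N_HIDDEN = 3                      # assumed fixed hidden-layer size
rng = np.random.default_rng(0)    # seeded so the sketch is reproducible

def predict(weights, X):
    """Feed-forward net with one tanh hidden layer; weights is a flat vector."""
    n_in = X.shape[1]
    split = n_in * N_HIDDEN
    W1 = weights[:split].reshape(n_in, N_HIDDEN)
    b1 = weights[split:split + N_HIDDEN]
    W2 = weights[split + N_HIDDEN:split + 2 * N_HIDDEN]
    b2 = weights[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def mae(weights, X, y):
    """Mean absolute error of one individual (lower is fitter)."""
    return float(np.mean(np.abs(predict(weights, X) - y)))

def evolve(X, y, pop_size=20, generations=30, sigma=0.1):
    """Return the final-generation population of weight vectors."""
    n_genes = X.shape[1] * N_HIDDEN + 2 * N_HIDDEN + 1
    pop = rng.normal(size=(pop_size, n_genes))
    for _generation in range(generations):
        errors = np.array([mae(ind, X, y) for ind in pop])
        # Selection: the better half survives unchanged.
        parents = pop[np.argsort(errors)[:pop_size // 2]]
        children = []
        for _child in range(pop_size - len(parents)):
            a, b = parents[rng.choice(len(parents), size=2, replace=False)]
            cut = rng.integers(1, n_genes)                 # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child = child + rng.normal(scale=sigma, size=n_genes)  # mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    return pop

def ensemble_predict(pop, X):
    """Equal-weight average of the forecasts of all ensemble members."""
    member_forecasts = np.array([predict(ind, X) for ind in pop])
    return member_forecasts.mean(axis=0)

# Synthetic stand-in data: 60 cases, 4 predictors.
X = rng.normal(size=(60, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

final_pop = evolve(X, y)
forecast = ensemble_predict(final_pop, X)
```

Every individual of the last generation contributes to `forecast` with the same weight, which mirrors the equal-weight combination described in the abstract; a real application would evaluate the ensemble on independent samples rather than on the training data.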
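     Regarding predictor processing, earlier applied studies of typhoon intensity prediction usually first computed the correlation coefficients between candidate predictors and the predictand to preselect the highly correlated factors, and then applied stepwise regression to these factors to select the predictor combination used for modeling. This approach makes no further use of the prediction information contained in the many highly correlated factors that stepwise regression rejects. Feeding all of these remaining factors into the neural network would make the input dimension too high and the training time too long, and would invite over-fitting. To exploit the useful prediction information in all predictors while keeping the model input simple, the thesis applies a nonlinear principal component analysis method, kernel principal component analysis (KPCA), to extract features from the highly correlated factors rejected by stepwise regression. Taking into account both the cumulative variance contribution of each kernel principal component and its correlation with the predictand, the kernel principal components that carry most of the information in the remaining factors are extracted and used, together with the factors selected by stepwise regression, as the input to the neural network ensemble prediction model. KPCA combines the kernel method with principal component analysis and can capture higher-order nonlinear relationships in the data. It maps the input-space data into a feature space through a nonlinear mapping and performs principal component analysis there; a kernel function replaces the dot products in the feature space with kernel evaluations in the input space, which yields the nonlinear features.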
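The KPCA steps outlined above (kernel matrix, centering in feature space, eigen-decomposition, projection) can be sketched as below. The abstract does not name the kernel or its parameters, so the Gaussian (RBF) kernel, the value of `gamma`, the number of retained components, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq_dist = ((A ** 2).sum(axis=1)[:, None]
               + (B ** 2).sum(axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def kernel_pca(X, n_components, gamma):
    """Return training-sample scores and cumulative variance contributions."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    ones = np.full((n, n), 1.0 / n)
    # Centre the mapped data in feature space without computing the mapping.
    Kc = K - ones @ K - K @ ones + ones @ K @ ones
    eigvals, eigvecs = np.linalg.eigh(Kc)          # ascending order
    eigvals = np.clip(eigvals[::-1], 0.0, None)    # descending, no tiny negatives
    eigvecs = eigvecs[:, ::-1]
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()
    # Scale coefficients so each feature-space axis has unit length.
    alphas = eigvecs[:, :n_components] / np.sqrt(eigvals[:n_components])
    scores = Kc @ alphas                           # kernel principal components
    return scores, cum_ratio[:n_components]

rng = np.random.default_rng(0)
# Hypothetical stand-in for the factors rejected by stepwise regression:
# 40 cases, 23 leftover predictors, standardised column by column.
leftover = rng.normal(size=(40, 23))
leftover = (leftover - leftover.mean(axis=0)) / leftover.std(axis=0)
predictand = rng.normal(size=40)

scores, cum_ratio = kernel_pca(leftover, n_components=5, gamma=1.0 / 23)
# Absolute correlation of each kernel principal component with the predictand,
# the second selection criterion mentioned in the abstract.
corr = [abs(np.corrcoef(scores[:, k], predictand)[0, 1]) for k in range(5)]
```

In this sketch `cum_ratio` and `corr` provide the two quantities the abstract says are weighed when choosing which kernel principal components to pass on as model input; how the thesis balances them is not stated.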
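     On the basis of this work on model construction and predictor processing, a neural network ensemble prediction model based on nonlinear kernel principal components was established. The study cases are typhoons in the South China Sea in June, July, August, and September of 1980-2008 with a lifetime of more than 48 hours within that sea area. Climatology and persistence factors serve as the candidate predictors, the 1980-1999 data as the modeling sample, and the 2000-2008 data as the independent prediction sample. Four typhoon intensity ensemble prediction models (one per month) with a 24-hour lead time were built for the prediction experiments. About 28 to 31 highly correlated factors are preselected for each month, of which stepwise regression generally admits about 5 to 8. KPCA is therefore applied to the discarded factors, and the kernel principal components that carry most of their information are used together with the stepwise-selected factors as model input to build the ensemble prediction model for each month. The monthly models were then applied to the 2000-2008 independent samples. The mean absolute errors for June, July, August, and September are 4.58 m/s, 4.52 m/s, 3.13 m/s, and 4.58 m/s, respectively. To assess the performance of the ensemble method, forecast equations were also built with traditional stepwise regression from the same candidate predictors; the corresponding mean absolute errors on the independent samples are 4.84 m/s, 5.58 m/s, 3.68 m/s, and 5.14 m/s. The errors of the neural network ensemble model are thus lower than those of stepwise regression by 5.25%, 19.12%, 14.94%, and 10.81%, respectively (in terms of mean relative error).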
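As a quick arithmetic check on the comparison above, the relative error reduction can be recomputed from the rounded monthly mean absolute errors. The function name `relative_reduction` and the dictionary layout are illustrative choices, and the formula (baseline minus model, divided by baseline) is an assumption about how the percentages were derived.

```python
# Mean absolute errors (m/s) on the 2000-2008 independent samples,
# as quoted in the abstract.
ensemble_mae = {"June": 4.58, "July": 4.52, "August": 3.13, "September": 4.58}
stepwise_mae = {"June": 4.84, "July": 5.58, "August": 3.68, "September": 5.14}

def relative_reduction(baseline, model):
    """Percentage by which `model` error is lower than `baseline` error."""
    return 100.0 * (baseline - model) / baseline

reductions = {
    month: round(relative_reduction(stepwise_mae[month], ensemble_mae[month]), 2)
    for month in ensemble_mae
}

for month, value in reductions.items():
    print(f"{month}: {value:.2f}% lower than stepwise regression")
```

Recomputed from the rounded values, the reductions come out at roughly 5.37%, 19.00%, 14.95%, and 10.89%, slightly different from the reported 5.25%, 19.12%, 14.94%, and 10.81%. The thesis presumably worked from unrounded errors or from a per-sample relative error; the abstract does not specify which.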
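     Taken together, the prediction experiments and the comparison show that the nonlinear kernel principal component neural network ensemble method forecasts better than traditional stepwise regression. The main reason is that the new modeling approach processes the rejected predictors through nonlinear kernel principal component feature extraction and so brings their useful, previously discarded prediction information into the model. The model therefore contains more effective prediction information, which improves its forecast performance. The model construction method and the predictor processing technique offer a useful reference for prediction modeling research in related fields.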
A new objective prediction model for typhoon intensity has been developed using neural networks, a genetic algorithm, and kernel principal component analysis (KPCA), following the ensemble prediction approach of numerical weather prediction, because typhoon intensity is nonlinear and time-varying. The typhoon intensity data for 1980 through 2008 were taken from the “Typhoon Almanac” published by the China Meteorological Administration.
     To construct the new neural network ensemble prediction model for typhoon intensity, a back-propagation (BP) network is used as the basic model. The BP network offers adaptive learning, nonlinear mapping, and other advantages, but its structure is difficult to determine objectively and it tends to over-fit. A genetic algorithm, with its global search capability, is therefore applied to optimize both the network structure and the connection weights. Three genetic operators (selection, crossover, and mutation) exchange information among individuals from generation to generation. The last generation of the evolved population is retained as the members of the ensemble, and the forecasts of the individual members are combined with equal weights to form the neural network ensemble prediction model.
     In practical typhoon intensity prediction, the factors that have high individual correlation coefficients with the predictand are usually passed through stepwise regression to select the predictors for modeling, and the factors that stepwise regression eliminates are discarded even though they carry useful prediction information. If all of the eliminated factors were used as well, the high input dimension and long training time could lead to over-fitting. KPCA is therefore applied to extract features from the factors eliminated by stepwise regression, and a few kernel principal components that contain most of the prediction information are used, together with the factors selected by stepwise regression, as input to the ensemble prediction model. KPCA combines the kernel method with principal component analysis: it maps the data from the input space to a feature space through a nonlinear mapping, applies PCA there, and replaces the dot products in the feature space with kernel evaluations in the input space, so that it can extract nonlinear relationships in the data.
     Building on this model construction and predictor treatment, a neural network ensemble prediction model based on KPCA has been established. With climatology and persistence factors as the candidate predictors, four typhoon intensity ensemble prediction models with a 24-hour lead time were set up, one each for June, July, August, and September, using typhoons from 1980 to 2008 with a lifetime of more than 48 hours. The data from 1980 to 1999 serve as modeling samples and the data from 2000 to 2008 as independent prediction samples. Because 28 to 31 factors are selected by correlation coefficient for each month but only 5 to 8 are retained after stepwise regression, KPCA is applied to extract features from the eliminated factors; a few kernel principal components that contain most of the prediction information are then used, together with the stepwise-selected factors, as input to the ensemble prediction model for each month. The mean absolute errors for June, July, August, and September are 4.58 m/s, 4.52 m/s, 3.13 m/s, and 4.58 m/s, respectively. To assess the forecasting capability of the model, traditional stepwise regression equations were built from the same data and applied to the same independent prediction samples; the corresponding errors are 4.84 m/s, 5.58 m/s, 3.68 m/s, and 5.14 m/s. The errors of the ensemble prediction are lower than those of the regression prediction by 5.25%, 19.12%, 14.94%, and 10.81%, respectively (mean relative error).
     The results show that the neural network ensemble prediction model based on KPCA is more accurate than the traditional stepwise regression method. The reason is that the new model treats the eliminated predictors with KPCA and adds their useful information to the prediction model. The new model thus contains more effective prediction information, which improves its forecast performance. Furthermore, both the model construction approach and the predictor treatment technique provide a useful reference for prediction modeling research in related fields.
