Research on Boosting-Type Algorithms for Neural Network Ensembles (神经网络集成BOOSTING类算法研究)

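English title: Study on Boosting Algorithm of Neural Network Ensemble
Author: 高敬阳 (Gao Jingyang)
Degree level: Doctoral (Ph.D.)
Discipline: Control Theory and Control Engineering
Chinese keywords (in translation): neural network ensemble; controversial degree; sample distribution; inverse weight distribution strategy; Analytic Hierarchy Process (AHP); feature optimization and selection; rolling bearings; waveform features
English keywords: Neural network ensemble; Error-Right std; sample distribution; Inverse Boosting; Analytic Hierarchy Process; feature selection and optimization; rolling bearings; waveform features
Degree year: 2012
Supervisor: 朱群雄 (Zhu Qunxiong)
Discipline code: 081101
Degree-granting institution: 北京化工大学 (Beijing University of Chemical Technology)
Thesis submission date: 2012-05-30
Chair of the defense committee: 黄德先 (Huang Dexian)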

Abstract (Chinese version)

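Boosting algorithms for neural network ensembles come in many variants, of which AdaBoost is the most typical and, in practice, the most widely applicable. AdaBoost, however, suffers from the malignant accumulation of misclassified samples: as iterations continue, the weights of misclassified samples rise exponentially, and once this accumulation sets in it persists. To prevent this accumulation and the overfitting it causes, the research on AdaBoost in this dissertation covers the following: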
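     (1) Because the AdaBoost weight-update strategy places excessive emphasis on difficult samples, an algorithm that modifies weights according to the controversial degree (degree of dispute), ERstd-AdaBoost, is proposed. The algorithm sets the size of each weight adjustment from the degree of dispute and from whether the sample was classified correctly or incorrectly. Adjusting sample weights differentially in this way suppresses, to a certain extent, the round-by-round accumulation of difficult-sample weights; it improves the generalization accuracy of the individual classifiers without sacrificing diversity, and thereby improves the generalization performance of the ensemble.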
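     (2) To address the overfitting produced by AdaBoost, an algorithm that adjusts weights according to the sample distribution, ABSD, is proposed. Based on how samples are distributed among the classes, ABSD assigns unequal initial weights, and during training it adjusts sample weights in a way that follows the sample distribution. ABSD greatly reduces overfitting and can effectively improve the generalization performance of the ensemble.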
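     (3) The generalization performance of ensembles built with the inverse weight distribution strategy, the generalization performance of their individual classifiers, and ensemble diversity are studied in depth, and an improved inverse-weight-distribution algorithm, IB+, is proposed. Unlike the forward weight strategy, diversity has very little influence on the generalization performance of ensembles generated by the inverse weight distribution strategy; the decisive factor is the generalization performance of the individual classifiers. In some respects, the improved inverse-weight-distribution algorithm generalizes better than the forward weight distribution strategy.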
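     (4) For online fault diagnosis of rolling bearings, a feature extraction method based on time-domain signal waveforms is proposed. The AHP (Analytic Hierarchy Process) method is used to assign different weights to three measurement indicators (the minimum Euclidean distance, the sum of Euclidean distances, and the variance of Euclidean distances); on this basis, feature extraction methods of the same dimension are evaluated comprehensively and quantitative evaluation results are given. The comprehensive evaluation method not only verifies the separability and effectiveness of an extraction method, but also provides a quantitative means of choosing among methods and of selecting and optimizing features.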
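     (5) ABSD is incorporated into the improved inverse algorithm IB+. Training on conventional statistical features and on the optimized waveform vertical (longitudinal) and horizontal (transverse) features verifies that IB+ generalizes better than the other algorithms, and that waveform features classify better than conventional statistical features.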
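The weight accumulation described above can be illustrated with the standard discrete AdaBoost update: α_t = ½ ln((1 − ε_t)/ε_t), misclassified weights are multiplied by e^(α_t), correct ones by e^(−α_t), and the result is renormalized. The Python sketch below is a minimal illustration under assumed conditions: the scenario (ten samples, one "hard" sample missed every round) is hypothetical, and the `inverse` flag simply flips the direction of the update, one simple reading of an inverse weight distribution strategy. It is not the dissertation's ERstd-AdaBoost, ABSD, or IB+ algorithm, whose exact update rules are not given in the abstract; `reweight` and `simulate` are illustrative names.

```python
import math
from typing import List, Sequence, Set, Tuple


def reweight(weights: Sequence[float], missed: Set[int],
             inverse: bool = False) -> Tuple[List[float], float, float]:
    """One discrete-AdaBoost reweighting step.

    weights: current sample weights (summing to 1).
    missed:  indices misclassified by this round's weak learner.
    inverse: if True, flip the update so misclassified samples lose weight
             (a simple reading of an inverse strategy, assumed here).
    Returns (new_weights, weighted_error, learner_vote_alpha).
    """
    eps = sum(weights[i] for i in missed)           # weighted training error
    if not 0.0 < eps < 0.5:
        raise ValueError("discrete AdaBoost needs 0 < eps < 0.5")
    alpha = 0.5 * math.log((1.0 - eps) / eps)       # vote of this learner
    step = -alpha if inverse else alpha
    raw = [w * math.exp(step if i in missed else -step)
           for i, w in enumerate(weights)]
    z = sum(raw)                                    # normalization constant
    return [w / z for w in raw], eps, alpha


def simulate(n: int = 10, rounds: int = 5, inverse: bool = False):
    """Hypothetical run: sample 0 is missed every round, one other sample
    (rotating through 1..n-2) is missed with it, sample n-1 is never missed."""
    w = [1.0 / n] * n
    hard = [w[0]]                                   # weight share of sample 0
    errors = []
    for t in range(rounds):
        missed = {0, 1 + t % (n - 2)}
        w, eps, _ = reweight(w, missed, inverse)
        hard.append(w[0])
        errors.append(eps)
    return w, hard, errors


if __name__ == "__main__":
    w, hard, errors = simulate()
    print("forward, weight of sample 0:", [round(v, 4) for v in hard])
    # w[0] / w[-1] is multiplied by (1 - eps_t) / eps_t in every round.
    print("w0 / w9:", round(w[0] / w[-1], 3),
          "product:", round(math.prod((1 - e) / e for e in errors), 3))
    _, hard_inv, _ = simulate(inverse=True)
    print("inverse, weight of sample 0:", [round(v, 4) for v in hard_inv])
```

In this toy run the hard sample's share of the total weight climbs from 0.1 to about 0.46 within five rounds, and the ratio of its weight to that of an always-correct sample equals the product of (1 − ε_t)/ε_t, which grows geometrically as long as ε_t stays bounded away from 0.5. With `inverse=True` the same sample's weight shrinks every round instead.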
Abstract (English version)

There is a shortcoming of malignant error accumulation in AdaBoost algorithms: as the weight of misclassified samples rises to a certain proportion, a vicious circle appears and persists. To avoid this vicious circle, this study of the AdaBoost algorithm includes the following:
     (1) Because the AdaBoost weight-modifying policy places too much emphasis on difficult samples, this dissertation presents the ERstd-AdaBoost algorithm, which modifies weights based on the controversial degree. The algorithm decides the size of each weight adjustment according to the degree of dispute and to whether the sample was classified correctly or incorrectly. Adjusting sample weights differentially in this way inhibits, to a certain degree, the round-by-round accumulation of difficult-sample weights. Because the generalization performance of the individual classifiers improves without any loss of diversity, the generalization performance of the ensemble network improves as well.
     (2) Because AdaBoost is prone to overfitting, this dissertation presents the ABSD algorithm, which modifies weights based on the sample distribution. ABSD reduces the excessive emphasis on difficult samples, greatly reduces overfitting, and effectively improves the generalization performance of the ensemble network.
     (3) This dissertation studies the generalization performance of ensemble networks under the reverse weight distribution strategy, the generalization performance of their individual classifiers, and the diversity of the ensemble, and proposes an improved reverse-weight-distribution algorithm, IB+. The generalization performance of the improved reverse-weight-distribution algorithm was significantly better than that of algorithms using the forward weight distribution strategy.
     (4) A feature extraction method based on the time-domain signal waveform is proposed for online fault diagnosis of rolling bearings. Using the AHP method to determine different weights for three measurement indicators (the minimum Euclidean distance, the sum of Euclidean distances, and the variance of Euclidean distances), this dissertation carries out a comprehensive evaluation of feature extraction methods of the same dimension and gives quantitative evaluation results. The comprehensive evaluation method can not only verify the separability and effectiveness of an extraction method, but also provides a measure for choosing among methods and for feature selection and optimization.
     (5) Using the improved reverse algorithm IB+ with ABSD integrated into it, training on conventional statistical features and on the optimized waveform vertical and horizontal features verified that IB+ generalizes better than the other algorithms, and that waveform features have better classification ability than conventional statistical features.
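The AHP-based comprehensive evaluation in item (4) can be sketched as follows. The sketch assumes that (a) the three indicators are Euclidean distances between class centroids in feature space (minimum, sum, and population variance of the pairwise distances), (b) the AHP weights are the principal eigenvector of a reciprocal pairwise comparison matrix, checked with Saaty's consistency ratio (random index 0.58 for three criteria), and (c) indicators are min-max normalized across methods, with a lower distance variance treated as better. The pairwise judgments, class names, and feature values are hypothetical, and the function names are illustrative; the dissertation's own judgments and results are not reproduced.

```python
import itertools
import math
import statistics
from typing import Dict, List, Sequence, Tuple

# Saaty's random consistency index for matrices of order 1..5.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix: Sequence[Sequence[float]], iters: int = 200):
    """Principal-eigenvector priorities of a reciprocal pairwise matrix
    (power iteration), plus lambda_max and the consistency ratio CR."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        converged = max(abs(a - b) for a, b in zip(nxt, w)) < 1e-12
        w = nxt
        if converged:
            break
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n   # estimate of lambda_max
    ci = (lam - n) / (n - 1) if n > 1 else 0.0      # consistency index
    ri = RANDOM_INDEX[n]
    return w, lam, (ci / ri if ri > 0 else 0.0)


def distance_indicators(samples: Dict[str, List[Sequence[float]]]):
    """Assumed reading of the three measures: Euclidean distances between
    class centroids -> (minimum, sum, population variance)."""
    centroids = [[statistics.fmean(col) for col in zip(*rows)]
                 for rows in samples.values()]
    dists = [math.dist(a, b) for a, b in itertools.combinations(centroids, 2)]
    return min(dists), sum(dists), statistics.pvariance(dists)


def comprehensive_scores(indicators: List[Tuple[float, float, float]],
                         weights: Sequence[float],
                         larger_is_better=(True, True, False)):
    """Weighted sum of min-max-normalized indicators, one score per method.
    The direction of each indicator is an assumption."""
    norm_cols = []
    for col, high_good in zip(zip(*indicators), larger_is_better):
        lo, hi = min(col), max(col)
        norm = [1.0] * len(col) if hi == lo else [(x - lo) / (hi - lo) for x in col]
        norm_cols.append(norm if high_good else [1.0 - x for x in norm])
    return [sum(w * c for w, c in zip(weights, row)) for row in zip(*norm_cols)]


if __name__ == "__main__":
    # Hypothetical judgments: min distance vs. sum vs. variance.
    judgments = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
    w, lam, cr = ahp_weights(judgments)
    print("weights:", [round(x, 4) for x in w], "lambda_max:", round(lam, 4),
          "CR:", round(cr, 4))
    # Hypothetical class centroids produced by two feature extraction methods.
    method_a = {"normal": [[0.0, 0.0]], "inner": [[3.0, 0.0]], "outer": [[0.0, 4.0]]}
    method_b = {"normal": [[0.0, 0.0]], "inner": [[2.0, 0.0]], "outer": [[0.0, 2.5]]}
    ind = [distance_indicators(method_a), distance_indicators(method_b)]
    print("scores:", [round(s, 4) for s in comprehensive_scores(ind, w)])
```

The judgment matrix in the demo is perfectly consistent, so its weights come out as exactly 4/7, 2/7 and 1/7 with CR = 0; for real judgments a CR below 0.1 is the usual acceptance threshold. Power iteration keeps the sketch dependency-free; an eigen-solver such as `numpy.linalg.eig` would serve equally well.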

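Item (5) compares waveform features against conventional statistical features. The abstract does not list which statistics form that baseline, so the set below is an assumption: a commonly used group of time-domain statistics for vibration signals (RMS, peak, kurtosis, crest factor, and related dimensionless factors). The waveform vertical and horizontal features themselves are not reproduced, because their definitions are not given here; the function name and the demo signals are hypothetical.

```python
import math
from typing import Dict, Sequence


def time_domain_features(x: Sequence[float]) -> Dict[str, float]:
    """Common time-domain statistics of a (non-constant) signal segment."""
    n = len(x)
    mean = sum(x) / n
    abs_mean = sum(abs(v) for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / n                   # population variance
    std = math.sqrt(var)
    skewness = sum(c ** 3 for c in centered) / n / std ** 3
    kurtosis = sum(c ** 4 for c in centered) / n / var ** 2  # not excess kurtosis
    sqrt_amp = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2  # square-root amplitude
    return {
        "mean": mean,
        "rms": rms,
        "peak": peak,
        "peak_to_peak": max(x) - min(x),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "crest_factor": peak / rms,
        "shape_factor": rms / abs_mean,
        "impulse_factor": peak / abs_mean,
        "clearance_factor": peak / sqrt_amp,
    }


if __name__ == "__main__":
    fs, n = 1000, 1000                                       # hypothetical sampling setup
    sine = [2.0 * math.sin(2 * math.pi * 10 * i / fs) for i in range(n)]
    # Same sine plus a short impulse every 100 samples, a crude stand-in
    # for the periodic shocks that a localized bearing defect produces.
    shocked = [v + (5.0 if i % 100 == 0 else 0.0) for i, v in enumerate(sine)]
    for name, sig in (("sine", sine), ("sine + impulses", shocked)):
        f = time_domain_features(sig)
        print(name, {k: round(v, 3) for k, v in f.items()})
```

For a pure sinusoid the kurtosis is 1.5 and the crest factor is √2; impulsive content raises both, which is why these dimensionless statistics are a common baseline in bearing diagnosis.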
