A New Feature Selection Method and Its Application in Pavement Performance Analysis
Abstract
Highway management information systems store a wealth of routine survey and inspection data. These data can serve both the comprehensive evaluation of pavement performance and its prediction. In essence, both tasks amount to regression analysis of the survey data, and the regression must meet the following requirements: (1) to ensure the accuracy of evaluation and prediction, the model must be nonlinear; (2) because the available data are limited, the model must cope with small sample sets; (3) it must resist the influence of noise in the data; and (4) the regression model should be expressible as a simple, easily understood explicit function, so that causal analysis is straightforward. A regression model meeting these requirements can support scientific decision-making in highway maintenance.
     Existing regression methods tend to perform poorly on this problem. Support vector regression, for instance, produces a regression function of low accuracy and distorted degree when trained on a very small sample set, and the function is too complex to convey the relationship between inputs and outputs clearly. Neural network regression overfits, yields no explicit regression function, and therefore cannot reveal the input-output relationship either. To address these problems, this thesis proposes two new feature selection methods; applying them to the survey and inspection data in a highway management information system yields a new method for the comprehensive evaluation of pavement performance and a new method for predicting it.
     The main innovative contributions of this thesis are as follows:
     (1) A feature selection method based on matrix similarity measurement, a genetic algorithm, and support vector machines is proposed. The method first uses the matrix similarity measure to select a nonlinear space, then uses a genetic algorithm to select features from that space, and finally obtains a concise regression or decision function with a linear support vector machine. Experiments show that its regression accuracy exceeds that of competing methods when the sample size is very small. The resulting regression function is simple and transparent, builds an intuitive link between inputs and outputs, and lends itself to causal analysis. It is also shown theoretically that the matrix similarity measure is an effective means of controlling the VC dimension.
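     The pipeline in (1) can be made concrete with a minimal sketch. This is not the thesis's implementation: the nonlinear space is approximated here by a degree-2 polynomial expansion, kernel-target alignment stands in for the matrix similarity measure, the GA is a bare-bones binary-mask variant, and all data and parameter values are illustrative.

```python
# Minimal sketch of pipeline (1); alignment() is a stand-in for the
# thesis's matrix similarity measure, and the data are synthetic.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)

def alignment(F, y):
    """Kernel-target alignment between the Gram matrix of the selected
    feature columns F and the target matrix y y^T."""
    K, Y = F @ F.T, np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y) + 1e-12)

def ga_select(F, y, pop=30, gens=40, p_mut=0.05):
    """Tiny binary-mask GA: maximize alignment of the selected columns."""
    d = F.shape[1]
    population = rng.integers(0, 2, size=(pop, d))
    def fitness(m):
        return alignment(F[:, m.astype(bool)], y) if m.any() else -1.0
    for _ in range(gens):
        order = np.argsort([fitness(m) for m in population])[::-1]
        parents = population[order[: pop // 2]]
        children = parents[rng.permutation(len(parents))].copy()
        cut = int(rng.integers(1, d))                     # one-point crossover
        children[:, cut:] = parents[:, cut:]
        children ^= (rng.random(children.shape) < p_mut)  # bit-flip mutation
        population = np.vstack([parents, children])
    return max(population, key=fitness).astype(bool)

# Toy data standing in for a small pavement survey sample.
X = rng.normal(size=(40, 4))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] ** 2 + 0.1 * rng.normal(size=40)

poly = PolynomialFeatures(degree=2, include_bias=False)
F = poly.fit_transform(X)            # candidate nonlinear space
mask = ga_select(F, y)               # GA keeps a sparse subset of terms
model = LinearSVR(C=10.0, max_iter=20000).fit(F[:, mask], y)
terms = poly.get_feature_names_out()[mask]
print(dict(zip(terms, np.round(model.coef_, 3))))  # explicit polynomial
```

     Because the final model is a linear SVR over explicitly named polynomial terms, the printed coefficient dictionary is exactly the kind of explicit, analyzable regression function the method aims for.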
     (2) A more widely applicable sequential minimization method based on a mixed kernel function, matrix similarity measurement, and kernel principal component analysis is proposed. The kernel principal component analysis uses a mixed kernel whose weights and shape parameters are obtained by a genetic algorithm that takes the matrix similarity measure as its fitness function, keeping the kernel's complexity as low as possible. The sequential minimization method then screens and selects among the principal components, further reducing the dimension of the input space; because the final regression is a linear support vector regression, the VC dimension of the learning machine does not increase. Experiments confirm that the method is more accurate than earlier methods of its kind.
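     A similarly hedged sketch of (2), under stated assumptions: the mixed kernel is taken to be a convex combination of an RBF kernel and a polynomial kernel, its weight and shape parameter are tuned by maximizing the same alignment fitness (a small grid search stands in for the thesis's GA), and the component-screening step is simplified to retaining the leading kernel principal components before fitting a linear SVR.

```python
# Sketch of pipeline (2): GA-tuned mixed kernel replaced by a grid search,
# component screening replaced by keeping the leading components.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.svm import LinearSVR

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=40)

def mixed_kernel(A, B, w, gamma, degree=2):
    """Convex combination of an RBF and a polynomial kernel."""
    return w * rbf_kernel(A, B, gamma=gamma) + (1 - w) * polynomial_kernel(A, B, degree=degree)

def alignment(K, y):
    """Kernel-target alignment used as the tuning fitness."""
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

# Tune (w, gamma) by alignment; the thesis optimizes these with a GA.
w, gamma = max(((w, g) for w in np.linspace(0, 1, 11)
                for g in (0.1, 0.5, 1.0, 2.0)),
               key=lambda p: alignment(mixed_kernel(X, X, *p), y))

K = mixed_kernel(X, X, w, gamma)
kpca = KernelPCA(n_components=5, kernel="precomputed")
Z = kpca.fit_transform(K)                  # kernel principal components
model = LinearSVR(C=10.0, max_iter=20000).fit(Z, y)  # linear SVR keeps VC dim low
print(f"w={w:.1f}, gamma={gamma}, train R^2={model.score(Z, y):.3f}")
```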
     (3) The feature selection method based on matrix similarity measurement, a genetic algorithm, and support vector machines is applied to the comprehensive evaluation of pavement performance. It overcomes the difficulty posed by very small sample sets and expresses the relationship between the various forms of pavement distress and overall pavement performance as a simple, easily understood polynomial, which makes the composition of the comprehensive evaluation easy to analyze.
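     For illustration only, the kind of explicit evaluation function this application yields might look as follows; the distress variables and every coefficient are hypothetical, not the thesis's fitted values.

```python
# Hypothetical explicit evaluation function of the form produced in (3);
# variables and coefficients are invented for illustration.
def pavement_condition(crack_ratio, rut_depth_mm, iri):
    return (100.0
            - 12.3 * crack_ratio                 # cracking ratio, % of area
            - 1.4 * rut_depth_mm                 # average rut depth, mm
            - 2.7 * iri                          # roughness (IRI), m/km
            - 0.9 * crack_ratio * rut_depth_mm)  # one selected cross term

print(round(pavement_condition(0.8, 6.0, 2.5), 1))
```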
     (4) The same feature selection method is applied to predicting the decay of pavement performance. It overcomes the difficulty of incomplete data in pavement maintenance information systems and expresses the relationship between the many factors that influence pavement performance and the performance itself as a simple, easily understood function, which facilitates causal analysis of pavement performance.
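     Likewise for the prediction case, a hypothetical sketch of the explicit decay function's form, with pavement age and cumulative traffic assumed as the selected factors:

```python
# Hypothetical explicit decay function of the form produced in (4);
# the factors and all coefficients are invented for illustration.
def predicted_pqi(age_years, cum_traffic_msa):
    return 100.0 - 3.2 * age_years - 0.9 * cum_traffic_msa - 0.15 * age_years ** 2

print(round(predicted_pqi(5, 12), 1))  # performance after 5 years, 12 MSA
```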
