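Research on the Application of Support Vector Machines and Kalman Filtering Algorithms in Integrated Navigation

English title: Research on Support Vector Machine and Its Application in Integrated Navigation with Kalman Filter
Author: Chen Leichen (陈磊琛)
Thesis level: Master's
Discipline: Computer Application Technology
Chinese keywords (translated): support vector machine; support vector regression; Kalman filter; GPS/INS integrated navigation system
English keywords: Support Vector Machine ; Support Vector Regression ; Kalman filter ; GPS/INS integrated navigation
Degree year: 2010
Advisor: Cai Zhihua (蔡之华)
Discipline code: 081203
Degree-granting institution: China University of Geosciences (中国地质大学)
Submission date: 2010-05-01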

Abstract

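In the mid-1990s, research on machine learning from finite samples advanced considerably and produced a mature theoretical framework, statistical learning theory (SLT). The support vector machine (SVM) is built on the structural risk minimization principle of this theory. By using kernel functions, SVM makes the complexity of the algorithm independent of the sample dimension, which addresses the "curse of dimensionality"; it handles nonlinear problems better than other machine learning algorithms and generalizes well. Support vector regression (SVR) extends the SVM algorithm to regression problems and performs well in function estimation.
     The Kalman filter is a real-time recursive algorithm in which all state variables are expressed in the time domain, so it is suitable for estimating multidimensional stochastic processes. During processing, the state variables of the system need not be stored; only the current state estimate is processed in real time, so that the estimate gradually approaches the actual state. The Kalman filter uses the state equation to describe the real-time dynamics of the actual state. It does not need the first- and second-order moments of the actual state and the observations at every instant; the system state equation and the statistical properties of the observation noise are enough to characterize the statistics of the actual state and the noise. Because the process noise and the observation noise are both white noise, that is, stationary processes whose statistical properties do not change with time, and the state equation of the system is known, the Kalman filter can estimate both stationary and nonstationary state variables.
     GPS and INS both offer global, all-round, full-time navigation, and both output very complete navigation data. A GPS/INS integrated navigation system exploits the strengths of each and compensates for the weaknesses of the other, so that the accuracy of the integrated system exceeds that of either system working alone. On the INS side, the integrated system can calibrate the inertial sensors and improve INS accuracy; on the GPS side, INS aiding improves positioning and tracking and helps protect the receiver from interference. Research on integrated navigation systems in China began in the late 1970s. After more than two decades of effort the field is now developing rapidly, is widely applied in many areas, and is catching up with the world's advanced level.
     The first three chapters of this thesis introduce the fundamentals of support vector machines, Kalman filtering, and GPS/INS integrated navigation systems. Chapter 4 proposes a new resampling support vector machine algorithm. Inspired by GA and SMOTE, it over-samples the minority class with crossover and mutation operators similar to those of differential evolution, generating new positive-class samples until the classes are roughly equal in size. Then, based on the characteristics of the SVM algorithm, it applies a clustering-based data cleaning method that deletes redundant or noisy samples. Over-sampling and cleaning the dataset in this way retains the useful samples, reduces the size of the dataset, and improves the efficiency of SVM training.
     Chapter 5 proposes an online real-time optimization algorithm, the support vector regression adaptive Kalman filter. Based on the observations obtained in real time, it uses support vector regression to adjust the observation covariance matrix online; adjusting the noise information dynamically brings it close to the actual noise and so improves the accuracy of the filter estimate. Specifically, the noise is assumed to be zero-mean Gaussian white noise, and the chapter relies on the principle that the ratio of the theoretical innovation covariance matrix to the actual covariance matrix should be close to 1. If the ratio deviates from 1, the observation noise has changed, and the noise covariance matrix must be adjusted so that the ratio returns to the neighborhood of 1.
     The main contributions of this thesis are: (1) a new resampling support vector machine algorithm for imbalanced-data problems, whose performance is verified by comparative experiments and experiments on UCI standard datasets against the standard SVM, SMOTE over-sampling SVM, and genetic-algorithm over-sampling SVM; (2) a support vector regression adaptive Kalman filter algorithm applied to a vehicle-mounted GPS/INS integrated navigation system, whose performance is verified by comparison with the extended Kalman filter and the fuzzy adaptive Kalman filter.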

English abstract

Support vector machines (SVMs) arose from statistical learning theory, whose foundations date from the 1960s; the guiding aim is to solve only the problem of interest, without solving a more difficult problem as an intermediate step. SVMs are based on the structural risk minimization principle, which is closely related to regularization theory. This principle incorporates capacity control to prevent over-fitting and is thus a partial solution to the bias-variance trade-off. SVMs were first proposed by Vapnik for classification and have become an area of intense research owing to advances in technique and theory, together with extensions to regression and density estimation, where they have also shown good performance.
     The Kalman filter is a real-time recursive algorithm in which all system states are expressed in the time domain, so it is well suited to estimating multidimensional stochastic processes. There is no need to keep each past system state in memory: the estimates are processed online and tend steadily toward the true states. The Kalman filter processes the signal using the statistical properties of the system noise and the observation noise, taking the system observations as its input and producing the estimates (system states or parameters) as its output. It can handle not only stationary one-dimensional stochastic processes but also nonstationary multidimensional ones, which is why it is so widely applied.
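     The recursion described above can be illustrated with a minimal scalar sketch. It is not taken from the thesis: the function names, the scalar model, and the default values of `f`, `h`, `q`, and `r` are assumptions chosen only to show that each step needs nothing but the previous estimate, its variance, and the new observation.

```python
def kalman_step(x, p, z, f=1.0, h=1.0, q=1e-4, r=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    z    : new observation
    f, h : state-transition and observation coefficients (assumed known)
    q, r : process- and observation-noise variances (assumed known)
    """
    # Predict: propagate the estimate and its variance through the state equation.
    x_pred = f * x
    p_pred = f * p * f + q
    # Update: correct the prediction with the new observation.
    innovation = z - h * x_pred       # observation minus predicted observation
    s = h * p_pred * h + r            # theoretical innovation variance
    k = p_pred * h / s                # Kalman gain
    x_new = x_pred + k * innovation
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new


def run_filter(observations, x0=0.0, p0=1.0, **kw):
    """Filter a sequence online; only the latest (x, p) pair is carried forward."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        x, p = kalman_step(x, p, z, **kw)
        estimates.append(x)
    return estimates, p


if __name__ == "__main__":
    # Constant true state 2.0 observed without error, purely for illustration.
    estimates, variance = run_filter([2.0] * 50, q=0.0, r=0.25)
    print(estimates[-1], variance)
```

     In a real GPS/INS application the state is a vector and `f`, `h`, `q`, `r` become matrices; the scalar form is used here only to keep the recursion visible.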
     The combination of GPS and an inertial navigation system (INS) is the best option for integrated navigation: both are global, all-round, full-time navigation systems. They provide very complete navigation data and compensate for each other's shortcomings, so the integrated system achieves higher accuracy than either system working alone. On the INS side, integrated navigation calibrates the inertial sensors and so improves INS accuracy; on the GPS side, INS aiding improves positioning and tracking and helps protect the GPS receiver from interference.
     The first three chapters present the basic principles of SVM, the Kalman filter, and GPS/INS integrated navigation. Chapter four proposes a novel resampling SVM algorithm inspired by GA and SMOTE. The method uses the mutation and crossover operators of differential evolution (DE) to over-sample the minority class, reducing the imbalance ratio, and then clusters both classes to delete redundant or noisy samples. By combining over-sampling with data cleaning, the useful samples are retained and computational efficiency improves.
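     As an illustration of the over-sampling step only, the sketch below generates synthetic minority samples with a DE-style mutation and binomial crossover. The abstract does not specify the operators in detail, so the DE/rand/1/bin scheme, the scale factor `f`, the crossover rate `cr`, and the function names are all assumptions rather than the thesis's actual algorithm; the clustering-based cleaning step is not shown.

```python
import random


def de_oversample(minority, n_new, f=0.5, cr=0.9, seed=0):
    """Create n_new synthetic samples from a list of minority-class vectors.

    Assumed scheme (DE/rand/1/bin): mutant = a + f * (b - c), then each
    coordinate is taken from the mutant with probability cr, otherwise from
    a fourth "target" sample. f and cr are illustrative defaults.
    """
    if len(minority) < 4:
        raise ValueError("need at least 4 minority samples")
    rng = random.Random(seed)
    dim = len(minority[0])
    synthetic = []
    for _ in range(n_new):
        # Four distinct minority samples: one target and three donors.
        target, a, b, c = rng.sample(minority, 4)
        mutant = [a[j] + f * (b[j] - c[j]) for j in range(dim)]
        j_rand = rng.randrange(dim)  # guarantees at least one mutant coordinate
        child = [
            mutant[j] if (rng.random() < cr or j == j_rand) else target[j]
            for j in range(dim)
        ]
        synthetic.append(child)
    return synthetic


def balance_minority(majority, minority, **kw):
    """Return the minority class padded with synthetic samples up to the majority size."""
    n_new = max(0, len(majority) - len(minority))
    if n_new == 0:
        return list(minority)
    return list(minority) + de_oversample(minority, n_new, **kw)


if __name__ == "__main__":
    majority = [[float(i), float(i % 3)] for i in range(20)]
    minority = [[10.0 + i, 5.0 + 0.5 * i] for i in range(5)]
    print(len(balance_minority(majority, minority)))
```

     Unlike SMOTE, which interpolates between a sample and one of its nearest neighbors, this variant draws donors at random from the whole minority class, so synthetic points can fall outside the local neighborhood; that is one reason a subsequent cleaning step is useful.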
     Chapter five presents an online optimization method, the support vector regression self-adaptive Kalman filter algorithm (SVREKF). The method uses SVR to adjust the observation covariance matrix online according to the current system observations, and uses an adjustment factor to update the noise statistics dynamically so that they track the actual noise and improve estimation accuracy. Provided the system noise is zero-mean Gaussian white noise, the ratio of the theoretical residual covariance matrix to the actual residual covariance matrix should be close to 1. If the ratio departs far from 1, the observation noise has changed, and the noise covariance matrix should be adjusted so that the ratio returns to 1.
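     The ratio test can be sketched for a scalar filter as follows. This is a simplified stand-in, not the thesis's method: where the thesis uses SVR to produce the adjustment, the sketch substitutes a direct rule that resets the observation-noise variance to the windowed sample variance of the residuals minus the predicted state variance. The class name, the random-walk model, the window length, and the tolerance `tol` are all assumptions.

```python
from collections import deque


class RatioAdaptiveKalman:
    """Scalar random-walk Kalman filter that re-estimates r from its residuals."""

    def __init__(self, x0=0.0, p0=1.0, q=1e-3, r=0.25, window=20, tol=0.2):
        self.x, self.p, self.q, self.r = x0, p0, q, r
        self.tol = tol                            # allowed deviation of the ratio from 1
        self.innovations = deque(maxlen=window)   # most recent residuals

    def step(self, z):
        # Predict (state transition and observation coefficients are both 1).
        x_pred = self.x
        p_pred = self.p + self.q
        nu = z - x_pred
        self.innovations.append(nu)

        ratio = None
        if len(self.innovations) == self.innovations.maxlen:
            # Actual residual variance, estimated over the sliding window.
            s_actual = sum(v * v for v in self.innovations) / len(self.innovations)
            s_theory = p_pred + self.r            # theoretical residual variance
            ratio = s_theory / s_actual if s_actual > 0 else float("inf")
            if abs(ratio - 1.0) > self.tol:
                # Assumed adjustment rule (stand-in for the SVR-based one):
                # choose r so that theory matches the observed residual variance.
                self.r = max(1e-9, s_actual - p_pred)

        # Update with the (possibly adjusted) observation-noise variance.
        k = p_pred / (p_pred + self.r)
        self.x = x_pred + k * nu
        self.p = (1.0 - k) * p_pred
        return self.x, ratio


if __name__ == "__main__":
    flt = RatioAdaptiveKalman(r=0.25)
    for i in range(200):
        flt.step(2.0 if i % 2 == 0 else -2.0)     # observations scattered around 0
    print(flt.r)
```

     The filter starts with an observation-noise variance far smaller than the scatter of the data; once the window fills, the ratio falls well below 1 and the variance is raised until theory and observed residuals agree again.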
     The contributions of this thesis fall into two points. (ⅰ) It proposes a novel resampling SVM algorithm for imbalanced-dataset problems and evaluates it on UCI standard datasets. The results show that the method handles imbalanced datasets efficiently compared with standard SVM, SMOTE-SVM, and DE-SVM under the F-measure and ROC area (AUC) criteria. (ⅱ) It presents a support vector regression self-adaptive Kalman filter for vehicle-mounted GPS/INS integrated navigation and compares it with the extended Kalman filter and the fuzzy self-adaptive Kalman filter to verify its performance.
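     For readers unfamiliar with the two evaluation criteria named above, the following sketch computes them from their standard definitions. The function names and the toy labels and scores are illustrative assumptions; no result from the thesis is reproduced.

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-measure of the positive (minority) class; beta=1 gives the usual F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc_from_scores(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a positive sample
    is scored above a negative one (ties count one half)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(f_measure(y_true, y_pred))
    print(auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1, 0.05]))
```

     Both measures focus on how well the minority class is recovered, which plain accuracy hides when one class dominates the dataset.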

