Research and Application of Support Vector Machines Combined with Rough Sets
Abstract
The support vector machine (SVM) is a new type of learning machine developed on the foundation of statistical learning theory. SVMs are currently regarded as a powerful tool for classification and regression problems and have become the major research focus in machine learning after neural networks. Built on the structural risk minimization principle and VC-dimension theory, an SVM seeks the best trade-off between model complexity and learning ability from the limited sample information available, in order to obtain the best possible generalization. SVMs are viewed as a sound development of traditional classifiers and show distinctive advantages on small-sample, nonlinear, and high-dimensional machine learning problems.
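As a brief illustration of this trade-off, the standard soft-margin SVM training problem (not specific to this thesis) can be written as

\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\qquad \text{s.t.}\quad y_{i}\bigl(w^{\top}\phi(x_{i})+b\bigr)\ \ge\ 1-\xi_{i},\ \ \xi_{i}\ge 0,
\]

where the margin term \(\tfrac{1}{2}\lVert w\rVert^{2}\) controls model complexity, the slack variables \(\xi_{i}\) measure training errors, and the parameter \(C\) sets the trade-off between the two.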
It is well known that SVM training, posed as a linear or nonlinear programming problem, enjoys global convergence. However, when an SVM is applied to multi-class problems, the required transformation is cumbersome and computationally heavy, consuming a large amount of training time. A neighborhood-based SVM training algorithm is therefore proposed: neighborhood computation is used to reduce the number of training samples, which saves training time and lowers the computational cost. To reduce redundancy while keeping the classification accuracy, rough set theory is also introduced into the training process; rough-set attribute reduction is applied to the data, further reducing the computation needed to solve the SVM. Experimental results confirm the effectiveness of the method.
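The sketch below illustrates the general idea under one simple assumption: only samples whose fixed-radius neighborhood contains more than one class (i.e., samples likely to lie near a decision boundary) are kept before a standard SVM is trained. The radius eps, the toy data, and the use of scikit-learn's SVC are illustrative choices, not the thesis's actual algorithm.

```python
# Sketch: neighborhood-based sample pruning before SVM training (illustrative only).
import numpy as np
from sklearn.svm import SVC

def boundary_samples(X, y, eps=1.0):
    """Return indices of samples whose eps-neighborhood is not decision-pure."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)   # distances from sample i to all samples
        if np.unique(y[d <= eps]).size > 1:    # mixed labels -> near a class boundary
            keep.append(i)
    return np.array(keep)

# Toy two-class data (hypothetical, stands in for a real training set).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

idx = boundary_samples(X, y, eps=1.0)
clf = SVC(kernel="rbf", C=1.0).fit(X[idx], y[idx])   # train only on the reduced set
print(f"{len(idx)} of {len(X)} samples kept; accuracy on full set: {clf.score(X, y):.3f}")
```

Pruning interior samples in this way tends to leave the likely support vectors in place, which is why the reduced problem can be solved faster with little loss of accuracy.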
The main problems addressed in this thesis are as follows:
(1) The SVM was originally formulated for binary classification, so a multi-class problem must first be transformed. This thesis adopts a method that unifies the multi-class problem into a single two-class problem and improves the spatial mapping so that the within-class distances of the new classes become smaller and the between-class distances larger, which improves the separability of the samples. The improvement is validated on UCI data sets by computing the within-class and between-class scatter (the standard definitions are recalled after this list).
(2) Combining rough set theory with the SVM, the attributes of the data are reduced: with the classification ability of the knowledge base kept unchanged, irrelevant or unimportant attributes are deleted according to the equivalence relations, which simplifies the decision table and, to some extent, reduces the computation and processing time of the SVM. Finally, attribute reduction is combined with the neighborhood concept and a support vector regression algorithm and applied to power-system load forecasting, where a comparison with the traditional algorithm demonstrates the advantage of the improved method (a reduction sketch is given after this list).
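For reference, the within-class and between-class scatter used for the validation in (1) are the standard definitions

\[
S_{w}=\sum_{c=1}^{k}\sum_{x_{i}\in\mathcal{C}_{c}}(x_{i}-\mu_{c})(x_{i}-\mu_{c})^{\top},
\qquad
S_{b}=\sum_{c=1}^{k}n_{c}\,(\mu_{c}-\mu)(\mu_{c}-\mu)^{\top},
\]

where \(\mu_{c}\) and \(n_{c}\) are the mean and size of class \(\mathcal{C}_{c}\) and \(\mu\) is the overall mean; a mapping that decreases \(\operatorname{tr}(S_{w})\) while increasing \(\operatorname{tr}(S_{b})\) therefore yields the improved separability claimed above.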
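For (2), the sketch below shows positive-region-based attribute reduction on a small, hypothetical decision table with categorical attributes; the thesis's treatment of the load-forecasting data is not reproduced here.

```python
# Sketch: rough-set attribute reduction via the positive region (illustrative only).
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes induced by the attribute subset."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return blocks.values()

def dependency(rows, decisions, attrs):
    """gamma_B(D): fraction of objects whose B-equivalence class is decision-pure."""
    pos = sum(len(block) for block in partition(rows, attrs)
              if len({decisions[i] for i in block}) == 1)
    return pos / len(rows)

def reduct(rows, decisions, attrs):
    """Greedily drop attributes whose removal leaves gamma_B(D) unchanged."""
    keep = list(attrs)
    full = dependency(rows, decisions, keep)
    for a in list(attrs):
        trial = [b for b in keep if b != a]
        if trial and dependency(rows, decisions, trial) == full:
            keep = trial                               # attribute a is dispensable
    return keep

# Toy decision table: attribute 2 duplicates the information in attribute 0.
rows = [(0, 1, 0), (0, 0, 0), (1, 1, 1), (1, 0, 1)]
decisions = [0, 0, 1, 1]
print(reduct(rows, decisions, [0, 1, 2]))              # prints [2]: one attribute suffices
```

The reduced attribute set would then be used, together with the neighborhood-based sample selection sketched earlier, as the input to a support vector regression model (for example, scikit-learn's SVR) for load forecasting.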
