Research on Maximum-Margin Classification Methods Based on the Scaled Convex Hull (基于尺度化凸壳的最大间隔分类方法研究)
Abstract
Maximum-margin machine learning methods, typified by the SVM, have received increasing attention in pattern classification, data mining, and related fields thanks to their concise mathematical form, intuitive geometric interpretation, and good generalization ability. Inspired by the idea of the reduced convex hull, this thesis proposes a new way of constructing maximum-margin classifiers for linearly nonseparable problems: the scaled convex hull (Scaled Convex Hull, abbreviated SCH) method. The method turns a linearly nonseparable problem into that of finding the nearest point pair between the SCHs generated by the two classes of samples. Through the kernel trick, it can also be used to solve nonlinear classification problems.
     First, the SCH is defined and several of its properties are proved; these properties provide a theoretical guarantee of the generalization ability of classifiers built from SCHs. The size of an SCH is controlled by a scale factor, so by steadily decreasing this factor the two SCHs shrink until they become separable. The nearest point pair between the SCHs can then be computed with the mature nearest-point algorithms of computational geometry, and the hyperplane that perpendicularly bisects the segment joining the nearest pair serves as the separating hyperplane of the nonseparable problem; the corresponding classifier is called the SCH-based maximum-margin classifier. Since this construction follows the same idea as building SVM maximum-margin classifiers from reduced convex hulls, the method can also be regarded as a variant of the SVM. It improves on the reduced convex hull in that an SCH has the same number of vertices as the original convex hull, which yields a simple way of finding the nearest point pair whose complexity is independent of the scale factor. Moreover, the shape of an SCH does not change with the scale factor, which is why it is called a scaled convex hull.
     Second, three computational-geometry algorithms for the nearest point pair, the Gilbert, SK, and MDM algorithms, are introduced and applied to finding the nearest point pair between SCHs. Their computational complexity is compared with that of the same algorithms in the reduced convex hull setting, demonstrating the advantages of the SCH method.
     Third, the SCH method can also be used to solve class-imbalance problems. In general, an imbalanced problem has few positive samples, so the convex hull they generate is relatively small, while the negative points generate a large hull; the resulting decision surface then tends to misclassify positive samples. The SCH method proposed here solves this by shrinking the two hulls to different degrees: the hull of the negative points is assigned a small scale factor and the hull of the positive points a large one, so that the positive and negative SCHs are roughly the same size, and the hyperplane that perpendicularly bisects the segment between the two SCHs' nearest points is taken as the decision surface.
     By the analogous device of assigning different scale factors to different SCHs, the proposed method can also solve cost-sensitive problems.
     Finally, by establishing the relationship between the SCH and the minimum enclosing ball (MEB) problem, the thesis transforms the training of SCH classifiers into solving an MEB problem, so that existing fast MEB algorithms can be used to solve large-scale SCH classification problems.
Maximal-margin classification approaches, represented by support vector machines (SVMs), have attracted more and more attention in the fields of pattern recognition and data mining, due to their clear mathematical formulation, intuitive geometric description, and good generalization performance. In this thesis, inspired by the notion of the reduced convex hull (RCH), the notion of the scaled convex hull (SCH) is introduced to construct maximal-margin classifiers for nonseparable classification problems: a nonseparable problem is transformed into the problem of finding the nearest point pair between the two SCHs generated by the training points of the two classes.
     First, the notion of the scaled convex hull (SCH) is introduced and a set of theoretical results is established to support it; these results theoretically guarantee the generalization performance of SCH-based classifiers. By a suitable choice of the scale factor (reduction factor), the initially overlapping SCHs can be reduced until they become separable. Once they are separable, the nearest point pair between them can be found with existing algorithms, and the separating hyperplane is taken to be the one that bisects, and is normal to, the line segment joining these two nearest points. This hyperplane attains the maximal margin between the SCHs, yielding a maximal-margin classifier. The viewpoint is the same as the reduced convex hull (RCH) framework for SVM classifiers, so the method can be seen as a variant of the SVM. Unlike the RCH, the SCH has the same number of extreme points as the original convex hull; this simplifies the computation of the nearest point pair between the SCHs, whose cost is independent of the scale factor. Moreover, the SCH keeps the same shape as the original convex hull as the scale factor changes, hence the name scaled convex hull.
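To make the construction concrete, here is a minimal Python sketch. The abstract does not spell out the SCH formula, so the `shrink` map below, which pulls every vertex toward its class centroid by a factor lam, and all function names are assumptions for illustration; the nearest point pair is found here by solving the underlying quadratic program with SciPy rather than by the geometric algorithms discussed next.

```python
import numpy as np
from scipy.optimize import minimize

def shrink(X, lam):
    """Assumed SCH map: v -> lam*v + (1-lam)*centroid.
    The shrunk hull keeps the same vertices (and shape) as the original."""
    return lam * X + (1.0 - lam) * X.mean(axis=0)

def sch_hyperplane(Xpos, Xneg, lam_pos=0.5, lam_neg=0.5):
    """Nearest point pair between the two SCHs via the quadratic program
    min ||U^T a - V^T b||^2 over two simplices, then the bisecting plane."""
    U, V = shrink(Xpos, lam_pos), shrink(Xneg, lam_neg)
    m, n = len(U), len(V)

    def objective(t):
        d = U.T @ t[:m] - V.T @ t[m:]
        return d @ d

    cons = [{"type": "eq", "fun": lambda t: t[:m].sum() - 1.0},
            {"type": "eq", "fun": lambda t: t[m:].sum() - 1.0}]
    t0 = np.concatenate([np.full(m, 1.0 / m), np.full(n, 1.0 / n)])
    res = minimize(objective, t0, bounds=[(0.0, 1.0)] * (m + n),
                   constraints=cons)
    p, q = U.T @ res.x[:m], V.T @ res.x[m:]   # nearest point pair
    w = p - q                                 # normal of the hyperplane
    b = w @ (p + q) / 2.0                     # plane through the midpoint
    return w, b                               # classify x by sign(w @ x - b)
```

The two simplices of coefficients that this QP optimizes over are exactly what the geometric nearest-point algorithms below traverse more cheaply.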
     Second, three computational-geometry algorithms are introduced to compute the nearest point pair between SCHs, namely the Gilbert algorithm, the SK algorithm, and the MDM algorithm. A comparison of their computational complexity with that of the corresponding RCH-based methods shows the advantages of the proposed method.
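Of the three, the Gilbert algorithm is the simplest to state. The sketch below is an illustrative rendering, not the thesis's pseudocode: it finds the minimum-norm point of the convex hull of the difference set {u - v}, and each iteration needs only one support-point scan over the original vertices, which is why the per-step cost does not change as the scale factor shrinks.

```python
import numpy as np

def gilbert_min_norm(U, V, eps=1e-6, max_iter=1000):
    """Gilbert iteration for the minimum-norm point of conv({u - v}).
    Each step computes one support point by scanning U and V once."""
    w = U[0] - V[0]                          # any point of the difference set
    for _ in range(max_iter):
        # support point in direction -w: argmin_u <w,u> minus argmax_v <w,v>
        z = U[np.argmin(U @ w)] - V[np.argmax(V @ w)]
        g = w @ (w - z)                      # Frank-Wolfe duality gap
        if g <= eps * (w @ w):               # w is (nearly) optimal
            break
        # move to the point on the segment [w, z] closest to the origin
        t = np.clip(g / ((w - z) @ (w - z)), 0.0, 1.0)
        w = (1.0 - t) * w + t * z
    return w                                 # approximates p - q
```

With the `shrink` map from the previous sketch, `gilbert_min_norm(shrink(Xpos, lam_pos), shrink(Xneg, lam_neg))` approximates the hyperplane normal w = p - q directly.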
     Third, the SCH can be used to solve class-imbalance problems. In such problems the number of positive points is much smaller than the number of negative points, so the convex hull generated by the positive points is smaller than the one generated by the negative points, and the resulting classifier tends to misclassify more positive points. By giving each SCH a different scale factor the imbalance can be addressed: the positive SCH receives the bigger scale factor so that the two SCHs end up roughly the same size, and the resulting classifier misclassifies fewer positive points.
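As a toy illustration of this recipe (the count-based heuristic for the two scale factors is our own assumption, not a rule stated in the abstract), one can reuse `sch_hyperplane` from the first sketch:

```python
import numpy as np

# hypothetical heuristic: shrink the majority (negative) hull much more,
# so the two SCHs end up comparable in size
n_pos, n_neg = 30, 300
lam_pos = 0.9
lam_neg = lam_pos * n_pos / n_neg          # 0.09: far stronger shrinkage

rng = np.random.default_rng(0)
Xpos = rng.normal(loc=+1.0, scale=1.0, size=(n_pos, 2))
Xneg = rng.normal(loc=-1.0, scale=1.0, size=(n_neg, 2))

# reuses sch_hyperplane from the sketch above
w, b = sch_hyperplane(Xpos, Xneg, lam_pos, lam_neg)
pred = np.sign(Xpos @ w - b)               # +1 means "positive side"
print((pred > 0).mean())                   # fraction of positives recovered
```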
     In the same way, by giving each SCH a different scale factor, the proposed SCH method can solve cost-sensitive problems.
     Finally, by building the relationship between the SCH and the minimum enclosing ball (MEB) problem, training an SCH-based classifier can be transformed into solving an MEB problem. With existing fast MEB algorithms, large-scale classification problems can then be solved.
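The abstract does not give the SCH-to-MEB reduction itself; as an illustration of the kind of fast MEB solver such a reduction enables, here is a sketch of the standard Badoiu-Clarkson core-set iteration, which returns a (1+eps)-approximate MEB:

```python
import numpy as np

def meb_badoiu_clarkson(X, eps=0.1):
    """(1+eps)-approximate minimum enclosing ball of the rows of X.
    After ceil(1/eps^2) steps the farthest point lies within (1+eps)*r."""
    c = X[0].astype(float).copy()
    for i in range(int(np.ceil(1.0 / eps ** 2))):
        far = X[np.argmax(((X - c) ** 2).sum(axis=1))]  # farthest point
        c += (far - c) / (i + 2)           # step toward the farthest point
    r = np.sqrt(((X - c) ** 2).sum(axis=1).max())
    return c, r
```

Because the iteration count depends only on eps and not on the number of points, solvers of this type scale to very large training sets, which is what the reduction to MEB exploits.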