Angle-based varying-coefficient multi-category support vector machine (in English)
  • English title: Targeted local angle-based multi-category support vector machine
  • Authors: KANG Wenjia; LIN Wenhui; ZHANG Sanguo
  • Affiliations: School of Mathematical Sciences, University of Chinese Academy of Sciences; Technology Research Institute, Aisino Corporation
  • Keywords: local smoothing; multi-category support vector machine; angle-based maximum margin classification framework
  • Journal code: ZKYB
  • Journal: Journal of University of Chinese Academy of Sciences
  • Publication date: 2019-07-15
  • Publisher: Journal of University of Chinese Academy of Sciences
  • Year: 2019
  • Issue: v.36
  • Funding: Supported by the open project of Hubei Collaborative Innovation Center for Early Warning and Emergency Response Technology (JD20150402)
  • Language: English
  • Page: ZKYB201904021
  • Page count: 12
  • CN: 04
  • ISSN: 10-1131/N
  • Classification number: 21-32
Abstract
The support vector machine (SVM) is a classical classification algorithm in machine learning and has long been popular among data scientists. Whether or not the data are linearly separable, traditional SVMs handle binary classification well: given a training sample, an SVM obtains the optimal decision boundary by maximizing the smallest margin and uses it to predict the class of new samples. Real-world data, however, are more complex. On the one hand, the label set often contains more than two categories, so SVMs must be generalized to multi-category problems, and a number of effective multi-category SVM algorithms have appeared in recent years. On the other hand, the feature set may contain a special variable, such as age in bioscience, that should be singled out from the other variables and treated separately so that its effect on the final classification is preserved. We call such a variable the targeted variable. To address both aspects, we propose the targeted local angle-based multi-category support vector machine (TLAMSVM), which solves multi-category problems that involve a targeted variable. TLAMSVM performs multi-category classification within the angle-based maximum margin classification framework, which offers a clearer geometric interpretation, and it introduces a varying-coefficient model, using a suitably chosen local smoothing function to pool the information carried by the targeted variable. We apply TLAMSVM to both simulated and real data sets. The numerical results show that it achieves higher prediction accuracy than multi-category SVMs that do not use the varying-coefficient idea or the angle-based framework.
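The two ingredients named in the abstract, angle-based prediction and local smoothing over a targeted variable, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a regular-simplex construction commonly used in angle-based classification, in which k classes are represented by k unit vectors in (k-1)-dimensional space with equal pairwise angles, and a point is assigned to the class whose vertex has the largest inner product with the fitted function value. The function names, the Gaussian kernel, and the bandwidth `h` are hypothetical choices made only for illustration.

```python
import numpy as np


def simplex_vertices(k):
    """Return a (k, k-1) array whose rows are unit vectors with equal pairwise angles."""
    if k < 2:
        raise ValueError("need at least two classes")
    W = np.empty((k, k - 1))
    # First vertex: all coordinates equal, scaled to unit length.
    W[0] = (k - 1) ** -0.5
    # Remaining vertices: a common shift plus a scaled standard basis vector.
    shift = -(1.0 + np.sqrt(k)) / (k - 1) ** 1.5
    scale = np.sqrt(k / (k - 1))
    W[1:] = shift + scale * np.eye(k - 1)
    return W


def angle_based_predict(F, W):
    """Assign each row of F (n, k-1) to the class whose vertex gives the largest inner product."""
    return np.argmax(F @ W.T, axis=1)


def local_weights(u, u0, h):
    """Normalized Gaussian kernel weights centred at u0 (kernel and bandwidth are assumptions)."""
    u = np.asarray(u, dtype=float)
    w = np.exp(-0.5 * ((u - u0) / h) ** 2)
    return w / w.sum()


if __name__ == "__main__":
    W = simplex_vertices(3)
    # Gram matrix: ones on the diagonal, a common negative value elsewhere.
    print(np.round(W @ W.T, 3))
    # Observations whose targeted variable (e.g. age) is near u0 receive more weight.
    print(np.round(local_weights([30.0, 40.0, 50.0], u0=40.0, h=5.0), 3))
```

In a varying-coefficient fit of this kind, weights such as those returned by `local_weights` would typically multiply each observation's loss when estimating the classifier at a given value of the targeted variable. How TLAMSVM chooses its smoothing function and bandwidth is not stated in the abstract.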