Kernel-Based Learning from Imbalanced Data (基于核方法的不平衡数据学习)
Abstract
Imbalanced data learning (IDL) is a special kind of supervised (classification) learning that has attracted wide attention only in recent years. It mainly addresses classification problems in which the training examples are unevenly distributed across the classes, i.e., the so-called class imbalance problem (CIP). CIPs arise in many important practical domains, including medical diagnosis and intrusion detection. Most existing learning algorithms are designed under the assumptions of a balanced class distribution and accuracy maximization; when used to handle CIPs, they easily "over-learn" the majority class, which in turn degrades the overall performance of the classifier. Objectively speaking, CIPs have posed an enormous challenge to the current machine learning community.
Focusing on how to handle CIPs reasonably and effectively, this dissertation carries out the following series of studies by means of the emerging kernel methods, in particular the support vector machine (SVM):
(1) The basic question of how to evaluate classifier performance reasonably in IDL is studied. First, a set of commonly used evaluation metrics is systematically summarized and analyzed, and the reason why the traditional accuracy metric is unsuitable for IDL is examined theoretically. On this basis, a meta-learning approach is used to study experimentally the performance differences among SVM classifiers optimized under different metrics. The results show that, although SVM is a state-of-the-art learning method, an SVM classifier selected with accuracy as the optimization criterion in IDL is still strongly biased towards the majority class and tends to assign data to it. Optimizing under other, more reasonable metrics yields "bias-corrected" SVM classifiers with better overall performance. This part of the work not only reveals the differences among evaluation metrics but also provides useful guidance for SVM model selection.
(2) How several extended SVMs can be applied to CIPs by weighting the training examples asymmetrically is studied systematically. Extended SVMs, represented by the least squares SVM and the proximal SVM, are used as widely as the standard SVM because they are easy to solve and perform well. However, applying them directly to IDL rarely yields satisfactory results; weighting the training examples asymmetrically is one of the simplest and most practical ways to improve their ability to handle CIPs. To overcome the shortcomings of some existing weighting strategies, a new strategy is proposed: it assigns more weight to minority-class examples than to majority-class examples while also reducing the weights of abnormal examples as far as possible. Different weighting strategies can be combined conveniently with different extended SVMs, and the various combinations are compared experimentally on 15 benchmark datasets. The results show that the new weighting strategy has a clear performance advantage in some cases.
(3) Inspired by the margin-maximization and structural risk control training principles of the standard SVM, a new training model for large-margin kernel classifiers is proposed; this is one of the main innovations of this dissertation. The new model not only has an intuitive geometric interpretation but, more importantly, emphasizes optimizing the classifier's generalization ability. The original model is a hard, non-convex optimization problem; after appropriate relaxation, two different, easily solvable second-order cone programming (SOCP) models are obtained. Using the SeDuMi optimization toolbox, simulation experiments are conducted on 12 benchmark datasets. The results show that, compared with the standard SVM, the two SOCP models have certain performance advantages on both balanced and imbalanced datasets, and one of them is also notably more stable.
(4) To address the information loss that under-sampling can cause in the training examples, combining under-sampling with ensemble learning is proposed to improve the ability of SVMs to handle CIPs. Bagging and AdaBoost are used as ensemble frameworks to integrate under-sampling, and, to overcome the shortcomings of existing algorithms, two new algorithms are proposed, namely the Clustering-Based Asymmetric Bagging Ensemble (CABagE) and the Modified Asymmetric AdaBoost Ensemble (MAAdaBE); this is another main innovation of this dissertation. The algorithms are compared experimentally on 20 benchmark datasets. The results show that, compared with a traditional single SVM classifier, the ensemble SVM classifiers predict the minority class significantly better and usually have better overall performance. Compared with existing ensemble algorithms, CABagE and MAAdaBE build ensemble SVM classifiers with higher minority-class prediction accuracy. Furthermore, a comparison across multiple evaluation metrics shows that MAAdaBE has the best overall performance, which is attributed to the effective example-weight smoothing mechanism embedded in it.
Imbalanced data learning (IDL), which has attracted intensive attention in recent years, is a special kind of supervised (classification) learning. Its main goal is to handle classification problems whose training examples are unevenly distributed between classes, i.e., the so-called class imbalance problems (CIPs). CIPs exist in many important real-world domains, including medical diagnosis and intrusion detection. Most existing classification algorithms are designed under the assumptions of a balanced class distribution and classification-accuracy maximization; when applied to CIPs, they tend to "over-learn" the majority class and consequently degrade the overall performance of the trained classifiers. Objectively speaking, CIPs have posed an enormous challenge to the machine learning research community.
Focusing on how to deal with CIPs reasonably and effectively, we have carried out, by means of the newly developed kernel methods and especially the support vector machine (SVM), a series of related studies, summarized as follows:
(1) Study of a basic issue in IDL, namely how to evaluate classifier performance reasonably. We first summarize and analyze a set of evaluation metrics frequently used in machine learning; in particular, the reason why traditional accuracy is unsuitable for IDL is explored theoretically. Then, using a meta-learning method, we experimentally study the performance differences between SVM classifiers optimized under different metrics. The results show that, although SVM is a state-of-the-art method, SVM classifiers selected under the accuracy criterion are still readily biased towards the majority class, whereas optimizing under other, more reasonable metrics yields "bias-rectified" SVM classifiers with better overall performance. The results of this part not only expose the distinctions among different evaluation metrics but also provide useful guidance for SVM model selection.
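To make the difference between the metrics concrete, the short Python sketch below (an illustration only, not part of the dissertation's experiments; the 95:5 class ratio and the scikit-learn metric functions are our own choices) scores a degenerate classifier that always predicts the majority class: accuracy looks excellent while the F-measure, G-mean and AUC expose the bias.

    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score

    # Ground truth with a 95:5 class ratio (1 = minority/positive class).
    y_true = np.array([1] * 5 + [0] * 95)

    # A degenerate "classifier" that always predicts the majority class
    # and assigns the same low score to every example.
    y_pred = np.zeros_like(y_true)
    y_score = np.full(len(y_true), 0.1)

    sens = recall_score(y_true, y_pred, pos_label=1)   # minority-class recall
    spec = recall_score(y_true, y_pred, pos_label=0)   # majority-class recall
    print("accuracy :", accuracy_score(y_true, y_pred))              # 0.95, looks fine
    print("F-measure:", f1_score(y_true, y_pred, zero_division=0))   # 0.0
    print("G-mean   :", np.sqrt(sens * spec))                        # 0.0
    print("AUC      :", roc_auc_score(y_true, y_score))              # 0.5, no ranking power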
(2) Study of how to apply several extended SVMs to CIPs by weighting the training examples asymmetrically. Extended SVMs, with the least squares SVM and the proximal SVM as representatives, are used as extensively as the standard SVM because of their ease of solution and good performance. However, applying these extended SVMs directly to IDL usually does not produce satisfactory results; one of the simplest and most practical remedies is to weight the training examples asymmetrically. To overcome the deficiencies of some existing weighting methods, a new weighting strategy is proposed in this dissertation: it assigns more weight to minority-class examples than to majority-class examples and, at the same time, tries to reduce the weights of abnormal examples. The weighting strategies can easily be embedded in the extended SVMs. On 15 benchmark datasets, we conducted numerical experiments comparing different combinations of extended SVMs and weighting mechanisms. The results show that the new weighting strategy has clear performance advantages over the other strategies in some cases.
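For illustration, the following NumPy sketch implements a weighted least squares SVM in the standard Suykens dual formulation with a simple inverse-class-frequency weighting. It shows the general asymmetric-weighting mechanism only and is not the new weighting strategy proposed in the dissertation; the RBF kernel, the value of C and the weights are assumptions made for this example.

    import numpy as np

    def rbf_kernel(A, B, gamma=0.5):
        # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def train_weighted_lssvm(X, y, v, C=10.0, gamma=0.5):
        # Weighted LS-SVM dual (Suykens form): solve the linear system
        #   [ 0   y^T         ] [b    ]   [0]
        #   [ y   Omega + D/C ] [alpha] = [1]
        # with Omega_ij = y_i * y_j * K(x_i, x_j) and D = diag(1 / v_i).
        n = len(y)
        Omega = np.outer(y, y) * rbf_kernel(X, X, gamma)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = y
        A[1:, 0] = y
        A[1:, 1:] = Omega + np.diag(1.0 / (C * v))
        sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(n))))
        b, alpha = sol[0], sol[1:]
        return lambda Xt: np.sign(rbf_kernel(Xt, X, gamma) @ (alpha * y) + b)

    # Toy data with a 10:90 class ratio; minority examples get inverse-frequency weights.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(2.0, 1.0, (10, 2)), rng.normal(0.0, 1.0, (90, 2))])
    y = np.concatenate([np.ones(10), -np.ones(90)])
    v = np.where(y > 0, 9.0, 1.0)          # minority weight = (# majority) / (# minority)
    predict = train_weighted_lssvm(X, y, v)
    print("minority recall on training data:", (predict(X)[y > 0] == 1).mean())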
(3) Inspired by the margin-maximization and structural risk control training principles of the standard SVM, we propose a new model for training large-margin kernel classifiers, which is one of the main innovations of this dissertation. The proposed model has an intuitive geometric meaning and, more importantly, emphasizes optimizing the classifier's generalization capacity. Its original optimization form is non-convex and hard to handle, but after appropriate relaxation it can be transformed into two different, easily solved second-order cone programming (SOCP) formulations. With the help of SeDuMi, a freely available optimization toolbox, we conducted numerical experiments on 12 benchmark datasets. The results demonstrate that the two SOCP models outperform the standard SVM on both balanced and imbalanced datasets, and that one of them is also considerably more robust.
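The dissertation's relaxed models are not reproduced here; as a hedged illustration of the kind of problem that SOCP solvers such as SeDuMi handle, the cvxpy sketch below recasts an ordinary soft-margin linear SVM with an explicit second-order cone constraint ||w||_2 <= t. The toy data, the parameter C and the use of cvxpy are assumptions of this example, not the dissertation's setup.

    import numpy as np
    import cvxpy as cp

    # Toy data with a 10:90 class ratio.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(2.0, 1.0, (10, 2)), rng.normal(0.0, 1.0, (90, 2))])
    y = np.concatenate([np.ones(10), -np.ones(90)])
    n, d = X.shape

    w, b, t = cp.Variable(d), cp.Variable(), cp.Variable()
    xi = cp.Variable(n)                       # slack variables
    C = 1.0
    constraints = [
        cp.SOC(t, w),                         # second-order cone constraint ||w||_2 <= t
        cp.multiply(y, X @ w + b) >= 1 - xi,  # soft-margin constraints
        xi >= 0,
    ]
    # Minimizing t instead of ||w||^2 keeps the whole problem a genuine SOCP.
    prob = cp.Problem(cp.Minimize(t + C * cp.sum(xi)), constraints)
    prob.solve()
    print("status:", prob.status, " margin width:", 2.0 / np.linalg.norm(w.value))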
(4) Since the under-sampling technique may lose information carried by the discarded training examples, we propose to combine it with ensemble learning to enhance the efficacy of SVMs on CIPs. Bagging and AdaBoost are used as the ensemble learning frameworks to integrate under-sampling. To overcome the deficiencies of some existing ensemble algorithms, two new ones, namely the Clustering-Based Asymmetric Bagging Ensemble (CABagE) and the Modified Asymmetric AdaBoost Ensemble (MAAdaBE), are proposed; this is another main innovation of this dissertation. Numerical comparisons between the algorithms were conducted on 20 benchmark datasets. The results show that the ensemble SVM classifiers predict the minority class much better than a single SVM and usually have better overall performance. Compared with existing ensemble algorithms, both CABagE and MAAdaBE build ensemble SVM classifiers with higher minority-class prediction accuracy. Furthermore, comparative analyses under multiple metrics demonstrate that MAAdaBE has the best overall performance, which should be attributed to the effective example-weight smoothing mechanism embedded in it.
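As a rough sketch of the general under-sampling-plus-Bagging idea, and explicitly not of the CABagE or MAAdaBE algorithms themselves, the snippet below trains each base SVM on all minority examples together with an equally sized random under-sample of the majority class and combines the members by voting; the scikit-learn SVC settings and the ensemble size are illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def asymmetric_bagging_svm(X, y, n_members=11, seed=0):
        # Each member sees all minority examples plus an equally sized random
        # under-sample of the majority class, so no member ignores the minority.
        rng = np.random.default_rng(seed)
        minority = 1 if (y == 1).sum() < (y == -1).sum() else -1
        idx_min = np.flatnonzero(y == minority)
        idx_maj = np.flatnonzero(y != minority)
        members = []
        for _ in range(n_members):
            sub = rng.choice(idx_maj, size=len(idx_min), replace=False)
            idx = np.concatenate([idx_min, sub])
            members.append(SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[idx], y[idx]))
        return members

    def vote(members, X):
        # Combine the members by a simple majority vote over their {-1, +1} predictions.
        votes = np.sum([m.predict(X) for m in members], axis=0)
        return np.where(votes >= 0, 1, -1)

    # Toy data with a 15:150 class ratio.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(1.5, 1.0, (15, 2)), rng.normal(0.0, 1.0, (150, 2))])
    y = np.concatenate([np.ones(15), -np.ones(150)])
    ensemble = asymmetric_bagging_svm(X, y)
    print("minority recall:", (vote(ensemble, X)[y == 1] == 1).mean())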