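     (1) To address the imbalance problem in imbalanced data, a method that processes data in the image space of the kernel method is proposed, namely SMOIS (Synthetic Minority Over-sampling In Image Space). Unlike the strategy of generating new synthetic minority-class samples in the original data space, this method introduces non-repetitive artificial minority-class samples in the image space after mapping, so as to reduce the classification algorithm's sensitivity to minority-class samples. Experimental results show that the method achieves better classification performance on the ROC curve and g-means evaluation measures.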
Imbalanced dataset classification is very common in real-world applications such as medical diagnosis, radar image detection, and fraud detection. Because the number of positive samples differs greatly from the number of negative samples, traditional classification algorithms perform poorly on such data. How to classify imbalanced datasets effectively and accurately has therefore become an active research problem in machine learning and pattern recognition.
     Building on traditional kernel methods, this paper proposes a classification learning algorithm that integrates a new over-sampling method with a Support Vector Machine using different costs, in order to improve classification performance on imbalanced datasets. The main contributions are as follows:
     (1) To address the imbalance problem of imbalanced datasets, this paper proposes a data-processing method that operates in the image space of the kernel method, namely SMOIS (Synthetic Minority Over-sampling in Image Space). Unlike the strategy of synthesizing minority samples in the original data space, this method introduces non-repetitive synthetic minority samples in the image space after mapping, thereby reducing the classification algorithm's sensitivity to minority-class samples. Experimental results show that the method achieves better classification performance as measured by the ROC curve and g-means.
     (2) The Support Vector Machine (SVM) is an effective classification learning algorithm, but it usually performs unsatisfactorily on imbalanced datasets. This paper therefore proposes a new SVM learning algorithm based on SMOIS, which integrates the SMOIS method with a revised SVM algorithm to improve classification performance.
     The research in this paper addresses one of the key current problems. It has important theoretical significance and direct application value for real-world problems.
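The abstract does not give the construction details, so the following is only a minimal sketch of one way over-sampling in the image space could work. It assumes a synthetic minority point is a convex combination z = (1 - delta) * phi(x_i) + delta * phi(x_j) of two mapped minority samples. The inner products of z then follow from the kernel's bilinearity, so the Gram matrix can be extended without ever computing a pre-image in the input space. The function names, the RBF kernel choice, and the cost heuristic (minority cost scaled by the class ratio) are illustrative assumptions, not the thesis's actual procedure.

```python
import math
import random

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram(points, kernel=rbf):
    """Gram matrix K[i][j] = k(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

def add_image_space_sample(K, i, j, delta):
    """Return a new Gram matrix with z = (1-delta)*phi(x_i) + delta*phi(x_j) appended.

    <z, phi(x_m)> = (1-delta)*K[i][m] + delta*K[j][m]   (bilinearity)
    <z, z>        = (1-delta)^2*K[i][i] + 2*(1-delta)*delta*K[i][j] + delta^2*K[j][j]
    """
    n = len(K)
    a, b = 1.0 - delta, delta
    row = [a * K[i][m] + b * K[j][m] for m in range(n)]
    self_sim = a * a * K[i][i] + 2 * a * b * K[i][j] + b * b * K[j][j]
    new_K = [K[m][:] + [row[m]] for m in range(n)]
    new_K.append(row + [self_sim])
    return new_K

def oversample_in_image_space(K, labels, minority=1, n_new=2, seed=0):
    """Append n_new synthetic minority samples (hypothetical helper).

    Each synthetic point mixes two distinct original minority samples with a
    random weight, so it is not a plain copy of an existing sample.
    """
    rng = random.Random(seed)
    idx = [m for m, y in enumerate(labels) if y == minority]
    labels = list(labels)
    for _ in range(n_new):
        i, j = rng.sample(idx, 2)
        K = add_image_space_sample(K, i, j, rng.uniform(0.1, 0.9))
        labels.append(minority)
    return K, labels

def class_costs(labels, minority=1, C=1.0):
    """Assumed heuristic: penalize minority errors by the majority/minority ratio."""
    n_min = sum(1 for y in labels if y == minority)
    n_maj = len(labels) - n_min
    return {"minority": C * n_maj / n_min, "majority": C}

# Toy data: two minority points (label 1) and four majority points (label -1).
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1)]
y = [1, 1, -1, -1, -1, -1]
K_aug, y_aug = oversample_in_image_space(gram(X), y, n_new=2)
costs = class_costs(y_aug)
```

The extended matrix `K_aug` could then be handed to any SVM solver that accepts a precomputed kernel, with the per-class penalties from `class_costs` playing the role of the "different costs". Whether the thesis combines the two in exactly this way is not stated in the abstract.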
