Abstract
This paper presents an improved decision rule for the K-nearest-neighbor (KNN) classifier on class-imbalanced data. A gain model is constructed to measure the lift in the proportion of each class label among the K nearest neighbors of the sample to be classified. This strengthens the competitiveness of the minority class and significantly improves the classification accuracy on minority-class data. Experimental results on class-imbalanced datasets show that the proposed rule achieves significantly higher classification accuracy than the traditional KNN classifier.