Abstract
Multi-label k-Nearest Neighbor (ML-kNN) is a lazy learning algorithm that has been applied successfully in real-world settings. With the ever-growing volume of information, applying ML-kNN to large datasets has become a practical necessity. This paper first uses a clustering algorithm to partition the dataset into several parts, and then runs ML-kNN classification within each part. A series of experiments is carried out on four datasets of different sizes. The experimental results show that the proposed clustering-based ML-kNN performs better in terms of accuracy and efficiency.
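The two-stage idea described above can be sketched as follows. This is a minimal illustration under stated assumptions: k-means is assumed for the partitioning step (the paper does not name a specific clustering algorithm here), and a simplified neighbor-vote rule stands in for ML-kNN's full MAP posterior estimation (Zhang & Zhou, 2007). All function names and the toy data are hypothetical.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=50):
    # Plain Lloyd's k-means with a deterministic init (first points
    # as seeds) -- just enough to illustrate the partitioning step.
    centers = X[:n_clusters].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

def predict(x, X_train, Y_train, centers, cluster_ids, k=3):
    # Route the query to its nearest cluster, then search for the
    # k nearest neighbors inside that cluster only -- this is what
    # makes the approach cheaper than a global neighbor search.
    c = np.linalg.norm(centers - x, axis=1).argmin()
    idx = np.where(cluster_ids == c)[0]
    dists = np.linalg.norm(X_train[idx] - x, axis=1)
    neighbors = idx[dists.argsort()[:k]]
    # Simplified label rule: predict each label carried by more than
    # half of the neighbors (real ML-kNN uses MAP estimation instead).
    return (Y_train[neighbors].sum(axis=0) > k / 2).astype(int)

# Toy data: two well-separated groups, each with its own label.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
Y = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
centers, cids = kmeans(X, 2)
print(predict(np.array([0.5, 0.5]), X, Y, centers, cids))  # → [1 0]
```

Because each query is matched against only one cluster, the neighbor search touches a fraction of the training set, which is the source of the efficiency gain the abstract reports on large datasets.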