Efficient ML-kNN Algorithm on Large Data Set
  • English title: Efficient ML-kNN Algorithm on Large Data Set
  • Authors: 陆凯 (LU Kai); 徐华 (XU Hua)
  • Affiliation: School of Internet of Things Engineering, Jiangnan University (江南大学物联网工程学院)
  • Keywords: multi-label classification; Multi-Label k Nearest Neighbor (ML-kNN); clustering; big data set
  • Journal: Computer Engineering and Applications (计算机工程与应用), journal code JSGG
  • Publication date: 2018-03-29
  • Year: 2019; Issue: No.01 (v.55; No.920)
  • Funding: Natural Science Foundation of Jiangsu Province (No.BK20140165); China Scholarship Council project (No.201308320030)
  • Language: Chinese
  • Record ID: JSGG201901014
  • Pages: 90-94 (5 pages)
Abstract
Multi-Label k Nearest Neighbor (ML-kNN) is a lazy learning algorithm that has been applied successfully in practice. With the ever-growing volume of information, applying ML-kNN to large data sets has become a practical necessity. This paper first uses a clustering algorithm to partition the data set into several parts, then runs ML-kNN within each part, and carries out a series of experiments on four data sets of different sizes. The experimental results show that the clustering-based ML-kNN algorithm performs slightly better in terms of accuracy, performance, and efficiency.
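The cluster-then-classify pipeline described in the abstract can be sketched as follows. This is a minimal illustration in plain NumPy, not the authors' implementation: it combines a deterministic k-means (farthest-point initialisation) with the standard MAP rule of Zhang and Zhou's ML-kNN, and all function names, parameter values, and the toy data are our own assumptions.

```python
import numpy as np

def kmeans(X, k, iters=20):
    # Deterministic farthest-point initialisation, then Lloyd iterations.
    centers = X[[0]].astype(float)
    for _ in range(k - 1):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, X[d.argmax()]])
    for _ in range(iters):
        assign = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(0)
    return centers, assign

def mlknn_predict(Xtr, Ytr, Xte, k=3, s=1.0):
    # Standard ML-kNN (Zhang & Zhou, 2007): per label, a MAP decision from
    # the label prior and the count of positive neighbours, with smoothing s.
    n, L = Ytr.shape
    prior1 = (s + Ytr.sum(0)) / (2 * s + n)          # P(H=1) per label
    # Leave-one-out neighbour label counts on the training set.
    dtr = ((Xtr[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(dtr, np.inf)
    Ctr = Ytr[np.argsort(dtr, 1)[:, :k]].sum(1)      # (n, L) counts in 0..k
    # Smoothed likelihoods P(C=c | H=b) for c = 0..k.
    kh1 = np.zeros((L, k + 1)); kh0 = np.zeros((L, k + 1))
    for l in range(L):
        for c in range(k + 1):
            kh1[l, c] = ((Ctr[:, l] == c) & (Ytr[:, l] == 1)).sum()
            kh0[l, c] = ((Ctr[:, l] == c) & (Ytr[:, l] == 0)).sum()
    p1 = (s + kh1) / (s * (k + 1) + kh1.sum(1, keepdims=True))
    p0 = (s + kh0) / (s * (k + 1) + kh0.sum(1, keepdims=True))
    # Classify each test point from its k nearest training neighbours.
    dte = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    Cte = Ytr[np.argsort(dte, 1)[:, :k]].sum(1)
    preds = np.zeros((len(Xte), L), dtype=int)
    for l in range(L):
        preds[:, l] = (prior1[l] * p1[l, Cte[:, l]] >=
                       (1 - prior1[l]) * p0[l, Cte[:, l]]).astype(int)
    return preds

def clustered_mlknn(Xtr, Ytr, Xte, n_clusters=2, k=3):
    # Partition the training set, then run ML-kNN inside each cluster only.
    centers, assign = kmeans(Xtr, n_clusters)
    te_assign = ((Xte[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    preds = np.zeros((len(Xte), Ytr.shape[1]), dtype=int)
    for j in range(n_clusters):
        tr, te = assign == j, te_assign == j
        if te.any() and tr.sum() > k:
            preds[te] = mlknn_predict(Xtr[tr], Ytr[tr], Xte[te], k)
    return preds

# Toy demo: two well-separated groups with opposite label patterns.
rng = np.random.default_rng(1)
Xtr = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
Ytr = np.vstack([np.tile([1, 0], (20, 1)), np.tile([0, 1], (20, 1))])
preds = clustered_mlknn(Xtr, Ytr, np.array([[0.0, 0.0], [5.0, 5.0]]))
print(preds.tolist())  # → [[1, 0], [0, 1]]
```

Because each test point is matched against only one cluster's training points, the neighbour search touches roughly 1/n_clusters of the training set, which is presumably where the efficiency gain on large data sets comes from.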
