Abstract
Multi-label k-Nearest Neighbor (ML-kNN) is a lazy learning algorithm that has been applied successfully in real-world settings. With the ever-growing volume of information, applying ML-kNN to large datasets has become a practical necessity. This paper first uses a clustering algorithm to partition the dataset into several parts, and then runs ML-kNN classification within each part. A series of experiments is carried out on four datasets of different sizes. The experimental results show that the proposed clustering-based ML-kNN performs better in terms of accuracy and efficiency.
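The two-stage idea described above can be sketched as follows. This is a minimal illustration under stated assumptions: k-means is assumed for the partitioning step (the paper does not name a specific clustering algorithm here), and a simplified neighbor-vote rule stands in for ML-kNN's full MAP posterior estimation (Zhang & Zhou, 2007). All function names and the toy data are hypothetical.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=50):
    # Plain Lloyd's k-means with a deterministic init (first points
    # as seeds) -- just enough to illustrate the partitioning step.
    centers = X[:n_clusters].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

def predict(x, X_train, Y_train, centers, cluster_ids, k=3):
    # Route the query to its nearest cluster, then search for the
    # k nearest neighbors inside that cluster only -- this is what
    # makes the approach cheaper than a global neighbor search.
    c = np.linalg.norm(centers - x, axis=1).argmin()
    idx = np.where(cluster_ids == c)[0]
    dists = np.linalg.norm(X_train[idx] - x, axis=1)
    neighbors = idx[dists.argsort()[:k]]
    # Simplified label rule: predict each label carried by more than
    # half of the neighbors (real ML-kNN uses MAP estimation instead).
    return (Y_train[neighbors].sum(axis=0) > k / 2).astype(int)

# Toy data: two well-separated groups, each with its own label.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
Y = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
centers, cids = kmeans(X, 2)
print(predict(np.array([0.5, 0.5]), X, Y, centers, cids))  # → [1 0]
```

Because each query is matched against only one cluster, the neighbor search touches a fraction of the training set, which is the source of the efficiency gain the abstract reports on large datasets.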