Survey of Clustering Algorithms in Data Mining (数据挖掘中聚类算法综述)
  • English title: Survey of Clustering Algorithms in Data Mining
  • Author: Liu Wei (刘维)
  • Keywords: clustering algorithm; data mining; data analysis; personalized recommendation; artificial intelligence
  • Journal title (Chinese): SAHG
  • Journal title (English): Jiangsu Commercial Forum
  • Institution: School of Logistics and E-commerce, Henan College of Husbandry Economy (河南牧业经济学院物流与电商学院)
  • Publication date: 2018-07-20
  • Publisher: Jiangsu Commercial Forum (江苏商论)
  • Year: 2018
  • Issue: No.405
  • Language: Chinese
  • Page: SAHG201807030
  • Page count: 6
  • CN: 07
  • ISSN: 32-1076/F
  • Classification number: 122-127
Abstract
In the era of big data, how to effectively cluster, analyze, and make predictions from massive data sets, and thereby address information overload for users, has become an important research topic. As a key data mining technique, clustering has been widely applied in data analysis, customer segmentation, artificial intelligence, and other fields. Building on an analysis of the state of research on five categories of traditional clustering algorithms, this paper surveys a number of newly developed clustering algorithms. It summarizes the problems of existing clustering algorithms and discusses directions for their development in three areas: search engines, personalized recommendation, and artificial intelligence.
