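Research on the K-means Clustering Algorithm (K-means聚类算法的研究)

English title: Research of K-means Clustering Algorithm
Author: Shang Haikun (尚海昆)
Thesis level: Master's
Discipline: Computer Application Technology
Keywords: clustering analysis; K-means; ant colony algorithm; power supply enterprise
Degree year: 2009
Advisor: Meng Jianliang (孟建良)
Discipline code: 081203
Degree-granting institution: North China Electric Power University (Hebei) (华北电力大学（河北）)
Thesis submission date: 2009-12-18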

Abstract

Cluster analysis is an important research area in data mining and a key technique for partitioning or grouping data. K-means is a traditional partition-based clustering algorithm; because it is efficient on large data sets, it is widely used in data mining. Building on a study of traditional clustering algorithms, this thesis proposes the WAC K-means algorithm, which is based on Weighted Ant Clustering (WAC). The algorithm first introduces a weighting scheme into the ant colony algorithm, then introduces the ants' transition probability into K-means clustering, so that the cluster each data point belongs to is decided according to that probability. Finally, the improved WAC K-means algorithm is applied to customer segmentation in the CRM system of a power supply enterprise, both as a study and in practical use; it segments the electricity customer base and yields valuable information.
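
The abstract says cluster membership is decided by a probability, not by a hard nearest-center rule. The sketch below illustrates that general idea only and is not the thesis's WAC K-means. The abstract does not state the transition-probability formula, so the form used here (probability proportional to the inverse of a feature-weighted distance raised to a power `beta`, with no pheromone term) is an assumption, as are the names `transition_probabilities`, `probabilistic_kmeans`, `weights`, and `beta`.

```python
import math
import random


def transition_probabilities(point, centers, weights, beta=2.0):
    """Probability of assigning `point` to each center.

    Assumed form: p_j proportional to (1 / d_j) ** beta, where d_j is a
    feature-weighted Euclidean distance. This is a hypothetical stand-in;
    the thesis abstract does not state the actual formula.
    """
    scores = []
    for center in centers:
        d = math.sqrt(
            sum(w * (x - c) ** 2 for w, x, c in zip(weights, point, center))
        )
        # Small epsilon avoids division by zero when a point sits on a center.
        scores.append((1.0 / (d + 1e-12)) ** beta)
    total = sum(scores)
    return [s / total for s in scores]


def probabilistic_kmeans(points, k, weights, iters=20, beta=2.0, seed=0):
    """K-means-style loop with a sampled (probabilistic) assignment step."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: sample a cluster index from the probabilities.
        for i, p in enumerate(points):
            probs = transition_probabilities(p, centers, weights, beta)
            labels[i] = rng.choices(range(k), weights=probs)[0]
        # Update step: move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if the cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    labels, centers = probabilistic_kmeans(data, k=2, weights=[1.0, 1.0])
    print(labels, centers)
```

As `beta` grows, the sampled assignment concentrates on the nearest center and the loop behaves more like standard K-means; smaller values make assignments more exploratory.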

