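Research on Clustering Analysis Methods in the Traffic Domain (交通领域中的聚类分析方法研究)

English title: Research on Clustering Algorithms in Traffic Domain
Author: Li Taoying (李桃迎)
Degree level: Doctoral
Discipline: Management Science and Engineering
Keywords: Data Integration ; Data Mining ; Fuzzy Clustering ; Clustering Ensemble ; Incremental Clustering
Degree year: 2010
Advisor: Chen Yan (陈燕)
Discipline code: 1201
Degree-granting institution: Dalian Maritime University
Submission date: 2010-09-01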

Abstract

With the development of information technology, systems in every field accumulate ever more data, and simple query and statistics functions no longer meet practical needs. Using data mining to discover latent, meaningful patterns in existing data, extract valuable knowledge, and provide a basis for senior management and decision support has become the key to solving this problem. This dissertation, "Research on Clustering Algorithms in Traffic Domain", therefore covers the following work:
     1. An integration method for complex, multi-source, heterogeneous data. XML is used to implement the data-exchange interface, which provides data sharing and interaction and resolves the heterogeneity of data across the systems currently used in different industries. This enables interconnection between systems and prepares the data for data mining.
     2. A weighted-entropy fuzzy c-means optimization method for mixed-attribute (numerical and categorical) data, proposed to address the shortcomings of existing algorithms. The method is also introduced into fuzzy association rules, which improves the accuracy and efficiency of association rule mining and broadens the range of application of fuzzy association rules.
     3. A clustering ensemble method for mixed-attribute data that improves clustering stability as well as accuracy and efficiency. A model framework for clustering ensembles is given and extended to suit the characteristics of mixed-attribute data, including methods for generating ensemble members (base clusterings) for categorical and mixed-attribute data, the design method and steps for consensus functions, and the strategies and steps for merging and splitting clusters.
     4. An incremental clustering method for mixed-attribute data based on clustering ensembles. Existing research on incremental clustering rarely addresses mixed-attribute data, and incremental methods are prone to instability. The proposed method treats incremental clustering with and without existing (historical) data separately; it improves clustering accuracy and efficiency and reduces clustering time.
     5. Applications of clustering analysis in the traffic domain. Clustering is used to mine the causes of traffic accidents and the underlying patterns, supporting decisions by the relevant authorities, helping to prevent accidents, and safeguarding the lives and property of the state and the people. Applied to ship grade classification, clustering improves the management efficiency of maritime authorities and gives managers a basis for decision making.
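
To make the consensus-function idea in item 3 concrete, here is a minimal sketch of one common generic scheme, evidence accumulation over a co-association matrix. It is not the dissertation's own consensus function, which the abstract does not specify. The function names, the 0.5 threshold, and the sample label lists are all assumptions made for illustration; the base partitions could come from any clustering algorithm, including ones designed for mixed-attribute data.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    partitions that place objects i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same objects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Hypothetical consensus function: link every pair of objects whose
    co-association exceeds `threshold`, then return the connected
    components as the final labels (numbered by first appearance)."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))  # union-find forest over object indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three made-up base partitions of six objects (illustrative data only).
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
final_labels = consensus_clusters(base_partitions)
```

Because the matrix depends only on whether two objects share a label, the base partitions may use different label names and different numbers of clusters. A merging or splitting step such as the one item 3 mentions would operate on the resulting clusters and is not shown here.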
