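Clustering Transactional Data

English title: Clustering Transactional Data
Author: Yan Hua (晏华)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords (Chinese): 交易数据 ; 覆盖密度 ; SCALE框架 ; 聚类结构 ; 频繁项集
Keywords (English): Transactional data ; Coverage Density ; SCALE Framework ; Clustering Structure ; Frequent Itemsets
Degree year: 2008
Advisor: Zhang Yi (章毅)
Discipline code: 081203
Degree-granting institution: University of Electronic Science and Technology of China (电子科技大学)
Thesis submission date: 2008-09-01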

Abstract

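Cluster analysis is the process of partitioning a set of physical or abstract objects into multiple classes made up of similar objects. In recent years, with the development of data mining technology, cluster analysis has been studied extensively as an important part of data mining and has been applied in many fields.
     With the development of information and Internet technology, the data people hold is not only growing in volume but also becoming more complex in type and more varied in structure. Existing clustering algorithms therefore still face two problems in practice: 1) when processing large-scale data, their performance degrades sharply or they fail to complete the analysis at all, that is, they are not scalable; 2) many clustering algorithms are confined to theoretical analysis and pay little attention to the characteristics of real data and the differences among specific applications, so their practicality is poor.
     Transactional data is a special kind of categorical data characterized by large volume and high dimensionality. Typical transactional data includes market basket data, Web log data, customer information, patient diagnosis records, and image information, and it usually arises in retail, e-commerce, healthcare, telecommunications, insurance, banking, and similar industries. Studying scalable clustering methods for transactional data is therefore both challenging and practically significant. This thesis takes large-scale transactional data as its research object and focuses on several problems in clustering such data. Its main research content and contributions are as follows:
     (1) A scalable framework for clustering large-scale transactional data is proposed, namely SCALE (Sampling, Clustering structure Assessment, cLustering and domain-specific Evaluation). The design of SCALE has the following features: 1) based on the characteristics of transactional data, coverage density and weighted coverage density are used to measure the overall similarity of a set of transactions effectively; 2) a scalable transactional clustering algorithm, WCD, is designed and implemented on the basis of weighted coverage density; 3) a clustering structure detection method generates candidate numbers of clusters, which effectively reduces the search over the clustering algorithm's parameter space; 4) evaluation of clustering results is integrated into the framework, with domain-specific measures helping users select the best clustering result. Experimental results show that clustering under the SCALE framework produces high-quality clustering results for transactional data.
     (2) The problem of detecting clustering structure in transactional data is studied. BKPlot, a general method for identifying clustering structure in categorical data, has two weaknesses: it produces many noisy candidate cluster numbers, and its performance degrades on transactional datasets with a large number of items. To address them, a new method, DMDI, is proposed for finding a set of candidate optimal cluster numbers "Ks" in a transactional dataset. An agglomerative hierarchical clustering algorithm, ACTD, is designed and developed on the basis of a custom dissimilarity measure between transactional cluster modes. The merging index values that ACTD generates during clustering can be used to find the candidate optimal cluster numbers. Experiments show that DMDI identifies the clustering structure of transactional data effectively.
     (3) The stability of transactional clustering results is studied. Traditional partition-based clustering methods often get trapped in local optima, whereas SOM neural networks give stable clustering results but can only handle numerical data. This thesis therefore proposes a clustering method for transactional data based on the GHSOM neural network, namely GHSOM-CD. The method introduces the concept of coverage density into the GHSOM learning algorithm and modifies the neuron weight update rule and the stopping condition for network training. Experiments show that GHSOM-CD yields more meaningful clustering results on transactional datasets and extends the use of SOM neural networks to categorical data clustering.
     (4) The compression of frequent itemsets is studied. To address the excessive number of frequent itemsets produced by frequent itemset mining, a dynamic clustering method, the EESC algorithm, is proposed to compress frequent itemsets approximately. The clustering method relies on two custom intra-cluster similarity measures for frequent itemsets: expression similarity and support similarity. Experimental results show that this approximate compression method is feasible and achieves good compression quality.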

Abstract (author's English version)

Clustering is a process that partitions a set of physical or abstract objects into a set of disjoint clusters such that objects within a cluster are close to each other and objects from different clusters are dissimilar. As an important tool of data mining, clustering methods have been studied extensively in recent years and applied in many areas.
     The volume of data that people own keeps growing, and the types and structures of that data are becoming more complex and varied. Existing clustering algorithms therefore face two problems in real applications: 1) their performance drops dramatically and, in the worst cases, they cannot complete the analysis at all for lack of scalability; 2) many clustering algorithms are confined to theoretical analysis and seldom consider the features of real data or the differences among applications, so their practicability is poor.
     Transactional data is a special kind of categorical data with large volume and high dimensionality. Typical examples of transactional data are market basket data, web usage data, customer profiles, patient symptom records, and image features. Research on scalable clustering algorithms for transactional data is both challenging and meaningful. The main research topics and contributions of this thesis are as follows:
     (1) A scalable clustering framework for large-scale transactional data is proposed, i.e. SCALE (Sampling, Clustering structure Assessment, cLustering and domain-specific Evaluation). SCALE has the following four features. First, the set-based similarity measures Coverage Density and Weighted Coverage Density are defined according to the features of transactional data. Second, a scalable clustering algorithm for transactional data based on Weighted Coverage Density is designed and implemented. Third, clustering structure detection efficiently reduces the search over the parameter space of clustering algorithms. Finally, domain-specific measures are used to help users select the optimal clustering result. The experimental results show that the scalable clustering algorithm powered by the SCALE framework can efficiently generate high-quality clustering results.
     (2) The problem of detecting clustering structure in transactional datasets is studied. Since the generic categorical data clustering structure detection method BKPlot has two weaknesses in dealing with transactional data, the DMDI method is proposed specifically for finding clustering structure in transactional data. Based on the concept of Transactional-cluster-modes Dissimilarity, an agglomerative hierarchical clustering algorithm, i.e. the ACTD algorithm, is designed and implemented. The pair-cluster merging index values generated in the ACTD clustering procedure are used to find the candidate optimal cluster numbers. The experimental results show that the new method can often identify the clustering structure of transactional data effectively.
     (3) The stability problem of transactional clustering results is studied. The clustering procedures of traditional partition-based clustering algorithms are often trapped in local optima, while the SOM neural network has stable clustering results but only for numerical data. A GHSOM-based clustering method for transactional data is therefore proposed, i.e. the GHSOM-CD method, which introduces the concept of Coverage Density into the GHSOM learning algorithm. The new method improves the way neuron weight values are updated and the stopping criterion for network training. The experimental results show that the GHSOM-CD method produces correct and meaningful transactional clustering results.
     (4) The frequent itemset compression problem is studied. To solve the problem that frequent itemset mining often generates a large collection of frequent itemsets, a dynamic clustering method, i.e. the EESC method, is proposed to compress frequent itemsets approximately. The EESC method is based on two intra-cluster similarity measures for frequent itemsets: the expression similarity and the support similarity. The experimental results show that the approximate frequent itemset compression method is feasible and the compression quality is good.
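
The abstract names Coverage Density and Weighted Coverage Density in contribution (1) but does not state their formulas. The following is a minimal Python sketch under assumed definitions, not the thesis's own implementation: for a cluster with N transactions, M distinct items, and S total item occurrences, coverage density is taken as S / (N × M), the filled fraction of the cluster's transaction-by-item matrix, and weighted coverage density is taken as the sum of squared item occurrence counts divided by (S × N), so that frequent items weigh more. The function names and the helper `_item_counts` are hypothetical.

```python
from collections import Counter


def _item_counts(cluster):
    """Count, for each item, how many transactions in the cluster contain it."""
    return Counter(item for transaction in cluster for item in set(transaction))


def coverage_density(cluster):
    """Assumed definition: CD = S / (N * M).

    N = number of transactions, M = number of distinct items,
    S = total number of item occurrences in the cluster.
    """
    counts = _item_counts(cluster)
    n, m = len(cluster), len(counts)
    if n == 0 or m == 0:
        return 0.0
    return sum(counts.values()) / (n * m)


def weighted_coverage_density(cluster):
    """Assumed definition: WCD = sum_j occ_j^2 / (S * N).

    Each item's occurrence count occ_j is weighted by its share occ_j / S.
    """
    counts = _item_counts(cluster)
    n = len(cluster)
    s = sum(counts.values())
    if n == 0 or s == 0:
        return 0.0
    return sum(c * c for c in counts.values()) / (s * n)


if __name__ == "__main__":
    baskets = [{"milk", "bread", "eggs"}, {"milk", "bread"}, {"milk"}]
    print(coverage_density(baskets))
    print(weighted_coverage_density(baskets))
```

Under these assumed definitions, a cluster whose transactions are all identical scores 1.0 on both measures, and the scores fall as the transactions share fewer items.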

