Abstract
The ability to discover projected clusters in high-dimensional data is essential for many machine learning applications. Projective clustering of categorical data remains a challenge because it is difficult to learn adaptive weights for categorical attributes in coordination with cluster optimization. In this paper, a probability-based learning framework is proposed, which allows both the attribute weights and the center-based clusters to be optimized by kernel density estimation on categorical attributes. A novel algorithm is then derived for projective clustering on categorical data, based on the new learning approach for the kernel bandwidth selection problem. We show that the attribute weight is closely connected to the kernel bandwidth, while the optimized cluster center corresponds to the normalized frequency estimator of the categorical attributes. Experimental results on synthetic and real-world data show that the proposed method significantly outperforms state-of-the-art algorithms.
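To make the link between kernel bandwidth and a frequency estimator concrete, the following is a minimal sketch of kernel density estimation on a single categorical attribute. It assumes the Aitchison-Aitken kernel, which is a common choice for unordered categories but is not confirmed here as the kernel used in the paper; the function name `smoothed_frequencies` and its parameters are hypothetical. With bandwidth 0 the estimate equals the raw normalized frequencies, and at the maximum bandwidth it becomes uniform, which suggests why a bandwidth can serve as an indicator of how informative an attribute is within a cluster.

```python
from collections import Counter


def smoothed_frequencies(values, categories, bandwidth):
    """Kernel-smoothed probability of each category for one attribute.

    Assumption: Aitchison-Aitken kernel, K(x, y) = 1 - bandwidth if x == y,
    else bandwidth / (m - 1), where m is the number of categories.
    Every entry of `values` is expected to appear in `categories`.
    """
    m = len(categories)
    if m < 2:
        raise ValueError("need at least two categories")
    if not values:
        raise ValueError("need at least one observation")
    max_bandwidth = (m - 1) / m
    if not 0.0 <= bandwidth <= max_bandwidth:
        raise ValueError("bandwidth must lie in [0, (m - 1) / m]")

    n = len(values)
    counts = Counter(values)
    estimate = {}
    for c in categories:
        freq = counts[c] / n  # raw normalized frequency of category c
        # Averaging the kernel over all observations gives a blend of the
        # raw frequency and the mass leaked from the other categories.
        estimate[c] = (1.0 - bandwidth) * freq + bandwidth / (m - 1) * (1.0 - freq)
    return estimate


# Hypothetical cluster members for one attribute with three categories.
values = ["a", "a", "a", "b"]
categories = ["a", "b", "c"]

sharp = smoothed_frequencies(values, categories, bandwidth=0.0)
flat = smoothed_frequencies(values, categories, bandwidth=2.0 / 3.0)
```

Here `sharp` reproduces the raw frequencies (0.75, 0.25, 0.0), while `flat` assigns 1/3 to every category. Intermediate bandwidths interpolate between the two, and the estimates always sum to 1.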