Hierarchical data organization, clustering and denoising via localized diffusion folders
Abstract
Data clustering is a common technique for data analysis. It is used in many fields, including machine learning, data mining, customer segmentation, trend analysis, pattern recognition and image analysis. The proposed Localized Diffusion Folders (LDF) methodology, whose localized folders are called diffusion folders (DF), introduces consistency criteria for hierarchical folder organization, clustering and classification of high-dimensional datasets. The DF are a multi-level partitioning of the data into local neighborhoods, generated by several random selections of data points and DF in a diffusion graph and by redefining the local diffusion distances between them. This multi-level partitioning defines an improved localized geometry for the data and a localized Markov transition matrix that is used for the next time step in the advancement of the hierarchical diffusion process. The result of this clustering method is a bottom-up hierarchical data organization in which each level of the hierarchy contains LDF of DF from the lower levels. The methodology preserves the local neighborhood of each point while eliminating noisy, spurious connections between points and areas in the data affinity graph. One of our goals in this paper is to illustrate the impact of the initial choice of affinities on the definition of data graphs and on the robustness of the hierarchical data organization. This choice is similar to the selection of filter banks for signal denoising. The performance of the algorithms is demonstrated on real data and compared with existing methods. The proposed solution is generic, since it fits a large number of related problems in which the source datasets contain high-dimensional data.
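The abstract names the main ingredients (an affinity graph, a Markov transition matrix, randomly selected points, and folders built from local diffusion distances) without giving formulas. The following is a minimal sketch of a single level of such a partition, not the authors' algorithm. It assumes a Gaussian kernel with a hypothetical scale `eps`, a simplified unweighted distance between rows of the transition matrix in place of the paper's local diffusion distance, and one random selection of seeds. All function names are illustrative.

```python
import math
import random


def gaussian_affinity(points, eps):
    """Pairwise weights w_ij = exp(-||x_i - x_j||^2 / eps) (assumed kernel)."""
    n = len(points)
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / eps)
            for j in range(n)
        ]
        for i in range(n)
    ]


def markov_matrix(affinity):
    """Row-normalize the affinities so that every row sums to 1."""
    matrix = []
    for row in affinity:
        total = sum(row)
        matrix.append([w / total for w in row])
    return matrix


def row_distance(P, i, j):
    """Euclidean distance between transition rows i and j.

    A simplified stand-in for a diffusion distance (no density weighting).
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(P[i], P[j])))


def build_folders(P, n_seeds, rng):
    """Pick random seed points and assign each point to its closest seed."""
    seeds = rng.sample(range(len(P)), n_seeds)
    folders = {s: [] for s in seeds}
    for i in range(len(P)):
        nearest = min(seeds, key=lambda s: row_distance(P, i, s))
        folders[nearest].append(i)
    return folders


if __name__ == "__main__":
    # Two small groups of 2-D points, far apart from each other.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    P = markov_matrix(gaussian_affinity(data, eps=1.0))
    print(build_folders(P, n_seeds=2, rng=random.Random(0)))
```

Because the seeds are drawn at random, a single draw can place both seeds in the same group. This is one plausible reason for the repeated random selections mentioned in the abstract. A hierarchical version would treat the resulting folders as the nodes of the next level, which this sketch does not attempt.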
