Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach
  • Keywords: Hierarchical clustering; Aggregation; Centroid; Gini index; Genie algorithm
  • Publication: Lecture Notes in Computer Science
  • Year: 2016
  • Volume: 9880
  • Issue: 1
  • Pages: 191–202
  • Authors and affiliations: Marek Gagolewski (17, 18)
    Anna Cena (17)
    Maciej Bartoszuk (18)

    17. Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland
    18. Faculty of Mathematics and Information Science, Warsaw University of Technology, ul. Koszykowa 75, 00-662 Warsaw, Poland
  • Volume title: Modeling Decisions for Artificial Intelligence
  • ISBN: 978-3-319-45656-0
  • Subject category: Computer Science
  • Subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
Abstract
The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. The first extension replaces the classical aggregation by means of centroids with generic distance-based penalty minimizers. Thanks to this, the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). Secondly, a correction preventing the formation of clusters of highly unbalanced sizes is applied: just like in the recently introduced Genie approach, which extends the single linkage scheme, the new method prevents a chosen inequity measure of cluster sizes (e.g., the Gini, de Vergottini, or Bonferroni index) from rising above a predefined threshold. Numerous benchmarks indicate that introducing such a correction significantly increases the quality of the resulting clusterings.
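The two ideas in the abstract can be illustrated with a small brute-force sketch. It is not the authors' implementation; all names (`gini_index`, `penalty_minimizer`, `genie_like_clustering`, `gini_threshold`) are hypothetical. As assumptions: the penalty minimizer is taken to be the cluster member minimizing the sum of dissimilarities to the other members (a medoid), which only needs a dissimilarity function; only the normalized Gini index is implemented as the inequity measure; and, following the rule used in the Genie algorithm, whenever the index exceeds the threshold, only merges involving a cluster of the currently smallest size are allowed.

```python
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Dissim = Callable[[T, T], float]


def gini_index(sizes: Sequence[int]) -> float:
    """Normalized Gini index of cluster sizes (assumed normalization):
    0 when all sizes are equal, larger when sizes are more unbalanced."""
    k = len(sizes)
    if k < 2:
        return 0.0
    pairwise = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return pairwise / ((k - 1) * sum(sizes))


def penalty_minimizer(cluster: List[T], d: Dissim) -> T:
    """Aggregate a cluster by the member with the smallest total
    dissimilarity to all members (a medoid). Works for any dissimilarity d;
    ties are broken by taking the first such member."""
    return min(cluster, key=lambda y: sum(d(x, y) for x in cluster))


def genie_like_clustering(points: List[T], d: Dissim, n_clusters: int,
                          gini_threshold: float = 0.3) -> List[List[T]]:
    """Agglomerative 'nearest aggregate' clustering with a Genie-style
    correction. Brute force, intended for tiny inputs only."""
    clusters: List[List[T]] = [[p] for p in points]
    while len(clusters) > n_clusters:
        aggregates = [penalty_minimizer(c, d) for c in clusters]
        sizes = [len(c) for c in clusters]
        pairs = list(combinations(range(len(clusters)), 2))
        if gini_index(sizes) > gini_threshold:
            # Correction: at least one cluster in the merged pair must be
            # of the smallest current size.
            smallest = min(sizes)
            pairs = [(i, j) for i, j in pairs
                     if min(sizes[i], sizes[j]) == smallest]
        # Merge the pair whose aggregates are closest.
        i, j = min(pairs, key=lambda p: d(aggregates[p[0]], aggregates[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i remains valid
    return clusters


if __name__ == "__main__":
    pts = [0, 1, 2, 3, 4, 100]
    dist = lambda a, b: abs(a - b)
    print(genie_like_clustering(pts, dist, 2, gini_threshold=1.0))
    print(genie_like_clustering(pts, dist, 2, gini_threshold=0.2))
```

With `d` being the absolute difference on the points `[0, 1, 2, 3, 4, 100]` and two clusters requested, `gini_threshold=1.0` effectively disables the correction (the normalized index stays below 1), and the outlier `100` ends up as a singleton cluster. With `gini_threshold=0.2`, the correction forbids leaving such a small cluster behind, and the result is `[[0, 1], [2, 3, 4, 100]]`. If the candidates were not restricted to cluster members and `d` were the squared Euclidean distance, the penalty minimizer would be the ordinary centroid.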

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700