How frequently do clusters occur in hierarchical clustering analysis? A graph theoretical approach to studying ties in proximity
  • Authors: Wilmer Leal; Eugenio J. Llanos; Guillermo Restrepo…
  • Keywords: Ties in proximity; Cluster stability; Hierarchical cluster analysis (HCA); Dendrogram; Cluster frequency; Molecular descriptor
  • Journal: Journal of Cheminformatics
  • Publication date: December 2016
  • Volume: 8
  • Issue: 1
  • Full-text size: 8,301 KB
  • Author affiliations: Wilmer Leal (1) (2)
    Eugenio J. Llanos (1) (2) (3)
    Guillermo Restrepo (2) (4)
    Carlos F. Suárez (1) (5)
    Manuel Elkin Patarroyo (1) (6)

    1. Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
    2. Laboratorio de Química Teórica, Universidad de Pamplona, Pamplona, Colombia
    3. Corporación SCIO, Bogotá, Colombia
    4. Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany
    5. Universidad del Rosario, Bogotá, Colombia
    6. Universidad Nacional de Colombia, Bogotá, Colombia
  • Journal category: Physics and Astronomy
  • Journal subjects: Computer Applications in Chemistry
    Theoretical and Computational Chemistry
    Computational Biology/Bioinformatics
    Documentation and Information in Chemistry
  • Publisher: Chemistry Central Ltd
  • ISSN: 1758-2946
Abstract
Background: Hierarchical cluster analysis (HCA) is a widely used classification technique in many areas of science. A typical application runs HCA over a given data set, with a chosen grouping algorithm and similarity measure, and reports a single dendrogram. However, even when these parameters are fixed, ties in proximity (i.e., two clusters that are equidistant from a third one) may produce several different dendrograms with different clustering patterns, that is, different classifications. This situation is usually disregarded and conclusions are drawn from a single result, which raises the question of whether a given cluster persists across all the resulting dendrograms. This happens, for example, when HCA is used to group molecular descriptors in order to select the least similar ones in QSAR studies.
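The effect of ties can be made concrete with a small brute-force sketch. It assumes complete linkage, integer distances, and a dendrogram identified by its set of merged clusters; the function names `all_dendrograms` and `cluster_frequencies` and the toy data are hypothetical. Exhaustive enumeration can grow exponentially with the number of ties, so this is not the graph-theoretical approach of the paper, only an illustration of the problem it addresses: whenever several pairs of clusters share the minimum proximity, the sketch follows every choice, collects the distinct dendrograms, and reports the fraction of them in which each cluster occurs.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def complete_linkage(c1, c2, dist):
    """Complete-linkage proximity: the largest pairwise distance between two clusters."""
    return max(dist[i][j] for i in c1 for j in c2)


def all_dendrograms(clusters, dist, merged=frozenset()):
    """Yield every dendrogram reachable by resolving ties in proximity in all possible ways.

    Assumption of this sketch: a dendrogram is represented as the frozenset of the
    clusters created by its merges. The number of explored paths can grow
    exponentially with the number of ties.
    """
    if len(clusters) == 1:
        yield merged
        return
    pairs = list(combinations(clusters, 2))
    proximity = {p: complete_linkage(p[0], p[1], dist) for p in pairs}
    best = min(proximity.values())
    # Every pair at the minimum proximity is an equally valid merge: a tie.
    # Exact == is safe only for integer distances; real data would need a tolerance.
    for a, b in (p for p in pairs if proximity[p] == best):
        new = a | b
        rest = [c for c in clusters if c != a and c != b] + [new]
        yield from all_dendrograms(rest, dist, merged | {new})


def cluster_frequencies(labels, dist):
    """Return the number of distinct dendrograms and, for each cluster,
    the fraction of those dendrograms in which it occurs."""
    singletons = [frozenset([i]) for i in range(len(labels))]
    # Different merge orders can yield the same dendrogram; the set drops duplicates.
    dendrograms = set(all_dendrograms(singletons, dist))
    counts = Counter(c for d in dendrograms for c in d)
    freq = {
        "".join(labels[i] for i in sorted(c)): Fraction(n, len(dendrograms))
        for c, n in counts.items()
    }
    return len(dendrograms), freq


if __name__ == "__main__":
    # Hypothetical toy data: four points on a line at positions 0, 1, 2, 3.
    labels = "abcd"
    dist = [[abs(i - j) for j in range(4)] for i in range(4)]
    n, freq = cluster_frequencies(labels, dist)
    print(n, "distinct dendrograms")
    for name, f in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        print(name, f)
```

For the four collinear points, the first merge is a three-way tie, and the sketch finds three distinct dendrograms: {a, b, c, d} occurs in all of them, {b, c} in two, and {a, b}, {c, d}, {a, b, c} and {b, c, d} in one each. A conclusion drawn from any single HCA run would report only one of these clustering patterns.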
