How frequently do clusters occur in hierarchical clustering analysis? A graph theoretical approach to studying ties in proximity


Authors: Wilmer Leal ; Eugenio J. Llanos ; Guillermo Restrepo…
Keywords: Ties in proximity ; Cluster stability ; Hierarchical cluster analysis (HCA) ; Dendrogram ; Cluster frequency ; Molecular descriptor
Journal: Journal of Cheminformatics
Publication year: 2016
Publication date: December 2016
Volume: 8
Issue: 1
Full-text size: 8,301 KB
Author affiliations: Wilmer Leal (1) (2)
Eugenio J. Llanos (1) (2) (3)
Guillermo Restrepo (2) (4)
Carlos F. Suárez (1) (5)
Manuel Elkin Patarroyo (1) (6)

1. Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
2. Laboratorio de Química Teórica, Universidad de Pamplona, Pamplona, Colombia
3. Corporación SCIO, Bogotá, Colombia
4. Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany
5. Universidad del Rosario, Bogotá, Colombia
6. Universidad Nacional de Colombia, Bogotá, Colombia
Journal category: Physics and Astronomy
Journal subjects: Computer Applications in Chemistry
Theoretical and Computational Chemistry
Computational Biology/Bioinformatics
Documentation and Information in Chemistry
Publisher: Chemistry Central Ltd
ISSN: 1758-2946

Abstract

Background: Hierarchical cluster analysis (HCA) is a widely used classification technique in many areas of science. A typical application runs HCA over a given data set, with a chosen grouping algorithm and similarity measure, and reports a single dendrogram. However, even when these parameters are fixed, ties in proximity (i.e. two clusters equidistant from a third one) may produce several different dendrograms with different clustering patterns, and therefore different classifications. This situation is usually disregarded and conclusions are based on a single result, which raises the question of whether a given cluster persists across all the resulting dendrograms. This happens, for example, when HCA is used to group molecular descriptors in order to select the least similar ones in QSAR studies.
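
The effect of a tie can be seen on a toy case. The following Python sketch is illustrative only and is not the authors' graph-theoretical method: it assumes three hypothetical points on a line at 0, 1 and 2, uses complete linkage, and follows every possible way of breaking a tie for the minimum distance. The names `complete_linkage` and `enumerate_dendrograms` are invented for this example.

```python
from collections import Counter
from itertools import combinations


def complete_linkage(a, b, dist):
    """Complete linkage: the largest pairwise distance between members of a and b."""
    return max(dist[frozenset((i, j))] for i in a for j in b)


def enumerate_dendrograms(clusters, dist):
    """Yield one dendrogram per tie-breaking path.

    A dendrogram is represented as the frozenset of non-singleton clusters
    formed during agglomeration. Different paths may yield the same dendrogram.
    """
    if len(clusters) == 1:
        yield frozenset()
        return
    pairs = list(combinations(clusters, 2))
    dists = [complete_linkage(a, b, dist) for a, b in pairs]
    best = min(dists)
    for (a, b), d in zip(pairs, dists):
        if d == best:  # branch on every pair tied at the minimum distance
            merged = a | b
            rest = [c for c in clusters if c not in (a, b)] + [merged]
            for later in enumerate_dendrograms(rest, dist):
                yield later | {merged}


# Hypothetical data: integer coordinates, so equal distances compare exactly.
values = [0, 1, 2]
dist = {
    frozenset((i, j)): abs(values[i] - values[j])
    for i, j in combinations(range(len(values)), 2)
}
singletons = [frozenset([i]) for i in range(len(values))]

# Collapse tie-breaking paths that lead to the same dendrogram.
dendrograms = set(enumerate_dendrograms(singletons, dist))

# Fraction of distinct dendrograms in which each cluster appears.
counts = Counter(cluster for d in dendrograms for cluster in d)
frequency = {cluster: n / len(dendrograms) for cluster, n in counts.items()}

for cluster, f in sorted(frequency.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(cluster), f)
```

Here the pairs (0, 1) and (1, 2) are tied at distance 1, so two distinct dendrograms arise: the clusters {0, 1} and {1, 2} each appear in only half of them, while the full set {0, 1, 2} appears in both. Exhaustive branching of this kind grows quickly with the number of ties, so it is practical only for small examples.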
