Cartogram visualization for nonlinear manifold learning models
  • Authors: Alfredo Vellido (1)
    David L. García (1)
    Àngela Nebot (1)
  • Keywords: Cartogram; Data visualization; Generative topographic mapping; Manifold learning; Nonlinear mapping distortion; Magnification factor
  • Journal: Data Mining and Knowledge Discovery
  • Publication date: July 2013
  • Year: 2013
  • Volume: 27
  • Issue: 1
  • Pages: 22-54
  • Author affiliations: Alfredo Vellido (1)
    David L. García (1)
    Àngela Nebot (1)

    1. Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, C. Jordi Girona, 1-3, 08034, Barcelona, Spain
  • ISSN:1573-756X
Abstract
Real-world applications of multivariate data analysis often stumble upon the barrier of interpretability. Simple data analysis methods are usually easy to interpret, but they risk providing poor data models. More involved methods may instead yield faithful data models, but with limited interpretability. This is the case with linear and nonlinear methods for multivariate data visualization through dimensionality reduction. Even though the latter have provided some of the most exciting visualization developments, their practicality is hindered by the difficulty of explaining them in an intuitive manner. The interpretability, and therefore the practical applicability, of data visualization through nonlinear dimensionality reduction (NLDR) methods would improve if we could, first, accurately calculate the distortion introduced by these methods in the visual representation and, second, faithfully reintroduce this distortion into that representation. In this paper, we describe a technique for reintroducing the distortion into the visualization space of NLDR models. It is based on the concept of density-equalizing maps, or cartograms, recently developed for the representation of geographic information. We illustrate it using Generative Topographic Mapping (GTM), a nonlinear manifold learning method that can provide both multivariate data visualization and a measure of the local distortion that the model generates. Although illustrated here with GTM, the technique could easily be extended to other NLDR visualization methods, provided that a local distortion measure can be calculated. It could also serve as a guiding tool for interactive data visualization.
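As a rough illustration of the kind of local distortion measure the abstract refers to: for a smooth mapping from a low-dimensional latent space into data space, the areal magnification factor at a latent point is commonly written as sqrt(det(JᵀJ)), where J is the Jacobian of the mapping at that point. The sketch below is not the authors' implementation and does not train a GTM; it assumes a generic mapping and estimates J by finite differences. The names `magnification_factor` and `toy_mapping` are hypothetical, chosen only for this example.

```python
import numpy as np


def magnification_factor(mapping, u, eps=1e-6):
    """Estimate sqrt(det(J^T J)) for `mapping` at latent point `u`.

    `mapping` is any callable taking a latent vector of length L and
    returning a data-space vector of length D (assumed smooth, D >= L).
    """
    u = np.asarray(u, dtype=float)
    columns = []
    for i in range(u.size):
        step = np.zeros_like(u)
        step[i] = eps
        # Central difference: column i of the Jacobian.
        columns.append((mapping(u + step) - mapping(u - step)) / (2.0 * eps))
    jacobian = np.stack(columns, axis=1)  # shape (D, L)
    return float(np.sqrt(np.linalg.det(jacobian.T @ jacobian)))


def toy_mapping(u):
    """Hypothetical stand-in for a trained model: a paraboloid in 3-D."""
    return np.array([u[0], u[1], u[0] ** 2 + u[1] ** 2])


# Evaluate the distortion on a small grid of latent points.
grid = [(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
factors = {point: magnification_factor(toy_mapping, point) for point in grid}
```

For this toy surface the factor equals sqrt(1 + 4‖u‖²), so it is smallest at the origin and grows toward the edges of the grid. A cartogram-style display would then resize each latent-space cell according to values of this kind.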