Component retention in principal component analysis with application to cDNA microarray data
  • Authors: Richard Cangelosi (1)
    Alain Goriely (2) (3)
  • Journal: Biology Direct
  • Publication year: 2007
  • Publication date: December 2007
  • Volume: 2
  • Issue: 1
  • Full-text size: 1267KB
  • Author affiliations: Richard Cangelosi (1)
    Alain Goriely (2) (3)

    1. Department of Mathematics, University of Arizona, Tucson, AZ 85721, USA
    2. Program in Applied Mathematics, University of Arizona, Tucson, AZ 85721, USA
    3. BIO5 Institute, University of Arizona, Tucson, AZ 85721, USA
Abstract
Shannon entropy is used to provide an estimate of the number of interpretable components in a principal component analysis. In addition, several ad hoc stopping rules for dimension determination are reviewed, and a modification of the broken stick model is presented. The modification incorporates a test for the presence of an "effective degeneracy" among the subspaces spanned by the eigenvectors of the correlation matrix of the data set and then allocates the total variance among subspaces. A summary of the performance of the methods applied to both published microarray data sets and to simulated data is given. This article was reviewed by Orly Alter, John Spouge (nominated by Eugene Koonin), David Horn and Roy Varshavsky (both nominated by O. Alter).
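To make the two ideas named in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the eigenvalues of the correlation matrix are already available. It shows the classical (unmodified) broken stick rule, in which the k-th of p components is retained while its share of the variance exceeds b_k = (1/p) Σ_{i=k}^{p} 1/i. It also shows one common entropy-based estimate, exp(H), where H is the Shannon entropy of the variance proportions; whether this matches the exact estimator used in the article is an assumption. The function names are hypothetical, and the article's "effective degeneracy" modification is not reproduced.

```python
import math


def variance_proportions(eigenvalues):
    """Share of total variance per component, sorted from largest to smallest."""
    total = sum(eigenvalues)
    return [ev / total for ev in sorted(eigenvalues, reverse=True)]


def broken_stick_expected(p):
    """Expected proportions b_k = (1/p) * sum_{i=k}^{p} 1/i for k = 1..p."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def broken_stick_retain(eigenvalues):
    """Count leading components whose variance share exceeds the broken stick value.

    Classical rule only; the modified model described in the abstract is not
    implemented here.
    """
    props = variance_proportions(eigenvalues)
    expected = broken_stick_expected(len(props))
    count = 0
    for observed, threshold in zip(props, expected):
        if observed <= threshold:
            break
        count += 1
    return count


def entropy_dimension(eigenvalues):
    """exp(H), with H the Shannon entropy (natural log) of the variance proportions.

    Assumption: this is one common way to turn entropy into an effective
    number of components; it ranges from 1 (all variance in one component)
    to p (variance spread evenly).
    """
    props = variance_proportions(eigenvalues)
    entropy = -sum(q * math.log(q) for q in props if q > 0)
    return math.exp(entropy)
```

For example, `broken_stick_retain(eigenvalues)` and `entropy_dimension(eigenvalues)` can be called on the same list of eigenvalues to compare the two estimates; the second returns a real number rather than an integer count.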
