Sparse canonical methods for biological data integration: application to a cross-platform study
  • Authors: Kim-Anh Lê Cao (1) (2)
    Pascal GP Martin (3)
    Christèle Robert-Granié (1)
    Philippe Besse (2)
  • Journal: BMC Bioinformatics
  • Publication year: 2009
  • Publication date: December 2009
  • Volume: 10
  • Issue: 1
  • Author affiliations: Kim-Anh Lê Cao (1) (2)
    Pascal GP Martin (3)
    Christèle Robert-Granié (1)
    Philippe Besse (2)

    1. Station d'Amélioration Génétique des Animaux UR 631, Institut National de la Recherche Agronomique, F-31326, Castanet, France
    2. Institut de Mathématiques, Université de Toulouse et CNRS (UMR 5219), F-31062, Toulouse, France
    3. Laboratoire de Pharmacologie et Toxicologie UR 66, Institut National de la Recherche Agronomique, F-31931, Toulouse, France
  • ISSN: 1471-2105
Abstract

Background: In the context of systems biology, few sparse approaches have been proposed so far to integrate several data sets. This is nonetheless a fundamental issue that will be widely encountered in post-genomic studies, where transcriptomics, proteomics and metabolomics data obtained on different platforms are analyzed simultaneously in order to understand the interactions between the data sets. In this high-dimensional setting, variable selection is crucial for obtaining interpretable results. We focus on a sparse Partial Least Squares approach (sPLS) to handle two-block data sets in which the relationship between the two types of variables is known to be symmetric. Sparse PLS has been developed for either a regression or a canonical correlation framework and includes a built-in procedure that selects variables while integrating the data. To illustrate the canonical mode, we analyzed the NCI60 data sets, in which two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines.

Results: We compare the results with those of two other sparse or related canonical correlation approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). CIA does not include a built-in procedure for variable selection and therefore requires a two-step analysis. We stress the lack of statistical criteria for evaluating canonical correlation methods, which makes biological interpretation essential when comparing the different gene selections. We also propose comprehensive graphical representations of both samples and variables to facilitate the interpretation of the results.

Conclusion: sPLS and CCA-EN selected highly relevant genes and complementary findings from the two data sets, which enabled a detailed understanding of the molecular characteristics of several groups of cell lines. The two approaches gave similar results, although they highlighted the same phenomena in a different order of priority. Both outperformed CIA, which tended to select redundant information.
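The abstract describes sPLS only in outline. The following is a minimal NumPy sketch of one canonical-mode sPLS component, assuming the formulation in which the PLS loading vectors are the leading singular vectors of the cross-product matrix X^T Y and sparsity comes from alternately soft-thresholding them with a lasso-type penalty. The function names (`soft_threshold`, `spls_component`, `deflate_canonical`) and the use of absolute thresholds `lam_x`, `lam_y` are hypothetical choices for illustration; this is not the authors' code, and published software may parameterize the penalty differently (for example, by the number of variables to keep).

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise lasso soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def spls_component(X, Y, lam_x, lam_y, n_iter=100, tol=1e-10):
    """Sketch of one sparse loading pair (u for X, v for Y).

    Assumes X (n x p) and Y (n x q) are column-centred. lam_x and lam_y are
    hypothetical absolute thresholds; with both set to 0 the result is the
    leading singular-vector pair of X^T Y (no variable selection).
    """
    M = X.T @ Y                                  # p x q cross-product matrix
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]                     # start from the non-sparse solution
    for _ in range(n_iter):
        # Update the X loadings, then zero out small entries (variable selection).
        u_new = soft_threshold(M @ v, lam_x)
        if not u_new.any():
            raise ValueError("lam_x is too large: every X loading was set to zero")
        u_new /= np.linalg.norm(u_new)
        # Same update for the Y loadings, using the new X loadings.
        v_new = soft_threshold(M.T @ u_new, lam_y)
        if not v_new.any():
            raise ValueError("lam_y is too large: every Y loading was set to zero")
        v_new /= np.linalg.norm(v_new)
        delta = max(np.abs(u_new - u).max(), np.abs(v_new - v).max())
        u, v = u_new, v_new
        if delta < tol:
            break
    xi, omega = X @ u, Y @ v                     # latent variables for each block
    return u, v, xi, omega


def deflate_canonical(X, Y, xi, omega):
    """Canonical-mode deflation: each block is regressed on its OWN latent variable."""
    c = X.T @ xi / (xi @ xi)
    d = Y.T @ omega / (omega @ omega)
    return X - np.outer(xi, c), Y - np.outer(omega, d)
```

Setting both thresholds to zero removes the selection step and returns the ordinary (non-sparse) first pair of singular vectors, which is a convenient sanity check. In this sketch, deflating each block on its own latent variable, rather than deflating Y on the X variate as in a regression mode, is what keeps the two blocks on an equal, symmetric footing; further components would be obtained by calling `spls_component` again on the deflated matrices.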
