Group sparse canonical correlation analysis for genomic data integration
  • Authors: Dongdong Lin (14) (15)
    Jigang Zhang (15) (16)
    Jingyao Li (14) (15)
    Vince D Calhoun (17) (18)
    Hong-Wen Deng (15) (16)
    Yu-Ping Wang (14) (15) (16)
  • Keywords: Group sparse CCA; Genomic data integration; Feature selection; SNP
  • Journal: BMC Bioinformatics
  • Publication year: 2013
  • Publication date: December 2013
  • Volume: 14
  • Issue: 1
  • Full-text size: 1005 KB
  • Author affiliations:

    14. Biomedical Engineering Department, Tulane University, New Orleans, LA, USA
    15. Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
    16. Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
    17. The Mind Research Network, Albuquerque, NM, 87131, USA
    18. Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, 87131, USA
  • ISSN: 1471-2105
Abstract
Background The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNPs), and copy number variation (CNV)) has greatly enhanced our understanding of the interplay of these genomic factors as well as their influence on complex diseases. Exploring the relationships between these different types of genomic data sets remains challenging. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA), for this problem. Conventional CCA does not work effectively when the number of samples is much smaller than the number of biomarkers, which is typical for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome this difficulty, mostly using penalization with the l-1 norm (CCA-l1) or a combination of the l-1 and l-2 norms (CCA-elastic net). However, they overlook the structural or group effects within genomic data, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group).
Results We propose a new group sparse CCA method (CCA-sparse group), along with an effective numerical algorithm, to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that includes the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulated data and two real datasets (human glioma data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristics of the selected features. Pathway analysis is further performed for biological interpretation of those features.
Conclusions The CCA-sparse group method incorporates group effects of features into the correlation analysis while simultaneously performing individual feature selection. On the simulated data, it outperforms the two sCCA methods (CCA-l1 and CCA-group) by identifying correlated features with more true positives while keeping total discordance at a lower level, even if the group effect does not exist or irrelevant features are grouped with truly correlated features. Compared with the proposed CCA-sparse group model, CCA-l1 tends to select fewer truly correlated features, while CCA-group tends to select more redundant features.
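To make the idea of combining group-level and feature-level sparsity concrete, the following is a minimal Python sketch, not the authors' algorithm. It shows the standard sparse-group thresholding operator (element-wise soft-thresholding followed by group-wise shrinkage) inside a simple alternating rank-one update. The function names, the penalty parameters `lam_l1` and `lam_group`, the omission of group-size weights, and the assumption that columns are standardized with within-set covariance treated as identity are all illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def soft_threshold(v, lam):
    """Element-wise soft-thresholding (proximal operator of the l-1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_group_threshold(v, groups, lam_l1, lam_group):
    """Soft-threshold each entry, then shrink each group by its l-2 norm.

    Groups whose thresholded norm is at most lam_group are set to zero,
    so both whole groups and single features can be dropped.
    """
    v = np.asarray(v, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(v)
    for g in np.unique(groups):
        idx = groups == g
        s = soft_threshold(v[idx], lam_l1)
        norm = np.linalg.norm(s)
        if norm > lam_group:
            out[idx] = (1.0 - lam_group / norm) * s
    return out


def normalize(w):
    """Scale to unit l-2 norm; leave an all-zero vector unchanged."""
    n = np.linalg.norm(w)
    return w / n if n > 0 else w


def sparse_group_cca_rank1(X, Y, groups_x, groups_y,
                           lam_l1=0.1, lam_group=0.1, n_iter=50):
    """Alternating updates for one pair of sparse loading vectors (u, v).

    Assumes the columns of X and Y are already standardized and treats
    the within-set covariance matrices as identity (an assumption).
    """
    K = X.T @ Y / X.shape[0]              # sample cross-covariance matrix
    v = normalize(np.ones(Y.shape[1]))    # simple deterministic start
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = normalize(sparse_group_threshold(K @ v, groups_x, lam_l1, lam_group))
        v = normalize(sparse_group_threshold(K.T @ u, groups_y, lam_l1, lam_group))
    return u, v
```

Setting `lam_group` to zero reduces the operator to plain l-1 soft-thresholding, and setting `lam_l1` to zero reduces it to a pure group penalty, which loosely mirrors the contrast the abstract draws between CCA-l1, CCA-group, and the combined penalty.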
