Structure-revealing data fusion
  • Authors: Evrim Acar ; Evangelos E Papalexakis ; Gözde Gürdeniz…
  • Keywords: Data fusion ; Coupled matrix and tensor factorizations ; Optimization ; Sparsity ; NMR ; DOSY ; MS
  • Journal: BMC Bioinformatics
  • Publication year: 2014
  • Publication date: December 2014
  • Volume: 15
  • Issue: 1
  • Full-text size: 3,656 KB
  • Journal subjects: Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms
  • Publisher: BioMed Central
  • ISSN: 1471-2105
Abstract
Background Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures that are otherwise difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging since data from multiple sources often (i) are heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) are incomplete, and (iii) have both shared and unshared components. To address these challenges, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors.
Results The traditional formulation of coupled matrix and tensor factorizations, which models only shared factors, fails to capture the underlying structures in the presence of both shared and unshared factors, whereas the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data.
Conclusions We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data.
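The abstract does not spell out the model, so the following is only a minimal sketch of what a coupled matrix and tensor factorization objective could look like. It assumes a third-order tensor X and a matrix Y coupled in their first mode, a CP-style tensor model, per-component weights `lam` and `sig`, and an L1 penalty `beta` on those weights as one possible "modeling constraint"; all function and parameter names are hypothetical and not taken from the paper.

```python
import numpy as np

def cp_reconstruct(lam, A, B, C):
    # Weighted sum of rank-one terms: sum_r lam[r] * a_r (outer) b_r (outer) c_r
    return np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)

def coupled_objective(X, Y, lam, sig, A, B, C, V, beta=1e-3):
    # Assumed objective: squared fit error of the tensor and of the matrix,
    # which share the factor matrix A, plus an L1 penalty on the weights.
    tensor_fit = np.sum((X - cp_reconstruct(lam, A, B, C)) ** 2)
    matrix_fit = np.sum((Y - (A * sig) @ V.T) ** 2)
    sparsity = beta * (np.abs(lam).sum() + np.abs(sig).sum())
    return tensor_fit + matrix_fit + sparsity

def shared_mask(lam, sig, tol=1e-6):
    # Under this sketch, a component counts as shared when its weight is
    # nonzero in both data sets, and as unshared when one weight vanishes.
    return (np.abs(lam) > tol) & (np.abs(sig) > tol)

# Synthetic example: component 0 is shared, 1 is tensor-only, 2 is matrix-only.
rng = np.random.default_rng(0)
A, B, C, V = (rng.standard_normal((n, 3)) for n in (5, 4, 3, 6))
lam = np.array([1.0, 1.0, 0.0])
sig = np.array([1.0, 0.0, 1.0])
X = cp_reconstruct(lam, A, B, C)
Y = (A * sig) @ V.T
```

With the true factors, both fit terms are zero and only the penalty remains, while `shared_mask(lam, sig)` flags the first component alone as shared. An actual fit would minimize this objective over all factors and weights with a gradient-based optimizer, which the sketch leaves out.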
