Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs
  • Authors: Timothy Clough (1)
    Safia Thaminy (2) (3)
    Susanne Ragg (4)
    Ruedi Aebersold (2) (5)
    Olga Vitek (1) (6)
  • Keywords: Label-free LC-MS/MS; linear mixed effects models; protein quantification; quantitative proteomics; statistical design of experiments
  • Journal: BMC Bioinformatics
  • Publication date: November 2012
  • Volume: 13
  • Issue: Suppl 16
  • Affiliations:

    1. Department of Statistics, Purdue University, West Lafayette, IN, USA
    2. Department of Biology, Institute of Molecular Systems Biology, ETH, Zürich, Switzerland
    3. Institute for Systems Biology, Seattle, WA, USA
    4. School of Medicine, Indiana University, Indianapolis, IN, USA
    5. Faculty of Science, University of Zürich, Zürich, Switzerland
    6. Department of Computer Science, Purdue University, West Lafayette, IN, USA
  • ISSN:1471-2105
Abstract
Background: Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is widely used for quantitative proteomic investigations. The typical output of such studies is a list of identified and quantified peptides. The biological and clinical interest, however, usually focuses on quantitative conclusions at the protein level. Furthermore, many investigations ask complex biological questions by studying multiple interrelated experimental conditions. The field therefore needs generic statistical models that can quantify protein levels even in complex study designs.

Results: We propose a general statistical modeling approach for protein quantification in arbitrarily complex experimental designs, such as time course studies or studies involving multiple experimental factors. The approach summarizes the quantitative experimental information from all the features and all the conditions that pertain to a protein. It enables both protein significance analysis between conditions and protein quantification in individual samples or conditions. We implement the approach in MSstats, an open-source R-based software package suitable for researchers with a limited background in statistics and programming.

Conclusions: Using as examples two experimental investigations with complex designs, we demonstrate that simultaneous statistical modeling of all the relevant features and conditions yields higher sensitivity of protein significance analysis and higher accuracy of protein quantification than commonly employed alternatives. The software is available at http://www.stat.purdue.edu/~ovitek/Software.html.
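The abstract describes fitting one model per protein that uses all of its features and all conditions at once, and reading off both a between-condition test and per-condition abundances. The paper's own method is a linear mixed-effects model implemented in the R package MSstats; the Python sketch below is not that implementation. It assumes the simplest balanced case: log2 feature intensities, every feature observed in every run, the same number of runs per condition, and a model y = mu + feature + condition + run(condition) + error with a random run effect and no feature-by-condition interaction. Under those assumptions, the classical ANOVA-based test of a condition contrast reduces to a pooled-variance t-test on per-run averages of the log intensities, and the estimated protein abundance in a condition is the mean of those run averages. The function names `run_summaries` and `compare_conditions`, and the demo numbers, are hypothetical.

```python
import math
from collections import defaultdict


def run_summaries(data):
    """Average the log2 intensities of all features within each run.

    `data` maps (feature, condition, run) -> log2 intensity. Assumes a
    balanced design: every feature is observed in every run, no missing values.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (_feature, condition, run), y in data.items():
        sums[(condition, run)] += y
        counts[(condition, run)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def compare_conditions(data, cond_a, cond_b):
    """Estimate per-condition abundance and test cond_a vs. cond_b for one protein.

    Sketch for the balanced model y = mu + feature + condition + run(condition)
    + error with random runs: the condition contrast is tested against the
    run-to-run variance, pooled over all conditions.
    Returns (condition means, log2 fold change, t statistic, degrees of freedom).
    """
    by_condition = defaultdict(list)
    for (condition, _run), summary in run_summaries(data).items():
        by_condition[condition].append(summary)

    # Protein abundance per condition: mean of that condition's run summaries.
    means = {c: sum(v) / len(v) for c, v in by_condition.items()}

    # Pooled within-condition variance of run summaries (all conditions).
    ss = sum((s - means[c]) ** 2 for c, v in by_condition.items() for s in v)
    df = sum(len(v) - 1 for v in by_condition.values())
    variance = ss / df

    log2fc = means[cond_a] - means[cond_b]
    se = math.sqrt(variance * (1 / len(by_condition[cond_a])
                               + 1 / len(by_condition[cond_b])))
    return means, log2fc, log2fc / se, df


if __name__ == "__main__":
    # Made-up toy data: two peptides, two conditions, three runs per condition.
    demo = {}
    for cond, base, shifts in (("A", 10.0, (0.1, -0.1, 0.0)),
                               ("B", 11.0, (0.2, -0.2, 0.0))):
        for run, shift in enumerate(shifts, start=1):
            for feature, offset in (("pep1", 0.0), ("pep2", 1.0)):
                demo[(feature, cond, run)] = base + shift + offset
    means, log2fc, t, df = compare_conditions(demo, "A", "B")
    print(means, log2fc, t, df)
```

The t statistic can be turned into a two-sided p-value with a t-distribution CDF, for example `2 * scipy.stats.t.sf(abs(t), df)`. The equivalence relies on balance; with missing features or unequal numbers of runs, a general mixed-model fit is needed instead of this shortcut.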