Abstract
Assembling peptides identified from LC-MS/MS spectra into a list of proteins is a critical step in analyzing shotgun proteomics data. As one peptide sequence can be mapped to multiple proteins in a database, naïve protein assembly can substantially overstate the number of proteins found in samples. We model the peptide-protein relationships in a bipartite graph and use efficient graph algorithms to identify protein clusters with shared peptides and to derive the minimal list of proteins. We test the effects of this parsimony analysis approach using MS/MS data sets generated from a defined human protein mixture, a yeast whole cell extract, and a human serum proteome after MARS column depletion. The results demonstrate that the bipartite parsimony technique not only simplifies protein lists but also improves the accuracy of protein identification. We use bipartite graphs for the visualization of the protein assembly results to render the parsimony analysis process transparent to users. Our approach also groups functionally related proteins together and improves the comprehensibility of the results. We have implemented the tool in the IDPicker package. The source code and binaries for this protein assembly pipeline are available under the Mozilla Public License at the following URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
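To make the bipartite formulation concrete, the following is a minimal Python sketch, not the IDPicker implementation. It assumes a hypothetical dictionary `protein_to_peptides` that maps each protein to its set of identified peptides. It first finds clusters of proteins linked by shared peptides (connected components of the bipartite graph), then picks a small covering set of proteins in each cluster. The covering step uses a greedy set-cover heuristic as a stand-in; this is an assumption of the sketch, and a greedy choice approximates the minimal list rather than guaranteeing it. The function names and the toy data are illustrative only.

```python
from collections import defaultdict


def find_clusters(protein_to_peptides):
    """Return connected components of the bipartite graph as sets of proteins."""
    # Invert the mapping so that each peptide knows which proteins contain it.
    peptide_to_proteins = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            peptide_to_proteins[peptide].add(protein)

    seen = set()
    components = []
    for start in protein_to_peptides:
        if start in seen:
            continue
        seen.add(start)
        component = set()
        stack = [start]
        # Depth-first walk: protein -> its peptides -> other proteins sharing them.
        while stack:
            protein = stack.pop()
            component.add(protein)
            for peptide in protein_to_peptides[protein]:
                for neighbor in peptide_to_proteins[peptide]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        components.append(component)
    return components


def greedy_cover(protein_to_peptides, cluster):
    """Greedily choose proteins until every peptide in the cluster is explained."""
    uncovered = set()
    for protein in cluster:
        uncovered |= protein_to_peptides[protein]

    chosen = []
    while uncovered:
        # Take the protein explaining the most uncovered peptides;
        # sorting first makes ties resolve alphabetically.
        best = max(sorted(cluster),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical toy data: P2 is fully explained by P1's peptides.
protein_to_peptides = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"c", "d"},
    "P4": {"e"},
}

clusters = find_clusters(protein_to_peptides)
parsimonious = sorted(
    protein
    for cluster in clusters
    for protein in greedy_cover(protein_to_peptides, cluster)
)
```

On this toy input the naïve list contains four proteins, while the sketch reports three (`P1`, `P3`, `P4`): `P2` is dropped because every one of its peptides is already explained by `P1`. Working cluster by cluster keeps each covering problem small, which is one reason the connected-component step comes first.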