Abstract
A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample. Peptides that correspond to more than a single protein in the sequence database are apportioned among all corresponding proteins, and a minimal protein list sufficient to account for the observed peptide assignments is derived using the expectation-maximization algorithm. Using peptide assignments to spectra generated from a sample of 18 purified proteins, as well as complex H. influenzae and Halobacterium samples, the model is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications. This method allows filtering of large-scale proteomics data sets with predictable sensitivity and false positive identification error rates. Fast, consistent, and transparent, it provides a standard for publishing large-scale protein identification data sets in the literature and for comparing the results obtained from different experiments.
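The apportioning of shared peptides described above can be illustrated with a small sketch. It is not the implementation behind the results reported here. It assumes, for illustration only, that a protein's probability is the chance that at least one of its weighted peptide assignments is correct, and that each shared peptide is re-divided among its proteins in proportion to their current probabilities. The function name `protein_probabilities`, its parameters, and the toy data are hypothetical.

```python
from math import prod


def protein_probabilities(peptide_probs, peptide_to_proteins, n_iter=50):
    """Iteratively apportion shared peptides among candidate proteins.

    peptide_probs: dict mapping peptide -> probability its assignment is correct
    peptide_to_proteins: dict mapping peptide -> list of proteins containing it
    Returns a dict mapping protein -> estimated probability of presence.
    """
    proteins = sorted({p for ps in peptide_to_proteins.values() for p in ps})
    # Start by sharing each peptide equally among its corresponding proteins.
    weights = {
        pep: {prot: 1.0 / len(prots) for prot in prots}
        for pep, prots in peptide_to_proteins.items()
    }
    prob = {}
    for _ in range(n_iter):
        # Assumed combination rule: a protein is present if at least one of
        # its (weighted) peptide assignments is correct.
        prob = {
            prot: 1.0 - prod(
                1.0 - weights[pep][prot] * peptide_probs[pep]
                for pep, prots in peptide_to_proteins.items()
                if prot in prots
            )
            for prot in proteins
        }
        # Re-apportion each peptide in proportion to current protein estimates.
        for pep, prots in peptide_to_proteins.items():
            total = sum(prob[prot] for prot in prots)
            if total > 0:
                weights[pep] = {prot: prob[prot] / total for prot in prots}
    return prob


# Toy data (made up): P1 has two unique peptides, P2 only shares one with P1.
toy_probs = {"pepA": 0.9, "pepB": 0.8, "pepS": 0.9}
toy_map = {"pepA": ["P1"], "pepB": ["P1"], "pepS": ["P1", "P2"]}
print(protein_probabilities(toy_probs, toy_map))
```

In this toy case the shared peptide drifts toward P1, which has independent support, so P2 ends up with a low probability. This mirrors the idea of a minimal protein list that still accounts for every observed peptide.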