Research and Application of Quality Control Methods for Peptide Identification in Proteomics
Abstract
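With the completion of human genome sequencing, the life sciences have turned to a comprehensive study of proteins, the molecules that actually carry out the functions of life, in order to understand the nature and laws of living systems as a whole. Proteomics has thus become one of the most active research areas of the post-genomic era. Advances in biological mass spectrometry have given proteomics a high-throughput, high-sensitivity, high-resolution analysis platform; mass spectrometry has become one of the supporting technologies of proteomics and has directly enabled large-scale proteome studies. Protein identification by tandem mass spectrometry combined with database searching meets the high-throughput and automation requirements of omics research and has become an important technical route for human proteome expression profiling.
     Database searching has greatly increased the efficiency of interpreting mass spectrometry data. However, because of the diversity of biological samples, the complexity of the experimental process, and the limitations of existing search algorithms, it cannot fully solve the protein identification problem, and the analysis of mass spectrometry data remains a difficult part of proteomic data processing. The problems of the database search strategy can be summarized in two points: how to ensure the completeness of the identification results, and how to ensure their correctness.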
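     This study addresses the correctness of protein identification from mass spectrometry data. It focuses on quality control of peptides identified by database searching, aiming to separate correct from incorrect identifications effectively while guaranteeing peptide confidence. Negative (incorrect) results in database searching arise mainly from two situations, ambiguous matching and random matching, and this study approaches the problem from both. It also gives particular attention to the following challenges facing quality control of mass spectrometry data:
     1. Mass spectrometry data are highly complex, and database search results are easily affected by many factors, including the instrument type, the spectrum generation parameters, the search parameters, and the size and composition of the database. Making full use of the information contained in the data helps describe the characteristics of a data set comprehensively.
     2. How to build an objective evaluation system that considers the overall confidence level of the data set and also reflects the "individuality" of each peptide, giving experimenters the probability that a single peptide or protein identification is correct.
     3. How to ensure that the models and methods developed are general and widely applicable, so that massive, complex data from multiple sources can be analyzed and integrated effectively.
     4. High-accuracy mass spectrometry data have become the trend in biological mass spectrometry; interpreting results in a way that exploits the characteristics of such data will be a direction of development for mass spectrometry informatics.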
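     To address these problems in quality control of peptides identified by database searching, this work considers the two causes of negative results at the peptide level. Based on the target-decoy (randomized) database search strategy, it studies quality control methods for data from mass spectrometers of different accuracy and for the results of SEQUEST and Mascot, the two most commonly used database search engines. The work improves the sensitivity and practicality of peptide filtering and builds a quality control analysis workflow for large-scale proteomic data, providing more reliable and more complete peptide and protein lists for subsequent biological studies.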
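     First, standard protein data sets and theoretically simulated spectra were used to obtain the basic patterns of ambiguous matches in routine database search results and their frequencies in data sets of different accuracy, and the effect of different mass error tolerance settings on ambiguous matching was examined. In addition, by constructing a sequence database containing human and non-human proteins, we made a preliminary estimate of the probability of ambiguous matches in real sample data sets. We conclude that ambiguous matching is mainly affected by the precursor ion accuracy of the data set. For standard protein data sets, searching a sequence database with low homology to the sample proteins gives a more realistic assessment of algorithm performance; for real sample data sets, the accuracy of protein assembly can be improved by merging identified peptides that cannot be distinguished rather than choosing among them.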
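     Next, for the random matching problem, we improved peptide-level quality control for high-accuracy LTQ-FT data and for SEQUEST and Mascot database search results by developing new search strategies and filtering methods.
     LTQ-FT is a mass spectrometry platform that combines high accuracy with high throughput and is widely used in qualitative and quantitative proteome analysis. However, time-dependent systematic errors of the instrument make it impossible to set a reasonable mass error range for database searching, which greatly reduces the benefit of its accuracy. We analyzed in detail the characteristics of the precursor ion mass error distribution on the LTQ-FT platform, improved the existing recalibration formula, and developed a tool for automated recalibration. We also proposed a new database search strategy, large-error search followed by small-error filtration, for specifying the search mass error and validating search results. Applications to standard protein and real sample data sets showed that the strategy significantly improves the sensitivity of peptide filtering.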
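     Based on the target-decoy database strategy and a nonparametric probability density model, we developed the Bayesian nonparametric (BNP) model, a method for filtering SEQUEST peptide identifications from shotgun proteomics tandem mass spectrometry data. A total of 28 features describing the search results and their matching information were extracted, and multiple linear regression, the expectation-maximization algorithm, and Bayes' formula were used to estimate the local false discovery rate of peptides and to give the filtering threshold. The model was applied to SEQUEST search results for three standard protein data sets and five real sample data sets (covering LCQ, LTQ, and LTQ-FT instruments) and compared with the dynamic cutoff method, PeptideProphet, and a simple nonparametric model. At a given expected false positive rate, the BNP model retained the largest number of filtered peptides, indicating good sensitivity and generality. The probability scores computed by the BNP model also retained a considerable number of high-confidence peptides that the other methods filtered out, greatly improving the utilization of mass spectrometry data.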
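     Mascot is another widely used search engine, as well known as SEQUEST, but quality control of its peptide identifications has been studied less. Filtering by the Mascot identity threshold strictly controls the false positive rate, but its low sensitivity leads to a high false negative rate and the loss of many true results. We classified and summarized existing methods for filtering and evaluating Mascot identifications and, based on the target-decoy database search strategy, improved peptide-level quality control for Mascot by applying a probability model that integrates new features. This effectively raised the sensitivity of quality control for Mascot search results, lowered the false negative rate, and increased the number of high-confidence peptide identifications.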
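     With the rapid development of Human Proteome Project research, experimental instruments and techniques keep advancing, and large amounts of heterogeneous data are being produced. To integrate experimental data from multiple sources effectively, we built an analysis workflow with a unified quality control standard for large-scale mass spectrometry data, based on the Bayesian nonparametric model. With it we completed a systematic analysis of the mouse liver organelle expression profile data set of the Chinese Human Liver Proteome Project and improved on the identification results of the routine expression profiling strategy.
     In proteome research, obtaining high-confidence identifications from mass spectrometry data is of great importance for subsequent biological and clinical applications, so effective control of the false positive rate of identified peptides remains one of the primary problems of the database search strategy. This work focuses on the validation of peptide identifications from mass spectrometry data. It uses a variety of mathematical and statistical models to integrate multiple features and interpret the data in depth, develops and improves peptide filtering methods in terms of sensitivity, specificity, and generality, improves quality control of peptide identification in proteomics, and builds a quality control analysis workflow for large-scale proteome expression profile data.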
With the completion of the Human Genome Project, scientists have launched a comprehensive analysis of gene products, the proteins encoded by the genome, in order to explore the essential nature and laws of life. Proteomics has become one of the most active areas of life science research in the post-genomic era. The development of mass spectrometry has provided a high-throughput, high-sensitivity, and high-resolution analysis platform for proteomics. Tandem mass spectrometry is now one of the most powerful technologies for protein identification, and it makes global protein profiling possible.
     Tandem mass spectrometry combined with a database search strategy allows high-throughput identification of peptides and proteins in shotgun proteomics. However, it cannot solve the problem of protein identification completely, given the diversity of biological samples, the complexity of the experimental process, and the limitations of existing search algorithms. The problems of applying a database search strategy to mass spectrometry data can be summarized in two points: how to ensure the completeness of the identified results and how to ensure their correctness. Selecting all the peptide-spectrum assignments that are actually correct is therefore one of the most daunting tasks in mass spectrometry-based proteomics.
     In this study, we focus on improving the quality control procedure for peptide identification in shotgun proteomics; the primary aim is to distinguish correct from incorrect matches effectively. Negative results among peptides identified from tandem mass spectrometry data are mainly caused by ambiguous identifications and random identifications in the database search. The challenges facing quality control in mass spectrometry data analysis are as follows.
     1. The analysis of digested proteins by mass spectrometry is a complex physical and chemical process. Database search results are likely to be affected by many factors, such as sample complexity, sequence databases, experimental protocols, and types of instrumentation. Taking advantage of additional features would provide a means of improving the sensitivity of filtration methods.
     2. It is necessary to establish an objective evaluation system that not only takes into account the specific data set as a whole but also reflects the "individuality" of each identified peptide, by providing experimenters with a confidence level for each single peptide or protein identification.
     3. Robust filter methods and models that can effectively analyze data from multiple sources are urgently needed.
     4. Mass spectrometers that provide high-accuracy data are increasingly used in proteomic studies. Using accurate mass measurements in data analysis strategies is likely to become a trend in proteomics applications.
     In this work, on the basis of the target-decoy database search strategy, we conducted a comprehensive investigation of the two kinds of identifications that contribute to negative hits. The research focused on the validation of identified peptides, improving the sensitivity, specificity, and generalizability of the filter methods.
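     The target-decoy idea that underlies this work can be illustrated with a minimal sketch. It assumes each peptide-spectrum match is a (score, is_decoy) pair with higher scores being better, and it uses decoys/targets as the false discovery rate estimate, which is one common convention and not necessarily the estimator used here. The function name `score_cutoff_at_fdr` is hypothetical.

```python
from typing import List, Optional, Tuple


def score_cutoff_at_fdr(psms: List[Tuple[float, bool]],
                        max_fdr: float) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: (score, is_decoy) pairs; higher score means a better match.
    The FDR above a cutoff is estimated as decoys / targets (an assumed
    convention). Ties between scores are not treated specially.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best: Optional[float] = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Accept this cutoff if the decoy-based estimate is low enough.
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


example = [(5.0, False), (4.0, False), (3.0, True), (2.0, False), (1.0, True)]
cutoff = score_cutoff_at_fdr(example, max_fdr=0.34)
```

     Matches scoring at or above the returned cutoff would be kept. A cutoff of `None` means no score level met the requested rate.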
     First, we evaluated the patterns and frequencies of ambiguous matches in database search outputs, using standard data sets, theoretically simulated spectra, and real sample data. We also studied in depth how different mass error tolerance (MET) settings in the database search affect the occurrence of ambiguous matches. The observations indicated that the peptide MET was the main factor determining the number of ambiguous matches. Ambiguous matches are one of the effects that bias the calculated false positive rate of standard protein data sets; this can be mitigated by searching a database composed of sequences with low homology to the sample proteins. If the ambiguous matches of the same spectrum belong to different proteins, we recommend reporting all the peptides as a peptide group and choosing the protein supported by other peptide identifications.
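     As a minimal sketch of reporting indistinguishable peptides as a group, the example below merges candidate peptides for the same spectrum when they differ only by leucine versus isoleucine, two residues with identical mass. Choosing I/L as the ambiguity pattern is an assumption made for illustration; the abstract does not list the patterns that were found, and `group_ambiguous` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_ambiguous(
    matches: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Group peptides per spectrum that differ only by I versus L.

    matches: (spectrum_id, peptide_sequence) pairs.
    Returns a mapping from (spectrum_id, canonical_sequence) to the sorted
    list of peptides in that group. The canonical sequence replaces every
    I with L (illustrative ambiguity rule, an assumption).
    """
    groups = defaultdict(set)
    for spectrum_id, peptide in matches:
        key = (spectrum_id, peptide.replace("I", "L"))
        groups[key].add(peptide)
    return {key: sorted(members) for key, members in groups.items()}


grouped = group_ambiguous([
    ("s1", "PEPTIDE"),
    ("s1", "PEPTLDE"),
    ("s2", "AAAK"),
])
```

     Each group can then be carried forward as a single unit into protein assembly, without discarding any of its members.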
     Then, we presented and evaluated filter methods for the peptide validation procedure, specifically for high-accuracy mass data and for the two most commonly used search engines, SEQUEST and Mascot.
     The hybrid linear trap quadrupole Fourier-transform ion cyclotron resonance mass spectrometer (LTQ-FT), an instrument with high accuracy and resolution, is widely used in the identification and quantification of peptides and proteins. However, time-dependent errors in the system may degrade the accuracy of these instruments and make it difficult to determine the MET for database searches. We investigated the parent ion mass error distribution of the LTQ-FT mass spectrometer and applied an improved recalibration procedure to determine the statistical MET of different data sets. Based on the improved recalibration formula, we introduced a new tool, FTDR (Fourier-transform data recalibration), which provides a graphical user interface (GUI) for automatic calibration. We then presented a new strategy, LDSF (large MET database search and small MET filtration), for specifying the database search MET and validating database search results. As the name implies, a large-MET database search is conducted, and the search results are then filtered using the statistical MET estimated from high-confidence results. By applying this strategy to both a standard protein data set and a complex data set, we demonstrated that LDSF can significantly improve the sensitivity of the result validation procedure.
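     The two LDSF steps can be sketched as follows. This is an illustration under stated assumptions, not the FTDR tool or the recalibration formula from this work: mass errors are expressed in ppm, the statistical MET is taken as the mean plus or minus three standard deviations of the errors of high-confidence matches, and all function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Precursor mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def statistical_met(high_conf_errors: Sequence[float],
                    n_sd: float = 3.0) -> Tuple[float, float]:
    """Estimate a narrow error window from high-confidence matches.

    Assumption: window = mean +/- n_sd * standard deviation. Centering on
    the mean absorbs a constant systematic offset in the errors.
    """
    center = mean(high_conf_errors)
    spread = stdev(high_conf_errors)
    return center - n_sd * spread, center + n_sd * spread


def filter_by_met(psms: List[Dict], window: Tuple[float, float]) -> List[Dict]:
    """Keep matches from the wide-tolerance search whose error is in window."""
    low, high = window
    return [p for p in psms if low <= p["ppm"] <= high]


# Step 1 (not shown): search with a deliberately large tolerance.
# Step 2: estimate the window from confident hits, then filter everything.
window = statistical_met([1.0, 2.0, 3.0])
kept = filter_by_met([{"id": "a", "ppm": 2.5}, {"id": "b", "ppm": 40.0}], window)
```

     In this toy input the window is (-1.0, 5.0) ppm, so match "a" is kept and match "b", which the wide search tolerated, is filtered out.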
     A Bayesian nonparametric (BNP) model was developed to improve the validation of database search results for SEQUEST. It combines several popular techniques, including the linear discriminant function (LDF), a flexible nonparametric probability density function (PDF), and the Bayesian method. The BNP model is naturally compatible with the popular target-decoy database search strategy. We tested the BNP model on standard protein data sets and real complex-sample data sets from multiple MS platforms (LCQ, LTQ, and LTQ-FT) and compared it with the cutoff-based method, PeptideProphet, and a simple nonparametric method. The BNP model was superior in sensitivity and generalizability for all data sets searched. Some high-quality matches that had been filtered out by other methods were detected and assigned high probabilities by the BNP model. Thus, the BNP model can validate database search results effectively and extract more information from MS/MS data.
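     The combination of a linear discriminant score, a nonparametric density, and a Bayesian estimate can be sketched in a heavily simplified form. The example swaps in a plain histogram as the nonparametric density and estimates the local false discovery rate per bin as the ratio of decoy to target counts; it is not the BNP model itself. It assumes target and decoy searches of comparable size, and the weights, bin edges, and function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def discriminant(features: Sequence[float], weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Linear discriminant score: bias plus a weighted sum of features."""
    return bias + sum(w * x for w, x in zip(weights, features))


def histogram(scores: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count scores in half-open bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for s in scores:
        i = bisect_right(edges, s) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def local_fdr(target_scores: Sequence[float], decoy_scores: Sequence[float],
              edges: Sequence[float]) -> List[Optional[float]]:
    """Per-bin local FDR estimated as decoy count / target count, capped at 1.

    Decoy hits stand in for incorrect target hits (target-decoy assumption).
    Bins without target hits yield None.
    """
    t = histogram(target_scores, edges)
    d = histogram(decoy_scores, edges)
    return [min(1.0, dc / tc) if tc > 0 else None for tc, dc in zip(t, d)]


edges = [0.0, 1.0, 2.0, 3.0]
targets = [0.5, 0.6, 1.5, 1.6, 2.5, 2.6, 2.7, 2.8]
decoys = [0.4, 0.7, 1.2]
lfdr = local_fdr(targets, decoys, edges)
```

     One minus the local FDR of a match's bin gives a rough probability that the match is correct, which is the kind of per-peptide confidence value the text describes.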
     The probability-based search engine Mascot has been widely used to identify peptides and proteins in shotgun proteomic research. Most subsequent quality control methods filter out ambiguous assignments according to the ion score and threshold provided by Mascot. On the basis of the target-decoy database search strategy, we evaluated the performance of several filter methods on Mascot search results and demonstrated that filter boundaries in a two-dimensional feature space, formed by the Mascot ion score and its relative score, can improve the sensitivity of the filter process. Furthermore, using a linear combination of several features of the assigned peptides, including the Mascot score, 23 previously employed features, and three newly introduced features, we applied the Bayesian nonparametric model to Mascot search results. In control and complex data sets, it validated more correctly identified peptides than empirical score thresholds, the cutoff-based method, or the linear discriminant model.
     With the rapid development of the Human Proteome Project, experimental instruments and techniques have made great progress. However, a large amount of heterogeneous data has been generated by different laboratories using diverse analytical strategies. To integrate data from multiple sources, we built a unified quality control analysis procedure for large-scale mass spectrometry data on the basis of the Bayesian nonparametric model. Using this strategy, we reprocessed the mouse liver organelle expression data set of the Chinese Human Liver Proteome Project and greatly improved the peptide and protein identifications.
     Making use of available information that is typically ignored can benefit data analysis in proteomics. Compared with early studies, in which only a few features were used to classify mass spectrometry data, more and more features are being brought into mass spectrum data mining. Combining new features with an appropriate framework plays an important role in obtaining good results. On the basis of these ideas, we carried out several exploratory studies that applied computational and statistical methods to high-throughput MS/MS data analysis in order to improve quality control for peptide identification in shotgun proteomics.
    12. Lopez-Ferrer, D.; Martinez-Bartolome, S.; Villar, M.; Campillos, M.; Martin-Maroto, F.; Vazquez, J., Statistical model for large-scale peptide identification in databases from tandem mass spectra using SEQUEST. Anal. Chem. 2004, 76, (23), 6853-6860.
    13. Sadygov, R. G.; Liu, H.; Yates, J. R., Statistical models for protein validation using tandem mass spectral data and protein amino acid sequence databases. Anal. Chem. 2004, 76, (6), 1664~1671.
    14. Anderson, D. C.; Li, W.; Payan, D. G.; Noble, W. S., A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J. Proteome Res. 2003, 2, (2), 137–146.
    15. Razumovskaya, J.; Olman, V.; Xu, D.; Uberbacher, E. C.; VerBerkmoes, N. C.; Hettich, R. L.; Xu, Y., A computational method for assessing peptide- identification reliability in tandem mass spectrometry analysis with SEQUEST. Proteomics 2004, 4, (4), 961-969.
    16. Huttlin, E. L.; Hegeman, A. D.; Harms, A. C.; Sussman, M. R., Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reverse and forward peptide sequence database strategy. J. Proteome Res. 2007, 6, (1), 392-398.
    17. Rudnick, P. A.; Wang, Y.; Evans, E.; Lee, C. S.; Balgley, B. M., Large-scale analysis of MASCOT results using a Mass Accuracy–based THreshold (MATH) effectively improves data interpretation. J. Proteome Res. 2005, 4, (4), 1353-1360.
    18. Higdon, R.; Hogan, J. M.; Belle, G. V.; Kolker, E., Randomized sequence databases for tandem mass spectrometry peptide and protein identification. OMICS 2002, 9, (4), 364-379.
    19. Purvine, S.; Picone, A. F.; Kolker, E., Standard mixtures for proteome studies. OMICS 2004, 8, 79-92.
    20. Klimek, J.; Eddes, J. S.; Hohmann, L.; Jackson, J.; Peterson, A.; Letarte, S.; Gafken, P. R.; Katz, J. E.; Mallick, P.; Lee, H., The Standard Protein Mix Database: A Diverse Data Set To Assist in the Production of Improved Peptide and Protein Identification Software Tools. J. Proteome Res. 2008, 7, (1), 96-103.
    21. Resing, K. A.; Meyer-Arendt, K.; Mendoza, A. M.; Aveline-Wolf, L. D.; Jonscher, K. R.; Pierce, K. G.; Old, W. M.; Cheung, H. T.; Russell, S.; Wattawa, J. L.; Goehle, G. R.; Knight, R. D.; Ahn, N. G., Improving reproducibility and sensitivity in identifying human proteins by shotgun proteomics. Anal Chem 2004, 76, (13), 3556-3568.
    22. Piening, B. D.; Wang, P.; Bangur, C. S.; Whiteaker, J.; Zhang, H.; Feng, L. C.; Keane, J. F.; Eng, J. K.; Tang, H.; Prakash, A., Quality control metrics for LC-MS feature detection tools demonstrated on Saccharomyces cerevisiae proteomic profiles. J. Proteome Res. 2006, 5, (7), 1527-34.
    23. Chen, M.; Ying, W.; Song, Y.; Liu, X.; Yang, B.; Wu, S.; Jiang, Y.; Cai, Y.; He, F.; Qian, X., Analysis of human liver proteome using replicate shotgun strategy. Proteomics 2007, 7, (14), 2479-88.
    24. Zhang, J.; Li, J.; Liu, X.; Xie, H.; Zhu, Y.; He, F., A nonparametric model for qualitycontrol of database search results in shotgun proteomics. BMC Bioinformatics 2008, 9, (29).
    25. Zhang, J.; Li, J.; Liu, X.; Xie, H.; Zhu, Y.; He, F., A new strategy to filter out false positive identifications of peptides in SEQUEST database search results. Proteomics 2007, 7, (22), 4036-4044.
    26. Kaliszan, R.; Baczek, T.; Cimochowska, A.; Juszczyk, P.; Wi?niewska, K.; Grzonka, Z., Prediction of high-performance liquid chromatography retention of peptides with the use of quantitative structure-retention relationships. Proteomics 2005, 5, (2), 409-415.
    27. Baczek, T.; Bucinski, A.; Ivanov, A. R.; Kaliszan, R., Artificial Neural Network Analysis for Evaluation of Peptide MS/MS Spectra in Proteomics. Anal Chem 2004, 76, (6), 1726-1732
    28. Hogan, J. M.; Higdon, R.; Kolker, N.; Kolker, E., Charge state estimation for tandem mass spectrometry proteomics. OMICS 2005, 9, (3), 233-250.
    29. Huang, Y.; Triscari, J. M.; Tseng, G. C.; Pasa-Tolic, L.; Lipton, M. S.; Smith, R. D.; Wysocki, V. H., Statistical characterization of the charge state and residue dependence of low-energy CID peptide dissociation patterns. Anal Chem 2005, 77, (18), 5800-5813.
    30. Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates III, J. R., Direct analysis of protein complexes using mass spectrometry. Nat Biotechnol 1999, 17, (7), 676–682.
    31. Kristensen, D. B.; Br?nd, J. C.; Nielsen, P. A.; Andersen, J. R.; S?rensen, O. T.; J?rgensen, V.; Budin, K.; Matthiesen, J.; Ven?, P.; Jespersen, H. M.; Ahrens, C. H.; Schandorff, S.; Ruhoff, P. T.; Wisniewski, J. R.; Bennett, K. L.; Podtelejnikov, A. V., Experimental Peptide Identification Repository (EPIR): an integrated peptide-centric platform for validation and mining of tandem mass spectrometry data. Mol Cell Proteomics 2004, 3, (10), 1023-1038.
    32. Fridman, T.; Razumovskaya, J.; Verberkmoes, N.; Hurst, G.; Protopopescu, V.; Xu, Y., The probability distribution for a random match between an experimental-theoretical spectral pair in tandem mass spectrometry. J Bioinform Comput Biol 2005 3, (2), 455-476.
    33. Feny?, D.; Beavis, R. C., A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Anal Chem 2003, 75, (4), 768-774.
    34. Ulintz, P. J.; Zhu, J.; Qin, Z. S.; Andrews, P. C., Improved classification of mass spectrometry database search results using newer machine learning approaches. Mol Cell Proteomics 2006, 5, (3), 497-509.
    35. Zhang, Z., Prediction of Low-Energy Collision-Induced Dissociation Spectra of Peptides. Anal Chem 2004, 76, (14), 3908-3922.
    36. Matthiesen, R.; Bunkenborg, J.; Stensballe, A.; Jensen, O. N.; Welinder, K. G.; Bauw, G., Database-independent, database-dependent, and extended interpretation of peptide mass spectra in VEMS V2.0. Proteomics 2004, 4, (9), 2583-2589.
    37. Choi, H.; Nesvizhskii, A. I., Semisupervised model-based validation of Peptide identifications in mass spectrometry-based proteomics. J. Proteome Res. 2008, 7, (1), 254-65.
    38. Jiyang, Z.; Jianqi, L.; Xin, L.; Hongwei, X.; Yunping, Z.; Fuchu, H., A nonparametric model for quality control of database search results in shotgun proteomics. BMC Bioinformatics 2008, 9, (29).
    39. Zubarev, R.; Mann, M., On the Proper Use of Mass Accuracy in Proteomics. Mol Cell Proteomics 2007, 6, (3), 377-381.
    40. Brosch, M.; Swamy, S.; Hubbard, T.; Choudhary, J., Comparison of mascot and X!tandem performance for low and high accuracy mass spectrometry and the development of an adjusted mascot threshold. Mol Cell Proteomics 2008, 7, (5), 962-970.
    41. Sun, W.; Li, F.; Wang, J.; Zheng, D.; Gao, Y., AMASS: software for automatically validating the quality of MS/MS spectrum from SEQUEST results. Mol Cell Proteomics 2004 3, (12), 1194-1199.
    42. Chen, Y.; Kwon, S. W.; Kim, S. C.; Zhao, Y., Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra. J Proteome Res 2005, 4, (3), 998-1005.
    43. Kapp, E. A.; Schutz, F.; Connolly, L. M.; Chakel, J. A.; Meza, J. E.; Miller, C. A.; Fenyo, D.; Eng, J. K.; Adkins, J. N.; Omenn, G. S., An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis. Proteomics 2005, 5, (13), 3475–3490.
    44. Yang, X.; Dondeti, V.; Dezube, R.; Maynard, D. M.; Geer, L. Y.; Epstein, J.; Chen, X.; Markey, S. P.; Kowalak, J. A., DBParser: web-based software for shotgun proteomic data analyses. J Proteome Res. 2004, 3, (5), 1002-1008.
    45. Choi, H.; Ghosh, D.; Nesvizhskii, A. I., Statistical validation of Peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling. J. Proteome Res. 2008, 7, (01), 286–292.
    46. Nesvizhskii, A.; Roos, F. F.; Grossmann, J.; Vogelzang, M.; Eddes, J. S.; Gruissem, W.; Baginsky, S.; Aebersold, R., Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol Cell Proteomics 2006, 5, (4), 652-670.
    47. Put, R.; Daszykowski, M.; Baczek, T.; Vander Heyden, Y., Retention prediction of peptides based on uninformative variable elimination by partial least squares. J Proteome Res 2006, 5, (7), 1618-1625.
    1. Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S., Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20, (18), 3551~3567.
    2. Eng, J. K.; McCormack, A. L.; Yates, J. R., An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, (11), 976-989.
    3. Tabb, D. L.; McDonald, W. H.; Yates Iii, J. R., DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 2002, 1, (1), 21–26.
    4. von Haller, P. D.; Yi, E.; Donohoe, S.; Vaughn, K.; Keller, A.; Nesvizhskii, A. I.; Eng, J.; Li, X.; Goodlett, D. R.; Aebersold, R., The application of new software tools to quantitative protein profiling via ICAT and tandem mass spectrometry: II. Evaluation of tandem mass spectrometry methodologies for large-scale protein analysis, and the application of statistical tools for data analysis and interpretation. Mol. Cell. Proteomics 2003, 2, (7), 425-442.
    5. Choi, H.; Ghosh, D.; Nesvizhskii, A. I., Statistical validation of Peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling. J. Proteome Res. 2008, 7, 286–292.
    6. Elias, J. E.; Gygi, S. P., Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4, 207-214.
    7. Elias, J. E.; Haas, W.; Faherty, B. K.; Gygi, S. P., Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat. Methods 2005, 2, (9), 667-675.
    8. Brosch, M.; Swamy, S.; Hubbard, T.; Choudhary, J., Comparison of mascot and X! tandem performance for low and high accuracy mass spectrometry and the development of an adjusted mascot threshold. Mol. Cell. Proteomics 2008, 7, (5), 962-970.
    9. Rudnick, P. A.; Wang, Y.; Evans, E.; Lee, C. S.; Balgley, B. M., Large-scale analysis of MASCOT results using a Mass Accuracy–based THreshold (MATH) effectively improves data interpretation. J. Proteome Res. 2005, 4, (4), 1353-1360.
    10. Balgley, B. M.; Laudeman, T.; Yang, L.; Song, T.; Lee, C. S., Comparative Evaluation of Tandem MS Search Algorithms Using a Target-Decoy Search Strategy. Mol. Cell. Proteomics 2007, 6, (9), 1599-1608.
    11. Alves, G.; Ogurtsov, A. Y.; Wu, W. W.; Wang, G.; Shen, R. F.; Yu, Y. K., Calibrating E-values for MS2 database search methods. Biol. Direct 2007, 2, (26).
    12. Moore, R. E.; Young, M. K.; Lee, T. D., Qscore: an algorithm for evaluating SEQUEST database search results. J. Am. Soc. Mass Spectrom. 2002, 13, (4), 378-386.
    13. Kall, L.; Storey, J. D.; MacCoss, M. J.; Noble, W. S., Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J. Proteome Res. 2008, 7, (1), 29-34.
    14. Haas, W.; Faherty, B. K.; Gerber, S. A.; Elias, J. E.; Beausoleil, S. A.; Bakalarski, C. E.; Li, X.; Villen, J.; Gygi, S. P., Optimization and use of peptide mass measurement accuracy in shotgun proteomics. Mol Cell Proteomics 2006, 5, (7), 1326-1337.
    15. Zhang, J.; Ma, J.; Dou, L.; Wu, S.; Qian, X.; Xie, H.; Zhu, Y.; He, F., Mass measurement errors of Fourier-transform mass spectrometry (FTMS): distribution, recalibration, and application. J. Proteome Res. 2009.
    16. Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R., Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74, (20), 5383-5392.
    17. Kall, L.; Canterbury, J. D.; Weston, J.; Noble, W. S.; MacCoss, M. J., Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods 2007, 4, (11), 923-926.
    18. Kall, L.; Storey, J. D.; Noble, W. S., Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. Bioinformatics 2008, 24, (16), i42-i48.
    19. Choi, H.; Nesvizhskii, A. I., Semisupervised model-based validation of Peptide identifications in mass spectrometry-based proteomics. J. Proteome Res. 2008, 7, (1), 254-265.
    20. Brosch, M.; Yu, L.; Hubbard, T.; Choudhary, J., Accurate and Sensitive Peptide Identification with Mascot Percolator. J. Proteome Res. 2009, 8, (6), 3176-3181.
    21. Keller A, P. S., Nesvizhskii AI, Stolyar S, Goodlett DR, Kolker E, Experimental protein mixture for validating tandem mass spectral analysis. OMICS 2002, 6, (2), 207-212.
    22. Klimek, J.; Eddes, J. S.; Hohmann, L.; Jackson, J.; Peterson, A.; Letarte, S.; Gafken, P. R.; Katz, J. E.; Mallick, P.; Lee, H., The Standard Protein Mix Database: A Diverse Data Set To Assist in the Production of Improved Peptide and Protein Identification Software Tools. J. Proteome Res. 2008, 7, (1), 96-103.
    23. Piening, B. D.; Wang, P.; Bangur, C. S.; Whiteaker, J.; Zhang, H.; Feng, L. C.; Keane, J. F.; Eng, J. K.; Tang, H.; Prakash, A., Quality control metrics for LC-MS feature detection tools demonstrated on Saccharomyces cerevisiae proteomic profiles. J. Proteome Res. 2006, 5, (7), 1527-34.
    24. Zhang, J.; Li, J.; Liu, X.; Xie, H.; Zhu, Y.; He, F., A nonparametric model for quality control of database search results in shotgun proteomics. BMC Bioinformatics 2008, 9, (29).
    25. Zhang, J.; Li, J.; Liu, X.; Xie, H.; Zhu, Y.; He, F., A new strategy to filter out false positive identifications of peptides in SEQUEST database search results. Proteomics 2007, 7, (22), 4036-4044.
    26. Koenig, T.; Menze, B. H.; Kirchner, M.; Monigatti, F.; Parker, K. C.; Patterson, T.; Steen, J. J.; Hamprecht, F. A.; Steen, H., Robust Prediction of the MASCOT Score for an Improved Quality Assessment in Mass Spectrometric Proteomics. J. Proteome Res. 2008, 7, (9), 3708-3717.
    27. Zhang, J.; Ma, J.; Dou, L.; Wu, S.; Qian, X.; Xie, H.; Zhu, Y.; He, F., Bayesian nonparametric model for the validation of peptide identification in shotgun proteomics. Mol. Cell. Proteomics 2008, 8, (3), 547-557.
    28. Mallick, P.; Schirle, M.; Chen, S. S.; Flory, M. R.; Lee, H.; Martin, D.; Ranish, J.; Raught, B.; Schmitt, R.; Werner, T., Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 2007, 25, (1), 125-131.
    29. Tang, W. H.; Shilov, I. V.; Seymour, S. L., Nonlinear Fitting Method for Determining Local False Discovery Rates from Decoy Database Searches. J. Proteome Res. 2008, 7, (9), 3661-3667.
    30. Higdon, R.; Hogan, J. M.; Van Belle, G.; Kolker, E., Randomized Sequence Databases for Tandem Mass Spectrometry Peptide and Protein Identification. OMICS 2002, 9, (4).
    31. Anderson, D. C.; Li, W.; Payan, D. G.; Noble, W. S., A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J. Proteome Res. 2003, 2, (2), 137–146.
    32. Ulintz, P. J.; Zhu, J.; Qin, Z. S.; Andrews, P. C., Improved classification of mass spectrometry database search results using newer machine learning approaches. Mol. Cell. Proteomics 2006, 5, (3), 497-509.
    1. Hanash, S., Building a foundation for the human proteome: the role of the Human Proteome Organization. J. Proteome Res. 2004, 3, (2), 197-199.
    2. Omenn, G. S.; States, D. J.; Adamski, M.; Blackwell, T. W.; Menon, R.; Hermjakob, H.; Apweiler, R.; Haab, B. B.; Simpson, R. J., Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 2005, 5, (13), 3226-3245.
    3. Ying, W.; Jiang, Y.; Guo, L.; Hao, Y.; Zhang, Y.; Wu, S.; Zhong, F.; Wang, J.; Shi, R.; Li, D., A dataset of human fetal liver proteome identified by subcellular fractionation and multiple protein separation and identification technology. Mol. Cell. Proteomics 2006, 5, (9), 1703-1707.
    4. Hamacher, M.; Apweiler, R.; Arnold, G.; Becker, A.; Blüggel, M.; Carrette, O.; Colvis, C.; Dunn, M. J.; Fr hlich, T.; Fountoulakis, M., HUPO Brain Proteome Project: summary of thepilot phase and introduction of a comprehensive data reprocessing strategy. Proteomics 2006, 6, (18), 4890-4898.
    5. Consortium, C. H. L. P. P., First Insight into the Human Liver Proteome from PROTEOMESKY-LIVERHu 1.0, a Publicly Available Database. J. Proteome Res. 2010, 9, (1), 79-94.
    6. Gao, X.; Zhang, X. L.; Zheng, J. J.; He, F. C., Proteomics in China: Ready for prime time. SCIENCE CHINA Life Sciences 2010, 53, (1), 22-33.
    7.高雪;郑俊杰;贺福初,我国蛋白质组学研究现状及展望.生命科学2007, 19, (3), 257-263.
    8. Prince, J. T.; Carlson, M. W.; Wang, R.; Lu, P.; Marcotte, E. M., The need for a public proteomics repository. Nat. Biotechnol. 2004, 22, (4), 471-472.
    9. Jones, P.; Coté, R. G.; Martens, L.; Quinn, A. F.; Taylor, C. F.; Derache, W.; Hermjakob, H.; Apweiler, R., PRIDE: a public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Res. 2006, 34, (Database Issue), D659-D663.
    10. Desiere, F.; Deutsch, E. W.; King, N. L.; Nesvizhskii, A. I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S. N.; Aebersold, R., The peptideatlas project. Nucleic Acids Res. 2006, 34, (Database Issue), D655-D658.
    11. He, F.; Liu, S., CNHUPO Pioneer and Vigorous Roles for Proteomics Investigation in China. Mol. Cell. Proteomics 2008, 7, (6), 1186-1187.
    12. He, F.; Chung, M. C. M.; Jordan, T. W., Chinese Human Liver Proteome Project: A Pathfinder of HUPO Human Liver Proteome Project. J. Proteome Res. 2010, 9, (1), 1-2.
    13. Sun, A.; Jiang, Y.; Wang, X.; Liu, Q.; Zhong, F.; He, Q.; Guan, W.; Li, H.; Sun, Y.; Shi, L., Liverbase: A Comprehensive View of Human Liver Biology. J. Proteome Res. 2009, 9, (1), 50-58.
    14. Adamski, M.; Blackwell, T.; Menon, R.; Martens, L.; Hermjakob, H.; Taylor, C.; Omenn, G.; States, D., Data management and preliminary data analysis in the pilot phase of the HUPO Plasma Proteome Project. Proteomics 2005, 5, (13), 3246-3261.
    15. Balgley, B. M.; Laudeman, T.; Yang, L.; Song, T.; Lee, C. S., Comparative evaluation of tandem MS search algorithms using a target-decoy search strategy. Mol. Cell. Proteomics 2007, 6, (9), 1599-1608.
    16. Elias, J. E.; Haas, W.; Faherty, B. K.; Gygi, S. P., Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat. Methods 2005, 2, (9), 667-675.
    17. Kapp, E. A.; Schutz, F.; Connolly, L. M.; Chakel, J. A.; Meza, J. E.; Miller, C. A.; Fenyo, D.; Eng, J. K.; Adkins, J. N.; Omenn, G. S., An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis. Proteomics 2005, 5, (13), 3475–3490.
    18. Alves, G.; Wu, W. W.; Wang, G.; Shen, R. F.; Yu, Y. K., Enhancing peptide identification confidence by combining search methods. J. Proteome Res. 2008, 7, (8), 3102-3113.
    19. Jones, A. R.; Siepen, J. A.; Hubbard, S. J.; Paton, N. W., Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines. Proteomics 2009, 9, (5), 1220-1229.
