Research on Quality Control Methods for Database Search Results of Tandem Mass Spectrometry Data in Proteomics
Abstract
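Proteomics seeks to study proteins, the functional molecules of life, systematically and as a whole. Because the dynamic range of protein expression abundance in a biological system exceeds six orders of magnitude and the physicochemical properties of proteins differ widely, proteome research requires high-throughput, high-sensitivity analytical instruments. Biological mass spectrometry has these characteristics and has therefore become one of the supporting technologies of proteome research. Owing to the complexity of the samples and of the experimental principles, mass spectrometry data carry complex noise and are strongly affected by random factors in the experiment, so mass spectrometry data analysis has long been a difficult part of proteomic data processing. Database searching is currently the main method of mass spectrometry data analysis. Its basic idea is to compare each experimental spectrum with the theoretical spectra of the digested peptides in a database and, according to a scoring algorithm, to find the peptide or protein in the database that best matches the experimental spectrum. This match, together with the scores that the search software provides to measure match quality, constitutes the basic database search result (also called a peptide identification). A search result is thus the best match within a candidate set, but it is not necessarily correct. In addition, because of the heavy computational cost, automated search software generally interprets spectra rather coarsely and lacks effective methods for assessing the confidence of its results, so the data quality control problem is acute. At present, quality control of mass spectrometry data faces the following difficulties: (1) Many proteome studies need to integrate results from multiple sources, multiple mass spectrometry platforms and multiple data processing programs, which calls for a unified system for evaluating data confidence. (2) Because of the complexity of the experimental principles, it is difficult to derive a probability model of peptide-spectrum matching from theory. Many models used in data quality control are obtained from data by observation, statistics, fitting and learning; the modeling depends on specific datasets, and the generalizability of the models requires extensive validation on data. (3) One manifestation of the complexity of mass spectrometry data is that the statistical characteristics of spectra vary with the experimental conditions, environmental factors and the sample analyzed, which makes it hard to build algorithmic models with reasonable generalizability from data. (4) Mass spectrometry experiments involve a great variety of physical and chemical mechanisms, so the data contain many "subclasses", and it is difficult to build a unified, simple model for evaluating search results. Many different parameters are currently used to evaluate the quality of search results, each measuring it from a different aspect. Multivariate information fusion and combined decision-making are problems that data quality control research must face. (5) High-throughput experimental techniques produce very large amounts of data, which raises considerable engineering and computational problems for data processing.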
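     To address these difficulties in the quality control of database search results for tandem mass spectrometry data in proteomics, and guided by the principle of meeting urgent engineering needs, this paper applies statistical analysis to study the problem from several aspects: optimization of database search parameters; feature extraction, optimization and selection; and validation of search results based on randomized database searching. The aim is to improve the sensitivity and resolving power of data quality control methods, to address practical engineering issues such as model generalizability and universality, and to provide technical support and analysis results for the data analysis of the Human Liver Proteome Project (HLPP). The main work of this paper includes: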
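     (1) Optimization of database search parameters. Database searching is the foundation for studying the quality control of search results for tandem mass spectrometry data. Some search parameters must be specified by the user, and some of them determine the set of candidate peptides for a spectrum and thus strongly affect the search results, for example the parent ion mass error tolerance and the enzyme cleavage parameters. These parameters are determined by the instrument characteristics and the physicochemical principles of the experiment, and they also depend on the operating state of the instrument, the experimental design and the complexity of the sample, so optimized search parameters must be chosen carefully for each dataset. At present, many search parameters in proteomic data analysis are set to empirical values or to values recommended by the instrument manufacturer, and strategies and methods for determining search parameters from the user's own dataset are lacking. In practical data analysis, a trial search followed by statistical analysis of the results makes it possible to optimize the search parameters in a targeted way or to give a method for determining them. In addition, many standard datasets with rigorous experimental designs have been published, and these datasets, combined with mathematical statistics, can also be used to analyze and optimize search parameters. Taking datasets of standard proteins (control proteins) as the object of analysis, and using repeated database searches with varied parameters together with mathematical statistics, this paper analyzes the influence of the parent ion mass error tolerance, the fragment ion m/z error tolerance, the enzyme cleavage method and other parameters on search results, and gives methods for determining these parameters or recommended values. In this work, the paper proposes methods for estimating the parent ion mass error tolerance and the fragment ion m/z error tolerance from noisy data; improves the parent ion mass calibration formula for high-accuracy Fourier-transform mass spectrometers; finds how the m/z error of fragment ions varies with signal intensity and, on that basis, proposes an empirical formula for determining the error tolerance from relative signal intensity; analyzes the influence of the fragment ion m/z error tolerance on search scores and gives a method for determining it; and analyzes the influence of the number of missed cleavage sites and of the number of enzymatic termini on search results, providing a reference for specifying these two parameters. The paper also proposes a data processing strategy that widens the search error tolerance, filters the search results, determines a statistically meaningful parent ion error tolerance by distribution fitting, and then filters all results with it. The analysis shows that this strategy effectively improves the classification power of the parameters used by the search software.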
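     (2) Feature extraction for the quality control of search results. Quality control of search results is a typical pattern classification problem, and feature extraction and selection are the groundwork of pattern classification. This paper systematically summarizes the parameters commonly used in the quality control of search results and analyzes them in three groups: the search scores provided by the widely used search software SEQUEST, basic parameters of the peptide and the spectrum, and empirical parameters proposed in the literature. Issues related to feature computation, such as the generation of theoretical spectra and the measurement of a feature's classification power, are also analyzed in some depth. In this part of the work, the computation of some features is optimized through a review of the literature, an understanding of the background of mass spectrometry experiments, and data "experiments" on standard datasets. For some other practical problems in feature computation, for example the application of models that predict peptide chromatographic retention time, concrete solutions are proposed. The relationships among features are analyzed using cluster analysis and heuristic knowledge. On this basis, suggested rules for feature selection are given according to the characteristics of the different classification methods used in this paper.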
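     (3) Validation of search results based on randomized database searching. Validation of peptide identifications based on randomized database searching is now widely used in experimental proteome research. This approach provides a unified quality control framework for data from different samples, search software, mass spectrometry platforms and experimental conditions. However, several practical problems of the approach have not been solved satisfactorily, and studies evaluating its performance are lacking. In this part of the work, the paper first proposes a method for constructing a randomized database. Actual searches show that this method avoids the duplicate-peptide problem well, and that the resulting score distribution models the distribution of random-match scores in the normal database reasonably well. On this basis, four classification decision methods for search result validation, ranging from simple to complex, are studied: a linear discriminant function method (LDF method), a method based on multivariate nonparametric probability density fitting and a method based on a Bayesian nonparametric model are proposed, and the method based on fitting the marginal distributions of ln(Xcorr) and (ΔCn)^(1/2) is improved. The common aim of these studies is to solve the problems of discriminant function selection and feature fusion in methods based on randomized database searching, and so to improve the sensitivity of search result filtering. Among them, the proposed linear discriminant function method performs well, is simple and is readily accepted by experimentalists; it has already been applied in the data analysis of the Chinese Human Liver Proteome Project. The marginal distribution fitting method based on ln(Xcorr) and (ΔCn)^(1/2) gives results essentially consistent with those of the linear discriminant function method, with very similar decision boundaries. The methods based on multivariate nonparametric probability density fitting and on the Bayesian nonparametric model both use multiple features, and their sensitivity is greatly improved. Validation on experimental data from standard samples and real samples shows that the four methods proposed or improved in this paper are more sensitive than existing search result validation methods, and that they yield reasonably accurate false positive rate estimates on the standard sample datasets. In addition, a comparison with PeptideProphet shows that the methods based on randomized databases achieve good results on different datasets and that the models generalize well.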
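     In summary, addressing the large data volume, variable feature distributions and complex noise that characterize mass spectrometry data quality control, this paper reveals, through extensive statistical analysis of data, a series of problems and difficulties in the quality control of tandem mass spectrometry data. On this basis, by studying each stage of tandem mass spectrometry data processing, including the optimization of search parameters, feature extraction and selection for search result validation, and search result validation methods based on multivariate feature fusion and nonparametric probability density estimation, it largely overcomes the difficulties of quality control of tandem mass spectrometry search results and improves the sensitivity and robustness of data quality control methods. The results of this paper have been applied in HLPP data analysis.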
Proteomics aims to systematically investigate proteins, the functional molecules of life, at the global level. Because the dynamic range of protein expression in a biological system may exceed six orders of magnitude and the physical and chemical properties of proteins vary widely, proteomic research needs high-throughput, high-sensitivity experimental platforms. Biological mass spectrometry (MS) has these characteristics and has thus become a supporting technique of proteomic research. Because of the complexity of the samples and of the chemical and physical mechanisms of the MS experiment, MS data contain complex noise, and MS data processing remains an open, active and difficult problem in proteomics. Database searching is a popular method of MS data processing: it compares each experimental mass spectrum with the predicted spectra of the digested peptides in a target protein sequence database and finds the best matches, with scores that measure match quality. A database search result (also called a peptide identification) is the best match in a limited search space, which is not necessarily correct. Because of the heavy computational burden, automatic database search software interprets mass spectra only roughly and lacks effective methods to evaluate the confidence of the resulting matches. Quality control of MS data is therefore a notable problem in the following respects: (1) Integrating MS data from multiple laboratories and multiple platforms is common in proteomic research, so a universal quality control framework is needed for large-scale studies. (2) It is difficult to derive a probability model from the complex physical model of the MS experiment. 
Many models used in the quality control of peptide identifications were obtained by observation, statistical fitting or training on standard datasets, so their universality is doubtful, and validating the results they give is laborious work. (3) One reason for the complexity of MS data is that the statistical characteristics of the data may change with the experimental conditions, environmental factors and the samples analyzed, which makes it very difficult to build universal algorithms for MS data processing. (4) The various chemical and physical mechanisms involved in the MS experiment lead to many subclasses in the MS data, so it is difficult to model the database search problem with a one-size-fits-all algorithm. Hence, multiple parameters are used to validate database search results, each measuring a different aspect of the match quality between mass spectra and peptides. Quality control of peptide identifications therefore requires the integration and fusion of multi-source information and combined decision-making. (5) The huge data volume in proteomic research brings notable computing problems.
     This paper addresses these problems in database search result validation. It focuses on the optimization of some database search parameters, the extraction and selection of features for classifying correct and random peptide identifications, and algorithms and schemes for evaluating peptide identifications based on randomized database searching. The main work includes: (1) The optimization of some database search parameters. Database searching is the basis of the quality control of peptide identifications. Many parameters need to be specified by the user before a database search. Some of them restrict the candidate peptides of a mass spectrum and greatly affect the search results. These parameters depend on the characteristics of the instrument and the physical and chemical principles of the experiment, and can be affected by the working status of the instrument, the experimental protocol and the complexity of the sample. In many studies, these parameters are set to the values recommended by the instrument manufacturer or taken from the literature. Statistical conclusions about their optimal values, which should be based on the user's own experimental data, are lacking. In practice, many database search parameters can be estimated from the results of an exploratory database search. In addition, many reference datasets with rigorous experimental designs have been published, which can be used to analyze and optimize the database search parameters. In this paper, the influence on the database search results of the mass error of parent ions, the m/z error of fragment ions and the enzyme specificity was investigated using reference datasets and statistical methods. A robust method was proposed to estimate the mass error tolerance of the parent ions and the m/z error tolerance of the fragment ions from noisy data. 
An improved recalibration law was proposed for high-accuracy Fourier-transform mass spectrometry, based on the observation that the mass error increases with the retention time. The m/z error of the fragment ions was found to decrease with the signal intensity of the ions, and an empirical formula is provided to determine the m/z error tolerance from the signal intensity. The distribution of the number of missed cleavage sites in correct peptide identifications and the distribution of the number of peptide identifications with different numbers of tryptic termini were also analyzed. Based on the work in this section, we proposed a database search strategy that first enlarges the parent mass error tolerance used in the actual search and then filters the results with a statistically determined parent mass error tolerance. This strategy was applied to a control dataset, and the results showed that it could improve the discriminant power of the database search scores.
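The idea of estimating a tolerance from noisy data and then filtering with it can be illustrated with a minimal sketch. This is not the estimator of this paper, which the abstract does not spell out; the median/MAD window, the function names, the `error_ppm` field and the factor `k` are all assumptions made for the example.

```python
import statistics


def estimate_mass_tolerance(errors_ppm, k=3.0):
    """Return (center, half_width) of a robust parent mass error window in ppm.

    Assumption: the window is median +/- k * scaled MAD, a generic robust
    estimator chosen for illustration only.
    """
    center = statistics.median(errors_ppm)
    mad = statistics.median(abs(e - center) for e in errors_ppm)
    sigma = 1.4826 * mad  # MAD rescaled to the std. dev. of a normal distribution
    return center, k * sigma


def filter_by_mass_error(psms, center, half_width):
    """Keep peptide-spectrum matches whose mass error lies inside the window."""
    return [p for p in psms if abs(p["error_ppm"] - center) <= half_width]


# Hypothetical errors from a wide-tolerance exploratory search:
# most cluster near +2 ppm, two are far-off random matches.
errors = [1.8, 2.1, 2.0, 1.9, 2.2, 2.0, -45.0, 38.0, 2.1, 1.7]
center, half_width = estimate_mass_tolerance(errors)
kept = filter_by_mass_error([{"error_ppm": e} for e in errors], center, half_width)
print(center, half_width, len(kept))
```

A median-based window is used here because a handful of random matches with large errors would inflate a plain standard deviation; a fitted distribution, as the paper describes, would serve the same purpose.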
     (2) Feature extraction and selection for the quality control of database search results. The quality control of database search results is a typical pattern classification problem, and feature extraction and selection are essential to pattern classification. This paper summarizes the parameters used in the quality control of database search results, which include the database search scores, the basic characteristics of the mass spectrum and the peptide, and the empirical parameters proposed in the literature. It then discusses the generation of theoretical MS/MS spectra and the measurement of the discriminant power of these features. In this research, the discriminant power of some features was optimized based on background knowledge and exploratory data analysis. Some practical problems in applying peptide retention time to the validation of peptide identifications were also discussed and resolved. A set of features proposed in the literature was summarized and defined. Finally, based on background knowledge and cluster analysis, correlation analysis was performed on these features, and basic rules were provided for feature selection in the different database search result validation methods used in this paper.
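One common way to put a number on the discriminant power of a single feature is the area under the ROC curve (AUC). The abstract does not say which measure the paper uses, so the choice of AUC and the function name below are assumptions; the sketch only shows what such a measurement could look like.

```python
def feature_auc(correct_values, random_values):
    """Probability that a correct identification has a higher feature value
    than a random one; ties count as one half.

    1.0 means the feature separates the classes perfectly, 0.5 means it
    carries no information. Both inputs are plain lists of numbers.
    """
    wins = 0.0
    for c in correct_values:
        for r in random_values:
            if c > r:
                wins += 1.0
            elif c == r:
                wins += 0.5
    return wins / (len(correct_values) * len(random_values))


# Hypothetical feature values for correct and random matches.
print(feature_auc([3.1, 4.2, 5.0], [1.0, 2.2, 3.1]))
```

The pairwise loop is quadratic in the number of identifications; for large datasets a rank-based computation gives the same value more cheaply.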
     (3) Validation of peptide identifications based on randomized database searching. Currently, methods based on randomized database searching can provide a universal framework for the quality control of MS data from different samples, platforms, experimental conditions and database search software. However, many practical problems with these methods have not been adequately solved, and research evaluating their performance is still preliminary. This paper proposed a method for constructing a randomized database that avoids the shared peptide problem. Then, four methods were proposed or improved to validate the database search results: a linear discriminant function based method, a method based on fitting the marginal distributions of ln(Xcorr) and (ΔCn)^(1/2), a multivariate nonparametric density estimation based method and a Bayesian nonparametric model based method. These efforts aimed to provide solutions for the choice of discriminant function and for feature fusion in randomized database searching based methods, and thus to improve the sensitivity of database search result validation. The linear discriminant function based method was easy to use and has been applied in the Human Liver Proteome Project (HLPP). The marginal distribution fitting method gave almost the same results as the linear discriminant function based method. The other two methods used more features, and their sensitivity was much improved. These methods were evaluated on control datasets and real sample datasets and were shown to be more sensitive than traditional randomized database searching based methods. In addition, the false positive rate estimates were shown to be accurate enough on the control dataset. 
We also compared the randomized database searching based methods with PeptideProphet and found that they performed well on datasets from different instruments and laboratories, which indicates good generalization performance.
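The general shape of a linear discriminant score combined with a randomized-database cutoff can be sketched as follows. The weights, the function names and the error estimate (decoy matches divided by target matches above the cutoff) are assumptions for illustration; the abstract does not give the fitted discriminant function or the exact estimator.

```python
import math


def ldf_score(xcorr, delta_cn, w_xcorr=1.0, w_dcn=1.0):
    """Linear combination of ln(Xcorr) and sqrt(DeltaCn).

    The weights are hypothetical placeholders, not fitted values.
    Requires xcorr > 0 and delta_cn >= 0.
    """
    return w_xcorr * math.log(xcorr) + w_dcn * math.sqrt(delta_cn)


def cutoff_at_error_rate(target_scores, decoy_scores, max_rate=0.01):
    """Lowest score cutoff at which decoys/targets passing is <= max_rate.

    Returns None if no cutoff satisfies the bound.
    """
    labeled = [(s, False) for s in target_scores]
    labeled += [(s, True) for s in decoy_scores]
    labeled.sort(key=lambda item: item[0], reverse=True)

    n_target = n_decoy = 0
    best = None
    i = 0
    while i < len(labeled):
        score = labeled[i][0]
        # Count every match that shares this score before testing the cutoff.
        while i < len(labeled) and labeled[i][0] == score:
            if labeled[i][1]:
                n_decoy += 1
            else:
                n_target += 1
            i += 1
        if n_target > 0 and n_decoy / n_target <= max_rate:
            best = score
    return best


# Hypothetical scores from searches against the normal and randomized databases.
targets = [5.0, 4.0, 3.0, 2.0, 1.0]
decoys = [2.5, 0.5]
print(cutoff_at_error_rate(targets, decoys, max_rate=0.25))
```

Because the cutoff is derived from the decoy score distribution rather than from a model trained on one dataset, the same procedure can be applied unchanged to data from other instruments or search programs, which is the property the paragraph above emphasizes.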
     In summary, this paper revealed a series of problems in the quality control of tandem mass spectrometry data by applying statistical analysis to the huge datasets of proteomic research, which have varying statistics and inherently contain complex noise. It then provided a systematic study of the optimization of some database search parameters, the extraction and selection of features for classifying correct and random peptide identifications, and algorithms and schemes for evaluating peptide identifications based on randomized database searching. The proposed methods, which are based on multi-source feature fusion and nonparametric techniques, can largely improve the sensitivity of the validation of peptide identifications and cope with the variation among datasets. They have been applied in the HLPP.
引文
1.Kitano,H.,Foundations of Systems Biology First Edition ed.2001:The MIT Press.320.
    2.Edda Klipp,R.H.,Axel Kowald,Christoph Wierling,Hans Lehrach Systems Biology in Practice:Concepts,Implementation and Application.2005:Wiley-VCH.486.
    3.Pandey,A.and M.Mann,Proteomics to study genes and genomes.Nature,2000.405(6788):p.837-846.
    4.Aebersold,R.and M.Mann,Mass spectrometry-based proteomics.Nature,2003.422(6928):p.198-207.
    5.http://www.hupo.org/.
    6.Cottingham,K.,HUPO Plasma Proteome Project:challenges and future directions.J Proteome Res,2006.5(6):p.1298.
    7.He,F.,Human liver proteome project:plan,progress,and perspectives.Mol Cell Proteomics,2005.4(12):p.1841-8.
    8.Hamacher,M.and H.E.Meyer,HUPO Brain Proteome Project:aims and needs in proteomics.Expert Rev Proteomics,2005.2(1):p.1-3.
    9.States DJ,O.G.,Blackwell TW,Fermin D,Eng J,et al,Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study.Nat Bioteehnol,2006.24(3):p.333-338.
    10.Ji C,L.L.,Quantitative proteome analysis using differential stable isotopic labeling and microbore LC-MALDI MS and MS/MS.J Proteome Res,2005.4(3):p.734-742.
    11.Wang G,W.W.,Zeng W,Chou CL,Shen RF,Label-free protein quantification using LC-coupled ion trap or FT mass spectrometry:Reproducibility,linearity,and application with complex proteomes.J Proteome Res,2006.5(5):p.1214-1223.
    12.Rifai A,G.A.,Carr AS,Protein biomarker discovery and validation:the long and uncertain path to clinical utility.Nat Biotechnol,2006.24(8):p.971-983.
    13.Hilario M,K.A.,M(u|¨)ller M,Pellegrini C,Machine learning approaches to lung cancer prediction from mass spectra.Proteomics,2003.3(9):p.1716-1719.
    14.Villén J,B.S.,Gerber SA,Gygi SP,Large-scale phosphorylation analysis of mouse liver.Proc Natl Acad Sci U S A,2007.104(5):p.1488-93.
    15.Olsen JV,B.B.,Gnad F,Macek B,Kumar C,Mortensen P,Mann M,Global,in vivo,and site-specific phosphorylation dynamics in signaling networks.Cell,2006.127(3):p.635-48.
    16.Callister SJ,D.M.,Nicora CD,Zeng X,Tavano CL,Kaplan S,Donohue TJ,Smith RD,Lipton MS,Application of the accurate mass and time tag approach to the proteome analysis of sub-cellular fractions obtained from Rhodobacter sphaeroides 2.4.1.Aerobic and photosynthetic cell cultures.J Proteome Res,2006.5(8):p.1940-1947.
    17.Dworzanski JP,D.S.,Chen R,Jabbour RE,Snyder AP,Wick CH,Li L,Mass spectrometry-based proteomics combined with bioinformatic tools for bacterial classification.J Proteome Res,2006.5(1):p.76-87.
    18.Savidor A,D.R.,Hurtado-Gonzales O,Verberkmoes NC,Shah MB,Lamour KH,McDonald WH,Expressed peptide tags:an additional layer of data for genome annotation. J Proteome Res, 2006. 5(11): p. 3048-3058.
    
    19. Beardsley RL, S.L., Reilly JP, Peptide de novo sequencing facilitated by a dual-labeling strategy. Anal Chem, 2005. 77(19): p. 6300-6309.
    
    20. Blueggel M, C.D., Meyer HE, Bioinformatics in proteomics. Curr Pharm Biotechnol, 2004. 5(1): p. 79-88
    
    21. Sadygov, R.G., D. Cociorva, and J.R. Yates, Large-scale database searching using tandem mass spectra: Looking up the answer in the back of the book. Nat Meth,2004.1(3): p. 195-202.
    
    22. Patterson, S.D., Data analysis—the Achilles heel of proteomics Nat Biotechnol,2003 21(3): p. 221-222.
    
    23. Nesvizhskii AI, A.R., Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS. Drug Discov Today,2004. 9(4): p. 173-181.
    
    24. Chamrad D, M.H., Valid data from large-scale proteomics studies. Nat Methods,2005. 2(9): p. 647-648.
    
    25. Carr S, A.R., Baldwin M, Burlingame A, Clauser K, Nesvizhskii A; Working Group on Publication Guidelines for Peptide and Protein Identification Data, The need for guidelines in publication of peptide and protein identification data:Working Group on Publication Guidelines for Peptide and Protein Identification Data. Mol Cell Proteomics, 2004. 3(6): p. 531-533.
    
    26. Ulintz PJ, Z.J., Qin ZS, Andrews PC, Improved classification of mass spectrometry database search results using newer machine learning approaches. Mol Cell Proteomics, 2006. 5(3): p. 497-509.
    
    27. Wong JW, S.M., Cartwright HM, Cagney G, msmsEval: tandem mass spectral quality assignment for high-throughput proteomics. BMC Bioinformatics, 2007.8(51).
    
    28. Bern, M., et al., Automatic quality assessment of peptide tandem mass spectra.Bioinformatics, 2004. 20 Suppl 1: p. i49-54.
    
    29. Flikka, K., et al., Improving the reliability and throughput of mass spectrometry-based proteomics by spectrum quality filtering. Proteomics, 2006.6(7): p. 2086-94.
    
    30. Nesvizhskii, A.I., et al., Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol Cell Proteomics, 2006. 5(4): p. 652-70.
    
    31. Salmi, J., et al., Quality classification of tandem mass spectrometry data.Bioinformatics, 2006. 22(4): p. 400-6.
    
    32. Xu M, G.L., Bryant SH, Roth JS, Kowalak JA, Maynard DM, Markey SP,Assessing data quality of peptide mass spectra obtained by quadrupole ion trap mass spectrometry. J Proteome Res, .2005. 4(2): p. 5.
    
    33. Frank, A.P., P., PepNovo: De Novo Peptide Sequencing via Probabilistic Network Modeling. 2005. p. 964-973.
    
    34. Bern M, G.D., McDonald WH, Yates JR, Automatic quality assessment of Peptide tandem mass spectra. Bioinformatics, 2004 20(Suppl 1): p. 149-154.
    
    35. Xu, M.G., L. Y.Bryant, S. H.Roth, J. S.Kowalak, J. A.Maynard, D. M.Markey, S. P.,Assessing Data Quality of Peptide Mass Spectra Obtained by Quadrupole Ion Trap Mass Spectrometry. 2005. p. 300-305.
    
    36. Nesvizhskii AI, R.F., Grossmann J, Vogelzang M, Eddes JS, Gruissem W, Baginsky S,Aebersold R,Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data:toward more efficient identification of post-translational modifications,sequence polymorphisms,and novel peptides.Mol Cell Proteomics,2006.5(4):p.652-670.
    37.MA,B.,Protein identification by mass spectrometry:issues to be considered Mol Cell Proteomics,2004.3(1):p.1-9.
    38.Huttlin EL,H.A.,Harms AC,Sussman MR,Prediction of Error Associated with False-Positive Rate Determination for Peptide Identification in Large-Scale Proteomics Experiments Using a Combined Reverse and Forward Peptide Sequence Database Strategy.J.Proteome Res,2007.6(1):p.392-398.
    39.J.K.Eng,A.L.M.,and J.R.Yates,III,An approach to correlate MS/MS data to amino acid sequences in a protein database.J.American Soc.Mass Spectrom,1994.5:p.976-989.
    40.Nesvizhskii AI,A.R.,Interpretation of shotgun proteomic data:the protein inference problem.Mol Cell Proteomics,2005.4(10):p.1419-1440.
    41.Price TS,L.M.,Wu W,Austin DJ,Pizarro A,Yocum AK,Blair IA,FitzGerald GA,Grosser T,EBP,a program for protein identification using multiple tandem mass spectrometry datasets.Mol Cell Proteomics,2007.6(3):p.527-536.
    42.Elias JE,H.W.,Faherty BK,Gygi SP,Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations.Nat Methods,2005.2(9):p.667-675.
    43.Tabb,D.L.,W.H.McDonald,and J.R.Yates,DTASelect and Contrast:Tools for Assembling and Comparing Protein Identifications from Shotgun Proteomics.2002.p.21-26.
    44.Sun W,L.F.,Wang J,Zheng D,Gao Y,AMASS:software for automatically validating the quality of MS/MS spectrum from SEQUEST results.Mol Cell Proteomics,2004 3(12):p.1194-1199.
    45.Han,D.K.,et al.,Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry.Nat Biotech,2001.19(10):p.946-951.
    46.Resing KA,M.-A.K.,Mendoza AM,Aveline-Wolf LD,Jonscher KR,Pierce KG,Old WM,Cheung HT,Russell S,Wattawa JL,Goehle GR,Knight RD,Ahn NG,Improving reproducibility and sensitivity in identifying human proteins by shotgun proteomics.Anal Chem,2004.76(13):p.3556-68.
    47.Link AJ,E.J.,Schieltz DM,Carmack E,Mize GJ,Morris DR,et al,Direct analysis of protein complexes using mass spectrometry.Nat Biotechnol,1999.17(7):p.676-682.
    48.Eddes JS,K.E.,Frecklington DF,Connolly LM,Layton MJ,Moritz RL,Simpson RJ,CHOMPER:a bioinformatic tool for rapid validation of tandem mass spectrometry search results associated with high-throughput proteomic strategies.Proteomics,2002.2(9):p.1097-1103.
    49.Chen Y,K.S.,Kim SC,Zhao Y,Integrated Approach for Manual Evaluation of Peptides Identified by Searching Protein Sequence Databases with Tandem Mass Spectra.J Proteome Res,2005.4(3):p.998-1005.
    50.Qian WJ,L.T.,Monroe ME,Strittmatter EF,Jacobs JM,Kangas LJ,Petritis K,Camp DG,Smith RD,Probability-Based Evaluation of Peptide and Protein identifications from Tandem Mass Spectrometry and SEQUEST Analysis:The Human Proteome J Proteome Res,2005.4(1):p.53-62.
    51.Li F,S.W.,Gao Y,Wang J,RScore:a peptide randomcity score for evaluating tandem mass spectra.Rapid Commun Mass Spectrom,2004.18(14):p.1655-1659.
    52.Peng J,E.J.,Thoreen CC,Licklider Lj and Gygi SP,Evaluation of Multidimensional Chromatography Coupled with Tandem Mass Spectrometry (LC/LC-MS/MS) for Large-Scale Protein Analysis:The Yeast Proteome.J Proteome Res,2003.2(1):p.43-50.
    53.Kislinger T,R.K.,Radulovic D,Cox B,Rossant J,Emili A,PRISM,a generic large scale proteomic investigation strategy for mammals.Mol Cell Proteomics,2003 2(2):p.96-106.
    54.Haas W,F.B.,Gerber SA,Elias JE,Beausoleil SA,Bakalarski CE,Li X,Villen J,Gygi SP.,Optimization and use of peptide mass measurement accuracy in shotgun proteomics.Mol Cell Proteomics,2006.5(7):p.1326-1337.
    55.Washburn MP,W.D.,Yates JR,Large-scale analysis of the yeast proteome by multidimensional protein identification technology.Nat Biotechnol,2001.19(3):p.242-247.
    56.Keller A,N.A.,Kolker E,Aebersold R,Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.Anal Chem,2002.74(20):p.5383-5392.
    57.Lopez-Ferrer D,M.-B.S.,Villar M,Campillos M,et al,Statistical model for large-scale peptide identification in databases from tandem mass spectra using SEQUEST.Anal Chem,2004.76(23):p.6853-6860.
    58.Eriksson J,F.D.,A model of random mass-matching and its use for automated significance testing in mass spectrometric proteome analysis.Proteomics,2002.2(3):p.262-270.
    59.Sadygov RG,Y.J.r.,A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases.Anal Chem,2003.75(15):p.3792-3798.
    60.Sadygov RG,L.H.,Yates JR 3rd,Statistical Models for Protein Validation Using Tandem Mass Spectral Data and Protein Amino Acid Sequence Databases.Anal Chem,2004.76(6):p.1664-1671.
    61.Moore RE,Y.M.,Lee TD,Qscore:An Algorithm for Evaluating SEQUEST Database Search Results.J Am Soc Mass Spectrom,2002.13(4):p.378-386.
    62.Baczek T,B.A.,Ivanov Ar and Kaliszan R,Artificial Neural Network Analysis for Evaluation of Peptide MS/MS Spectra in Proteomics.Anal Chem,2004.76(6):p.1726-1732
    63.Jane Razumovskaya,V.O.D.X.E.C.U.N.C.V.R.L.H.Y.X.,A computational method for assessing peptide-identification reliability in tandem mass spectrometry analysis with SEQUEST.2004.p.961-969.
    64.Anderson DC,L.W.,Payan DG and Noble WS,A New Algorithm for the Evaluation of Shotgun Peptide Sequencing in Proteomics:Support Vector Machine Classification of Peptide MS/MS Spectra and SEQUEST Scores.J Proteome Res,2003.2(2):p.137-146.
    65.Higdon R,H.J.,Van Belle G,Kolker E,Randomized sequence databases for tandem mass spectrometry peptide and protein identification.OMICS,2005.9(4):p.364-379.
    66.Andersen JS,L.Y.,Leung AK,Ong SE,et al,Nucleolar proteome dynamics. Nature,2005.433(7021):p.77-83.
    67.Feny(o|¨) D,B.R.,A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes.Anal Chem,2003.75(4):p.768-774.
    68.Eriksson J,C.B.,Feny(o|¨) D,A statistical basis for testing the significance of mass spectrometric protein identification results.Anal Chem,2000.72(5):p.999-1005.
    69.Elias JE,G.S.,Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry.Nat Methods,2007.4(3):p.207-214.
    70.de Godoy LM,O.J.,de Souza GA,Li G,Mortensen P,Mann M,Status of complete proteome analysis by mass spectrometry:SILAC labeled yeast as a model system.Genome Biol,2006.7(6):p.R50.
    71.Adachi J,K.C.,Zhang Y,Olsen JV,Mann M,The human urinary proteome contains more than 1500 proteins including a large proportion of membranes proteins.Genome Biol,2006.7(9):p.R80.
    72.Everley PA,B.C.,Elias JE,Waghorne CG,Beausoleil SA,Gerber SA,Faherty BK,Zetter BR,Gygi SP,Enhanced analysis of metastatic prostate cancer using stable isotopes and high mass accuracy instrumentation.J Proteome Res,2006.5(5):p.1224-1231.
    73.Pilch B,M.M.,Large-scale and high-confidence proteomic analysis of human seminal plasma.Genome Biol,2006.7(5):p.R40.
    74.Desiere F,D.E.,Nesvizhskii AI etc al,Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry Genome Biol,2005.6(1):p.R9.
    75.Alves,G.,et al.,Calibrating E-values for MS2 database search methods.Biol Direct,2007.2(1):p.26.
    76.Alves,G.,A.Y.Ogurtsov,and Y.K.Yu,RAId DbS:Peptide Identification using Database Searches with Realistic Statistics.Biol Direct,2007.2(1):p.25.
    77.Choi,H.and A.I.Nesvizhskii,Semisupervised Model-Based Validation of Peptide Identifications in Mass Spectrometry-Based Proteomics.J Proteome Res,2007.
    78.Choi,H.,D.Ghosh,and A.I.Nesvizhskii,Statistical Validation of Peptide Identifications in Large-Scale Proteomics Using the Target-Decoy Database Search Strategy and Flexible Mixture Modeling.J Proteome Res,2007.
    79.Kall,L.,et al.,Semi-supervised learning for peptide identification from shotgun proteomics datasets.Nat Methods,2007.4(11):p.923-5.
    80.Jiang,X.,et al.,Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics.BMC Bioinformatics,2007.8:p.323.
    81.C,R.C.a.R.,TANDEM:matching proteins with mass spectra.Bioinformatics,2004.20(9):p.1466-1467.
    82.Domon B,A.R.,Challenges and opportunities in proteomic data analysis.Mol Cell Proteomics,2006.5(10):p.921-926.
    83.Shadforth I,C.D.,Bessant C,Protein and peptide identification algorithms using MS for use in high-throughput,automated pipelines.Proteornics,2005.5(16):p.4082-4095.
    84.Venable JD,Y.J.,Impact of 1on Trap Tandem Mass Spectra Variability on the Identification of Peptides.Anal Chem,2004.76(10):p.2928-2937.
    85.Domon B,A.R.,Mass spectrometry and protein analysis.Science,2006.312(5771):p.212-217.
    86.Fenn JB,M.M.,Meng CK,Wong SF,Whitehouse CM,Electrospray ionization for mass spectrometry of large biomolecules.Science,1989.246(4926):p.64-71.
    87.Hillenkamp F,K.M.,Beavis RC,Chait BT,Matrix-assisted laser desorption/ionization mass spectrometry of biopolymers.Anal Chem,1991.63(24):p.1193A-1203A.
    88.Graber A,J.P.,Khainovski N,Parker KC,Patterson DH,Martin SA,Result-driven strategies for protein identification and quantitation-a way to optimize experimental design and derive reliable results.Proteomics,2004.4(2):p.474-489.
    89.Dass,C.,Principles and Practice of Biological Mass Spectrometry.1 edition ed.2000:Wiley-Interscience.
    90.Makarov A,D.E.,Lange O,Homing S,Dynamic range of mass accuracy in LTQ Orbitrap hybrid mass spectrometer.J Am Soc Mass Spectrom,2006 17(7):p.977-982.
    91.Paizs B,S.S.,Fragmentation pathways of protonated peptides.2005.24(4):p.508-548.
    92.Wysocki VH,T.G.,Smith LL,Breci LA,Mobile and localized protons:a framework for understanding peptide dissociation.J Mass Spectrom,2000.35(12):p.1399-1406.
    93.Bridgewater JD,S.R.,Lim J,Vachet RW,The effect of histidine oxidation on the dissociation patterns of peptide ions.J Am Soc Mass Spectrom,2007.18(3):p.553-562.
    94.Said,A.S.,A modified plate model in chromatography.1978,Journal of High Resolution Chromatography.p.203-204.
    95.Gorges Guiochon,A.F.,Dean G.G.Shirazi,Fundamentals of Preparative and Nonlinear Chromatography,Second Edition 2006.
    96.林炳昌编著,色谱模型理论导引.2004:科学出版社.216.
    97.Krokhin OV,C.R.,Spicer V,Ens W,Standing KG,Beavis RC,Wilkins JA,An improved model for prediction of retention times of tryptic peptides in ion pair reversed-phase HPLC:its application to protein peptide mapping by off-line HPLC-MALDIMS.Mol Cell Proteomics,2004.3(9):p.908-919.
    98.Two-dimensional gel electrophoresis of proteins.Nat Meth,2005.2(1):p.83-84.
    99.Canelle L,P.C.,Marie A,Bousquet J,Bigeard J,Lutomski D,Kadri T,Caron M,Joubert-Caron R,Automating proteome analysis:improvements in throughput,quality and accuracy of protein identification by peptide mass fingerprinting.Rapid Commun Mass Spectrom,2004.18(23):p.2785-2794.
    100.Steen H,M.M.,The ABC's(and XYZ's) of peptide sequencing.Nat Rev Mol Cell Biol,2004.5(9):p.699-711.
    101.Perkins,D.,Pappin,DJ,Creasy,DM,Cottrell,JS,Probability-based protein identification by searching sequence databases using mass spectrometry data.Electrophoresis,1999.20(18):p.3551-3567.
    102.Macek,B.,et al.,Top-down protein sequencing and MS3 on a hybrid linear quadrupole ion trap-orbitrap mass spectrometer.Mol Cell Proteomics,2006.
    103.Takao T,G.J.,Yoshidome K,Sato K,Asada T,Kammei Y,Shimonishi Y,Automatic precursor-ion switching in a four-sector tandem mass spectrometer and its application to acquisition of the MS/MS product ions derived from a partially (18)O-labeled peptide for their facile assignments.Anal Chem,1993.65(17):p.2394-2399.
    104.http://us.expasy.org/.
    105.Rogalski JC,L.M.,Sniatynski MJ,Taylor RJ,Youhnovski N,Przybylski M,Kast J,Statistical evaluation of electrospray tandem mass spectra for optimized peptide fragmentation.J Am Soc Mass Spectrom,2005.16(4):p.505-514.
    106.Zhang X,A.J.,Adamec J,Ouzzani M,Elmagarmid AK,Data pre-processing in liquid chromatography-mass spectrometry-based proteomics.Bioinformatics,2005.21(21):p.4054-4059.
    107.Zhang N,L.X.,Ye M and Pan S,et al,ProbIDtree:an automated software program capable of identifying multiple peptides from a single collision-induced dissociation spectrum collected by a tandem mass spectrometer.Proteomics,2005.5(16):p.4096-4106.
    108.Liu H,S.R.,Yates JR,A model for random sampling and estimation of relative protein abundance in shotgun proteomics.Anal Chem,2004.76(14):p.4193-4201.
    109.Liu C,Y.B.,Song Y,Xu Y,Cai L,Peptide sequence tag-based blind identification of post-translational modifications with point process model.Bioinformatics,2006.22(14):p.e307-13.
    110.Vizcaino JA,M.L.,Hermjakob H,Julian RK,Paton NW,The PSI formal document process and its implementation on the PSI website.Proteomics,2007.[Epub ahead of print].
    111.Desiere F,D.E.,King NL,Nesvizhskii AI,Mallick P,Eng J,Chen S,Eddes J,Loevenich SN,Aebersold R,The PeptideAtlas project.Nucleic Acids Res,2006.34(Database issue):p.D655-8.
    112.Prince JT,C.M.,Wang R,Lu P,Marcotte EM,The need for a public proteomics repository.Nat Biotechnol,2004.22(4):p.471-472.
    113.Jones P,C.R.,Martens L,Quinn AF,Taylor CF,Derache W,Hermjakob H,Apweiler R,PRIDE:a public repository of protein and peptide identifications for the proteomics community.Nucleic Acids Res,2006.34(Database issue):p.D659-63.
    114.Kristensen DB,B.J.,Nielsen PA,Andersen JR,Sørensen OT,Jørgensen V,Budin K,Matthiesen J,Venø P,Jespersen HM,Ahrens CH,Schandorff S,Ruhoff PT,Wisniewski JR,Bennett KL,Podtelejnikov AV,Experimental Peptide Identification Repository(EPIR):an integrated peptide-centric platform for validation and mining of tandem mass spectrometry data.Mol Cell Proteomics,2004.3(10):p.1023-1038.
    115.McLaughlin T,S.J.,Selley J,Lynch JA,Lau KW,Yin H,Gaskell SJ,Hubbard SJ,PepSeeker:a database of proteome peptide identifications for investigating fragmentation patterns.Nucleic Acids Res,2006.34(Database issue):p.D649-54.
    116.Fälth M,S.K.,Norrman M,Svensson M,Fenyö D,Andren PE,SwePep,a database designed for endogenous peptides and mass spectrometry.Mol Cell Proteomics,2006.5(6):p.998-1005.
    117.http://www.nist.gov/srd/.
    118.www.ProteomeCommons.org.
    119.Colinge J,M.A.,Carbonell P,Appel RD,InSilicoSpectro:an open-source proteomics library.J Proteome Res,2006.5(3):p.619-624.
    120.Kohlbacher O,R.K.,Gröpl C,Lange E,Pfeifer N,Schulz-Trieglaff O,Sturm M,TOPP-the OpenMS proteomics pipeline.Bioinformatics,2007.23(2):p.e191-197.
    121.Leptos KC,S.D.,Jaffe JD,Krastins B,Church GM,MapQuant:open-source software for large-scale protein quantification.Proteomics,2006.6(6):p.1770-1782.
    122.Falkner JA,K.M.,Veine DM,Walker A,Strahler JR,Andrews PC,Validated MALDI-TOF/TOF mass spectra for protein standards.J Am Soc Mass Spectrom,2007.18(5):p.850-855.
    123.Nesvizhskii AI,K.A.,Kolker E,Aebersold R,A Statistical Model for Identifying Proteins by Tandem Mass Spectrometry.Anal.Chem,2003.75(17):p.4646-4658.
    124.http://sourceforge.net/project/showfiles.php?group_id=6928.
    125.WS,N.,Data hoarding is harming proteomics.Nat Biotechnol,2004.22(10):p.1209.
    126.Keller A,P.S.,Nesvizhskii AI,Stolyar S,Goodlett DR,Kolker E,Experimental protein mixture for validating tandem mass spectral analysis.OMICS,2002.6(2):p.207-212.
    127.IPI2.33.ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/HUMAN/ipi.fasta.v2.33.gz.
    128.IPI3.10,ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/BOVIN/ipi.BOVIN.v3.10.fasta.
    129.Purvine S,P.A.,Kolker E,Standard mixtures for proteome studies.OMICS,2004.8(1):p.79-92.
    130.Heidelberg JF,P.I.,Nelson KE,Gaidos EJ,Nelson WC,Read TD,Eisen JA,Seshadri R,Ward N,Methe B,Clayton RA,Meyer T,Tsapin A,Scott J,Beanan M,Brinkac L,Daugherty S,DeBoy RT,Dodson RJ,Durkin AS,Haft DH,Kolonay JF,Madupu R,Peterson JD,Umayam LA,White O,Wolf AM,Vamathevan J,Weidman J,Impraim M,Lee K,Berry K,Lee C,Mueller J,Khouri H,Gill J,Utterback TR,McDonald LA,Feldblyum TV,Smith HO,Venter JC,Nealson KH,Fraser CM,Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis.Nat Biotechnol,2002.20(11):p.1118-1123.
    131.Beausoleil,S.A.,et al.,A probability-based approach for high-throughput protein phosphorylation analysis and site localization.Nat Biotechnol,2006.24(10):p.1285-92.
    132.Tolmachev AV,M.M.,Jaitly N,Petyuk VA,Adkins JN,Smith RD,Mass measurement accuracy in analyses of highly complex mixtures based upon multidimensional recalibration.Anal Chem,2006.78(24):p.8374-8385.
    133.Zubarev R,M.M.,On the proper use of mass accuracy in proteomics.Mol Cell Proteomics,2007.6(3):p.377-381.
    134.Rao,C.R.,Computational Statistics,in Handbook of Statistics,C.R.Rao,Editor.1993,NL Elsevier/North-Holland.
    135.Holland,P.W.,and R.E.Welsch,Robust Regression Using Iteratively Reweighted Least-Squares.Communications in Statistics:Theory and Methods,1977.A6:p.813-827.
    136.Levenberg,K.,A Method for the Solution of Certain Problems in Least Squares.Quart.Appl.Math,1944.2:p.164-168.
    137.Moshe Havilio,Y.H.,and Zeev Smilansky,Intensity-Based Statistical Scorer for Tandem Mass Spectrometry.Anal Chem,2003.75(3):p.435-444.
    138.Rockwood AL,K.M.,Nelson GJ,Dissociation of individual isotopic peaks:predicting isotopic distributions of product ions in MSn.J Am Soc Mass Spectrom,2003.14(4):p.311-322.
    139.Duda RO,Hart PE,Stork DG,Pattern Classification.2nd ed.Wiley-Interscience,2000:p.587-595.
    140.Turner PG,T.S.,Goulermas JY,Hampton K,Simulation of high-and low-resolution mass spectra for assessment of calibration methods.Rapid Commun Mass Spectrom,2007.21(3):p.305-313.
    141.Wolski WE,L.M.,Martus P,Herwig R,Giavalisco P,Gobom J,Sickmann A,Lehrach H,Reinert K,Transformation and other factors of the peptide mass spectrometry pairwise peak-list comparison process.BMC Bioinformatics,2005.30(6):p.285.
    142.Torkkola,K.,Feature extraction by non-parametric mutual information maximization.J.Mach.Learn.Res,2003.3:p.1415-1438.
    143.Nesvizhskii AI,V.O.,Aebersold R,Analysis and validation of proteomic data generated by tandem mass spectrometry.Nat Methods,2007.4(10):p.787-97.
    144.Tabb DL,F.C.,Chambers MC,MyriMatch:highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis.J Proteome Res,2007.6(2):p.654-61.
    145.http://bioinformatics.icmb.utexas.edu/OPD/.
    146.ftp://genome-ftp.stanford.edu/pub/yeast/sequence/.
    147.http://www.peptideatlas.org/.
    148.ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/HUMAN/ipi.HUMAN.fasta.v2.31.gz.
    149.Matthiesen R,B.J.,Stensballe A,Jensen ON,Welinder KG,Bauw G,Database-independent,database-dependent,and extended interpretation of peptide mass spectra in VEMS V2.0.Proteomics,2004.4(9):p.2583-2589.
    150.Elias,J.E.,et al.,Intensity-based protein identification by machine learning from a library of tandem mass spectra.Nat Biotech,2004.22(2):p.214-219.
    151.Schütz F,K.E.,Simpson RJ,Speed TP,Deriving statistical models for predicting peptide tandem MS product ion intensities.Biochem Soc Trans,2003.31(Pt 6):p.1479-1483.
    152.Havilio M,H.Y.,Smilansky Z,Intensity-based statistical scorer for tandem mass spectrometry.Anal Chem,2003.75(3):p.435-444.
    153.Zhang,Z.,Prediction of Low-Energy Collision-Induced Dissociation Spectra of Peptides.Anal Chem,2004.76(14):p.3908-3922.
    154.Z,Z.,Prediction of low-energy collision-induced dissociation spectra of peptides with three or more charges.Anal Chem,2005.77(19):p.6364-6373.
    155.Cannon,W.R.,et al.,Evaluation of the Influence of Amino Acid Composition on the Propensity for Collision-Induced Dissociation of Model Peptides Using Molecular Dynamics Simulations.J Am Soc Mass Spectrom,2007.18(9):p.1625-1637.
    156.K,Y.,Theoretical and numerical analysis of the behavior of ions injected into a quadrupole ion trap mass spectrometer.Rapid Commun Mass Spectrom,2000.14(4):p.215-223.
    157.Zhu,Z.T.,et al.,A new fragmentation rearrangement of the N-terminal protected amino acids using ESI-MS/MS.Indian J Biochem Biophys,2006.43(6):p.372-6.
    158.Yergey,J.,A general approach to calculating isotopic distributions for mass spectrometry.Int.J.Mass Spectrom.Ion Phys,1983.52:p.337-349.
    159.Rockwood,A.L.and Van Orden,S.L.,Ultrahigh-Speed Calculation of Isotope Distributions.Anal.Chem,1996.68(13):p.2027-2030.
    160.ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/HUMAN/ipi.fasta.v3.07.gz.
    161.Sun S,M.-A.K.,Eichelberger B,Brown R,Yen CY,Old WM,Pierce K,Cios KJ,Ahn NG,Resing KA,Improved validation of peptide MS/MS assignments using spectral intensity prediction.Mol Cell Proteomics,2007.6(1):p.1-17.
    162.Hastings CA,N.S.,Roy S,New algorithms for processing and peak detection in liquid chromatography/mass spectrometry data.Rapid Commun Mass Spectrom,2002.16(5):p.462-467.
    163.Grossmann J,R.F.,Cieliebak M,Lipták Z,Mathis LK,Müller M,Gruissem W,Baginsky S,AUDENS:a tool for automated peptide de novo sequencing.J Proteome Res,2005.4(5):p.1768-1774.
    164.Zhang X,H.W.,Adamec J,Asara JM,Naylor S,Regnier FE,An automated method for the analysis of stable isotope labeling data in proteomics.J Am Soc Mass Spectrom,2005.16(7):p.1181-1191.
    165.Zhang N,A.R.,Schwikowski B,ProbID:a probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data.Proteomics,2002.2(10):p.1406-1412.
    166.Palmblad M,R.M.,Bailey CG,McCutchen-Maloney SL,Bergquist J,Zeller LC,Protein identification by liquid chromatography-mass spectrometry using retention time prediction.J Chromatogr B Analyt Technol Biomed Life Sci,2004.803(1):p.131-135.
    167.Wang Y,Z.J.,Gu X,Zhang XM,Protein identification assisted by the prediction of retention time in liquid chromatography/tandem mass spectrometry.J Chromatogr B Analyt Technol Biomed Life Sci,2005.826(1-2):p.122-128.
    168.M,P.,Retention time prediction and protein identification.Methods Mol Biol,2007.367:p.195-207.
    169.Nikitas P,P.-L.A.,Expressions of the fundamental equation of gradient elution and a numerical solution of these equations under any gradient profile.Anal Chem,2005.77(17):p.5670-5677.
    170.Roman Kaliszan,T.B.,et al.,Prediction of gradient retention from the linear solvent strength(LSS) model,quantitative structure-retention relationships(QSRR),and artificial neural networks(ANN).2003.p.271-282.
    171.Kaliszan R,B.T.,Cimochowska A,Juszczyk P,Wisniewska K,Grzonka Z,Prediction of high-performance liquid chromatography retention of peptides with the use of quantitative structure-retention relationships.Proteomics,2005.5(2):p.409-415.
    172.Baczek T,W.P.,Marszall M,Heyden YV,Kaliszan R.,Prediction of Peptide Retention at Different HPLC Conditions from Multiple Linear Regression Models.J.Proteome Res,2005.4(2):p.555-563.
    173.Shinoda K,S.M.,Yachie N,Sugiyama N,Masuda T,Robert M,Soga T,Tomita M,Prediction of liquid chromatographic retention times of peptides generated by protease digestion of the Escherichia coli proteome using artificial neural networks.J Proteome Res,2006.5(12):p.3312-3317.
    174.Put R,D.M.,Baczek T,Vander Heyden Y,Retention prediction of peptides based on uninformative variable elimination by partial least squares.J Proteome Res,2006.5(7):p.1618-1625.
    175.Petritis K,K.L.,Ferguson PL,Anderson GA,Pasa-Tolic L,Lipton MS,Auberry KJ,Strittmatter EF,Shen Y,Zhao R,Smith RD,Use of artificial neural networks for the accurate prediction of peptide liquid chromatography elution times in proteome analyses.Anal Chem,2003.75(5):p.1039-1048.
    176.Salgado JC,R.I.,Asenjo JA,Prediction of retention times of proteins in hydrophobic interaction chromatography using only their amino acid composition.J Chromatogr A,2005.1098(1-2):p.44-54.
    177.Petritis K,K.L.,Yan B,Monroe ME,Strittmatter EF,Qian WJ,Adkins JN,Moore RJ,Xu Y,Lipton MS,Camp DG,Smith RD,Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information.Anal Chem,2006.78(14):p.5026-5039.
    178.Tripet B,C.D.,Kovacs JM,Mant CT,Krokhin OV,Hodges RS,Requirements for prediction of peptide retention time in reversed-phase high-performance liquid chromatography:hydrophilicity/hydrophobicity of side-chains at the N- and C-termini of peptides are dramatically affected by the end-groups and location.J Chromatogr A,2007.1141(2):p.212-225.
    179.Sturm M,Q.S.,Huber CG,Kohlbacher O,A statistical learning approach to the modeling of chromatographic retention of oligonucleotides incorporating sequence and secondary structure data.Nucleic Acids Res,2007.[Epub ahead of print].
    180.Strittmatter EF,K.L.,Petritis K,Mottaz HM,Anderson GA,Shen Y,Jacobs JM,Camp DG,Smith RD,Application of peptide LC retention time information in a discriminant function for peptide identification by tandem mass spectrometry.J Proteome Res,2004.3(4):p.760-769.
    181.Zimmer JS,M.M.,Qian WJ,Smith RD,Advances in proteomics data analysis and display using an accurate mass and time tag approach.Mass Spectrom Rev,2006.25(3):p.450-482.
    182.Fan Jincheng,Mei Changlin,Data Analysis (in Chinese).Beijing:Science Press,2002:p.9-10.
    183.Jaffe JD,M.D.,Leptos KC,Church GM,Gillette MA,Carr SA,PEPPeR,a platform for experimental proteomic pattern recognition.Mol Cell Proteomics,2006.5(10):p.1927-1941.
    184.Tabb DL,H.Y.,Wysocki VH,Yates JR,Influence of basic residue content on fragment ion peak intensities in low-energy collision-induced dissociation spectra of peptides.Anal Chem,2004.76(5):p.1243-1248.
    185.Huang Y,T.J.,Tseng GC,Pasa-Tolic L,Lipton MS,Smith RD,Wysocki VH,Statistical characterization of the charge state and residue dependence of low-energy CID peptide dissociation patterns.Anal Chem,2005.77(18):p.5800-5813.
    186.C.Archambeau,M.V.,Fully nonparametric probability density function estimation with finite gaussian mixture models.In 7th ICPAR Conf,2003.p81-84.
    187.Colinge J,M.J.,Dessingy T,Giron M,Masselot A,Improved peptide charge state assignment.Proteomics,2003.3(8):p.1434-1440.
    188.Hogan JM,H.R.,Kolker N,Kolker E,Charge state estimation for tandem mass spectrometry proteomics.OMICS,2005.9(3):p.233-250.
    189.Klammer AA,W.C.,MacCoss MJ,Noble WS,Peptide charge state determination for low-resolution tandem mass spectra.Proc IEEE Comput Syst Bioinform Conf,2005:p.175-185.
    190.Fridman T,R.J.,Verberkmoes N,Hurst G,Protopopescu V,Xu Y,The probability distribution for a random match between an experimental-theoretical spectral pair in tandem mass spectrometry.J Bioinform Comput Biol,2005.3(2):p.455-476.
    191.Schwartz JC,J.I.,Quadrupole ion trap mass spectrometry.Methods Enzymol,1996.270:p.552-586.
    192.Kast J,G.M.,Wilm M,Richardson K,Noise filtering techniques for electrospray quadrupole time of flight mass spectra.J Am Soc Mass Spectrom,2003.14(7):p.766-776.
    193.AI,N.,Protein identification by tandem mass spectrometry and sequence database searching.Methods Mol Biol,2007.367:p.87-119.
    194.http://www.peptideatlas.org/repository/.
    195.ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/HUMAN/ipi.HUMAN.v3.01.fasta.gz.
    196.Chen M,Y.W.,Song Y,Liu X,Yang B,Wu S,Jiang Y,Cai Y,He F,Qian X,Analysis of human liver proteome using replicate shotgun strategy.Proteomics,2007.7(14):p.2479-88.
    197.ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/HUMAN/ipi.HUMAN.v3.19.fasta.gz.
    198.sPRG.ABRF sPRG Study 2006 Poster.2006.Available from:http://www.abrf.org/index.cfm/group.show/ProteomicsStandardsResearchGroup.47.htm.
    199.Snedecor,G.W.and Cochran,W.G.,Statistical Methods,Eighth Edition.Iowa State University Press,1989:p.76-78.
    200.Ojo.MO,O.A.,On the generalized logistic and Log-logistic distributions.Kragujevac J.Math,2003.25:p.65-73.
    201.Chen,Y.B.,N.C,Estimation of Ricean and Nakagami distribution parameters using noisy samples.Communications,2004 IEEE International Conference on,2004.1:p.562-566.
    202.Marsh H W,B.J.R.,Goodness-of-fit indices in confirmatory factor analysis:The effect of sample size and model complexity.Quality & Quantity,1994.28:p.185-217.
    203.Peter Müller,F.A.Q.,Nonparametric Bayesian Data Analysis.Statist.Sci,2004.19:p.95-110.
    204.Jenq-Neng Hwang Shyh-Rong Lay Lippman,A.,Nonparametric multivariate density estimation:a comparative study.IEEE Transactions on Signal Processing,1994.42(10):p.2795-2810.
    205.Vladimir,K.and S.Ilya,Nonparametric density estimation with adaptive varying window size,B.S.Sebastiano,Editor.2001,SPIE.p.141-150.
    206.Richard O.Duda,P.E.H.,David G.Stork,Pattern Classification,Second Edition,in Pattern Classification,Second Edition.2001,John Wiley.p.3-13.
    207.Cotter,N.E.,The Stone-Weierstrass theorem and its application to neural networks.IEEE Transactions on Neural Networks,1990.1(4):p.290-295.
    208.Bilmes,J.A.,A gentle tutorial of the EM algorithm and its applications to parameter estimation for gaussian mixture and hidden Markov models.International Computer Science Institute,Berkeley,California,1998.Technical Report TR-97-021.
    209.D.Modha,W.S.-S.,Feature weighting in k-means clustering.Machine Learning,2003.52(3):p.217-237.
    210.Snedecor,G.W.and Cochran,W.G.,Statistical Methods,Eighth Edition.Iowa State University Press,1989:p.76-78.
    211.Cheng Shixue,et al.(eds.),Nonparametric hypothesis testing,in Probability and Statistics (in Chinese).1994,Beijing:China Renmin University Press.p.214-230.
    212.Moody,J.and Darken,C.J.,Fast learning in networks of locally tuned processing units.Neural Computation,1989.1:p.281-294.
    213.Silverman,B.W.,Density estimation for statistics and data analysis.Chapman Hall:London 1986.
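    214.Wang Huiwen,Partial Least-Squares Regression:Methods and Applications (in Chinese).1996,Beijing:National Defense Science and Technology Press.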
    215.Pan C,K.C.,Tabb DL,Pelletier DA,McDonald WH,Hurst GB,Hettich RL,Samatova NF,Robust Estimation of Peptide Abundance Ratios and Rigorous Scoring of Their Variability and Bias in Quantitative Shotgun Proteomics.Anal.Chem,2006.78(20):p.7110-7120.
    216.Pevtsov S,F.I.,Mirzaei H,Buck C,Zhang X,Performance evaluation of existing de novo sequencing algorithms.J Proteome Res,2006.5(11):p.3018-3028.
