Abstract
Tandem mass spectrometry (MS/MS)-based proteomics has become one of the most important tools in the life sciences. Predicting the theoretical tandem mass spectra of peptides (often called MS2 spectra) supports protein identification and quantification, and this prediction problem has attracted wide attention in recent years. The accumulation of large volumes of high-quality spectral data and advances in computing technology have made the problem increasingly tractable, and many new methods have emerged.

These methods fall into two categories:
- physical-model-based methods, built on the mobile proton model, such as MassAnalyzer and MS-Simulator;
- machine-learning-based methods, including ensemble learning algorithms and neural-network (deep learning) methods, such as PeptideART, MS2PIP, MS2PBPI and pDeep.

In this paper, we review both categories, briefly point out the shortcomings of existing theoretical spectrum prediction methods and software tools, and outline directions for future work.
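The following Python sketch makes concrete what a theoretical MS2 spectrum contains. It computes the m/z values of the b and y fragment ions of an unmodified peptide. These positions follow deterministically from the sequence. The harder part of the prediction problem, which the methods above address, is assigning relative intensities to these ions.

This is an illustrative assumption, not the procedure of any tool named above. The helper `fragment_mz` and the table `RESIDUE_MASS` are hypothetical names. The sketch uses standard monoisotopic residue masses and treats cysteine as unmodified. It ignores modifications, neutral losses, isotopes and other ion types, and it predicts no intensities.

```python
from __future__ import annotations

# Standard monoisotopic residue masses (Da); cysteine is assumed unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mz(peptide: str, charge: int = 1) -> dict[str, float]:
    """Return {label: m/z} for b1..b(n-1) and y1..y(n-1) at a single charge state.

    Hypothetical helper for illustration only: no modifications, neutral
    losses, isotopes or other ion types, and no intensity prediction.
    """
    if charge < 1:
        raise ValueError("charge must be >= 1")
    # Raises KeyError for residues outside the 20 standard amino acids.
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    n = len(masses)
    ions: dict[str, float] = {}
    for i in range(1, n):
        b_mass = sum(masses[:i])               # b_i: first i residues from the N-terminus
        y_mass = sum(masses[n - i:]) + WATER   # y_i: last i residues plus the C-terminal water
        ions[f"b{i}"] = (b_mass + charge * PROTON) / charge
        ions[f"y{i}"] = (y_mass + charge * PROTON) / charge
    return ions


if __name__ == "__main__":
    for label, mz in fragment_mz("PEPTIDE").items():
        print(f"{label}\t{mz:.4f}")
```

For `PEPTIDE` at charge 1 this gives b1 ≈ 98.0600 and y1 ≈ 148.0604. Each complementary pair, such as b3 and y4, sums to the neutral peptide mass plus two protons, which is a quick sanity check on the output.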