Research on a Calibration Method Based on Repeated LC-MS Biological Experiment Data
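  • English title: Research of Alignment Algorithm based on the Repeated LC-MS Biological Experimental Data
  • Authors: Cui Jian (崔健); Ma Yuanyuan (马媛媛); Zhang Fawei (张法伟); Ma Xin (马鑫); Guo Zhaolong (郭兆龙)
  • English authors: Cui Jian; Ma Yuanyuan; Zhang Fawei; Ma Xin; Guo Zhaolong; Shengli College, China University of Petroleum
  • Keywords: proteomics; liquid chromatography-mass spectrometry (LC-MS); time characteristics; calibration; statistical learning model
  • English keywords: Proteomics; LC-MS; Time characteristics; Alignment; Statistical learning model
  • Chinese journal name: GXNB
  • English journal name: Genomics and Applied Biology
  • Institution: Shengli College, China University of Petroleum
  • Publication date: 2019-03-25
  • Publisher: Genomics and Applied Biology (基因组学与应用生物学)
  • Year: 2019
  • Issue: v.38
  • Funding: Supported by the 2017 National Undergraduate Innovation and Entrepreneurship Training Program for Local Universities (201713386028)
  • Language: Chinese
  • Page: GXNB201903036
  • Page count: 7
  • CN: 03
  • ISSN: 45-1369/Q
  • Classification number: 263-269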
Abstract
In proteomics, a central task is to process liquid chromatography-mass spectrometry (LC-MS) spectra in order to find and analyze differences among the complex peptide or protein samples that carry biomarkers. Aligning the elution-time peak signals (LC peaks) that peptides produce across repeated runs of the same sample is key to quantification and differential analysis. At present, replicate datasets are usually aligned by using LC-MS/MS identifications to label the time characteristics of the LC peaks in each dataset and then applying a warping function to align those time characteristics. Because the elution-time errors across replicates arise randomly, applying a single warping function uniformly introduces large errors. To address this problem, this study focuses on a time-alignment algorithm for LC peaks across repeated experiments. We selected two replicate datasets and, following a machine-learning approach, treated the peptides detected by LC-MS/MS in both datasets as trusted data, using part of them as a training sequence and the rest as a test sequence. We built a statistical model from the training sequence and proposed a new alignment algorithm; evaluation on the test sequence showed an accuracy above 95%. The model was then applied to all LC-MS/MS peptide identifications in the two datasets to increase the coverage of identifications across the datasets, and the coverage reached more than 85%.
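The train/test procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the statistical model, so the Gaussian model of the retention-time shift used here is an assumption, as are the function names, the 70/30 split, the 3-sigma matching rule, and the synthetic data. It is not the authors' algorithm.

```python
import random
import statistics


def split_train_test(pairs, train_fraction=0.7, seed=0):
    """Shuffle shared-peptide (rt_a, rt_b) pairs and split them into train/test."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_shift_model(train):
    """Assumed model: rt_b - rt_a is Gaussian; return its (mean, stdev)."""
    diffs = [rt_b - rt_a for rt_a, rt_b in train]
    return statistics.mean(diffs), statistics.stdev(diffs)


def is_match(rt_a, rt_b, model, k=3.0):
    """Accept two peaks as the same peptide if their shift lies within k sigma."""
    mean, sd = model
    return abs((rt_b - rt_a) - mean) <= k * sd


def accuracy(test, model, k=3.0):
    """Fraction of held-out true pairs that the model accepts."""
    hits = sum(is_match(rt_a, rt_b, model, k) for rt_a, rt_b in test)
    return hits / len(test)


# Synthetic stand-in for peptides identified by LC-MS/MS in both runs:
# run B elutes about 0.5 min later than run A, with random jitter.
rng = random.Random(42)
pairs = []
for _ in range(200):
    rt_a = rng.uniform(0.0, 100.0)
    pairs.append((rt_a, rt_a + 0.5 + rng.gauss(0.0, 0.2)))

train, test = split_train_test(pairs)
model = fit_shift_model(train)
print(f"mean shift = {model[0]:.3f} min, sd = {model[1]:.3f} min")
print(f"test accuracy = {accuracy(test, model):.3f}")
```

In this sketch, the fitted model could then be applied to peptides identified in only one run, to search the other run for an LC peak whose shift falls inside the accepted window. That is one possible reading of the coverage step mentioned in the abstract.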