Flexible Data Analysis Pipeline for High-Confidence Proteogenomics


Authors: Hendrik Weisser; James C. Wright; Jonathan M. Mudge; Petra Gutenbrunner; Jyoti S. Choudhary
Journal: Journal of Proteome Research
Publication year: 2016
Publication date: December 2, 2016
Volume: 15
Issue: 12
Pages: 4686-4695
Full-text size: 483K
ISSN: 1535-3907

Abstract

Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are “novel” peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, automated data analysis pipeline aimed at detecting such “novel” peptides in proteomic data sets. This pipeline implements criteria developed by proteomics and genome annotation experts for high-stringency peptide identification and filtering. Our pipeline is based on the OpenMS computational framework; it incorporates multiple database search engines for peptide identification and applies a machine-learning approach (Percolator) to post-process search results. We describe several new and improved software tools that we developed to facilitate proteogenomic analyses that enhance the wealth of tools provided by OpenMS. We demonstrate the application of our pipeline to a human testis tissue data set previously acquired for the Chromosome-Centric Human Proteome Project, which led to the addition of five new gene annotations on the human reference genome.
