PPLine: An Automated Pipeline for SNP, SAP, and Splice Variant Detection in the Context of Proteogenomics

Authors: George Sergeevich Krasnov; Alexey Alexandrovich Dmitriev; Anna Viktorovna Kudryavtseva; Alexander Valerievich Shargunov; Dmitry Sergeevich Karpov; Leonid Andreevich Uroshlev; Natalya Vladimirovna Melnikova; Vladimir Mikhailovich Blinov; Ekaterina Vladimirovna Poverennaya; Alexander Ivanovich Archakov; Andrey Valerievich Lisitsa; Elena Alexandrovna Ponomarenko
Journal: Journal of Proteome Research
Publication year: 2015
Publication date: September 4, 2015
Volume: 14
Issue: 9
Pages: 3729-3737
ISSN: 1535-3907

Abstract

The fundamental mission of the Chromosome-Centric Human Proteome Project (C-HPP) is to study human proteome diversity, including rare variants. Liver tissues, HepG2 cells, and plasma were selected as major objects of C-HPP studies. The proteogenomic approach, a recently introduced technique, is a powerful method for predicting and validating proteoforms arising from alternative splicing, mutations, and transcript editing. We developed PPLine, a Python-based proteogenomic pipeline that provides automated discovery of single-amino-acid polymorphisms (SAPs), indels, and alternatively spliced variants from raw transcriptome and exome sequence data, single-nucleotide polymorphism (SNP) annotation and filtration, and prediction of proteotypic peptides (available at https://sourceforge.net/projects/ppline). In this work, we performed deep transcriptome sequencing of HepG2 cells and liver tissues using two platforms: Illumina HiSeq and Applied Biosystems SOLiD. Using PPLine, we revealed 7756 SAPs and indels for HepG2 cells and liver (including 659 variants not annotated in dbSNP). We found 17 indels in transcripts associated with the translation of alternate reading frames (ARFs) longer than 300 bp. The ARF products of two genes, SLMO1 and TMEM8A, show signatures of a caspase-binding domain and a Gcn5-related N-acetyltransferase. Alternative splicing analysis predicted novel proteoforms encoded by 203 (liver) and 475 (HepG2) genes according to both Illumina and SOLiD data. The results of the present work provide a basis for subsequent proteomic studies by the C-HPP consortium.
