QuaPra: Efficient transcript assembly and quantification using quadratic programming with Apriori algorithm
  • English title: QuaPra: Efficient transcript assembly and quantification using quadratic programming with Apriori algorithm
  • Authors: Xiangjun Ji; Weida Tong; Baitang Ning; Christopher E. Mason; David P. Kreil; Pawel P. Labaj; Geng Chen; Tieliu Shi
  • Keywords: RNA-Seq; transcriptome reconstruction; transcript assembly; transcript quantification
  • Chinese journal title: JCXG
  • English journal title: Science China: Life Sciences (English Edition)
  • Affiliations: The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University; National Center for Toxicological Research, U.S. Food and Drug Administration; Department of Physiology and Biophysics, Weill Cornell Medicine; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine; Feil Family Brain & Mind Research Institute; Chair of Bioinformatics Research Group, Boku University; Malopolska Centre of Biotechnology, Jagiellonian University; APART Fellow, Austrian Academy of Science; National Center for International Research of Biological Targeting Diagnosis and Therapy, Guangxi Key Laboratory of Biological Targeting Diagnosis and Therapy Research, Collaborative Innovation Center for Targeting Tumor Diagnosis and Therapy, Guangxi Medical University
  • Publication date: 2019-05-23 13:18
  • Publisher: Science China (Life Sciences)
  • Year: 2019
  • Issue: v.62
  • Funding: Supported by the National High Technology Research and Development Program of China (2015AA020108); the National Key Research and Development Program of China (2016YFC0902100); the China Human Proteome Project (2014DFB30010, 2014DFB30030); the National Science Foundation of China (31671377, 31401133, 31771460, 91629103); and the Program of Introducing Talents of Discipline to Universities of China (B14019)
  • Language: English
  • Page: JCXG201907009
  • Page count: 10
  • CN:07
  • ISSN:11-5841/Q
  • Classification number: 67-76
Abstract
RNA sequencing (RNA-seq) has greatly facilitated exploration of the transcriptome landscape in diverse organisms. However, transcriptome reconstruction remains challenging because of various limitations of current tools and sequencing technologies. Here, we introduce an efficient tool, QuaPra (Quadratic Programming combined with Apriori), for accurate transcriptome assembly and quantification. On simulated data, QuaPra detected at least 26.5% more low-abundance (0.1–1 FPKM) transcripts than other currently popular tools, with an increase of over 2.7% in sensitivity and precision. Moreover, on real sequencing data, QuaPra correctly assembled around one-quarter more known transcripts than other assemblers. QuaPra is freely available at http://www.megabionet.org/QuaPra/.