Computational workflow for analysis of gain and loss of genes in distantly related genomes
  • Authors: Andrey Ptitsyn (1)
    Leonid L Moroz (1) (2)
  • Journal: BMC Bioinformatics
  • Publication date: September 2012
  • Volume: 13
  • Issue: Suppl 15
  • Full-text size: 493 KB
  • Author affiliations:
    1. Whitney Laboratory for Marine Biosciences, University of Florida, 9505 Ocean Shore Blvd., Saint Augustine, FL, 32080, USA
    2. Dept of Neuroscience, University of Florida, Gainesville, FL, 32610, USA
  • ISSN: 1471-2105
Abstract
Background: The early evolution of animals brought profound changes in body plan organization and symmetry and a rise in tissue complexity, including the formation of muscular and nervous systems. This process was associated with massive restructuring of animal genomes, as well as the deletion, acquisition and rapid differentiation of genes inherited from a common metazoan ancestor. Here, we present a simple but efficient workflow for elucidating gene gain and gene loss within major branches of the animal kingdom.

Methods: We designed a pipeline for sequence comparison, clustering and functional annotation, using 12 sequenced genomes as illustrative examples. Specifically, the input consisted of sets of ab initio predicted gene models from the genomes of six bilaterians, three basal metazoans (Cnidaria, Placozoa, Porifera), two unicellular eukaryotes (Monosiga and Capsaspora) and the green plant Arabidopsis as an outgroup. Because of the large volume of data, the software required a high-performance Linux cluster. The final results can be imported into standard spreadsheet software and queried for the numbers and specific sets of genes that are absent from specific genomes, uniquely present in them, or shared among different taxa.

Results and conclusions: The software is open source and available free of charge. It allows the user to address specific questions about gene gain and gene loss in particular genomes, and queries over user-defined groups of genomes can be formulated as logical expressions. For example, our analysis of 12 sequenced genomes indicated that they contain at least 90,000 unique genes and gene families, suggesting enormous diversity of the genomic repertoire in the animal kingdom. Approximately 9% of these gene families have homologs in all 12 genomes, 53% are unique to specific taxa, and the remainder are shared between two or more distantly related genomes.
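The abstract does not show what a query over user-defined groups of genomes looks like, so the following is only a minimal sketch of the general idea and not the authors' software. It assumes the clustering step has already produced a presence/absence table mapping each gene family to the genomes in which a member was found. The family IDs, the genome labels, and the helper names (`present_in`, `absent_from`, `both`, `select`) are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

Genomes = FrozenSet[str]
Query = Callable[[Genomes], bool]

# Hypothetical toy table: gene family ID -> genomes where a member was detected.
# A real table would hold tens of thousands of families across 12 genomes.
PRESENCE: Dict[str, Genomes] = {
    "fam_a": frozenset({"Bilateria", "Cnidaria", "Placozoa",
                        "Porifera", "Monosiga", "Arabidopsis"}),
    "fam_b": frozenset({"Bilateria", "Cnidaria"}),
    "fam_c": frozenset({"Porifera"}),
    "fam_d": frozenset({"Cnidaria", "Placozoa", "Porifera", "Monosiga"}),
}

# Every genome label that occurs anywhere in the table.
ALL_GENOMES: Genomes = frozenset().union(*PRESENCE.values())


def present_in(*genomes: str) -> Query:
    """True when the family has a member in every listed genome."""
    wanted = frozenset(genomes)
    return lambda found: wanted <= found


def absent_from(*genomes: str) -> Query:
    """True when the family has no member in any listed genome."""
    unwanted = frozenset(genomes)
    return lambda found: not (unwanted & found)


def both(left: Query, right: Query) -> Query:
    """Logical AND of two queries."""
    return lambda found: left(found) and right(found)


def select(table: Dict[str, Genomes], query: Query) -> List[str]:
    """Return the sorted IDs of all families that satisfy the query."""
    return sorted(fam for fam, found in table.items() if query(found))


# Families shared by every genome in the table.
universal = select(PRESENCE, present_in(*ALL_GENOMES))

# Families found in exactly one genome (here: Porifera only).
unique_to_porifera = select(PRESENCE, lambda found: found == frozenset({"Porifera"}))

# Candidate losses: present in Cnidaria and Porifera, absent from Bilateria.
lost_in_bilateria = select(
    PRESENCE,
    both(present_in("Cnidaria", "Porifera"), absent_from("Bilateria")),
)

print(universal, unique_to_porifera, lost_in_bilateria)
```

Representing each family as a set of genome labels keeps every question (shared, unique, absent) a plain set comparison, which is also how such a table could be filtered once imported into spreadsheet software.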
