Analysis of phylogenetic signal in protostomial intron patterns using Mutual Information
  • Authors: Natascha Hill (1)
    Alexander Leow (1)
    Christoph Bleidorn (2)
    Detlef Groth (1)
    Ralph Tiedemann (1)
    Joachim Selbig (1)
    Stefanie Hartmann (1)
  • Keywords: Mutual Information; Evolution; Gene structure
  • Journal: Theory in Biosciences
  • Publication year: 2013
  • Publication date: June 2013
  • Volume: 132
  • Issue: 2
  • Pages: 93-104
  • Full-text size: 625 KB
  • Author affiliations:
    1. Department of Bioinformatics, Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany
    2. Department of Molecular Evolution and Animal Systematics, Institute for Biology II, University of Leipzig, Leipzig, Germany
Abstract
Many deep evolutionary divergences still remain unresolved, such as those among major taxa of the Lophotrochozoa. As alternative phylogenetic markers, the intron–exon structure of eukaryotic genomes and the patterns of absence and presence of spliceosomal introns appear to be promising. However, given the potential homoplasy of intron presence, the phylogenetic analysis of this data using standard evolutionary approaches has remained a challenge. Here, we used Mutual Information (MI) to estimate the phylogeny of Protostomia using gene structure data, and we compared these results with those obtained with Dollo Parsimony. Using full genome sequences from nine Metazoa, we identified 447 groups of orthologous sequences with 21,732 introns in 4,870 unique intron positions. We determined the shared absence and presence of introns in the corresponding sequence alignments and have made this data available in “IntronBase”, a web-accessible and downloadable SQLite database. Our results obtained using Dollo Parsimony are obviously misled through systematic errors that arise from multiple intron loss events, but extensive filtering of data improved the quality of the estimated phylogenies. Mutual Information, in contrast, performs better with larger datasets, but at the same time it requires a complete data set, which is difficult to obtain for orthologs from a large number of taxa. Nevertheless, Mutual Information-based distances proved to be useful in analyzing this kind of data, also because the estimation of MI-based distances is independent of evolutionary models and therefore no pre-definitions of ancestral and derived character states are necessary.
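
To make the idea of an MI-based distance on intron presence/absence data concrete, the following is a minimal sketch, not the authors' implementation. It assumes each taxon is encoded as a vector of 0/1 values (intron absent/present) over the same set of intron positions, and it assumes one common normalization, d = 1 - I(X;Y)/H(X,Y); the abstract does not state which distance formula was used. The function names `entropy`, `mutual_information`, and `mi_distance` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long state vectors."""
    if len(x) != len(y):
        raise ValueError("both taxa must be scored at the same intron positions")
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mi_distance(x, y):
    """Assumed normalization: 1 - I(X;Y) / H(X,Y), which lies in [0, 1]."""
    joint = entropy(list(zip(x, y)))
    if joint == 0:
        # Both vectors are constant, so they carry no information to compare.
        return 0.0
    return 1.0 - mutual_information(x, y) / joint


# Hypothetical presence (1) / absence (0) patterns at four intron positions.
taxon_a = [1, 1, 0, 0]
taxon_b = [1, 1, 0, 0]  # same pattern as taxon_a
taxon_c = [1, 0, 1, 0]  # statistically independent of taxon_a

print(mi_distance(taxon_a, taxon_b))
print(mi_distance(taxon_a, taxon_c))
```

The sketch needs no designation of ancestral or derived states, which mirrors the model independence described in the abstract. It also assumes every position is scored in both taxa, which corresponds to the complete-data requirement. Note that MI measures statistical dependence only, so two exactly complementary patterns would also receive a distance of 0 under this normalization.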
