Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak
  • Authors: Saneyoshi Ueno (1) (2)
    Grégoire Le Provost (1)
    Valérie Léger (1)
    Christophe Klopp (3)
    Céline Noirot (3)
    Jean-Marc Frigerio (1)
    Franck Salin (1)
    Jérôme Salse (4)
    Michael Abrouk (4)
    Florent Murat (4)
    Oliver Brendel (5)
    Jérémy Derory (1)
    Pierre Abadie (1)
    Patrick Léger (1)
    Cyril Cabane (6) (7)
    Aurélien Barré (6)
    Antoine de Daruvar (6) (7)
    Arnaud Couloux (8)
    Patrick Wincker (8)
    Marie-Pierre Reviron (1)
    Antoine Kremer (1)
    Christophe Plomion (1)
  • Journal: BMC Genomics
  • Publication year: 2010
  • Publication date: December 2010
  • Volume: 11
  • Issue: 1
  • Full-text size: 1235 KB
  • Author affiliations:

    1. INRA, UMR 1202 BIOGECO, 69 route d’Arcachon, F-33612, Cestas, France
    2. Forestry and Forest Products Research Institute, Department of Forest Genetics, Tree Genetics Laboratory, 1 Matsunosato, 305-8687, Tsukuba, Ibaraki, Japan
    3. Plateforme bioinformatique Genotoul, UR875 Biométrie et Intelligence Artificielle, INRA, 31326, Castanet-Tolosan, France
    4. INRA/UBP UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 234 avenue du Brézet, 63100, Clermont Ferrand, France
    5. INRA, UMR1137 EEF "Ecologie et Ecophysiologie Forestières", F 54280, Champenoux, France
    6. Université de Bordeaux, Centre de Bioinformatique de Bordeaux, Bordeaux, France
    7. CNRS, UMR 5800, Laboratoire Bordelais de Recherche en Informatique, Talence, France
    8. CEA, DSV, Genoscope, Centre National de Séquençage, 2 rue Gaston Crémieux CP5706, 91057, Evry cedex, France
Abstract
Background: The Fagaceae family comprises about 1,000 woody species worldwide, about half of which belong to the genus Quercus. These oaks are often a source of raw material for biomass, wood and fiber. Pedunculate and sessile oaks are among the most important deciduous forest tree species in Europe. Despite their ecological and economic importance, very few genomic resources have been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, in which geneticists, ecophysiologists, molecular biologists and ecologists join efforts to understand, monitor and predict functional genetic diversity.

Results: We generated 145,827 sequence reads from 20 cDNA libraries using the Sanger method. Unexploitable chromatograms and quality checking led us to eliminate 19,941 sequences. In the end, a total of 125,925 ESTs were retained from 111,361 cDNA clones. Pyrosequencing was also conducted for 14 libraries, generating 1,948,579 reads, from which 370,566 sequences (19.0%) were eliminated, resulting in 1,578,192 sequences. Following clustering and assembly with the TGICL pipeline, 1,704,117 EST sequences collapsed into 69,154 tentative contigs and 153,517 singletons, providing 222,671 non-redundant sequences (including alternative transcripts). We also assembled the sequences using the MIRA and PartiGene software and compared the three unigene sets. Gene ontology annotation was then assigned to 29,303 unigene elements. A BLAST search against the SWISS-PROT database revealed putative homologs for 32,810 (14.7%) unigene elements, but a more extensive search against Pfam, Refseq_protein, Refseq_RNA and eight gene indices revealed homology for 67.4% of them. The EST catalogue was examined for putative homologs of candidate genes involved in bud phenology, cuticle formation, phenylpropanoid biosynthesis and cell wall formation. Our results suggest good coverage of the genes involved in these traits. Comparative orthologous sequences (COS) with other plant gene models were identified and made it possible to unravel the oak paleo-history. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched for, yielding 52,834 SSRs and 36,411 SNPs. All of these are available through the Oak Contig Browser: http://genotoul-contigbrowser.toulouse.inra.fr:9092/Quercus_robur/index.html.

Conclusions: This genomic resource provides a unique tool to discover genes of interest, study the oak transcriptome, and develop new markers to investigate functional diversity in natural populations.
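The read and unigene counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the authors' pipeline; the variable names and the `percent` helper are illustrative assumptions, and only figures stated in the abstract are used.

```python
# Counts as quoted in the abstract (variable names are illustrative).
sanger_ests = 125_925          # ESTs retained from Sanger sequencing
pyro_reads_raw = 1_948_579     # raw pyrosequencing reads
pyro_reads_removed = 370_566   # pyrosequencing reads eliminated
pyro_reads_kept = 1_578_192    # pyrosequencing sequences retained
contigs = 69_154               # tentative contigs from TGICL
singletons = 153_517           # singletons from TGICL
swissprot_hits = 32_810        # unigene elements with SWISS-PROT homologs


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Total ESTs entering the assembly: Sanger ESTs plus retained 454 sequences.
total_ests = sanger_ests + pyro_reads_kept

# Non-redundant unigene set: contigs plus singletons.
unigenes = contigs + singletons

print("ESTs assembled:", total_ests)
print("Unigene elements:", unigenes)
print("454 reads eliminated (%):", percent(pyro_reads_removed, pyro_reads_raw))
print("SWISS-PROT hit rate (%):", percent(swissprot_hits, unigenes))
```

The two sums reproduce the 1,704,117 ESTs and 222,671 non-redundant sequences reported, and the two ratios round to the 19.0% and 14.7% given in the text.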