SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa


Authors: Locedie Mansueto^a; Roven Rommel Fuentes^a; Dmytro Chebotarov^a; Frances Nikki Borja^a; Jeffrey Detras^a; Juan Miguel Abriol-Santos^a; Kevin Palis^{a,b}; Alexandre Poliakov^{c,d}; Inna Dubchak^{c,d}; Victor Solovyev^e; Ruaraidh Sackville Hamilton^a; Kenneth L. McNally^a; Nickolai Alexandrov^a; Ramil Mauleon^a (r.mauleon@irri.org)
Keywords: 3k RG (3000 Rice Genomes); API (Application Programming Interface); DAO (Data Access Object); HDF5 (Hierarchical Data Format 5, file format); HDRA (High Density Rice Array); indel (insertion or deletion in genomic region); IRGCIS (International Rice Genebank Collection Information System, http://www.irgcis.irri.org:81/grc/irgcishome.html); IRRI (International Rice Research Institute); RDBMS (Relational Database Management System); SNP (Single Nucleotide Polymorphism)
Journal: Current Plant Biology
Publication year: 2016
Publication date: November 2016
Volume: 7-8
Issue: Complete
Pages: 16-25
Full-text size: 1266 K
Volume sort order: 7

Abstract

The 3000 Rice Genomes Project generated a large dataset of genomic variation in the world’s most important crop, Oryza sativa L. Using the Burrows-Wheeler Aligner (BWA) for read alignment and the Genome Analysis Toolkit (GATK) for variant calling on this dataset, we identified ∼40 million single-nucleotide polymorphisms (SNPs). Five rice reference genomes representing the major variety groups were used: Nipponbare (temperate japonica), IR 64 (indica), 93-11 (indica), DJ 123 (aus), and Kasalath (aus). The results are accessible through the Rice SNP-Seek Database (http://snp-seek.irri.org) and through the web services of its application programming interface (API). We incorporated legacy phenotypic and passport data for the sequenced varieties, originating from the International Rice Genebank Collection Information System (IRGCIS), as well as gene models from several rice annotation projects. The massive genotypic data in SNP-Seek are stored in Hierarchical Data Format 5 (HDF5) files for quick retrieval. Germplasm, phenotypic, and genomic data are stored in a relational database management system (RDBMS) using the Chado schema, which allows controlled vocabularies from biological ontologies to be used as query constraints in SNP-Seek. In this paper, we describe the datasets stored in SNP-Seek, the architecture of the database and web application, and the interoperability methodologies in place, and we present a few use cases demonstrating the utility of SNP-Seek for diversity analysis and molecular breeding.
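
To make the idea of controlled-vocabulary terms as query constraints concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the SNP-Seek implementation. The four tables are a heavily simplified subset loosely modelled on Chado's cv, cvterm, stock, and stockprop tables. The vocabulary name, the trait terms, the variety names, and the helper functions build_demo_db and stocks_with_term are all hypothetical toy values chosen for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy, Chado-like layout (assumed, simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- A controlled vocabulary and the terms that belong to it.
        CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
            name      TEXT NOT NULL
        );
        -- Germplasm entries and their properties, typed by a vocabulary term.
        CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE stockprop (
            stockprop_id INTEGER PRIMARY KEY,
            stock_id     INTEGER NOT NULL REFERENCES stock(stock_id),
            type_id      INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
            value        TEXT
        );
    """)
    # Made-up rows, for illustration only.
    conn.execute("INSERT INTO cv VALUES (1, 'demo_trait_vocabulary')")
    conn.executemany(
        "INSERT INTO cvterm VALUES (?, ?, ?)",
        [(10, 1, "grain length"), (11, 1, "plant height")],
    )
    conn.executemany(
        "INSERT INTO stock VALUES (?, ?)",
        [(1, "variety_A"), (2, "variety_B"), (3, "variety_C")],
    )
    conn.executemany(
        "INSERT INTO stockprop (stock_id, type_id, value) VALUES (?, ?, ?)",
        [(1, 10, "long"), (2, 10, "short"), (3, 11, "tall"), (3, 10, "long")],
    )
    conn.commit()
    return conn


def stocks_with_term(conn, cv_name, term_name, value):
    """Return stock names whose property, typed by the given vocabulary term, equals value."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM stock s
        JOIN stockprop sp ON sp.stock_id = s.stock_id
        JOIN cvterm t     ON t.cvterm_id = sp.type_id
        JOIN cv           ON cv.cv_id = t.cv_id
        WHERE cv.name = ? AND t.name = ? AND sp.value = ?
        ORDER BY s.name
        """,
        (cv_name, term_name, value),
    ).fetchall()
    return [row[0] for row in rows]
```

Because the trait is identified by a vocabulary term rather than a free-text column name, the same query shape works for any term in any loaded ontology. For example, stocks_with_term(build_demo_db(), "demo_trait_vocabulary", "grain length", "long") selects the toy varieties annotated with that term and value.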
