A Semantic Web Model for the Gene Ontology and Its Annotation Data
Abstract
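As the most widely used bio-ontology today, the Gene Ontology contained approximately 23,700 terms as of August 2007, used to annotate more than 16 million genes and gene products in about 20 biological databases. For Semantic Web applications, the Gene Ontology Consortium provides an RDF/XML file (http://archive.geneontology.org/latest-full/go_200708-assocdb.rdf-xml.gz). However, this file has the following shortcomings, which prevent it from supporting complex semantic queries and inference: 1) the three GO sub-ontologies are isolated from one another and lack the necessary cross-ontology semantic links; 2) the file is organized around GO terms, and all information is stored in a single file; 3) the file does not support GOSlim.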
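     In this paper we propose a Semantic Web model, GORouter. The model mainly demonstrates how multiple Semantic Web technologies and tools based on the RDF specification can be used to reorganize the original resources and provide users with complex semantic query and inference services over the Gene Ontology and its annotation data.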
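     We re-encoded the heterogeneous raw data provided by the Gene Ontology Consortium and built a series of RDF data modules. Each RDF module in GORouter consists of two parts: a metadata part described with RSS, and a data part given a globally unique name with LSID.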
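     Using the GLUE system, we established one-to-one ontology mappings among the three independent GO sub-ontologies. To improve mapping accuracy, GLUE uses relaxation labeling to find the mapping configuration that best satisfies the given domain constraints and prior knowledge.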
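     We use Oracle NDM as the RDF store and, by calling the SDO_RDF_MATCH table function, seamlessly combine RDF query results with conventional relational data. As a result, the size of the GORouter model is minimized: data not directly involved in semantic inference are stored in conventional relational tables. We believe this solution can partly overcome the performance bottleneck of conventional Semantic Web applications.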
     GORouter and its applications are released under the Apache License 2.0; researchers can obtain the latest data and services at http://www.scbit.org/gorouter/.
Gene Ontology (GO, http://www.geneontology.org) is by far the most widely used bio-ontology. As of August 2007, it contains approximately 23,700 terms, linked to a database of more than 16 million annotations of genes and gene products originating from about 20 organisms. For Semantic Web applications, the Gene Ontology Consortium provides an RDF/XML data file (http://archive.geneontology.org/latest-full/go_200708-assocdb.rdf-xml.gz), an export of the database containing both the GO vocabulary and the associations between GO terms and gene products. However, this file has drawbacks that make it unsuitable for providing complex semantic query and inference services.
     The first drawback is the lack of relationships between concepts in different GO sub-ontologies, which limits the power of inference based on them. The second is that the RDF/XML data file organizes GO annotation data in a term-centric view, with all information stored in a single file. The third is the lack of support for GOSlim.
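To make the third drawback concrete: a GO slim is a reduced subset of GO terms, and supporting it means rolling fine-grained annotations up to the slim terms among their ancestors. The sketch below assumes a toy parent map with placeholder term IDs (`t:A` and so on are hypothetical, not real GO identifiers) and treats every parent link alike; it illustrates the idea and is not how GORouter implements GOSlim support.

```python
from typing import Dict, Iterable, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return `term` plus every term reachable from it by following parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(annotations: Dict[str, Iterable[str]],
                parents: Dict[str, List[str]],
                slim: Set[str]) -> Dict[str, Set[str]]:
    """Map each gene product's annotations to all slim terms among their ancestors."""
    mapped: Dict[str, Set[str]] = {}
    for gene, terms in annotations.items():
        hits: Set[str] = set()
        for term in terms:
            hits |= ancestors(term, parents) & slim
        mapped[gene] = hits
    return mapped


# Hypothetical toy data: t:C -> t:B -> t:A and t:D -> t:A (child -> parent).
parents = {"t:C": ["t:B"], "t:B": ["t:A"], "t:D": ["t:A"]}
slim = {"t:A", "t:B"}
annotations = {"gp1": ["t:C"], "gp2": ["t:D"]}
print(map_to_slim(annotations, parents, slim))
```

Here `gp1` rolls up to both slim terms and `gp2` only to `t:A`. Mapping to every slim ancestor is one simple policy; a slim mapper could instead keep only the most specific slim ancestor.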
     In this paper, we present an RDF model, GORouter, which demonstrates how multiple Semantic Web tools and techniques can be used to integrate heterogeneous resources and to provide combined semantic query and inference solutions for GO and its associations. Most of the original files come from the Gene Ontology Consortium. We encoded these heterogeneous resources in a uniform RDF format and created a set of RDF datasets. Each dataset consists of two RDF files, one for metadata and one for data. The metadata RDF files are encoded with RSS 1.0, and each metadata RDF file has one data RDF file associated with it. We assign exactly one unique LSID to the URL of each data RDF file.
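The dataset layout described above can be sketched with Python's standard library: an LSID string following the `urn:lsid:authority:namespace:object[:revision]` pattern, and an RSS 1.0 metadata document whose single item points at the data RDF file. The LSID authority and namespace (`scbit.org`, `gorouter`), the data-file URL, and recording the LSID as `dc:identifier` are all assumptions for illustration; the paper does not specify how GORouter attaches LSIDs to its metadata.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("rdf", RDF), ("rss", RSS), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def make_lsid(authority: str, namespace: str, obj: str,
              revision: Optional[str] = None) -> str:
    """Build an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = ["urn", "lsid", authority, namespace, obj]
    if revision:
        parts.append(revision)
    return ":".join(parts)


def q(ns: str, local: str) -> str:
    """Clark-notation qualified name understood by ElementTree."""
    return f"{{{ns}}}{local}"


def metadata_rss(channel_uri: str, title: str, data_url: str, lsid: str) -> str:
    """Return an RSS 1.0 (RDF/XML) metadata document describing one data RDF file."""
    root = ET.Element(q(RDF, "RDF"))
    channel = ET.SubElement(root, q(RSS, "channel"), {q(RDF, "about"): channel_uri})
    ET.SubElement(channel, q(RSS, "title")).text = title
    ET.SubElement(channel, q(RSS, "link")).text = channel_uri
    ET.SubElement(channel, q(RSS, "description")).text = f"Metadata for {title}"
    seq = ET.SubElement(ET.SubElement(channel, q(RSS, "items")), q(RDF, "Seq"))
    ET.SubElement(seq, q(RDF, "li"), {q(RDF, "resource"): data_url})
    # The item describes the data RDF file; its LSID goes in dc:identifier (assumption).
    item = ET.SubElement(root, q(RSS, "item"), {q(RDF, "about"): data_url})
    ET.SubElement(item, q(RSS, "title")).text = title
    ET.SubElement(item, q(RSS, "link")).text = data_url
    ET.SubElement(item, q(DC, "identifier")).text = lsid
    return ET.tostring(root, encoding="unicode")


lsid = make_lsid("scbit.org", "gorouter", "go_terms")  # hypothetical authority/namespace
xml_text = metadata_rss("http://www.scbit.org/gorouter/",
                        "GO terms",
                        "http://www.scbit.org/gorouter/data/go_terms.rdf",  # hypothetical URL
                        lsid)
print(xml_text)
```

Keeping metadata and data in separate files lets a client read the small RSS document first and fetch the data RDF file only when needed.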
     Using the GLUE system, we create ontology mappings between pairs of terms from the three independent GO sub-ontologies. To improve match accuracy, GLUE uses a relaxation labeler, which searches for the match configuration that best satisfies the given domain constraints and heuristic knowledge.
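The relaxation-labeling idea can be illustrated with a toy loop: each source term starts with match probabilities from a similarity learner, and on every round a candidate match s → t is boosted in proportion to the current belief that the parent of s matches the parent of t. This is a simplified stand-in written for illustration, not GLUE's actual labeler; the single parent-consistency constraint, the weight `alpha`, the greedy one-to-one step, and all term names and scores are assumptions.

```python
from typing import Dict, Optional

Probs = Dict[str, Dict[str, float]]


def _normalise(row: Dict[str, float]) -> Dict[str, float]:
    total = sum(row.values())
    return {k: v / total for k, v in row.items()} if total > 0 else dict(row)


def relax(sim: Probs,
          src_parent: Dict[str, Optional[str]],
          tgt_parent: Dict[str, Optional[str]],
          alpha: float = 1.0,
          iterations: int = 10) -> Probs:
    """Relaxation labeling with one hypothetical neighbourhood constraint:
    if the parents of s and t match, s -> t becomes more likely."""
    prob = {s: _normalise(row) for s, row in sim.items()}
    for _ in range(iterations):
        updated: Probs = {}
        for s, row in prob.items():
            ps = src_parent.get(s)
            scores = {}
            for t, p in row.items():
                pt = tgt_parent.get(t)
                # Support = current belief that parent(s) maps to parent(t).
                support = prob.get(ps, {}).get(pt, 0.0) if ps and pt else 0.0
                scores[t] = p * (1.0 + alpha * support)
            updated[s] = _normalise(scores)
        prob = updated  # synchronous update: every term uses last round's beliefs
    return prob


def one_to_one(prob: Probs) -> Dict[str, str]:
    """Greedily pick the most probable pairs, using each source and target term once."""
    ranked = sorted(((p, s, t) for s, row in prob.items() for t, p in row.items()),
                    reverse=True)
    used_s, used_t, result = set(), set(), {}
    for p, s, t in ranked:
        if p > 0 and s not in used_s and t not in used_t:
            result[s] = t
            used_s.add(s)
            used_t.add(t)
    return result


# Hypothetical toy taxonomies: A1 is a child of A; B1 is a child of B, C1 of C.
src_parent = {"A": None, "A1": "A"}
tgt_parent = {"B": None, "C": None, "B1": "B", "C1": "C"}
# Hypothetical learner output: A1 is equally similar to B1 and C1 on its own.
sim = {"A": {"B": 0.8, "C": 0.2}, "A1": {"B1": 0.5, "C1": 0.5}}
prob = relax(sim, src_parent, tgt_parent)
print(one_to_one(prob))
```

`A1` is ambiguous on similarity alone, but because `A` maps to `B` with high probability, the relaxation rounds push `A1` towards `B1`.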
     We use the Oracle Network Data Model (NDM) as the native RDF data repository and the table function SDO_RDF_MATCH to seamlessly combine the results of RDF queries with traditional relational data. As a result, the size of GORouter is minimized: information not directly involved in semantic inference is kept in relational tables. We believe that this is an effective way to partly overcome the performance bottleneck of conventional Semantic Web applications.
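The division of labour described above, with inference-relevant statements in an RDF store and everything else in relational tables, can be mimicked in a few lines. The sketch below does not use Oracle or SDO_RDF_MATCH: it is an analogy built on Python's standard `sqlite3` module, in which a plain `triples` table stands in for the RDF model and a recursive query stands in for is_a inference. Table names, predicates, and data are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Triples hold what inference needs; descriptive attributes stay relational."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT);
        CREATE TABLE gene_product (id TEXT PRIMARY KEY, symbol TEXT, species TEXT);
    """)
    # Hypothetical data: t:C is_a t:B is_a t:A; gene products annotated to terms.
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("t:B", "is_a", "t:A"),
        ("t:C", "is_a", "t:B"),
        ("gp1", "annotated_to", "t:C"),
        ("gp2", "annotated_to", "t:B"),
        ("gp3", "annotated_to", "t:X"),
    ])
    conn.executemany("INSERT INTO gene_product VALUES (?, ?, ?)", [
        ("gp1", "GENE1", "human"),
        ("gp2", "GENE2", "mouse"),
        ("gp3", "GENE3", "yeast"),
    ])
    return conn


# The recursive CTE plays the role of the RDF-side match plus is_a inference;
# its result is then joined with an ordinary relational table.
QUERY = """
WITH RECURSIVE subterm(term) AS (
    SELECT ?
    UNION
    SELECT t.subj FROM triples AS t
    JOIN subterm AS s ON t.obj = s.term
    WHERE t.pred = 'is_a'
)
SELECT g.id, g.symbol, g.species
FROM triples AS a
JOIN subterm AS s ON a.obj = s.term
JOIN gene_product AS g ON g.id = a.subj
WHERE a.pred = 'annotated_to'
ORDER BY g.id
"""


def genes_under(conn: sqlite3.Connection, term: str) -> List[Tuple[str, str, str]]:
    """Gene products annotated to `term` or to any of its is_a descendants."""
    return conn.execute(QUERY, (term,)).fetchall()


conn = build_demo_db()
print(genes_under(conn, "t:A"))  # → [('gp1', 'GENE1', 'human'), ('gp2', 'GENE2', 'mouse')]
```

Only the `triples` table participates in the recursive step; the species and symbol columns are fetched by a plain join, which mirrors keeping non-inferential data out of the RDF model.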
     GORouter is licensed under the Apache License, Version 2.0, and is accessible at http://www.scbit.org/gorouter/.
