Research on Methods for Integrating Bioinformatics Databases Based on Biomedical Ontologies
Abstract
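Analyzing and processing data from molecular biology experiments, especially the massive volumes produced by the high-throughput methods that have emerged in recent years, is a central task of bioinformatics. Methods from computer science are widely applied in this field, and molecular biology databases are where the two disciplines meet. As of 2009, more than 1,000 bioinformatics databases existed worldwide; they span every area of molecular biology and bioinformatics and hold complex and varied data types.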
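     Analyzing the content and structure of existing databases makes it possible to describe the current state of bioinformatics and to explore new research directions. For that purpose, a database network that can uncover the relationships among databases and the research behind them would be very valuable. At present, databases are only classified in a simple way by research field; no study that integrates and analyzes the content relationships among bioinformatics databases has yet appeared in the literature.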
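     The inherent complexity of biological knowledge makes it difficult to integrate directly into existing databases or molecular data. An ontology is a formal way of representing knowledge, describing what concepts mean and how they relate to one another. Labeling each concept in a biological ontology with a unique identifier allows those identifiers to be used to search molecular databases.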
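As a rough illustration of identifier-based lookup, the minimal Python sketch below inverts a database-to-concept annotation table so that one concept identifier retrieves every database annotated with it. The database names, the `ONT:` identifiers, and the function names are hypothetical placeholders, not taken from this work or from any real ontology.

```python
from collections import defaultdict

# Hypothetical annotations: database name -> ontology concept identifiers.
# Neither the names nor the identifiers come from a real resource.
ANNOTATIONS = {
    "SeqDB": {"ONT:0001", "ONT:0002"},
    "PathwayDB": {"ONT:0002", "ONT:0003"},
    "DiseaseDB": {"ONT:0003"},
}


def build_concept_index(annotations):
    """Invert database -> concepts into concept -> databases."""
    index = defaultdict(set)
    for database, concept_ids in annotations.items():
        for concept_id in concept_ids:
            index[concept_id].add(database)
    return dict(index)


def databases_for(index, concept_id):
    """Return the databases annotated with a concept, in alphabetical order."""
    return sorted(index.get(concept_id, set()))


index = build_concept_index(ANNOTATIONS)
print(databases_for(index, "ONT:0002"))
```

Because each concept has exactly one identifier, the lookup does not depend on how a database happens to spell or translate the concept's name.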
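     This work integrates a subset of existing bioinformatics database resources, analyzes the general characteristics of bioinformatics databases and the general process of bioinformatics research, and designs a content-based model for integrating bioinformatics databases. Concepts/terms are used to describe the content of each database; knowledge extracted from biomedical ontologies links the concepts into a biological concept network; and the bioinformatics database network is built on top of that concept network. Further distinguishing the types of relationships between concepts, including biological relationships, gives the database network biological meaning. Different relationship types are assigned different weights, which quantifies the relationships between databases, measures how closely databases in the network are related, and serves as the basis for bioinformatics database retrieval.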
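One way to picture the weighted database network is the Python sketch below. It is an illustration under stated assumptions, not the model actually used in this work: the relation weights, the toy concept edges, the database names, and the scoring rule (sum of pairwise concept similarities over direct relations only) are all invented for the example.

```python
from itertools import combinations, product

# Assumed weights per relation type; the actual values are not given in the abstract.
RELATION_WEIGHTS = {"is_a": 0.8, "part_of": 0.6, "regulates": 0.4}

# Toy concept network: (source concept, relation type, target concept).
CONCEPT_EDGES = [
    ("microRNA", "is_a", "RNA"),
    ("microRNA", "regulates", "gene expression"),
]

# Hypothetical databases, each described by a set of concepts.
DB_CONCEPTS = {
    "MirnaDB": {"microRNA"},
    "RnaDB": {"RNA"},
    "ExpressionDB": {"gene expression", "RNA"},
    "StructureDB": {"protein structure"},
}


def build_weight_lookup(edges, weights):
    """Map each unordered concept pair to its strongest relation weight."""
    lookup = {}
    for source, relation, target in edges:
        key = frozenset((source, target))
        lookup[key] = max(lookup.get(key, 0.0), weights[relation])
    return lookup


def concept_similarity(a, b, lookup):
    """1.0 for identical concepts, the relation weight for linked ones, else 0."""
    if a == b:
        return 1.0
    return lookup.get(frozenset((a, b)), 0.0)


def database_score(concepts_a, concepts_b, lookup):
    """Sum the similarity over every cross-database concept pair."""
    return sum(
        concept_similarity(a, b, lookup)
        for a, b in product(concepts_a, concepts_b)
    )


def build_database_network(db_concepts, lookup):
    """Keep only the database pairs whose score is positive."""
    network = {}
    for db1, db2 in combinations(sorted(db_concepts), 2):
        score = database_score(db_concepts[db1], db_concepts[db2], lookup)
        if score > 0:
            network[(db1, db2)] = score
    return network


lookup = build_weight_lookup(CONCEPT_EDGES, RELATION_WEIGHTS)
network = build_database_network(DB_CONCEPTS, lookup)
for pair, score in sorted(network.items()):
    print(pair, round(score, 2))
```

In this toy setup a database with no shared or related concepts (here `StructureDB`) stays disconnected, while databases linked through stronger relation types end up with higher edge scores.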
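     This work implements Bio-DB^2, a platform for integrating bioinformatics databases, which builds a content-based database network by integrating a subset of existing bioinformatics database resources. In practice, Bio-DB^2 also provides intuitive relation views that show the relationships between concepts and databases and between databases.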
Bioinformatics must process and analyze massive amounts of data from molecular biology, especially data generated by the high-throughput methods of recent years. Computer science and technology are widely used in this area, and molecular biology databases are the meeting point of the two disciplines. By 2009, more than 1,000 bioinformatics databases had been built around the world. These databases cover every subject in molecular biology and bioinformatics and store many different types of data.
     By analyzing the content and structure of these databases, we can therefore describe the current state of bioinformatics and explore new research directions. A database network that can mine the relationships among databases (and the related research) would be helpful here. Currently there are only lists of databases classified simply by research field, and no published work describes content-based database integration.
     Biological knowledge is inherently complex and so cannot readily be integrated into existing databases of molecular data. An ontology is a formal way of representing knowledge in which concepts are described both by their meaning and by their relationships to each other. Unique identifiers associated with each concept in biomedical ontologies can be used for linking to and querying molecular databases. In this work, we integrate some current biological database resources and design a database integration model by analyzing the common features of biological databases and the common research process of bioinformatics. We use concepts/terms to describe the content of every database, then extract knowledge from biological ontologies to associate concepts with one another. The database network is built on this concept network. We further distinguish the types of relations between concepts, such as inheritance, part-whole, and biological relations, which gives the Bio-DB network biological meaning. Different weights are assigned to the relation types in order to quantify the relationships between databases, and we develop retrieval based on the resulting relationship scores.
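Retrieval by relationship score could look roughly like the Python sketch below. The abstract does not say how scores combine across several relations, so multiplying weights along a path and keeping the best path is an assumption made here, as are the weights, the toy concept edges, and the database names.

```python
import heapq

# Assumed weights per relation type (illustrative values only).
RELATION_WEIGHTS = {"is_a": 0.8, "part_of": 0.6, "regulates": 0.4}

# Toy concept network: (source concept, relation type, target concept).
CONCEPT_EDGES = [
    ("microRNA", "is_a", "non-coding RNA"),
    ("non-coding RNA", "is_a", "RNA"),
    ("microRNA", "regulates", "gene expression"),
]

# Hypothetical databases, each described by a set of concepts.
DB_CONCEPTS = {
    "MirnaDB": {"microRNA"},
    "RnaDB": {"RNA"},
    "ExpressionDB": {"gene expression"},
    "StructureDB": {"protein structure"},
}


def concept_relatedness(query, edges, weights):
    """Best product of relation weights from the query to each reachable concept.

    Weights are at most 1, so a Dijkstra-style search that always expands
    the highest-scoring concept first finds the best path for every concept.
    """
    graph = {}
    for source, relation, target in edges:
        weight = weights[relation]
        graph.setdefault(source, []).append((target, weight))
        graph.setdefault(target, []).append((source, weight))

    best = {query: 1.0}
    heap = [(-1.0, query)]  # negate scores because heapq is a min-heap
    while heap:
        negative_score, concept = heapq.heappop(heap)
        score = -negative_score
        if score < best.get(concept, 0.0):
            continue  # stale entry; a better path was already found
        for neighbor, weight in graph.get(concept, []):
            candidate = score * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return best


def rank_databases(query, db_concepts, edges, weights):
    """Rank databases by their most closely related concept, best first."""
    related = concept_relatedness(query, edges, weights)
    scored = []
    for database, concepts in db_concepts.items():
        score = max(related.get(concept, 0.0) for concept in concepts)
        if score > 0:
            scored.append((database, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ranking = rank_databases("microRNA", DB_CONCEPTS, CONCEPT_EDGES, RELATION_WEIGHTS)
for database, score in ranking:
    print(database, round(score, 2))
```

Under these assumptions a database reached through two `is_a` steps still outranks one reached through a single, weaker `regulates` relation, and databases with no path to the query concept are left out of the result.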
     We developed Bio-DB^2, a database network that integrates current bioinformatics databases on the basis of their content. We also provide visual relation views that characterize the relations between concepts and databases and between databases.
