Research on a Bioinformatics Visualization System for Viral Genomes
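Abstract
With the initial completion of the Human Genome Project, the focus of bioinformatics research has quietly shifted from accumulating biological data to processing those data and extracting information from them. In DNA sequence research, however, traditional analytical methods rooted in algebra make little use of the human brain's powerful capacity for pattern recognition. Taking viral genomes as the research object, this thesis studies a general system model, the construction of sequence Z curves, and the building of a graphic database. On this basis, it builds a bioinformatics visualization system for Newcastle disease virus (NDV). The specific research content is as follows:
     1. Construction of nucleotide sequence Z curves. Based on an analysis of the spatial data points obtained by the Z transform, a bounding-box-based thinning scheme is proposed. The scheme adjusts the thinning factor dynamically so that the genetic information carried by a nucleotide sequence can be displayed at multiple levels of precision. Experiments show that the scheme supports both study of the whole genome sequence and detailed examination of individual important genes, offering a new approach to sequence homology comparison and molecular evolution research.
     2. A C/S-based model for bioinformatics visualization systems. After summarizing the general workflow of a bioinformatics visualization system, the thesis examines the traditional client/server (C/S) mode and proposes a C/S-based structural model for bioinformatics visualization. The model addresses the conversion, processing, display, and analysis of nucleotide sequence data.
     3. Establishment of the NDV bioinformatics database. The database is built around the structure of sequence files in different formats, the data structure of the spatial points produced by the Z transform, and the data structure of curve attributes. A graphic database of Z curves for NDV genome nucleotide sequences is built by vectorizing the curve plots; a nucleotide sequence database is then built; finally, the graphic database is linked to the sequence database.
     4. Implementation and application of the bioinformatics visualization system. This part focuses on the design and implementation of the C/S-based NDV bioinformatics visualization system, and applies the system to bioinformation display, visual analysis of complete genomes, and visual analysis of different gene sequences.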
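Item 1 above can be illustrated with a minimal sketch. The per-base steps follow the standard Z-curve definition (x separates purines from pyrimidines, y separates amino from keto bases, z separates weak from strong hydrogen bonding). The thinning rule is an assumption made for illustration only: the abstract does not spell out the bounding-box scheme, so `thin_by_bounding_box` and its `factor` parameter are hypothetical names, and the rule shown simply drops points that stay inside an axis-aligned box around the last kept point.

```python
# Per-base step along (x, y, z) in the standard Z-curve definition:
#   x_n = (A_n + G_n) - (C_n + T_n)
#   y_n = (A_n + C_n) - (G_n + T_n)
#   z_n = (A_n + T_n) - (C_n + G_n)
STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the cumulative Z-curve points P_0..P_N for a nucleotide string.

    RNA input is accepted by reading U as T. Symbols other than A/C/G/T
    (for example N) add a zero step; that handling is an assumption.
    """
    points = [(0, 0, 0)]
    x = y = z = 0
    for base in seq.upper().replace("U", "T"):
        dx, dy, dz = STEP.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def thin_by_bounding_box(points, factor):
    """Hypothetical thinning rule, not the thesis's exact scheme.

    A point is kept only when at least one coordinate differs from the
    last kept point by more than `factor`, i.e. when it leaves the
    axis-aligned box of half-width `factor` centred on that point.
    The first and last points are always kept.
    """
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:-1]:
        anchor = kept[-1]
        if any(abs(a - b) > factor for a, b in zip(p, anchor)):
            kept.append(p)
    if len(points) > 1:
        kept.append(points[-1])
    return kept
```

Under these assumptions, a larger `factor` gives a coarser curve suited to a whole-genome overview, while `factor = 0` keeps every point for base-level inspection of a single gene.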
With the completion of the Human Genome Project, the focus of bioinformatics research has shifted from collecting data to processing data and extracting information. However, in analyzing DNA sequences, traditional methods based on statistics cannot exploit the human brain's strength in pattern recognition. Taking the virus genome as the research object, this dissertation studies the general model, the construction of the Z curve, and the establishment of a graphic database. Finally, it discusses the issues involved in applying the model to build the NDV bioinformatics visualization system.
     1. Constructing the Z curve of a nucleotide sequence. By analyzing the distribution of the spatial points produced by the Z transform, a thinning algorithm based on bounding boxes is applied to construct the Z curve of a nucleotide sequence. By applying the algorithm and adjusting the thinning factor dynamically, the user can obtain a multi-precision Z curve corresponding to the nucleotide sequence. Experiments show that the algorithm is convenient both for general study of a complete genome sequence and for detailed study of specific genes, and that it provides a new approach to homology comparison and molecular evolution studies.
     2. A bioinformation visualization system model based on the C/S mode. To address the problems of data management, integration, and application in bioinformation visualization systems, this dissertation first summarizes the general workflow and then proposes a model based on the C/S mode. The model handles the transformation, processing, display, and analysis of nucleotide sequence data.
     3. Establishment of the NDV bioinformation database. The NDV bioinformation database is established according to the structure of sequence files in different formats, the structure of the spatial data points after the Z transform, and the data structure of curve attributes. By vectorizing the Z curve graph, a Z curve graphic database of the virus genome's nucleotide sequences is established; a nucleotide sequence database is then built. Finally, the graphic database is linked to the nucleotide sequence database.
     4. Implementation and application of the bioinformation visualization system. This part emphasizes the design and implementation of the NDV bioinformation visualization system based on the C/S mode. The system is used to display bioinformation and to carry out visual analysis of complete genomes and of different genome sequences.
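The link between the graphic database and the sequence database described in item 3 can be sketched as a small relational schema. Everything here is hypothetical: the table and column names (`sequence`, `curve`, `curve_point`, `accession`, `factor`) and the demo record are invented for illustration, and SQLite is used only so the sketch is self-contained; the abstract does not name the database engine or the actual schema.

```python
import sqlite3

# Hypothetical schema: one row per sequence, one row per Z curve drawn
# from it, and one row per stored curve point. The foreign key on
# curve.seq_id is what links the graphic data back to the sequence data.
SCHEMA = """
CREATE TABLE sequence (
    seq_id     INTEGER PRIMARY KEY,
    accession  TEXT UNIQUE NOT NULL,
    source_fmt TEXT NOT NULL,
    residues   TEXT NOT NULL
);
CREATE TABLE curve (
    curve_id INTEGER PRIMARY KEY,
    seq_id   INTEGER NOT NULL REFERENCES sequence(seq_id),
    factor   INTEGER NOT NULL
);
CREATE TABLE curve_point (
    curve_id INTEGER NOT NULL REFERENCES curve(curve_id),
    n        INTEGER NOT NULL,
    x        INTEGER NOT NULL,
    y        INTEGER NOT NULL,
    z        INTEGER NOT NULL,
    PRIMARY KEY (curve_id, n)
);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up sequence and its curve."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO sequence (accession, source_fmt, residues) VALUES (?, ?, ?)",
        ("DEMO0001", "fasta", "ACGT"),  # placeholder record, not real data
    )
    seq_id = cur.lastrowid
    cur = conn.execute(
        "INSERT INTO curve (seq_id, factor) VALUES (?, ?)", (seq_id, 0)
    )
    curve_id = cur.lastrowid
    # (n, x, y, z) Z-curve points for "ACGT", starting from the origin.
    points = [(0, 0, 0, 0), (1, 1, 1, 1), (2, 0, 2, 0), (3, 1, 1, -1), (4, 0, 0, 0)]
    conn.executemany(
        "INSERT INTO curve_point (curve_id, n, x, y, z) VALUES (?, ?, ?, ?, ?)",
        [(curve_id, *p) for p in points],
    )
    conn.commit()
    return conn


def sequence_for_curve(conn, curve_id):
    """Follow the link from a stored curve back to its source sequence."""
    return conn.execute(
        "SELECT s.accession, s.residues FROM curve AS c "
        "JOIN sequence AS s ON s.seq_id = c.seq_id WHERE c.curve_id = ?",
        (curve_id,),
    ).fetchone()
```

Keeping the thinning factor on the `curve` row is one possible design choice: it would let several curves of different precision point at the same sequence record.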
