Mathematical Modeling and Analysis of the F/10 and G/11 Xylanase Families
Abstract
Similarity analysis of gene sequences is one of the most active topics in bioinformatics. With the completion of the Human Genome Project, a large number of gene sequences have been sequenced, and similarity analysis of protein sequences has become increasingly complex and labor-intensive. New sequence comparison methods are therefore urgently needed. Graphical representation of gene sequences is an effective approach to studying sequence similarity. The main contributions of this thesis are as follows:
     1. Building on the chaos game representation (CGR) and the 4-line graphical representation (4-LGR) of DNA sequences, we propose a new graphical representation of DNA sequences, the matrix graphical representation (MGR). On the basis of these three DNA representations, we then construct the corresponding graphical representations of protein sequences under the detailed HP model and use them to compare the similarity of protein sequences.
     2. Under the detailed HP model, we combine the matrix graphical representation (MGR) of protein sequences with the idea of numerical characterization to propose a new method for comparing two protein sequences. By inspecting the numerical characterization plots of the protein sequences and computing the Euclidean distance d between two sequences, we analyze the similarity of protein sequences from the two xylanase families.
     3. Building on the quasi-amino-acid coding method of Shi Xiufan, Zhu Ping, et al., we compute two relative usage measures for the synonymous codons of the F/10 and G/11 xylanase families: RSCU and QRSCU. Analysis and comparison show that the quasi-amino-acid classification reveals the consistent preference among synonymous codons within a codon family more clearly. That is, under the quasi-amino-acid classification, the F/10 and G/11 xylanases prefer codons with strong codon-anticodon binding, which are precisely the codons ending in G/C. This conclusion agrees with the earlier preference study of 78 human genes, and it further confirms that codon preference is closely related to the quasi-amino-acid coding method.
     4. Using the chaos game representation (CGR) of DNA sequences proposed by Jeffrey in 1990, we plot the CGR of the gene sequences of the F/10 and G/11 xylanase families. We further give the corresponding transition probability matrix of the second-order Markov chain model and compute the relative usage of synonymous codons. The analysis shows that synonymous codon usage preference is closely related to the G/C content of the sequence and to molecular evolution.
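As a rough illustration of the CGR construction and the Markov transition estimate mentioned in item 4, the following Python sketch may help. It is not the thesis's code. The corner assignment for A, C, G, T is an assumed (commonly used) convention, the function names are hypothetical, and "second-order" is read here as conditioning on the two preceding bases, which is only one possible interpretation of the abstract.

```python
from collections import Counter, defaultdict

# Assumed corner assignment on the unit square (a common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return one CGR point per base.

    Starting from the centre of the unit square, each new point is the
    midpoint between the previous point and the corner of the current base.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError for non-ACGT symbols
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def second_order_transitions(seq):
    """Estimate P(next base | two preceding bases) from observed counts."""
    s = seq.upper()
    counts = defaultdict(Counter)
    for i in range(len(s) - 2):
        counts[s[i:i + 2]][s[i + 2]] += 1
    return {
        context: {base: n / sum(c.values()) for base, n in c.items()}
        for context, c in counts.items()
    }


if __name__ == "__main__":
    print(cgr_points("AG"))
    print(second_order_transitions("ACGACT"))
```

Plotting the points returned by `cgr_points` for a full gene sequence gives the CGR image; the transition dictionary can be arranged into a 16 x 4 matrix if a matrix form is preferred.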
     The results indicate that this work is meaningful and of practical value for future research in this area.
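The RSCU measure named in item 3 can be sketched as follows. This is a minimal example assuming the widely used definition of RSCU (the observed count of a codon divided by the mean count over all synonymous codons for the same amino acid) and the standard genetic code. The QRSCU variant is not reproduced, because the abstract does not define it; the function name `rscu` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Compute RSCU per codon for an in-frame coding sequence.

    Amino acids that do not occur in the sequence, and stop codons,
    are skipped. Codons not in the table (e.g. containing N) are ignored.
    """
    s = cds.upper()
    usable = len(s) - len(s) % 3  # drop a trailing partial codon
    counts = Counter(s[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        expected = total / len(codons)  # mean count per synonymous codon
        for c in codons:
            result[c] = counts[c] / expected
    return result


if __name__ == "__main__":
    for codon, value in sorted(rscu("GCTGCCGCCGCC").items()):
        print(codon, value)
```

A value above 1 indicates a codon used more often than expected under uniform synonymous usage, so a preference for G/C-ending codons would show up as values above 1 for those codons.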
