A Study on the Evolution of Human Tissue-Specific Genes
Abstract
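Using public database resources and the theory and methods of bioinformatics, this study compares the evolutionary features of human tissue-specific genes that act at different stages of regulatory pathways. It identifies differences in specific evolutionary features between human tissue-specific and non-tissue-specific genes. It also identifies similarities and differences among tissue-specific genes at different stages of signaling pathways, and explores the possible underlying mechanisms. In addition, from the perspective of coevolution, it points to a possible cause of the differences in coding-region evolutionary rates, offering a new angle for further study of the evolutionary mechanisms of these genes. The main results are as follows: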
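     (1) By systematically analyzing human tissue-specific genes and their mouse homologs, we calculated several measures of selective pressure: the synonymous substitution rate (Ks), the nonsynonymous substitution rate (Ka) and the evolutionary rate (Ka/Ks). Tissue-specific genes at different stages of regulatory pathways evolve at different rates. Compared with other tissue-specific genes, tissue-specific transcription factors (including Hox genes, non-Hox homeobox genes, HLH genes and zinc finger genes) have relatively low evolutionary rates. By contrast, genes for extracellular signals, membrane-bound signal receptors and neurotransmitters have markedly higher evolutionary rates.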
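     (2) We compared the synonymous substitution rates (Ks) of the different classes of tissue-specific genes. The comparison showed that differences in mutation rate are not the real cause of the differences in evolutionary rate. The stronger selective pressure on tissue-specific transcription factors, together with differences in function and expression pattern, are the more likely explanations.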
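     (3) Comparison with non-tissue-specific genes showed that not all human tissue-specific genes evolve faster. Non-tissue-specific genes involved in cell signaling and cell differentiation served as the reference. Relative to these genes, every class of tissue-specific genes has a significantly lower evolutionary rate, except neurotransmitter genes and membrane-bound signal receptor genes.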
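     (4) We further identified and analyzed transposable elements in the introns of the different genes. Genes with lower coding-region evolutionary rates also have lower densities of transposable elements in their introns. This suggests that coding regions and their corresponding introns may evolve in a coordinated way.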
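Result (1) relies on pairwise human–mouse estimates of Ks, Ka and Ka/Ks. The abstract does not name the estimator, so what follows is only a minimal Python sketch of one classic approach. It uses the Nei–Gojobori (1986) counting method with a Jukes–Cantor correction, applied to a single pair of aligned, in-frame coding sequences. The function names, the stop-codon convention and the toy sequences are assumptions made for illustration. They are not the study's procedure, and a real analysis would normally rely on established software run on codon alignments.

```python
import math
from itertools import permutations, product

# Standard genetic code; codons in TTT, TTC, TTA, TTG, TCT, ... order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_sites(codon):
    """Return (synonymous, nonsynonymous) sites for one codon.

    Each position contributes (number of synonymous single-base changes) / 3.
    Assumption: a change to a stop codon counts as nonsynonymous here;
    other conventions exist.
    """
    syn = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3.0 - syn


def codon_differences(c1, c2):
    """Return (synonymous, nonsynonymous) differences between two codons,
    averaged over every order in which the differing positions could change.
    Pathways through an intermediate stop codon are dropped when possible."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    all_paths, clean_paths = [], []
    for order in permutations(positions):
        s = n = 0
        current, hits_stop = c1, False
        for i in order:
            nxt = current[:i] + c2[i] + current[i + 1:]
            hits_stop = hits_stop or CODE[nxt] == "*"
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        all_paths.append((s, n))
        if not hits_stop:
            clean_paths.append((s, n))
    paths = clean_paths or all_paths
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor distance; infinite when p >= 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned codon-by-codon")
    S = N = Sd = Nd = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3].upper(), seq2[k:k + 3].upper()
        # Skip gapped or ambiguous codons and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        S += (s1 + s2) / 2  # average sites over the two sequences
        N += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    ks = jukes_cantor(Sd / S)
    ka = jukes_cantor(Nd / N)
    return ka, ks, (ka / ks if ks > 0 else math.nan)


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the study.
    human = "ATGCTTGCAAAAGGG"
    mouse = "ATGCTCGCAAAAGCG"
    print(ka_ks(human, mouse))
```

Averaging over mutational pathways is what separates this method from simply labeling each differing codon. When two codons differ at two or three positions, the order of the changes determines whether each step is synonymous.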
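     Taken together, these results give a preliminary picture of the evolutionary characteristics of human tissue-specific genes. First, relative to non-tissue-specific genes, not all human tissue-specific genes evolve faster. Second, different classes of human tissue-specific genes evolve at different rates. Third, coordinated evolution between coding regions and their corresponding introns may be a potential driver of this pattern. These findings lay a foundation for, and offer clues to, further exploration of the evolutionary mechanisms of these genes.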
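The abstract reports that several classes of tissue-specific genes have significantly lower Ka/Ks than non-tissue-specific genes, but it does not name the statistical test. One assumption-light option is a two-sided permutation test on the difference in mean Ka/Ks between two gene groups. The sketch below illustrates that technique only. It is not presented as the study's procedure, the function name is hypothetical, and the values in the usage block are toy numbers rather than study data.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in mean Ka/Ks.

    Returns (mean(group_a) - mean(group_b), p-value). Group labels are
    shuffled n_perm times; the +1 terms keep the p-value estimate above zero.
    """
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical Ka/Ks values, not results from the study.
    transcription_factors = [0.02, 0.03, 0.04, 0.05, 0.03]
    signaling_controls = [0.20, 0.25, 0.18, 0.30, 0.22]
    print(permutation_test(transcription_factors, signaling_controls))
```

Shuffling group labels makes no assumption about the shape of the Ka/Ks distribution, which is one reason to prefer it over a t-test for small or skewed gene sets.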
Based on public databases, we systematically investigate the evolutionary features of human tissue-specific genes that act at different stages of regulatory pathways. We use the theory and methods of bioinformatics and evolutionary genomics. The work shows that tissue-specific genes and control (non-tissue-specific) genes differ in their evolutionary characteristics. We further explore the factors that may drive the divergence in evolutionary rates. This work provides insight into the selection acting on human tissue-specific genes, as well as a new perspective for elucidating the evolutionary mechanisms of these genes.
     The main results are as follows:
     1) We systematically analyzed human tissue-specific genes and their mouse homologs. From this analysis we examined several measures of selective pressure: the synonymous substitution rate (Ks), the nonsynonymous substitution rate (Ka) and their ratio (Ka/Ks). The results indicate that different classes of human tissue-specific genes in the regulatory pathway evolve at different rates. Tissue-specific transcription factors (Hox genes, non-Hox homeobox genes, HLH genes and zinc finger genes) have lower average Ka/Ks values. Several other classes of tissue-specific genes share this pattern: signal transducers, nuclear receptors and neuroreceptors. By contrast, genes for extracellular signals, membrane-bound signal receptors, neurotransmitters and other effectors have higher Ka/Ks values.
     2) We calculated and compared Ks across all classes of human tissue-specific genes. The average synonymous substitution rate does not vary across classes to the same extent as the average nonsynonymous substitution rate. This indicates that differences in mutation rate are not the main cause of the differences in selective constraint. The causes more likely lie in differences in function and expression.
     3) Not all classes of human tissue-specific genes evolve quickly. We compared them with non-tissue-specific genes related to cell signaling and cell differentiation. Relative to these genes, all classes of tissue-specific genes have significantly lower Ka/Ks values, except neurotransmitter genes and membrane-bound signal receptor genes.
     4) We further identified and analyzed transposable elements in the introns of different genes. Genes with lower evolutionary rates in their coding regions have a lower density of transposable elements in their introns. This suggests that the evolution of coding regions may be correlated with that of the corresponding introns.
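Result 4 links coding-region evolutionary rate to intronic transposable-element (TE) density. The abstract does not say how density was normalized or how the association was measured. The sketch below therefore assumes that density is the TE count per kilobase of intron. It measures the association with Spearman's rank correlation, which is one reasonable choice rather than the documented method. The function names and the toy gene records are hypothetical.

```python
import math


def te_density(te_count, intron_length_bp):
    """Transposable elements per kilobase of intronic sequence (assumed normalization)."""
    return te_count / (intron_length_bp / 1000)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when one variable is constant
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical genes: (intronic TE count, total intron length in bp, coding Ka/Ks).
    genes = [(5, 10_000, 0.05), (20, 10_000, 0.15), (40, 8_000, 0.30)]
    densities = [te_density(n, length) for n, length, _ in genes]
    kaks = [r for _, _, r in genes]
    print(spearman(densities, kaks))
```

Rank-based statistics avoid assuming that either Ka/Ks or TE density is normally distributed. A positive rho here would correspond to the pattern described in result 4.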
     In conclusion, this study detects and analyzes the distinct evolutionary features of human tissue-specific genes and their potential driving forces. It provides a basis and new directions for deeper investigation of the evolution and divergence of tissue-specific genes.