Abstract
GC content is an important feature of nucleic acid sequence composition and can serve as an indicator of evolution. To explore the evolutionary pressure associated with GC content in the genome, this study examined the GC content of coding sequences in the genomes of three model microorganisms: the prokaryotes Escherichia coli and Bacillus subtilis and the eukaryote Saccharomyces cerevisiae. We also analyzed the relationship between the GC content of protein-coding sequences and their length. The GC content of coding sequences was correlated with the relative frequency of sequences, and the distribution of relative frequency over GC content followed a regular pattern. Based on the shape of the distribution curves, we hypothesized that this pattern conforms to a standard distribution and fitted several candidate distribution functions. The distribution of coding-sequence GC content against relative sequence frequency was consistent with a Gaussian distribution, and it differed significantly between the eukaryote and the prokaryotes. In addition, analysis of coding sequences of different lengths showed that GC content is correlated with sequence length.
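The abstract does not spell out the computational procedure, so the following is only a minimal Python sketch of the kind of analysis it describes: compute the GC content of each coding sequence, bin the values into a relative frequency distribution, and compare that distribution with a Gaussian. The function names, the bin width, and the toy sequences are all hypothetical, and the Gaussian parameters are plain maximum-likelihood estimates (sample mean and standard deviation), which may differ from the fitting procedure used in the study.

```python
import math
from collections import Counter

def gc_content(seq):
    """Return the GC fraction (0..1) of a sequence, counting only A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / total

def relative_frequency(values, bin_width=0.05):
    """Bin values and return {bin_start: fraction of values in that bin}."""
    # The small epsilon guards against float artifacts such as 0.3 / 0.1 < 3.
    counts = Counter(math.floor(v / bin_width + 1e-9) for v in values)
    n = len(values)
    return {round(k * bin_width, 10): c / n for k, c in sorted(counts.items())}

def fit_gaussian(values):
    """Maximum-likelihood Gaussian parameters: (mean, standard deviation)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Toy coding sequences (hypothetical, not taken from any real genome).
cds = ["ATGGCGCTGAAATAA", "ATGAAATTTGGGTAA", "ATGCCCGGGCGCTGA", "ATGTTTAAATATTAA"]
gc_values = [gc_content(s) for s in cds]
mu, sigma = fit_gaussian(gc_values)

bin_width = 0.1  # assumed bin width; the study's binning is not stated
for start, observed in relative_frequency(gc_values, bin_width).items():
    # Approximate the Gaussian mass in the bin by density at the bin centre.
    expected = gaussian_pdf(start + bin_width / 2, mu, sigma) * bin_width
    print(f"GC {start:.1f}-{start + bin_width:.1f}: observed {observed:.2f}, Gaussian {expected:.2f}")
```

With real data, `cds` would hold every annotated coding sequence of one genome, and the same comparison could be repeated per genome or per length class to look at the prokaryote/eukaryote and length effects mentioned above.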