Codon Pair Usage Bias in 478 Organisms and Its Correlation with Translation Efficiency
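Abstract
According to the central dogma, genetic information flows from DNA to mRNA and then from mRNA to protein. In the step from mRNA to protein, the information is carried as triplet codons. Each amino acid is encoded by at least one codon and by at most six. Codons that encode the same amino acid are called synonymous codons. Earlier studies of codon usage have shown that different species have clearly distinct codon preferences, and that genes with different functions within one species also differ considerably in codon usage bias. The 61 sense codons can form 3721 (61×61) different codon pairs. Early work on codon pair usage focused on model organisms such as Escherichia coli and showed that codon pairs are not used at random but with a certain bias. In recent years, as whole-genome sequences have become available for many organisms, codon pair studies have moved to the genome level. These genome-level studies further confirmed that codon pair bias is species-specific and distinct from codon usage bias, but its underlying cause remains unclear. Existing results indicate that codon pair usage is related to the translation efficiency of genes. It has been proposed that, during protein synthesis, the spatial structure formed by ribosomal proteins and the codon-anticodon pairs at the ribosomal P and A sites affects the accuracy and rate of translation, and that the stability of this structure is the main cause of codon pair bias.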
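     Bioinformatic analysis of codon pair bias is an important part of research on gene expression, translation efficiency and genome evolution. So far, such work has concentrated on the average codon usage bias of a single gene or of all genes in a genome. Recent results have clearly shown that the speed at which the ribosome translates a gene differs between regions of the same gene. Is there a regular pattern in the order of codon pairs along a gene sequence? Is such a pattern related to the translation rate in different regions of the gene? Is that relationship an important factor shaping codon pair bias? These are challenging questions in bioinformatics and genomics, yet they have not been studied so far. Using the theory and techniques of genomics and bioinformatics, we wrote a number of computer programs in Java, Python and R for the individual research topics. With them we analyzed, at the whole-genome level, how codon pair bias changes across different regions of gene sequences in 478 organisms covering Bacteria, Archaea and Eukarya, and then examined how these trends relate to translation efficiency. The aim is to reveal the evolutionary factors behind nonrandom codon pair usage and to provide a stronger theoretical basis for research on gene expression and translation efficiency. To this end we carried out the following studies: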
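     1. Genome-level analysis of codon pair usage bias in 478 organisms
     The aim of this study was to analyze, at the genome level, the combination patterns of the 3721 codon pairs in all protein-coding sequences (coding sequence, CDS) of 478 organisms, in order to find codon pair usage rules that are common across organisms.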
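     We obtained from NCBI and UCSC the CDS sequences of human (Homo sapiens), mouse (Mus musculus), rat (Rattus rattus), cow (Bos taurus), fruit fly (Drosophila melanogaster), nematode (Caenorhabditis elegans), budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe), Escherichia coli, 10 other fungi (Fungi), and 461 bacteria and archaea. For this study we wrote several programs in Java, Python and the statistical language R to count codon pair frequencies at the genome level, and built a corresponding local database with MySQL. For each of the 478 organisms we calculated the codon pair score (CPS) of each of the 3721 codon pairs. The higher the CPS of a codon pair, the more strongly that pair is preferred in the genome. Based on the CPS values, we first analyzed the codon pair bias (CPB) of individual CDS sequences in nine model organisms: human, rat, mouse, cow, fruit fly, nematode, budding yeast, fission yeast and E. coli. The CPB of a CDS is the arithmetic mean of the CPS values of all codon pairs in that sequence. The results show that in these nine model organisms the 3721 codon pairs are used with a strong bias. For example, the mean CPB of the 17,635 CDS sequences in the human genome is 0.075, shifted toward positive values.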
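     Using the CPS values of the 3721 codon pairs in a genome, we built for every CDS a codon pair bias profile (CPS profile) that follows the order of codon pairs along the sequence. For each organism, we aligned the CPS profiles of all CDS sequences in its genome from the 5' end and from the 3' end separately, and averaged the CPS values at each codon pair position of the alignment, which gives the genome-wide averaged CPS profile of that organism. Analysis of these averaged profiles shows that 441 of the 478 organisms display a similar trend: at the genome level, codon pair bias is generally low at the 5' end of the CDS and rises gradually from the 5' end toward the 3' end. We call this pattern the 'codon pair ramp'.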
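     To determine the length of the codon pair ramp in different genomes, we further analyzed the averaged CPS profile of each organism with a sliding window method. The first 120 codon pairs of the averaged profile were divided evenly into 12 sliding windows of 10 consecutive codon pairs each. With the Kolmogorov-Smirnov test we compared the mean CPS of each window with the mean CPS of the first 120 codon pairs, and defined the ramp length as the position of the sliding window at which the P value of the test exceeds 0.05. With this algorithm we found that 441 of the 478 organisms have a codon pair ramp, located between the 20th and the 50th codon pair of the CDS (named the head codon pair ramp), that is, between the first 60 and 150 bases of the CDS. For example, in human CDS sequences the first 40 codon pairs form the head ramp; the mean CPS of this region is 0.067, 7% lower than the mean CPS of the first 120 codon pairs (0.072), whereas the mean CPS from the 50th to the 120th codon pair is 0.076, 6% higher than that of the first 120 codon pairs. The Kolmogorov-Smirnov analysis also shows that the codon pair ramp is widespread in eukaryotes, bacteria and archaea; it is species-specific but does not differ systematically between taxonomic groups.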
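     To further confirm the existence of the codon pair ramp, we calculated the CPB of the first 40 codon pairs of every CDS in a genome and compared it with the CPB of the whole CDS. A paired t-test shows that the CPB of the first 40 codon pairs is highly significantly lower than the CPB of the full sequence (paired t-test, P<2.2E-16). For example, in the human genome the mean CPB of the first 40 codon pairs is 0.066, highly significantly lower than the mean CPB of all CDS sequences (0.075) (paired t-test, P<2.2E-16).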
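     From the genome-wide averaged CPS profiles we also found that, among the 478 organisms, a codon pair ramp is additionally present in the last 120 codon pairs of the CDS in 413 organisms, such as human, rat, mouse, cow, fruit fly, nematode and E. coli (named the tail codon pair ramp), whereas in the remaining 69 organisms, such as budding yeast and fission yeast, we found no codon pair ramp in the last 120 codon pairs. In addition, among the 413 organisms with a ramp in both the first and the last 120 codon pairs, 375 have a head ramp that is longer than the tail ramp.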
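     2. Comparison of genomic and random codon pair bias profiles
     To further show that the codon pair ramp is not a random effect but an intrinsic feature of genomes, we wrote an R program, using the Seqinr package (http://seqinr.r-forge.r-project.org/), that generates random CDS sequences. Using codon randomization and synonymous codon randomization, we generated two sets of random sequences (50 random sequences per set) for every CDS in the genomes of three model organisms: human, E. coli and budding yeast. Codon randomization keeps the usage frequencies of the 61 sense codons of the original sequence unchanged and only randomly changes the order of codon pairs along the CDS; synonymous codon randomization keeps both the codon usage frequencies and the encoded amino acid sequence unchanged and only randomly changes the order of codon pairs. For example, for the 17,635 CDS sequences of the human genome, each method produced 881,750 random CDS sequences. From the two sets of random sequences we obtained two random codon pair bias profiles (a codon randomization profile and a synonymous codon randomization profile) for each of human, E. coli and budding yeast. In the random profiles the mean CPS values are all negative, which indicates that the codon pairs appearing in the random sequences are rarely used in the original genome; it also indicates that codon pairs in the original genome do not occur at random, that is, codon pair bias is species-specific and an intrinsic feature of the genome. Moreover, the random profiles show no codon pair ramp in either the first or the last 120 codon pairs. This also demonstrates that the ramp found in the original genomes is an intrinsic biological feature rather than the result of a random arrangement of codon pairs.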
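     3. Correlation between the codon pair ramp and translation efficiency
     Previous studies have shown that the codon pair bias of a gene affects its translation efficiency. The aim of this study was to use bioinformatic methods to examine, at the genome level, the correlation between codon pair bias and translation efficiency, and in particular between the codon pair ramp and translation rate. We used the tRNA adaptation index (tAI) as a measure of translation rate. The tAI of a gene expresses how well the gene is adapted to the genome-wide tRNA pool; the higher the tAI, the higher the translation rate of the gene.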
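     We wrote several Java and Python programs to calculate tAI at the genome level and computed the tAI of every CDS in the genomes of nine model organisms (human, rat, mouse, cow, nematode, fruit fly, budding yeast, fission yeast and E. coli). Spearman correlation analysis shows that in these nine organisms the CPB of a CDS is significantly correlated with its tAI. For example, for the 17,635 human CDS sequences the Spearman correlation coefficient between CPB and tAI is 0.298 (P<2.2E-16). This result indicates that translation rate is an important factor shaping the codon pair bias of genes.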
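     Next, we compared at the genome level the genome-wide averaged translation rate profile (averaged tAI profile) with the averaged codon pair bias profile (averaged CPB profile) of the nine model organisms. Within the head ramp region of the CDS in human, cow, nematode, fruit fly, fission yeast and E. coli, the two profiles are strongly correlated, that is, the trend of the mean CPS over the first 40 codon pairs follows the trend of the mean tAI. For example, in the human genome the correlation reaches 0.651 (Spearman test, P<9.177E-06). Outside the ramp region we found no such correlation: in human, the Spearman correlation coefficient between CPS and tAI outside the ramp is -0.032 (P=0.776). We also found no such correlation within the ramp of rat, mouse and budding yeast (Spearman test, P>0.05), although for the first 120 codon pairs (i.e. the first 450 bases) of budding yeast CDS sequences the mean CPB and mean tAI show some correlation (Spearman test, ρ=0.242, P=0.0078). These results indicate that within the codon pair ramp, codon pair bias is closely related to translation rate; non-preferred codon pairs slow translation and thereby affect early elongation. The results also support the view that the rate-limiting steps of gene expression are translation initiation and early elongation.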
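     4. Correlation between the codon pair ramp and the expression level of green fluorescent protein genes in E. coli
     The aim of this study was to compare the codon pair bias of 154 synthetic green fluorescent protein (GFP) genes expressed in E. coli with their expression levels, in order to find support for our conclusions in published experimental results.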
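     Plotkin and colleagues provided us with the DNA sequences and corresponding expression data of the 154 synthetic GFP genes from their 2009 paper in Science. Using our existing Java and Python programs we analyzed the CPB of these 154 GFP genes. Their mean CPB is -0.098, lower than the mean of endogenous E. coli genes (0.077). Because the codon pairs in these synthetic genes are arranged at random, we found no codon pair ramp in them. Correlation analysis shows that the CPB of these genes is not correlated with their expression level (Spearman test, ρ=-0.106, P>0.19). When only the first 40 codon pairs of the 154 GFP genes are considered, the CPB of the first 40 codon pairs is significantly correlated with expression level (Spearman test, ρ=-0.256, P<0.01). More interestingly, when only the 37 genes (25%) with the highest CPB in the first 40 codon pairs are considered, CPB is significantly correlated with expression level (Spearman test, ρ=0.514, P<0.01). These experimental results support the conclusion of our bioinformatic analysis: it is the local codon pair bias of a gene sequence, rather than the codon pair bias of the whole gene, that is closely related to expression level.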
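     In summary, this study used the theory and methods of bioinformatics and genomics to analyze the trends of genome-wide codon pair bias in 478 organisms. We found a codon pair ramp in the CDS sequences of 441 organisms: codon pair bias is generally low at the 5' end of the CDS and rises gradually from the 5' end toward the 3' end. This pattern is widespread in eukaryotes, bacteria and archaea; it is species-specific but does not differ systematically between taxonomic groups. Our results also show that within the codon pair ramp, codon pair bias is closely related to translation speed; non-preferred codon pairs slow translation and thereby affect early elongation. Analysis of experimental data published by other researchers supports this conclusion.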
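     Based on these results, we believe that the nucleotide sequence within the translation initiation region carries a large amount of information that strongly influences translation initiation and early elongation. The bioinformatics programs written for this study are all freely available for download, which lays a foundation for further work. The results are of value for understanding how codon pair bias affects gene expression, how specific signals in the one-dimensional information of a gene sequence affect protein function, and evolution between species, and they provide a theoretical basis and new methods for further research in this area.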
It has long been known that, in most species, synonymous codons are used with different frequencies (codon bias) and that the order in which codons are used within a protein-coding sequence is far from random. There are 61 sense codons and therefore 3721 possible codon pairs (excluding pairs that contain a stop codon). Earlier studies established that codon pair usage in a given genome is also nonrandom; this codon pair bias (CPB) is species-specific and independent of codon bias. It is still not clear why some codon pairs are used more frequently than others. Previous experimental analyses suggest that translation may be a selective force on codon pair preference within coding sequences: the fit of the tRNAs in the ribosomal A and P sites may influence translation efficiency, so codon pair bias may have a component dictated by tRNA properties rather than by codon properties alone.
     Analysis of codon pair usage in different organisms, and its application in bioinformatics and evolutionary studies, is important for investigating gene expression and genome evolution. CPB has been applied to individual genes or whole genomes to measure codon pair bias, but never codon pair by codon pair over an entire transcriptome. In this study, using methods from genomics and bioinformatics, we carried out the following work:
     1. Codon-pair-by-codon-pair analysis of codon pair bias across the transcriptomes of 478 organisms
     The aim of this part is to analyze codon pair bias, codon pair by codon pair, across all coding sequences (CDS) of 478 organisms from all three domains of life and to look for general rules of codon pair usage. Consensus coding sequences (CCDS) for Homo sapiens (human) and Mus musculus (mouse), as well as coding sequences (CDS) for Rattus rattus (rat), Bos taurus (cow), Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli and other organisms, were downloaded from NCBI and UCSC.
     We developed several programs in Java, Python and R to carry out the genome-wide analysis. With these programs we computed the codon pair score (CPS) for each of the 3721 possible codon pairs. The CPS of a codon pair is the natural logarithm of the ratio of its observed to its expected frequency over all coding sequences in a given genome. Positive and negative CPS values correspond to statistically over- and under-represented codon pairs, respectively. The codon pair bias (CPB) of an entire CDS with N codons (not including the stop codon) was then calculated as the arithmetic mean of its individual CPS values, and the CPS profile of the i-th CDS in a genome is the vector of all its CPS values. For a particular species, the 5' and 3' portions of all CDSs were aligned by their start and stop positions, respectively, and averaged head (first 120 codon pairs) and tail (last 120 codon pairs) CPS profiles were calculated by taking the mean of the CPS values at each codon pair position in the alignment.
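     The CPS and CPB definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the programs written for the study. The text does not say how the expected frequency is obtained, so the sketch assumes the simplest model, in which neighbouring codons are independent; the function names are hypothetical.

```python
import math
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(cds):
    """Split a CDS into codons, dropping a trailing stop codon if present."""
    trip = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if trip and trip[-1] in STOP_CODONS:
        trip = trip[:-1]
    return trip

def codon_pair_scores(cds_list):
    """CPS = ln(observed / expected) for every codon pair seen in the genome.

    Assumption: the expected count treats adjacent codons as independent,
    i.e. expected = f(A) * f(B) * total number of codon pairs.
    """
    codon_counts, pair_counts = Counter(), Counter()
    for cds in cds_list:
        cs = codons(cds)
        codon_counts.update(cs)
        pair_counts.update(zip(cs, cs[1:]))
    n_codons = sum(codon_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), observed in pair_counts.items():
        expected = (codon_counts[a] / n_codons) * (codon_counts[b] / n_codons) * n_pairs
        scores[(a, b)] = math.log(observed / expected)
    return scores

def cps_profile(cds, scores):
    """Vector of CPS values along one CDS, in codon pair order."""
    cs = codons(cds)
    return [scores[p] for p in zip(cs, cs[1:]) if p in scores]

def cpb(cds, scores):
    """Codon pair bias of one CDS: arithmetic mean of its CPS values."""
    profile = cps_profile(cds, scores)
    return sum(profile) / len(profile)

# Toy "genome" of two short sequences, for illustration only.
toy_genome = ["ATGAAAAAAGGGTAA", "ATGAAAGGGAAATAA"]
toy_scores = codon_pair_scores(toy_genome)
toy_cpb = cpb(toy_genome[0], toy_scores)
```

     A different expected-frequency model, for instance one that also normalises by amino acid pair frequencies, would change only the `expected` line.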
     We calculated the total CPB of each CDS in the human, mouse, rat, cow, D. melanogaster, C. elegans, S. cerevisiae, S. pombe and E. coli genomes. In the human genome, the CPB distribution of a set of 17,635 CDSs is shifted towards positive values, with a mean score of 0.075.
     We next inspected the averaged head and tail CPS profiles of all CDSs in a given genome. Remarkably, in nearly all species examined (441 out of 478) the CPS values are relatively low near the 5' end of the mRNA and increase rapidly with distance from the start codon. We call this effect a 'codon pair ramp'.
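     Averaging per-CDS profiles by position can be sketched as follows. The handling of CDSs shorter than 120 codon pairs is an assumption here (such sequences simply stop contributing beyond their own length), and the function names are hypothetical.

```python
def averaged_head_profile(profiles, length=120):
    """Mean CPS at each codon pair position, counted from the start codon.

    Assumption: a CDS shorter than `length` contributes only to the
    positions it actually covers.
    """
    averaged = []
    for pos in range(length):
        column = [p[pos] for p in profiles if len(p) > pos]
        if not column:
            break
        averaged.append(sum(column) / len(column))
    return averaged

def averaged_tail_profile(profiles, length=120):
    """Same averaging, aligned at the stop codon: index 0 is the last codon pair."""
    return averaged_head_profile([p[::-1] for p in profiles], length)

# Two toy CPS profiles of different length.
toy_profiles = [[0.0, 1.0, 2.0], [2.0, 3.0]]
head = averaged_head_profile(toy_profiles)
tail = averaged_tail_profile(toy_profiles)
```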
     In order to determine the typical length of the codon pair ramp we smoothed the averaged CPS profiles by calculating a mean value of the CPS profile within a sliding window of 10 codon pairs in length. The length of the ramp was then defined as the region in which the mean CPS value is significantly lower (Kolmogorov-Smirnov Test, P-value<0.05) than the mean of all 12 sliding windows. We found that the length of the codon pair ramp is about 20 to 50 codon pairs in almost all species examined. In the human genome the length of the codon pair ramp is 40 codon pairs, and the average CPS value in this region is 0.067, ~7% lower than the mean value of the first 120 codon pairs which is 0.072. By contrast, the average CPS value in the region between the 50th and the 120th codon pair is 0.076, ~6% higher than the mean value of first 120 codon pairs.
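     The windowing step can be sketched as below. The sketch only computes the window means and their deviation from the mean of the first 120 positions; the Kolmogorov-Smirnov significance test used in the study is deliberately left out. Non-overlapping windows are an assumption based on the stated 12 windows of 10 codon pairs each, and the function names are hypothetical.

```python
def window_means(profile, window=10, n_windows=12):
    """Mean CPS in consecutive, non-overlapping windows of the averaged profile."""
    means = []
    for w in range(n_windows):
        chunk = profile[w * window:(w + 1) * window]
        if len(chunk) < window:
            break
        means.append(sum(chunk) / window)
    return means

def window_deviation(profile, window=10, n_windows=12):
    """Difference between each window mean and the mean of the whole region."""
    region = profile[:window * n_windows]
    overall = sum(region) / len(region)
    return [m - overall for m in window_means(profile, window, n_windows)]

# Toy averaged profile: a low first window followed by a flat plateau.
toy_profile = [0.0] * 10 + [1.0] * 110
toy_means = window_means(toy_profile)
toy_dev = window_deviation(toy_profile)
```

     In this toy profile only the first window lies below the regional mean, which is the kind of pattern a ramp would produce; deciding where the ramp ends still requires a significance test on real data.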
     We calculated the CPB value of the first 40 codon pairs for each individual CDS in a given genome. While the average CPB value for all CDS in human is 0.075, the mean value for the first 40 codon pairs of all CDS is 0.066, and the CPB value of the first 40 codon pairs in each CDS is significantly lower than the CPB value of the entire sequence (Paired t-test, p-value < 2.2e-16).
     We also found lower CPS values in the tail part (the last 120 codon pairs) of coding sequences in 413 of the 478 species studied, such as human, mouse, rat, cow, D. melanogaster, C. elegans and E. coli, while in 69 of the 478 species, such as S. cerevisiae, S. pombe and A. fumigatus, no such tail ramp appears to exist. In 375 of the 413 species that possess both a head and a tail codon pair ramp, the head ramp is longer than or equal in length to the tail ramp.
     2. Comparing wild-type and randomized codon pair profiles
     To verify that the observed codon pair profile is not a trivial consequence of lining up all CDSs in a given genome by their start/stop codons, we wrote a program in R, using the Seqinr package (http://seqinr.r-forge.r-project.org/), to generate random sequences for each CDS in E. coli, human and S. cerevisiae. With this program we randomly shuffled each CDS in a given genome. The shuffling was done by two alternative methods: a) random permutation of the codons occurring in a CDS while preserving the exact count of each codon (codon randomization), and b) random selection of synonymous codons for each amino acid while preserving the amino acid sequence and codon usage of the CDS (synonymous codon randomization). Both procedures were repeated 50 times, and the averaged CPS profiles of the random sequences of a given species were produced using the CPS value of each codon pair from the wild-type genome.
     The average CPS values in these two profiles are negative, which means that the codon pairs in the random sequences are statistically under-represented compared with those in the wild-type sequences. Such negative values are expected because codon pair usage in wild-type sequences is not random: not every combination of two codons is used as frequently in wild-type sequences as in random sequences. Moreover, while the wild-type coding sequences of a given genome show a codon pair ramp near the 5' end of the mRNA, the randomized sequences do not, indicating that the observed profiles are not a trivial consequence of lining up all CDSs by their start/stop codons.
     3. Analysis of the correlation between the CPB ramp and translation speed
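     The two shuffling schemes can be sketched in Python as follows (the study itself used R with Seqinr). The sketch assumes the standard genetic code and shuffles every codon, including the first one; whether the start codon was held fixed in the study is not stated. The function names are hypothetical.

```python
import random
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def codon_randomization(codon_list, rng):
    """Permute all codons of a CDS; the count of each codon is preserved."""
    shuffled = codon_list[:]
    rng.shuffle(shuffled)
    return shuffled

def synonymous_codon_randomization(codon_list, rng):
    """Permute codons only among positions that encode the same amino acid.

    The amino acid sequence and the count of each codon are both preserved.
    """
    pools = defaultdict(list)
    for c in codon_list:
        pools[CODON_TABLE[c]].append(c)
    for pool in pools.values():
        rng.shuffle(pool)
    return [pools[CODON_TABLE[c]].pop() for c in codon_list]

rng = random.Random(0)  # fixed seed so the example is reproducible
original = ["ATG", "AAA", "AAG", "CTG", "CTT", "AAA", "GGG"]
shuffled_any = codon_randomization(original, rng)
shuffled_syn = synonymous_codon_randomization(original, rng)
```

     Scoring the shuffled sequences with the CPS table of the unshuffled genome, and repeating the shuffle 50 times per CDS, gives the randomized profiles described above.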
     The aim of this research is to analyze the correlation between codon pair usage and translation speed, especially for the CPB ramp region.
     Using several Java and Python programs written for this study, we compared the tRNA adaptation index (tAI) with the CPB of each CDS in human, mouse, rat, cow, D. melanogaster, C. elegans, S. cerevisiae, S. pombe and E. coli. The tAI of a given transcript reflects its adaptation to the tRNA pool of the genome. tAI is a number between 0 and 1, with higher values corresponding to higher translation speed.
     A significant, albeit weak, positive correlation between these two values was indeed found in human (Spearman's ρ=0.298, P<2.2e-16) and in other species, which suggests that one force shaping codon pair bias is optimization of translation speed through adaptation to the tRNA pool.
     We also calculated an averaged tAI profile for each codon pair position in a given genome. Here, the tAI of a codon pair was calculated as the geometric mean of the tAI values of its two codons. For all CDSs in a given genome, we compared the average CPS value at each codon pair position along the coding sequences with the average tAI value at the same position. We observed a strong positive correlation between the average CPS and average tAI values per codon pair position in the codon pair ramp regions of human, cow, D. melanogaster, C. elegans, S. pombe and E. coli. For example, in human the CPS profile has a strong and significant correlation (Spearman's ρ=0.651, P<9.177E-06) with the translation speed profile over the first 40 codon pairs. However, no significant correlation was found between CPS and tAI values for the 40th to 120th codon pairs in human (Spearman's ρ=-0.032, P=0.776). In mouse, rat and S. cerevisiae we did not find any correlation between CPB and tAI in the ramp region. However, in S. cerevisiae, when considering the first 120 codon pairs, we found a weak but significant positive CPB/tAI correlation (Spearman's ρ=0.242, P=0.0078).
     The tight connection between codon pair bias and translation speed in the codon pair ramp region suggests that under-represented codon pairs slow down early elongation steps and thereby reduce the rate of translation in the vicinity of the translation initiation region. These findings are also consistent with the notion that translation initiation or early elongation, and not the global elongation rate, is rate-limiting for gene expression. Interestingly, however, in mouse and rat we did not find any correlation between CPS and tAI. We speculate that in these organisms selection to promote mRNA stability, rather than translational selection, may affect codon pair preference as well as codon usage.
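     The per-pair tAI and the rank correlation between two positional profiles can be sketched with the standard library alone. This is an illustrative sketch: the Spearman coefficient is computed as the Pearson correlation of average ranks, no P value is calculated, and the function names are hypothetical.

```python
import math

def pair_tai(tai_a, tai_b):
    """tAI of a codon pair: geometric mean of the tAI values of its two codons."""
    return math.sqrt(tai_a * tai_b)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Toy positional profiles (made-up numbers, for illustration only).
toy_cps = [0.01, 0.03, 0.02, 0.05]
toy_tai = [0.20, 0.30, 0.25, 0.40]
toy_rho = spearman_rho(toy_cps, toy_tai)
```

     On real data the two input vectors would be the averaged CPS and averaged tAI values at the first 40 codon pair positions of a genome.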
     4. The effect of codon pair usage on the translation of GFP genes
     In this part, we used the sequences of 154 green fluorescent protein (GFP) genes to test the effect of codon pair usage on translation. The sequences of 154 genes that varied randomly in their codon usage but encoded the same GFP, together with normalized fluorescence levels for pGK8 (T7 promoter, no leader sequence) reflecting their expression levels in E. coli, were obtained from the work of Kudla et al.
     The average CPB of these genes is -0.098, lower than that of endogenous E. coli genes (0.077). As expected, we found no codon pair ramp in these data, nor was the CPB of each complete gene sequence significantly correlated with fluorescence levels (Spearman's ρ=-0.106, P>0.19). However, when considering only the first 40 codon pairs of each sequence (average CPS=-0.112), we found a significant negative correlation (Spearman's ρ=-0.256, P<0.01) between CPB and fluorescence levels. Furthermore, in the 25% of GFP constructs with the highest CPB values in the first 40 codon pairs (37 constructs), fluorescence levels correlate significantly and strongly (Spearman's ρ=0.514, P<0.01) with the codon pair bias of the first 40 codon pairs. These results suggest that the expression level of a gene is related to its local codon pair usage rather than to its global codon pair usage.
     In summary, we developed several programs in Java, Python and R and completed a broad, codon-pair-by-codon-pair survey of codon pair bias near the translation initiation region of all protein-coding sequences in 478 organisms from all three domains of life. We found that in nearly all species CPB is reduced near the 5' end of protein-coding sequences and increases with distance from the start codon, a pattern we refer to as the "codon pair ramp". The ramp consists of the first 20 to 50 codon pair positions of the protein-coding sequence, where codon pair bias is relatively low. The strong connection we found between codon pair bias and translation speed supports an important role for the nucleotide sequence near the 5' end of mRNAs in controlling early elongation. The source code of all programs developed and used in this study is freely available. The statistical evidence presented in this work remains to be verified and explained experimentally, which would further our knowledge of how information stored in DNA sequences determines diverse cellular processes.
References
(2009a). The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res 37, D169-174.
    (2009b). The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res.
    Aitken, C.E., Petrov, A., and Puglisi, J.D. (2010). Single Ribosome Dynamics and the Mechanism of Translation. Annual Review of Biophysics 39, 491-513.
    Akashi, H. (1996). Molecular evolution between Drosophila melanogaster and D-simulans: Reduced codon bias, faster rates of amino acid substitution, and larger proteins in D-melanogaster. Genetics 144, 1297-1307.
    Akashi, H. (2003). Translational selection and yeast proteome evolution. Genetics 164, 1291-1303.
    Alff-Steinberger, C. (2000). A comparative study of mutations in Escherichia coli and Salmonella typhimurium shows that codon conservation is strongly correlated with codon usage. J Theor Biol 206, 307-311.
    Antequera, F., and Bird, A. (1993). Number of CpG islands and genes in human and mouse. Proc Natl Acad Sci U S A 90, 11995-11999.
    Archetti, M. (2004). Codon usage bias and mutation constraints reduce the level of error minimization of the genetic code. Journal of Molecular Evolution 59, 258-266.
    Argos, P., Rossman, M.G., Grau, U.M., Zuber, H., Frank, G., and Tratschin, J.D. (1979). Thermal stability and protein structure. Biochemistry 18, 5698-5703.
    Auewarakul, P. (2005). Composition bias and genome polarity of RNA viruses. Virus Research 109, 33-37.
    Bahir, I., Fromer, M., Prat, Y., and Linial, M. (2009). Viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences. Mol Syst Biol 5, 311.
    Ban, N., Nissen, P., Hansen, J., Moore, P.B., and Steitz, T.A. (2000). The complete atomic structure of the large ribosomal subunit at 2.4 A resolution. Science 289, 905-920.
    Barrai, I., Scapoli, C., Barale, R., and Volinia, S. (1990). Oligonucleotide correlations between infector and host genomes hint at evolutionary relationships. Nucleic Acids Res 18, 3021-3025.
    Bastien, N., Chui, N., Robinson, J.L., Lee, B.E., Dust, K., Hart, L., and Li, Y. (2007). Detection of human bocavirus in Canadian children in a 1-year study. J Clin Microbiol 45, 610-613.
    Begun, D.J. (2001). The frequency distribution of nucleotide variation in Drosophila simulans. Molecular Biology and Evolution 18, 1343-1352.
    Berg, O.G., and Silva, P.J.N. (1997). Codon bias in Escherichia coli: The influence of codon context on mutation and selection. Nucleic Acids Research 25, 1397-1404.
    Bernardi, G. (1993). The vertebrate genome: isochores and evolution. Mol Biol Evol 10, 186-204.
    Bernardi, G. (2000). Isochores and the evolutionary genomics of vertebrates. Gene 241, 3-17.
    Bernardi, G. (2007). The neoselectionist theory of genome evolution. Proc Natl Acad Sci U S A 104, 8385-8390.
    Bernardi, G., Olofsson, B., Filipski, J., Zerial, M., Salinas, J., Cuny, G., Meunier-Rotival, M., and Rodier, F. (1985). The mosaic genome of warm-blooded vertebrates. Science 228, 953-958.
    Bertram, G., Innes, S., Minella, O., Richardson, J.P., and Stansfield, I. (2001). Endless possibilities: translation termination and stop codon recognition. Microbiology 147, 255-269.
    Blencowe, B.J. (2000). Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci 25, 106-110.
    Bohlin, J., and Skjerve, E. (2009). Examination of Genome Homogeneity in Prokaryotes Using Genomic Signatures. PLoS ONE 4, -.
    Bollenbach, T.J., and Stern, D.B. (2003). Secondary structures common to chloroplast mRNA 3'-untranslated regions direct cleavage by CSP41, an endoribonuclease belonging to the short chain dehydrogenase/reductase superfamily. J Biol Chem 278, 25832-25838.
    Boycheva, S., Chkodrov, G., and Ivanov, I. (2003). Codon pairs in the genome of Escherichia coli. Bioinformatics 19, 987-998.
    Brockmann, R., Beyer, A., Heinisch, J.J., and Wilhelm, T. (2007). Posttranscriptional expression regulation: What determines translation rates? Plos Computational Biology 3, 531-539.
    Brower-Sinning, R., Carter, D.M., Crevar, C.J., Ghedin, E., Ross, T.M., and Benos, P.V. (2009). The role of RNA folding free energy in the evolution of the polymerase genes of the influenza A virus. Genome Biol 10, R18.
    Buchan, J.R., Aucott, L.S., and Stansfield, I. (2006). tRNA properties help shape codon pair preferences in open reading frames. Nucleic Acids Res 34, 1015-1027.
    Bulmer, M. (1991). The selection-mutation-drift theory of synonymous codon usage. Genetics 129, 897-907.
    Calderwood, M.A., Venkatesan, K., Xing, L., Chase, M.R., Vazquez, A., Holthaus, A.M., Ewence, A.E., Li, N., Hirozane-Kishikawa, T., Hill, D.E., et al. (2007). Epstein-Barr virus and virus human protein interaction maps. Proc Natl Acad Sci U S A 104, 7606-7611.
    Cambray, G., and Mazel, D. (2008). Synonymous genes explore different evolutionary landscapes. PLoS Genet 4, e1000256.
    Cammarano, R., Costantini, M., and Bernardi, G. (2009). The isochore patterns of invertebrate genomes. BMC Genomics 10, 538.
    Carlini, D.B. (2004). Experimental reduction of codon bias in the Drosophila alcohol dehydrogenase gene results in decreased ethanol tolerance of adult flies. J Evol Biol 17, 779-785.
    Carlini, D.B., and Stephan, W. (2003). In vivo introduction of unpreferred synonymous codons into the Drosophila Adh gene results in reduced levels of ADH protein. Genetics 163, 239-243.
    Chamary, J.V., and Hurst, L.D. (2005). Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol 6, R75.
    Chan, P.P., and Lowe, T.M. (2009). GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res 37, D93-97.
    Charif, D., Thioulouse, J., Lobry, J.R., and Perriere, G. (2005). Online synonymous codon usage analyses with the ade4 and seqinR packages. Bioinformatics 21, 545-547.
    Charlesworth, B. (2009). Effective population size and patterns of molecular evolution and variation. Nature Reviews Genetics 10, 195-205.
    Chen, L.L., and Zhang, C.T. (2003). Seven GC-rich microbial genomes adopt similar codon usage patterns regardless of their phylogenetic lineages. Biochem Biophys Res Commun 306, 310-317.
    Chen, S.L., Lee, W., Hottes, A.K., Shapiro, L., and McAdams, H.H. (2004). Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci U S A 101, 3480-3485.
    Chieochansin, T., Chutinimitkul, S., Payungporn, S., Hiranras, T., Samransamruajkit, R., Theamboolers, A., and Poovorawan, Y. (2007). Complete coding sequences and phylogenetic analysis of Human Bocavirus (HBoV). Virus Res.
    Christopherson, R.I., Cinquin, O., Shojaei, M., Kuehn, D., and Menz, R.I. (2004). Cloning and expression of malarial pyrimidine enzymes. Nucleosides Nucleotides Nucleic Acids 23, 1459-1465.
    Chuang, J.H., and Li, H. (2004). Functional bias and spatial organization of genes in mutational hot and cold regions in the human genome. PLoS Biology 2, 253-263.
    Cid-Arregui, A. (2009). Therapeutic vaccines against human papillomavirus and cervical cancer. Open Virol J 3, 67-83.
    Cid-Arregui, A., Juarez, V., and zur Hausen, H. (2003). A synthetic E7 gene of human papillomavirus type 16 that yields enhanced expression of the protein in mammalian cells and is useful for DNA immunization studies. J Virol 77, 4928-4937.
    Coleman, J.R., Papamichail, D., Skiena, S., Futcher, B., Wimmer, E., and Mueller, S. (2008). Virus attenuation by genome-scale changes in codon pair bias. Science 320, 1784-1787.
    Comeron, J.M., and Guthrie, T.B. (2005). Intragenic Hill-Robertson interference influences selection intensity on synonymous mutations in Drosophila. Mol Biol Evol 22, 2519-2530.
    Comeron, J.M., Kreitman, M., and Aguade, M. (1999). Natural Selection on Synonymous Sites Is Correlated With Gene Length and Recombination in Drosophila. Genetics 151, 239-249.
    Costantini, M., and Bernardi, G. (2008). The short-sequence designs of isochores from the human genome. Proceedings of the National Academy of Sciences of the United States of America 105, 13971-13976.
    Costantini, M., Cammarano, R., and Bernardi, G. (2009). The evolution of isochore patterns in vertebrate genomes. BMC Genomics 10, 146.
    Costantini, M., Clay, O., Auletta, F., and Bernardi, G. (2006). An isochore map of human chromosomes. Genome Res 16, 536-541.
    Crick, F. (1970). Central dogma of molecular biology. Nature 227, 561-563.
    Cutter, A.D., and Charlesworth, B. (2006). Selection intensity on preferred codons correlates with overall codon usage bias in Caenorhabditis remanei. Curr Biol 16, 2053-2057.
    Das, S., Paul, S., and Dutta, C. (2006). Synonymous codon usage in adenoviruses: influence of mutation, selection and protein hydropathy. Virus Res 117, 227-236.
    Davis, J.J., and Olsen, G.J. (2009). Modal Codon Usage: Assessing the typical codon usage of a genome. Mol Biol Evol.
    Davis, J.J., and Olsen, G.J. (2010). Modal codon usage: assessing the typical codon usage of a genome. Mol Biol Evol 27, 800-810.
    de Chassey, B., Navratil, V., Tafforeau, L., Hiet, M.S., Aublin-Gex, A., Agaugue, S., Meiffren, G., Pradezynski, F., Faria, B.F., Chantier, T., et al. (2008). Hepatitis C virus infection protein network. Mol Syst Biol 4, 230.
    dos Reis, M., Savva, R., and Wernisch, L. (2004). Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res 32, 5036-5044.
    dos Reis, M., and Wernisch, L. (2009). Estimating translational selection in eukaryotic genomes. Mol Biol Evol 26, 451-461.
    dos Reis, M., Wernisch, L., and Savva, R. (2003). Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res 31, 6976-6985.
    Drummond, D.A., and Wilke, C.O. (2008). Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134, 341-352.
    Duan, J., Wainwright, M.S., Comeron, J.M., Saitou, N., Sanders, A.R., Gelernter, J., and Gejman, P.V. (2003). Synonymous mutations in the human dopamine receptor D2 (DRD2) affect mRNA stability and synthesis of the receptor. Hum Mol Genet 12, 205-216.
    Duret, L. (2002). Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev 12, 640-649.
    Duret, L., and Mouchiroud, D. (1999). Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci U S A 96, 4482-4487.
    Dyer, M.D., Murali, T.M., and Sobral, B.W. (2008). The landscape of human proteins interacting with viruses and other pathogens. PLoS Pathog 4, e32.
    Ehrenberg, M., Dennis, P.P., and Bremer, H. (2010). Maximum rrn promoter activity in Escherichia coli at saturating concentrations of free RNA polymerase. Biochimie 92, 12-20.
    Eyre-Walker, A., and Hurst, L.D. (2001). The evolution of isochores. Nat Rev Genet 2, 549-555.
    Farlow, A., Meduri, E., Dolezal, M., Hua, L., and Schlotterer, C. (2010). Nonsense-mediated decay enables intron gain in Drosophila. PLoS Genet 6, e1000819.
    Fedorov, A., Saxonov, S., and Gilbert, W. (2002). Regularities of context-dependent codon bias in eukaryotic genes. Nucleic Acids Res 30, 1192-1197.
    Flicek, P., and Birney, E. (2009). Sense from sequence reads: methods for alignment and assembly. Nat Methods 6, S6-S12.
    Folley, L.S., and Yarus, M. (1989). Codon contexts from weakly expressed genes reduce expression in vivo. J Mol Biol 209, 359-378.
    Fraser, H.B., Hirsh, A.E., Wall, D.P., and Eisen, M.B. (2004). Coevolution of gene expression among interacting proteins. Proc Natl Acad Sci U S A 101, 9033-9038.
    Fredrick, K., and Ibba, M. (2010). How the sequence of a gene can tune its translation. Cell 141, 227-229.
    Friedlander, M.R., Chen, W., Adamidi, C., Maaskola, J., Einspanier, R., Knespel, S., and Rajewsky, N. (2008). Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol 26, 407-415.
    Fryxell, K.J., and Zuckerkandl, E. (2000). Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol Biol Evol 17, 1371-1383.
    Fuglsang, A. (2003). The effective number of codons for individual amino acids: some codons are more optimal than others. Gene 320, 185-190.
    Fuglsang, A. (2006). Accounting for background nucleotide composition when measuring codon usage bias: brilliant idea, difficult in practice. Mol Biol Evol 23, 1345-1347.
    Gao, F., and Zhang, C.T. (2006). Isochore structures in the chicken genome. FEBS J 273, 1637-1648.
    Geslain, R., and Pan, T. (2010). Functional analysis of human tRNA isodecoders. J Mol Biol 396, 821-831.
    Ghaemmaghami, S., Huh, W.K., Bower, K., Howson, R.W., Belle, A., Dephoure, N., O'Shea, E.K., and Weissman, J.S. (2003). Global analysis of protein expression in yeast. Nature 425, 737-741.
    Gladitz, J., Shen, K., Antalis, P., Hu, F.Z., Post, J.C., and Ehrlich, G.D. (2005). Codon usage comparison of novel genes in clinical isolates of Haemophilus influenzae. Nucleic Acids Research 33, 3644-3658.
    Goetz, R.M., and Fuglsang, A. (2005). Correlation of codon bias measures with mRNA levels: analysis of transcriptome data from Escherichia coli. Biochem Biophys Res Commun 327, 4-7.
    Griswold, K.E., Mahmood, N.A., Iverson, B.L., and Georgiou, G. (2003). Effects of codon usage versus putative 5'-mRNA structure on the expression of Fusarium solani cutinase in the Escherichia coli cytoplasm. Protein Expr Purif 27, 134-142.
    Gu, T., Tan, S., Gou, X., Araki, H., and Tian, D. (2010). Avoidance of long mononucleotide repeats in codon pair usage. Genetics 186, 1077-1084.
    Gu, W.J., Zhou, T., Ma, J.M., Sun, X., and Lu, Z.H. (2004). Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales. Virus Research 101, 155-161.
    Gu, X., and Li, W.H. (1992). Higher rates of amino acid substitution in rodents than in humans. Mol Phylogenet Evol 1, 211-214.
    Guo, X.Y., Bao, J.D., and Fan, L.J. (2007). Evidence of selectively driven codon usage in rice: Implications for GC content evolution of Gramineae genes. FEBS Letters 581, 1015-1021.
    Gustafsson, C., Govindarajan, S., and Minshull, J. (2004). Codon bias and heterologous protein expression. Trends in Biotechnology 22, 346-353.
    Gutman, G.A., and Hatfield, G.W. (1989). Nonrandom utilization of codon pairs in Escherichia coli. Proc Natl Acad Sci U S A 86, 3699-3703.
    Haas, J., Park, E.C., and Seed, B. (1996). Codon usage limitation in the expression of HIV-1 envelope glycoprotein. Curr Biol 6, 315-324.
    Haiminen, N., and Mannila, H. (2007). Discovering isochores by least-squares optimal segmentation. Gene 394, 53-60.
    Hall, M.N., Gabay, J., Debarbouille, M., and Schwartz, M. (1982). A role for mRNA secondary structure in the control of translation initiation. Nature 295, 616-618.
    Haywood-Farmer, E., and Otto, S.P. (2003). The evolution of genomic base composition in bacteria. Evolution 57, 1783-1792.
    Hershberg, R., and Petrov, D.A. (2008). Selection on codon bias. Annu Rev Genet 42, 287-299.
    Hershberg, R., and Petrov, D.A. (2009). General rules for optimal codon choice. PLoS Genet 5, e1000556.
    Higgs, P.G., Hao, W.L., and Golding, G.B. (2007). Identification of Conflicting Selective Effects on Highly Expressed Genes. Evol Bioinform 3, 1-13.
    Hildebrand, F., Meyer, A., and Eyre-Walker, A. (2010). Evidence of selection upon genomic GC-content in bacteria. PLoS Genet 6, e1001107.
    Hofacker, I.L., and Stadler, P.F. (2006). Memory efficient folding algorithms for circular RNA secondary structures. Bioinformatics 22, 1172-1176.
    Hou, Z.C., and Yang, N. (2003). Factors affecting codon usage in Yersinia pestis. Sheng Wu Hua Xue Yu Sheng Wu Wu Li Xue Bao (Shanghai) 35, 580-586.
    Hou, Z.C., and Yang, N. (2002). Analysis of factors shaping S. pneumoniae codon usage. Yi Chuan Xue Bao 29, 747-752.
    Huss, M. (2010). Introduction into the analysis of high-throughput-sequencing based epigenome data. Brief Bioinform 11, 512-523.
    Ikemura, T. (1981). Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol 151, 389-409.
    Ikemura, T. (1985). Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2, 13-34.
    Inge-Vechtomov, S., Zhouravleva, G., and Philippe, M. (2003). Eukaryotic release factors (eRFs) history. Biol Cell 95, 195-209.
    Ingolia, N.T., Ghaemmaghami, S., Newman, J.R., and Weissman, J.S. (2009). Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218-223.
    Irwin, B., Heck, J.D., and Hatfield, G.W. (1995). Codon pair utilization biases influence translational elongation step times. J Biol Chem 270, 22801-22806.
    Jacques, N., and Dreyfus, M. (1990). Translation initiation in Escherichia coli: old and new questions. Mol Microbiol 4, 1063-1067.
    Jenkins, G.M., and Holmes, E.C. (2003). The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 92, 1-7.
    Jenkins, G.M., Pagel, M., Gould, E.A., Zanotto, P.M.d.A., and Holmes, E.C. (2001). Evolution of base composition and codon usage bias in the genus Flavivirus. Journal of Molecular Evolution 52, 383-390.
    Jia, M., and Li, Y. (2005). The relationship among gene expression, folding free energy and codon usage bias in Escherichia coli. FEBS Lett 579, 5333-5337.
    Jia, R.Y., Cheng, A.C., Wang, M.S., Xin, H.Y., Guo, Y.F., Zhu, D.K., Qi, X.F., Zhao, L.C., Ge, H., and Chen, X.Y. (2009). Analysis of synonymous codon usage in the UL24 gene of duck enteritis virus. Virus Genes 38, 96-103.
    Kalinna, B.H., and McManus, D.P. (1994). Codon usage in Echinococcus. Exp Parasitol 79, 72-76.
    Kanaya, S., Kinouchi, M., Abe, T., Kudo, Y., Yamada, Y., Nishi, T., Mori, H., and Ikemura, T. (2001a). Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): characterization of horizontally transferred genes with emphasis on the E. coli O157 genome. Gene 276, 89-99.
    Kanaya, S., Yamada, Y., Kinouchi, M., Kudo, Y., and Ikemura, T. (2001b). Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J Mol Evol 53, 290-298.
    Kanaya, S., Yamada, Y., Kudo, Y., and Ikemura, T. (1999). Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238, 143-155.
    Karlin, S., Blaisdell, B.E., and Schachtel, G.A. (1990). Contrasts in codon usage of latent versus productive genes of Epstein-Barr virus: data and hypotheses. J Virol 64, 4264-4273.
    Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and Kent, W.J. (2004). The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32, D493-496.
    Karolchik, D., Kuhn, R.M., Baertsch, R., Barber, G.P., Clawson, H., Diekhans, M., Giardine, B., Harte, R.A., Hinrichs, A.S., Hsu, F., et al. (2008). The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 36, D773-779.
    Kawashita, S.Y., da Silva, C.V., Mortara, R.A., Burleigh, B.A., and Briones, M.R. (2009). Homology, paralogy and function of DGF-1, a highly dispersed Trypanosoma cruzi specific gene family and its implications for information entropy of its encoded proteins. Mol Biochem Parasitol 165, 19-31.
    Kertesz, M. (2010). Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103-107.
    Kimchi-Sarfaty, C., Oh, J.M., Kim, I.W., Sauna, Z.E., Calcagno, A.M., Ambudkar, S.V., and Gottesman, M.M. (2007). A "silent" polymorphism in the MDR1 gene changes substrate specificity. Science 315, 525-528.
    Kisselev, L., Ehrenberg, M., and Frolova, L. (2003). Termination of translation: interplay of mRNA, rRNAs and release factors? EMBO J 22, 175-182.
    Kliman, R.M., and Bernal, C.A. (2005). Unusual usage of AGG and TTG codons in humans and their viruses. Gene 352, 92-99.
    Kloster, M., and Tang, C. (2008). SCUMBLE: a method for systematic and accurate detection of codon usage bias by maximum likelihood estimation. Nucleic Acids Res 36, 3819-3827.
    Knight, R.D., Freeland, S.J., and Landweber, L.F. (2001). A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol 2, RESEARCH0010.
    Ko, W.Y., Piao, S.F., and Akashi, H. (2006). Strong regional heterogeneity in base composition evolution on the Drosophila X chromosome. Genetics 174, 349-362.
    Kozak, M. (2005). Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene 361, 13-37.
    Kudla, G., Murray, A.W., Tollervey, D., and Plotkin, J.B. (2009). Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255-258.
    Kumar, P.A., and Sharma, R.P. (1995). Codon Usage in Brassica Genes. Journal of Plant Biochemistry and Biotechnology 4, 113-115.
    Kusumi, J., and Tachida, H. (2005). Compositional properties of green-plant plastid genomes. Journal of Molecular Evolution 60, 417-425.
    Kyte, J., and Doolittle, R.F. (1982). A simple method for displaying the hydropathic character of a protein. J Mol Biol 157, 105-132.
    Lancaster, A.K., Bardill, J.P., True, H.L., and Masel, J. (2010). The spontaneous appearance rate of the yeast prion [PSI+] and its implications for the evolution of the evolvability properties of the [PSI+] system. Genetics 184, 393-400.
    Le, T.H., McManus, D.P., and Blair, D. (2004). Codon usage and bias in mitochondrial genomes of parasitic platyhelminthes. Korean J Parasitol 42, 159-167.
    Lenburg, M.E., Liou, L.S., Gerry, N.P., Frampton, G.M., Cohen, H.T., and Christman, M.F. (2003). Previously unidentified changes in renal cell carcinoma gene expression identified by parametric analysis of microarray data. BMC Cancer 3, 31.
    Lerat, E., Biemont, C., and Capy, P. (2000). Codon usage and the origin of P elements. Molecular Biology and Evolution 17, 467-468.
    Lercher, M.J., Urrutia, A.O., and Hurst, L.D. (2002). Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet 31, 180-183.
    Lercher, M.J., Urrutia, A.O., Pavlicek, A., and Hurst, L.D. (2003). A unification of mosaic structures in the human genome. Hum Mol Genet 12, 2411-2415.
    Letzring, D.P., Dean, K.M., and Grayhack, E.J. (2010). Control of translation efficiency in yeast by codon-anticodon interactions. RNA.
    Levin, D.B., and Whittome, B. (2000). Codon usage in nucleopolyhedroviruses. J Gen Virol 81, 2313-2325.
    Li, H., and Homer, N. (2010). A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform.
    Lin, Z., Yang, S., and Mallavia, L.P. (1997). Codon usage and nucleotide composition in Coxiella burnetii. Gene 198, 171-180.
    Lithwick, G., and Margalit, H. (2005). Relative predicted protein levels of functionally associated proteins are conserved across organisms. Nucleic Acids Res 33, 1051-1057.
    Liu, Q.-P., Tan, J., and Xue, Q.-Z. (2003). Synonymous codon usage bias in the rice cultivar 93-11 (Oryza sativa L. ssp. indica). Acta Genetica Sinica 30, 335-340.
    Liu, Q. (2006). Analysis of codon usage pattern in the radioresistant bacterium Deinococcus radiodurans. Biosystems 85, 99-106.
    Lloyd, A.T., and Sharp, P.M. (1993). Synonymous Codon Usage in Kluyveromyces-Lactis. Yeast 9, 1219-1228.
    Loh, P.G., and Song, H. (2010). Structural and mechanistic insights into translation termination. Curr Opin Struct Biol 20, 98-103.
    Lu, H., Zhao, W.M., Zheng, Y., Wang, H., Qi, M., and Yu, X.P. (2005). Analysis of synonymous codon usage bias in Chlamydia. Acta Biochimica Et Biophysica Sinica 37, 1-10.
    Lu, P., Vogel, C., Wang, R., Yao, X., and Marcotte, E.M. (2007). Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotech 25, 117-124.
    Lucks, J.B., Nelson, D.R., Kudla, G.R., and Plotkin, J.B. (2008). Genome landscapes and bacteriophage codon usage. PLoS Comput Biol 4, e1000001.
    Ma, F., Zhuang, Y.L., Chen, L.M., Lin, L.P., Li, Y.D., Xu, X.F., and Chen, X.P. (2004). Comparing synonymous codon usage of alternatively spliced genes with non-alternatively spliced genes in human genome. Journal of Biological Systems 12, 91-103.
    Man, O., and Pilpel, Y. (2007). Differential translation efficiency of orthologous genes is involved in phenotypic divergence of yeast species. Nat Genet 39, 415-421.
    Mandal, D., Feng, Z., and Stoltzfus, C.M. (2007). Gag Processing Defect of HIV-1 Integrase E246 and G247 Mutants Is Caused by Activation of an Overlapping 5' Splice Site. J Virol.
    McHardy, A.C., Puhler, A., Kalinowski, J., and Meyer, F. (2004). Comparing expression level-dependent features in codon usage with protein abundance: an analysis of 'predictive proteomics'. Proteomics 4, 46-58.
    Metzker, M.L. (2010). Sequencing technologies - the next generation. Nat Rev Genet 11, 31-46.
    Morton, B.R. (2003). The role of context-dependent mutations in generating compositional and codon usage bias in grass chloroplast DNA. J Mol Evol 56, 616-629.
    Morton, B.R., and Wright, S.I. (2007). Selective constraints on codon usage of nuclear genes from Arabidopsis thaliana. Molecular Biology and Evolution 24, 122-129.
    Mouchiroud, D., D'Onofrio, G., Aissani, B., Macaya, G., Gautier, C., and Bernardi, G. (1991). The distribution of genes in the human genome. Gene 100, 181-187.
    Moura, G., Pinheiro, M., Arrais, J., Gomes, A.C., Carreto, L., Freitas, A., Oliveira, J.L., and Santos, M.A. (2007). Large scale comparative codon-pair context analysis unveils general rules that fine-tune evolution of mRNA primary structure. PLoS ONE 2, e847.
    Moura, G., Pinheiro, M., Silva, R., Miranda, I., Afreixo, V., Dias, G., Freitas, A., Oliveira, J.L., and Santos, M.A. (2005). Comparative context analysis of codon pairs on an ORFeome scale. Genome Biol 6, R28.
    Mueller, S., Coleman, J.R., Papamichail, D., Ward, C.B., Nimnual, A., Futcher, B., Skiena, S., and Wimmer, E. (2010). Live attenuated influenza virus vaccines by computer-aided rational design. Nat Biotechnol 28, 723-726.
    Nakamura, Y., and Tabata, S. (1997). Codon-anticodon assignment and detection of codon usage trends in seven microbial genomes. Microb Comp Genomics 2, 299-312.
    Naya, H., Romero, H., Carels, N., Zavala, A., and Musto, H. (2001). Translational selection shapes codon usage in the GC-rich genome of Chlamydomonas reinhardtii. FEBS Lett 501, 127-130.
    Neafsey, D.E., and Galagan, J.E. (2007). Positive selection for unpreferred codon usage in eukaryotic genomes. BMC Evol Biol 7, 119.
    Neske, F., Blessing, K., Tollmann, F., Schubert, J., Rethwilm, A., Kreth, H.W., and Weissbrich, B. (2007). Real-time PCR for human bocavirus infections and phylogenetic analysis. J Clin Microbiol.
    Nielsen, R. (2010). Genomics: In search of rare human variants. Nature 467, 1050-1051.
    Nishio, Y., Nakamura, Y., Kawarabayasi, Y., Usuda, Y., Kimura, E., Sugimoto, S., Matsui, K., Yamagishi, A., Kikuchi, H., Ikeo, K., et al. (2003). Comparative complete genome sequence analysis of the amino acid replacements responsible for the thermostability of Corynebacterium efficiens. Genome Res 13, 1572-1579.
    Novembre, J.A. (2002). Accounting for background nucleotide composition when measuring codon usage bias. Mol Biol Evol 19, 1390-1394.
    Oliver, J.L., Carpena, P., Hackenberg, M., and Bernaola-Galvan, P. (2004). IsoFinder: computational prediction of isochores in genome sequences. Nucleic Acids Research 32, W287-W292.
    Paddison, P.J., Caudy, A.A., Bernstein, E., Hannon, G.J., and Conklin, D.S. (2002). Short hairpin RNAs (shRNAs) induce sequence-specific silencing in mammalian cells. Genes Dev 16, 948-958.
    Pascal, G., Medigue, C., and Danchin, A. (2005). Universal biases in protein composition of model prokaryotes. Proteins-Structure Function and Bioinformatics 60, 27-35.
    Sharp, P.M., and Li, W.H. (1986). An evolutionary perspective on synonymous codon usage in unicellular organisms. Journal of Molecular Evolution 24, 28-38.
    Perriere, G., and Thioulouse, J. (2002). Use and misuse of correspondence analysis in codon usage studies. Nucleic Acids Res 30, 4548-4555.
    Petit, N., and Barbadilla, A. (2009). Selection efficiency and effective population size in Drosophila species. J Evol Biol 22, 515-526.
    Plotkin, J.B. (2010). Transcriptional regulation is only half the story. Mol Syst Biol 6, 406.
    Plotkin, J.B., and Kudla, G. (2010). Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet.
    Pruitt, K.D., Harrow, J., Harte, R.A., Wallin, C., Diekhans, M., Maglott, D.R., Searle, S., Farrell, C.M., Loveland, J.E., Ruef, B.J., et al. (2009). The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res 19, 1316-1323.
    Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2007). NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35, D61-65.
    Puigbo, P., Bravo, I.G., and Garcia-Vallve, S. (2008). E-CAI: a novel server to estimate an expected value of Codon Adaptation Index (eCAI). BMC Bioinformatics 9, 65.
    Qin, H., Wu, W.B., Comeron, J.M., Kreitman, M., and Li, W.H. (2004). Intragenic spatial patterns of codon usage bias in prokaryotic and eukaryotic genomes. Genetics 168, 2245-2260.
    Qing, G., Xia, B., and Inouye, M. (2003). Enhancement of translation initiation by A/T-rich sequences downstream of the initiation codon in Escherichia coli. J Mol Microbiol Biotechnol 6, 133-144.
    Qu, X.W., Duan, Z.J., Qi, Z.Y., Xie, Z.P., Gao, H.C., Liu, W.P., Huang, C.P., Peng, F.W., Zheng, L.S., and Hou, Y.D. (2007). Human bocavirus infection, People's Republic of China. Emerg Infect Dis 13, 165-168.
    Ramakrishna, L., Anand, K.K., Mohankumar, K.M., and Ranga, U. (2004). Codon optimization of the tat antigen of human immunodeficiency virus type 1 generates strong immune responses in mice following genetic immunization. J Virol 78, 9174-9189.
    Robison, K. (2010). Editorial: Second-generation sequencing. Brief Bioinform 11, 455-456.
    Rocha, E.P.C. (2004). Codon usage bias from tRNA's point of view: Redundancy, specialization, and efficient decoding for translation optimization. Genome Research 14, 2279-2286.
    Romero, H., Zavala, A., and Musto, H. (2000). Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces. Nucleic Acids Res 28, 2084-2090.
    Sahu, K., Gupta, S.K., Ghosh, T.C., and Sau, S. (2004). Synonymous codon usage analysis of the mycobacteriophage Bxz1 and its plating bacteria M. smegmatis: Identification of highly and lowly expressed genes of Bxz1 and the possible function of its tRNA species. Journal of Biochemistry and Molecular Biology 37, 487-492.
    Saini, P., Eyler, D.E., Green, R., and Dever, T.E. (2009). Hypusine-containing protein eIF5A promotes translation elongation. Nature 459, 118-121.
    Sau, K., Gupta, S.K., Sau, S., Mandal, S.C., and Ghosh, T.C. (2006). Factors influencing synonymous codon and amino acid usage biases in Mimivirus. Biosystems 85, 107-113.
    Saunders, R., and Deane, C.M. (2010). Synonymous codon usage influences the local protein structure observed. Nucleic Acids Res.
    Schmidt, T., and Frishman, D. (2008). Assignment of isochores for all completely sequenced vertebrate genomes using a consensus. Genome Biol 9, R104.
    Sewatanon, J., Srichatrapimuk, S., and Auewarakul, P. (2007). Compositional bias and size of genomes of human DNA viruses. Intervirology 50, 123-132.
    Shackelton, L.A., Parrish, C.R., and Holmes, E.C. (2006). Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses. J Mol Evol 62, 551-563.
    Shah, P., and Gilchrist, M.A. (2010). Effect of Correlated tRNA Abundances on Translation Errors and Evolution of Codon Usage Bias. PLoS Genet 6, e1001128.
    Shapira, S.D., Gat-Viks, I., Shum, B.O., Dricot, A., de Grace, M.M., Wu, L., Gupta, P.B., Hao, T., Silver, S.J., Root, D.E., et al. (2009). A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell 139, 1255-1267.
    Sharp, P.M., and Li, W.H. (1987). The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15, 1281-1295.
    Sharp, P.M., Tuohy, T.M., and Mosurski, K.R. (1986). Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 14, 5125-5143.
    Shields, D.C., and Sharp, P.M. (1987). Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res 15, 8023-8040.
    Shields, D.C., Sharp, P.M., Higgins, D.G., and Wright, F. (1988). "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol Biol Evol 5, 704-716.
    Shrivastava, S., Poddar, R., Shukla, P., and Mukhopadhyay, K. (2009). Study of codon bias perspective of fungal xylanase gene by multivariate analysis. Bioinformation 3, 425-429.
    Siller, E., DeZwaan, D.C., Anderson, J.F., Freeman, B.C., and Barral, J.M. (2010). Slowing bacterial translation speed enhances eukaryotic protein folding efficiency. J Mol Biol 396, 1310-1318.
    Smith, D., and Yarus, M. (1989). tRNA-tRNA interactions within cellular ribosomes. Proceedings of the National Academy of Sciences of the United States of America 86, 4397-4401.
    Stenoien, H.K., and Stephan, W. (2005). Global mRNA stability is not associated with levels of gene expression in Drosophila melanogaster but shows a negative correlation with codon bias. J Mol Evol 61, 306-314.
    Su, M.W., Lin, H.M., Yuan, H.S., and Chu, W.C. (2009). Categorizing host-dependent RNA viruses by principal component analysis of their codon usage preferences. J Comput Biol 16, 1539-1547.
    Tao, P., Dai, L., Luo, M.C., Tang, F.Q., Tien, P., and Pan, Z.S. (2009). Analysis of synonymous codon usage in classical swine fever virus. Virus Genes 38, 104-112.
    Tats, A., Tenson, T., and Remm, M. (2008). Preferred and avoided codon pairs in three domains of life. BMC Genomics 9, 463.
    Tindle, R.W. (2002). Immune evasion in human papillomavirus-associated cervical cancer. Nat Rev Cancer 2, 59-65.
    Travers, A. (2006). The evolution of the genetic code revisited. Orig Life Evol Biosph 36, 549-555.
    Trinh, R., Gurbaxani, B., Morrison, S.L., and Seyfzadeh, M. (2004). Optimization of codon pair use within the (GGGGS)3 linker sequence results in enhanced protein expression. Mol Immunol 40, 717-722.
    Tuller, T., Carmi, A., Vestsigian, K., Navon, S., Dorfan, Y., Zaborske, J., Pan, T., Dahan, O., Furman, I., and Pilpel, Y. (2010a). An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141, 344-354.
    Tuller, T., Kupiec, M., and Ruppin, E. (2007). Determinants of protein abundance and translation efficiency in S. cerevisiae. PLoS Comput Biol 3, e248.
    Tuller, T., Waldman, Y.Y., Kupiec, M., and Ruppin, E. (2010b). Translation efficiency is determined by both codon bias and folding energy. Proc Natl Acad Sci U S A 107, 3645-3650.
    Uetz, P., Dong, Y.A., Zeretzke, C., Atzler, C., Baiker, A., Berger, B., Rajagopala, S.V., Roupelieva, M., Rose, D., Fossum, E., et al. (2006). Herpesviral protein networks and their interaction with the human proteome. Science 311, 239-242.
    Urrutia, A.O., and Hurst, L.D. (2003). The signature of selection mediated by expression on human genes. Genome Res 13, 2260-2264.
    van Hemert, F.J., Berkhout, B., and Lukashov, V.V. (2007). Host-related nucleotide composition and codon usage as driving forces in the recent evolution of the Astroviridae. Virology 361, 447-454.
    Versteeg, R., van Schaik, B.D., van Batenburg, M.F., Roos, M., Monajemi, R., Caron, H., Bussemaker, H.J., and van Kampen, A.H. (2003). The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res 13, 1998-2004.
    Vogel, C., Abreu Rde, S., Ko, D., Le, S.Y., Shapiro, B.A., Burns, S.C., Sandhu, D., Boutz, D.R., Marcotte, E.M., and Penalva, L.O. (2010). Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol Syst Biol 6, 400.
    Waldman, Y.Y., Tuller, T., Shlomi, T., Sharan, R., and Ruppin, E. (2010). Translation efficiency in humans: tissue specificity, global optimization and differences between developmental stages. Nucleic Acids Res 38, 2964-2974.
    Wang, S., Taaffe, J., Parker, C., Solorzano, A., Cao, H., Garcia-Sastre, A., and Lu, S. (2006). Hemagglutinin (HA) proteins from H1 and H3 serotypes of influenza A viruses require different antigen designs for the induction of optimal protective antibody responses as studied by codon-optimized HA DNA vaccines. J Virol 80, 11628-11637.
    Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10, 57-63.
    Watanabe, Y., Tenzen, T., Nagasaka, Y., Inoko, H., and Ikemura, T. (2000). Replication timing of the human X-inactivation center (XIC) region: correlation with chromosome bands. Gene 252, 163-172.
    Watts, J.M., Dang, K.K., Gorelick, R.J., Leonard, C.W., Bess Jr, J.W., Swanstrom, R., Burch, C.L., and Weeks, K.M. (2009). Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 460, 711-716.
    Weir, W., Sunter, J., Chaussepied, M., Skilton, R., Tait, A., de Villiers, E.P., Bishop, R., Shiels, B., and Langsley, G. (2009). Highly syntenic and yet divergent: a tale of two Theilerias. Infect Genet Evol 9, 453-461.
    Weygand-Durasevic, I., and Ibba, M. (2010). New Roles for Codon Usage. Science 329, 1473-1474.
    Worley, K.C., and Gibbs, R.A. (2010). Genetics: Decoding a national treasure. Nature 463, 303-304.
    Wright, F. (1990). The 'effective number of codons' used in a gene. Gene 87, 23-29.
    Wu, C.I., and Li, W.H. (1985). Evidence for higher rates of nucleotide substitution in rodents than in man. Proc Natl Acad Sci U S A 82, 1741-1745.
    Yap, V.B., Lindsay, H., Easteal, S., and Huttley, G. (2010). Estimates of the effect of natural selection on protein-coding content. Mol Biol Evol 27, 726-734.
    Yarus, M., and Folley, L.S. (1985). Sense codons are found in specific contexts. J Mol Biol 182, 529-540.
    Youngman, E.M., McDonald, M.E., and Green, R. (2008). Peptide release on the ribosome: mechanism and implications for translational control. Annu Rev Microbiol 62, 353-373.
    Zaborske, J.M., Narasimhan, J., Jiang, L., Wek, S.A., Dittmar, K.A., Freimoser, F., Pan, T., and Wek, R.C. (2009). Genome-wide analysis of tRNA charging and activation of the eIF2 kinase Gcn2p. J Biol Chem 284, 25254-25267.
    Zamora, A., Sun, Q., Hamblin, M.T., Aquadro, C.F., and Kresovich, S. (2009). Positively selected disease response orthologous gene sets in the cereals identified using Sorghum bicolor L. Moench expression profiles and comparative genomics. Mol Biol Evol 26, 2015-2030.
    Zavala, A., Naya, H., Romero, H., and Musto, H. (2002). Trends in codon and amino acid usage in Thermotoga maritima. J Mol Evol 54, 563-568.
    Zeng, K., and Charlesworth, B. (2009). Estimating Selection Intensity on Synonymous Codon Usage in a Non-equilibrium Population. Genetics.
    Zhang, C.T., Gao, F., and Zhang, R. (2005). Segmentation algorithm for DNA sequences. Physical Review E 72, -.
    Zhang, C.T., Wang, J., and Zhang, R. (2001). A novel method to calculate the G+C content of genomic DNA sequences. J Biomol Struct Dyn 19, 333-341.
    Zhang, C.T., and Zhang, R. (2004). Isochore structures in the mouse genome. Genomics 83, 384-394.
    Zhang, F., Saha, S., Shabalina, S.A., and Kashina, A. (2010). Differential Arginylation of Actin Isoforms Is Regulated by Coding Sequence-Dependent Degradation. Science 329, 1534-1537.
    Zhang, G., Hubalewska, M., and Ignatova, Z. (2009). Transient ribosomal attenuation coordinates protein synthesis and co-translational folding. Nat Struct Mol Biol 16, 274-280.
    Zhang, G., and Ignatova, Z. (2009). Generic algorithm to predict the speed of translational elongation: implications for protein biogenesis. PLoS ONE 4, e5036.
    Zhao, K.N., Gu, W., Fang, N.X., Saunders, N.A., and Frazer, I.H. (2005). Gene codon composition determines differentiation-dependent expression of a viral capsid gene in keratinocytes in vitro and in vivo. Mol Cell Biol 25, 8643-8655.
    Zhao, K.N., Liu, W.J., and Frazer, I.H. (2003). Codon usage bias and A+T content variation in human papillomavirus genomes. Virus Res 98, 95-104.
    Zhong, J., Li, Y., Zhao, S., Liu, S., and Zhang, Z. (2007). Mutation pressure shapes codon usage in the GC-Rich genome of foot-and-mouth disease virus. Virus Genes 35, 767-776.
    Zhou, J., Liu, W.J., Peng, S.W., Sun, X.Y., and Frazer, I. (1999). Papillomavirus capsid protein expression level depends on the match between codon usage and tRNA availability. J Virol 73, 4972-4982.
    Zhou, M., and Li, X. (2009). Analysis of synonymous codon usage patterns in different plant mitochondrial genomes. Mol Biol Rep 36, 2039-2046.
    Zhou, T., Gu, W.J., Ma, J.M., Sun, X., and Lu, Z.H. (2005). Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses. Biosystems 81, 77-86.
    Zhou, T., Weems, M., and Wilke, C.O. (2009). Translationally optimal codons associate with structurally sensitive sites in proteins. Mol Biol Evol 26, 1571-1580.
    Zhu, C.T., Zeng, X.B., and Huang, W.D. (2003). Codon usage decreases the error minimization within the genetic code. Journal of Molecular Evolution 57, 533-537.
    Zoubak, S., Clay, O., and Bernardi, G. (1996). The gene distribution of the human genome. Gene 174, 95-102.
    Zvelebil, M., and Baum, J. (2007). Understanding Bioinformatics (Garland Science).