Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis
Abstract
Can we infer the lifestyle of an organism from the characteristic properties of its genome? More precisely, what are the relations between easily quantifiable properties of genomic sequences, such as amino acid compositions, and subtler characteristics such as lifestyles or evolutionary trends? Here, we seek a global picture of such properties, based on a large number (56) of complete genomes that include significant numbers of representatives from the three domains of life. We consider the amino acid compositions of the predicted proteomes and use correspondence analysis, a multivariate method, to extract the relevant information from these large-scale data. From these analyses we draw a series of conclusions concerning lifestyles, as well as physico-chemical and evolutionary trends: (1) Correspondence analysis of the amino acid compositions discriminates between the three known lifestyles (mesophily/thermophily/hyperthermophily). (2) Across organisms, amino acid composition is driven mainly by GC content and, to a significantly lesser extent, by the growth temperatures associated with lifestyles. Roughly speaking, the respective contributions of these two components are 57% and 20%. Notably, these proportions are essentially unchanged from a previous analysis (Nature 393 (1998) 537), which involved only the 15 genomes available at the time. (3) Two specific amino acid compositional ‘signatures’ of thermophily (in a broad sense, including hyperthermophily) can be detected. First, thermophilic species display a relative abundance of glutamic acid (Glu), concomitant with a depletion of glutamine (Gln). Second, in thermophilic species, the relative abundance of Glu (negative charge) is significantly correlated with the increase in the lumped ‘pool’ lysine+arginine (positive charges) (Pearson correlation coefficient r = 0.83, P < 0.0001).
This correlation (absent in mesophiles) could be interpreted on a physico-chemical basis relevant to the thermostability of proteins. (4) Statistically significant differences are observed between the average gene lengths of the surveyed species, and these differences follow the species' distribution among the three domains of life. A significant difference is also observed between the average lengths of thermophilic (283.0±5.8) and mesophilic (340±9.4) genes. It is thus possible that the ‘general’ shortening of the primary sequences of thermophilic proteins plays a role in thermostability. (5) Considering various combinations of conservation properties (genes conserved exclusively in eukaryotes, in archaea, in bacteria, in combinations of two domains, etc.), correspondence analysis reveals a trend towards thermophilic-hyperthermophilic profiles for the most conserved subset of genes (ancient genes). (6) When restricted to the subset of species-specific genes, correspondence analysis yields a different clustering of genomes by amino acid composition: for example, the ‘core’ species-specific part of a genome can bear lifestyle signatures different from those of the complete genome.
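Correspondence analysis (CA) is the workhorse of the study. A minimal sketch of classical CA on a genomes × amino acids table is given below, assuming NumPy and assuming the table holds per-genome residue counts (the abstract does not say whether counts or frequencies were analysed). The function name `correspondence_analysis` and the toy numbers are hypothetical illustrations, not the authors' code or data. The sketch forms standardized residuals, takes their SVD, and reports principal coordinates together with each axis's share of the total inertia, which is the usual way CA expresses how much variation an axis captures; reading the 57% and 20% figures above as such inertia shares is an assumption.

```python
import numpy as np


def correspondence_analysis(counts):
    """Classical correspondence analysis of a nonnegative two-way table.

    counts: (n_rows, n_cols) array, e.g. genomes x amino acids (hypothetical input);
    every row and column must have a nonzero total.
    Returns (row principal coordinates, column principal coordinates,
    share of total inertia per axis).
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                          # correspondence matrix, sums to 1
    r = P.sum(axis=1)                        # row masses (weight of each genome)
    c = P.sum(axis=0)                        # column masses (average composition)
    expected = np.outer(r, c)                # P expected under row/column independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                        # principal inertias, largest first
    share = inertia / inertia.sum()          # fraction of total inertia per axis
    rows = U * sv / np.sqrt(r)[:, None]      # F = D_r^{-1/2} U Sigma
    cols = Vt.T * sv / np.sqrt(c)[:, None]   # G = D_c^{-1/2} V Sigma
    return rows, cols, share


# Made-up counts, for illustration only (columns: Glu, Gln, Lys, Arg).
toy = np.array([
    [90, 20, 80, 60],   # hypothetical "genome A"
    [85, 25, 75, 65],   # hypothetical "genome B"
    [50, 60, 45, 40],   # hypothetical "genome C"
    [55, 55, 50, 35],   # hypothetical "genome D"
])
rows, cols, share = correspondence_analysis(toy)
print("inertia share of axes 1-2 (%):", np.round(100 * share[:2], 1))
print("genome coordinates on axis 1:", np.round(rows[:, 0], 3))
```

In this toy table the first axis separates the two Gln-poor rows from the two Gln-rich rows. Plotting `rows` and `cols` on the first two axes gives the kind of joint map in which lifestyle groups can be compared; the last singular value is essentially zero because a 4 × 4 table has at most three nontrivial axes.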

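The second thermophily signature is a Pearson correlation between per-genome Glu content and the lumped Lys+Arg pool. A minimal pure-Python sketch of that statistic follows, together with the standard t statistic for testing ρ = 0. The helper names `pearson_r` and `t_statistic` and the five data points are hypothetical illustrations, not values from the study, and the abstract does not state how its P value was obtained.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences.

    Neither sequence may be constant (zero variance).
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length, at least 3 values")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Statistic for H0: rho = 0; Student's t with n - 2 df under H0 (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up per-genome percentages for five hypothetical thermophiles.
glu = [7.0, 7.8, 8.5, 9.1, 9.6]                 # Glu, % of residues
lys_plus_arg = [11.9, 12.4, 13.3, 13.6, 14.5]   # Lys% + Arg%
r = pearson_r(glu, lys_plus_arg)
t = t_statistic(r, len(glu))
print(f"r = {r:.3f}, t = {t:.2f} with {len(glu) - 2} df")
```

To turn the statistic into a P value, refer it to Student's t distribution with n − 2 degrees of freedom; if SciPy is available, `scipy.stats.pearsonr(x, y)` returns r together with a two-sided P value.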