Abstract
The aminoacyl-tRNA synthetases are among the major protein components of the translation machinery. These essential proteins are found in all forms of life and are responsible for charging their cognate tRNAs with the correct amino acid. The evolution of the tRNA synthetases is of fundamental importance to understanding the nature of the biological cell and the transition from an RNA world to the modern world dominated by protein enzymes. Using structural alignments of all aminoacyl-tRNA synthetases of known structure, in combination with a new measure of structural similarity, we reconstructed the evolutionary history of these proteins. When the presence and effect of gaps were properly accounted for, phylogenetic trees computed using this metric were shown to be congruent with the maximum-likelihood sequence-based phylogenies. The results indicated that evolutionary information is indeed recoverable from the comparative analysis of protein structure alone. Since protein structure is more highly conserved than protein sequence, this study allowed us to glimpse the evolution of protein structure that predates the root of the universal phylogenetic tree. In addition, we developed a new algorithm, based on the multidimensional QR factorization, to remove redundancy from multiple structure or sequence alignments by choosing representative proteins that best span the evolutionary space of the homologous group. The non-redundant, representative profiles obtained from this procedure, termed evolutionary profiles, were shown to outperform well-tested profiles in homology detection searches, a key step in genome annotation, over the Swiss-Prot and NCBI non-redundant sequence databases.
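The idea of choosing representatives by QR factorization can be illustrated with a minimal sketch. It is not the multidimensional QR algorithm of this work: it is an ordinary greedy, column-pivoted QR (modified Gram-Schmidt) applied to one numeric vector per protein. The function name `select_representatives`, the vector encoding of each aligned protein, and the `rel_tol` stopping rule are all assumptions made for illustration.

```python
import math

def select_representatives(vectors, rel_tol=0.2):
    """Greedy column-pivoted QR (modified Gram-Schmidt) over protein vectors.

    vectors: list of equal-length numeric lists, one per aligned protein
             (a hypothetical encoding, not specified by the source).
    rel_tol: stop once the largest residual norm falls below rel_tol times
             the first pivot's norm (an assumed cutoff).
    Returns the indices of the chosen representatives, in pivot order.
    """
    residuals = [[float(x) for x in v] for v in vectors]
    remaining = list(range(len(residuals)))
    chosen = []
    first_norm = None
    while remaining:
        # Pivot on the vector least explained by the representatives so far.
        norms = {i: math.sqrt(sum(x * x for x in residuals[i])) for i in remaining}
        pivot = max(remaining, key=lambda i: norms[i])
        pivot_norm = norms[pivot]
        if first_norm is None:
            first_norm = pivot_norm
        if pivot_norm == 0.0 or pivot_norm < rel_tol * first_norm:
            break  # everything left is (nearly) redundant
        chosen.append(pivot)
        remaining.remove(pivot)
        q = [x / pivot_norm for x in residuals[pivot]]
        # Remove the pivot direction from every remaining vector.
        for i in remaining:
            dot = sum(a * b for a, b in zip(q, residuals[i]))
            residuals[i] = [a - dot * b for a, b in zip(residuals[i], q)]
    return chosen

# Toy data: proteins 1 and 3 are near-duplicates of proteins 0 and 2.
toy = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],
    [0.0, 0.0, 1.0],
]
print(select_representatives(toy))  # [0, 2, 4]
```

In this toy case the two near-duplicate vectors are left out because, once their close relatives have been selected, almost nothing of them remains unexplained. Raising `rel_tol` yields a smaller, less redundant set; lowering it keeps more members of the group.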