Abstract
Identifying the correct N-terminus of a protein is an important step in genome annotation, yet genomic databases sometimes contain incorrectly annotated N-termini. We analyzed the statistics of surplus or missing N-terminal amino acid residues in the tentatively translated coding sequences of cyanobacterial database entries and found that, on average, about 8-9% of the aligned proteins have a putatively incorrect N-terminus, although the percentage depended on the database entry. In an attempt to find more plausible N-termini for these proteins, we were able to estimate a better-aligning N-terminus in 90% of the cases. TTG was found as a putative initiation codon in most cases of recessed N-termini. This statistical approach, which is applicable to any group of prokaryotes, will help identify a plausible translation initiation site for each protein-coding gene in newly sequenced genomes and can also be used to refine the N-termini of proteins in already published genomes.
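
As a rough illustration of the search for alternative initiation codons, the following minimal sketch lists in-frame candidate start codons upstream of an annotated start. It is not the procedure used in this study. It assumes that a "recessed" N-terminus means the annotated start lies downstream of the more plausible one, that the gene is on the forward strand, and that ATG, GTG and TTG are the only start codons considered; the function name `upstream_start_candidates` and its parameters are hypothetical.

```python
# Hypothetical helper: names and the codon sets below are illustrative assumptions.
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed set of prokaryotic start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}    # standard stop codons


def upstream_start_candidates(genome_seq, annotated_start):
    """List in-frame start codons upstream of an annotated start.

    genome_seq      -- forward-strand nucleotide sequence (str)
    annotated_start -- 0-based index of the first base of the annotated start codon

    Scans backward one codon at a time and stops at the first in-frame
    stop codon or at the beginning of the sequence. Returns a list of
    (position, codon) tuples, nearest candidate first.
    """
    seq = genome_seq.upper()
    candidates = []
    pos = annotated_start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the open reading frame cannot extend past a stop codon
        if codon in START_CODONS:
            candidates.append((pos, codon))
        pos -= 3
    return candidates


if __name__ == "__main__":
    # Toy sequence: TAA TTG GCT ATG AAA TAA, annotated start at the ATG (index 9).
    toy = "TAATTGGCTATGAAATAA"
    print(upstream_start_candidates(toy, 9))
```

In practice, each candidate would still have to be compared against alignments with homologous proteins to judge which N-terminus aligns best; the sketch only enumerates the possibilities.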