Abstract
Various methods for identifying significant contextual signals are widely used to search for transcription factor binding sites and to reveal the structural and functional organization of regulatory regions. These methods require neither prior alignment of the analyzed sequences nor experimental information about the exact locations of transcription factor binding sites. Methods that search for contextual signals by identifying degenerate oligonucleotide motifs written in the 15-letter IUPAC code have become widespread. A fundamental problem with degenerate motifs is their enormous number, which forces researchers to rely on heuristics that do not guarantee finding the most significant signal. The development of high-performance computing on graphics cards has made exact exhaustive search for significant motifs feasible. We have developed a new system that runs on widely available graphics cards, identifies significant degenerate oligonucleotide motifs of a given length in regulatory regions, and guarantees finding the signal with the greatest significance. The GPU was shown to be more efficient than the CPU for this task. Using the proposed approach, we analyzed the regulatory regions of B. subtilis, E. coli, H. pylori, M. gallisepticum, M. genitalium, and M. pneumoniae genes. A set of degenerate motifs was identified for each prokaryotic species, and the motifs were classified by their similarity to E. coli transcription factor binding sites.
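To make the size of the search space concrete, the following is a minimal Python sketch, not the authors' GPU system. It assumes motifs of a fixed length L over the 15 IUPAC nucleotide symbols, so an exhaustive search must consider 15^L candidates, and it shows how a single degenerate motif is matched against a DNA sequence. The helper names `count_motifs`, `motif_matches_at`, and `motif_occurs` are hypothetical, and significance scoring and GPU parallelization are deliberately omitted.

```python
# Illustrative sketch only (not the authors' system): degenerate IUPAC motifs,
# the size of the exhaustive search space, and naive motif matching.

# The 15 IUPAC nucleotide codes and the set of bases each one admits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def count_motifs(length: int) -> int:
    """Number of distinct degenerate motifs of the given length (hypothetical helper)."""
    return len(IUPAC) ** length


def motif_matches_at(motif: str, seq: str, pos: int) -> bool:
    """True if every motif symbol admits the base at the aligned position in seq."""
    return all(seq[pos + i] in IUPAC[sym] for i, sym in enumerate(motif))


def motif_occurs(motif: str, seq: str) -> bool:
    """True if the degenerate motif matches seq at any position (hypothetical helper)."""
    return any(
        motif_matches_at(motif, seq, p) for p in range(len(seq) - len(motif) + 1)
    )


if __name__ == "__main__":
    # The candidate space grows as 15^L, which is why heuristics are common.
    for L in (6, 8, 10):
        print(L, count_motifs(L))  # 11390625, 2562890625, 576650390625
    # R admits A or G, so TATRAT matches the TATAAT substring.
    print(motif_occurs("TATRAT", "GCGTATAATGC"))  # True
```

Even at length 8 there are about 2.6 billion candidate motifs, each of which must be scored against every sequence in the sample; this per-motif independence is what makes the exhaustive search a natural fit for massively parallel hardware such as GPUs.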