基于免疫遗传算法的基序识别方法的研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
从生物序列中识别基序是生物信息学中的一个热点问题,也是生物学中研究基因调控机制的基础计算问题之一。由于基序长度较短、非百分百保守以及生物数据复杂性高等原因,通过计算方法识别基序仍旧是人们研究基因转录调控过程的一大难点。遗传算法由于其鲁棒性强、随机性、全局性以及适于并行处理等优点,近年来被越来越广泛地应用到基序识别问题的求解中,并成为重要的发展方向。
     虽然遗传算法已形成一套较为完善的算法体系,但早熟收敛、随机漫游等不足限制了其应用。生物学领域的研究发现,生物免疫系统可以很好地保持种群多样性,抑制早熟收敛和限制随机漫游。因此,利用免疫原理可以有效地改进和提高遗传算法的性能。
     针对遗传算法缺少种群多样性保持策略的不足,考虑生物免疫系统的优点,本文将浓度调节机制引入到遗传算法中,提出了一种基于浓度机制的免疫遗传算法,并将其应用于基序识别。根据基序识别问题定义和基序的表示方式,设计了新的抗体亲和力和抗体浓度计算公式,模拟生物体免疫行为,在遗传算法比例选择算子的基础上,添加了浓度调节因子来抑制高浓度抗体的繁殖,使提出的算法能够有效地保持种群多样性,避免早熟收敛现象的发生。实验结果表明该算法有较好的基序识别效果,.能够在较长序列中识别基序,在一次运行中识别多基序。
     为了克服算法在遗传过程中随机搜索造成的种群退化现象,本文模拟人工免疫系统,引入疫苗的调控作用,提出了一种基于疫苗机制的基序识别算法,通过疫苗提取、疫苗接种和免疫选择来抑制种群的退化现象,加快算法收敛速度。通过在模拟生物数据和真实生物数据上的实验,说明该算法进一步提高了基序识别效果。
Motif discovery in biological sequences is a hot issue in bioinformatics and a fundametal computational problem with important applicaitons in understanding gene regulation. As the motif with very short length, non-hundred percent conservative, and the complexity of biological data, the identification of motifs through computational methods is still a major challenge. Because of its relative superiority to the traditional optimization algorithm, evolutionary algorithm has recently been more widely applied to the motif discovery problem, and becomes an important direction of development.
     The lack of premature convergence and random roaming limits GA's application. People find that biological immune system can keep the diversity well, and inhibit premature convergence and random roaming. Therefore, we can effectively improve and enhance the performance of genetic algorithm by using the immune theory.
     Considering the lack of population diversity maintain with genetic algorithm and the advantages of biological immune system, we introduce concentration regulation into genetic algorithm and propose an immune genetic algorithm based on concentration mechanism and apply to motif discovery. We define new formuals of antibody affinity and antibody concentration according to the definition of motif discovery problem and the representation of motifs. Based on the election operator of GA, we introduce a concentration regulation operator to inhibit the reproduce of high concentration antibodys. The proposed method can effectively maintain the population diversity and inhibit premature convergence phenomenon. Experimental results show that the method could find motifs in relative long sequences and multiple motifs in a single run.
     In order to inhibit the degradation during evolution, we introdce the regulation of vaccine and propose an motif discovery method based on immune vaccine. The Population degradation could be well inhibited by extracting the vaccine, vaccination and immunization choice, so the convergence could be speed up. Experimental results show that the motif discovery ability has been further improved.
引文
[1]Goodman N. Biological data becomes computer literate: new advances in bioinformatics. Journal of Current Opinion in Biotechnology, 2002, 13(1): 68-71
    [2]Lodish H, Berk A, Zipursky S, et al. Molecular Cell Biology. 2nd. New York: W. H. Freeman Co, 2000, chapter 10, 605-612.
    [3]Wray G A, Hahn M W, Abouheif E, et al. The Evolution of Transcriptional Regulation in Eukaryotes. Journal of Molecular Biology and Evolution, 2003, 20(9): 1377-1419
    [4]Haeseleer P D. How does DNA sequence motif discovery work. Journal of Nature Biotechnology, 2006, 24:959-961
    [5]Fogel G B, Weekes D G, Varga G, et al. Discovery of sequence motifs related to co-expression of genes using evolutionary computation. Journal of Nucleic Acids Res, 2004, 32(13): 3826-3835
    [6]Henikoff S, Henikoff J G. Amino acid substitution matrices from protein blocks. In: Proc of the National Academy of Sciences of the United States of America. Washington, USA: AAAI Press, 1993, 89(22): 10915-10919
    [7]Elnitski L, Jin V X, Farnham P J, et al. Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Journal of Genome Research, 2006, 16(12): 1455-1464
    [8]John E H. Learning using an artificial immune system. Journal of Network and Computer Applications, 1996, 19(1): 189-212
    [9]Galas D, Eggert, M, Waterman M. Rigorous pattern-recognition methods for DNA sequences: analysis of promoter sequences from Escherichia coli. Journal of molecular biology, 1985, 186 (1): 117-128
    [10]Pevzner P A, Sze S H. Combinatorial approaches to finding subtle signals in DNA sequences. In: Proc of the Eighth International Conference on Intelligent Systems for Molecular Biology. San Diego, USA: AAAI Press, 2000, 8:269-278
    [11]Styczynski M P, Jensen K L, Rigoutsos I, et al. An extension and novelsolution to the(1-d)-motif challenge problem. Journal of Genome Informatics, 2004, 15(2): 63-71
    [12]Eskin E, Pevzuer P A. Finding composite regulatory patterns in DNA sequences. Journal of Bioinformatics, 2002, 18:354-363
    [13]Pesole G, Prunella N, Liuni S, et al. Wordup: An Efficient Algorithm for Discovering Statistically Significant Patterns in DNA Sequences. Journal of Nucleic Acids Res., 1992, 20 (11): 2871-2875
    [14]Sinha S, Tompa M. YMF: A program for discovery of novel transcription factor binding sites by tatistical overrepresentation. Journal of Nucleic Acids Res, 2003, 31: 3586-3588
    [15]王建新,黄元南.一种基于彩色编码技术的基序发现算法.软件学报,2007,18(6):1298-1307
    [16]Buhler J, Tompa M. Finding motifs using random projections. Journal of Computer Biology, 2002, 9(2): 225-242
    [17]Jian X W, Yang D. UPNT: Uniform Projection and Neighbourhood Thresholding method for motif discovery. International Journal of Bioinformatics Research and Applications, 2008, 4(1): 96-106
    [18]Lawrence C, Altschul S, Boguski M, et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Journal of Science, 1993, 262(5131): 208-214
    [19]Matthieu D, Jacques V H. Info-gibbs: a motif discovery algorithm that directly optimizes information content during sampling. Journal of Bioinformatics, 2009, 25(20): 2715-2722
    [20]Bailey T, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proc of Int. Conf. Inteel. Syst. Mol. Biol.(ISMB). California, USA: AAAI Press, 1994, 2:28-36
    [21]Chengpeng B. A monte carlo EM algorithm for de novo motif discovery in biomoleeular sequences. Journal of IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009, 6(3): 370-386
    [22]Fogel G B, Weekes D G, Varga G, et al. Discovery of sequence motifs related to coexpression of genes using evolutionary computation. Journal of Nucleic Acids Res, 2004, 32 (13): 26-35
    [23]Michael A L, Andy M T. Regulatory Motif Discovery Using a Population Clustering Evolutionary Algorithm. Journal of IEEE/ACM Transactions on computational biology and bioinformatics, 2007, 4(3): 403-414
    [24]Tak-Ming C, Kwong-Sak L, Kin-Hong L. TFBS identification based on genetic algorithm with representations and adaptive post-processing. Journal of Bioinformatics, 2008, 24(3): 341-349
    [25]Hightower R R, Forrest S, Perelson A S. The evolution of emergent organization in immune system gene libraries. In: Procs of the 6th International Conference on Genetic Algorithms. San Francisco, USA: Morgan Kaufmann Publishers Inc, 1995, 344-350
    [26]Chun J S, Kim M K, Jang Hyun-Kyo. Shape optimization of electromagnetic devices using immune algorithm. Journal of IEEE Trans On Magnetics, 1997, 33(2): 1876-1879
    [27]Isao T. An Evolutionary Optimization Based on the Immune System and Its Application to the VLSL Floor-Plan Design Problem. Journal of Electrical Engineering, 1998, 124(4): 27-36
    [28]Chun J S, Jang H, Hahn S. A study on comparison of optimization performances between immune algorithm and other heuristic algorithms. Journal of IEEE Trans On Magnetics, 1998, 34(5): 2972-2975
    [29]Huang S J. An Immune-Based Optimization Method to Capacitor Placement in a Radial Distribution System. Journal of IEEE Transaction on Power Delivery, 2000, 15(2): 744-749
    [30]Wang L. The Immune Genetic Algorithm and Its Converge. In: Fourth International Conference on Signal Processing Proceedings. Beijing, China: IEEE, 1998, 3(1): 1347-1350
    [31]徐雪松,诸静.免疫遗传算法的改进及其在模糊控制中的应用研究.信息与控制,2003,32(5):462-466
    [32]杨孔雨,王秀峰.自适应多模态免疫进化算法的研究与实现.控制与决策,2005,20(6):717-720
    [33]刘帅,马志强,刘清雪等.基于自适应免疫遗传算法的多序列比对.信息科学,2007,2:15-17
    [34]吴斯,曹炬.基于小生境免疫遗传算法的硅钢片优化排.计算机工程,2008,34(10):181-183
    [35]马佳,石刚,高立群等.求解FJSP的自适应免疫遗传算法.系统仿真学报,2009,21(12):3609-3613
    [36]Day W H, McMorris F R. Critical comparison of consensus methods for molecular sequences. Journal of Nucleic Acids Res, 1992, 20(5): 1093-1099
    [37]Staden R. Computer methods to locate signals in nucleic acid sequences. Journal of Nucleic Acids Res, 1984, 12(1,2): 505-519
    [38]Giulio P, Giancarlo M, Graziano P. In silico representation and discovery of transcription factor binding sites. Journal of Briefings in bioinformatics, 2004, 5(3): 217-236
    [39]Benos P V, Bulyk M L, Stormo G D. Additivity in protein-DNA interactions: how good an approximation is it. Journal of Nucleic Acids Res, 2002, 30(20): 42-4451
    [40]Schneider T D, Stephens R M. Sequence Logos: A New Way to Display Consensus Sequences. Journal of Nucleic Acids Res, 1990, 18(20): 6097-6100
    [41]Ming L, Bin M, Lusheng W. Finding similar regions in many sequences. Journal of Computer and System Sciences, 2002, 65: 73-96.
    [42]Sagot M. Spelling approximate repeated or common motifs using a suffix tree. Journal of Lecture Notes in Computer Science, 1998, 1380:111-127
    [43]Vilo J, Brazma A, Jonassen I, et al. Mining for putative regulatory elements in the yeast genome using gene expression data. In: Proc of the Eighth International Conference on Intelligent Systems for Molecular Biology. San Diego, USA: AAAI Press, 2000, 384-394
    [44]Man-Hung E T, Anders K, Ole W B. Flexible Biological Modeling for Motif Discovery. Journal of Computational Biology, 2008, 15(10): 1347-1363
    [45]Helden J, Andre B, Collado-Vides J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of molecular biology, 1998, 281: 827-842.
    [46]Helden J, Rios A F, Collado-Vides J. Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Journal of Nucleic Acids Res, 2000, 28: 1808-1818
    [47]Ganesh R, Siegele D A, Ioerger T R. MOPAC: motif finding by preprocessing and agglomerative clustering from microarrays. Journal of Pac Symp Biocomput, 2003, 8: 41-52
    [48]Hughes J D, Estep P W. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomycess cerevisiae. Journal of Molecular Biology, 2000, 296(5): 1205-1214
    [49]Shida K. GibbsST: a Gibbs sampling method for motif discovery with enhanced resistance to local optima. Journal ofBMC Bioinformatics, 2006, 7:486
    [50]Down T A, Hubbard T J. NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence. Journal of Nucleic Acids Res, 2005, 33: 1445-1453
    [51]Lawrence C E, Reilly A A. An expectation maximization algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Journal of Proteins, 1990, 7(1): 41-51
    [52]Xing E P, Wu W, Jordan M I, et al. Logos: a modular bayesian model for de novo motif detection. Journal of Bioinform Comput Biol, 2004, 2(1): 127-154
    [53]Lones M A, Tyrrell A M. The Evolutionary Computation Approach to Motif Discovery in Biological Sequences. In. Proc of the 2005 workshops on Genetic and evolutionary computation. Washington, USA: ACM, 2005, 1-11
    [54]Clare B C, Charles W F, Noah W S, et al. Preliminary Results for GAMI: A Genetic Algorithms Approach to Motif Inference. In: Proc of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology. La Jolla, CA, USA: IEEE, 2005, 97-104
    [55]Zhi W, Shane T J. GAME: detecting cis-regulatory elements using a genetic algorithm. Journal of Bioinformatics, 2006, 22(13): 1577-1584
    [56]Mehmet K. MOGAMOD: Multi-objective genetic algorithm for motif discovery. Journal of Expert Systems with Application, 2009, 36(2): 1039-1047
    [57]Habib Youssef. Evolutionary Algorithms, Simulated Annealing and Tabu Search: A Compatative Study. Journal of Engineering Application of Artificial Intelligence, 2001, 14(2): 167-181
    [58]Dasgupta D. Artificial Immune Systems and their Applications.1st edition. New York: Springer, 1999
    [59]Licheng J, Lei W. A Novel Genetic Algorithm Based on Immunity. Journal of IEEE Transactions on Systems, Man, and Cybernetics, 2000, 30(5): 552-561
    [60]Prier R, Praz V, Junier T, et al. The eukaryotic promoter database (EPD). Journal of Nucleic Acids Research, 2000, 28:302-303
    [61]Wasserman W W, Fickett J W. Identification of regulatory regions which confer muscle-specific gene expression. Journal of Molecular Biology, 1998, 278(1): 3613-3619
    [62]Sandelin A, Hogund A, Lenhard B, et al. Integrated analysis of yeast regulatory sequences for biologically linked clusters of genes. Journal of Functional and integrative genomics, 2003, 3(3): 125-134

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700