基于改进遗传算法的DNA序列设计
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
近年来,DNA计算是计算机研究领域的一个研究热点,在分子生物计算机的研究中倍受学者们的关注。1994年Adleman博士在Science上发表了一篇名为“Molecular Computation of Solution to Combinatorial Problems”的论文标志着DNA计算的诞生。DNA分子具有高度的并行性,并且作为信息的载体其储存的容量也非常大,同时DNA分子的资源也非常的丰富,这些也就是DNA计算所具备的优点。随着生物技术的不断发展,DNA计算将会被用来解决越来越多、越来越复杂的实际问题,特别是一些NP问题,最终将会产生一种全新的DNA计算机,它将会给数学、计算机科学等学科带来飞速地发展。
     DNA计算中最主要的是DNA分子间的杂交反应,其效率和精度直接影响到DNA计算的结果,因此好的序列编码方法对于提高DNA计算的可靠性和有效性具有十分重要的意义。目前DNA编码研究主要分为两个主要方向:一个是DNA序列编码的质量优化;另一个是DNA序列编码的集合设计。本文针对这两个方向做了较深入地研究,并着重对DNA序列编码的集合进行研究。
     DNA序列编码的质量优化研究的主要内容是依据问题的规模来找出满足确定约束条件的一定数量的DNA序列。所采用的约束条件通常是汉明距离约束和热力学约束,主要研究方法包括理论上的构造和计算机搜索两种主要方法。目前DNA编码的质量方面的研究已比较成熟,并已开发出比较好的软件成品。本文中将利用改进的遗传算法对DNA序列进行质量优化,该优化方法得到的DNA序列不仅满足DNA序列的约束条件,而且与前人研究成果相比有了显著地提高。
     DNA序列集合设计研究的主要内容是已知序列的长度及其约束条件,找到满足条件的所有的DNA序列的集合的最大值。目前最常用的约束条件是汉明距离约束下的约束条件,热力学约束条件以及他们的组合约束。主要研究方法分为理论研究和算法研究。结合上述约束条件,本文基于DNA序列集合研究的特点,对遗传算法进行更进一步地改进,将其运用于对DNA序列集合进行研究,不仅算法性能上有了显著地提高,而且得到的结果也比前人的成果有了很大地提高。
DNA computing is a new method that uses biological molecule DNA as computing medium and biochemical reaction as computing tools. In 1994, Dr Adleman released“Molecular Computation of Solutions to Combinatorial Problems”in Science, which indicates DNA computing comes into being. The DNA molecule has a high parallelism, and a great of storage capacity as a carrier of information. There are the merits of DNA computing. At the same time, the resource of DNA molecule is very rich. Along with the development of biologic technology, the DNA computing will solve more and more complex problems, especially NP problems. Finally, it will product a new DNA computer which could bring the flying development of Mathematics, Computer Science and other subjects.
     In DNA computing, the core reaction is the specific hybridization between DNA sequences or the Watson-Crick complement, and which directly influences the reliability of DNA computation with its efficiency and accuracy. The efficiency and accuracy is influenced by DNA sequences, so how to product good DNA sequences is very important for improving the efficiency and reliability of DNA computing. At present, these are two researchful aspects about DNA sequences coding: on the one hand, it is the quality optimization of DNA sequences coding; on the other hand, it is the designing of DNA sequences coding sets. This paper counters these two aspects to do penetrating research, and puts emphasis on the research about the designing of DNA sequences sets.
     The main researchful content of the quality optimization of DNA sequences is that: according to the scale of problem, it is to search a certain amount of DNA sequences which satisfy the combinatorial constraints. The common constraints include the Hamming distance constrain, thermodynamics constrain and combinatorial constraints. The two main researchful methods are the theoretical contract and the searching by computer. At present, the research about the DNA sequences is relatively mature, and has developed software which has better ability. This paper which uses the improved genetic algorithm optimizes the quality of DNA sequences. These DNA sequences which are obtained by the improved genetic algorithm not only satisfy the DNA constraints, but also comparing with the previous works, have a significantly improvement.
     The main researchful content of the designing of DNA sequence sets is that: according to the known the length and the constraints, it is to search all the maximum of DNA sequences sets which satisfy the constraints. The common constraints include the Hamming distance constrain, thermodynamics constrain and combinatorial constraints. It includes two main methods one of which is theoretical research, other is research of algorithm. According to the constrains and the characteristic of the designing of DNA sequences sets, this paper ulteriorly improves the genetic algorithm and uses it to research the designing of DNA sequences sets. It not only makes the algorithm improved, but also obtains the results which are better than previous works.
引文
[1] J Q Liu, K Shimohara. Molecular Computation and Evolutionary Wetware: A Cutting-Edge Technology for Artificial Life and Nanobiotechnologies. Systems, Man and Cybernetics. Part C: Applications and Reviews, IEEE Transactions, 2007,37(3):325–336.
    [2]许进,范月科.经典Ramsey数DNA计算模型(Ⅰ):位序列计算模型.计算机学报,2008,12(3):2073-2080.
    [3]殷志祥,石晓龙,徐涛,许进. 0-1整数规划问题的半自动化DNA计算模型.生物信息学. 2006,4:113-116.
    [4] L M Adleman. Molecular Computation of Solution to Combinatorial problems. Science,1994,266:1021-1024.
    [5]许进,范月科.经典Ramsey数DNA计算模型(Ⅰ):位序列计算模型.计算机学报,2008,12(3):2081-2089.
    [6]石晓龙,许智榜,殷志祥.用于DNA计算的微流控制系统研究进展.计算机工程,2007,33(3):221-222.
    [7]强小利,曾波,王子成,寇铮.求解0-1规划问题的DNA计算模型.计算机学报, 2008,12(3):2155-2159.
    [8] Lipton R J. DNA solution of hard computational problems. Science, 1995,268(5210):542–545.
    [9]张勋才,牛莹,崔光照,王延峰.基于自组装DNA计算的NTRU密码系统破译方案(英文).计算机学报,2008,12(3):2129-2137.
    [10]朱翔鸥,刘文斌,陈丽春,吴桂初.一种实现三值逻辑电路的DNA计算模型.计算机科学,2008,35(2):178-180.
    [11]崔光照,牛云云,张勋才.DNA序列编码的研究进展.生物技术通报,2006,4:16-24.
    [12]王世英,原军,林上为. DNA标号图和DNA计算.中国科学(A辑:数学),2007,37(9):1059-1072.
    [13] Max H. Garzon, Vinhthuy Phan,Kiran C. Bobba, Raghuver Kontham. Sensitivity and Capacity of Microarray Encodings. DNA11, LNCS, 2006,3892:81–95.
    [14]沈世镒张拓. DNA计算中突变误差的纠正.计算机工程与应用, 2006,7:4-6.
    [15] Chun Li, Xiaoqing Yu, Nadia Helal. Similarity analysis of DNA sequences based on codon usage. Chemical Physics Letters, 2008, 459:172-174.
    [16]刘向荣,赵东明,郗方,李菲.活体生物计算模型的研究进展及展望(英文).计算机学报,2008,12(3):2103-2108.
    [17] Kitajima T, Takinoue M, Shohda K, Suyama A. Design of Code Words for DNA Computers and Nanostructures with Consideration of Hybridization Kinetics. DNA 13, LNCS, 2008, 4848: 119–129.
    [18]殷志祥.图与组合优化中的DNA计算.北京:科学出版社,2004.
    [19] Yachkov A D, Macula A, Rykov V, Ufimtsev V. DNA Codes Based on Stem Similarities Between DNA Sequences,DNA 13, LNCS, 2008, 4848: 146–151.
    [20]方刚,张运良,朱岩,许进.基于压电基因传感器的DNA计算(英文).计算机学报, 2008,12(3):2109-2115.
    [21] Bo Cui, S Konstantinidis. DNA Coding Using the Subword Closure Operation. DNA 13, LNCS, 2008, 4848: 284–289.
    [22]杨静,张成. DNA自组装技术的研究进展及难点(英文).计算机学报,2008,12(3):2138-2148.
    [23] Yadnyesh Joshi, Sathish Vadhiyar. Analysis of DNA sequence transformations on grids. J. Parallel Distrib. Comput. 2008, 69: 80-90.
    [24]刘伟,孙守霞.用于DNA数值计算的一种信息传递模式(英文).计算机学报, 2008,12(3): 2220-2224.
    [25] Oliver D. Kinga, Philippe Gaboritb. Binary templates for comma-free DNA codes hm for the Graph Coloring Problem. Discrete Applied Mathematics, 2007, 155: 831–839.
    [26] Gao Jie, Xu Zhen-Yuan. Chaos game representation (CGR)-walk model for DNA sequences. Chinese Physics B,2009,18(1):370-376.
    [27] Lipton R. DNA Solution of Hard Computation Problems. Science,1995,268(4):542-545.
    [28] Ouyang Q, Kaplan P D, Liu S, Libchaber A. DNA Solution of the Maximal Clique Problem. Science,1997,78(17):446-449.
    [29] Smith L M, Corn R M, Condon A E et al.A surface-based approach to DNA computation. Comput.biol,1998,5:255-267.
    [30] Liu Q,Wang L,Frutos A G et al.DNA Computing on Surfaces.Nature,2000,403:175-179.
    [31] Braich R S,Chelyapov N,Johnson C et al.Solution of a 20-Variable 3-SAT Problem on a DNA Computer.Science,2002,296:499-502.
    [32] Wood D H.Applying error correcting codes to DNA computing.4th DIMACS workshop on DNA based computers,Pennsylvania,1998:109-110.
    [33] Hartemink A,Gifford D K, Khodor J. et al. Automated constraint-based nucleotide sequence selection for DNA computation.4th DIMACS workshop on DNA based computers, Pennsylvania,1998:287-297.
    [34] Guarnieri F,Fiss M,Bancroft C.Making DNA add. Science,1996,273(12):220-223.
    [35] Berard Y,Allen P,Mills Jr et al.DNA implementation of addition in which the input strands are separated from the operater strands.Biosystem,1999,52:165-174.
    [36] Hug H,Schuler R.DNA-based parallel computation of simplearithmetic. 7th international Workshop on DNA-Based Computers, Tampa,2001:159-166.
    [37] Baum E B. Building an associative memory vastly larger than the brain. Science,1995, 268:583-585.
    [38] Reif J H,LaBean T H, Pirrung M,Rang V S,Guo B, Kingsford C, Wickham CAS. Experimental Construction of Very Large Scale DNA Databases with Associative Search Capability. LNCS,2002,2340:231-247.
    [39] Reif J H, LaHean T H. Computationally Inspired Biotechnologies: Improved DNA Synthesis and Associative Search Using Error-Correcting Codes and Vector-Quantization. DNA6,LNCS,2001, 2054: 145-172.
    [40]胡宇舟,王雷,顾学道.有界整数规划问题的DNA计算.计算机应用,2008,28:18-24.
    [41]马芳芳,王淑栋,李涵,薛圣伟.粘贴与删除系统求解最短有向路的DNA计算模型.计算机工程与应用,2008, 44(25):40-43.
    [42]王子成,周康,罗亮,强小利. DNA计算中编码序列的过滤函数研究.计算机工程与应用,2008,44(32): 10-11.
    [43]李涛.应用DNA计算解决最大团问题.电力学报,2006,21(3):297-300.
    [44]周康,覃磊,高婧,同小军. DNA计算在电路设计中的应用.华中科技大学学报(自然科学版), 2008, 36(8): 107-110.
    [45]刘毅,宋玉阶.多维背包问题的DNA计算.生物数学学报,2008,23(1):180-186.
    [46]崔光照,周君和,王延峰. DNA计算中的编码序列设计问题.郑州轻工业学院学报(自然科学版),2007,22(3):77-80.
    [47]张凯,耿修堂,肖建华等.DNA编码问题及其复杂性研究.计算机应用研究,2008,25(11):5622-5625.
    [48]周康,同小军,许进.基于DNA计算的指派问题.华中科技大学学报(自然科学版),2008,(36) 2:62-65.
    [49] Baum E B. DNA Sequences Useful for Computation. Proceedings of 2nd DIMACS Workshop on DNA Based Computers, 1996.
    [50] Deaton R, Garzon M, Pauned G. Thermodynamic Constraints on DNA-based Computing. Computing with Bio-Molecues: Theory and Expirements, 1998: 138-152.
    [51] Deaton R, Franceschetti, Garzon M. etal. Information transfer through hybridyzation reaction in DNA based computing.Proceedings of the Second Annual Conference, California,1997:463-471.
    [52] Deaton R, Murphy R C, Garzon M, Franceschetti D R, Stevens S E. Genetic search of reliable encodings for dna-based computation. First Genetic Programming Conference, Stanford University, 1996: 9-15.
    [53] Deaton R. et al. Good Encodings for DNA-Based Solutions to Combinatorial Problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1999, 44:247-258.
    [54] Garzon M,Neathery P,Deaton R J et al.A new metric for DNA Computing. Proceedings of the 2nd Annual Genetic Programming Conference, Kaufmann,1997:472-487.
    [55] Wood D H.Applying error correcting codes to DNA computing.4th DIMACS workshop on DNA based computers, Pennsylvania, 1998: 109-110.
    [56] Hartemink A J,Gifford D K.Thermodynamics simulation of deoxyoligonucleotide hybridize for DNA computation.3rd DIMACS meeting on DNA based computers, Pennsylvania, 1997:23-25.
    [57] Shin S Y, Kim D M, Lee I H et al.Evolutionary sequence generation for reliable DNA computing.Congress on Evolutionary Computation,Honolulu ,2002,1:79-84.
    [58]刘文斌,朱翔鸥,王向红,张强,马润年.一种优化DNA计算模板性能的新方法.电子与信息学报,2008,30(5),1131-1135.
    [59]刘文斌,陈丽春,白宝钢,朱翔鸥,张强,马润年. DNA计算中的模板框优化方法研究.电子学报, 2007, 35(8): 1490-1494.
    [60]宋玉蓉.“重组DNA分子的模拟操作”中的问题及修正.生物学教学, 2008, 33(2): 35-36.
    [61]俞洋,缪淮扣,宋世平,樊春海. DNA分子计算与DNA计算机的研究进展.科学通报, 2008, 53(5): 497-502.
    [62]吴茵杰,高小琪.生物化学与分子生物学.北京:科学出版社,2003.
    [63]刘永明.分子生物学简明教程.北京:化学工业出版社,2006.
    [64] Braich R.S.,Chelyapov N,Johnson C,et al. Solution of a 20-Variable 3-SAT Problem on a DNA Computer.Science,2002,296:499-502.
    [65] Desmond S T Nicholl.遗传工程导论(第2版)(影印版).北京:高等教育出版社,2002.
    [66] Rose J A. The Fidelity of DNA computation. Ph. D thesis.The University ofMemphis,1999.
    [67]陈受宜,劳为德.遗传与遗传工程.北京:北京出版社,1980.
    [68]李静函.线粒体.北京:北京大学出版社,1988.
    [69] Garzon M, Deaton R, Nino L F, et al. Encoding Genomes for DNA Computing. Genetic Programming 1998: Proceedings of the Third Annual ConferenceUniversity of Wisconsin, Madison, Wisconsin, USA, 1998: 684-690.
    [70]刘文斌,高琳,王淑栋等.最大匹配问题的DNA表面计算模型.电子学报,2003,10: 1496-1499.
    [71]孟大志,高璞,慧等.数制转换的DNA计算模型.计算机工程与应用,2007,43(8):67-70.
    [72]朱祥鸥,刘文斌,陈丽春等.用线性码构造DNA计算编码的海明距离.计算机工程与应用,2007, 43(25):33-36.
    [73]殷志祥,张家秀.图论中的DNA计算模型.系统工程与电子技术,2007,29(7):1159-1162.
    [74]周康,殷燕芳,李玉华等.DNA编码的模型分析.华中科技大学学报(自然科学版),2007,35 (7):67-70.
    [75]朱翔鸥.DNA计算编码研究及其算法实现:(硕士论文).杭州:浙江工业大学2005.
    [76] Sager Jennifer, Stefanovic Darko. Designing nucleotide sequences for computation: A survey of constraints. DNA Computing - 11th International Workshop on DNA Computing, DNA11, Revised Selected Papers,LNCS,2006: 275-289.
    [77] Zhang M, Cheng M, Tarn T J.Genetic code based coding and mathematical formulation for dna computation. Proceedings of the IEEE International Conference on Robotics and Automation, Orlando, Florida, USA, 2006: 3630-3635.
    [78] Udo Feldkamp,Wolfgang Banzhaf, Hilmar Rauhe. A DNA Sequence Compile. Proceedings of 6th DIMACS Workshop on DNA Based Computers,2000:253-263.
    [79] Frutos A G, Liu Q, Thiel A J et al. Demonstration of a Word Design Strategy for DNA Computing on Surface. Nucleic Acids Research,1997,25(23):4748-4757.
    [80] Arita M, Masanori. Comma-free design for DNA words. Communications of the ACM,2004:99-100.
    [81] Arita M, KOBAYASHI S. DNA sequence Design using Template. New Generation Comput,2002, 20(3): 263-277.
    [82] Deaton R, Murphy R C, Rose J A, Garzon M.et a1. A DNA Based Implementation of an Evolutionary Computation. Proceedings IEEE Conference on Evolutionary Computation. Indiana, 1997:267-271.
    [83] Masanori Arita, Akio Nishikawa, Masami Hagiya, Ken Komiya, Hidetaka Gouzu, Kensaku Sakamoto. Improving Sequence Design far DNA computing. in Proceedings of Genetic and Evolutionary Computation Conference, 2000:875-882.
    [84] Marathe A, Condon A E, Corn R M. On Combinatorial DNA Code Design. Journal of computational biology,2001,8(3):201-219.
    [85] Tulpan D , Hoos H, and Condon A. Stochastic Local Search Algorithms for DNA Word Design, Proc. 8th DNA Based Computers,LNCS,2002,2568:229–241.
    [86] Tulpan D , Hoos H. Hybrid Randomized Neighborhoods Improve Stochastic Local Search for DNA Code Design, Proc. Advances in Artificial Intelligence, 16th Conference of the Canadian Society for Computational Studies of Intelligence,LNCS,2003,2671:418–433.
    [87]刘文斌.DNA计算中的编码问题及模型研究:(博士学位论文).武汉:华中科技大学,2003.
    [88] Liu Wenbin, Wang Shudong, Gao Lin. DNA sequence design based on template strategy. Journal of Chemical Information and Computer Sciences, 2003, 43(6): 2014-2018.
    [89] John SantaLucia, Jr., Hatim T. Allawi, and P. Ananda Seneviratne. Improved Nearest-Neighbor Parameters for Predicting DNA Duplex Stability. Biochemistry,1996,35:3555-3562.
    [90] Wei W, Xue dong ZH, Qiang ZH, Jin X. The Optimization of DNA Encodings Based on GA/SA Algorithms, Progress in Natural Science,2007, 17(6): 739-744.
    [91]雷英杰,张善文,李续武等. MATLAB遗传算法工具箱及应用.西安:西安电子科技大学出版社,2005.
    [92]吕晓.遗传算法的一些技术分析及在排课问题中的应用: (硕士学位论文).兰州:兰州大学,2008.
    [93]黄菲.基于遗传算法的图像分割: (硕士学位论文).武汉:武汉科技大学,2008.
    [94]任昊南.用遗传算法求解TSP问题: (硕士学位论文).济南:山东大学, 2008.
    [95] Shihua Zhou, Qiang Zhang, Jing Zhao et al. DNA encodings based on multiobject particle swarm. Journal of Computational and Theoretical Nanoscience, American Scientific Publishers,2007,4(7-8):1249-1252.
    [96] Galina T.Bogdanova, Andries E.Brouwer, Stoian N, Kapralov & Patric R J. Error-correcting Codes over an Alphabet of Four Elements. Codes Cryptogr, 2001, 23(3): 333-342.
    [97] Paolo B, Anne L, Vincenzo M, Victor M. Superposition Based on Watson–Crick-Like Complementarity. Theory Comput. Systems ,2006,39:503–524.
    [98] Chee Y M, Ling S. Improved lower bounds for constant GC-content DNA codes. IEEE Transactions on Information Theory, 2008, 54(1): 391-394.
    [99] Suguru Kawashimo, Hirotaka Ono, Kunihiko Sadakane, Masafumi Yamashita. DNA Sequence Design by Dynamic Neighborhood Searches, LNCS,2006,4287:157–171.
    [100] King O D. Bounds for DNA codes with constant GC-content. Electron. J. Combin, 2003,10(1):33-13.
    [101]万哲先.代数与编码(修改版).北京:科学出版社,1985.
    [102]孟道骥.代数学基础.天津:南开大学出版社,1992.
    [103]陈鲁生,沈世镒.编码理论基础.北京:高等教育出版社,2005.
    [104]冯克勤.纠错码的代数理论.北京:清华大学出版社,2005.
    [105] Hamming R W. Error detecting and error correcting codes. Bell Systems Technical Journal,1950,29:147-160.
    [106] Singleton R C. Maximum distance q-nary codes. IEEE Trans on Information Theory, 1964, 10:116-118.
    [107] Gilbert E N, A comparison if signaling alphabets, Bell Systems Technical Journal,1952,1:504-522.
    [108] Griesmer J H. A bound for error-correcting codes, IBM Journal of Research Development,1960,4:532-542.
    [109] MacWillams F J, Sloane N J A. The Theory of Error-Correcting Codes. AmsterdamL North-Holland, 1997.
    [110] Gaborit P, King O D. Linear constructions for DNA codes. Theoret. Comput. Sci, 2005,334:99–113.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700