Applications of Bio-Inspired Computing in Bioinformatics
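Abstract
Over the past few years, bio-inspired computing has come to play an unprecedentedly important role across the life sciences and medicine. The application of computers to sequence analysis set off the first wave of bioinformatics, yet many important problems in this area remain unsolved. One major reason is that computing speed and efficiency still cannot meet the demands of data processing.
     As genome and other sequencing projects advance, the focus of research is gradually shifting from accumulating data to interpreting it. New biological discoveries will depend heavily on our ability to combine and correlate diverse data across multiple dimensions and scales.
     With data volumes growing geometrically, the storage, retrieval, networking, processing, browsing, and visualization of biological information all place urgent demands on the development of theory, algorithms, and software.
     Computer science has in turn drawn inspiration from living systems. Analyzing and imitating the activities of life has produced many new concepts, including genetic algorithms, artificial neural networks, computer viruses and artificial immune systems, DNA computing, and artificial life. This interdisciplinary exchange has enriched each of the fields involved and will develop further in the years to come.
     In this dissertation, we apply biologically inspired computing methods, such as the genetic algorithm, the covering algorithm, and the ant colony algorithm, to problems in bioinformatics. Together these form an interesting loop, from life and back to life, which is the distinguishing feature and central task of this dissertation.
     Taking the central dogma, the core principle of molecular biology, as a framework, we briefly introduce the biological concepts involved in bioinformatics research and outline its research topics and methods. We focus on the properties and characteristics of proteins, an important object of study in bioinformatics, and on methods for classifying protein structure. We review the current state of protein structure research and analyze the technical characteristics of the various methods used in protein research. We also give a brief account of the principles and uses of microarray gene chips.
     This dissertation analyzes in detail the covering algorithm of one type of artificial neural network, the FP neural network, and discusses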
In recent years, bio-inspired computing has played an increasingly important role in biological and medical research. The application of computers to sequence analysis was the first great success of bioinformatics, but many problems in this area remain unsolved. One reason is that the speed and efficiency of computation cannot keep up with the demands of data processing.
    As genome and other sequencing projects continue to progress, the emphasis is gradually shifting from accumulating data to interpreting it. New discoveries in biology will depend greatly on the ability to combine and correlate diverse data across multiple dimensions and scales.
    Because the volume of biological data is growing geometrically, the storage, retrieval, networking, processing, browsing, and visualization of biological information place urgent demands on theory, algorithms, and software.
    Computer science also draws inspiration from living systems. Many new concepts have been proposed, including the genetic algorithm (GA), the artificial neural network (ANN), computer viruses and artificial immune systems, DNA computing, and artificial life. This crossing of disciplines enriches the related fields and will see further progress in the future.
    In this dissertation, we apply several life-inspired computing methods to problems in bioinformatics, such as the GA, the covering algorithm, and the ant colony system. These form an interesting loop, from life and back to life, which is the distinguishing feature and central task of the dissertation.
    Using the central dogma, the core principle of molecular biology, as a framework, we briefly introduce the basic biological concepts involved in bioinformatics and summarize its research topics and methods. We study the properties and characteristics of proteins, an important object of analysis in bioinformatics, as well as methods for classifying protein structure. We introduce the current state of protein structure research,
