Abstract
Similarity analysis of DNA sequences has become a research hotspot in bioinformatics, and the demand for analysis algorithms has grown accordingly. Existing similarity analysis methods based on sample entropy suffer from efficiency problems. This paper proposes an analysis method based on multiscale entropy. The DNA sequences of seven viruses serve as the experimental subjects, and each is converted into a numerical time series using an integer representation. The mutual sample entropy values between sequences are then compared across multiple time scales to reveal the similarity between the sequences, and the results are compared with those of the original sample entropy algorithm. The experiments show that the proposed multiscale entropy method is feasible for DNA sequence similarity analysis.
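The pipeline outlined in the abstract (integer encoding, coarse-graining over several time scales, then an entropy comparison between two sequences) can be illustrated with a minimal Python sketch. Several details here are assumptions and are not taken from the paper: the base-to-integer mapping `BASE_TO_INT`, the parameters `m=2` and `r=0.2`, the per-scale standardization, and the reading of "mutual sample entropy" as cross-sample entropy. All function names are hypothetical.

```python
import math

# Assumed mapping; the paper only states that an integer representation is used.
BASE_TO_INT = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode(dna):
    """Convert a DNA string into a numerical series (assumed mapping)."""
    return [float(BASE_TO_INT[base]) for base in dna.upper()]


def coarse_grain(series, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def standardize(series):
    """Rescale to zero mean and unit variance (constant series become zeros)."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def count_matches(u, v, length, r, n_u, n_v):
    """Count template pairs whose Chebyshev distance is at most r."""
    count = 0
    for i in range(n_u):
        for j in range(n_v):
            if all(abs(u[i + k] - v[j + k]) <= r for k in range(length)):
                count += 1
    return count


def cross_sample_entropy(u, v, m=2, r=0.2):
    """Cross-sample entropy -ln(A/B) between two series (m, r are assumed)."""
    u, v = standardize(u), standardize(v)
    n_u, n_v = len(u) - m, len(v) - m
    if n_u <= 0 or n_v <= 0:
        return math.inf  # series too short to form templates
    b = count_matches(u, v, m, r, n_u, n_v)      # matches of length m
    a = count_matches(u, v, m + 1, r, n_u, n_v)  # matches of length m + 1
    if a == 0 or b == 0:
        return math.inf  # undefined when no matches are found
    return -math.log(a / b)


def multiscale_cross_entropy(dna_a, dna_b, scales=(1, 2, 3), m=2, r=0.2):
    """Cross-sample entropy of two DNA strings at each time scale."""
    x, y = encode(dna_a), encode(dna_b)
    return {
        s: cross_sample_entropy(coarse_grain(x, s), coarse_grain(y, s), m, r)
        for s in scales
    }


if __name__ == "__main__":
    # Toy inputs for illustration only, not the viral sequences from the paper.
    print(multiscale_cross_entropy("ACGTTGCAACGTAGCTAGCT", "ACGTAGCAACGTTGCTAGGT"))
```

In this sketch, lower values at a given scale indicate that the two series share more repeating patterns, so comparing the per-scale values across sequence pairs gives a rough similarity ordering. The nested matching loop is quadratic in sequence length, so a vectorized or indexed implementation would be needed for real genome-length inputs.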