Recognition of Functional Sites Based on DNA Sequences
Abstract
Functional sites in DNA sequences are closely tied to gene regulation and transcription, and have therefore been analyzed extensively. Detecting these sites accurately from the DNA sequence has been a long-standing topic of interest in bioinformatics.
     This paper first proposes an improved position weight matrix (PWM) method based on an entropy measure and applies it to the recognition of prokaryotic promoters. The method uses information entropy to extract the conserved sites of prokaryotic promoters, then constructs two improved position weight matrices, one from a promoter training set and one from a non-promoter training set. Test sequences are scored using the matrix elements that correspond to the conserved sites and their associated segments, and are then classified according to their scores. Experimental results on Escherichia coli gene sequences show that the proposed algorithm outperforms existing promoter recognition algorithms in sensitivity, specificity, correlation coefficient, and precision.
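The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the authors' improved PWM, whose details the abstract does not give. It assumes aligned, equal-length sequences, a pseudocount of 1, a log-odds score between the promoter and non-promoter matrices, and an entropy cutoff chosen only for illustration. The function names and the toy sequences are hypothetical.

```python
import math

ALPHABET = "ACGT"


def column_frequencies(seqs, pseudocount=1.0):
    """Per-position base frequencies for aligned, equal-length sequences."""
    freqs = []
    for i in range(len(seqs[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for seq in seqs:
            counts[seq[i]] += 1
        total = sum(counts.values())
        freqs.append({base: counts[base] / total for base in ALPHABET})
    return freqs


def column_entropy(column):
    """Shannon entropy (bits) of one matrix column."""
    return -sum(p * math.log2(p) for p in column.values() if p > 0)


def conserved_positions(freqs, max_entropy):
    """Positions whose entropy is at or below the cutoff (assumed criterion)."""
    return [i for i, col in enumerate(freqs) if column_entropy(col) <= max_entropy]


def score(seq, pos_pwm, neg_pwm, positions):
    """Log-odds score restricted to the selected conserved positions."""
    return sum(math.log2(pos_pwm[i][seq[i]] / neg_pwm[i][seq[i]]) for i in positions)


def classify(seq, pos_pwm, neg_pwm, positions, threshold=0.0):
    """Label a sequence as promoter-like when its score exceeds the threshold."""
    return score(seq, pos_pwm, neg_pwm, positions) > threshold


# Toy training sets, made up for illustration only.
promoters = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
non_promoters = ["GCCGTA", "CAGGCT", "ATGCCG", "TGCAGC"]

pos_pwm = column_frequencies(promoters)
neg_pwm = column_frequencies(non_promoters)
sites = conserved_positions(pos_pwm, max_entropy=1.6)

for candidate in ("TATAAT", "GCCGTA"):
    print(candidate, round(score(candidate, pos_pwm, neg_pwm, sites), 2),
          classify(candidate, pos_pwm, neg_pwm, sites))
```

Restricting the score to low-entropy columns keeps highly variable positions from adding noise to the classification. In practice the entropy cutoff and the decision threshold would need to be tuned on held-out data.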
     Second, a novel pattern-recognition approach is developed to identify nucleosome positions. It combines two methods, one for pattern matching and one for removing sequence ambiguity. A matched mirror position filter, borrowed from electronics, first matches the patterns in the DNA sequence. Probabilistic relaxation labeling, widely used in image processing, then uses the context on either side of each site to reduce or eliminate noise introduced during sequence measurement. Applied to the Saccharomyces cerevisiae (yeast) genome, the combined framework yields nucleosome occupancy maps that indicate a significant improvement in recognition accuracy. The results also suggest that nucleosome occupancy in different species may share a common sequence-based mechanism.
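The post-processing step can be illustrated with a minimal Python sketch of probabilistic relaxation labeling on a one-dimensional signal. It is the classic two-label update, not the authors' exact formulation. It assumes that each position already carries an initial probability of being nucleosomal (for example, from a filtering step), that neighbors within a fixed window on both sides contribute equally, and that compatibility is +1 for matching labels and -1 for differing ones. The function name, window size, and input values are hypothetical.

```python
def relax(probs, window=2, iterations=5):
    """Smooth per-position P(nucleosome) values using neighboring context.

    probs: list of floats in [0, 1], one per sequence position.
    Returns a new list of the same length.
    """
    p = list(probs)
    n = len(p)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            neighbors = [j for j in range(max(0, i - window), min(n, i + window + 1))
                         if j != i]
            if not neighbors:
                updated.append(p[i])
                continue
            # Support for "nucleosome": mean of (+1)*p_j + (-1)*(1 - p_j) over neighbors.
            q_nuc = sum(2 * p[j] - 1 for j in neighbors) / len(neighbors)
            q_link = -q_nuc  # support for the "linker" label
            num_nuc = p[i] * (1 + q_nuc)
            num_link = (1 - p[i]) * (1 + q_link)
            denom = num_nuc + num_link
            # Keep the old value if both numerators vanish.
            updated.append(num_nuc / denom if denom > 0 else p[i])
        p = updated
    return p


# Made-up noisy input: positions 2 and 7 disagree with their surroundings.
noisy = [0.9, 0.9, 0.2, 0.9, 0.9, 0.1, 0.1, 0.8, 0.1, 0.1]
smoothed = relax(noisy)
print([round(x, 2) for x in smoothed])
```

Each iteration pulls an isolated label toward the label favored by its neighbors, which is how contextual information suppresses point-like noise while leaving a genuine boundary between regions in place.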
