Prediction and Sequence Feature Analysis of Alternative Splice Sites in the Human Genome
Abstract
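Alternative splicing of messenger RNA is one of the basic features that distinguish eukaryotes from prokaryotes. It lets a single gene produce multiple transcripts and is the main mechanism behind the protein diversity and functional complexity of higher multicellular eukaryotes. Alternative splicing of pre-mRNA is specific to developmental stage and tissue, and it plays a very important role in development, differentiation, and carcinogenesis. This paper first presents a statistical analysis of the conservation and spatial structure characteristics of splice-site sequences and pseudo splice-site sequences in the human genome. Second, based on the conservation of nucleotide k-mers at splice sites and the spatial structure characteristics of the regions upstream and downstream of splice sites, an information vector is constructed for each splice site, and a support vector machine is used to predict donor and acceptor splice sites. For donor sites, the sensitivity, specificity, and overall prediction accuracy under 5-fold cross-validation all exceed 92.30%, with a correlation coefficient of 0.69; under the 3-way data split test they all exceed 91.96%, with a correlation coefficient of 0.68. For acceptor sites, the sensitivity, specificity, and overall prediction accuracy under 5-fold cross-validation all exceed 90.53%, with a correlation coefficient of 0.63; under the 3-way data split test they all exceed 89.62%, with a correlation coefficient of 0.62.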
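     Alternative and constitutive splice sites show no obvious difference at the sequence level, and the two sites of each pair in an alternative splice-site event lie very close together, so predicting alternative splice-site events is a challenge for theoretical prediction. In this paper, the position-correlation weight matrix and DNA structural information parameters are used as the input vector of splice-site information, and a support vector machine is applied to classify alternative and constitutive splice sites. For donor splice sites, the sensitivity, specificity, and overall prediction accuracy on the independent test set are all above 73.30%, with a correlation coefficient of 0.47; for acceptor splice sites, they are all above 74.57%, with a correlation coefficient of 0.49. These results are clearly better than those reported in the recent literature, indicating that our method can serve as one of the tools for identifying alternative splice sites.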
Alternative splicing of mRNA is one of the basic features that distinguish eukaryotes from prokaryotes. By producing several transcripts from a single gene, it is a key mechanism behind the proteomic diversity and functional complexity of higher multicellular eukaryotes. Alternative splicing of pre-mRNA is specific to developmental stage and tissue, and it plays an important role in development, differentiation, and carcinogenesis. In this paper, the conservation and spatial structure characteristics of splice sites and pseudo splice sites in the human genome are first analysed statistically. Then, based on the nucleotide conservation of splice sites and the spatial structure characteristics of their upstream and downstream regions, an information vector is constructed for each splice site, and support vector machine (SVM) models built on these vectors are used to predict donor and acceptor splice sites in the human genome. With five-fold cross-validation, the total prediction accuracies are 92.55% and 90.70% for donors and acceptors, respectively. With a three-way data split, the total accuracies are 92.25% and 89.87% for donors and acceptors, respectively.
     At the sequence level, there is no obvious difference between alternative and constitutive splice sites. Moreover, in alternative splicing events the two donor (or acceptor) sites of the same exon lie very close to each other. Predicting alternative splice sites therefore remains a challenge for theoretical methods. In this paper, an approach for predicting alternative splice sites is presented that is based on the position-correlation weight matrix (PCWM) and DNA structural information. The prediction success rates are 73.32% and 74.62% for donor and acceptor sites, respectively. These results are better than those of recent methods based on the mechanism of splice site competition.
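To make the idea of a weight-matrix feature vector concrete, the sketch below builds an ordinary log-odds position weight matrix from aligned site windows and turns a candidate window into per-position scores. It is a deliberately simpler stand-in: the PCWM used in the paper also accounts for correlations between positions, and its exact definition is not given in the abstract. The function names, the pseudocount, the uniform background frequency, and the toy sequences are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned windows.

    A plain PWM, not the paper's position-correlation weight matrix.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for seq in sites:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def feature_vector(pwm, window):
    """Per-position log-odds scores of a window; usable as classifier input."""
    return [column[base] for column, base in zip(pwm, window)]


# Toy donor-like windows (hypothetical data, not taken from the paper).
donors = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
pwm = build_pwm(donors)

consensus_score = sum(feature_vector(pwm, "AGGTAAGT"))
random_score = sum(feature_vector(pwm, "TTCCAATC"))
print(consensus_score > random_score)
```

In a pipeline like the one the abstract describes, vectors of this kind would be combined with DNA structural parameters and passed to a support vector machine; that step is omitted here to keep the sketch free of external dependencies.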
