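Research on the Application of the Maximum Information Principle, Energy, and Selection Constraints to the Prediction and Analysis of Gene Splice Sites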
Abstract
Identifying all genes in a genome and clarifying their functions requires not only experimental approaches but also theoretical methods that can guide experiments. The maximum information principle (MIP) is a fundamental principle of non-equilibrium statistical theory; it provides a good model of the mutation-selection mechanism in biological evolution and can serve as an important basis for extracting information in bioinformatics. Prediction of complete gene structure is an important topic of current research, and a crucial step is the accurate identification of splice sites (both constitutive and alternative) and of the various alternative splicing events. Predicting the flanking competitors of known splice sites is the key to predicting alternative 5' or alternative 3' splice site events.
     In this dissertation, the maximum information principle is applied to the theoretical analysis of the splicing reaction, and an expression for the reaction free energy associated with a splice site segment is derived. By introducing the concept of a selection pressure index and a corresponding constraint, an expression for the selection pressure index of k-mers in a sequence segment is derived. Applying the theory to the prediction of splice sites and their flanking competitors yields relatively high prediction accuracy. The main contributions are as follows:
     1. Starting from the basic physical principles of the splicing reaction, the traditional maximum information principle is used to analyze the conserved segments around splice sites. By introducing the concept of the reaction free energy associated with a splice site segment in the splicing reaction, together with a corresponding constraint, and assuming that the reaction free energy is additive, an expression for the reaction free energy associated with a splice site segment is derived. As a simplified model, the expression can be used to estimate the free energy change associated with a 5' or 3' splice site segment during the splicing reaction. When tested on splice site prediction, it achieves relatively high accuracy, which suggests that it reflects the actual splicing reaction reasonably well.
     2. As a first step toward a theoretical estimate of the splicing reaction free energy, this expression still needs to be made more accurate. We therefore extend the additivity assumption to include dependencies among the bases in a splice site segment, and modify the traditional maximum information principle to include background probabilities. This yields a more accurate expression for the reaction free energy that accounts for the background probabilities and, fairly comprehensively, for the dependencies among bases in the segment. When used to predict splice sites, the improved expression gives clearly higher accuracy than the original, indicating that it describes the splicing reaction process more faithfully.
     3. The improved expression is used to predict alternative and constitutive splice sites and their flanking competitors in human and mouse genes, with good results; its accuracy is comparable to that of currently popular methods such as the maximum entropy model. For predicting the flanking competitors of known splice sites, the estimated reaction free energy of the candidate competitor segment itself gives higher accuracy than an alternative measure, the difference between the estimated reaction free energies of the known splice site segment and the candidate competitor segment. This implies that, across a large number of splice sites, free energy competition between a known splice site segment and its flanking competitor segment is not the sole primary factor determining alternative splice site selection.
     4. To quantify the strength of natural selection acting on a sequence segment or on the k-mers within it, we introduce the concept of a selection pressure index with a corresponding constraint, and use the maximum information principle to derive an expression for the selection pressure index of k-mers in a sequence segment. The expression is easily related to function and can therefore be used to estimate certain functional physical quantities quantitatively; the foregoing method for estimating the splicing reaction free energy can also be placed within the selection pressure index framework. When the theory is applied to the prediction of constitutive and alternative splice sites in human and mouse, an integrated method that combines three measures by quadratic discriminant analysis (the estimated reaction free energy and the average selection pressure indexes of k-mers in the two flanking sequences) predicts clearly better than the reaction free energy measure alone.
     5. Based on the information content of sequences, an information discrepancy index for predicting coding regions is constructed; its predictive ability is comparable to that of the heterogeneity index. The selection acting on k-mers in the flanking sequences of splice sites is analyzed using the selection pressure index, leading to several meaningful conclusions, for example that the GT dinucleotide to the left of the 5' splice site and AG to the left and right of the 3' splice site are under relatively strong negative selection. The selection acting on k-mers also differs considerably between the left and right flanking sequences of a splice site, and two prediction measures are designed on the basis of this result. Seven measures, including the estimated reaction free energy, are integrated by quadratic discriminant analysis to predict the flanking competitors of known splice sites. The resulting accuracy is higher than that of the other methods in the literature and is the highest reported to date for flanking competitor prediction.
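     The additive free energy estimate with background probabilities described in items 1 and 2 can be illustrated with a minimal sketch. It is not the dissertation's expression, which the abstract does not give. It assumes the standard maximum-entropy result that, under an additive energy constraint, each position contributes a term of the form -ln(p_i(a)/q(a)), where p_i(a) is the frequency of base a at position i in known sites and q(a) is the background probability. The function names, the pseudocount, the energy units, and the toy sequences in the usage note are all hypothetical, and dependencies among bases are ignored.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_energies(sites, background, pseudocount=1.0):
    """Estimate per-position energy terms -ln(p_i(a) / q(a)).

    sites:       equal-length aligned segments around known splice sites
    background:  dict mapping each base to its background probability q(a)
    pseudocount: assumed smoothing constant so unseen bases stay finite
    Energies are in arbitrary units (an assumption of this sketch).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training segments must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    energies = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        energies.append({
            a: -math.log(((counts[a] + pseudocount) / total) / background[a])
            for a in BASES
        })
    return energies


def segment_energy(segment, energies):
    """Sum the per-position terms (the additivity assumption)."""
    if len(segment) != len(energies):
        raise ValueError("segment length does not match the model")
    return sum(e[base] for base, e in zip(segment, energies))
```

     As a usage example, training on a few made-up 5' splice site segments such as "CAGGTAAGT" with a uniform background and then calling segment_energy on a candidate gives a score where lower values mean the candidate looks more like the training sites. A fuller treatment along the lines of item 2 would add terms for correlated positions.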
