Distance Conservation of Transcription Regulatory Motifs in the Human Genome and Prediction of Transcription Start Sites
Abstract
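Understanding the network of transcription-regulatory interactions in the human genome is an immediate challenge for modern molecular biology. A central problem is how to extract evolutionary information and search for evolutionary conservation by comparing the promoters of closely related species. By comparing how nucleotide k-mers from human transcription factor binding site (TFBS) sequences are distributed in human and mouse, we found that the average distance between a pair of transcription regulatory 7-mer motifs is conserved between human and mouse promoters. We call this property "distance conservation". Distance conservation is a new kind of evolutionary conservation that does not depend on the strict positioning of bases in the genome sequence. This k-mer distance conservation can be used to develop non-alignment-based methods for fast genome-wide discovery of transcription regulatory motifs. In this thesis, we used distance conservation to search genome-wide for conserved transcription regulatory motifs, with a success rate of 90%. As a further test of distance conservation, we studied human tissue-specific transcription regulatory motif pairs and found that, in the two-dimensional space formed by the distance parameters, motif pairs can be clearly distinguished from their controls for 28 tissues. On this basis, we built feature vectors from the distance parameters and used Fisher discriminant analysis to predict the most probable pairs among the top 140 transcription regulatory motif pairs of the 28 human tissues.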
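The pair-distance idea described above can be sketched in Python. This is a minimal illustration, not the thesis's procedure: the abstract does not define the "average distance" precisely, so the sketch assumes it is the mean absolute offset between the start positions of the two k-mers over all co-occurrences within the same promoter. The function names `mean_pair_distance` and `relative_divergence`, the divergence formula, and the toy sequences are all hypothetical.

```python
from statistics import mean

def find_positions(seq, kmer):
    """Return every start index of kmer in seq (overlapping matches included)."""
    positions = []
    start = seq.find(kmer)
    while start != -1:
        positions.append(start)
        start = seq.find(kmer, start + 1)
    return positions

def mean_pair_distance(promoters, kmer_a, kmer_b):
    """Assumed definition: mean |pos_a - pos_b| over all co-occurrences
    of the two k-mers inside the same promoter sequence."""
    distances = [
        abs(i - j)
        for seq in promoters
        for i in find_positions(seq, kmer_a)
        for j in find_positions(seq, kmer_b)
    ]
    return mean(distances) if distances else None

def relative_divergence(d_human, d_mouse):
    """Hypothetical divergence parameter: difference scaled by the average."""
    return abs(d_human - d_mouse) / ((d_human + d_mouse) / 2)

# Toy data with 3-mers for brevity; the thesis works with 7-mers.
human = ["ACGTTTTGCA", "TTACGTGCAT"]
mouse = ["ACGTTTGGCA", "TACGTTGCAT"]
d_h = mean_pair_distance(human, "ACG", "GCA")
d_m = mean_pair_distance(mouse, "ACG", "GCA")
print(d_h, d_m, relative_divergence(d_h, d_m))
```

Under this assumed definition, a motif pair whose divergence is small relative to that of control pairs would count as distance-conserved. No alignment of the human and mouse promoters is needed, which is the point of the non-alignment approach.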
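     The other related piece of work in this thesis is the prediction of transcription start sites (TSS) in the human genome. Accurate identification of promoter sequences and transcription start sites is crucial for interpreting the human transcription regulatory network. With advances in statistical theory and the successful application of machine learning algorithms to prediction problems in bioinformatics, developing new and efficient predictive models to assist genome-scale annotation of transcription start sites has become one of the mainstream directions in the field. The UCSC (University of California Santa Cruz) Genome Browser, for example, has accepted many gene prediction models as genome-scale tools for assisted gene annotation. In this thesis, we applied the Increment of Diversity with Quadratic Discriminant analysis (IDQD) method to predict transcription start sites in the human genome. On a typical TSS dataset with a positive-to-negative ratio of 1:58, both the sensitivity and the positive predictive value of our predictions exceeded 65%. Evaluating performance with ROC and PRC at positive-to-negative ratios of 1:679 and 1:113, auROC exceeded 96% in both cases and auPRC was 26% and 64%, respectively. In a genome-wide search of chromosomes 4, 21 and 22, we predicted the TSS of single-promoter genes and the 5'-most (first) TSS of alternative-promoter genes; at positive-to-negative ratios of 1:138 and 1:68, auROC was 93% and 97% and auPRC was 40% and 65%, respectively. Under the same evaluation criteria, these results are better than the most recently reported SVM prediction accuracy from groups abroad. Our results show that the IDQD method is capable of solving complex classification problems in bioinformatics.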
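The abstract names IDQD without spelling out its formulas. The sketch below illustrates only the increment-of-diversity part, using the diversity measure as it is commonly written, D(X) = N log N - sum(n_i log n_i) with N = sum(n_i), and ID(X, S) = D(X + S) - D(X) - D(S). Using k-mer counts as the feature vector is an assumption made for illustration, and the quadratic discriminant step that IDQD applies to the resulting ID values is omitted.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence (assumed feature choice)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def diversity(counts):
    """Measure of diversity D(X) = N log N - sum_i n_i log n_i."""
    values = [n for n in counts.values() if n > 0]
    total = sum(values)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in values)

def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S); small when X resembles S."""
    return diversity(x + s) - diversity(x) - diversity(s)

# Compare a query window against a (toy) source of known TSS-like sequence.
source = kmer_counts("ACGTACGTACGT", 2)
similar = kmer_counts("ACGTAC", 2)
different = kmer_counts("GGGGCCCC", 2)
print(increment_of_diversity(similar, source))
print(increment_of_diversity(different, source))
```

The increment is zero when the two count vectors have the same composition and grows as they diverge, so ID values computed against positive and negative training sources can serve as low-dimensional inputs to a discriminant function.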
     The IDQD program and the data related to human genome TSS prediction are available at http://jichubu.imut.edu.cn/IDQD/idqd.htm.
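     The thesis has five chapters. Chapters 1 to 3 deal with distance conservation; Chapters 4 and 5 deal with the IDQD algorithm and its application to predicting transcription start sites in the human genome. Chapter 1 introduces the concept of distance conservation. Chapter 2 applies it to build a non-alignment-based model for predicting transcription regulatory motifs, which gives the first test of distance conservation. Chapter 3 applies it to predict human tissue-specific transcription regulatory motif pairs, which gives the second test. Chapter 4 describes the IDQD algorithm in detail, and Chapter 5 applies it to predict transcription start sites in the human genome.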
Understanding the interaction network among transcription-regulation elements in human is an immediate challenge for modern molecular biology. A central problem is how to extract evolutionary information and search for evolutionary conservation by comparing the promoters of closely related species. Through comparative studies of the k-mer distribution in human and mouse transcription factor binding site (TFBS) sequences, we discovered that the average distance between a pair of transcription regulatory 7-mer motifs is conserved between human and mouse promoters. This distance conservation is a new kind of evolutionary conservation that is not based on the strict location of bases in the genome sequence. The conservation of k-mer distance makes it possible to propose a non-alignment-based approach for fast genome-wide discovery of transcription regulatory motifs. We demonstrated distance conservation by a genome-wide search for conserved regulatory 7-mer motifs, with a success rate of 90%. Then, after defining a human-mouse pair distance divergence parameter, we studied tissue-specific motif pairs and found that, for 28 tissues, the parameter is 11 to 16 times smaller for motif pairs than for their controls, so the pairs can be clearly differentiated on the two-dimensional parameter plane. Finally, we briefly discuss the mechanism of distance conservation, which is thought to be related to the modular structure of TFBSs.
     Accurate identification of promoter sequences and transcription start sites is a challenge in constructing human transcription-regulation networks, and new methods are needed to improve prediction.
     We used the method of Increment of Diversity with Quadratic Discriminant analysis (IDQD) to predict transcription start sites (TSS). On a typical TSS dataset with a positives/negatives ratio of 1:58, both sensitivity and positive predictive value exceeded 65%. Performance was also evaluated with receiver operating characteristic (ROC) and precision-recall curves (PRC): the area under the ROC curve (auROC) was higher than 96% in both cases, and the area under the PRC (auPRC) was about 26% for a positives/negatives ratio of 1:679 and 64% for a ratio of 1:113. In a whole-genome search of chromosomes 4, 21 and 22, we predicted TSSs of genes with a single promoter and of genes with alternative promoters, obtaining auROC = 93% and auPRC = 40% for a positives/negatives ratio of 1:138, and auROC = 97% and auPRC = 65% for a ratio of 1:68. The work shows that the IDQD method is capable of solving complicated classification problems in bioinformatics.
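The two summary metrics reported above can be computed from classifier scores as in the following Python sketch. It is an illustration only: auROC is computed through the rank (Mann-Whitney) formulation, and auPRC is approximated by average precision, which is one common estimator; the abstract does not say which estimator the thesis used. The scores and labels are made up. With heavily imbalanced data, a random classifier has an expected auPRC near the positive fraction (roughly 1/680 at a 1:679 ratio), which is why auPRC is far lower than auROC in such settings.

```python
def au_roc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count one half); equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true positive,
    used here as an estimator of the area under the PR curve."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical discriminant scores for four candidate TSS windows.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(au_roc(scores, labels))             # 0.75
print(average_precision(scores, labels))  # 0.8333...
```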
     An implementation of the IDQD algorithm, the datasets, and online-only supplementary data are available at http://jichubu.imut.edu.cn/IDQD/idqd.htm.
