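Computational Analysis of the DNA-Binding Properties of the Arabidopsis AtERF Family and Prediction of Target Genes Regulated by Its DREB Subfamily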
Abstract
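Arabidopsis ethylene responsive element binding factors (AtERFs) are a class of plant-specific transcription factors. Numerous studies have shown that AtERF transcription factors play important roles in plant growth and development, particularly in regulating responses to environmental stress. The AtERF family belongs to the AP2/ERF superfamily, and 124 AtERF transcription factors have so far been identified in the Arabidopsis genome. Through their conserved DNA-binding domain, AtERFs specifically recognize a class of cis-regulatory elements, such as the GCC-box, and thereby regulate the transcription of particular genes. To dissect the interactions between AtERF proteins and DNA motifs, and to understand how DNA-binding properties are shared or differ among the AtERF subfamilies, this study selected one AtERF protein from each of four subfamilies (AtERF1, AtEBP, CBF1 and AtERF4). We built three-dimensional models of the DNA-binding domains of AtEBP, CBF1 and AtERF4 by homology modeling and used molecular dynamics simulations to compare how the four proteins interact with the GCC-box. From these interaction features we analyzed the structural basis of DNA motif recognition by the AtERF family. The results provide detailed information on the sequence-specific recognition of DNA by AtERF proteins. By comparing the DNA-binding properties of the representative proteins of the four subfamilies, we identified key amino acid residues and key positions in the DNA motif, providing useful information for explaining the functional divergence of AtERF proteins.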
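Identifying the target genes regulated by a transcription factor is an important step in constructing transcriptional regulatory networks and an important clue to how cells respond to their environment. The DNA-binding domain of a transcription factor determines which genes it specifically regulates, because it allows the factor to find particular DNA motifs in the upstream regulatory regions of its target genes. In principle, for any species with a fully sequenced genome, the target genes of a transcription factor could be identified directly by searching for its DNA motif. Identifying target genes experimentally, however, remains laborious, especially in complex genomes. By analyzing transcription factor binding sites and introducing advanced methods from machine learning, this study proposes an efficient and highly specific computational method for predicting transcription factor target genes. We also predicted the target genes of DREB transcription factors genome-wide. These predictions offer a valuable reference for further dissecting the molecular network of plant stress-resistance regulation in which DREBs participate.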
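The direct motif search described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's method. It assumes the DRE core A/GCCGAC as the motif (written RCCGAC in IUPAC notation) and scans both strands with a regular expression. The promoter sequences and gene names are made up. A six-base core with one degenerate position matches a given position on either strand with probability 4/4096, so it occurs by chance roughly once per kilobase. Exact matching alone therefore has low specificity.

```python
import re

# IUPAC R = A or G, so the DRE core A/GCCGAC is written [AG]CCGAC.
DRE_PATTERN = "[AG]CCGAC"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif_sites(seq, pattern=DRE_PATTERN):
    """Return sorted (start, strand) pairs for matches on both strands.

    Starts are 0-based forward-strand coordinates; the lookahead lets
    overlapping matches be reported too.
    """
    seq = seq.upper()
    finder = re.compile(f"(?=({pattern}))")
    hits = [(m.start(), "+") for m in finder.finditer(seq)]
    rc = reverse_complement(seq)
    for m in finder.finditer(rc):
        width = len(m.group(1))
        # A match at rc[s:s+width] covers seq[len-s-width : len-s].
        hits.append((len(seq) - m.start() - width, "-"))
    return sorted(hits)

def candidate_targets(promoters):
    """Map each gene to its motif sites, keeping genes with at least one."""
    result = {}
    for gene, seq in promoters.items():
        sites = find_motif_sites(seq)
        if sites:
            result[gene] = sites
    return result

# Hypothetical promoter fragments, for illustration only.
promoters = {
    "geneA": "TTTACCGACTTT",  # core on the forward strand
    "geneB": "TTTTTTTTTT",    # no core
    "geneC": "AAGTCGGTAA",    # core on the reverse strand
}
print(candidate_targets(promoters))
# {'geneA': [(3, '+')], 'geneC': [(2, '-')]}
```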
Arabidopsis ethylene responsive element binding factors (AtERFs) form a transcription factor superfamily. Although the functions of most AtERFs are unknown, a number of AtERFs have been reported to play essential roles in regulating stress-related genes. They do so by using their DNA-binding domains (ERF domains) to bind a consensus motif, the GCC-box, in the regulatory regions of those genes. Phylogenetic analysis of the ERF domains classifies the AtERF superfamily into four predominant subfamilies.
In the first section of this thesis, we computationally analyzed the structural properties of AtERF-DNA motif complexes. We selected one representative protein from each subfamily: AtERF1, AtEBP, CBF1 and AtERF4. We then constructed the four corresponding complexes of the AtERF DNA-binding domain (DBD) with DNA, using homology modeling to obtain the domains of AtEBP, CBF1 and AtERF4. Molecular dynamics simulations were performed to explore the interactions between six conserved residues and the DNA motif, the GCC-box. By comparing these interactions across the four DBD-DNA complexes, we revealed properties of protein-DNA interaction common to the AtERFs. We also revealed the differing roles of individual GCC-box bases in specific recognition by AtERFs. Our results suggest that three residues, Arg29, Glu39 and Arg41, play a vital role in direct readout of the DNA. The positions within the consensus sequence GCCGCC contribute unequally to binding by ERF domains, and the CGNC element of the GCC motif may be required for recognition by ERF domains. These results provide structural evidence for the sequence-dependent recognition mechanism of AtERFs.
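Residue-base interactions taken from MD trajectories, such as those compared above, are often summarized as contact occupancy. Occupancy is the fraction of frames in which an atom pair stays within a distance cutoff. The Python sketch below assumes that per-frame distances have already been extracted from the trajectory, for instance with VMD. The 3.5 Å cutoff, the pair labels and the distance values are hypothetical. They are not the thesis's settings or results.

```python
from statistics import mean

def occupancy(distances, cutoff=3.5):
    """Fraction of frames whose distance (angstroms) is at or below cutoff."""
    if not distances:
        raise ValueError("no frames supplied")
    return sum(d <= cutoff for d in distances) / len(distances)

def rank_contacts(contact_series, cutoff=3.5):
    """Return (pair, occupancy, mean distance) tuples, most persistent first."""
    rows = [(pair, occupancy(d, cutoff), mean(d))
            for pair, d in contact_series.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical per-frame donor-acceptor distances for three residue-base
# atom pairs (made-up numbers, five frames each).
series = {
    "pair_A": [2.9, 3.0, 3.1, 2.8, 3.4],
    "pair_B": [3.2, 3.6, 3.3, 3.0, 3.8],
    "pair_C": [4.5, 3.3, 4.9, 5.2, 4.1],
}
for pair, occ, avg in rank_contacts(series):
    print(f"{pair}: occupancy {occ:.0%}, mean distance {avg:.2f} A")
```

With these made-up values the pairs rank pair_A (100%), pair_B (60%), pair_C (20%). Hydrogen-bond analyses usually add a donor-hydrogen-acceptor angle criterion on top of the distance cutoff; that refinement is omitted here.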
Identifying the downstream target genes of specific transcription factors (TFs) is necessary for understanding cellular responses to environmental stimuli. Most known gene regulatory networks are highly complex because they involve cooperative interactions and feedback regulation. Discovering the direct targets of TFs is therefore a fundamental step toward elucidating how these networks are constructed. The availability of genome sequences has made it possible, in principle, to find the target genes of a specific TF by locating its recognition motifs in the genome. In practice, however, the task remains difficult because of the complexity of plant genomes. Over the last decade, many computational methods have been developed and successfully applied to identify TF target genes. Among them, the position weight matrix (PWM) has been the most widely used technique for describing transcription factor binding sites (TFBSs) and scanning for them at genome scale. However, TFBSs are only loosely conserved, so PWM-based strategies alone cannot identify TFBSs effectively at genome scale. To overcome this limitation, approaches that combine PWMs with analysis of the sequence context of TFBSs were developed. At their core, these approaches aim to find algorithms that adequately describe the properties of TFBSs and their contexts.
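The PWM technique described above can be sketched as follows. The code builds a log2-odds matrix from a few aligned sites and reports every window that scores at or above a threshold. The toy sites (DRE core variants), the pseudocount of 0.5, the uniform background and the threshold are illustrative assumptions. For brevity, only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM (one dict per position) from equal-length sites."""
    background = background or {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm

def score_window(pwm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(pwm[i][base] for i, base in enumerate(window.upper()))

def scan(pwm, seq, threshold):
    """Return (start, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window.upper()) <= set(BASES):  # skip windows containing N etc.
            s = score_window(pwm, window)
            if s >= threshold:
                hits.append((start, s))
    return hits

# Toy training sites and query sequence (illustrative only).
pwm = build_pwm(["ACCGAC", "GCCGAC", "ACCGAC", "GCCGAC"])
hits = scan(pwm, "TTTACCGACTTTGCCGACTT", threshold=5.0)
print([start for start, _ in hits])  # [3, 12]
```

Both DRE cores in the example sequence are recovered with the maximum score. Loosely conserved sites can pass such a threshold by chance, which is why score-based scanning is usually combined with additional context information.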
In the second section of this thesis, we report a novel computational strategy for identifying DREB binding sites in the Arabidopsis genome by combining TFBS context analysis with a machine learning approach.
Dehydration responsive element binding proteins (DREBs) are important transcription factors that induce the expression of a series of abiotic stress-related genes and confer stress tolerance on plants. They belong to the ethylene responsive element binding factor (AP2-EREBP) superfamily of 124 members (the so-called ERF proteins), 57 of which fall into the DREB subfamily. The ERF proteins share a conserved DNA-binding domain (the ERF domain) of 58-60 amino acids. This domain reportedly binds two typical cis-acting elements, the GCC-box and the C-repeat (CRT)/dehydration responsive element (DRE) motif, and is involved in the expression of cold- and dehydration-responsive genes. Because DREBs play a vital role in various biotic and abiotic stress responses, it is important to identify their target genes in Arabidopsis. Maruyama et al. identified downstream genes of DREB1A/CBF3 using two microarray systems, and Fowler and Thomashow and Taji et al. also reported downstream genes of DREB proteins. Nevertheless, the full set of DREB target genes remains to be discovered.
We focused on the differences between two kinds of 206-bp DNA fragments. DRE frame sequences (DFSs) were retrieved from the PPRs of MGs and each contains a DRE motif (A/GCCGAC) at its center. Non-DRE frame sequences (nDFSs) were collected at random from the PPRs of the Arabidopsis genome, with a DRE motif artificially inserted at the center. We developed a support vector machine (SVM)-based classifier to assign DRE-containing sequences to the DFS or nDFS class. Our results suggest that this approach effectively identifies DREB binding sites in the promoter regions of target genes and can therefore be used to infer DREB target genes in Arabidopsis. Using it, we predicted 474 candidate direct target genes of DREBs. By cross-referencing these candidates with AtGenExpress microarray data, we identified 268 direct DREB targets that are induced by abiotic stresses such as cold, salinity and drought over a 24-hour observation period. These results provide preliminary information that warrants further experimental investigation of the DREB-mediated stress-response regulatory network in plants.
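The DFS/nDFS classification described above can be sketched as a linear SVM over sequence features of each 206-bp frame. The abstract does not specify the features or the kernel. This sketch therefore assumes dinucleotide frequencies as features and trains a linear SVM with the Pegasos stochastic sub-gradient algorithm in pure Python. The training frames are synthetic: 100-bp flanks drawn only from A/T or only from G/C around a central ACCGAC core. These compositions were chosen solely to make the toy classes separable and carry no information about real DRE contexts.

```python
import random
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=2)]  # 16 dinucleotides
INDEX = {km: i for i, km in enumerate(KMERS)}

def kmer_features(seq):
    """Dinucleotide frequency vector of a frame (k-mers with N are skipped)."""
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - 1):
        j = INDEX.get(seq[i:i + 2].upper())
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]

def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient on hinge loss).

    Labels are +1 (DFS) or -1 (nDFS). No bias term, as in basic Pegasos.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # is the hinge loss active?
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, seq):
    """Classify a frame sequence as DFS (+1) or nDFS (-1)."""
    return 1 if dot(w, kmer_features(seq)) >= 0 else -1

def toy_frame(rng, flank_alphabet, core="ACCGAC", flank_len=100):
    """Hypothetical 206-bp frame: random flanks around a central DRE core."""
    left = "".join(rng.choice(flank_alphabet) for _ in range(flank_len))
    right = "".join(rng.choice(flank_alphabet) for _ in range(flank_len))
    return left + core + right

# Toy training set: AT-only flanks stand in for DFSs, GC-only flanks for nDFSs.
rng = random.Random(1)
frames = [toy_frame(rng, "AT") for _ in range(10)] + \
         [toy_frame(rng, "GC") for _ in range(10)]
labels = [1] * 10 + [-1] * 10
w = train_linear_svm([kmer_features(f) for f in frames], labels)
```

In a real application, the positive class would be built from frames centered on DRE cores in the PPRs of MGs. The negative class would be built from random promoter fragments with an inserted DRE. A library implementation such as scikit-learn's `SVC` with a nonlinear kernel is a common alternative to this hand-written linear model.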
References

First section
[1] O'Donnell, P.J., et al., Ethylene as a signal mediating the wound response of tomato plants. Science, 1996. 274(5294): p. 1914-7.
[2] Penninckx, I.A., et al., Pathogen-induced systemic activation of a plant defensin gene in Arabidopsis follows a salicylic acid-independent pathway. Plant Cell, 1996. 8(12): p. 2309-23.
[3] Buttner, M. and K.B. Singh, Arabidopsis thaliana ethylene-responsive element binding protein (AtEBP), an ethylene-inducible, GCC box DNA-binding protein interacts with an ocs element binding protein. Proc Natl Acad Sci U S A, 1997. 94(11): p. 5961-6.
[4] Yamaguchi-Shinozaki, K. and K. Shinozaki, A novel cis-acting element in an Arabidopsis gene is involved in responsiveness to drought, low-temperature, or high-salt stress. Plant Cell, 1994. 6(2): p. 251-64.
[5] Riechmann, J.L., et al., Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 2000. 290(5499): p. 2105-10.
[6] Ohme-Takagi, M. and H. Shinshi, Ethylene-inducible DNA binding proteins that interact with an ethylene-responsive element. Plant Cell, 1995. 7(2): p. 173-82.
[7] Hao, D., M. Ohme-Takagi, and A. Sarai, Unique mode of GCC box recognition by the DNA-binding domain of ethylene-responsive element-binding factor (ERF domain) in plant. J Biol Chem, 1998. 273(41): p. 26857-61.
[8] Allen, M.D., et al., A novel mode of DNA recognition by a beta-sheet revealed by the solution structure of the GCC-box binding domain in complex with DNA. EMBO J, 1998. 17(18): p. 5484-96.
[9] Sessa, G., Y. Meller, and R. Fluhr, A GCC element and a G-box motif participate in ethylene-induced expression of the PRB-1b gene. Plant Mol Biol, 1995. 28(1): p. 145-53.
[10] Shinshi, H., S. Usami, and M. Ohme-Takagi, Identification of an ethylene-responsive region in the promoter of a tobacco class I chitinase gene. Plant Mol Biol, 1995. 27(5): p. 923-32.
[11] Deshpande, N., et al., The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res, 2005. 33(Database issue): p. D233-7.
[12] Berman, H.M., et al., The Protein Data Bank. Nucleic Acids Res, 2000. 28(1): p. 235-42.
[13] Thompson, J.D., et al., The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res, 1997. 25(24): p. 4876-82.
[14] InsightII, version 1998. Accelrys Inc., San Diego, 2000.
[15] Kale, L., et al., NAMD2: greater scalability for parallel molecular dynamics. J Comput Phys, 1999. 151: p. 283-312.
[16] MacKerell, A.D., et al., All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B, 1998. 102: p. 3586-616.
[17] Jorgensen, W.L., et al., Comparison of simple potential functions for simulating liquid water. J Chem Phys, 1983. 79: p. 926-35.
[18] Ryckaert, J.P., G. Ciccotti, and H.J.C. Berendsen, Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J Comput Phys, 1977. 23: p. 327-41.
[19] Darden, T.A., D. York, and L.G. Pedersen, Particle mesh Ewald: an N·log(N) method for computing Ewald sums. J Chem Phys, 1993. 98: p. 10089-92.
[20] Humphrey, W., A. Dalke, and K. Schulten, VMD: visual molecular dynamics. J Mol Graph, 1996. 14: p. 27-38.
[21] Lee, H.E., et al., Ethylene responsive element binding protein 1 (StEREBP1) from Solanum tuberosum increases tolerance to abiotic stress in transgenic potato plants. Biochem Biophys Res Commun, 2007. 353(4): p. 863-8.
[22] Duan, H., et al., Cloning and characterization of two EREBP transcription factors from cotton (Gossypium hirsutum L.). Biochemistry (Mosc), 2006. 71(3): p. 285-93.
[23] Huang, Z., et al., Tomato TERF1 modulates ethylene response and enhances osmotic stress tolerance by activating expression of downstream genes. FEBS Lett, 2004. 573(1-3): p. 110-6.
[24] Tournier, B., et al., New members of the tomato ERF family show specific expression pattern and diverse DNA-binding capacity to the GCC box element. FEBS Lett, 2003. 550(1-3): p. 149-54.

Second section
[1] Weinmann, A.S., Novel ChIP-based strategies to uncover transcription factor target genes in the immune system. Nat Rev Immunol, 2004. 4(5): p. 381-6.
[2] Gibbons, F.D., et al., Chipper: discovering transcription-factor targets from chromatin immunoprecipitation microarrays using variance stabilization. Genome Biol, 2005. 6(11): p. R96.
[3] Maruyama, K., et al., Identification of cold-inducible downstream genes of the Arabidopsis DREB1A/CBF3 transcriptional factor using two microarray systems. Plant J, 2004. 38(6): p. 982-93.
[4] Tian, B., et al., Identification of direct genomic targets downstream of the nuclear factor-kappaB transcription factor mediating tumor necrosis factor signaling. J Biol Chem, 2005. 280(17): p. 17435-48.
[5] Redestig, H., et al., Transcription factor target prediction using multiple short expression time series from Arabidopsis thaliana. BMC Bioinformatics, 2007. 8(1): p. 454.
[6] Horsman, S., et al., TF Target Mapper: a BLAST search tool for the identification of Transcription Factor target genes. BMC Bioinformatics, 2006. 7: p. 120.
[7] Zhang, W., et al., Cis-regulatory element based targeted gene finding: genome-wide identification of abscisic acid- and abiotic stress-responsive genes in Arabidopsis thaliana. Bioinformatics, 2005. 21(14): p. 3074-81.
[8] Jolly, E.R., et al., Genome-wide identification of the regulatory targets of a transcription factor using biochemical characterization and computational genomic analysis. BMC Bioinformatics, 2005. 6: p. 275.
[9] Chan, B.Y. and D. Kibler, Using hexamers to predict cis-regulatory motifs in Drosophila. BMC Bioinformatics, 2005. 6: p. 262.
[10] Bigelow, H.R., et al., CisOrtho: a program pipeline for genome-wide identification of transcription factor target genes using phylogenetic footprinting. BMC Bioinformatics, 2004. 5: p. 27.
[11] Qian, J., et al., Prediction of regulatory networks: genome-wide identification of transcription factor targets from gene expression data. Bioinformatics, 2003. 19(15): p. 1917-26.
[12] Dieterich, C., R. Herwig, and M. Vingron, Exploring potential target genes of signaling pathways by predicting conserved transcription factor binding sites. Bioinformatics, 2003. 19 Suppl 2: p. II50-II56.
[13] GuhaThakurta, D. and G.D. Stormo, Identifying target sites for cooperatively binding factors. Bioinformatics, 2001. 17(7): p. 608-21.
[14] Qian, J., et al., Identification of regulatory targets of tissue-specific transcription factors: application to retina-specific gene regulation. Nucleic Acids Res, 2005. 33(11): p. 3479-91.
[15] Kasuga, M., et al., Improving plant drought, salt, and freezing tolerance by gene transfer of a single stress-inducible transcription factor. Nat Biotechnol, 1999. 17(3): p. 287-91.
[16] Riechmann, J.L., et al., Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 2000. 290(5499): p. 2105-10.
[17] Allen, M.D., et al., A novel mode of DNA recognition by a beta-sheet revealed by the solution structure of the GCC-box binding domain in complex with DNA. EMBO J, 1998. 17(18): p. 5484-96.
[18] Hong, J.P. and W.T. Kim, Isolation and functional characterization of the Ca-DREBLP1 gene encoding a dehydration-responsive element binding-factor-like protein 1 in hot pepper (Capsicum annuum L. cv. Pukang). Planta, 2005. 220(6): p. 875-88.
[19] Agarwal, P.K., et al., Role of DREB transcription factors in abiotic and biotic stress tolerance in plants. Plant Cell Rep, 2006. 25(12): p. 1263-74.
[20] Wang, J.W., et al., Induced expression of DREB transcriptional factor and study on its physiological effects of drought tolerance in transgenic wheat. Yi Chuan Xue Bao, 2006. 33(5): p. 468-76.
[21] Agarwal, P., et al., Stress-inducible DREB2A transcription factor from Pennisetum glaucum is a phosphoprotein and its phosphorylation negatively regulates its DNA-binding activity. Mol Genet Genomics, 2007. 277(2): p. 189-98.
[22] Fowler, S. and M.F. Thomashow, Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway. Plant Cell, 2002. 14(8): p. 1675-90.
[23] Taji, T., et al., Important roles of drought- and cold-inducible genes for galactinol synthase in stress tolerance in Arabidopsis thaliana. Plant J, 2002. 29(4): p. 417-26.
[24] Baten, A., et al., Splice site identification using probabilistic parameters and SVM classification. BMC Bioinformatics, 2006. 7 Suppl 5: p. S15.
[25] Cui, J., et al., Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties. Mol Immunol, 2007. 44(5): p. 866-77.
[26] Ogul, H. and E.U. Mumcuoglu, SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees. Comput Biol Chem, 2006. 30(4): p. 292-9.
[27] Wang, H., et al., An SVM scorer for more sensitive and reliable peptide identification via tandem mass spectrometry. Pac Symp Biocomput, 2006: p. 303-14.
[28] Wee, L.J., T.W. Tan, and S. Ranganathan, SVM-based prediction of caspase substrate cleavage sites. BMC Bioinformatics, 2006. 7 Suppl 5: p. S14.
[29] Xue, C.X., et al., Support vector machines-based quantitative structure-property relationship for the prediction of heat capacity. J Chem Inf Comput Sci, 2004. 44(4): p. 1267-74.
[30] Yu, G.X., et al., An SVM-based algorithm for identification of photosynthesis-specific genome features. Proc IEEE Comput Soc Bioinform Conf, 2003. 2: p. 235-43.
[31] Dong, J.X., A. Krzyzak, and C.Y. Suen, Fast SVM training algorithm with decomposition on very large data sets. IEEE Trans Pattern Anal Mach Intell, 2005. 27(4): p. 603-18.
[32] Xiong, Y. and S.Z. Fei, Functional and phylogenetic analysis of a DREB/CBF-like gene in perennial ryegrass (Lolium perenne L.). Planta, 2006. 224(4): p. 878-88.
[33] Li, X.P., et al., Soybean DRE-binding transcription factors that are responsive to abiotic stresses. Theor Appl Genet, 2005. 110(8): p. 1355-62.
[34] Kasukabe, Y., et al., Overexpression of spermidine synthase enhances tolerance to multiple environmental stresses and up-regulates the expression of various stress-regulated genes in transgenic Arabidopsis thaliana. Plant Cell Physiol, 2004. 45(6): p. 712-22.
[35] Chen, J.Q., et al., An AP2/EREBP-type transcription-factor gene from rice is cold-inducible and encodes a nuclear-localized protein. Theor Appl Genet, 2003. 107(6): p. 972-9.
[36] Sakuma, Y., et al., Functional analysis of an Arabidopsis transcription factor, DREB2A, involved in drought-responsive gene expression. Plant Cell, 2006. 18(5): p. 1292-309.
[37] Seki, M., et al., Monitoring the expression profiles of 7000 Arabidopsis genes under drought, cold and high-salinity stresses using a full-length cDNA microarray. Plant J, 2002. 31(3): p. 279-92.
[38] Yamaguchi, K., et al., Activating transcription factor 3 and early growth response 1 are the novel targets of LY294002 in a phosphatidylinositol 3-kinase-independent pathway. Cancer Res, 2006. 66(4): p. 2376-84.
[39] Ashburner, M., et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 2000. 25(1): p. 25-9.
[40] Vavouri, T. and G. Elgar, Prediction of cis-regulatory elements using binding site matrices: the successes, the failures and the reasons for both. Curr Opin Genet Dev, 2005. 15(4): p. 395-402.
[41] Kel, A.E., et al., Computer-assisted identification of cell cycle-related genes: new targets for E2F transcription factors. J Mol Biol, 2001. 309(1): p. 99-120.
[42] Rebeiz, M., N.L. Reeves, and J.W. Posakony, SCORE: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Site clustering over random expectation. Proc Natl Acad Sci U S A, 2002. 99(15): p. 9888-93.
[43] Japkowicz, N., The class imbalance problem: significance and strategies. Proceedings of the 2000 International Conference on Artificial Intelligence, 2000. 1.
[44] Matys, V., et al., TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res, 2003. 31(1): p. 374-8.
[45] Sandelin, A., et al., JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res, 2004. 32(Database issue): p. D91-4.
[46] Dai, X., et al., Overexpression of an R1R2R3 MYB gene, OsMYB3R-2, increases tolerance to freezing, drought, and salt stress in transgenic Arabidopsis. Plant Physiol, 2007. 143(4): p. 1739-51.
[47] Matsushima, R., et al., An endoplasmic reticulum-derived structure that is induced under stress conditions in Arabidopsis. Plant Physiol, 2002. 130(4): p. 1807-14.
[48] Prak, S., et al., Multiple phosphorylations in the C-terminal tail of plant plasma membrane aquaporins. Role in sub-cellular trafficking of AtPIP2;1 in response to salt stress. Mol Cell Proteomics, 2008.
[49] Wang, M., et al., A novel MAP kinase gene in cotton (Gossypium hirsutum L.), GhMAPK, is involved in response to diverse environmental stresses. J Biochem Mol Biol, 2007. 40(3): p. 325-32.
[50] Chitteti, B.R. and Z. Peng, Proteome and phosphoproteome differential expression under salinity stress in rice (Oryza sativa) roots. J Proteome Res, 2007. 6(5): p. 1718-27.
[51] Boudsocq, M., et al., Different phosphorylation mechanisms are involved in the activation of sucrose non-fermenting 1 related protein kinases 2 by osmotic stresses and abscisic acid. Plant Mol Biol, 2007. 63(4): p. 491-503.
[52] Chae, M.J., et al., A rice dehydration-inducible SNF1-related protein kinase 2 phosphorylates an abscisic acid responsive element-binding factor and associates with ABA signaling. Plant Mol Biol, 2007. 63(2): p. 151-69.
[53] Schweighofer, A. and I. Meskiene, Regulation of stress hormones jasmonates and ethylene by MAPK pathways in plants. Mol Biosyst, 2008. 4(8): p. 799-803.
[54] D'Angelo, C., et al., Alternative complex formation of the Ca-regulated protein kinase CIPK1 controls abscisic acid-dependent and independent stress responses in Arabidopsis. Plant J, 2006. 48(6): p. 857-72.
[55] Kiegerl, S., et al., SIMKK, a mitogen-activated protein kinase (MAPK) kinase, is a specific activator of the salt stress-induced MAPK, SIMK. Plant Cell, 2000. 12(11): p. 2247-58.
[56] Khan, M., H. Takasaki, and S. Komatsu, Comprehensive phosphoproteome analysis in rice and identification of phosphoproteins responsive to different hormones/stresses. 2005. p. 1592-1599.
