SeAMotE: a method for high-throughput motif discovery in nucleic acid sequences


Authors: Federico Agostini (1, 2)
Davide Cirillo (1, 2)
Riccardo Delli Ponti (1, 2)
Gian Gaetano Tartaglia (1, 2, 3)

1. Gene Function and Evolution, Centre for Genomic Regulation (CRG), C/ Dr. Aiguader 88, 08003 Barcelona, Spain
2. Universitat Pompeu Fabra (UPF), C/ Dr. Aiguader 88, 08003 Barcelona, Spain
3. Institució Catalana de Recerca i Estudis Avançats (ICREA), 23 Passeig Lluís Companys, 08010 Barcelona, Spain
Keywords: Discriminative motif discovery; Nucleic acids; ChIP-seq; CLIP-seq
Journal: BMC Genomics
Publication year: 2014
Publication date: December 2014
Volume: 15
Issue: 1
Full-text size: 1,333 KB
Journal subjects: Life Sciences, general; Microarrays; Proteomics; Animal Genetics and Genomics; Microbial Genetics and Genomics; Plant Genetics & Genomics
Publisher: BioMed Central
ISSN: 1471-2164

Abstract

Background: The large amount of data produced by high-throughput sequencing poses new computational challenges. In the last decade, several tools have been developed for the identification of transcription and splicing factor binding sites.

Results: Here, we introduce the SeAMotE (Sequence Analysis of Motifs Enrichment) algorithm for the discovery of regulatory regions in nucleic acid sequences. SeAMotE provides (i) a robust analysis of high-throughput sequence sets, (ii) a motif search based on pattern occurrences and (iii) an easy-to-use web-server interface. We applied our method to recently published data, including 351 chromatin immunoprecipitation (ChIP) and 13 crosslinking immunoprecipitation (CLIP) experiments, and compared our results with those of other well-established motif discovery tools. SeAMotE shows an average accuracy of 80% in finding discriminative motifs and outperforms other methods available in the literature.

Conclusions: SeAMotE is a fast, accurate and flexible algorithm for the identification of sequence patterns involved in protein-DNA and protein-RNA recognition. The server can be freely accessed at http://s.tartaglialab.com/new_submission/seamote.
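The abstract describes SeAMotE's search only as "based on pattern occurrences" and reports accuracy in finding discriminative motifs. As a rough, hypothetical illustration of what occurrence-based discriminative scoring can look like in general, the sketch below enumerates every k-mer and scores how well its presence or absence separates a positive set (e.g. ChIP-seq or CLIP-seq binding regions) from a negative background set, using classification accuracy. This is not SeAMotE's algorithm: the function name `rank_discriminative_kmers`, the exhaustive k-mer enumeration, the U-to-T conversion and the accuracy criterion are all assumptions made for this example.

```python
from itertools import product


def rank_discriminative_kmers(positives, negatives, k=4, top=5):
    """Hypothetical sketch: rank k-mers by presence/absence accuracy.

    A k-mer "predicts" a sequence as positive if it occurs anywhere in it.
    Accuracy = (true positives + true negatives) / (all sequences).
    Returns the `top` entries as (accuracy, motif, tp, fp) tuples.
    """
    # Normalise RNA to DNA letters so one alphabet covers both (assumption).
    pos = [s.upper().replace("U", "T") for s in positives]
    neg = [s.upper().replace("U", "T") for s in negatives]
    total = len(pos) + len(neg)

    results = []
    for letters in product("ACGT", repeat=k):
        motif = "".join(letters)
        tp = sum(motif in s for s in pos)  # positives containing the motif
        fp = sum(motif in s for s in neg)  # negatives containing the motif
        tn = len(neg) - fp                 # negatives correctly lacking it
        results.append(((tp + tn) / total, motif, tp, fp))

    # Highest accuracy first; ties broken alphabetically for determinism.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:top]
```

For example, with positives `["AAGGACTT", "CGGACA", "TTGGACG"]` and negatives `["AAAAAA", "CCCTTT", "GTGTGT"]`, the 4-mer `GGAC` occurs in every positive and no negative, so it ranks first with accuracy 1.0. Exhaustive enumeration is only practical for short k (4^k candidates); real tools use more refined statistics and search strategies.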
