Parallelizing and optimizing a hybrid differential evolution with Pareto tournaments for discovering motifs in DNA sequences
  • Authors: David L. González-Álvarez (1)
    Miguel A. Vega-Rodríguez (1)
    Álvaro Rubio-Largo (1)
  • Keywords: Parallelism; Hybrid algorithm; Differential evolution; Multiobjective optimization; Motif discovery
  • Journal: The Journal of Supercomputing
  • Publication year: 2014
  • Publication date: November 2014
  • Volume: 70
  • Issue: 2
  • Pages: 880-905
  • Full-text size: 927 KB
  • Author affiliations: David L. González-Álvarez (1)
    Miguel A. Vega-Rodríguez (1)
    Álvaro Rubio-Largo (1)

    1. ARCO Research Group, Department Technologies of Computers and Communications, University of Extremadura, Escuela Politécnica, Campus Universitario s/n, 10003, Cáceres, Spain
  • ISSN: 1573-0484
Abstract
Transcriptional regulation is the main mechanism regulating gene expression, the process by which all prokaryotic organisms and eukaryotic cells transform the information encoded by the nucleic acids (DNA) into the proteins required for their operation and development. A crucial component of genetic regulation is the binding between transcription factors and the DNA sequences that regulate the expression of genes. These binding sites are short and share a common sequence of nucleotides. Discovering these small DNA strings, also known as motifs, is labor intensive, so high-performance computing is a good way to address it. In this work, we present a parallel multiobjective evolutionary algorithm, a novel hybrid technique based on differential evolution with Pareto tournaments (H-DEPT). To study whether this algorithm is suitable for parallelization, H-DEPT has been used to solve instances of different sizes on several multicore systems (2, 4, 8, 16, and 32 cores). The results show that H-DEPT achieves good speedups and efficiencies. We also compare the predictions made by H-DEPT with those of other biological tools, demonstrating that it is also capable of making quality predictions.
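
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how speedup and efficiency are conventionally defined for a run on p cores: speedup is the serial run time divided by the parallel run time, and efficiency is the speedup divided by p. The function names and the timing values are hypothetical placeholders chosen for illustration; they are not measurements from the paper, and the record does not state how the authors computed their figures.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Conventional speedup: serial run time divided by parallel run time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Conventional efficiency: speedup divided by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup(t_serial, t_parallel) / cores


if __name__ == "__main__":
    # Placeholder timings in seconds, invented for illustration only.
    # The core counts match those listed in the abstract.
    t_serial = 100.0
    placeholder_times = {2: 52.0, 4: 27.0, 8: 14.0, 16: 8.0, 32: 5.0}
    for cores, t_par in placeholder_times.items():
        s = speedup(t_serial, t_par)
        e = efficiency(t_serial, t_par, cores)
        print(f"{cores:2d} cores: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency close to 1 means the added cores are almost fully used; values well below 1 indicate overhead from synchronization, communication, or load imbalance.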
