A new rhesus macaque assembly and annotation for next-generation sequencing analyses
  • Authors: Aleksey V Zimin (1)
    Adam S Cornish (2)
    Mnirnal D Maudhoo (2)
    Robert M Gibbs (2)
    Xiongfei Zhang (2)
    Sanjit Pandey (2)
    Daniel T Meehan (2)
    Kristin Wipfler (2)
    Steven E Bosinger (3)
    Zachary P Johnson (3)
    Gregory K Tharp (3)
    Guillaume Marçais (1)
    Michael Roberts (1)
    Betsy Ferguson (4)
    Howard S Fox (5)
    Todd Treangen (6) (7)
    Steven L Salzberg (6)
    James A Yorke (1)
    Robert B Norgren Jr (2)

    1. Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, USA
    2. Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, Nebraska 68198, USA
    3. Non-Human Primate Genomics Core, Yerkes National Primate Research Center, Robert W. Woodruff Health Sciences Center, Emory University, Atlanta, Georgia 30322, USA
    4. Division of Neurosciences, Primate Genetics Program, Oregon National Primate Research Center, Oregon Health & Sciences University, Beaverton, Oregon 97006, USA
    5. Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, Nebraska 68198, USA
    6. Center for Computational Biology and Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
    7. National Biodefense Analysis and Countermeasures Center, Frederick, MD 21702, USA
  • Keywords: Macaca mulatta; Rhesus macaque; Genome; Assembly; Annotation; Transcriptome; Next-generation sequencing
  • Journal: Biology Direct
  • Publication year: 2014
  • Publication date: December 2014
  • Volume: 9
  • Issue: 1
  • Full-text size: 1,569 KB
  • Journal subject: Life Sciences, general
  • Publisher: BioMed Central
  • ISSN: 1745-6150
Abstract
Background: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses.

Results: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.

Conclusions: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.

Reviewers: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.
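
The abstract compares assemblies by their N50 contig size. As a rough illustration of how that statistic is commonly computed, here is a minimal Python sketch. It assumes the usual definition (the length L such that contigs of length L or longer together cover at least half of the total assembled bases); the record does not state which exact variant the authors used. The function name `n50` and the toy contig lengths are hypothetical and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: the largest length L such that contigs of
    length >= L account for at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (in kilobases), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because longer contigs contribute more bases to the running total, N50 behaves like a length-weighted median, which is why a more contiguous assembly reports a larger value.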