Abstract
Human genome research has advanced through several large international projects, including HapMap, the 1000 Genomes Project (1KGP), the ENCyclopedia Of DNA Elements (ENCODE), and the International Human Epigenome Consortium (IHEC), and the data generated by these projects serve as reference information for human genome studies. However, more specific reference sets are needed at the level of individual populations. Although a few studies have produced Korean reference sets, consisting of a few reference genomes and chip-based Korean SNP and CNV databases, no Korean-specific variation information has been constructed at genome scale. Here, we used Korean exomes to construct Korean variation information. Using read data from 100 Korean exomes obtained from the Korea National Institute of Health (KNIH), we mapped the exome data of each individual to NCBI GRCh37, merged the mapped information, and extracted SNPs and indels. We identified a pool of 1,907,598 SNPs and 325,166 indels as initial variations, masked the known variations against dbSNP and the 1KGP variation database, and constructed a database of Korean-specific variations. The database can serve as a pilot database of Korean exome variation and can contribute to Korean variation studies that use exome chips or whole-genome data.

Keywords: Exome sequencing, NGS, Korean specific, SNP, Variants
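
The masking step described in the abstract, in which known variants are removed from the pooled calls so that only candidate Korean-specific variants remain, can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Variant` record, the `variant_type` and `mask_known` helpers, and the toy data are all hypothetical, and the sketch assumes that a variant counts as known when its chromosome, position, reference allele, and alternate allele all match an entry in a reference set such as dbSNP or 1KGP.

```python
from typing import Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not the authors' data model)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def variant_type(v: Variant) -> str:
    # Single-base substitutions are SNPs; any length change is treated as an indel.
    return "SNP" if len(v.ref) == 1 and len(v.alt) == 1 else "indel"


def mask_known(candidates: Iterable[Variant], *known_sets: Set[Variant]) -> List[Variant]:
    # Assumption: a variant is "known" on an exact (chrom, pos, ref, alt) match.
    known = set().union(*known_sets)
    return [v for v in candidates if v not in known]


# Toy data standing in for the pooled calls and the reference databases.
pooled = [
    Variant("1", 100, "A", "G"),
    Variant("1", 200, "C", "T"),
    Variant("2", 300, "AT", "A"),
    Variant("2", 400, "G", "GA"),
]
dbsnp = {Variant("1", 100, "A", "G")}
kgp = {Variant("2", 300, "AT", "A")}

novel = mask_known(pooled, dbsnp, kgp)
novel_types = [variant_type(v) for v in novel]
```

In this toy run, `novel` keeps only the two variants absent from both reference sets. Exact-match masking is the simplest possible criterion; a real analysis would also need to normalize indel representation before comparing records.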