Establishment of a Genome-Wide Copy Number Variation Database and Analysis of Sequence Features
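Abstract
Objective: (1) To establish a genome-wide copy number variation (CNV) database for growth/developmental delay and intellectual disability. (2) To analyze the sequence features of CNVs and carry out a preliminary study of their formation mechanisms.
     Methods: (1) A database was developed on a Windows + Apache + MySQL + PHP platform from the genome-wide CNV data accumulated in our laboratory from patients with growth/developmental delay and intellectual disability. (2) The distribution of low-copy repeats/segmental duplications (LCRs/SDs) in the sequences of CNV breakpoint regions was queried through UCSC, and RepeatMasker was used to detect repeat elements such as SINEs, LINEs and LTRs in the CNV breakpoint regions and in 504 recombination cold-spot controls.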
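     Results: (1) A genome-wide copy number variation (CNV) database for growth/developmental delay and intellectual disability was constructed. It comprises an administrator login system, a database query system and a database management system, and holds 812 CNVs collected from 168 patients with growth/developmental delay and intellectual disability. (2) Of the 297 CNVs studied, 19 (6.40%) had proximal and distal breakpoint regions that were both rich in LCRs/SDs and highly similar in sequence; 53 (17.85%) had LCR/SD-rich proximal and distal breakpoint regions with lower sequence similarity; 80 (26.94%) had LCRs/SDs in only one breakpoint region; and 145 (48.81%) had no LCRs/SDs near the breakpoints. Repeat elements in the breakpoint regions and the controls were as follows: Alu SINEs (breakpoint regions 314/504, 62.30%; controls 338/504, 67.06%); MIR SINEs (breakpoint regions 191/504, 37.90%; controls 207/504, 41.07%); L1 LINEs (breakpoint regions 328/504, 65.10%; controls 344/504, 68.25%); L2 LINEs (breakpoint regions 155/504, 30.75%; controls 171/504, 33.93%); L3 LINEs (breakpoint regions 35/504, 6.94%; controls 37/504, 7.34%); LTR (breakpoint regions 266/504, 52.78%; controls 253/504, 50.20%).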
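The database described above was built with MySQL and PHP, and the abstract does not give its schema. The following is only a minimal sketch of how patient and CNV records with an interval query might be organised, using Python's built-in `sqlite3` as a stand-in for MySQL. All table names, column names and the example rows are hypothetical and are not taken from the study.

```python
import sqlite3

# Hypothetical schema: one row per patient, one row per CNV call.
SCHEMA = """
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    phenotype  TEXT NOT NULL
);
CREATE TABLE cnv (
    cnv_id     INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    chrom      TEXT NOT NULL,
    start_bp   INTEGER NOT NULL,
    end_bp     INTEGER NOT NULL,
    cnv_type   TEXT NOT NULL CHECK (cnv_type IN ('gain', 'loss'))
);
CREATE INDEX idx_cnv_region ON cnv (chrom, start_bp, end_bp);
"""


def open_db(path=":memory:"):
    """Create the (hypothetical) tables and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def cnvs_overlapping(conn, chrom, start, end):
    """Return CNVs on `chrom` that overlap the closed interval [start, end]."""
    sql = (
        "SELECT cnv_id, patient_id, start_bp, end_bp, cnv_type FROM cnv "
        "WHERE chrom = ? AND start_bp <= ? AND end_bp >= ? ORDER BY start_bp"
    )
    return conn.execute(sql, (chrom, end, start)).fetchall()


conn = open_db()
# Invented example rows, for illustration only.
conn.execute("INSERT INTO patient VALUES (1, 'developmental delay')")
conn.executemany(
    "INSERT INTO cnv VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "chr1", 1_200_000, 1_450_000, "loss"),
        (2, 1, "chr7", 500_000, 620_000, "gain"),
    ],
)
conn.commit()
print(cnvs_overlapping(conn, "chr1", 1_400_000, 1_500_000))
```

Parameterised queries (`?` placeholders) keep user-supplied search terms out of the SQL text, which matters for a query system exposed through a web front end.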
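     Conclusions:
     1. A genome-wide copy number variation (CNV) database for growth/developmental delay and intellectual disability was constructed. Data entry is fast, complete and reliable, and data queries are fast and accurate.
     2. Of the 297 CNVs studied, 19 (6.40%) are considered to have formed through non-allelic homologous recombination (NAHR); the formation of the other CNVs is unrelated to NAHR.
     3. Apart from chromosomal instability produced by microsatellite repeats, which may be related to CNV formation, no other repeat element was found to be clearly associated with increased genomic instability or an elevated recombination rate.
     4. Alu SINEs are associated with the formation and expansion of LCRs/SDs. Alu SINEs generating LCRs/SDs through NAHR, followed by NAHR between LCRs/SDs generating CNVs, may be one of the mechanisms by which LCRs/SDs mediate CNV formation.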
Objective:
     1. To establish a Genomic Copy Number Variation Database of Growth and Mental Retardation.
     2. To analyze the sequence features of copy number variations (CNVs) in order to study their formation mechanisms.
     Methods:
     1. The Genomic Copy Number Variation Database of Growth and Mental Retardation was built on a Windows + Apache + MySQL + PHP platform.
     2. The distribution of LCRs/SDs in the breakpoint regions was examined with the web-based UCSC Genome Browser. To investigate the repeat sequence elements (SINEs, LINEs, LTRs, etc.) in the regions where these CNVs occurred, the sequences flanking each breakpoint (5 kb at each end) and 504 recombination cold-spot control sequences were analysed with RepeatMasker.
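The extraction of breakpoint-flanking sequence can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes that "5 kb at each end" means 5 kb on either side of each breakpoint and that coordinates are BED-style (0-based, half-open). The function names `breakpoint_windows` and `to_bed` and the example coordinates are hypothetical.

```python
from typing import List, Tuple

FLANK = 5_000  # assumed flank size in bp (the abstract states 5 kb)

Window = Tuple[str, int, int, str]  # chrom, start, end, label


def breakpoint_windows(chrom: str, start: int, end: int,
                       flank: int = FLANK) -> List[Window]:
    """Return one window around the proximal and one around the distal breakpoint."""
    if start >= end:
        raise ValueError("CNV start must be smaller than CNV end")
    windows = []
    for label, pos in (("proximal", start), ("distal", end)):
        lo = max(0, pos - flank)  # clamp at the start of the chromosome
        hi = pos + flank          # not clamped: chromosome lengths are not known here
        windows.append((chrom, lo, hi, label))
    return windows


def to_bed(windows: List[Window]) -> str:
    """Format windows as tab-separated BED lines."""
    return "\n".join(f"{c}\t{s}\t{e}\t{name}" for c, s, e, name in windows)


if __name__ == "__main__":
    # Invented CNV coordinates, for illustration only.
    print(to_bed(breakpoint_windows("chr1", 1_200_000, 1_450_000)))
```

The resulting intervals could then be used to fetch the corresponding sequences before they are screened for repeat elements.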
     Results:
     1. The Genomic Copy Number Variation Database of Growth and Mental Retardation was established. The database system comprises an administrator login system, a database query system and a database management system. It holds 812 CNV records from 168 patients with growth and mental retardation.
     2. The 297 CNVs detected with the Human610-Quad BeadChip can be grouped into four categories: (1) proximal and distal breakpoint regions are enriched for LCRs with high sequence similarity (19/297; 6.40%); (2) proximal and distal breakpoint regions are enriched for LCRs, but with low sequence similarity (53/297; 17.85%); (3) only one breakpoint region harbours LCRs (80/297; 26.94%); and (4) no LCR lies in the vicinity of either breakpoint (145/297; 48.81%). Repeat sequence elements in the breakpoint regions and the control sequences were as follows: Alu SINEs (breakpoint regions 314/504, 62.30%; control 338/504, 67.06%); MIR SINEs (breakpoint regions 191/504, 37.90%; control 207/504, 41.07%); L1 LINEs (breakpoint regions 328/504, 65.10%; control 344/504, 68.25%); L2 LINEs (breakpoint regions 155/504, 30.75%; control 171/504, 33.93%); L3 LINEs (breakpoint regions 35/504, 6.94%; control 37/504, 7.34%); LTR (breakpoint regions 266/504, 52.78%; control 253/504, 50.20%).
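The breakpoint-versus-control comparison of repeat elements can be illustrated with a simple test of two proportions. The abstract does not say which statistical test the authors used, so the Pearson chi-square test below (1 degree of freedom, no continuity correction) is an assumption chosen for illustration. The counts are copied from the Results; the function name `chi_square_2x2` is hypothetical.

```python
import math

N = 504  # sequences per group, as reported

# element -> (breakpoint-region count, control count), copied from the Results
COUNTS = {
    "Alu SINEs": (314, 338),
    "MIR SINEs": (191, 207),
    "L1 LINEs": (328, 344),
    "L2 LINEs": (155, 171),
    "L3 LINEs": (35, 37),
    "LTR": (266, 253),
}


def chi_square_2x2(a: int, b: int, n1: int, n2: int):
    """Pearson chi-square statistic and p-value for a/n1 versus b/n2 (1 df)."""
    table = [[a, n1 - a], [b, n2 - b]]
    total = n1 + n2
    row = [n1, n2]
    col = [a + b, total - a - b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


for name, (bp, ctrl) in COUNTS.items():
    chi2, p = chi_square_2x2(bp, ctrl, N, N)
    print(f"{name:10s} {bp / N:6.2%} vs {ctrl / N:6.2%}  chi2={chi2:.2f}  p={p:.3f}")
```

With the reported counts, none of the six comparisons falls below p = 0.05 under this assumed test, which is in line with the conclusion that these repeat elements show no clear association with the breakpoint regions.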
     Conclusions:
     1. The Genomic Copy Number Variation Database of Growth and Mental Retardation was established. Data entry with this database is rapid, complete and reliable, and data queries are fast and accurate.
     2. Both breakpoint regions carried LCRs in 24.25% of these CNVs, and in 26.39% of those the high degree of sequence similarity identified NAHR as the most likely cause of the rearrangement. CNVs with LCRs at only one of the two breakpoints (80/297) or with no LCR in the vicinity of either breakpoint (145/297) are unlikely to have arisen through NAHR.
     3. Genomic instability caused by microsatellite repeats may be involved in the formation of CNVs. Genomic instability does not show any significant association with other repeat elements.
     4. Alu-mediated NAHR may be involved in the formation of LCRs/SDs, and NAHR mediated by LCRs/SDs is a possible mechanism of CNV formation.
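The four-way grouping of CNVs and the percentages quoted in conclusion 2 can be written down as a small decision rule plus some arithmetic. This is a sketch under stated assumptions: the abstract gives no numeric cutoff for "high sequence similarity", so `SIMILARITY_CUTOFF` is purely hypothetical, as are the class and field names. Only the four category counts come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

SIMILARITY_CUTOFF = 0.95  # hypothetical; the abstract gives no threshold


@dataclass
class BreakpointAnnotation:
    proximal_has_lcr: bool
    distal_has_lcr: bool
    similarity: Optional[float] = None  # identity between the two LCRs, if both exist


def classify(cnv: BreakpointAnnotation) -> str:
    """Assign a CNV to one of the four categories described in the Results."""
    if cnv.proximal_has_lcr and cnv.distal_has_lcr:
        if cnv.similarity is not None and cnv.similarity >= SIMILARITY_CUTOFF:
            return "both_high_similarity"  # candidate for NAHR
        return "both_low_similarity"
    if cnv.proximal_has_lcr or cnv.distal_has_lcr:
        return "one_side_only"
    return "no_lcr"


# Category counts as reported in the abstract.
REPORTED = {
    "both_high_similarity": 19,
    "both_low_similarity": 53,
    "one_side_only": 80,
    "no_lcr": 145,
}

total = sum(REPORTED.values())
both = REPORTED["both_high_similarity"] + REPORTED["both_low_similarity"]
print(f"{both}/{total} = {both / total:.2%} of CNVs have LCRs at both breakpoints")
print(f"{REPORTED['both_high_similarity']}/{both} = "
      f"{REPORTED['both_high_similarity'] / both:.2%} of those show high similarity")
```

Computed directly from the counts, the first fraction is 72/297; the 24.25% quoted in the abstract matches the sum of the two rounded category percentages (6.40% + 17.85%), so a difference in the last decimal place is a rounding effect.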
