Research on Algorithms for Gene-Related Prediction Problems Based on Network Models
Abstract
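With the rapid development of information technology and the continual progress and mutual penetration of the basic theories and techniques of various disciplines, a number of interdisciplinary fields have emerged. Bioinformatics, for example, studies biological problems using the computational methods and techniques of information science as its means and mathematical theories and models as its foundation, drawing on the theories, methods, and tools of physics, chemistry, and other disciplines. Information science, with the computer as its main analytical tool, provides particularly strong support for the development of bioinformatics. Applying computational theories and methods to problems in bioinformatics has become an extremely important part of applied research in information science.
     Among the bioinformatics problems studied from the standpoint of information science, an important class is prediction algorithms for biological problems, for example the prediction of disease genes associated with the onset and progression of complex diseases, gene function prediction, and the prediction of target sites and interaction relationships for different types of molecular interactions. Prediction research on specific problems provides valuable guidance for designing and conducting biological experiments, reduces the labor and material cost of large-scale experimental screening, and speeds up research. Algorithm research on these prediction problems not only offers valuable reference and guidance for solving biological problems, but also enriches and broadens algorithm research within information science, and thus has both theoretical significance and practical value.
     This dissertation takes gene-related prediction problems as its research object and, building on biological networks, studies network models and prediction algorithms from a global, systematic perspective. The proposed network models and prediction algorithms can also be applied to related problems in other fields. Specifically, this dissertation carries out the following work and makes the following contributions: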
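     1. Discovering genes closely associated with the onset and progression of complex diseases, known as disease gene prediction, is a challenging problem bearing on human health and medical care; studying the relationship between genes and complex diseases also deepens the understanding of gene function. For disease gene prediction, this dissertation proposes a new, systematic global method. A hybrid network model is built by integrating a protein-protein interaction network, a disease similarity network, and the network of known disease-gene associations. Based on the biological hypothesis that genes causing the same or similar diseases lie close to one another in biological networks, new disease genes are discovered by mining the topological correlation between the gene network and the disease similarity network in the hybrid model, together with the associations linking the two networks. To this end, an association score function between genes and diseases is defined, and an iterative algorithm is designed and implemented to compute it; the association score measures the strength of a gene-disease association and is used to predict the genes associated with complex diseases. The algorithm was analyzed and compared using 10-fold cross-validation, and its predictions are significantly better than those of PRINCE, an established earlier method. Finally, the proposed hybrid-network-based disease gene prediction method was applied to breast cancer, Alzheimer's disease, and type 2 diabetes, identifying new potential disease genes and disease-related network modules that provide valuable reference for follow-up experimental studies.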
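     2. Long non-coding RNAs (lncRNAs) have attracted wide attention at home and abroad for their distinctive biological characteristics and complex biological functions, yet the number of lncRNAs with known functions is extremely limited. As sequencing technology advances, large numbers of lncRNAs are being experimentally identified, creating an urgent need for functional studies. This dissertation therefore studies and analyzes the biological characteristics of lncRNA data, covering lncRNA identification, data characteristics, and functional specificity, as feature analysis and selection for large-scale lncRNA function prediction. On this basis, and in view of the current state of functional research, this dissertation proposes lnc-GFP (long non-coding RNA Global Function Predictor), a global function prediction method based on a bi-colored network model; this is the first work to use a network-based global strategy for large-scale lncRNA function prediction. First, a bi-colored network of coding and non-coding genes is built by integrating gene co-expression data and protein interaction data; it reflects the functional associations between lncRNAs and coding genes and provides the basis for network-based large-scale function prediction. Second, an information propagation algorithm on the bi-colored network is designed: known functional annotations are propagated iteratively through the network to predict lncRNA functions at large scale, and with suitable parameter settings the prediction accuracy reaches as high as 95%. Finally, the functions of 1625 lncRNAs in the mouse bi-colored network were successfully predicted, and the accuracy and robustness of the method and the reliability of its predictions were verified from several angles; cross-validation results and extensive literature validation indicate that the proposed method is reliable for large-scale lncRNA function prediction. ncFANs (non-coding RNA Function Annotation server), released in 2011, was the first online service platform dedicated to lncRNA function annotation. Given the successful application of the bi-colored network model and lnc-GFP, and the growing number of lncRNAs, this dissertation builds on the existing ncFANs platform, integrates lnc-GFP, and designs and implements its upgraded version, ncFANs2.0 (http://www.bioinfo.org/ncfans/). Integrating the global function prediction algorithm makes ncFANs2.0 a comprehensive service platform for large-scale lncRNA function annotation, enabling fast, large-scale annotation of non-coding gene functions through the analysis and integration of multiple kinds of biological data.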
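The iterative propagation of known annotations described in item 2 can be illustrated with a minimal sketch. This is not the lnc-GFP implementation: the update rule below is a standard label-propagation scheme chosen for illustration, and the function name `propagate`, the restart weight `alpha`, and the toy gene and lncRNA names are all assumptions.

```python
from collections import defaultdict


def propagate(edges, seeds, alpha=0.5, iters=50):
    """Spread annotation scores over an undirected weighted network.

    edges: iterable of (u, v, weight) tuples, e.g. co-expression or
           protein-interaction links between coding genes and lncRNAs.
    seeds: dict mapping a node to its initial score (1.0 for genes already
           annotated with the function of interest).
    Assumed update rule: f <- alpha * S f + (1 - alpha) * y, where S is the
    symmetrically normalised adjacency matrix and y holds the seed scores.
    """
    nbrs = defaultdict(dict)
    for u, v, w in edges:
        nbrs[u][v] = w
        nbrs[v][u] = w
    degree = {n: sum(ws.values()) for n, ws in nbrs.items()}
    y = {n: seeds.get(n, 0.0) for n in nbrs}
    f = dict(y)
    for _ in range(iters):
        # Each node collects the normalised scores of its neighbours from the
        # previous round, then is pulled back toward its seed value.
        f = {
            n: alpha * sum(w * f[m] / (degree[n] * degree[m]) ** 0.5
                           for m, w in nbrs[n].items())
            + (1 - alpha) * y[n]
            for n in nbrs
        }
    return f


# Toy bi-colored network: g1..g3 are coding genes, lnc1/lnc2 are lncRNAs.
toy_edges = [("g1", "lnc1", 1.0), ("g2", "lnc1", 1.0),
             ("g3", "lnc2", 1.0), ("g2", "g3", 0.2)]
scores = propagate(toy_edges, {"g1": 1.0, "g2": 1.0})
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy network lnc1 is linked directly to both annotated genes, so it ends up with a higher score than lnc2, which is reachable only through a weak link.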
With the rapid development of information technology and the continuous improvement of theory and technology across disciplines, fields are increasingly interpenetrating and merging. Bioinformatics, an interdiscipline of computational molecular biology and information processing science, has attracted much attention; it applies theories and techniques from information science, mathematics, physics, and chemistry to tackle problems in biology. Computer-aided analysis in particular provides solid support. As a result, hard computational problems arising from bioinformatics are being posed to researchers in the information field.
     Many computational prediction problems in bioinformatics are highly challenging, such as predicting candidate genes involved in complex diseases (i.e., disease genes), gene function prediction, microRNA target prediction, and molecular interaction prediction. Reliable predictions can offer suggestions and clues for biological experiments. They can also dramatically reduce the labor and time costs of those experiments and accelerate the experimental process. Algorithms for these prediction problems not only provide answers to biological questions but also enrich algorithm theory itself, and are of great value for both theoretical study and applications.
     This dissertation addresses two main biological prediction problems from a global and systematic view based on biological networks. Specifically, we investigate the following problems and make the following contributions.
     (1) The identification of disease-causing genes is a fundamental challenge in human health, is of great importance for improving medical care, and provides a better understanding of gene functions. Recent computational approaches based on interactions among human proteins and on disease similarities have shown their power in tackling this problem. Here, we present a novel systematic and global method that integrates two heterogeneous networks to prioritize candidate disease genes, based on the observation that genes causing the same or similar diseases tend to lie close to one another in a protein-protein interaction network. In this method, the association score function between a query disease and a candidate gene is defined as the weighted sum of the association scores between similar diseases and neighbouring genes; the topological correlation of the two heterogeneous networks is incorporated into the definition of the score function; and an iterative algorithm is designed to compute the score function. The method was evaluated with 10-fold cross-validation and significantly outperformed a state-of-the-art method called PRINCE. It was also applied to three multifactorial disorders (breast cancer, Alzheimer's disease, and type 2 diabetes mellitus), yielding suggestions of novel causal genes and candidate disease-causing subnetworks for further investigation.
     (2) A large number of long non-coding RNAs (lncRNAs) have been identified by large-scale analyses of full-length cDNA sequences, chromatin-state maps, or other analyses based on RNA-seq data, and they have drawn widespread attention because of their specific properties and complicated biological functions. However, the functions of most lncRNAs remain to be determined, and there is a critical need to annotate the growing number of available lncRNAs, although functional characterization of lncRNAs is a challenging task. For this purpose, we analyze the biological properties of lncRNAs and select suitable features for lncRNA function prediction. In this dissertation, we apply a global network-based strategy to this problem for the first time. We develop a global function predictor based on a bi-colored network, named long non-coding RNA Global Function Predictor (lnc-GFP), to predict probable functions for lncRNAs at large scale by integrating gene expression data and protein interaction data. The performance of lnc-GFP is evaluated on both protein-coding and lncRNA genes. Cross-validation tests on protein-coding genes with known function annotations indicate that our method can achieve a precision of up to 95% with a suitable parameter setting. Among the 1713 lncRNAs in the bi-colored network, the 1625 (94.9%) lncRNAs in the maximum connected component are all functionally characterized. The putative functions inferred by our method for many lncRNAs closely match the published literature. Given the success of lnc-GFP in predicting functions for lncRNAs in the mouse bi-colored network, we integrate lnc-GFP into ncFANs, the first web server designed to facilitate functional annotation of lncRNAs. We present ncFANs2.0 (http://www.bioinfo.org/ncFANs/), a substantial upgrade of the original web server dedicated to comprehensive, large-scale functional annotation of lncRNAs.
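To make the score function of contribution (1) concrete, the following sketch iterates a score matrix in which each disease-gene score is a weighted sum of the scores between similar diseases and neighbouring genes. The abstract does not give the exact formula, so the row normalisation, the restart term toward known associations, the weight `alpha`, and the name `association_scores` are assumptions made for illustration only.

```python
def row_normalise(matrix):
    """Scale each row to sum to 1 (all-zero rows are left as zeros)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def association_scores(dis_sim, gene_net, known, alpha=0.8, iters=100):
    """Iteratively score every (disease, gene) pair.

    dis_sim:  disease-by-disease similarity matrix.
    gene_net: gene-by-gene interaction weight matrix.
    known:    disease-by-gene matrix, 1 for known associations, else 0.
    Assumed update: R[d][g] <- alpha * sum over d2, g2 of
        D[d][d2] * G[g][g2] * R[d2][g2]  +  (1 - alpha) * known[d][g]
    """
    D = row_normalise(dis_sim)
    G = row_normalise(gene_net)
    nd, ng = len(D), len(G)
    R = [list(row) for row in known]
    for _ in range(iters):
        # First average over neighbouring genes, then over similar diseases.
        T = [[sum(G[g][g2] * R[d][g2] for g2 in range(ng)) for g in range(ng)]
             for d in range(nd)]
        R = [[alpha * sum(D[d][d2] * T[d2][g] for d2 in range(nd))
              + (1 - alpha) * known[d][g]
              for g in range(ng)]
             for d in range(nd)]
    return R


# Toy data: two similar diseases and four genes in a chain g0-g1-g2-g3.
dis_sim = [[1.0, 0.5], [0.5, 1.0]]
gene_net = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
known = [[1, 0, 0, 0], [0, 1, 0, 0]]  # disease 0 with g0, disease 1 with g1
R = association_scores(dis_sim, gene_net, known)
print([round(x, 3) for x in R[0]])
```

For disease 0, gene g1 (adjacent to the known gene g0 and already linked to the similar disease 1) scores higher than the distant gene g3, which is the kind of ranking the "close in the network" hypothesis predicts.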
