Abstract
Life may have begun in an RNA world, a hypothesis supported by increasing evidence of the vital roles that RNAs play in biological systems. In the human genome, most genes do not encode proteins; they are noncoding RNA genes. The largest class of noncoding genes is the long noncoding RNAs (lncRNAs), which are transcripts longer than 200 nucleotides with no protein-coding capacity. While some lncRNAs have been demonstrated to be key regulators of gene expression and 3D genome organization, most lncRNAs are still uncharacterized. We thus propose several data mining and machine learning approaches for the functional annotation of human lncRNAs by leveraging the vast amount of data from genetic and genomic studies. Recent results from our studies and those of other groups indicate that genomic data mining can give insights into lncRNA functions and provide valuable information for experimental studies of candidate lncRNAs associated with human disease.