A New DNA Methylation Site Identification Method Fusing Statistical Features
  • English title: New method for identification of DNA methylation sites with fusion statistical characteristics
  • Authors: SUN Jiawei (孙佳伟); ZHANG Ming (张明); WANG Changbao (王长宝); XU Weiyan (徐维艳); CHENG Ke (程科); DUAN Xianhua (段先华)
  • Keywords: DNA methylation; statistical characteristics; pattern information; SVM; Jackknife test
  • Journal code: HDCB
  • Journal (English title): Journal of Jiangsu University of Science and Technology (Natural Science Edition)
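  • Affiliations: School of Computer Science, Jiangsu University of Science and Technology; School of Science, Jiangsu University of Science and Technology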
  • Publication date: 2019-04-15
  • Published in: 江苏科技大学学报(自然科学版)
  • Year: 2019
  • Volume (cumulative issue no.): v.33; No.173
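  • Funding: National Natural Science Foundation of China (61572242, 61373062); Natural Science Foundation of Jiangsu Province (BK20141403, BK20130472); Jiangsu Provincial Science and Technology Support Program (BE2014692)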
  • Language: Chinese
  • CNKI file ID: HDCB201902011
  • Page count: 7
  • Issue: 02
  • CN: 32-1765/N
  • Pages: 66-72
Abstract
Automatically and accurately identifying DNA methylation sites is important for studying the mechanisms of gene regulation, transcription and expression, and for developing targeted cancer therapies. However, nucleotide composition features based on frequency statistics and pseudo nucleotide composition features based on physicochemical properties do not capture the pattern information of DNA methylation sites well, and the predictors built on them have limited accuracy. This paper therefore extracts three kinds of information from a DNA sequence, each from a different perspective: nucleotide frequency statistics, position statistics and spatial structure attributes. These are fused into a new statistical feature vector, which is evaluated with an SVM classifier under the rigorous Jackknife test on the same benchmark dataset. The results show that, compared with iDNA-Methyl, the best existing predictor, the proposed predictor improves Acc, Mcc and AUC by 11.85%, 24% and 11.3%, respectively. For DNA methylation site prediction, the frequency, position and spatial structure information of a nucleotide sequence are therefore complementary, and the feature vector fusing these three perspectives better reflects the pattern characteristics of DNA methylation sites and improves prediction accuracy.
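The abstract names the three feature views and the SVM/Jackknife protocol but not the exact encodings, so the Python sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes normalized k-mer counts for the frequency view; a position-specific nucleotide propensity (PSNP) table, the kind of positional statistic used in [10] and [11], for the position view; and a per-base chemical-property code (ring structure, chemical functionality, hydrogen-bond strength), similar to the nucleotide chemical properties in [16], as a stand-in for the spatial-structure view. The views are fused by plain concatenation, and `acc_mcc` shows the standard Acc and Mcc formulas. All function names are hypothetical, and sequences are assumed to be equal-length strings over A, C, G, T.

```python
from itertools import product
import math

NUCS = "ACGT"

# Assumed per-base chemical-property code (hypothetical stand-in for the
# paper's "spatial structure attributes"):
# (ring structure: purine=1, chemical functionality: amino=1, hydrogen bond: weak=1)
CHEM = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def frequency_view(seq, k=2):
    """Frequency view: normalized k-mer composition, 4**k values summing to 1."""
    kmers = ["".join(p) for p in product(NUCS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    n = len(seq) - k + 1
    for i in range(n):
        counts[seq[i:i + k]] += 1
    return [counts[m] / n for m in kmers]


def psnp_table(pos_seqs, neg_seqs):
    """Position view (training step): for each position j and nucleotide,
    its frequency among positive samples minus its frequency among negatives."""
    table = []
    for j in range(len(pos_seqs[0])):
        row = {}
        for nt in NUCS:
            f_pos = sum(s[j] == nt for s in pos_seqs) / len(pos_seqs)
            f_neg = sum(s[j] == nt for s in neg_seqs) / len(neg_seqs)
            row[nt] = f_pos - f_neg
        table.append(row)
    return table


def position_view(seq, table):
    """Position view: look up one propensity value per position."""
    return [table[j][nt] for j, nt in enumerate(seq)]


def structure_view(seq):
    """Structure view: 3 chemical-property values per base, flattened."""
    return [v for nt in seq for v in CHEM[nt]]


def fused_features(seq, table, k=2):
    """Fuse the three views by concatenation: 4**k + L + 3L values."""
    return frequency_view(seq, k) + position_view(seq, table) + structure_view(seq)


def acc_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from a confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    pos = ["ACGTA", "ACGTT"]  # toy positive (methylated) samples
    neg = ["TTGCA", "GCGTA"]  # toy negative samples
    table = psnp_table(pos, neg)
    print(len(fused_features("ACGTA", table)))  # 16 + 5 + 15 = 36
    print(acc_mcc(40, 40, 10, 10))
```

In a Jackknife (leave-one-out) test, `psnp_table` should be rebuilt from the remaining N-1 training sequences in every round; building it once from all samples would leak the held-out label into its own features. The fused vectors can then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`; AUC additionally needs the classifier's decision scores, which this sketch does not compute.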
References
[1] HAN Jingnan, LU Haocheng, LIANG Jing. DNA methylation and cancer[J]. Chinese Journal of Biochemistry and Molecular Biology, 2012, 28(2): 108-114. (in Chinese)
[2] WEI Yunzhen, LIU Xiaojuan, WANG Fang, et al. Identification of cancer DNA methylation regulatory sites[J]. Chinese Journal of Bioinformatics, 2015, 13(3): 170-178. DOI: 10.3969/j.issn.1672-5565.2015.03.05. (in Chinese)
[3] DU Xiuquan, CHENG Jiaxing, SONG Jie. Recognition method of protein interaction sites based on maximum entropy model[J]. Computer Engineering, 2010, 36(18): 203-204. (in Chinese)
[4] SHI Dahong, HE Xue. Sequential protein-GDP binding residues prediction[J]. Computer Engineering and Applications, 2016, 52(13): 55-59. (in Chinese)
[5] ZHANG M, SUN J W, LIU Z, et al. Improving N6-methyladenosine site prediction with heuristic selection of nucleotide physical-chemical properties[J]. Analytical Biochemistry, 2016, 508: 104-113. DOI: 10.1016/j.ab.2016.06.001.
[6] CHEN W, FENG P, TANG H, et al. RAMPred: identifying the N1-methyladenosine sites in eukaryotic transcriptomes[J]. Scientific Reports, 2016, 6: 31080. DOI: 10.1038/srep31080.
[7] BHASIN M, ZHANG H, REINHERZ E L, et al. Prediction of methylated CpGs in DNA sequences using a support vector machine[J]. FEBS Letters, 2005, 579(20): 4302-4308. DOI: 10.1016/j.febslet.2005.07.002.
[8] FANG F, FAN S, ZHANG X, et al. Predicting methylation status of CpG islands in the human brain[J]. Bioinformatics, 2006, 22(18): 2204-2209. DOI: 10.1093/bioinformatics/btl377.
[9] LIU Z, XIAO X, QIU W R, et al. iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition[J]. Analytical Biochemistry, 2015, 474: 69-77. DOI: 10.1016/j.ab.2014.12.009.
[10] XU Y, DING Y X, DING J, et al. Phogly-PseAAC: prediction of lysine phosphoglycerylation in proteins incorporating with position-specific propensity[J]. Journal of Theoretical Biology, 2015, 379: 10-15. DOI: 10.1016/j.jtbi.2015.04.016.
[11] LI G Q, LIU Z, SHEN H B, et al. TargetM6A: identifying N6-methyladenosine sites from RNA sequences via position-specific nucleotide propensities and a support vector machine[J]. IEEE Transactions on NanoBioscience, 2016, 15(7): 674-682. DOI: 10.1109/TNB.2016.2599115.
[12] AMOREIRA C, HINDERMANN W, GRUNAU C. An improved version of the DNA methylation database (MethDB)[J]. Nucleic Acids Research, 2003, 31(1): 75-77. DOI: 10.1093/nar/gkg093.
[13] VAPNIK V N, LERNER A Y. Recognition of patterns with help of generalized portraits[J]. Avtomat i Telemekh, 1963, 774-780.
[14] CHEN W, TRAN H, LIANG Z, et al. Identification and analysis of the N6-methyladenosine in the Saccharomyces cerevisiae transcriptome[J]. Scientific Reports, 2015, 5(1). DOI: 10.1038/srep13859.
[15] FENG P, DING H, CHEN W, et al. Identifying RNA 5-methylcytosine sites via pseudo nucleotide compositions[J]. Molecular BioSystems, 2016, 12(11): 3307-3311. DOI: 10.1039/c6mb00471g.
[16] CHEN W, FENG P, TANG H, et al. Identifying 2'-O-methylationation sites by integrating nucleotide chemical properties and nucleotide compositions[J]. Genomics, 2016, 107(6): 255-258. DOI: 10.1016/j.ygeno.2016.05.003.
[17] CHOU K C, ZHANG C T. Prediction of protein structural classes[J]. Critical Reviews in Biochemistry and Molecular Biology, 1995, 30(4): 275-349. DOI: 10.3109/10409239509083488.
[18] GUO S H, DENG E Z, XU L Q, et al. iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition[J]. Bioinformatics, 2014, 30(11): 1522-1529. DOI: 10.1093/bioinformatics/btu083.
