Prediction method of nucleosome positioning based on deep learning (基于深度学习的核小体位点预测方法)
  • English title: Prediction method of nucleosome positioning based on deep learning
  • Authors: QIAN Shen-yi (钱慎一); LI Dai-yi (李代祎); WANG Xiao (王晓); LIU Hui-hui (刘慧慧)
  • Affiliation: School of Computer and Communication Engineering, Zhengzhou University of Light Industry (郑州轻工业学院计算机与通信工程学院)
  • Keywords: nucleosome positioning; vectorization; feature extraction; convolutional neural network; cross validation
  • Journal code: SJSJ
  • Journal: Computer Engineering and Design (计算机工程与设计)
  • Publication date: 2019-03-16
  • Year: 2019
  • Volume and number: v.40; No.387
  • Funding: National Natural Science Foundation of China (61672470); National Natural Science Foundation of China, Young Scientists Fund (61402422); National Key R&D Program of China, Intergovernmental Science and Technology Cooperation projects (2016YFE0100300, 2016YFE0100600); Henan Provincial Department of Science and Technology, Sino-foreign cooperation fund project (162102410076)
  • Language: Chinese
  • File name: SJSJ201903044
  • Page count: 7
  • Issue: 03
  • CN: 11-1775/TP
  • Pages: 269-275
Abstract
Locating nucleosomes quickly and accurately in the vast number of sequenced DNA sequences calls for a fast, effective automated method, because traditional manual experiments and previously proposed computational methods are time-consuming and have low accuracy. Feature vectors for the sample set are constructed using feature extraction based on pseudo K-tuple nucleotide composition, and a convolutional neural network (CNN) is built in the TensorFlow framework as the prediction model for nucleosome positioning. Cross-validation tests on three benchmark datasets (Homo sapiens, nematode, and Drosophila) yield prediction accuracies of 88.21%, 89.19%, and 85.07%, respectively. The experimental results show that the model outperforms existing prediction models.
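The vectorization and cross-validation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it computes only the plain k-tuple (k-mer) frequency components that pseudo K-tuple nucleotide composition builds on, and omits the physicochemical correlation terms and the CNN itself. The function names `ktuple_composition` and `kfold_indices`, the default k = 2, and the fold count are assumptions made for illustration; the abstract does not state them.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return normalized k-tuple frequencies of a DNA sequence.

    The vector has 4**k entries, ordered lexicographically over A, C, G, T.
    Windows containing any other symbol (for example N) are skipped.
    Hypothetical helper; only the frequency part of PseKNC is computed.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def kfold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for plain k-fold cross-validation.

    Samples are split into contiguous folds; the first n_samples % n_folds
    folds receive one extra sample. Shuffling is left to the caller.
    """
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    features = ktuple_composition("ACGTACGT", k=2)
    print(len(features))
    for train, test in kfold_indices(10, 5):
        print(test)
```

Each sequence maps to a fixed-length vector regardless of its length, which is what lets a network with a fixed input size consume it. In each round of cross-validation the model would be trained on the `train` indices and scored on the `test` indices.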
