Abstract
Locating nucleosomes quickly and accurately in large volumes of sequenced DNA calls for a fast and effective automated positioning method, because traditional manual experiments and previously proposed computational methods are time-consuming and have low accuracy. Feature vectors for the sample set are constructed using pseudo K-tuple nucleotide composition feature extraction, and a prediction model for nucleosome positioning is built with a convolutional neural network (CNN) in the TensorFlow framework. Cross-validation tests on three benchmark datasets (Homo sapiens, nematode, and Drosophila) gave prediction accuracies of 88.21%, 89.19%, and 85.07%, respectively. The experimental results show that the model outperforms existing prediction models.
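As a rough illustration of the feature-construction step, the sketch below computes only the plain K-tuple composition of a DNA sequence, that is, the normalized frequency of each K-mer. It is a simplified stand-in and not the paper's implementation: the pseudo components of pseudo K-tuple nucleotide composition (sequence-order correlation terms derived from physicochemical properties) are omitted, and the function name `kmer_composition` and the example sequence are hypothetical.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Return normalized k-tuple frequencies of a DNA sequence.

    The vector has 4**k entries in lexicographic order over A, C, G, T.
    Hypothetical helper: it covers only the composition part, not the
    pseudo (sequence-order correlation) components.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k across the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Illustrative input only; not taken from the benchmark datasets.
features = kmer_composition("ACGTACGT", k=2)
```

A vector of this kind, one per sample sequence, is the sort of fixed-length input that a CNN classifier could then consume.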