Recognition of Human Promoter based on GMM and Rough Set
Abstract
Identifying human promoters is fundamental to understanding gene regulation and is a key step in large-scale function prediction. Although many algorithms have been proposed, they are rather complex, and their performance is still limited by low sensitivity and a high false-positive rate. In this paper, a Gaussian Mixture Model (GMM) is used to model the positional densities of oligonucleotides, which serve as the promoter features. Rough Set theory is applied to analyze the co-occurrence of binding sites and to identify human promoter sequences. A clustering algorithm that uses the inverse of the fuzzy likelihood function as the cluster distance estimates the optimal number of GMM components and the parameters of the sub-models, and least squares is used to calculate the mixing proportions. The optimal solution is obtained with efficient convergence. Because the features used to build the prediction model are imperfect, Rough Set theory is applied to build a promoter information table, to find the relationships among motifs upstream and downstream of the promoter sequence and, at the same time, to predict human promoter sequences. The simulation results show that the prediction model achieves high accuracy.
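The least-squares step for the mixing proportions can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following is an assumption: the component means and standard deviations are taken as already fixed (for example by the clustering step), and the weights are chosen to minimize the squared error between the mixture and an observed positional density. The function names, the position grid and the component values are hypothetical and serve only as illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian evaluated at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))


def mixing_proportions(positions, density, means, stds):
    """Estimate mixture weights by ordinary least squares.

    Assumes the component parameters (means, stds) are fixed and solves
    min_w || A w - density ||^2, where A[i, k] is the k-th component
    density at positions[i]. Negative weights are clipped to zero and
    the result is renormalized to sum to one (a simplification, not
    necessarily the constraint handling used in the paper).
    """
    A = np.column_stack(
        [gaussian_pdf(positions, m, s) for m, s in zip(means, stds)]
    )
    w, *_ = np.linalg.lstsq(A, density, rcond=None)
    w = np.clip(w, 0.0, None)
    total = w.sum()
    if total == 0.0:
        raise ValueError("all estimated weights are non-positive")
    return w / total


# Illustrative data only: positions relative to a transcription start
# site and two made-up components; none of these values come from the paper.
positions = np.arange(-250.0, 51.0)
means = np.array([-30.0, -150.0])
stds = np.array([5.0, 40.0])
true_w = np.array([0.3, 0.7])

# Noise-free synthetic positional density built from the known weights.
density = sum(
    w * gaussian_pdf(positions, m, s) for w, m, s in zip(true_w, means, stds)
)

est_w = mixing_proportions(positions, density, means, stds)
```

On this noise-free synthetic density the estimate should match the weights used to generate it; with real, noisy oligonucleotide counts the clipping and renormalization would only approximate a properly constrained fit.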
