The elusive short gene - an ensemble method for recognition for prokaryotic genome



Authors: Baharak Goli (baharak_goli@yahoo.com); Achuthsankar S. Nair
Keywords: Computational gene finding; Short gene prediction; Ensemble classifier; Feature selection; AdaBoost.M1; Random forests
Journal: Biochemical and Biophysical Research Communications
Publication year: 2012
Journal code: 159_0006291x
Category: bio
Publication date: 25 May 2012
Volume: 422
Issue: 1
Pages: 36-41
File size: 547 K

Abstract

Accurate prediction of short protein-coding DNA from genome sequence information remains an unsolved problem in DNA sequence analysis. Popular gene-finding tools show a drastic reduction in accuracy when predicting genes shorter than 400 nt, the length below which we define a gene as short. This study quantitatively evaluates a set of selected coding measures in terms of their discriminative power in recognizing short genes in prokaryotic genomes. By applying the Fast Correlation-Based Feature Selection (FCBF) technique, we identified a subset of coding measures with high discriminative power. Using these measures, we present a novel approach to short-gene recognition. A short-gene predictor employing AdaBoost.M1 with random forests as the base classifier achieves 92.74% accuracy, 94.77% sensitivity, and 90.06% specificity on short genes.
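
For readers unfamiliar with the three figures quoted at the end of the abstract, the following is a minimal sketch of how accuracy, sensitivity, and specificity are commonly computed from confusion-matrix counts. It assumes the standard binary-classification definitions; the abstract does not state which definitions the authors used, and some gene-finding literature uses "specificity" to mean TP / (TP + FP) instead. The function name `classification_metrics` and the counts in the usage example are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    Assumed (standard) definitions, not confirmed by the abstract:
      accuracy    = (TP + TN) / (TP + FN + TN + FP)
      sensitivity = TP / (TP + FN)   # fraction of true short genes recovered
      specificity = TN / (TN + FP)   # fraction of non-coding sequences rejected
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    accuracy = (tp + tn) / (positives + negatives)
    sensitivity = tp / positives
    specificity = tn / negatives
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not the paper's data).
acc, sens, spec = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

Reporting sensitivity and specificity separately, as the abstract does, shows whether a predictor's errors are mostly missed short genes or mostly false calls, which a single accuracy figure would hide.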
