Systematic analysis revealed better performance of random forest algorithm coupled with complex network features in predicting microRNA precursors
Abstract
Improved computational methods greatly benefit the study of miRNAs. Our study validates features used in miRNA identification against an independent dataset and offers researchers common practices for developing predictive models. A total of 84 representative features that have appeared in studies of miRNA classification were extracted and divided into four feature sets: a complex network feature set (NET), a structural feature set (STRUC), a thermodynamic feature set (THERMO), and a hybrid feature set (TOTAL). Systematic analysis was carried out on the network, structural, thermodynamic and hybrid features. Within both the single and the hybrid sets, dominant features were separated from uninformative ones using a permutation importance strategy. Random forest models built using only the informative network, structural, thermodynamic and hybrid variables achieved area under the receiver operating characteristic curve (AUC) values of 0.9611, 0.9563, 0.9351, and 0.9469, respectively, on validated datasets. These results suggest that the best performance is obtained with features derived from complex networks, and they should help in understanding the biological mechanisms and functions of miRNAs. All the data and scripts used in this article are freely available for download at .
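The permutation importance strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' script: the data are synthetic, and `toy_score` is a hypothetical stand-in for the predicted probability of a trained random forest. The idea is to shuffle one feature column at a time and record how much the AUC drops; features whose shuffling leaves the AUC essentially unchanged are treated as uninformative. The 0.01 cutoff is an arbitrary assumption chosen for the illustration.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(score_fn, X, y, n_repeats=20, seed=0):
    """Return the baseline AUC and, per feature, the mean AUC drop after shuffling it."""
    rng = random.Random(seed)
    baseline = auc(y, [score_fn(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc(y, [score_fn(row) for row in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Synthetic data (assumption, not from the paper):
# feature 0 carries the class signal, feature 1 is pure noise.
data_rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[label + data_rng.gauss(0, 0.5), data_rng.gauss(0, 1)] for label in y]


def toy_score(row):
    """Hypothetical stand-in for a trained classifier's predicted probability."""
    return row[0]


baseline, importances = permutation_importance(toy_score, X, y)

# Keep only features whose shuffling costs a noticeable amount of AUC.
informative = [j for j, imp in enumerate(importances) if imp > 0.01]
print(baseline, importances, informative)
```

Because the procedure only needs a scoring function, the same loop applies unchanged to a real classifier; with a random forest, `score_fn` would return the model's predicted probability for the positive class.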
