Prediction Enhancement of Residue Real-Value Relative Accessible Surface Area in Transmembrane Helical Proteins by Solving the Output Preference Problem of Machine Learning-Based Predictors
  • Authors: Feng Xiao; Hong-Bin Shen
  • Journal: Journal of Chemical Information and Modeling
  • Publication year: 2015
  • Publication date: November 23, 2015
  • Volume: 55
  • Issue: 11
  • Pages: 2464-2474
  • Full-text size: 571K
  • ISSN: 1549-960X
Abstract
The α-helical transmembrane proteins constitute 25% of the entire human proteome space and are difficult targets in high-resolution wet-lab structural studies, calling for accurate computational predictors. We present a novel sequence-based method called MemBrain-Rasa to predict relative solvent accessibility surface area (rASA) from primary sequences. MemBrain-Rasa features an ensemble prediction protocol composed of a statistical machine-learning engine, which is trained in the sequential feature space, and a segment template similarity-based engine, which is constructed with solved structures and sequence alignment. We locally constructed a comprehensive database of residue relative solvent accessibility surface area from the solved protein 3D structures in the PDB database. This database is searched for segment templates that are expected to be structurally similar to the query sequence's segments. The segment template-based prediction is then fused with the support vector regression outputs using knowledge rules. Our experiments show that pure machine-learning output cannot cover the entire rASA solution space and has a serious prediction preference problem, owing to the relatively small number of membrane protein structures that can be used as training samples. The template-based engine solves this problem very well, resulting in significant improvement of the prediction performance. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593 on the benchmark dataset, which are 26.4% and 26.1% better than existing predictors, respectively. MemBrain-Rasa represents new progress in structure modeling of α-helical transmembrane proteins. MemBrain-Rasa is available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
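
The two evaluation measures named in the abstract, the Pearson correlation coefficient and the mean absolute error, can be computed as in the minimal sketch below. It is only an illustration of the standard definitions, not the authors' evaluation code: the function names and the sample values are hypothetical, and the rASA values are assumed to be expressed as percentages (0 to 100), which the abstract does not state.

```python
import math


def pearson_correlation(observed, predicted):
    """Standard Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_obs * var_pred)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical per-residue rASA values (percent); not data from the paper.
observed_rasa = [5.0, 40.0, 75.0, 20.0]
predicted_rasa = [10.0, 35.0, 60.0, 25.0]
pcc = pearson_correlation(observed_rasa, predicted_rasa)
mae = mean_absolute_error(observed_rasa, predicted_rasa)
```

A higher correlation and a lower error both indicate better agreement, which is why the abstract reports improvement on each measure separately.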
