Using machine learning techniques and genomic/proteomic information from known databases for defining relevant features for PPI classification


Authors: J.M. Urquiza^a (jurquiza@atc.ugr.es); I. Rojas^a (irojas@atc.ugr.es); H. Pomares^a (hpomares@atc.ugr.es); J. Herrera^a (jherrera@atc.ugr.es); J.P. Florido^a (jperez@atc.ugr.es); O. Valenzuela^b (olgavc@ugr.es); M. Cepero^b (mcepero@ugr.es)
Keywords: Proteomic in protein interaction; PPI classification; Feature extraction and selection; Support vector machines
Journal: Computers in Biology and Medicine
Publication year: 2012
Journal code: 70_00104825
Category: med
Publication date: June 2012
Volume: 42
Issue: 6
Pages: 639-650
File size: 492 K

Abstract

In modern proteomics, predicting protein-protein interactions (PPIs) is a key line of research, because these interactions take part in most essential biological processes. This paper proposes a new approach to PPI data classification based on extracting genomic and proteomic information from well-known databases and incorporating semantic measures. The approach applies data mining techniques and yields highly accurate models with high sensitivity and specificity in classifying PPIs. The models are learned with the well-known support vector machine (SVM) paradigm and also return a new confidence score that may help expert researchers filter and validate new external PPIs. Yeast, one of the most widely analyzed organisms, is studied. We processed a very high-confidence dataset by extracting up to 26 specific features from the chosen databases, half of them calculated with two new similarity measures proposed in this paper. Then, by applying a filter-wrapper algorithm for feature selection, we obtained a final set of the eight most relevant features for predicting PPIs, which was validated by ROC analysis. The prediction capability of the SVM model using these eight features was tested by evaluating its predictions on a set of external experimental, computational, and literature-collected datasets.
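The abstract names a filter-wrapper feature-selection step but does not give its criteria. The following is one plausible, hypothetical reading, assuming binary labels (1 = interacting pair, 0 = non-interacting pair): a Fisher-score filter ranks the features, then greedy forward selection keeps a feature only if it raises leave-one-out accuracy. To stay dependency-free, it swaps in a nearest-centroid classifier for the SVM used in the paper. The function names (`fisher_score`, `loo_accuracy`, `filter_wrapper`) and the toy data are illustrative assumptions, not the authors' code.

```python
"""Hypothetical filter-wrapper feature selection sketch; not the authors' code."""
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Filter step: squared gap between class means over summed class variances.

    Assumes binary labels: 0 (non-interacting) and 1 (interacting).
    """
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    # Small epsilon avoids division by zero when both classes are constant.
    return gap / (pvariance(pos) + pvariance(neg) + 1e-12)


def loo_accuracy(X, labels, features):
    """Wrapper step: leave-one-out accuracy of a nearest-centroid classifier.

    The nearest-centroid rule is a stand-in for the SVM so that the sketch
    needs no third-party packages. Each class needs at least two samples.
    """
    correct = 0
    for i in range(len(X)):
        # Class centroids computed without sample i, on the chosen features only.
        centroids = {}
        for cls in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and labels[j] == cls]
            centroids[cls] = [mean(row[f] for row in rows) for f in features]
        point = [X[i][f] for f in features]
        dist = {
            cls: sum((p - c) ** 2 for p, c in zip(point, cen))
            for cls, cen in centroids.items()
        }
        correct += min(dist, key=dist.get) == labels[i]
    return correct / len(X)


def filter_wrapper(X, labels, top_k):
    """Rank features with the filter, then add them greedily in rank order,
    keeping a feature only if it raises the wrapper accuracy."""
    n_features = len(X[0])
    scores = [fisher_score([row[f] for row in X], labels) for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)[:top_k]
    selected, best = [], 0.0
    for f in ranked:
        acc = loo_accuracy(X, labels, selected + [f])
        if acc > best:
            selected, best = selected + [f], acc
    return selected, best


if __name__ == "__main__":
    # Toy data: column 0 separates the classes; columns 1 and 2 are noise.
    X = [[0.0, 5.0, 1.0], [0.2, 1.0, 3.0], [0.1, 3.0, 2.0],
         [1.0, 4.0, 1.0], [0.9, 2.0, 3.0], [1.1, 3.0, 2.0]]
    labels = [0, 0, 0, 1, 1, 1]
    print(*filter_wrapper(X, labels, top_k=3))  # → [0] 1.0
```

On the toy data only column 0 survives, because once it reaches perfect leave-one-out accuracy no noise column can improve on it. In a version closer to the paper, `loo_accuracy` would be replaced by a cross-validated SVM metric, for example ROC AUC.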
