Protein Function Prediction Using Positive and Negative Examples
  • Original title (Chinese): 基于正负样例的蛋白质功能预测
  • Authors: Fu Guangyuan (傅广垣); Yu Guoxian (余国先); Wang Jun (王峻); Guo Maozu (郭茂祖)
  • Affiliations: College of Computer and Information Science, Southwest University; School of Computer Science and Technology, Harbin Institute of Technology
  • Keywords: protein function prediction; positive examples; negative examples; signed hybrid graph; label propagation
  • Journal: Journal of Computer Research and Development (计算机研究与发展)
  • Journal code: JFYZ
  • Publication date: 2016-08-15
  • Year: 2016
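  • Volume: 53
  • Funding: National Natural Science Foundation of China (61402378, 61571163, 61532014); Chongqing Basic and Frontier Research Project (cstc2014jcyjA40031, cstc2016jcyjA0351); Chongqing Graduate Student Research Innovation Project (CYS16070); Fundamental Research Funds for the Central Universities (2362015XK07, XDJK2016B009, XDJK2016D021)
  • Language: Chinese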
  • Record ID: JFYZ201608010
  • Page count: 13
  • Issue: 08
  • CN: 11-1777/TP
  • Pages: 108-120
Abstract
Predicting protein function is one of the key challenges of the post-genomic era. Protein functional annotation databases mainly record positive examples (proteins known to carry out a given function) and rarely record negative examples (proteins known not to carry out a given function). Current computational methods rely almost exclusively on positive examples and seldom exploit the scarce but informative negative examples, even though it is well recognized that a discriminative predictor should use both. To bridge this gap, we propose ProPN, a protein function prediction approach using positive and negative examples. ProPN first constructs a directed signed hybrid graph that describes the known positive and negative associations between proteins and functional labels, the interactions between proteins, and the correlations between functional labels; it then applies label propagation on this graph to predict protein function. Experiments on yeast, mouse, and human protein datasets show that ProPN outperforms existing algorithms in predicting negative examples for proteins whose functional annotations are partially known, and also achieves higher accuracy than related methods in predicting the functions of proteins whose annotations are completely unknown.
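The abstract describes label propagation over a graph that carries both positive and negative annotations. The sketch below illustrates only the general idea and is not the ProPN algorithm itself. It assumes the standard propagation update F ← αPF + (1 − α)Y over a row-normalized protein interaction matrix P, with seed labels Y set to +1 for a known positive, -1 for a known negative, and 0 for unknown. The function names, the parameter `alpha`, and the toy data are all hypothetical, and the directed edges and function-function correlations of the paper's hybrid graph are left out.

```python
def row_normalize(weights):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total if total else 0.0 for w in row])
    return normalized


def propagate_signed_labels(interactions, seed_labels, alpha=0.5, iterations=50):
    """Iterate F <- alpha * P @ F + (1 - alpha) * Y (assumed update rule).

    interactions: non-negative protein-protein weight matrix.
    seed_labels:  Y, one row per protein and one column per function;
                  +1 = known positive, -1 = known negative, 0 = unknown.
    """
    transition = row_normalize(interactions)
    n_proteins = len(seed_labels)
    n_functions = len(seed_labels[0])
    scores = [[float(v) for v in row] for row in seed_labels]
    for _ in range(iterations):
        updated = []
        for i in range(n_proteins):
            row = []
            for k in range(n_functions):
                # Weighted average of the neighbours' current scores.
                neighbor_vote = sum(
                    transition[i][j] * scores[j][k] for j in range(n_proteins)
                )
                # Blend the neighbours' vote with the protein's own seed label.
                row.append(alpha * neighbor_vote + (1 - alpha) * seed_labels[i][k])
            updated.append(row)
        scores = updated
    return scores


# Toy data invented for illustration: 4 proteins, 2 functions.
interactions = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
seed_labels = [
    [1, 0],   # protein 0: known positive for function 0
    [0, -1],  # protein 1: known negative for function 1
    [0, 0],   # protein 2: no annotations
    [0, 0],   # protein 3: no annotations
]
scores = propagate_signed_labels(interactions, seed_labels)
```

In this toy network the unannotated protein 2 ends up with a positive score for function 0 and a negative score for function 1. In this sketch a negative seed therefore lowers the scores of its neighbours rather than simply being ignored.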
