Research on Methods for Predicting Protein-Protein Interactions
Abstract
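With the completion of the full human genome sequence and its working draft, the focus of genomics research has gradually shifted from structural genomics to functional genomics, and biomedicine has entered a new era: the post-genome era. An important task in the post-genome era is the study of proteomics. Thanks to the emergence and growing maturity of high-throughput experimental techniques, a large body of proteomic data has accumulated. The current problem is that the means and capacity to analyze these data lag seriously behind, so that data obtained at great cost in labor and money have not yielded more biologically meaningful results. Developing advanced, efficient information analysis and data mining methods that uncover the intrinsic relationships in large, complex proteomic data sets, and thereby reveal protein function and interaction relationships, is therefore of great importance.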
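     Protein-protein interaction is both a focus and a difficulty of molecular biology research. Proteins are the principal carriers of life activities and executors of function; in-depth study of their complex and diverse structures and functions, interactions, and dynamic changes helps reveal the nature of life phenomena at the molecular, cellular, and organism levels. Protein-protein interactions are an important part of many life processes in organisms and the basis of their biochemical reactions, and studying them is a main task of the post-genome era. Experimental methods provide large amounts of data but also introduce many false positives and false negatives. This thesis therefore studies protein-protein interaction from a computational perspective, focusing on machine learning methods for predicting protein-protein interactions. The main contributions are as follows: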
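     1) A new method for predicting protein-protein interactions based on the evolutionary conservation of amino acids is proposed. Because of natural selection, the amino acid residues related to molecular function within a protein family are conserved during evolution, and a protein's interactions with its environment depend on these key residues. Starting from the protein sequence, we propose a new encoding based on correlation coefficients of the amino acid sequence, which accounts for both long-range interactions within a sequence and co-evolution between sequences. For the training samples, we consider positive samples from the DIP, MIPS, and BIND databases and negative samples constructed in four ways: R-NEG, built from randomly selected proteins; IS-NEG, built from pairs of proteins located in the same subcellular compartment; BS-NEG, built from pairs of proteins located in different subcellular compartments; and GO-NEG, built from proteins with low RSSBP and RSSCC values derived from Gene Ontology information. Compared with the other eleven combinations, MIPS Core with GO-NEG gives the highest prediction accuracy and the smallest statistically significant P value, so MIPS Core and GO-NEG are termed the gold-standard positive samples and gold-standard negative samples, respectively. Compared with the known amino acid residue autocorrelation encoding, the correlation coefficient encoding gives better prediction results. Predictions across model organisms show that the SVM model based on the correlation coefficient encoding generalizes well.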
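     2) An improved GPCA plus LDA model is constructed to predict protein-protein interactions, which effectively improves prediction accuracy for membrane protein interactions. The basis vectors obtained by Greedy KPCA (GPCA) are taken directly from the sample data, whereas those obtained by KPCA are linear combinations of the sample data. Although KPCA based on a greedy algorithm is suboptimal, it greatly reduces computational complexity compared with conventional KPCA. For yeast, a typical single-celled model organism, most integral membrane proteins cannot be verified by experimental methods. We use 21 structural and sequence features of membrane protein interactions to construct 56 positive samples and 150 negative samples; experiments show that the GPCA plus LDA model yields 300 highly reliable predicted interactions involving 189 membrane proteins. The results also show that, although GPCA plus LDA performs feature reduction and effectively reduces redundancy in the data, its results are only slightly better than those of LDA; that because GPCA plus LDA resolves the loss of high-order information, its results are a marked improvement over those of KPCA plus LDA; and that GPCA plus LDA has the smallest variance in prediction accuracy, indicating little difference across the ten runs and hence good robustness. Computation further reveals that the membrane protein interaction network exhibits the small-world effect and the scale-free property.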
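     3) A new protein-protein interaction prediction method based on the Bayesian additive regression trees (BART) model is proposed. BART is a novel ensemble learning method: using nonparametric Bayesian regression, it decomposes the additive regression tree model into a number of weak classifiers and combines them into a classifier ensemble. A BART model based on an integrated MCMC algorithm is applied to protein-protein interaction prediction and achieves good accuracy. Compared with standard MCMC, the proposed integrated MCMC algorithm effectively avoids local minima. The BART model also predicts well on an independent test set, which indicates good generalization performance.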
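The negative-sample constructions in contribution 1) are described only in outline. As a minimal sketch of the BS-NEG idea (pairing proteins annotated to different subcellular compartments), the following Python example enumerates such pairs from a toy annotation table. The function name `build_bs_neg`, the protein IDs, and the compartment labels are hypothetical placeholders, and excluding known positive pairs is an added assumption rather than something the abstract states.

```python
from itertools import combinations

def build_bs_neg(localization, positives):
    """Return protein pairs whose annotated compartments differ
    and that are not listed as known interacting (positive) pairs."""
    known = {frozenset(pair) for pair in positives}
    negatives = []
    # Iterate over sorted IDs so the output order is deterministic.
    for a, b in combinations(sorted(localization), 2):
        different_compartment = localization[a] != localization[b]
        if different_compartment and frozenset((a, b)) not in known:
            negatives.append((a, b))
    return negatives

# Hypothetical toy annotations (placeholder IDs, not database records).
localization = {
    "P1": "nucleus",
    "P2": "nucleus",
    "P3": "mitochondrion",
    "P4": "cytoplasm",
}
positives = [("P1", "P3")]  # assumed known interaction, excluded from negatives

bs_neg = build_bs_neg(localization, positives)
print(bs_neg)
```

An IS-NEG variant would flip the compartment test to equality, and R-NEG would sample pairs at random without consulting the annotations; both are left out to keep the sketch short.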
With the sequencing of the human genome and the completion of its working draft, the focus of genomics research has been gradually shifting from structural genomics to functional genomics, and biomedicine has entered a new era, the post-genome era. In the post-genome era, an important task is the study of proteomics. As a growing number of high-throughput experimental technologies have emerged and matured, a large amount of proteomic data has accumulated. The current problem is that the means and ability to analyze and study these data lag seriously behind, so that data obtained with a great deal of labor and financial support have failed to produce more biologically meaningful results. Therefore, developing advanced and highly efficient information analysis and data mining tools that find the internal links in large and complex proteomic data sets, so as to reveal protein function and protein interaction relationships, is of vital importance.
     Protein-protein interaction is a hot and difficult topic in molecular biology. Proteins are the major carriers of life activity and executors of function, and in-depth study of their complex and diverse structures and functions, interactions, and dynamic changes helps to reveal the nature of life phenomena at the molecular, cellular, and organism levels. Protein-protein interaction is an important part of the diverse life activities of organisms and the basis of their biochemical reactions, and its study is a main task of the post-genome era. While experimental methods provide large amounts of data, they also bring a large number of false positives and false negatives. Therefore, this thesis studies protein-protein interaction from a computational perspective, mainly exploring the application of machine learning methods to the protein-protein interaction prediction problem. The thesis mainly covers the following aspects:
     1) A novel method of predicting protein-protein interactions based on amino acid evolutionary conservation is proposed. Under natural selection, amino acid residues involved in the function of a given protein family are more conserved, and the interaction between a protein and its environment depends on these important residues. Starting from the protein sequence, a new encoding scheme based on correlation coefficients of the amino acid sequence is presented. The encoding scheme considers both long-range interactions within a sequence and the co-evolutionary relationship between sequences. For the positive and negative learning samples, this thesis adopts positive samples from the DIP, MIPS, and BIND databases and negative samples constructed in four different ways: R-NEG, constructed from randomly selected proteins; IS-NEG, constructed via subcellular localization from pairs of proteins located in the same subcellular compartment; BS-NEG, constructed via subcellular localization from pairs of proteins located in different subcellular compartments; and GO-NEG, constructed from proteins with low RSSBP and RSSCC values obtained from Gene Ontology information. Comparing the combination of MIPS Core and GO-NEG with the other 11 combinations shows that it has the highest prediction accuracy and the smallest statistically significant P value. Thus, MIPS Core and GO-NEG are called the gold-standard positive samples and gold-standard negative samples, respectively. In addition, compared with the known amino acid residue autocorrelation encoding, the correlation coefficient encoding scheme yields better prediction results. The cross-species prediction results show that the SVM model based on the correlation coefficient encoding scheme has good generalization ability.
     2) An improved GPCA plus LDA model is constructed to predict protein-protein interactions, which effectively improves the prediction accuracy for membrane protein interactions. The basis vectors obtained by the Greedy KPCA (GPCA) algorithm come directly from the sample data, while those obtained by the KPCA algorithm are derived from linear combinations of the sample data. Although the KPCA algorithm based on a greedy algorithm is suboptimal, it greatly reduces computational complexity compared with the traditional KPCA algorithm. For the single-celled eukaryote Saccharomyces cerevisiae (yeast), most integral membrane proteins cannot be verified by experiments. We propose using 21 structural and sequence features of membrane protein interactions to construct 56 positive samples and 150 negative samples. Experiments show that the GPCA plus LDA model yields 300 high-reliability predicted protein-protein interactions involving 189 membrane proteins. The experimental results also show that, although the GPCA plus LDA method performs feature reduction and removes redundancy in the data, its results are only slightly better than those of the LDA method; that because the GPCA plus LDA method solves the problem of losing high-order information, its prediction results are much better than those of the KPCA plus LDA method; and that the variance of the prediction accuracy of the GPCA plus LDA approach is the smallest, which indicates that the ten measured values differ little and that the method is robust. Moreover, computation reveals that the membrane protein interaction network has the small-world and scale-free properties.
     3) A novel method based on the Bayesian additive regression trees (BART) model is proposed to infer protein-protein interactions. BART is a novel ensemble learning approach: through nonparametric Bayesian regression, the additive regression tree model is decomposed into a number of weak classifiers, which are combined into a classifier ensemble system. The BART prediction model based on the integrated backfitting MCMC algorithm obtains good prediction accuracy for protein-protein interaction. In particular, compared with standard MCMC methods, the proposed integrated backfitting MCMC algorithm effectively avoids local minima. In addition, the BART model achieves good prediction results on an independent test set, which indicates that it has good generalization ability.
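Contribution 3) describes a sum of weak tree learners fitted with a backfitting scheme. The sketch below illustrates only the backfitting idea, using a deterministic, non-Bayesian analogue: each one-split regression stump is repeatedly refit to the partial residual left by the other stumps. It is not BART and not the thesis's integrated MCMC algorithm, which samples trees from a posterior instead of fitting them by least squares. All function names, the synthetic data, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares regression stump (one split) on a 1-D feature x for target r."""
    best = None
    # Candidate thresholds; skipping the largest value keeps both sides non-empty.
    for t in np.unique(x)[:-1]:
        left = x <= t
        lo, hi = r[left].mean(), r[~left].mean()
        sse = ((r - np.where(left, lo, hi)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lo, hi)
    _, t, lo, hi = best
    return t, lo, hi

def predict_stump(stump, x):
    t, lo, hi = stump
    return np.where(x <= t, lo, hi)

def backfit(x, y, n_learners=5, n_sweeps=10):
    """Fit y ~ sum of stumps by cycling over learners and refitting each
    one to the partial residual that excludes its own contribution."""
    fits = np.zeros((n_learners, len(y)))
    stumps = [None] * n_learners
    for _ in range(n_sweeps):
        for j in range(n_learners):
            partial_residual = y - (fits.sum(axis=0) - fits[j])
            stumps[j] = fit_stump(x, partial_residual)
            fits[j] = predict_stump(stumps[j], x)
    return stumps, fits.sum(axis=0)

# Synthetic 1-D regression data (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)

stumps, y_hat = backfit(x, y)
mse_const = np.mean((y - y.mean()) ** 2)  # baseline: predict the mean
mse_fit = np.mean((y - y_hat) ** 2)       # additive stump model
print(mse_const, mse_fit)
```

Each refit can only lower or keep the training squared error, because the previous stump for that slot is always among the candidates. In BART, the refit step is replaced by a posterior draw of the tree given the same partial residual.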
引文
Alberts,B.,2002.Molecular biology of the cell[M].4th ed.New York:Garland Science.
    Alber,R.,Jeong,H.,Barabasi,A L.,2000.Error and attack tolerance of complex networks[J].Nature,406(6794):378-382.
    Andre,B.,Hein,C,Grenson,M.,et al.,1993.Cloning and expression of the UGA4 gene coding for the inducible GABA-specific transport protein of Saccharomyces cerevisiae[J].Mol.Gen.Genet,237(1-2):17-25.
    Asa,B.H.,William,S.N.,2005.Kernel methods for predicting protein-protein interactions[J].Bioinformatics,21(Suppl 1):i38-i46.
    Bader,G D.,Donaldson,I.,Wolting,C,et al.,2001.BIND-The Biomolecular Interaction Network Database[J].Nucleic Acids Res,29(1):242-245.
    Bader,G D.,Cary,M P.,Sander C,2006.Pathguide:a pathway resource list.Nucleic Acids Res[J].34(Database issue):D504-D 506.
    Barabasi,A.L.,Albert,R.,1999.Emergence of scaling in random networks[J].Science,286(5439):509-512.
    Barabasi,AL.,Oltvai,ZN.,2004.Network biology:understanding the cell's functional organization[J].Nature Reviews Genetics,5(2):101-113.
    Baudat,G.,Anouar,F.,2000.Generalized discriminant analysis using a kernel approach[J].Neural Computation,12(10):2385-2404.
    Ben-hur,A.,Brutlag,D.,2003.Remote homology detection:a motif based approach[J].Bioinformatics,19(Suppl 1):i26-i33.
    Ben-Hur,A.,Noble,WS.,2005.Kernel methods for predicting protein-protein interactions[J].Bioinformatics,21(Suppl 1):i38-46.
    Ben-Hur,A.,Noble,W.S.,2006.Choosing negative examples for the prediction of protein-protein interactions[J].BMC Bioinformatics,7(Suppl 1):S2.
    Berman,H.M.,Westbrook,J.,Feng,Z.et al.,2000.The protein data bank.Nucleic Acids Res[J].28(1):235-242.
    Bock,J.R.,Gough,D.A.,2001.Predicting protein-protein interactions from primary structure[J].Bioinformatics,17(5):455-460.
    Bork,P.,Jensen,L.J.,von Mering,C,et al.,2004.Protein interaction networks from yeast to human[J].Curr.Opin.Struct.Biol,14(3):292-299.
    Breiman,L.,2001.Random forest s[J].Machine Learning,45(1):5-32.
    Broto,P.,Moreau,G.,Vandicke,C.,1984.Molecular structures:perception,autocorrelation descriptor and SAR studies[J].Eur.J.Med.Chem.,19(1):71-78.
    Burbidge,R.,Trotter,M.,Buxton,B.,et al.,2001.Drug design by machine learning:Support vector machines for pharmaceutical data analysis[J].Comp Chem,26(1):4-15.
    Carr,D.W.,Scott,J.D.,1992.Blotting and band-shifting:techniques for studying protein-protein interactions[J].Trends Biochem Sci,17(7):246-249.
    Charton,M.,Charton,B.I.,1982.The structural dependence of amino acid hydrophobicity parameters[J].J.Theor.Biol,99(4):629-644.
    Chatr-aryamontri,A.,Ceol,A.,Palazzi,L M.,et al.,2007.MINT:the Molecular INTeraction database[J].Nucleic Acids Res,35(Database issue):D572-574
    Chang,C.C,Lin,C J.,2001.LIBSVM:a library for support vector machines.http://www.csie.ntu.edu.te/~cjlin/libsvm.
    Chen,Y.,Xu,D.,2003.Computational analyses of high-throughput protein-protein interaction data[J].Curr Protein Pept Sci,4(3):159-181.
    Chipman,H.A.,George,E.I.,McCulloch,R.E.,Bayesian Ensemble Learning[C].NIPS,2006:265-272.
    Chothia,C,1976.The nature of the accessible and buried surfaces in proteins[J].J.Mol.Biol,105(1):1-12.
    Chou,K.C.,Shen,H.B.,2007.EukmPLoc:a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites[J].Journal of Proteome Research,6:1728-1734.
    Collins,S.R.,Kemmeren,P.,Zhao,X.C.,et al.,2007.Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae[J].Mol Cell Proteomics,6(3):439-450.
    Cortes,C,Vapnik,V.,1995.Support vector networks[J].Machine Learning,20(3):273-297.
    Costa,P.J.,Arndt,K.M.,2000.Synthetic lethal interactions suggest a role for the Saccharomyces cerevisiae Rtf1 protein in transcription elongation[J].Genetics,156(2):535-547.
    Coward,E.,1999.Shufflet:shuffling sequences while conserving the k-let counts[J].Bioinformatics,15(12):1058-1059.
    Dandekar,T.,Snel,B.,Huynen,M.,et al.,1998.Conservation of gene order:a fingerprint of proteins that physically interact[J].Trends Biochem.Sci,23(9):324-328.
    Daraselia,N.,Yuryev,A.,Egorov,S.,et al.,2004.Extracting human protein interactions from MEDLINE using a full 2sentence parser[J].Bioinformatics,20(5):604-611.
    Datsenko,K.A.,Wanner,B.L.,2000.One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products[J].Proc Natl Acad Sci U S A,97(12):6640-6645.
    de Gunzburg,J.,Riehl,R.,Weinberg,R.A.,1989.Identification of a protein associated with p21ras by chemical crosslinking[J].Proc Natl.Acad.Sci,86(11):4007-4011.
    Deane,C.M.,Salwinski,L.,Xenarios,I.,et al.,2002,Protein interactions:two methods for assessment of the reliability of high throughput observations[J].Mol.Cell Proteomics,1(5):349-356.
    Deng,M.,Mehta,S.,Sun,F.,et al.,2002.Inferring domain-domain interactions from protein-protein interactions[J].Genome Res,12(10):1540-1548.
    Dietterich,T.G.,1997.Machine learning research:Four current directions[J].Al Magazine,18(4):97-136.
    Domlngos,P.,Pazzani,M.,Beyond independence:conditions for the optimality of the simple bayesian classifier[C].Proceeding of the international conference Machine learning.Morgan Kauffman,San Mateo,1996:105-112.
    Domingos,P.,Pazam,M.,1997.On the Optimality of the simple Bayesian classifier under zero-one loss[J].Machine Learning,29(23):103-130.
    Efron,B.,Hastie,T.,Johnstone,I.,et al.,2004.Least angle regression[J].Annals of Statistics,32(2):407-451.
    Eisenberg,D.,McLachlan,A.D.,1986.Solvation energy in protein folding and binding[J].Nature,319(6050):199-203.
    Enright,A.J.,Iliopoulos,I.,Kyrpides,N.C.,et al.,1999.Protein interaction maps for complete genomes based on gene fusion events[J].Nature,402(6757):86-90.
    Erdos,P.,Renyi,A.,1959.On random graphs[J].Publicationes Mathematicae,1959,6:290-297.
    Fauchere,J.L.,1988.Amino acid side chain parameters for correlation studies in biology and pharmacology[J].Int.J.Peptide Protein Res,32(4):269-278.
    Feng,Z.P.,Zhang,C.T.,2000.Prediction of membrane protein types based on the hydrophobic index of amino acids[J].J.Protein Chem.,19(4),269-275.
    Fields,S.,Song,O-K.,1989.A novel genetic system to detect protein-protein interactions[J].Nature,40(18):245-246.
    FitzGerald,K.,2000.In vitro display technologies-new tools for drug discovery[J].Drug Discov Today,5(6):253-258.
    Franc,V.,Optimization Algorithms for Kernel Methods[D].PhD thesis,Centre for Machine Perception,Czech Technical University,July,2005.
    Freund,Y.,1995.Boosting a weak learning algorithm by majority[J].Information and Computation,121(2):256-285.
    Freund,Y.,Schapire R.E.A.,1997.Decision-theoretic generalization of onLine learning and an application to boosting[J].Journal of Computer and System Sciences,55(1):119-139.
    Friedman,N.,Geiger,D.,Goldszmidt,M.,1997.Bayesian network classifiers[J].Machine Learning,29(2-3):131-163,
    Friedman,J.H.,2001.Greedy function approximation:A gradient boosting machine[J].The Annals of Statistics,29(5):1189-1232.
    Fryxell,K.J.,1996.The coevolution of gene family trees[J].Trends Genet,12(9):364-369.
    Fukunaga,K.Introduction to statistical pattern recognition[M],second ed,Academic Press,1990.
    Gaasterl,T.,Ragan,M.A.,1998.Microbial genescapes:phyletic and functional patterns of ORF distribution among prokaryotes[J].Microb Comp Genomics,3(4):199-217.
    Garel,J.P.,1973.Coefficients de partage d'aminoacides,nucleobases,nucleosides et nucleotides dans un systeme solvant salin[J].Chromatogr,78:381-391.
    Giot,L.,Bader,J.S.,Brouwer,C,et al.,2003.A protein interaction map of Drosophila melanogaster[J].Science,302(5651):1727-1736.
    Gobel,U.,Sander,C,Schneider,R.,et al.,1994.Correlated mutations and residue contacts in proteins[J].Proteins,18(4):309-317.
    Goh,C.S.,Bogan,A.A.,Joachimiak,M.,et al.,2000.Co-evolution of proteins with their interaction partners[J].J Mol.Biol,299(2):283-293.
    Gomez,S.M.,Lo,S.H.,Rzhetsky,A.,2001.Probabilistic prediction of unknown metabolic and signal-transduction networks[J].Genetics,159(3):1291-1298.
    Gomez,S.M.,Noble.W.S.,Rzhetsky,A.,2003.Learning to predict protein-protein interactions from protein sequences[J].Bioinformatics,19(15):1875-1881.
    Grantham,R.,1974.Amino acid difference formula to help explain protein evolution[J].Science,185(4154):862-864.
    Guldener.U.,Munsterkotter,M.,Oesterheld,M.,et al.,2006.MPact:the MIPS protein interaction resource on yeast[J].Nucleic Acids Res,34(Database issue),D436-D441
    Guo,X.,Liu,R.,Shriver C.D.,et al.,2006.Assessing semantic similarity measures for the characterization of human regulatory pathways[J].Bioinformatics,22(8):967-973.
    Guo,Y.Z.,Yu,L.Z.,Wen,Z.N.,et al.,2008.Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences[J].Nucleic Acids Research,36(9):3025-3030.
    Han,G.,Gable,K.,Kohlwein,S.D.,et al.,2002.The Saccharomyces cerevisiae YBR159w Gene Encodes the 3-Ketoreductase of the Microsomal Fatty Acid Elongase[J].J.Biol.Chem.,277(38):35440-35449.
    Hand,D.,Yu,K.,2001.Idiot's bayes-not so stupid after all?[J].International statistical Review, 69(3):385-398.
    Harlow,E.,Whyte,P.,Franza,Jr.,et al.,1986.Association of adenovirus early-region 1A proteins with cellular polypeptides[J].Mol Cell Biol,6(5):1579-1589.
    Hirokawa,T.,Boon-Chieng,S.,Mitaku,S.,1998.SOSUI:Classification and secondary structure prediction system for membrane proteins[J].Bioinformatics,14(4):378—379.
    Ho,Y.,Gruhler,A.,Heilbut,A.et al.,2002.Systematic identification of protein complexes in saccharomyces cerevisiae by mass spectrometry[J].Nature,415(6868):180-183.
    Hong,S J.,Weiss,S M.,2001.Advances in predictive models for data mining[J].Pattern Recogn Lett,22(1):55—61.
    Hopp,T.P.,Woods,K.R.,1981.Prediction of protein antigenic determinants from amino acid sequences[J].Proc.Natl.Acad.Sci.USA,78(6):3824-3828.
    Home,D.S.,1988.Prediction of protein helix content from an autocorrelation analysis of sequence hydrophobicities[J].Biopolymers,27(3):451-477.
    Hua,S.,Sun,Z.,2001.A novel method of protein secondary structure prediction with high segment overlap measure:Support vector machine approach[J].J Mol Biol,308(2):397—407.
    Hutchens,J.O.,1970.Heat capacities,absolute entropies,and entropies of formation of amino acids and related compounds.In “Handbook of Biochemistry”,2nd ed.(Sober,H.A.,ed.),Chemical Rubber Co.,Cleveland,Ohio,B60-B61.
    Imhof,I.,Flury,I.,Vionnet,C,et al.,2004.Giycosylphosphatidylinositol(GPI)proteins of saccharomyces cerevisiae contain ethanolamine phosphate groups on the α1,4-linked mannose of the GPI anchor[J].J.Biol.Chem.,279(19):19614-19627.
    Jang,H.,Lim,J.,Lim,J H.,et al.,2006.Finding the evidence for protein-protein interactions from PubMed abstracts[J].Bioinformatics,22(14):e220-226.
    Janin,J.,1979.Surface and inside volumes in globular proteins[J].Nature,277(5696):491-492.
    Jansen,R.,Yu,H.,Greenbaum,D.,et al.,2003.A Bayesian networks approach for predicting protein-protein interactions from genomic data[J].Science,302(5644):449-453.
    Jayasinghe,S.,Hristova,K.,White,SH.,2001.MPtopo;A database of membrane protein topology[J].Protein Sci,10(2):455-458.
    Jeong,H.,Mason,S.P.,Barab(?)si,AL.,et al.,2001.Lethality and centrality in protein nefworks[J].Nature,411(6833):41-42.
    Joachims,T.Making large-scale SVM learning practical[A].In:Scholkopf B,Burges C J C,Smo la A J eds,Advances in Kernel Methods-Support Vector Learning[M],Cambridge,MA:MIT Press,1998:169-184.
    John,S.T.,Nello,C,2006.Kernel method for pattern analysis[M].
    Johnson,N.,Varshavsky,A.,1994.Split ubiquitin as a sensor of protein interactions in vivo[J].Proc Nat Acad Sci USA,91(22):10340-10344.
    Jones,D.T.,Taylor,WR.,Thornton,JM.,1994.A model recognition approach to the prediction of all2helical membrane protein structure and topology[J].Biochemistry,33(10):3038—3049.
    Kaeser,M.D.,Iggo,R.D.,2002.Chromatin immunoprecipitation analysis fails to support the latency model for regulation of p53 DNA binding activity in vivo[J].Proc Natl Acad Sci USA,99(1):95-100.
    Kearns,M.,Valiant,L.G.,1988.Learning boolean formulae or factoring.Technical Report TR21488,Cambridge,MA:Havard University Aiken Computation Laboratory.
    Keerthi,S.,Gilbert,E.,2002.Convergence of a generalization SMO algorithm for SVM classifier design[J].Machine Learning,46(1-3):351-360.
    Kelley,BP.,Sharan,R.,Karp,RM.,et al,2003.Conserved pathways within bacteria and yeast as revealed by global protein network alignment[J].Proceedings of the National Academy of Sciences,100(20):11394-11399.
    Kim,H.,Melen,K.,Osterberg,M.,et al.,2006.A global topology map of the Saccharomyces cerevisiae membrane proteome[J].Proc.Natl.Acad.Sci.103(30):11142-11147.
    Kim,K.I.,Jung,K.,Park,S.H.,et al.,2002.Support vector machines for texture classification[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,24(11),1542-1550.
    Kitano,H.,2002,Computational systems biology[J].Nature,420(6912):206-210.
    Kitano,H.,2002,Systems biology:a brief overview[J].Science,295(5560):1662-1664.
    Kleanthous,C,2000.Protein-protein recognition[M].Frontiers in molecular biology,Oxford University Press.
    Koji,T.,William,S.N.,2004.Learning kernels from biological networks by maximizing entropy[J].Bioinformatics,20(Suppl 1):i326-i333.
    Krogan,N.J.,Cagney,G.,Yu,H.,et al.,2006.Global landscape of protein complexes in the yeast Saccharomyces cerevisiae[J],Nature,440(7084):637-643.
    Kyte,J.,Doolittle,RF.,1982.A simple method for displaying the hydro2 pathic character of a protein[J].J Mol Biol,157(1):105—132.
    Lai,P.L.,Fyfe,C,2000.Kernel and nonlinear canonical correlation analysis[J].International Journal of Neural system,10(5):365-377.
    Lanckriet,G.R.G.,Deng,M.,Cristianini,N.,et al,Kernel-based data fusion and its application to protein function prediction in yeast[C].In Proceedings of the Pacific Symposium on Biocomputing,2004:300-311.
    Lau,W.T.,Howson,R.W.,Malkus,P.,et al.,2000.Pho86p,an ER Resident Protein in Saccharomyces cerevisiae,is Required for ER Exit of the High Affinity Phosphate Transporter Pho84p[J].Proc.Natl.Acad.Sci,97(3):1107-1112.
    Lee,MS.,Park,SS.,Kim,MK.,A protein interaction verification system based on a neural network algorithm[C].Proceeding of the 2005 IEEE Computational Systems Bioinformatics Conference Workshops,2005.
    Leslie.C,Eskin,E.,Noble.W.S.,The spectrum kernel:A string kernel for SVM protein classification.Proceedings of the Pacific Symposium on Biocomputing,New Jersey[C].World Scientific,Singapore,2002:564-575.
    Li,W.Z.,Jaroszewski,L.,Godzik,A.,2001.Clustering of highly homologous sequences to reduce the size of large protein databases[J].Bioinformatics,17(3):282-283.
    Lin,Z.,Pan,X.M.,2001.Accurate prediction of protein secondary structural content[J].J.Protein Chem.,20(3):217-220.
    Lindley,D.V.,2000.The Philosophy of Statistics[J].The Satistician,49(3):293-338.
    Madaoui,H.,Guerois,R.,2008.Coevolution at protein complex interfaces can be detected by the complementarity trace with important impact for predictive docking[J].Proc.Natl.Acad.Sci.USA,105(22):7708-7713.
    Marcotte,E.M.,Pellegrini,M.,Ng,H.L.,et al.,1999.Detecting protein function and protein-protein interactions from genome sequences[J].Science,285(5428):751-753.
    Martin,S.,Roe,D.,Faulon,J.L.,2005.Predicting protein-protein interactions using signature products[J].Bioinformatics,21(2):218-226.
    Maslov,S.,Sneppen,K.,2002.Specificity and stability in topology of protein networks[J].Science,296(5569):910-913.
    Matsuda,S.,Giliberto,L.,Matsuda,Y.,et al.,2005.The familial dementia BRI2 gene binds the Alzheimer gene APP and inhibits Abeta production[J].J Biol Chem,280(32):28912-28916.
    McMahon S.B.Van Buskirk H.A.Dugan K.A.et al.1998.The novel ATM-related protein TRRAP is an essential cofactor for the c-Myc and E2F oncoproteins[J].Cell,94(3):363-374.
    Mewes,H.W.,Frishman,D.,Mayer,KFX.,et al.,2006.MIPS:analysis and annotation of proteins from whole genomes in 2005[J].Nucleic Acids Res,34(Database issue):D169-D172.
    Miller,J.P.,Lo,R.S.,Ben-Hur,A.,et al.,2005.Large-scale identification of yeast integral membrane protein interactions[J].Proc.Natl.Acad.Sci.USA,102(34):12123-12128.
    Milo,R.,Shen-Orr,S.,Itzkovitz,S.,et al.,2002.Network motifs:simple building blocks of complex networks[J].Science,298(5594):824-827.
    Mishra,G R.,Suresh,M.,Kumaran,K.,et al.,2006.Human protein reference database-2006 update[J].Nucleic Acids Res,34(Database issue):D411-414.
    Mitchell,D.A.,Marshall,T.K.,Deschenes,R.J.,1993.Vectors for the inducible overexpression of glutathione S-transferase fusion proteins in yeast[J].Yeast,9(7):715-722.
    Mochizuki,N.,Yamashita,S.,Kurokawa,K.,et al.,2001.Spatio-temporal images of growth-factor-induced activation of Ras and Rapl[J].Nature,411(6841):1065-1068.
    Moran,P.A.,1950.Notes on continuous stochastic phenomena[J].Biometrika,37(1-2):17-23.
    Moreau.G.,Broto,P.,1980.Autocorrelation of molecular structures,application to SAR studies[J].Nour.J.Chim.,4:757-764.
    Multhaup,G.,Strausak,D.,Bissig,K.D.,et al,2001.Interaction of the CopZ copper chaperone with the CopA copper ATPase of Enterococcus hirae assessed by surface plasmon resonance[J].Biochem Biophys Res Commun.288(1):172-177.
    Nooren,I.M.A.,Thornton,J.M.,2003.Diversity of Protein-protein interactions[J].EMBO J.22(14):3486-3492.
    Olmea,O.,Valencia,A.,1997.Improving contact predictions by the combination of correlated mutations and other sources of sequence information[J].Fold Des,2(3):S25-32.
    Ortiz,A.R.,Kolinski,A.,Rotkiewicz,P.,et al.,1999.Ab initio folding of proteins using restraints derived from evolutionary information[J].Proteins,37(Suppl 3):177-185.
    Ortiz,A.R.,Skolnick,J.,2000.Sequence evolution and the mechanism of protein folding[J].Biophys J,79(4):1787-1799.
    Overbeek,R.,Fonstein,M.,D'Souza,M.,et al.,1999.Use of contiguity on the chromosome to predict functional coupling[J].In Silico.Biol,1(2):93-108.
    Ozier,O.,Amin,N.,Ideker,T.,2003.Global architecture of genetic interactions on the protein network[J].Nature biotechnology,21(5):490-491.
    Patil,A.,Nakamura,H,,2005.Filtering high-throughput protein-protein interaction data using a combination of genomic features[J].BMC Bioinformatics,6(1):100-112.
    Pazos,F.,Helmer-Citterich,M.,Ausiello,G.,et al.,1997.Correlated mutations contain information about protein-protein interaction[J].J Mol Biol,271(4):511-523.
    Pazos,F.,Valencia,A.,2001.Similarity of phylogenetic trees as indicator of protein-protein interaction[J].Protein Eng,14(9):609-614.
    Pazos,F.,Valencia,A.,2002.In silico two-hybrid system for the selection of physically interacting protein pairs[J].Proteins,47(2):219-227.
    Pellegrini,M.,Marcotte,E.M.,Thompson,M.J.,et al.,1999.Assigning protein functions by comparative genome analysis:protein phylogenetic profiles[J].Proc.Natl.Acad.Sci,96(8):4285-4288.
    Pereira-Lea,J B.,Levy,E D.,Teichmann,S A.,2006.The origins and evolution of functional modules:lessons from protein complexes[J].Philos Trans R Soc Lond B Biol Sci,361(1467):507-517.
    Piatt,J C.,Fast training of SVMs using sequential minimal optimization[A].In:Scholkopf B,Burges C J C,Smola A J eds,Advances in Kernel Methods-Support Vector Learning[M],Cambridge,MA:M IT P ress,1998:185-208.
    Prabhakaran,M.,Ponnuswamy,P.K.,1982.Shape and surface features of globular proteins[J].Macromolecules,15(2):314-320.
    Przulj.N.,Corneil,DG.,Jurisica,I.,2004.Modeling interactome:Scale-free or geometric?[J].Bioinformatics,12(20):3508-3515.
    Qi,Y.,Klein-Seetharaman,J.,Bar-Joseph,Z.,2005.Random forest similarity for protein-protein interaction prediction from multiple sources[J].Pac Symp Biocomput,10:531-542.
    Rain,J.C.,Selig,L.,De Reuse,H.,et al.,2001.The protein-protein interaction map of Helicobacter pylori[J].Nature,409(6817):211-215.
    Resendis-Antonio,O.,Freyre-Gonzalez,JA.,Menchaca-Mendez,R.,et al.,2005.Modular analysis of the transcriptional regulatory network of E.coli[J].Trend in Genetics,21(1):16-20.
    Resnik,P.,1999.Semantic similarity in a taxonomy:an information-based measure and its application to problems of ambiguity in natural language[J].J.Artificial Intelligence Res,11:95-130.
    Rhodes,DR.,Tomlins,SA.,Varambally,S.,et al.,2005.Probabilistic model of the human protein-protein interaction network.[J].Nature Biotechnology,23(8):951-959.
    Rives,AW.,Galitski,T.,2003.Modular organization of cellular networks[J].PNAS,100(3):1128-1133.
    Rogers,J.,2003.The finished genome sequence of Homo sapiens[J].Cold Spring Harb Symp Quant Biol,68:1-11.
    Rost,B.,Fariselli,P.,Casadio,R.,1996.Topology prediction for helical transmembrane proteins at 86 % accuracy[J].Protein Sci,5(8):1704-1718.
    Saito,R.,Suzuki,H.,Hayashizaki.Y.,2002.Interaction generality,a measurement to assess the reliability of a protein-protein interaction[J].Nucleic Acids Res,30(5):1163-1168.
    Schapire,R.E.,1990.The Strength of Weak Learnability[J].Machine Learning,5(2):197-227.
    Scholkopf,B.,Smola,A.,Muller,K.R.,1998.Nonlinear component analysis as a kernel eigenvalue problem[J].Neural Computation,10(5):1299-1319.
    Schuldiner,M.,Collins,S.R.,Thompson,N.J.,et al.,2005.Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile Cell,123(3):507-519.
    Schwikowski,B.,Uetz,P.,Fields,S.,2000.A network of protein-protein interactions in yeast[J].Nat Biotechnol,18(12):1257-1261.
    Scollnik,D.P.M.,2001.Actuarial Modeling with MCMC and BUGS[J].North American Actuarial Journal,5(2):96-124.
    Shen,J.,Zhang,J.,Luo,X.,et al.,2007.Predicting protein-protein interactions based only on sequences information[J].Proc.Natl.Acad.Sci,104(11):4337-4341.
    Sonnhammer,EL.,von Heijne,G.,Krogh,A.,A Hidden Markov model for predicting transmembrane helices in protein sequences[C].Proc Int Conf Intell Syst Mol Biol,1998,6:175-182.
    Sprinzak,E.,Margalit,H.,2001.Correlated sequence-signatures as markers of protein-protein interaction[J].J Mol.Biol,311(4):681-692.
    Stagljar,I.,Korostensky,C.,Johnsson,N.,et al.,1998.A genetic system based on split-ubiquitin for the analysis of interactions between membrane proteins in vivo[J].Proc.Natl.Acad.Sci.USA,95(9):5187-5192.
    Stark,C,Breitkreutz,BJ.,Reguly,T.,et al.,2006.BioGRID:A General Repository for Interaction Datasets[J].Nucleic Acids Res,34(Database Issue):D535-539.
    Stumpf,M P.,Kelly,W P.,Thorne,T.,et al.,2007.Evolution at the systems level:the natural history of protein interaction networks[J].Trends Ecol Evol,22(7):366-373.
    Sweet,R.M.,Eisenberg,D.,1983.Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure[J].J.Mol.Biol,171(4):479-488.
    Tamames,J.,Casari,G.,Ouzounis,C,et al.,1997.Conserved clusters of functionally related genes in two bacterial genomes[J].J Mol.Evol,44(1):66-73.
    Tibshirani,R.1996.Regression Shrinkage and Selection via the Lasso.J.Roy.Statist.Soc.Ser,58(1):267-288.
    Tipper,D.J.,Harley,C.A.,2002.Yeast genes controlling responses to topogenic signals in a model transmembrane protein[J].Mol.Biol.Cell.,13(4):1158-1174.
    Tomitori,H.,Kashiwagi,K.,Asakawa,T.,et al.,2001.Multiple polyamine transport systems on the vacuolar membrane in yeast[J].Biochem.J.,353(3):681-688.
    Tong,A.H.,Evangelista,M.,Parsons,A.B.,et al.,2001.Systematic genetic analysis with ordered arrays of yeast deletion mutants[J].Science,294(5550):2364-2368.
    Uetz,P.,Giot,L.,Cagney,G.,et al.,2000.A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae[J].Nature,403(6770):623-627.
    Valiant,L.G.,1984.A Theory of the Learnable[J].Communications of the ACM,27(11):1134-1142.
    von Heijne,G.,1986.The distribution of positively charged residues in bacterial inner membrane proteins correlates with the transmembrane topology[J].EMBO J,5(11):3021-3027.
    von Mering,C.,Krause,R.,Snel,B.,et al.,2002.Comparative assessment of large scale data sets of protein-protein interactions[J].Nature,417(6887):399-403.
    Walhout,A.J.,Sordella,R.,Lu,X.,et al.,2000.Protein interaction mapping in C.elegans using proteins involved in vulval development[J].Science,287(5450):116-122.
    Wang,J.Z.,Du,Z.D.,Payattakool,R.,et al.,2007.A new method to measure the semantic similarity of GO terms[J].Bioinformatics,23(10):1274-1281.
    Waterston,R.H.,Lindblad-Toh,K.,Birney,E.,et al.,2002.Initial sequencing and comparative analysis of the mouse genome[J].Nature,420(6915):520-562.
    Watts,D.J.,Strogatz,S.H.,1998.Collective dynamics of 'small-world' networks[J].Nature,393(6684):440-442.
    Williams,C.,Addona,T.A.,2000.The integration of SPR biosensors with mass spectrometry:possible applications for proteome analysis[J].Trends Biotechnol,18(2):45-48.
    Wittke,S.,Lewke,N.,Muller,S.,et al.,1999.Probing the molecular environment of membrane proteins in vivo[J].Mol.Biol.Cell,10(8):2519-2530.
    Wojcik,J.,Schachter,V.,2001.Protein-protein interaction map inference using interacting domain profile pairs[J].Bioinformatics,17(Suppl 1):S296-305.
    Wold,S.,Jonsson,J.,Sjöström,M.,et al.,1993.DNA and peptide sequences and chemical processes multivariately modelled by principal component analysis and partial least-squares projections to latent structures[J].Anal.Chim.Acta,277:239-253.
    Wolpert,D.H.,1992.Stacked generalization[J].Neural Networks,5(2):241-259.
    Wolpert,D.H.,Macready,W.G.,1997.No free lunch theorems for optimization[J].IEEE Transactions on Evolutionary Computation,1(1):67-82.
    Wu,X.,Zhu,L.,Guo,J.,et al.,2006.Prediction of yeast protein-protein interaction network:insights from the gene ontology and annotations[J].Nucleic Acids Res,34(7):2137-2150.
    Xenarios,I.,Salwinski,L.,Duan,X.J.,et al.,2002.DIP,the Database of Interacting Proteins:a research tool for studying cellular networks of protein interactions[J].Nucleic Acids Res,30(1):303-305.
    Yan,A.,Wu,E.,Lennarz,W.J.,2005.Studies of yeast oligosaccharyl transferase subunits using the split-ubiquitin system:topological features and in vivo interactions.Proc Natl Acad Sci USA,
    Yang,M.H.,2002.Face recognition using kernel methods[C],Advances in Neural Information Processing Systems 14(NIPS 14),MIT Press,2002:215-220.
    Yeang,C.H.,Haussler,D.,2007.Detecting coevolution in and among protein domains[J].PLoS Comput.Biol,3(11):2122-2134.
    Yompakdee,C.,Ogawa,N.,Harashima,S.,et al.,1996.A putative membrane protein,Pho88p,involved in inorganic phosphate transport in Saccharomyces cerevisiae[J].Mol.Gen.Genet,251(5):580-590.
    Yu,H.,Braun,P.,Yildirim,M.A.,et al.,2008.High-quality binary protein interaction map of the yeast interactome network[J].Science,322(5898):104-110.
    Zhou,Z.H.,Wu,J.,Tang,W.,2002.Ensembling neural networks:Many could be better than all[J].Artificial Intelligence,137(1-2):239-263.
    Zhou,Z.H.,Tang,W.,2003.Selective ensemble of decision trees[J].Lecture Notes in Artificial Intelligence,2639(2003):476-483.
    Zhu,H.,Bilgin,M.,Bangham,R.,et al.,2001.Global analysis of protein activities using proteome chips[J].Science,293(5537):2101-2105.
    陈希孺,1999.高等数理统计学[M],中国科学技术大学出版社.
    Kotz,S.,吴喜之,2000.现代贝叶斯统计学[M],中国统计出版社.
    李霞,刘超,2008.基于收缩机制的若干回归模型比较研究[J].统计与决策,5:30-32.
    梁国栋,2001.最新分子生物学实验技术[M],科学出版社.
    梁琳慧,韩忠朝,2005.蛋白质相互作用的研究方法[J].生命的化学,25(3):255-257.
    刘乐平,袁卫,2004.现代贝叶斯分析与现代统计推断[J].经济理论与经济管理,6:64-69.
    茆诗松,周纪芗,2000.概率论与数理统计[M],中国统计出版社.
    孙景春,2005.基于基因组上下文预测蛋白质相互作用方法的优化、整合与应用[D].上海交通大学博士论文.
    田云,卢向阳,2003.蛋白质间相互作用研究技术进展[J].生物学通报,38(5):1-3.
    Vapnik著,张学工译,2000.统计学习理论的本质[M],清华大学出版社.
    谢承旺,2008.不同种类支持向量机算法的比较研究[J].小型微型计算机系统,29(1):106-109.
    王兵,2006.蛋白质相互作用及其位点的预测方法研究[D].中国科学技术大学博士论文.
    王文馨,陈宇光,石铁流,2008.异源蛋白质相互作用数据整合算法的进展[J].生命科学,20(5):821-826.
    张丽苹,霍克克,2003.蛋白质相互作用研究技术进展[J].高技术通讯,11:99-106.
    朱慧明,郝立亚,2007.非寿险精算中的贝叶斯信用模型分析[J].数量经济技术经济研究,1:109-117.
    朱新宇,沈百荣,2004.预测蛋白质间相互作用的生物信息学方法[J].生物技术通讯,15(1):70-75.
