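Predicting Protein Interactions with Machine Learning Methods and Improving the Accuracy of Mass Spectrometry Peptide Identification with Logistic Regression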
Abstract
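Proteomics has become a hot discipline in the post-genomic era. The invention of high-throughput experimental technologies such as biological mass spectrometry and protein chips has greatly advanced proteomics. This article uses bioinformatics methods to further improve the efficiency and accuracy of current high-throughput experimental technologies, so that more comprehensive and accurate results can be obtained at a lower experimental cost.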
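     Protein-protein interactions play important roles in biological processes. Years of biological experiments have accumulated a large amount of protein interaction data, but many interactions remain unknown. Current experimental methods for screening protein interactions consume considerable labor and materials, and, because of abundance suppression, they can hardly identify interactions between low-abundance proteins. A simpler route is to first screen protein databases computationally with bioinformatics methods to predict potential protein interactions, and then verify the predictions by biological experiments. This strategy has a much higher throughput than experimental approaches and also avoids the abundance suppression problem.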
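     Among protein-protein interactions, a considerable fraction is mediated by a domain of one protein binding to a short peptide segment of its ligand protein; such domains are called peptide recognition modules (PRMs). Chapter 1 of this article predicts protein-protein interactions by studying how PRMs bind peptides.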
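     Taking the PDZ domain as an example and combining structure-based and sequence-based prediction methods, we built an integrated system to predict interactions between domains and their ligands. In this system, the amino acid residues that are in contact in the three-dimensional structures of domain-ligand complexes were extracted to replace the full-length sequences. Using three novel amino acid encoding schemes and two machine learning algorithms, support vector machines and artificial neural networks, we built three sub-predictors and finally combined their predictions.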
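The abstract does not say how residues "in contact" were defined. As a minimal sketch, assume a contact means at least one pair of atoms, one from the domain and one from the ligand, closer than a distance cutoff. The 4.5 Å default, the function name `contact_residues` and the in-memory coordinate format are all hypothetical; a real pipeline would parse PDB files.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory structure format (not a PDB parser): each residue,
# keyed by (chain ID, residue number), maps to its atom coordinates in angstroms.
Residue = Tuple[str, int]
Coords = List[Tuple[float, float, float]]


def contact_residues(domain: Dict[Residue, Coords],
                     ligand: Dict[Residue, Coords],
                     cutoff: float = 4.5) -> Set[Tuple[Residue, Residue]]:
    """Return (domain residue, ligand residue) pairs that have at least one
    atom pair closer than `cutoff` angstroms (4.5 A is an assumed default)."""
    pairs: Set[Tuple[Residue, Residue]] = set()
    for d_res, d_atoms in domain.items():
        for l_res, l_atoms in ligand.items():
            if any(math.dist(a, b) < cutoff for a in d_atoms for b in l_atoms):
                pairs.add((d_res, l_res))
    return pairs


# Toy example: only domain residue A1 lies within the cutoff of ligand residue B1.
domain = {("A", 1): [(0.0, 0.0, 0.0)], ("A", 2): [(20.0, 0.0, 0.0)]}
ligand = {("B", 1): [(3.0, 0.0, 0.0)]}
print(sorted(contact_residues(domain, ligand)))
```

Only the residues in the returned pairs, rather than the full-length sequences, would then be passed on to the encoding step.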
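     Evaluated by cross-validation, the prediction system had a specificity of 0.99 and a sensitivity of 0.60. However, a domain usually has only tens or hundreds of known ligands, far fewer than the tens of thousands of proteins in a protein database, so cross-validation results based on such a small dataset do not guarantee that the method will succeed when screening a database. To test this, we screened the Swissprot human database for ligand protein sequences of 3 PDZ domains. A considerable fraction of the predictions overlapped with the results of high-throughput in vitro experiments (peptide SPOT arrays), demonstrating the generalization ability of the prediction system.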
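The caveat about database screening is a base-rate effect, which a few lines of arithmetic make concrete. Only the sensitivity of 0.60 and specificity of 0.99 come from the text; the database composition below (100 true ligands among 20,000 proteins) is a hypothetical illustration, not data from this work.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true binders that are predicted as binders."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-binders that are predicted as non-binders."""
    return tn / (tn + fp)


def expected_screen(n_binders: int, n_nonbinders: int,
                    sens: float, spec: float) -> tuple:
    """Expected true positives, false positives and precision when a whole
    database is screened with a predictor of the given sensitivity/specificity."""
    tp = sens * n_binders
    fp = (1.0 - spec) * n_nonbinders
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    return tp, fp, precision


# Hypothetical database: 100 true ligands among 20,000 proteins.
tp, fp, precision = expected_screen(100, 19_900, sens=0.60, spec=0.99)
print(f"TP={tp:.0f}, FP={fp:.0f}, precision={precision:.2f}")
```

With these illustrative numbers the screen is expected to return about 60 true and 199 false positives, a precision of roughly 0.23, even though a specificity of 0.99 looks excellent in cross-validation on a small labelled set.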
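     Tandem mass spectrometry (MS/MS) is a common method in proteomics research. In this method, a protein mixture is first enzymatically digested into a peptide mixture, which is ionized in the mass spectrometer and then fragmented to produce a large number of MS/MS spectra. Database searching is a common way to process these data. Its main idea is to compare each experimental spectrum with the theoretical spectra of the enzymatically digested peptides in a database and, with a specific scoring algorithm, find the best-matching peptide. Because of the complexity of the samples and of the experimental principle, the spectra are very noisy, which makes subsequent data processing difficult. Several algorithms have been developed to optimize peptide identification, but positive and negative peptide identifications still cannot be perfectly separated. To keep identifications reliable, stricter parameter thresholds have to be applied to remove false-positive identifications, which inevitably produces a large number of false-negative identifications and lowers the efficiency of proteomics research.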
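To make the database-search idea concrete, the sketch below computes singly charged b and y fragment ions for candidate peptides and ranks the candidates by how many theoretical ions are matched by observed peaks. This shared-peak count is a deliberately simplified stand-in, not SEQUEST's Sp or Xcorr; the 0.5 Da tolerance and the helper names are assumptions. Masses are standard monoisotopic residue masses without modifications.

```python
from typing import Dict, List, Sequence

# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def by_ions(peptide: str) -> List[float]:
    """Sorted m/z values of the singly charged b and y fragment ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: first i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: remaining residues
    return sorted(ions)


def shared_peak_count(peptide: str, observed: Sequence[float],
                      tol: float = 0.5) -> int:
    """Number of theoretical b/y ions matched by an observed peak within tol (Da)."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in by_ions(peptide))


def best_match(observed: Sequence[float], candidates: Sequence[str],
               tol: float = 0.5) -> str:
    """Candidate peptide with the highest shared-peak count (first one wins ties)."""
    return max(candidates, key=lambda p: shared_peak_count(p, observed, tol))


print(best_match([58.0, 90.1], ["WW", "GA"]))
```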
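     Chapter 2 of this article establishes a new parameter, Oscore, for scoring matches between experimental spectra and peptides. Oscore is based on a logistic regression model trained on a dataset of 18 standard proteins, and it directly computes the probability that a spectrum-peptide match is correct. The independent variables of the regression model are: the SEQUEST output parameters Xcorr, ΔCn and Sp (preliminary score); the output parameters Rscore, Cont and Matchpct of AMASS, software developed in our laboratory (Sun et al. Mol Cell Proteomics. 2004 Dec;3(12):1194-9); the peptide charge state; and the number of missed internal cleavage sites. The three AMASS parameters take into account fragment ion intensity and the continuity of the b/y ion series, which helps distinguish positive from negative peptide identifications. Because these 8 parameters are intercorrelated in complex ways, combining them into Oscore improves identification accuracy.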
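Under a logistic regression model, the probability of a correct match is P = 1 / (1 + exp(-(b0 + b1·x1 + ... + b8·x8))), where x1 to x8 are the eight variables listed above. The abstract does not report the fitted coefficients, so this sketch takes them as input; the feature keys (for example `dCn` for ΔCn) and the numeric treatment of the charge state are assumptions.

```python
import math
from typing import Dict

# The eight predictors named in the text; "dCn" stands for ΔCn.
FEATURES = ("Xcorr", "dCn", "Sp", "Rscore", "Cont", "Matchpct",
            "charge", "missed_cleavages")


def logistic_probability(x: Dict[str, float], coef: Dict[str, float],
                         intercept: float) -> float:
    """P(correct) = 1 / (1 + exp(-z)), with z = intercept + sum(coef_i * x_i)."""
    z = intercept + sum(coef[name] * x[name] for name in FEATURES)
    # Evaluate the sigmoid in a form that cannot overflow exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Placeholder coefficients only: NOT the fitted Oscore model.
coef = {name: 0.0 for name in FEATURES}
coef["Xcorr"] = 1.0
match = {name: 1.0 for name in FEATURES}
match["Xcorr"] = 2.0
print(round(logistic_probability(match, coef, intercept=0.0), 3))
```

Unlike a set of per-variable cutoffs, the weighted sum lets a strong value of one variable compensate for a weaker value of another before the probability is computed.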
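     Compared with the widely used software PeptideProphet, Oscore showed both better specificity (a lower false-positive rate) and better sensitivity (a lower false-negative rate) on several datasets. These datasets include the standard protein mixture dataset and 3 proteome-scale datasets, spanning different sample complexities, database sizes and separation methods, which indicates to some extent the generalization ability of Oscore. Using a new model that is also based on logistic regression but uses only the parameters used by PeptideProphet, this article explores why Oscore discriminates better.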
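Comparisons such as Oscore versus PeptideProphet are usually made by accepting every match whose score reaches a cutoff and counting errors against known labels, which for a standard protein mixture can be assigned from its known composition. The helper below is a hypothetical sketch of that bookkeeping, not the evaluation code used in this work; the scores and labels are toy values.

```python
from typing import List, Sequence, Tuple


def error_rates(scores: Sequence[float], labels: Sequence[bool],
                threshold: float) -> Tuple[float, float]:
    """(false-positive rate, false-negative rate) when every match with
    score >= threshold is accepted; labels are True for correct matches."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    n_neg = sum(1 for y in labels if not y)
    n_pos = len(labels) - n_neg
    return (fp / n_neg if n_neg else 0.0,
            fn / n_pos if n_pos else 0.0)


def sweep(scores: Sequence[float], labels: Sequence[bool],
          thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, FPR, FNR) for each cutoff, e.g. to compare two scoring schemes."""
    return [(t, *error_rates(scores, labels, t)) for t in thresholds]


scores = [0.9, 0.8, 0.3, 0.6]
labels = [True, True, False, False]
print(sweep(scores, labels, [0.5, 0.7]))
```

Running the same sweep on two scores over the same labelled matches shows whether one scheme achieves lower false-positive and false-negative rates at the same time.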
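     The current Oscore is designed for peptides with fully tryptic termini (that is, both ends of the peptide are produced by trypsin cleavage after K or R); improving the identification of peptides that are not fully tryptic is future work.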
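The distinction between fully tryptic and other peptides can be expressed as a count of tryptic termini. This sketch follows the rule stated above (cleavage after K or R); treating the protein's own N- and C-termini as tryptic, and the optional proline rule (trypsin generally does not cleave before P), are assumptions added for illustration.

```python
def tryptic_termini(protein: str, start: int, end: int,
                    proline_rule: bool = False) -> int:
    """Number of tryptic termini (0, 1 or 2) of the peptide protein[start:end].

    A terminus counts as tryptic if it lies after K or R, as stated in the
    text; protein N- and C-termini are treated as tryptic (an assumption).
    With proline_rule=True, a K/R followed by P is not counted as a site.
    """
    def is_site(pos: int) -> bool:
        # pos is the index of the first residue after a putative cleavage
        if pos == 0 or pos == len(protein):
            return True
        if protein[pos - 1] not in "KR":
            return False
        return not (proline_rule and protein[pos] == "P")

    return int(is_site(start)) + int(is_site(end))


LABELS = {2: "fully tryptic", 1: "partially tryptic", 0: "non-tryptic"}


def classify(protein: str, start: int, end: int, proline_rule: bool = False) -> str:
    return LABELS[tryptic_termini(protein, start, end, proline_rule)]


protein = "AAKGGGRCCCK"
print(classify(protein, 3, 7), classify(protein, 3, 6), classify(protein, 4, 6))
```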
Proteomics has become a hot subject in the post-genomic era. In recent years, high-throughput technologies such as biological mass spectrometry and protein chips have greatly promoted the development of proteomics. This article works on further improving the accuracy and efficiency of current experimental technologies by adopting bioinformatics methods, in order to reduce the cost of biological experiments and to obtain more comprehensive and accurate data.
     Protein-protein interactions play an essential role in life processes. During the past years, great numbers of interactions have been found by various high-throughput biological experiments. However, many interactions are still unknown. Unfortunately, experimental screening for protein binding partners is not only labor-intensive but also almost futile for low-abundance binding species, because of suppression by high-abundance ones. A more practical way of studying protein-protein interactions is to use high-throughput computational prediction, rather than experimental approaches, to screen protein sequence databases for interactions, and then to direct the validating experiments toward the most promising peptides. Compared with traditional experimental assays, computational prediction offers a higher-throughput strategy for identifying interactions on a proteomic scale. It also provides a satisfactory solution to the abundance suppression problem.
     A fairly large set of protein-protein interactions is mediated by families of peptide binding domains (peptide recognition modules, PRMs). The first chapter of this article predicts protein-protein interactions by studying the binding selectivity of PRMs toward their ligand peptides.
     Taking the PDZ domain family as an example, an integrated prediction system was set up to predict ligand peptides for PRMs based on both structural and sequence information. In this system, the amino acid residues on the interface of interacting domain-ligand pairs were extracted to take the place of their full-length sequences. Next, three novel encoding methods were devised to represent different aspects of the interactions between amino acid residue pairs. Support vector machines and artificial neural networks were employed as machine learning algorithms, and three independent predictors were built to process the encoded data. The prediction results of these three predictors were assembled to make the final prediction.
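The abstract specifies neither the three encoding schemes nor how the three predictions were assembled, so the sketch below is hypothetical on both counts. It encodes each contacting residue pair as the product of Kyte-Doolittle hydropathy values (a scale that appears in the reference list) and combines three stand-in predictors by majority vote. The toy predictors merely check features of the class I PDZ C-terminal motif (S/T)-X-Φ, where Φ is hydrophobic; they are not the trained SVM or neural network models.

```python
from typing import Callable, List, Sequence, Tuple

# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def encode_pairs(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Hypothetical encoding: one feature per contacting (domain, ligand)
    residue pair, equal to the product of the two hydropathy values."""
    return [KD[a] * KD[b] for a, b in pairs]


Predictor = Callable[[str], bool]


def majority_vote(predictors: Sequence[Predictor], peptide: str) -> bool:
    """True when more than half of the predictors call the peptide a binder."""
    votes = sum(1 for predict in predictors if predict(peptide))
    return 2 * votes > len(predictors)


# Toy stand-ins for three trained sub-predictors (NOT the SVM/ANN models).
toy_predictors: List[Predictor] = [
    lambda pep: pep[-3] in "ST",   # position -2 from the C-terminus is S or T
    lambda pep: pep[-1] in "VIL",  # C-terminal residue is V, I or L
    lambda pep: KD[pep[-1]] > 0,   # C-terminal residue is hydrophobic
]

for pep in ("ESDV", "ESDA", "EADF"):
    print(pep, majority_vote(toy_predictors, pep))
```

A vote over an odd number of predictors never ties; averaging per-predictor scores would be an equally plausible way to assemble the outputs.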
     Evaluated by cross-validation, the assembled system had a specificity of 0.99 and a sensitivity of 0.60. However, since a PRM usually has only a few dozen or a few hundred known ligands, far fewer than the number of proteins in a protein database (usually over ten thousand), cross-validation performance cannot represent the real performance when a whole protein database is screened. In this article, we used the trained system to screen the Swissprot protein database for potential ligands of 3 PDZ domains. A large fraction of the predictions has been experimentally confirmed by peptide SPOT array assays, indicating a satisfactory generalization capability of this prediction system.
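Cross-validation, as used above, can be sketched in a few lines: indices are shuffled into k disjoint folds, a model is fitted on k-1 folds and tested on the held-out fold, and the counts are pooled into sensitivity and specificity. The fold count, the seed and the `fit` callback interface are assumptions; the abstract does not say which cross-validation scheme was used.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) splits with disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


def cross_validate(xs: Sequence, ys: Sequence[bool],
                   fit: Callable[[List, List[bool]], Callable[[object], bool]],
                   k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Pooled (sensitivity, specificity) over k held-out folds."""
    tp = tn = fp = fn = 0
    for train, test in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            pred, truth = model(xs[i]), ys[i]
            if truth and pred:
                tp += 1
            elif truth:
                fn += 1
            elif pred:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Toy data and a trivial "model" that ignores its training set.
xs = [-3, -2, -1, 1, 2, 3]
ys = [False, False, False, True, True, True]
print(cross_validate(xs, ys, lambda X, Y: (lambda x: x > 0), k=3))
```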
     Tandem mass spectrometry (MS/MS) has been widely used in proteomics studies. In this approach, a protein mixture is first digested into a peptide mixture by enzymes, and the peptides are then ionized and fragmented to produce large numbers of MS/MS spectra. Database searching is a common method for processing MS/MS data: experimental spectra are compared with theoretical spectra predicted from the peptides in a target protein database, and the best matches are found with a scoring method. Due to the complexity of mass spectrometry experiments and of the samples tested, MS/MS spectra contain a high level of noise, which makes processing MS/MS data difficult. Various algorithms have been developed to improve peptide identification from MS/MS spectra. However, correct and incorrect matches between experimental spectra and database peptides still cannot be distinguished very well. To guarantee confidence in peptide identifications, strict scoring criteria have to be used, and the sensitivity of proteomics research has to be sacrificed.
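The enzymatic digestion step can be simulated in silico to produce candidate peptides for database searching. This minimal sketch cleaves after every K or R (the proline rule is ignored), allows up to `max_missed` missed cleavages, and drops peptides shorter than `min_len`; both limits are hypothetical defaults. The companion `missed_cleavages` helper counts internal K/R residues, which corresponds to the NIMCS variable described in the next paragraph.

```python
from typing import List, Tuple


def tryptic_digest(protein: str, max_missed: int = 2,
                   min_len: int = 6) -> List[Tuple[str, int]]:
    """(peptide, number of missed cleavages) for an in silico trypsin digest
    that cleaves after every K or R."""
    # Cleavage positions: just after each K/R, excluding the protein's last residue.
    sites = [i + 1 for i, aa in enumerate(protein[:-1]) if aa in "KR"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        for m in range(max_missed + 1):
            j = i + m + 1
            if j >= len(bounds):
                break
            pep = protein[bounds[i]:bounds[j]]
            if len(pep) >= min_len:
                peptides.append((pep, m))
    return peptides


def missed_cleavages(peptide: str) -> int:
    """Number of internal K/R residues (the C-terminal residue is excluded)."""
    return sum(1 for aa in peptide[:-1] if aa in "KR")


for pep, m in tryptic_digest("GGKAAARWW", max_missed=1, min_len=1):
    print(pep, m, missed_cleavages(pep))
```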
     In the second chapter of this article, a new measure, Oscore, was developed by logistic regression on a training dataset produced from a mixture of 18 known proteins. Oscore directly estimates the probability of a correct peptide assignment for each MS/MS spectrum. The variables in this regression model were: the SEQUEST variables Xcorr, ΔCn and Sp; the output variables MatchPct, Cont and Rscore of the homemade software AMASS (Sun et al. Mol Cell Proteomics. 2004 Dec;3(12):1194-9); the peptide charge state; and the number of internal missed cleavage sites (NIMCS). The AMASS variables supplement the SEQUEST variables by considering fragment ion intensity and b/y ion continuity. Because of the complicated associations among the AMASS and SEQUEST variables, combining them, rather than applying them in a threshold model, improved the classification of correct and incorrect peptide identifications.
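Fitting the model means maximizing the likelihood of the training labels. As a dependency-free sketch, the code below minimizes the mean log-loss by plain batch gradient descent; this optimizer, its learning rate, epoch count and optional L2 penalty are swapped-in assumptions, not the procedure reported here. A real analysis would more likely use a statistics package such as statsmodels (`Logit`) or scikit-learn (`LogisticRegression`).

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written so that exp() cannot overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(b0: float, w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability that match x is a correct assignment."""
    return sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000,
                 l2: float = 0.0) -> Tuple[float, List[float]]:
    """Batch gradient descent on the mean log-loss (weights optionally L2-penalized)."""
    n, d = len(X), len(X[0])
    b0, w = 0.0, [0.0] * d
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = predict(b0, w, xi) - yi  # derivative of log-loss w.r.t. the linear score
            g0 += err
            for j in range(d):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, g)]
    return b0, w


# Toy one-feature data; real rows would hold the eight variables per match.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
b0, w = fit_logistic(X, y)
print(round(predict(b0, w, [2.0]), 3), round(predict(b0, w, [-2.0]), 3))
```

Each row of `X` would hold the variables for one spectrum-peptide match, with label 1 for a correct assignment, for example one known to come from the standard protein mixture.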
     Oscore achieved both a lower false-negative rate and a lower false-positive rate than PeptideProphet on datasets generated from the 18-known-protein mixture and from several proteome-scale samples of differing complexity, database size and separation method. Through a three-way comparison among Oscore, PeptideProphet and another logistic regression model that used only the PeptideProphet variables, the main contributor to Oscore's improvement was discussed.
     Presently, Oscore is restricted to identifying fully tryptic peptides. Extending Oscore to non-tryptic and partially tryptic peptides will be future work.
References
[1]T.Pawson and P.Nash.Assembly of cell regulatory systems through protein interaction domains[J].Science,2003,300(5618):445-452.
    [2]T.Pawson,M.Raina and P.Nash.Interaction domains:from simple binding events to complex cellular behavior[J].FEBS Lett,2002,513(1):2-10.
    [3]T.Pawson and J.D.Scott.Signaling through scaffold,anchoring,and adaptor proteins[J].Science,1997,278(5346):2075-2080.
    [4]C.Nourry,S.G.Grant and J.P.Borg.PDZ domain proteins:plug and play![J].Sci STKE,2003,2003(179):RE7.
    [5]Y.J.Im,J.H.Lee,S.H.Park,S.J.Park,S.H.Rho,G.B.Kang,E.Kim and S.H.Eom.Crystal structure of the Shank PDZ-ligand complex reveals a class I PDZ interaction and a novel PDZ-PDZ dimerization[J].J Biol Chem,2003,278(48):48099-48104.
    [6]F.Jelen,A.Oleksy,K.Smietana and J.Otlewski.PDZ domains-common players in the cell signaling[J].Acta Biochim Pol,2003,50(4):985-1017.
    [7]O.Michielin and M.Karplus.Binding free energy differences in a TCR-peptide-MHC complex induced by a peptide mutation:a simulation analysis[J].J Mol Biol,2002,324(3):547-569.
    [8]J.C.Tong,T.W.Tan and S.Ranganathan.Modeling the structure of bound peptide ligands to major histocompatibility complex[J].Protein Sci,2004,13(9):2523-2532.
    [9]J.C.Obenauer,L.C.Cantley and M.B.Yaffe.Scansite 2.0:Proteome-wide prediction of cell signaling interactions using short sequence motifs[J].Nucleic Acids Res,2003,31(13):3635-3641.
    [10]A.H.Tong,B.Drees,G.Nardelli,G.D.Bader,B.Brannetti,L.Castagnoli,M.Evangelista,S.Ferracuti,B.Nelson,S.Paoluzi,M.Quondam,A.Zucconi,C.W.Hogue,S.Fields,C.Boone and G.Cesareni.A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules[J].Science,2002,295(5553):321-324.
    [11]M.A.Stiffler,J.R.Chen,V.P.Grantcharova,Y.Lei,D.Fuchs,J.E.Allen,L.A.Zaslavskaia and G.MacBeath.PDZ domain binding selectivity is optimized across the mouse proteome[J].Science,2007,317(5836):364-369.
    [12]M.C.Honeyman,V.Brusic,N.L.Stone and L.C.Harrison.Neural network-based prediction of candidate T-cell epitopes[J].Nat Biotechnol,1998,16(10):966-969.
    [13]P.Donnes and A.Elofsson.Prediction of MHC class I binding peptides,using SVMHC[J].BMC Bioinformatics,2002,3:25.
    [14]V.Brusic,G.Rudy,G.Honeyman,J.Hammer and L.Harrison.Prediction of MHC class II-binding peptides using an evolutionary algorithm and artificial neural network[J].Bioinformatics,1998,14(2):121-130.
    [15]S.Martin,D.Roe and J.L.Faulon.Predicting protein-protein interactions using signature products[J].Bioinformatics,2005,21(2):218-226.
    [16]B.Brannetti,A.Via,G.Cestra,G.Cesareni and M.Helmer-Citterich.SH3-SPOT:an algorithm to predict preferred ligands to different members of the SH3 gene family[J].J Mol Biol,2000,298(2):313-328.
    [17]Y.Altuvia and H.Margalit.A structure-based approach for prediction of MHC-binding peptides[J].Methods,2004,34(4):454-459.
    [18]H.M.Berman,T.N.Bhat,P.E.Bourne,Z.Feng,G.Gilliland,H.Weissig and J.Westbrook.The Protein Data Bank and the challenge of structural genomics[J].Nat Struct Biol,2000,7(Suppl):957-959.
    [19]J.D.Thompson,D.G.Higgins and T.J.Gibson.CLUSTAL W:improving the sensitivity of progressive multiple sequence alignment through sequence weighting,position-specific gap penalties and weight matrix choice[J].Nucleic Acids Res,1994,22(22):4673-4680.
    [20]J.Schultz,F.Milpetz,P.Bork and C.P.Ponting.SMART,a simple modular architecture research tool:identification of signaling domains[J].Proc Natl Acad Sci U S A,1998,95(11):5857-5864.
    [21]T.Beuming,L.Skrabanek,M.Y.Niv,P.Mukherjee and H.Weinstein.PDZBase:a protein-protein interaction database for PDZ-domains[J].Bioinformatics,2005,21(6):827-828.
    [22]Pierre Baldi and Søren Brunak.Bioinformatics:the machine learning approach[M].Cambridge,Mass.:MIT Press,2001:452
    [23]H.B.Bull and K.Breese.Surface tension of amino acid solutions:a hydrophobicity scale of the amino acid residues[J].Arch Biochem Biophys,1974,161(2):665-670.
    [24]J.Kyte and R.F.Doolittle.A simple method for displaying the hydropathic character of a protein[J].J Mol Biol,1982,157(1):105-132.
    [25]C.Chothia.Structural invariants in protein folding[J].Nature,1975,254(5498):304-308.
    [26]R.Bhaskaran and P.K.Ponnuswamy.Dynamics of amino acid residues in globular proteins[J].Int J Pept Protein Res,1984,24(2):180-191.
    [27]J.M.Zimmerman,N.Eliezer and R.Simha.The characterization of amino acid sequences in proteins by statistical methods[J].J Theor Biol,1968,21(2):170-201.
    [28]L.Betancourt,T.Takao,L.Hernandez,G.Padron and Y.Shimonishi.Structural characterization of Acetobacter diazotropicus levansucrase by matrix-assisted laser desorption/ionization mass spectrometry:identification of an N-terminal blocking group and a free-thiol cysteine residue[J].J Mass Spectrom,1999,34(3):169-174.
    [29]S.Hua and Z.Sun.A novel method of protein secondary structure prediction with high segment overlap measure:support vector machine approach[J].J Mol Biol,2001,308(2):397-407.
    [30]T.S.Furey,N.Cristianini,N.Duffy,D.W.Bednarski,M.Schummer and D.Haussler.Support vector machine classification and validation of cancer tissue samples using microarray expression data[J].Bioinformatics,2000,16(10):906-914.
    [31]M.Bhasin and G.P.Raghava.SVM based method for predicting HLA-DRB1*0401 binding peptides in an antigen sequence[J].Bioinformatics,2004,20(3):421-423.
    [32]Vladimir Naumovich Vapnik.The nature of statistical learning theory[M].New York:Springer,2000:1-314
    [33]B.W.Matthews.Comparison of the predicted and observed secondary structure of T4 phage lysozyme[J].Biochim Biophys Acta,1975,405(2):442-451.
    [34]L.K.Hansen and P.Salamon.Neural Network Ensembles[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1990,12(10):993-1001.
    [35]P.Boisguerin,R.Leben,B.Ay,G.Radziwill,K.Moelling,L.Dong and R.Volkmer-Engert.An improved method for the synthesis of cellulose membrane-bound peptides with free C termini is useful for PDZ domain binding studies[J].Chem Biol,2004,11(4):449-459.
    [36]U.Wiedemann,P.Boisguerin,R.Leben,D.Leitner,G.Krause,K.Moelling,R.Volkmer-Engert and H.Oschkinat.Quantification of PDZ domain specificity,prediction of ligand affinity and rational design of super-binding peptides[J].J Mol Biol,2004,343(3):703-718.
    [37]H.Huang and Y.Gao.A method for generation of arbitrary peptide libraries using genomic DNA[J].Mol Biotechnol,2005,30(2):135-142.
    [38]E.Song,S.Gao,R.Tian,S.Ma,H.Huang,J.Guo,Y.Li,L.Zhang and Y.Gao.A high efficiency strategy for binding property characterization of peptide-binding domains[J].Mol Cell Proteomics,2006,5(8):1368-1381.
    [39]V.Brusic,V.B.Bajic and N.Petrovsky.Computational methods for prediction of T-cell epitopes--a framework for modelling,testing,and applications[J].Methods,2004,34(4):436-443.
    [40]S.L.Lo,C.Z.Cai,Y.Z.Chen and M.C.Chung.Effect of training datasets on support vector machine prediction of protein-protein interactions[J].Proteomics,2005,5(4):876-884.
    [41]G.Sanclemente and D.K.Gill.Human papillomavirus molecular biology and pathogenesis[J].J Eur Acad Dermatol Venereol,2002,16(3):231-240.
    [42]F.Mantovani and L.Banks.The human papillomavirus E6 protein and its contribution to malignant progression[J].Oncogene,2001,20(54):7874-7887.
    [1]A.C.Gavin,M.Bosche,R.Krause,P.Grandi,M.Marzioch,A.Bauer,J.Schultz,J.M.Rick,A.M.Michon,C.M.Cruciat,M.Remor,C.Hofert,M.Schelder,M.Brajenovic,H.Ruffner,A.Merino,K.Klein,M.Hudak,D.Dickson,T.Rudi,V.Gnau,A.Bauch,S.Bastuck,B.Huhse,C.Leutwein,M.A.Heurtier,R.R.Copley,A.Edelmann,E.Querfurth,V.Rybin,G.Drewes,M.Raida,T.Bouwmeester,P.Bork,B.Seraphin,B.Kuster,G.Neubauer and G.Superti-Furga.Functional organization of the yeast proteome by systematic analysis of protein complexes[J].Nature,2002,415(6868):141-147.
    [2]Y.Ho,A.Gruhler,A.Heilbut,G.D.Bader,L.Moore,S.L.Adams,A.Millar,P.Taylor,K.Bennett,K.Boutilier,L.Yang,C.Wolting,I.Donaldson,S.Schandorff,J.Shewnarane,M.Vo,J.Taggart,M.Goudreault,B.Muskat,C.Alfarano,D.Dewar,Z.Lin,K.Michalickova,A.R.Willems,H.Sassi,P.A.Nielsen,K.J.Rasmussen,J.R.Andersen,L.E.Johansen,L.H.Hansen,H.Jespersen,A.Podtelejnikov,E.Nielsen,J.Crawford,V.Poulsen,B.D.Sorensen,J.Matthiesen,R.C.Hendrickson,F.Gleeson,T.Pawson,M.F.Moran,D.Durocher,M.Mann,C.W.Hogue,D.Figeys and M.Tyers.Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry[J].Nature,2002,415(6868):180-183.
    [3]B.Blagoev,I.Kratchmarova,S.E.Ong,M.Nielsen,L.J.Foster and M.Mann.A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signaling[J].Nat Biotechnol,2003,21(3):315-318.
    [4]S.W.Taylor,E.Fahy,B.Zhang,G.M.Glenn,D.E.Warnock,S.Wiley,A.N.Murphy,S.P.Gaucher,R.A.Capaldi,B.W.Gibson and S.S.Ghosh.Characterization of the human heart mitochondrial proteome[J].Nat Biotechnol,2003,21(3):281-286.
    [5]D.Fenyo.Identifying the proteome:software tools[J].Curr Opin Biotechnol,2000,11(4):391-395.
    [6]K.Eng,A.McCormack and J.Yates.An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database[J].J Am Soc Mass Spectrom,1994,5(11):976-989.
    [7]D.N.Perkins,D.J.Pappin,D.M.Creasy and J.S.Cottrell.Probability-based protein identification by searching sequence databases using mass spectrometry data[J].Electrophoresis,1999,20(18):3551-3567.
    [8]H.I.Field,D.Fenyo and R.C.Beavis.RADARS,a bioinformatics solution that automates proteome mass spectral analysis,optimises protein identification,and archives data in a relational database[J].Proteomics,2002,2(1):36-47.
    [9]R.Craig and R.C.Beavis.TANDEM:matching proteins with tandem mass spectra[J].Bioinformatics,2004,20(9):1466-1467.
    [10]A.Keller,A.I.Nesvizhskii,E.Kolker and R.Aebersold.Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search[J].Anal Chem,2002,74(20):5383-5392.
    [11]E.F.Strittmatter,L.J.Kangas,K.Petritis,H.M.Mottaz,G.A.Anderson,Y.Shen,J.M.Jacobs,D.G.Camp 2nd and R.D.Smith.Application of peptide LC retention time information in a discriminant function for peptide identification by tandem mass spectrometry[J].J Proteome Res,2004,3(4):760-769.
    [12]J.P.Dworzanski,A.P.Snyder,R.Chen,H.Zhang,D.Wishart and L.Li.Identification of bacteria using tandem mass spectrometry combined with a proteome database and statistical scoring[J].Anal Chem,2004,76(8):2355-2366.
    [13]R.Higdon,N.Kolker,A.Picone,G.van Belle and E.Kolker.LIP index for peptide classification using MS/MS and SEQUEST search via logistic regression[J].Omics,2004,8(4):357-369.
    [14]J.Razumovskaya,V.Olman,D.Xu,E.C.Uberbacher,N.C.VerBerkmoes,R.L.Hettich and Y.Xu.A computational method for assessing peptide-identification reliability in tandem mass spectrometry analysis with SEQUEST[J].Proteomics,2004,4(4):961-969.
    [15]T.Baczek,A.Bucinski,A.R.Ivanov and R.Kaliszan.Artificial neural network analysis for evaluation of peptide MS/MS spectra in proteomics[J].Anal Chem,2004,76(6):1726-1732.
    [16]D.C.Anderson,W.Li,D.G.Payan and W.S.Noble.A new algorithm for the evaluation of shotgun peptide sequencing in proteomics:support vector machine classification of peptide MS/MS spectra and SEQUEST scores[J].J Proteome Res,2003,2(2):137-146.
    [17]P.J.Ulintz,J.Zhu,Z.S.Qin and P.C.Andrews.Improved classification of mass spectrometry database search results using newer machine learning approaches[J].Mol Cell Proteomics,2006,5(3):497-509.
    [18]M.Havilio,Y.Haddad and Z.Smilansky.Intensity-based statistical scorer for tandem mass spectrometry[J].Anal Chem,2003,75(3):435-444.
    [19]J.E.Elias,F.D.Gibbons,O.D.King,F.P.Roth and S.P.Gygi.Intensity-based protein identification by machine learning from a library of tandem mass spectra[J].Nat Biotechnol,2004,22(2):214-219.
    [20]F.D.Gibbons,J.E.Elias,S.P.Gygi and F.P.Roth.SILVER helps assign peptides to tandem mass spectra using intensity-based scoring[J].J Am Soc Mass Spectrom,2004,15(6):910-912.
    [21]C.Narasimhan,D.L.Tabb,N.C.Verberkmoes,M.R.Thompson,R.L.Hettich and E.C.Uberbacher.MASPIC:intensity-based tandem mass spectrometry scoring scheme that improves peptide identification at high confidence[J].Anal Chem,2005,77(23):7581-7593.
    [22]W.Sun,F.Li,J.Wang,D.Zheng and Y.Gao.AMASS:software for automatically validating the quality of MS/MS spectrum from SEQUEST results[J].Mol Cell Proteomics,2004,3(12):1194-1199.
    [23]F.Li,W.Sun,Y.Gao and J.Wang.RScore:a peptide randomicity score for evaluating tandem mass spectra[J].Rapid Commun Mass Spectrom,2004,18(14):1655-1659.
    [24]A.Keller,S.Purvine,A.I.Nesvizhskii,S.Stolyar,D.R.Goodlett and E.Kolker.Experimental protein mixture for validating tandem mass spectral analysis[J].Omics,2002,6(2):207-212.
    [25]W.Sun,S.Gao,L.Wang,Y.Chen,S.Wu,X.Wang,D.Zheng and Y.Gao.Microwave-assisted protein preparation and enzymatic digestion in proteomics[J].Mol Cell Proteomics,2006,5(4):769-776.
    [26]W.Sun,F.Li,S.Wu,X.Wang,D.Zheng,J.Wang and Y.Gao.Human urine proteome analysis by three separation approaches[J].Proteomics,2005,5(18):4994-5001.
    [27]D.J.States,G.S.Omenn,T.W.Blackwell,D.Fermin,J.Eng,D.W.Speicher and S.M.Hanash.Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study[J].Nat Biotechnol,2006,24(3):333-338.
    [28]J.Wang and Z.Guo.Logistic regression models:methods and applications[M].Beijing:Higher Education Press,2001.
    [29]J.E.Elias and S.P.Gygi.Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry[J].Nat Methods,2007,4(3):207-214.
    [30]R.Higdon,J.M.Hogan,G.Van Belle and E.Kolker.Randomized sequence databases for tandem mass spectrometry peptide and protein identification[J].Omics,2005,9(4):364-379.
    [1]B.C.Kone,T.Kuncewicz,W.Zhang and Z.Y.Yu.Protein interactions with nitric oxide synthases:controlling the right time,the right place,and the right amount of nitric oxide[J].Am J Physiol Renal Physiol,2003,285(2):F178-190.
    [2]J.Wang.Protein recognition by cell surface receptors:physiological receptors versus virus interactions[J].Trends Biochem Sci,2002,27(3):122-126.
    [3]A.I.Archakov,V.M.Govorun,A.V.Dubanov,Y.D.Ivanov,A.V.Veselovsky,P.Lewi and P.Janssen.Protein-protein interactions as a target for drugs in proteomics[J].Proteomics,2003,3(4):380-391.
    [4]S.Fields and O.Song.A novel genetic system to detect protein-protein interactions[J].Nature,1989,340(6230):245-246.
    [5]A.C.Gavin,M.Bosche,R.Krause,P.Grandi,M.Marzioch,A.Bauer,J.Schultz,J.M.Rick,A.M.Michon,C.M.Cruciat,M.Remor,C.Hofert,M.Schelder,M.Brajenovic,H.Ruffner,A.Merino,K.Klein,M.Hudak,D.Dickson,T.Rudi,V.Gnau,A.Bauch,S.Bastuck,B.Huhse,C.Leutwein,M.A.Heurtier,R.R.Copley,A.Edelmann,E.Querfurth,V.Rybin,G.Drewes,M.Raida,T.Bouwmeester,P.Bork,B.Seraphin,B.Kuster,G.Neubauer and G.Superti-Furga.Functional organization of the yeast proteome by systematic analysis of protein complexes[J].Nature,2002,415(6868):141-147.
    [6]H.Zhu,M.Bilgin,R.Bangham,D.Hall,A.Casamayor,P.Bertone,N.Lan,R.Jansen,S.Bidlingmaier,T.Houfek,T.Mitchell,P.Miller,R.A.Dean,M.Gerstein and M.Snyder.Global analysis of protein activities using proteome chips[J].Science,2001,293(5537):2101-2105.
    [7]C.von Mering,R.Krause,B.Snel,M.Cornell,S.G.Oliver,S.Fields and P.Bork.Comparative assessment of large-scale data sets of protein-protein interactions[J].Nature,2002,417(6887):399-403.
    [8]T.R.Hazbun and S.Fields.Networking proteins in yeast[J].Proc Natl Acad Sci U S A,2001,98(8):4277-4278.
    [9]T.Ito,T.Chiba,R.Ozawa,M.Yoshida,M.Hattori and Y.Sakaki.A comprehensive two-hybrid analysis to explore the yeast protein interactome[J].Proc Natl Acad Sci U S A,2001,98(8):4569-4574.
    [10]P.Uetz,L.Giot,G.Cagney,T.A.Mansfield,R.S.Judson,J.R.Knight,D.Lockshon,V.Narayan,M.Srinivasan,P.Pochart,A.Qureshi-Emili,Y.Li,B.Godwin,D.Conover,T.Kalbfleisch,G.Vijayadamodar,M.Yang,M.Johnston,S.Fields and J.M.Rothberg.A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae[J].Nature,2000,403(6770):623-627.
    [11]M.Pellegrini,E.M.Marcotte,M.J.Thompson,D.Eisenberg and T.O.Yeates.Assigning protein functions by comparative genome analysis:protein phylogenetic profiles[J].Proc Natl Acad Sci U S A,1999,96(8):4285-4288.
    [12]J.Sun,J.Xu,Z.Liu,Q.Liu,A.Zhao,T.Shi and Y.Li.Refined phylogenetic profiles method for predicting protein-protein interactions[J].Bioinformatics,2005,21(16):3409-3415.
    [13]J.Wu,S.Kasif and C.DeLisi.Identification of functional links between genes using phylogenetic profiles[J].Bioinformatics,2003,19(12):1524-1530.
    [14]J.Tamames,G.Casari,C.Ouzounis and A.Valencia.Conserved clusters of functionally related genes in two bacterial genomes[J].J Mol Evol,1997,44(1):66-73.
    [15]T.Dandekar,B.Snel,M.Huynen and P.Bork.Conservation of gene order:a fingerprint of proteins that physically interact[J].Trends Biochem Sci,1998,23(9):324-328.
    [16]R.Overbeek,M.Fonstein,M.D'Souza,G.D.Pusch and N.Maltsev.Use of contiguity on the chromosome to predict functional coupling[J].In Silico Biol,1999,1(2):93-108.
    [17]S.A.Teichmann and R.A.Veitia.Genes encoding subunits of stable complexes are clustered on the yeast chromosomes:an interpretation from a dosage balance perspective[J].Genetics,2004,167(4):2121-2125.
    [18]E.M.Marcotte,M.Pellegrini,H.L.Ng,D.W.Rice,T.O.Yeates and D.Eisenberg.Detecting protein function and protein-protein interactions from genome sequences[J].Science,1999,285(5428):751-753.
    [19]A.J.Enright,I.Iliopoulos,N.C.Kyrpides and C.A.Ouzounis.Protein interaction maps for complete genomes based on gene fusion events[J].Nature,1999,402(6757):86-90.
    [20]S.Tsoka and C.A.Ouzounis.Prediction of protein interactions:metabolic enzymes are frequently involved in gene fusion[J].Nat Genet,2000,26(2):141-142.
    [21]U.Gobel,C.Sander,R.Schneider and A.Valencia.Correlated mutations and residue contacts in proteins[J].Proteins,1994,18(4):309-317.
    [22]F.Pazos,M.Helmer-Citterich,G.Ausiello and A.Valencia.Correlated mutations contain information about protein-protein interaction[J].J Mol Biol,1997,271(4):511-523.
    [23]A.J.Walhout,S.J.Boulton and M.Vidal.Yeast two-hybrid systems and protein interaction mapping projects for yeast and worm[J].Yeast,2000,17(2):88-94.
    [24]A.J.Walhout,R.Sordella,X.Lu,J.L.Hartley,G.F.Temple,M.A.Brasch,N.Thierry-Mieg and M.Vidal.Protein interaction mapping in C.elegans using proteins involved in vulval development[J].Science,2000,287(5450):116-122.
    [25]C.Li,X.Ma,W.Chen and C.Wang.Research progress on protein-protein molecular docking methods[J].Progress in Biochemistry and Biophysics,2006,33(7):616-621.
    [26]A.A.Bogan and K.S.Thorn.Anatomy of hot spots in protein interfaces[J].J Mol Biol,1998,280(1):1-9.
    [27]W.L.DeLano.Unraveling hot spots in binding interfaces:progress and challenges[J].Curr Opin Struct Biol,2002,12(1):14-20.
    [28]A.Valencia and F.Pazos.Computational methods for the prediction of protein interactions[J].Curr Opin Struct Biol,2002,12(3):368-373.
    [29]E.Katchalski-Katzir,I.Shariv,M.Eisenstein,A.A.Friesem,C.Aflalo and I.A.Vakser.Molecular surface recognition:determination of geometric fit between proteins and their ligands by correlation techniques[J].Proc Natl Acad Sci U S A,1992,89(6):2195-2199.
    [30]R.Chen,L.Li and Z.Weng.ZDOCK:an initial-stage protein-docking algorithm[J].Proteins,2003,52(1):80-87.
    [31]R.M.Jackson,H.A.Gabb and M.J.Sternberg.Rapid refinement of protein interfaces incorporating solvation:application to the docking problem[J].J Mol Biol,1998,276(1):265-285.
    [32]J.G.Mandell,V.A.Roberts,M.E.Pique,V.Kotlovyi,J.C.Mitchell,E.Nelson,I.Tsigelny and L.F.Ten Eyck.Protein docking using continuum electrostatics and geometric fit[J].Protein Eng,2001,14(2):105-113.
    [33]E.J.Gardiner,P.Willett and P.J.Artymiuk.Protein docking using a genetic algorithm[J].Proteins,2001,44(1):44-56.
    [34]J.S.Taylor and R.M.Burnett.DARWIN:a program for docking flexible molecules[J].Proteins,2000,41(2):173-191.
    [35]J.J.Gray,S.Moughon,C.Wang,O.Schueler-Furman,B.Kuhlman,C.A.Rohl and D.Baker.Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations[J].J Mol Biol,2003,331(1):281-299.
    [36]C.J.Camacho and D.W.Gatchell.Successful discrimination of protein interactions[J].Proteins,2003,52(1):92-97.
    [37]C.Zhang,S.Liu and Y.Zhou.Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential[J].Protein Sci,2004,13(2):391-399.
    [38]C.Zhang,G.Vasmatzis,J.L.Cornette and C.DeLisi.Determination of atomic desolvation energies from the structures of crystallized proteins[J].J Mol Biol,1997,267(3):707-726.
    [39]G.Moont,H.A.Gabb and M.J.Sternberg.Use of pair potentials across protein interfaces in screening predicted docked complexes[J].Proteins,1999,35(3):364-373.
    [40]T.Pawson and P.Nash.Assembly of cell regulatory systems through protein interaction domains[J].Science,2003,300(5618):445-452.
    [41]J.C.Obenauer,L.C.Cantley and M.B.Yaffe.Scansite 2.0:Proteome-wide prediction of cell signaling interactions using short sequence motifs[J].Nucleic Acids Res,2003,31(13):3635-3641.
    [42]A.H.Tong,B.Drees,G.Nardelli,G.D.Bader,B.Brannetti,L.Castagnoli,M.Evangelista,S.Ferracuti,B.Nelson,S.Paoluzi,M.Quondam,A.Zucconi,C.W.Hogue,S.Fields,C.Boone and G.Cesareni.A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules[J].Science,2002,295(5553):321-324.
    [43]M.A.Stiffler,J.R.Chen,V.P.Grantcharova,Y.Lei,D.Fuchs,J.E.Allen,L.A.Zaslavskaia and G.MacBeath.PDZ domain binding selectivity is optimized across the mouse proteome[J].Science,2007,317(5836):364-369.
    [44]M.C.Honeyman,V.Brusic,N.L.Stone and L.C.Harrison.Neural network-based prediction of candidate T-cell epitopes[J].Nat Biotechnol,1998,16(10):966-969.
    [45]P.Donnes and A.Elofsson.Prediction of MHC class I binding peptides,using SVMHC[J].BMC Bioinformatics,2002,3:25.
    [46]D.J.Reiss and B.Schwikowski.Predicting protein-peptide interactions via a network-based motif sampler[J].Bioinformatics,2004,20(Suppl 1):i274-282.
    [47]J.R.Bock and D.A.Gough.Predicting protein--protein interactions from primary structure[J].Bioinformatics,2001,17(5):455-460.
    [48]W.P.Lehrach,D.Husmeier and C.K.Williams.A regularized discriminative model for the prediction of protein-peptide interactions[J].Bioinformatics,2006,22(5):532-540.
    [49]S.Martin,D.Roe and J.L.Faulon.Predicting protein-protein interactions using signature products[J].Bioinformatics,2005,21(2):218-226.
    [50]B.Brannetti,A.Via,G.Cestra,G.Cesareni and M.Helmer-Citterich.SH3-SPOT:an algorithm to predict preferred ligands to different members of the SH3 gene family[J].J Mol Biol,2000,298(2):313-328.
    [51]L.Zhang,C.Shao,D.Zheng and Y.Gao.An integrated machine learning system to computationally screen protein databases for protein binding peptide ligands[J].Mol Cell Proteomics,2006,5(7):1224-1232.
    [52]M.Ashburner,C.A.Ball,J.A.Blake,D.Botstein,H.Butler,J.M.Cherry,A.P.Davis,K.Dolinski,S.S.Dwight,J.T.Eppig,M.A.Harris,D.P.Hill,L.Issel-Tarver,A.Kasarskis,S.Lewis,J.C.Matese,J.E.Richardson,M.Ringwald,G.M.Rubin and G.Sherlock.Gene ontology:tool for the unification of biology.The Gene Ontology Consortium[J].Nat Genet,2000,25(1):25-29.
    [53]R.Jansen,H.Yu,D.Greenbaum,Y.Kluger,N.J.Krogan,S.Chung,A.Emili,M.Snyder,J.F.Greenblatt and M.Gerstein.A Bayesian networks approach for predicting protein-protein interactions from genomic data[J].Science,2003,302(5644):449-453.
    [54]D.R.Rhodes,S.A.Tomlins,S.Varambally,V.Mahavisno,T.Barrette,S.Kalyana-Sundaram,D.Ghosh,A.Pandey and A.M.Chinnaiyan.Probabilistic model of the human protein-protein interaction network[J].Nat Biotechnol,2005,23(8):951-959.
    [55]U.Stelzl,U.Worm,M.Lalowski,C.Haenig,F.H.Brembeck,H.Goehler,M.Stroedicke,M.Zenkner,A.Schoenherr,S.Koeppen,J.Timm,S.Mintzlaff,C.Abraham,N.Bock,S.Kietzmann,A.Goedde,E.Toksoz,A.Droege,S.Krobitsch,B.Korn,W.Birchmeier,H.Lehrach and E.E.Wanker.A human protein-protein interaction network:a resource for annotating the proteome[J].Cell,2005,122(6):957-968.
    [56]P.W.Lord,R.D.Stevens,A.Brass and C.A.Goble.Investigating semantic similarity measures across the Gene Ontology:the relationship between sequence and annotation[J].Bioinformatics,2003,19(10):1275-1283.
    [57]H.Wu,Z.Su,F.Mao,V.Olman and Y.Xu.Prediction of functional modules based on comparative genome analysis and Gene Ontology application[J].Nucleic Acids Res,2005,33(9):2822-2837.
    [58]X.Wu,L.Zhu,J.Guo,D.Y.Zhang and K.Lin.Prediction of yeast protein-protein interaction network:insights from the Gene Ontology and annotations[J].Nucleic Acids Res,2006,34(7):2137-2150.
    [59]Y.Yamanishi,J.P.Vert and M.Kanehisa.Protein network inference from multiple genomic data:a supervised approach[J].Bioinformatics,2004,20(Suppl 1):i363-370.
    [60]T.Kato,K.Tsuda and K.Asai.Selective integration of multiple biological data for supervised network inference[J].Bioinformatics,2005,21(10):2488-2495.
    [61]T.W.Huang,C.Y.Lin and C.Y.Kao.Reconstruction of human protein interolog network using evolutionary conserved network[J].BMC Bioinformatics,2007,8:152.
