Research on Protein-Protein Interaction Prediction Methods Based on Computational Intelligence
Abstract
With the successful completion of the Human Genome Project, scientists obtained a large amount of sequence information, and research moved from an era centered on the genome itself into the post-genomic era, which is marked by functional genomics. In the post-genomic era, proteomics is one of the important branches of bioinformatics, because every physiological function in an organism depends on proteins and on the cooperation between proteins and their ligands.
     Proteins are among the basic materials of life. Interactions between proteins play a key role in the function of cells and biological pathways, and understanding them greatly advances the study of the pathogenesis and treatment of many diseases. One of the most important challenges in proteomics is therefore to understand protein-protein interactions on a large scale at the physical and structural levels and to build the corresponding protein-protein interaction networks. A common approach starts from the known primary sequences of proteins and their ligands, extracts the useful information those sequences contain, and combines it with experimental and computational methods to predict how likely two proteins are to interact and to build interaction networks. With advances in X-ray crystallography and nuclear magnetic resonance (NMR), a large number of protein structures have been determined, and these data have further encouraged the development of data-driven (computational) methods for predicting protein interactions.
     This thesis applies computational intelligence algorithms to several basic problems in protein-protein interaction. The main topics are the prediction of protein interaction sites at the micro level, the prediction of protein interactions at the macro level, and the construction of protein-protein interaction networks. For each of the three we analyze the problem and propose a corresponding prediction method, as follows:
     1. Through an analysis of the surface residues and interface residues in protein-protein interfaces, we introduce the covering algorithm into interaction-site prediction. The algorithm fits well with the observation that interface residues cluster both in spatial structure and in the primary sequence. Interface-residue samples and surface-residue samples are first mapped (through a suitable transformation) onto a sphere in an n-dimensional space, and two basic sequence features are extracted: the sequence profile and the solvent-accessible surface area. Starting from an initial interface-residue sample as the center, the algorithm computes the distance to the nearest sample of the other class and the distance to the farthest sample of the same class, takes the midpoint of the two as the radius, and constructs a cover that encloses samples of the same class. It then takes a sample of the other class as the center and constructs a cover in the same way, alternating between the two classes. Based on the characteristics of the data we built two datasets (a complete set and a balanced, or trimmed, set) and compared the method with conventional machine learning algorithms (SVM, ME) on both. The results show that the algorithm is effective and feasible on both datasets. Finally, we give examples of interaction-site localization on two complexes using several algorithms, which further indicate that the algorithm adapts well to unknown protein interaction sites and predicts them well.
     2. A great many features can be used to predict protein interaction sites, and different researchers obtain different results from different feature combinations. Because each feature contributes different information, some features do nothing for the classifier's predictive power and may even degrade the results. We therefore address feature selection for interaction-site prediction and propose a new feature extraction algorithm that combines a genetic algorithm (GA) with a support vector machine (SVM). The GA extracts the 68 relatively important features from the 110-dimensional protein sequence vector formed by the original basic features, and the SVM evaluates the extracted features. The fitness of each individual is set to the F1-measure, used here as a balance between the classifier's sensitivity and specificity, which helps find feature combinations that balance the classifier's performance measures. Experiments were designed with a random classifier, a two-stage classifier, an SVM, and the GA/SVM classifier. The results show that interaction-site prediction based on GA/SVM feature extraction is robust and outperforms both the original features and the other methods.
     3. A key issue in protein interaction prediction is how to encode the sequence information of an interacting protein pair effectively, because different encodings carry very different amounts of information and so yield different classification performance. We propose a sequence encoding based on amino acid order information (pseudo amino acid composition, PseAA), which accounts not only for the basic amino acid composition but also for the effects of short-range, medium-range, and long-range interactions between amino acids. An SVM is trained and used for classification on the new encoding, and three other encodings are implemented for comparison: correlation coefficient (CC), auto covariance (AC), and amino acid composition (AAC). The proposed encoding ranks second of the four in classification performance. AC produces the highest dimensionality, 840; CC follows with 420; AAC is the lowest at 40; the proposed method needs only 100. Weighing performance against cost, the proposed encoding is effective and feasible.
     4. Protein interaction prediction studies the problem only at the level of individual proteins, whereas the functions of a living organism are tied to the regulation of the interaction networks that proteins form within its cells. We therefore take the classifier model obtained from the interaction prediction method above and extract two types of interaction network data from the BioGRID database for testing. All protein sequences in the two networks are converted to discrete vectors by the pseudo amino acid composition method and classified with the model, and the predictions are drawn as protein interaction network maps. The results show that the method is also effective for constructing protein interaction networks.
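The cover construction described in item 1 can be illustrated with a minimal Python sketch. It is not the thesis implementation: it skips the projection onto an n-dimensional sphere and uses plain Euclidean distance, it assumes binary labels 0 and 1, and it assumes that the "farthest same-class sample" is taken only among samples closer than the nearest sample of the other class, so that no cover encloses an opposite-class sample. The names `dist`, `build_covers`, and `predict` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_covers(samples, labels):
    """Build covers alternately for the two classes (labels must be 0 or 1).

    Returns a list of (center, radius, label) tuples.
    """
    uncovered = set(range(len(samples)))
    covers = []
    current = labels[0]  # start with the class of the first sample
    while uncovered:
        same = [i for i in sorted(uncovered) if labels[i] == current]
        if not same:  # this class is fully covered, switch to the other
            current = 1 - current
            continue
        center = samples[same[0]]
        # Distance to the nearest sample of the other class.
        d_other = min(
            (dist(center, s) for s, y in zip(samples, labels) if y != current),
            default=math.inf,
        )
        # Distance to the farthest uncovered same-class sample that is still
        # closer than the nearest other-class sample (assumption, see lead-in).
        d_same = max(
            (d for d in (dist(center, samples[i]) for i in same) if d < d_other),
            default=0.0,
        )
        # Radius is the midpoint of the two distances; with no other class
        # present, fall back to the same-class distance.
        radius = d_same if math.isinf(d_other) else (d_other + d_same) / 2
        covers.append((center, radius, current))
        uncovered -= {i for i in same if dist(center, samples[i]) <= radius}
        current = 1 - current  # alternate between the two classes
    return covers


def predict(covers, query):
    """Label of the cover whose boundary is nearest (negative means inside)."""
    _, _, label = min(covers, key=lambda c: dist(c[0], query) - c[1])
    return label
```

On two well-separated clusters this produces one cover per class; on real residue data many covers per class would be expected, and how queries outside every cover are resolved is a design choice the abstract does not specify.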
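The GA-based feature selection in item 2 can be sketched as a search over bit masks whose fitness is the F1-measure of a classifier trained on the selected features. This is only an illustration under several assumptions: a nearest-centroid classifier stands in for the SVM so that the example needs no third-party library, F1 is computed with its standard definition (the harmonic mean of precision and recall, recall being the sensitivity), and the population size, mutation rate, and crossover scheme are arbitrary choices. The sketch does not reproduce the 110-to-68 feature reduction reported above, and all function names are hypothetical.

```python
import random


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def centroid_predict(train_x, train_y, test_x, mask):
    """Stand-in classifier (NOT an SVM): nearest class centroid on masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]

    def centroid(label):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]

    c0, c1 = centroid(0), centroid(1)
    preds = []
    for x in test_x:
        d0 = sum((x[j] - c) ** 2 for j, c in zip(cols, c0))
        d1 = sum((x[j] - c) ** 2 for j, c in zip(cols, c1))
        preds.append(1 if d1 < d0 else 0)
    return preds


def ga_select(train, valid, n_features, pop_size=20, generations=30,
              p_mut=0.05, seed=0):
    """Return (best_mask, best_f1) found by a simple elitist GA."""
    (tx, ty), (vx, vy) = train, valid
    rng = random.Random(seed)

    def score(mask):
        return f1_score(vy, centroid_predict(tx, ty, vx, mask))

    # Seed the population with the all-features mask plus random masks.
    pop = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: keep the best individual
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)
```

Because the all-features mask is in the initial population and the best individual is always carried over, the returned F1 can never be lower than that of the unreduced feature set on the same validation data. Swapping `centroid_predict` for an actual SVM would bring the sketch closer to the GA/SVM combination the abstract describes.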
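The pseudo amino acid composition encoding in item 3 can be sketched as follows, using the commonly used formulation of 20 composition terms plus λ sequence-order correlation terms. Several details are assumptions rather than facts from the abstract: the Kyte-Doolittle hydropathy scale is used as the only physicochemical property, and λ = 30 and the weight 0.05 are illustrative defaults. With these defaults each protein yields 50 values and a concatenated pair yields 100, which would be consistent with the 100 dimensions reported above, but the abstract does not say how that figure arises. The function names are hypothetical.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Kyte-Doolittle hydropathy values, used here as a stand-in property
# (assumption: the abstract does not name the properties actually used).
KYTE_DOOLITTLE = dict(zip(AMINO_ACIDS, [
    1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
    3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2,
]))


def _standardize(table):
    """Rescale property values to zero mean and unit variance over the 20 residues."""
    mean = sum(table.values()) / len(table)
    std = math.sqrt(sum((v - mean) ** 2 for v in table.values()) / len(table))
    return {aa: (v - mean) / std for aa, v in table.items()}


H = _standardize(KYTE_DOOLITTLE)


def pse_aac(seq, lam=30, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    Assumes seq contains only the 20 standard residues and len(seq) > lam.
    """
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    # Conventional amino acid composition (frequencies sum to 1).
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # Sequence-order correlation factors for lags 1..lam.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((H[seq[i]] - H[seq[i + k]]) ** 2 for i in range(n - k))
        thetas.append(total / (n - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


def encode_pair(seq_a, seq_b, lam=30):
    """Concatenate the two per-protein vectors into one pair vector."""
    return pse_aac(seq_a, lam) + pse_aac(seq_b, lam)
```

Small lags capture short-range order effects and larger lags capture longer-range ones, which is how a fixed-length vector can reflect the short-range, medium-range, and long-range interactions mentioned in item 3. Simple concatenation makes the pair vector depend on the order of the two proteins; whether the thesis handles that symmetry is not stated.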
引文
[1]Olivier M.,Aggarwal A.,Allen J., et al. A high-resolution radiation hybrid map of the human genome draft sequence [J]. Science,2001,291 (5507):1298-1302.
    [2]Venter J. C.,Adams M. D.,Myers E. W., et al. The sequence of the human genome [J]. Science,2001,291(5507):1304-1351.
    [3]Baxevanis A. D. The molecular biology database collection:2003 update [J]. Nucleic Acids Research,2003,31(1):1-12.
    [4]Baxevanis A. D. The Molecular Biology Database Collection:an updated compilation of biological database resources [J]. Nucleic Acids Research,2001, 29(1):1-10.
    [5]Baxevanis A. D. The molecular biology database collection:an online compilation of relevant database resources [J]. Nucleic Acids Research,2000, 28(1):1-7.
    [6]王翼飞,史定华.生物信息学:智能化算法及其应用[M].北京:化学工业出版社,2006:1-4.
    [7]Akalin P. K. Introduction to bioinformatics [J]. Molecular Nutrition & Food Research,2006,50(7):610-619.
    [8]Durbin R.,Eddy S. R.,Krogh A., et al. Biological sequence analysis: Probabilistic models of proteins and nucleic acids [M]. Cambridge Univ Pr,1998.
    [9]Mount D. W. Bioinformatics:sequence and genome analysis [M]. Cold Spring Harbor Laboratory press,2004.
    [10]Wang J. T. L.,Zaki M. J.,Toivonen H. T. T., et al. Data mining in bioinformatics [M]. Springer Verlag,2005.
    [11]曹建平,马义才,李亦学等.计算方法在蛋白质相互作用研究中的应用[J].生命科学,2005,17(1):82-87.
    [12]朱新宇,沈百荣.预测蛋白质间相互作用的生物信息学方法[J].生物技术通讯,2004,15(1):70-72.
    [13]Anfinsen C. B., Haber E. Studies on the reduction and re-formation of protein disulfide bonds [J]. Journal of Biological Chemistry,1961,236(5):1361-1363.
    [14]Ofran Y., Rost B. Predicted protein-protein interaction sites from local sequence information [J]. FEBS Letters,2003,544(1):236-239.
    [15]高莹,来鲁华.蛋白质一蛋白质相互作用界面统计分析[J].物理化学学报,2004,20(7):676-679.
    [16]Harlow E.,Whyte P.,Franza Jr B. R., et al. Association of adenovirus early-region 1A proteins with cellular polypeptides [J]. Molecular and Cellular Biology,1986,6(5):1579-1589.
    [17]Mitchell D. A., Marshall T. K., Deschenes R. J. Vectors for the inducible overexpression of glutathione S-transferase fusion proteins in yeast [J]. Yeast,1993, 9(7):715-715.
    [18]Uetz P.,Giot L.,Cagney G., et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae [J]. Nature,2000,403(6770):623-627.
    [19]Ito T.,Chiba T.,Ozawa R., et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome [J]. Proceedings of the National Academy of Sciences, 2001,98(8):4569-4574.
    [20]Williams C., Addona T. A. The integration of SPR biosensors with mass spectrometry:possible applications for proteome analysis [J]. Trends in Biotechnology,2000,18(2):45-48.
    [21]Multhaup G.,Strausak D.,Bissig K. D., et al. Interaction of the CopZ copper chaperone with the CopA copper ATPase of Enterococcus hirae assessed by surface plasmon resonance [J]. Biochemical and Biophysical Research Communications, 2001,288(1):172-177.
    [22]Ho Y.,Gruhler A.,Heilbut A., et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry [J]. Nature,2002, 415(6868):180-183.
    [23]Mann M., Pandey A. Use of mass spectrometry-derived data to annotate nucleotide and protein sequence databases [J]. Trends in Biochemical Sciences,2001, 26(1):54-61.
    [24]Zhu H.,Bilgin M.,Bangham R., et al. Global analysis of protein activities using proteome chips [J]. Science,2001,193(5537):2101-2105.
    [25]Tong A. H. Y.,Drees B.,Nardelli G., et al. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules [J]. Science,2002,295:321-324.
    [26]Ge H.,Liu Z.,Church G. M., et al. Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae [J]. Nature Genetics,2001, 29(4):482-486.
    [27]Tong A. H. Y.,Evangelista M.,Parsons A. B., et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants [J]. Science,2001, 294(5550):2364-2368.
    [28]Franzot G., Carugo O. Computational approaches to protein-protein interaction [J]. Journal of Structural and Functional Genomics,2003,4(4):245-255.
    [29]Aloy P., Russell R. B. The third dimension for protein interactions and complexes [J]. Trends in Biochemical Sciences,2002,27(12):633-638.
    [30]倪青山,王正志,黎刚果等.基于K近邻的蛋白质功能的预测方法[J].生物医学工程研究,2009,28(2).
    [31]杨晓飞.基于多源数据融合的蛋白质-蛋白质相互作用网络构建方法研究[D].合肥:中国科学技术大学,2009.
    [32]史明光.蛋白质相互作用预测方法的研究[D].合肥:中国科学技术大学,2009.
    [33]马志强.蛋白质功能预测的非同源性计算方法研究[D].长春:吉林大学,2009.
    [34]代利坚.蛋白质相互作用预测及其假阳性过滤研究[D].长沙:中南大学,2009.
    [35]谢江.蛋白质相互作用网络的数值研究[D].上海:上海大学,2008.
    [36]陆林英.基于序列从头预测法的蛋白质相互作用研究[D].长春:东北师范大学,2008.
    [37]李舟军,陈义明,刘军万等.蛋白质相互作用研究中的计算方法综述[J].计 算机研究与发展,2008,45(012):2129-2137.
    [38]蔡钊.蛋白质网络中相互作用及功能预测算法的研究[D].长沙:中南大学,2008.
    [39]张坤鹏.基于智能计算模型的蛋白质功能位点的预测[D].合肥:中国科学技术大学,2007.
    [40]于建涛,郭茂祖,蔡禄.蛋白质相互作用及其网络预测方法研究进展[J].电子学报,2007,35(B12):1-7.
    [41]倪青山,王正志,王广云等.基于局部支持向量机的蛋白质相互作用的预测方法[J].生物医学工程研究,2008,27(2):69-73.
    [42]Zhou H. X., Shan Y. Prediction of Protein Interaction Sites From Sequence Profile and Residue Neighbor List [J]. PROTEINS:Structure, Function, and Genetics,2001,44(3):336-343.
    [43]Erkizan H. V.,Kong Y.,Merchant M., et al. A small molecule blocking oncogenic protein EWS-FLI1 interaction with RNA helicase A inhibits growth of Ewing's sarcoma [J]. Nature Medicine,2009,15(7):750-756.
    [44]Crick F. Central dogma of molecular biology [J]. Nature,1970, 227(5258):561-563.
    [45]Fields S., Song O. A novel genetic system to detect protein protein interactions [J]. Nature,1989,340(6230):245-246.
    [46]Stephens D. J., Banting G. The use of yeast two-hybrid screens in studies of protein:protein interactions involved in trafficking [J]. Traffic,2000, 1(10):763-768.
    [47]Semple J. I., Sanderson C. M., Campbell R. D. The jury is out on'guilt by association'trials [J]. Briefings in Functional Genomics and Proteomics,2002, 1(1):40-52.
    [48]James P., Halladay J., Craig E. A. Genomic libraries and a host strain designed for highly efficient two-hybrid selection in yeast [J]. Genetics,1996, 144(4):1425-1436.
    [49]Vidal M.,Brachmann R. K.,Fattaey A., et al. Reverse two-hybrid and one-hybrid systems to detect dissociation of protein-protein and DNA-protein interactions [J]. Proceedings of the National Academy of Sciences,1996,93(19):10315-103120.
    [50]Vidal M.,Braun P.,Chen E., et al. Genetic characterization of a mammalian protein-protein interaction domain by using a yeast reverse two-hybrid system [J]. Proceedings of the National Academy of Sciences,1996,93(19):10321-10326.
    [51]McMahon S. B.,Van Buskirk H. A.,Dugan K. A., et al. The novel ATM-related protein TRRAP is an essential cofactor for the c-Myc and E2F oncoproteins [J]. Cell, 1998,94(3):363-374.
    [52]Rigaut G.,Shevchenko A.,Rutz B., et al. A generic protein purification method for protein complex characterization and proteome exploration [J]. Nature Biotechnology,1999,17(10):1030-1032.
    [53]Gavin A. C.,B sche M.,Krause R., et al. Functional organization of the yeast proteome by systematic analysis of protein complexes [J]. Nature,2002, 415(6868):141-147.
    [54]Rohila J. S.,Chen M.,Cerny R., et al. Improved tandem affinity purification tag and methods for isolation of protein heterocomplexes from plants [J]. The Plant Journal,2004,38(1):172-181.
    [55]Chen Y., Xu D. Computational analyses of high-throughput protein-protein interaction data [J]. Current Protein and Peptide Science,2003,4(3):159-180.
    [56]Bork P.,Jensen L. J.,von Mering C., et al. Protein interaction networks from yeast to human [J]. Current Opinion in Structural Biology,2004,14(3):292-299.
    [57]李明辉.基于机器学习的蛋白质二级结构和相互作用预测[D].哈尔滨:哈尔滨工业大学,2007.
    [58]胡佳.用生物统计方法预测蛋白质相互作用[D].上海:同济大学,2007.
    [59]陈鹏.蛋白质残基间的相互作用分析与预测[D].合肥:中国科学技术大学,2007.
    [60]王燕.机器学习在蛋白质结构和功能预测中的应用研究[D].武汉:华中科学技术大学,2006.
    [61]王秀鹤.基于序列和相互作用的蛋白质功能预测[D].长沙:国防科学技术大学,2006.
    [62]王兵.蛋白质相互作用及其位点的预测方法研究[D].合肥:中国科学技术大学,2006.
    [63]邵壮超.基于多分类器组合的蛋白质一蛋白质相互作用位点预测研究[D].西安:西北工业大学,2006.
    [64]任仙文,李北平,王月兰等.蛋白质相互作用的生物信息学研究进展[J].生物技术通讯,2006,17(6):976-980.
    [65]刘阳,张冬宁,邵建林等.基于序列剖面和可及表面积的蛋白质相互作用位点的预测[J].上海大学学报:自然科学版,2006,12(6):593-598.
    [66]刘阳.基于SVM的蛋白质相互作用位点的预测研究[D].上海:上海大学,2006.
    [67]安书君.改进的蛋白质相互作用位点预测方法研究[D].哈尔滨:哈尔滨工业大学,2006.
    [68]刘翔.基于共鸣识别模型的蛋白质相互作用预测的研究和算法实现[D].上海:上海大学,2005.
    [69]曹建平.生物信息学方法研究蛋白质相互作用[D].成都:电子科学技术大学,2005.
    [70]Jones S., Thornton J. M. Principles of protein-protein interactions [J]. Proceedings of the National Academy of Sciences,1996,93(1):13-20.
    [71]Zhou H. X., Qin S. Interaction-site prediction for protein complexes:a critical assessment [J]. Bioinformatics,2007,23(17):2203-2209.
    [72]Kufareva I.,Budagyan L.,Raush E., et al. PIER:protein interface recognition for structural proteomics [J]. Proteins:Structure, Function, and Bioinformatics,2007, 67(2):400-417.
    [73]Li J. J.,Huang D. S.,Wang B., et al. Identifying protein-protein interfacial residues in heterocomplexes using residue conservation scores [J]. International Journal of Biological Macromolecules,2006,38(3-5):241-247.
    [74]Burgoyne N. J., Jackson R. M. Predicting protein interaction sites:binding hot-spots in protein-protein and protein-ligand interfaces [J]. Bioinformatics,2006, 22(11):1335-1342.
    [75]de Vries S. J., van Dijk A. D., Bonvin A. M. WHISCY:what information does surface conservation yield?Application to data-driven docking [J]. Proteins,2006, 63(3):479-489.
    [76]Hoskins J., Lovell S., Blundell T. L. An algorithm for predicting protein-protein interaction sites:abnormally exposed amino acid residues and secondary structure elements [J]. Protein Science,2006,15(5):1017-1029.
    [77]Landau M.,Mayrose I.,Rosenberg Y., et al. ConSurf2005:The projection of evolutionary conservation scores of residues on protein structures [J]. Nucleic Acids Res,2005,33:W299-W302.
    [78]Liang S.,Zhang C.,Liu S., et al. Protein binding site prediction using an empirical scoring function [J]. Nucleic Acids Research,2006,34(13):3698-3707.
    [79]Murakami Y., Jones S. SHARP2:protein-protein interaction predictions using patch analysis [J]. Bioinformatics,2006,22(14):1794-1795.
    [80]Bordner A. J., Abagyan R. Statistical analysis and prediction of protein-protein interfaces [J]. Proteins:Struct. Func. Bioinf,2005,60(3):353-366.
    [81]Bradford J. R., Westhead D. R. Improved prediction of protein-protein binding sites using a support vector machines approach [J]. Bioinformatics,2005, 21(8):1487-1494.
    [82]Chung J. L., Wang W., Bourne P. E. Exploiting sequence and structure homologs to identify protein-protein binding sites [J]. PROTEINS-NEW YORK-,2006, 62(3):630-640.
    [83]Koike A., Takagi T. Prediction of protein-protein interaction sites using support vector machines [J]. Protein Engineering Design and Selection,2004, 17(2):165-173.
    [84]Res I., Mihalek I., Lichtarge O. An evolution based classifier for prediction of protein interfaces without using protein structures [J]. Bioinformatics,2005, 21(10):2496-2501.
    [85]Wang B.,Chen P.,Huang D. S., et al. Predicting protein interaction sites from residue spatial sequence profile and evolution rate [J]. FEBS Letters,2006, 580(2):380-384.
    [86]Wang B., San Wong H., Huang D. S. Inferring protein-protein interacting sites using residue conservation and evolutionary information [J]. Protein and Peptide Letters,2006,13(10):999-1005.
    [87]Yan C., Honavar V., Dobbs D. Identification of interface residues in protease-inhibitor and antigen-antibody complexes:a support vector machine approach [J]. Neural Computing & Applications,2004,13(2):123-129.
    [88]Chen H., Zhou H. X. Prediction of interface residues in protein-protein complexes by a consensus neural network method:test against NMR data [J]. Proteins:Structure, Function, and Bioinformatics,2005,61(1):21-35.
    [89]Fariselli P.,Pazos F.,Valencia A., et al. Prediction of protein-protein interaction sites in heterocomplexes with neural networks [J]. European Journal of Biochemistry, 2002,269(5):1356-1361.
    [90]Porollo A., Meller J. Prediction-based fingerprints of protein-protein interactions [J]. Proteins:Structure, Function, and Bioinformatics,2007,66(3):630-645.
    [91]Bradford J. R.,Needham C. J.,Bulpitt A. J., et al. Insights into protein-protein interfaces using a Bayesian network prediction method [J]. Journal of Molecular Biology,2006,362(2):365-386.
    [92]Jansen R.,Yu H.,Greenbaum D., et al. A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data [J]. Science,2003, 302(5644):449-453.
    [93]Friedrich T.,Pils B.,Dandekar T., et al. Modelling interaction sites in protein domains with interaction profile hidden Markov models [J]. Bioinformatics,2006, 22(23):2851-2857.
    [94]Wojcik J., Schachter V. Protein-protein interaction map inference using interacting domain profile pairs [J]. Bioinformatics,2001,17:S296-S305.
    [95]Li M. H.,Lin L.,Wang X. L., et al. Protein-protein interaction site prediction based on conditional random fields [J]. Bioinformatics,2007,23(5):597-604.
    [96]Pitre S.,Alamgir M.,R.Green G., et al. Computational Methods For Predicting Protein-Protein interactions [J]. Advances in Biochemical Engineering/ Biotechnology,2008,110:247-267.
    [97]Marcotte E. M. Computational genetics:finding protein function by nonhomology methods [J]. Current Opinion in Structural Biology,2000, 10(3):359-365.
    [98]Eisen M. B.,Spellman P. T.,Brown P. O., et al. Cluster analysis and display of genome-wide expression patterns [J]. Proceedings of the National Academy of Sciences,1998,95(25):14863-14868.
    [99]Tamames J.,Casari G.,Ouzounis C., et al. Conserved clusters of functionally related genes in two bacterial genomes [J]. Journal of Molecular Evolution,1997, 44(1):66-73.
    [100]Overbeek R.,Fonstein M.,D'Souza M., et al. Use of contiguity on the chromosome to predict functional coupling [J]. In Silico Biology,1999, 1(2):93-108.
    [101]Dandekar T.,Snel B.,Huynen M., et al. Conservation of gene order:a fingerprint of proteins that physically interact [J]. Trends in Biochemical Sciences, 1998,23(9):324-328.
    [102]Enright A. J.,Iliopoulos I.,Kyrpides N. C., et al. Protein interaction maps for complete genomes based on gene fusion events [J]. Nature,1999,402(6757):86-90.
    [103]Marcotte E. M.,Pellegrini M.,Ng H. L., et al. Detecting protein function and protein-protein interactions from genome sequences [J]. Science,1999, 285(5428):751-753.
    [104]Goh C. S.,Bogan A. A.,Joachimiak M., et al. Co-evolution of proteins with their interaction partners [J]. Journal of Molecular Biology,2000,299(2):283-293.
    [105]Pazos F., Valencia A. Similarity of phylogenetic trees as indicator of protein-protein interaction [J]. Protein Engineering Design and Selection,2001, 14(9):609-614.
    [106]Ramani A. K., Marcotte E. M. Exploiting the co-evolution of interacting proteins to discover interaction specificity [J]. Journal of Molecular Biology,2003, 327(1):273-284.
    [107]Pellegrini M.,Marcotte E.,Thompson M., et al. Assigning protein functions by comparative genome analysis:protein phylogenetic profiles [J]. Proceedings of the National Academy of Sciences,1999,96:4285-4288.
    [108]Vert J. P. A tree kernel to analyse phylogenetic profiles [J]. Bioinformatics, 2002,18(1):276-284.
    [109]Dutta S.,Burkhardt K.,Young J., et al. Data deposition and annotation at the worldwide protein data bank [J]. Molecular Biotechnology,2009,42(1):1-13.
    [110]Berman H. M. The protein data bank:a historical perspective [J]. Acta Crystallographica Section A:Foundations of Crystallography,2007,64(1):88-95.
    [111]Lu L., Lu H., Skolnick J. MULTIPROSPECTOR:an algorithm for the prediction of protein-protein interactions by multimeric threading [J]. Proteins Structure Function and Genetics,2002,49(3):350-364.
    [112]Aloy P., Russell R. B. Interrogating protein interaction networks through structural biology [J]. Proceedings of the National Academy of Sciences,2002, 99(9):5896-5901.
    [113]Aloy P., Russell R. B. InterPreTS:protein interaction prediction through tertiary structure [J]. Bioinformatics,2003,19(1):161-162.
    [114]Bock J. R., Gough D. A. Predicting protein-protein interactions from primary structure [J]. Bioinformatics,2001,17(5):455-460.
    [115]Sprinzak E., Margalit H. Correlated sequence-signatures as markers of protein-protein interaction [J]. Journal of Molecular Biology,2001,311(4):681-692.
    [116]Ogmen U.,Keskin O.,Aytuna A. S., et al. PRISM:protein interactions by structural matching [J]. Nucleic Acids Research,2005,33(Web Server Issue):W331-W336.
    [117]Deng M.,Mehta S.,Sun F., et al. Inferring domain-domain interactions from protein-protein interactions [J]. Genome Research,2002,12(10):1540-1548.
    [118]Finn R. D.,Tate J.,Mistry J., et al. The Pfam protein families database [J]. Nucleic Acids Research,2007, Database Issue:1-8.
    [119]Han D. S.,Kim H. S.,Jang W. H., et al. PreSPI:a domain combination based prediction system for protein-protein interaction [J]. Nucleic Acids Research,2004, 32(21):6312-6320.
    [120]Han D.,Kim H.,Jang W., et al. PreSPI:design and implementation of protein-protein interaction prediction service system [J]. GENOME INFORMATICS, 2004,15(2):171-180.
    [121]Gomez S. M., Lo S. H., Rzhetsky A. Probabilistic Prediction of Unknown Metabolic and Signal-Transduction Networks [J]. Genetics,2001,159(3):1291-1298.
    [122]Gomez S. M., Noble W. S., Rzhetsky A. Learning to predict protein-protein interactions from protein sequences [J]. Bioinformatics,2003,19(15):1875-1881.
    [123]Kim W. K., Park J., Suh J. K. Large scale statistical prediction of protein-protein interaction by potentially interacting domain (PID) pair [J]. GENOME INFORMATICS,2002,13:42-50.
    [124]Mulder N. J.,Apweiler R.,Attwood T. K., et al. InterPro:an integrated documentation resource for protein families, domains and functional sites [J]. Briefings in bioinformatics,2002,3(3):225-235.
    [125]Martin S., Roe D., Faulon J. L. Predicting protein-protein interactions using signature products [J]. Bioinformatics,2005,21 (2):218-226.
    [126]Ben-Hur A., Noble W. S. Kernel methods for predicting protein-protein interactions [J]. Bioinformatics,2005,21(1):i38-i46.
    [127]Shen J.,Zhang J.,Luo X., et al. Predicting protein-protein interactions based only on sequences information [J]. Proceedings of the National Academy of Sciences, 2007,104(11):4337-4341.
    [128]Pitre S.,Dehne F.,Chan A., et al. PIPE:a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs [J]. BMC Bioinformatics,2006,7(1):365-379.
    [129]Shi M.-G.,Xia J.-F.,Li X.-L., et al. Predicting protein-protein interactions from sequence using correlation coefficient and high-quality interaction dataset [J]. Amino Acids,2009,38(3):891-899.
    [130]Guo Y.,Yu L.,Wen Z., et al. Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences [J]. Nucleic Acids Research,2008,36:3025-3030.
    [131]Xenarios I.,Salwinski L.,Duan X. J., et al. DIP, the Database of Interacting Proteins:a research tool for studying cellular networks of protein interactions [J]. Nucleic Acids Research,2002,30(1):303-305.
    [132]Bader G. D.,Donaldson I.,Wolting C., et al. BIND--the biomolecular interaction network database [J]. Nucleic Acids Research,2001,29(1):242-245.
    [133]Kabsch W., Sander C. Dictionary of protein secondary structure:pattern recognition of hydrogen-bonded and geometrical features [J]. Biopolymers,1983, 22(12):2577-2637.
    [134]Breitkreutz B. J.,Stark C.,Reguly T., et al. The BioGRID interaction database: 2008 update [J]. Nucleic Acids Research,2008,36:D637-D640.
    [135]Li N., Sun Z., Jiang F. Prediction of protein-protein binding site by using core interface residue and support vector machine [J]. BMC Bioinformatics,2008, 9:553-565.
    [136]Deng L.,Guan J.,Dong Q., et al. Prediction of protein-protein interaction sites using an ensemble method [J]. BMC Bioinformatics,2009,10(1):426-440.
    [137]Liu B.,Wang X.,Lin L., et al. Prediction of protein binding sites in protein structures using hidden Markov support vector machine [J]. BMC Bioinformatics, 2009,10(1):381-394.
    [138]Chen X. W., Liu M. Prediction of protein-protein interactions using random decision forest framework [J]. Bioinformatics,2005,21(24):4394-4400.
    [139]Chen X. W., Jeong J. C. Sequence-based prediction of protein interaction sites with an integrative method [J]. Bioinformatics,2009,25(5):585-591.
    [140]Sikic M., Tomic S., Vlahovicek K. Prediction of Protein-Protein Interaction Sites in Sequences and 3D Structures by Random Forests [J]. PLoS Comput Biol, 2009,5(1):e1000278.
    [141]Tuncbag N., Gursoy A., Keskin O. Identification of computational hot spots in protein interfaces:combining solvent accessibility and inter-residue potentials improves the accuracy [J]. Bioinformatics,2009,25(12):1513-1520.
    [142]Chenna R.,Sugawara H.,Koike T., et al. Multiple sequence alignment with the Clustal series of programs [J]. Nucleic Acids Research,2003,31(13):3497-3500.
    [143]Berman H. M.,Battistuz T.,Bhat T. N., et al. The protein data bank [J]. Acta Crystallographica Section D:Biological Crystallography,2002,58(6):899-907.
    [144]Dodge C., Schneider R., Sander C. The HSSP database of protein structure-sequence alignments and family profiles [J]. Nucleic Acids Research,1998, 26(1):313-315.
    [145]Lee B., Richards F. M. The interpretation of protein structures:Estimation of static accessibility [J]. Journal of Molecular Biology,1971,55(3):379-380.
    [146]Shrake A., Rupley J. A. Environment and exposure to solvent of protein atoms. Lysozyme and insulin* 1 [J]. Journal of Molecular Biology,1973,79(2):351-364.
    [147]Chakrabarti P., Janin J. Dissecting protein-protein recognition sites [J]. PROTEINS:Structure, Function, and Genetics,2002,47(3):334-343.
    [148]McCulloch W. S., Pitts W. A logical calculus of the ideas immanent in nervous activity [J]. Bulletin of Mathematical Biology,1943,5(4):115-133.
    [149]张铃,张钹.M-P神经元模型的几何意义及其应用[J].软件学报,1998,9(5):334-338.
    [150]张铃,张钹.多层前向网络的交叉覆盖设计算法[J].软件学报,1999,10(7):737-742.
    [151]张铃,张钹.多层反馈神经网络的FP学习和综合算法[J].软件学报,1997,8(4):252-258.
    [152]Chen Y. C.,Lin Y. S.,Lin C. J., et al. Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences [J]. Proteins Structure Function and Bioinformatics,2004, 55(4):1036-1042.
    [153]Chen C.,Tian Y. X.,Zou X. Y, et al. Using pseudo-amino acid composition and support vector machine to predict protein structural class [J]. Journal of Theoretical Biology,2006,243(3):444-448.
    [154]Cai C. Z.,Han L. Y.,Ji Z. L., et al. Enzyme family classification by support vector machines [J]. Proteins Structure Function and Bioinformatics,2004, 55(1):66-76.
    [155]Bhasin M., Raghava G. P. S. ESLpred:SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST [J]. Nucleic Acids Research,2004,32(Web Server Issue):W414-W419.
    [156]Wang J.,Lim K.,Smolyar A., et al. Atomic structure of an alpha beta T cell receptor (TCR) heterodimer in complex with an anti-TCR Fab fragment derived from a mitogenic antibody [J]. The EMBO Journal,1998,17(1):10-26.
    [157]Prasad L.,Waygood E. B.,Lee J. S., et al. The 2.5 Å resolution structure of the Jel42 Fab fragment/HPr complex [J]. Journal of Molecular Biology,1998, 280(5):829-845.
    [158]Dash M., Liu H. Feature selection for classification [J]. Intelligent data analysis, 1997,1(3):131-156.
    [159]Dash M., Liu H. Consistency-based search in feature selection [J]. Artificial Intelligence,2003,151 (1-2):155-176.
    [160]Dettling M., Buhlmann P. Finding predictive gene groups from microarray data [J]. Journal of Multivariate Analysis,2004,90(1):106-131.
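    [161]Fan J. S., Fang T. J. Analysis and evaluation of the elements of feature selection and extraction [J]. Computer Engineering and Applications,2001,37(13):95-99. (in Chinese)
    [162]Chen G. Y., Zhang Q. L., Li X. Joint optimization of feature selection and the SVM training model [J]. Journal of Tsinghua University (Science and Technology),2004,44(1):9-12. (in Chinese)
    [163]Zhang X. D. Research on feature selection methods for gene expression data based on support vector machines [D]. Gansu:Lanzhou University,2008. (in Chinese)
    [164]Li W. H. Research on face feature selection and recognition based on support vector machines [D]. Chongqing:Chongqing University,2006. (in Chinese)
    [165]Mao Y. Research and application of feature selection methods based on support vector machines [D]. Zhejiang:Zhejiang University,2006. (in Chinese)
    [166]Zhang Z. H. Research on feature extraction algorithms for protein classification problems [D]. Changsha:National University of Defense Technology,2006. (in Chinese)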
    [167]Mitchell M. An introduction to genetic algorithms [M]. The MIT Press,1998.
    [168]Krishna Murthy H. M.,Judge K.,DeLucas L., et al. Crystal structure of Dengue virus NS3 protease in complex with a Bowman-Birk inhibitor:implications for flaviviral polyprotein processing and drug design [J]. Journal of Molecular Biology, 2000,301 (4):759-767.
    [169]Dai S.,Schwendtmayer C.,Schurmann P., et al. Redox signaling in chloroplasts: cleavage of disulfides by an iron-sulfur cluster [J]. Science,2000, 287(5453):655-658.
    [170]Birtalan S. C., Phillips R. M., Ghosh P. Three-dimensional secretion signals in chaperone-effector complexes of bacterial pathogens [J]. Molecular Cell,2002, 9(5):971-980.
    [171]Donaldson I.,Martin J.,de Bruijn B., et al. PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine [J]. BMC Bioinformatics,2003,4(1):11-23.
    [172]Ben-Hur A., Noble W. S. Kernel methods for predicting protein-protein interactions [J]. Bioinformatics,2005,21(Suppl 1):i38-i46.
    [173]Liu L.,Cai Y.,Lu W., et al. Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection [J]. Biochemical and Biophysical Research Communications,2009, 380(2):318-322.
    [174]Deane C. M.,Salwinski L.,Xenarios I., et al. Protein interactions:two methods for assessment of the reliability of high throughput observations [J]. Molecular & Cellular Proteomics,2002, 1(5):349-356.
    [175]Sweet R. M., Eisenberg D. Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure [J]. Journal of Molecular Biology, 1983,171(4):479-488.
    [176]Hopp T. P., Woods K. R. Prediction of protein antigenic determinants from amino acid sequences [J]. Proceedings of the National Academy of Sciences,1981, 78(6):3824-3828.
    [177]Grantham R. Amino acid difference formula to help explain protein evolution [J]. Science,1974,185(4154):862-864.
    [178]Charton M., Charton B. I. The structure dependence of amino acid hydrophobicity parameters [J]. Journal of Theoretical Biology,1982,99:629-644.
    [179]Eisenberg D., McLachlan A. D. Solvation energy in protein folding and binding [J]. Nature,1986,319:199-203.
    [180]Fauchere J.,Charton M.,Kier L. B., et al. Amino acid side chain parameters for correlation studies in biology and pharmacology [J]. International Journal of Peptide and Protein Research,1988,32(4):269-278.
    [181]Janin J. Surface and inside volumes in globular proteins [J]. Nature,1979, 277:491-492.
    [182]Prabhakaran M., Ponnuswamy P. K. Shape and surface features of globular proteins [J]. Macromolecules,1982,15(2):314-320.
    [183]Garel J. P., Filliol D., Mandel P. Coefficients de partage d'aminoacides, nucleobases, nucleosides et nucleotides dans un systeme solvant salin [J]. Journal of Chromatography,1973,78:381-391.
    [184]Hutchens J. O. Heat capacities, absolute entropies and entropies of formation of amino acids and related compounds [J]. Handbook of biochemistry and molecular biology. D. Physical and chemical data,1976:B60-B61.
    [185]Krigbaum W. R., Komoriya A. Local interactions as a structure determinant for protein molecules:Ⅱ [J]. Biochimica et Biophysica Acta,1979,576(1):204-228.
    [186]Rose G. D.,Geselowitz A. R.,Lesser G. J., et al. Hydrophobicity of amino acid residues in globular proteins [J]. Science,1985,229(4716):834-838.
    [187]Zhou P.,Tian F. F.,Li B., et al. Genetic algorithm-based virtual screening of combinative mode for peptide/protein [J]. Acta Chimica Sinica,2006, 64(7):691-697.
    [188]Luscombe N. M., Laskowski R. A., Thornton J. M. Amino acid-base interactions:a three-dimensional analysis of protein-DNA interactions at an atomic level [J]. Nucleic Acids Research,2001,29(13):2860-2874.
    [189]Chou K. C. Prediction of protein cellular attributes using pseudo-amino acid composition [J]. Proteins:Structure, Function, and Genetics,2001,43(3):246-255 (erratum:2001,44(1):60).
    [190]Metz C. E. Basic principles of ROC analysis [J]. Seminars in Nuclear Medicine, 1978,8(4):283-298.
