Research on the Prediction of Membrane Protein Structure and Interactions Based on Intelligent Computing
Abstract
Once genomic data are available, the most direct way to analyze the functions of all genes, and to clarify the expression patterns and biological functions of the proteins that the genome expresses and that actually carry out life activities, is to study protein structure. In the study of membrane protein structure and function, predicting membrane protein types is an important foundation. However, predicting membrane protein types with molecular biology methods alone can no longer keep pace with the growing number of membrane protein sequences. Given an amino acid sequence, which features should be derived from it, and how should they be formulated so that they correctly represent the relationship between the sequence and the structure or function of the corresponding protein? In other words, the feature description of amino acid sequences requires further study. This thesis combines intelligent computing techniques to mine the amino acid order information in membrane protein sequences, in order to better understand the relationship between membrane protein sequence, structure, and function. In addition, the growing volume of large-scale genome sequencing provides not only more membrane protein sequences but also a basis for studying membrane protein interactions. Membrane protein interactions play an important role in life activities: they govern normal physiological processes and are also important in pathological processes. They provide clues for annotating the unknown biological functions of membrane proteins, as well as information needed to study membrane protein structure and to understand the mechanisms of life activities.
     In this thesis, we study membrane protein structure on the basis of sequence, focusing on two areas: the prediction of membrane protein structural classes and the prediction of membrane protein interactions. For structural class prediction, we use pseudo amino acid composition theory and the approximate entropy algorithm, optimize the parameter combination, build a different classifier for each combination of parameters, and finally integrate these basic classifiers into an ensemble classifier that predicts membrane protein structural classes. We also build a fuzzy support vector machine network that classifies membrane proteins using their biophysical properties.
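The feature construction named above can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: it assumes a single physicochemical property (the Kyte-Doolittle hydropathy index), and the function names `pse_aac`, `approximate_entropy` and `sequence_features`, the defaults `lam=3`, `w=0.05`, `m=2`, and the tolerance of 0.2 standard deviations are all hypothetical choices. Appending the approximate entropy value to the pseudo amino acid composition vector is likewise an assumption about how the two are combined.

```python
import math

# Kyte-Doolittle hydropathy index. Using this single property is an
# assumption for illustration; other property scales could be substituted.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)


def standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    mean = sum(scale.values()) / len(scale)
    std = math.sqrt(sum((v - mean) ** 2 for v in scale.values()) / len(scale))
    return {aa: (v - mean) / std for aa, v in scale.items()}


H = standardize(KD)


def pse_aac(seq, lam=3, w=0.05):
    """Return 20 composition terms plus lam sequence-order terms (they sum to 1)."""
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k positions apart
    thetas = [
        sum((H[seq[i]] - H[seq[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def approximate_entropy(series, m=2, r=None):
    """ApEn(m, r) = phi(m) - phi(m + 1), with self-matches counted."""
    n = len(series)
    if n <= m:
        raise ValueError("series must be longer than m")
    if r is None:
        # Assumed default tolerance: 0.2 standard deviations of the series.
        mean = sum(series) / n
        r = 0.2 * math.sqrt(sum((v - mean) ** 2 for v in series) / n)

    def phi(width):
        windows = [series[i:i + width] for i in range(n - width + 1)]
        total = 0.0
        for a in windows:
            close = sum(
                1 for b in windows
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(close / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)


def sequence_features(seq, lam=3, w=0.05, m=2):
    """PseAAC vector with one ApEn value of the hydropathy profile appended."""
    profile = [H[aa] for aa in seq]
    return pse_aac(seq, lam, w) + [approximate_entropy(profile, m)]
```

Varying `lam` and `w` yields different feature vectors for the same sequence, which is one plausible way to obtain the different basic classifiers that the ensemble integrates.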
     In the study of membrane protein interactions, we collect a larger set of positive samples, extract interaction features with the help of experimental data, and apply the fuzzy support vector machine algorithm to identify membrane protein interactions. Building on this, we adopt different feature representations, construct an additional data set, and apply the AdaBoost algorithm to integrate multiple weak classifiers for predicting membrane protein interactions. The main contributions of the thesis are as follows.
     For the prediction of membrane protein secondary structural classes, we describe membrane protein sequences with pseudo amino acid composition and use approximate entropy values as supplementary sequence information. Using the optimized weighting factor, we build a number of different classifiers according to different parameter settings. We then integrate these fuzzy k-nearest neighbor classifiers and, after training and testing, apply the ensemble classifier to predict membrane protein structural classes. Jackknife tests on the data sets show that the method is effective and practical.
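As a rough sketch of the classification and evaluation steps described here, the code below implements a fuzzy k-nearest neighbor rule with crisp training labels, combines several such classifiers, and scores the ensemble with a jackknife (leave-one-out) test. The function names, the toy two-dimensional data, the choice to vary `(k, m)` between ensemble members, and the fusion by summed memberships are all assumptions made for illustration; the abstract does not state how the thesis combines its member classifiers.

```python
import math
from collections import defaultdict


def fuzzy_knn(train_x, train_y, query, k=3, m=2.0):
    """Class memberships of `query` from its k nearest neighbours (crisp training labels)."""
    nearest = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    weights = defaultdict(float)
    for dist, label in nearest:
        # Inverse-distance weight; the small constant avoids division by zero.
        weights[label] += 1.0 / (dist ** (2.0 / (m - 1.0)) + 1e-12)
    total = sum(weights.values())
    return {label: wgt / total for label, wgt in weights.items()}


def ensemble_predict(train_x, train_y, query, settings):
    """Sum the memberships from each (k, m) setting and return the top class."""
    combined = defaultdict(float)
    for k, m in settings:
        for label, u in fuzzy_knn(train_x, train_y, query, k, m).items():
            combined[label] += u
    return max(combined, key=combined.get)


def jackknife_accuracy(xs, ys, settings):
    """Leave each sample out in turn, predict it from the rest, and report the hit rate."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += ensemble_predict(rest_x, rest_y, xs[i], settings) == ys[i]
    return hits / len(xs)


# Toy 2-D vectors standing in for sequence-derived feature vectors.
xs = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
ys = ["a"] * 4 + ["b"] * 4
settings = [(1, 2.0), (3, 2.0), (3, 1.5)]
accuracy = jackknife_accuracy(xs, ys, settings)
```

Because every sample is predicted by a model that never saw it, the jackknife estimate uses the whole data set for testing without reusing any sample for its own prediction, at the cost of retraining once per sample.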
     When a traditional support vector machine is used for classification, unclassifiable regions can arise. To resolve this problem, we introduce a fuzzy membership function to form a fuzzy support vector machine classifier, and integrate multiple such classifiers into a fuzzy support vector machine network. Combined with the physicochemical properties of membrane protein sequences, the network is used to predict membrane protein structural classes.
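To make the unclassifiable-region problem concrete, the sketch below compares a crisp one-against-all decision with a fuzzy one. It assumes one common formulation from the fuzzy SVM literature, in which each class membership is the minimum of truncated decision values; the thesis may define its membership function differently. The decision values are hypothetical numbers standing in for the outputs of trained one-against-all SVMs, and the function names are illustrative.

```python
def crisp_decision(decision_values):
    """Return the class index if exactly one SVM output is positive, else None."""
    positive = [i for i, d in enumerate(decision_values) if d > 0]
    return positive[0] if len(positive) == 1 else None


def class_memberships(decision_values):
    """Membership of each class as the minimum over truncated decision values.

    decision_values[i] is D_i(x), the output of the i-th one-against-all SVM.
    For class i, its own output contributes min(1, D_i) and every other
    output contributes min(1, -D_j).
    """
    memberships = []
    for i in range(len(decision_values)):
        parts = [
            min(1.0, d) if i == j else min(1.0, -d)
            for j, d in enumerate(decision_values)
        ]
        memberships.append(min(parts))
    return memberships


def fuzzy_decision(decision_values):
    """Assign the class with the largest membership, so every point gets a label."""
    m = class_memberships(decision_values)
    return max(range(len(m)), key=m.__getitem__)


# Hypothetical outputs of three one-against-all SVMs for three query points.
clear_case = [1.2, -1.3, -2.0]     # exactly one positive output
none_positive = [-0.2, -0.6, -0.9]  # no SVM claims the point
two_positive = [0.4, 0.7, -1.5]     # two SVMs claim the point
```

For `clear_case` both rules agree on class 0. For the other two points the crisp rule returns `None`, while the fuzzy rule still assigns class 0 and class 1 respectively, which is how the membership function removes the unclassifiable regions.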
     Because membrane proteins are hydrophobic, among other properties, their structural data account for only a very small proportion of the whole protein database, and membrane protein interactions are even harder to obtain experimentally, so very few membrane protein interactions are known. In this thesis, we propose using the fuzzy support vector machine algorithm to identify interactions between membrane protein pairs whose interaction status is unknown. We collect a larger set of positive samples and extract interaction features with the help of experimental data. Validation shows that the algorithm is effective.
     The principle of AdaBoost is that the samples which one weak learner fails to learn well become, as far as possible, the samples that the next weak learner focuses on. We therefore apply the AdaBoost algorithm to integrate multiple weak classifiers, combine different data sets, and extract membrane protein interaction features in different ways to obtain better feature representations. The resulting ensemble classification system achieves good results in classifying and predicting membrane protein interactions.
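The reweighting principle stated above can be illustrated with a small from-scratch version of discrete AdaBoost. This is a sketch under stated assumptions rather than the thesis's system: the weak learners are single-feature threshold stumps, the labels are +1 and -1, and the one-dimensional toy data set and all function names are hypothetical.

```python
import math


def train_stump(xs, ys, weights):
    """Pick the (feature, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for f in range(len(xs[0])):
        for thr in sorted({x[f] for x in xs}):
            for polarity in (1, -1):
                preds = [polarity if x[f] <= thr else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, x):
    _, f, thr, polarity = stump
    return polarity if x[f] <= thr else -polarity


def adaboost(xs, ys, rounds=3):
    """Return a list of (alpha, stump) pairs trained on reweighted samples."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(xs, ys, weights)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, so the next stump focuses on them.
        weights = [
            w * math.exp(-alpha * y * stump_predict(stump, x))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single threshold can separate.
xs = [[0], [1], [2], [3], [4], [5]]
ys = [1, 1, -1, -1, 1, 1]
single = adaboost(xs, ys, rounds=1)
boosted = adaboost(xs, ys, rounds=3)
```

On this toy data a single stump misclassifies two of the six points, whereas the three-round ensemble classifies all six training points correctly, because each round raises the weight of the points the previous stump got wrong.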
     Finally, the thesis summarizes the work, points out its shortcomings, and discusses directions and priorities for future research.
