Protein Sequence Classification Based on Pseudo Amino Acid Composition and Functional Domains
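Abstract
Since the launch of the Human Genome Project, protein databases have accumulated a vast amount of sequence information, yet the understanding of protein structure and function lags far behind. Theoretical and computational methods are therefore especially important as an aid to understanding protein structure and function.
     Protein classification, a branch of proteomics research, has attracted growing attention in recent years. It is a prerequisite and foundation for a comprehensive understanding of protein structure and function, and it plays an important role in cell biology, molecular biology, medicine, and pharmacology.
     In building computational models for protein classification, feature extraction is the most fundamental problem, and it is sometimes the key factor that determines classification quality. This thesis analyzes and studies this problem in detail. It proposes a pseudo amino acid composition based on cellular automaton image parameters and a SMART functional domain representation, and tests them on benchmark datasets, where they substantially improve the prediction success rate. The main work and contributions of the thesis are summarized as follows:
     (1) Using an amino acid digital coding model, cellular automaton images of protein sequences are generated, and a pseudo amino acid composition representation based on image texture features is proposed. The augmented covariant discriminant algorithm is used to predict the secondary structural class of proteins, and simulation results show good classification performance.
     (2) A new hybrid representation of protein sequence features is proposed: SMART functional domain composition combined with pseudo amino acid composition. To understand the structure and function of a protein sequence, an important prerequisite is to identify the quaternary structure type of a new polypeptide chain. The nearest neighbor algorithm is applied to the classification of seven classes of homo-oligomeric proteins. Experimental results show that the method is computationally simple and classifies well. In addition, the work extends quaternary structure classification of protein sequences by constructing a quaternary structure superfamily dataset and studying its classification with the functional domain and pseudo amino acid composition methods.
     (3) A two-level classifier for G protein-coupled receptors is designed, and the cellular automaton image texture features and functional domain distribution of the sequences are analyzed in some depth.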
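As a rough illustration of item (1), the sketch below encodes a sequence as bits, evolves the bits with an elementary cellular automaton, and computes two co-occurrence texture features from the resulting binary image. The 5-bit code, the rule number 90, the number of steps, the choice of features, and all function names are assumptions made for illustration; the abstract does not specify the coding model, the automaton rule, or the texture features used in the thesis.

```python
# Illustrative sketch only: the encoding, rule, and features below are assumptions,
# not the thesis's actual coding model or feature set.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def encode(seq):
    """Turn a protein sequence into a flat list of bits (hypothetical 5-bit code per residue)."""
    bits = []
    for aa in seq:
        idx = ALPHABET.index(aa)
        bits.extend((idx >> k) & 1 for k in range(4, -1, -1))
    return bits


def ca_step(row, rule):
    """Apply one step of an elementary cellular automaton with periodic boundaries."""
    n = len(row)
    return [
        (rule >> (row[(i - 1) % n] * 4 + row[i] * 2 + row[(i + 1) % n])) & 1
        for i in range(n)
    ]


def ca_image(seq, rule=90, steps=20):
    """Stack successive automaton states into a binary image (a list of rows)."""
    rows = [encode(seq)]
    for _ in range(steps):
        rows.append(ca_step(rows[-1], rule))
    return rows


def texture_features(image):
    """Build a horizontal-neighbour co-occurrence matrix and derive energy and contrast."""
    counts = [[0, 0], [0, 0]]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    total = sum(map(sum, counts))
    p = [[c / total for c in r] for r in counts]
    energy = sum(p[i][j] ** 2 for i in range(2) for j in range(2))
    contrast = sum((i - j) ** 2 * p[i][j] for i in range(2) for j in range(2))
    return {"energy": energy, "contrast": contrast}


# Example: texture features of the image generated from a short toy sequence.
features = texture_features(ca_image("MKTAYIAKQR"))
```

In a setup like this, the texture values would be appended to the 20 amino acid frequencies to form a pseudo amino acid composition vector; which features to append and how to weight them is a design choice the abstract leaves open.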
Since the Human Genome Project (HGP) was launched, abundant sequence information has been stored in biological databases. However, understanding of protein structure and function is severely lacking. In this situation, it is very important to explore theoretical and computational approaches, which will aid the prediction of protein structures and functions from the vast number of available sequences.
     In recent years, protein classification, as an important aspect of proteomics, has attracted more and more attention. Feature extraction from protein sequences is a basic problem in protein classification research, and can even be a key factor in classification performance. This thesis studies this problem in depth and proposes several new feature extraction algorithms, such as characteristic parameters based on cellular automaton images (CAI) and SMART functional domain composition, which perform very well on some protein classification problems. The main work and contributions of this thesis are as follows:
     (1) The prediction of the secondary structural class of proteins is investigated. Based on the concept of CAI, a new approach is presented. The jackknife cross-validation test demonstrated that the overall success rate of the new approach was significantly higher than those of other approaches.
     (2) For protein quaternary structure prediction, two different composite feature extraction methods are proposed; combined with the nearest neighbor algorithm, they give good results.
     (3) A two-level classifier for GPCRs is designed, and the CAI texture features and SMART functional domain distribution of the different sequence types are studied carefully.
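The evaluation scheme named in items (1) and (2) can be sketched as follows: a hybrid feature vector is built for each sequence, and a nearest neighbor classifier is scored with the jackknife (leave-one-out) test. This is a minimal sketch under stated assumptions: the function names, the Euclidean distance, the 0/1 domain-presence encoding, and the toy data are hypothetical, and plain amino acid composition stands in for the pseudo amino acid composition, whose extra terms the abstract does not define.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def hybrid_features(seq, domains_found, domain_vocab):
    """Concatenate a 0/1 domain-presence vector with the 20 amino acid frequencies.

    Assumption: plain composition is a stand-in for pseudo amino acid composition.
    """
    domain_part = [1.0 if d in domains_found else 0.0 for d in domain_vocab]
    aa_part = [seq.count(aa) / len(seq) for aa in ALPHABET]
    return domain_part + aa_part


def nearest_neighbor(query, samples):
    """Return the label of the sample closest to `query` (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for features, label in samples:
        d = math.dist(query, features)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def jackknife_success_rate(samples):
    """Leave each sample out in turn, predict it from the rest, and return the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor(features, rest) == label:
            hits += 1
    return hits / len(samples)


# Toy data: (sequence, domains found, class label); all values are made up.
vocab = ["SH3", "PDZ"]
toy = [
    ("AAAAC", {"SH3"}, "dimer"),
    ("AAACC", {"SH3"}, "dimer"),
    ("WWWWY", {"PDZ"}, "tetramer"),
    ("WWWYY", {"PDZ"}, "tetramer"),
]
samples = [(hybrid_features(s, d, vocab), label) for s, d, label in toy]
rate = jackknife_success_rate(samples)
```

Because the jackknife test removes each sequence from the training set before predicting it, the reported rate does not depend on a random split, which makes it convenient for comparing methods on the same dataset.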
