Research on Feature Extraction Algorithms for Protein Classification
Abstract
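The Human Genome Project has filled protein databases with a massive volume of sequence information, yet our understanding of the higher-order structures and functions of proteins lags far behind. Faced with this vast body of protein sequence data, developing theoretical and computational methods to study protein structure and function is of great importance, and it is one of the core problems of bioinformatics in the post-genomic era.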
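     Because protein structures and functions are so complex, it is difficult to capture their overall characteristics and classify all proteins with a single simple method. Protein research instead uses many specialized classification schemes, each of which has considerable practical value within its own field. As a branch of proteomics, protein classification has therefore attracted growing attention from researchers in recent years. It is a prerequisite and foundation for a comprehensive understanding of protein structure and function, and it plays a very important role in molecular biology, cell biology, pharmacology and medicine.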
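     Feature extraction from protein sequences is the most basic problem in computational protein classification and a key determinant of classification quality. This thesis analyses and studies the problem in depth. For four basic types of problems in protein classification, it proposes and implements four different feature extraction algorithms, and it tests, validates and compares them on standard datasets. The main work and contributions of the thesis are summarized as follows: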
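     (1) The structural class of a protein provides important information for predicting its spatial structure. For a protein of unknown structure, knowing its structural class accurately not only improves the accuracy of secondary structure classification but also greatly narrows the conformational search space in tertiary structure prediction. Structural class is also closely related to certain protein functions. Based on the concept of the measure of diversity, this thesis constructs a new feature extraction algorithm for protein sequences, the k-substring diversity source method. Combining the k-substring diversity source with the minimum increment of diversity algorithm yields a new protein structural class classification model, SS+Diver. The model starts from the protein sequence alone, requires no other information, is simple to compute and classifies accurately. On the standard dataset T359, the overall accuracy of the SS+Diver model in the Jackknife test reaches 97.49%, which is 1.67 to 56.27 percentage points higher than existing classification models. The experimental results indicate that, compared with existing classification models, SS+Diver has stronger adaptability, generalization ability and applicability.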
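     (2) Quaternary structure extends the primary, secondary and tertiary structure of proteins. It refers to the types, number and spatial arrangement of the subunits in an oligomeric protein and to the interactions among those subunits. Oligomeric proteins take part in a wide range of life activities, such as metabolism, signal transduction and chromosome replication, so studying their quaternary structure is of great biological significance. This thesis proposes three different composite feature extraction algorithms and uses the nearest neighbor algorithm to investigate two classification problems: dimeric versus non-dimeric proteins, and seven classes of homo-oligomeric proteins. The experimental results show that, of the three composite feature extraction algorithms, the model based on DPC_ACF is simple to compute and classifies well. On the standard dataset RG1639, its overall accuracy in the Jackknife test reaches 90.2%, which is 2.7 to 31.3 percentage points higher than existing classification models. On the standard dataset CC3174, it reaches 91.18%, which is 12.68 to 22.78 percentage points higher than existing classification models.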
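     (3) Apoptosis proteins play an important role in the development and homeostasis of organisms, and they are very important for understanding the mechanism of programmed cell death. The subcellular location of an apoptosis protein is closely related to the function it performs in the cell. Based on the ideas of "coarse-graining" and "grouping", this thesis proposes a new feature extraction algorithm for protein sequences, encoding based on grouped weight (EBGW). Combining it with the component-coupled algorithm, the nearest neighbor algorithm and the support vector machine respectively gives three classification models: EBGW+CCA, EBGW+NNA and EBGW+SVM. The experimental results show that, on the same dataset and with the same classification algorithm, EBGW takes several physicochemical properties of amino acids into account and reveals the structural and functional information contained in the letter sequence more effectively than feature extraction algorithms such as amino acid composition and the instability index, while remaining simple to compute. Compared with existing work on the standard dataset, the proposed EBGW+SVM model classifies better, with considerable improvements in overall accuracy, per-class sensitivity and Matthews correlation coefficient.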
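     (4) Membrane proteins occupy an important position in the cell. Successful methods already exist for distinguishing membrane proteins from non-membrane proteins. Being able to predict theoretically the type of a membrane protein and the way it binds to the phospholipid bilayer would be very significant for understanding the functions of newly sequenced membrane proteins. This thesis introduces the concept of a sub-alphabet and further proposes a sub-polypeptide composition feature extraction algorithm based on it. The method not only extracts the cellular information contained in protein sequences and effectively improves the performance of the classification model, but also greatly reduces computational complexity. It thereby addresses the situation in which the traditional polypeptide composition method extracts features powerfully but is computationally expensive and limited in application. On the standard dataset CE2059, the overall accuracy of the proposed model based on AAC_S6P2 is 0.1% higher than that of the model based on the combination of amino acid composition and dipeptide composition, while its running time is only 11.75% of the latter's. Compared with existing classification models, the overall accuracy of the model is 1.02 to 25.16 percentage points higher.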
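     (5) Finally, the thesis gives a preliminary discussion of the relationship between the classification performance of a model and the characteristics of the dataset.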
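The diversity-based classification in item (1) can be illustrated with a minimal Python sketch. It uses the standard definitions of the measure of diversity, D(X) = N log N - sum(n_i log n_i), and the increment of diversity, ID(X, Y) = D(X + Y) - D(X) - D(Y). How the thesis actually builds its k-substring diversity source is not stated in the abstract, so counting all overlapping substrings of length k is an assumption here, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def k_substring_counts(seq, k):
    """Count every overlapping substring of length k (assumed source construction)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def diversity(counts):
    """Measure of diversity: D(X) = N*log(N) - sum(n_i*log(n_i))."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); zero when X and Y are proportional."""
    return diversity(x + y) - diversity(x) - diversity(y)


def build_sources(labelled_seqs, k=2):
    """Pool the k-substring counts of all training sequences of each class."""
    sources = {}
    for label, seq in labelled_seqs:
        sources.setdefault(label, Counter()).update(k_substring_counts(seq, k))
    return sources


def classify(seq, sources, k=2):
    """Assign the class whose pooled source gives the minimum increment."""
    query = k_substring_counts(seq, k)
    return min(sources, key=lambda label: increment_of_diversity(query, sources[label]))


# Toy, made-up sequences purely for illustration (not from T359).
training = [("alpha", "AAEELLKKAAEELLKK"), ("beta", "VTVTVYVTVSVTVYVT")]
sources = build_sources(training, k=2)
print(classify("EELLKKAAEE", sources, k=2))
```

A small increment means the query's substring distribution resembles the class source, which is why the minimum is taken. A Jackknife test would rebuild the sources with each test sequence left out in turn.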
With the success of the Human Genome Project, a widening gap has appeared between the sharply increasing number of known protein sequences and the slow accumulation of known protein structures and functions. It is urgent to find a trustworthy theoretical and computational approach to predicting protein structures and functions from this immense number of sequences, which is a core task of bioinformatics in the post-genomic era.
     Because of the great diversity of protein structures and functions, it is difficult to capture their important features with any simple classification scheme. There are many specialized ways of grouping proteins, each of which has been helpful in some fields. As an offshoot of proteomics research, protein classification has attracted more and more attention. Any new breakthrough in this research will help to further understand the structure and function of proteins. Moreover, it plays an important role in molecular biology, cellular biochemistry, pharmacology and medicine.
     Feature extraction from protein sequences is a basic problem in protein classification research and a key factor in classification performance. This thesis studies algorithms for this problem, proposes four new feature extraction algorithms for four basic types of problems in protein classification, and tests and analyses these algorithms on standard datasets. The main work and contributions of this thesis are as follows:
     1. Protein structural class is very important for protein structure prediction. For a protein of unknown structure, knowing the structural class increases the accuracy of secondary structure prediction and reduces the complexity of tertiary structure prediction. Based on the concept of the measure of diversity, the k-substring diversity source is presented. Combined with the increment of diversity algorithm, the new feature extraction approach is applied to protein structural class prediction. For the dataset T359, the overall accuracy of the SS+Diver model in the Jackknife test is 97.49%, about 1.67 to 56.27 percentage points higher than that of other existing models.
     2. To understand the structure and function of a protein, an important task is to identify the quaternary structure of a new polypeptide chain, i.e., whether it forms a monomer, a dimer, or some other oligomer. A computational method for properly classifying the quaternary structure of proteins would therefore be significant in interpreting the raw data produced by large-scale genome sequencing projects. Three different composite feature extraction methods are proposed and applied to protein quaternary structure prediction in combination with the nearest neighbor algorithm. The experimental results show that the performance of the model based on DPC_ACF is higher than that of the other composite methods. For the dataset RG1639, the overall classification accuracy of DPC_ACF in the Jackknife test is 90.2%, about 2.7 to 31.3 percentage points higher than that of other existing models. For the dataset CC3174, the overall classification accuracy of DPC_ACF in the Jackknife test is 91.18%, about 12.68 to 22.78 percentage points higher than that of other existing models.
     3. Apoptosis proteins play an important role in the growth and homeostasis of organisms. Knowing the functions of these proteins helps to clarify the mechanism of programmed cell death, and knowing the subcellular location of an apoptosis protein is important for understanding its function. Based on the ideas of coarse-grained description and grouping, a new approach for protein sequences named encoding based on grouped weight (EBGW) is presented. Combining it with the component-coupled algorithm, the nearest neighbor algorithm and the support vector machine respectively, three classification models (named EBGW+CCA, EBGW+NNA and EBGW+SVM) are put forward and applied to the subcellular location prediction of apoptosis proteins. Experiments show that, for the same dataset and with the same classification algorithm, the feature extraction capacity of the EBGW approach exceeds that of amino acid composition and the instability index. The overall classification accuracy of the EBGW+SVM model, and the sensitivity and Matthews correlation coefficient of each class, are all higher than those of existing models.
     4. Membrane proteins are very important in a cell and can be discriminated from non-membrane proteins relatively easily. The determination of functions for new membrane proteins can be expedited significantly if we can find an effective algorithm to predict their types. Based on the concept of a sub-alphabet, the sub-polypeptide composition of a protein sequence is presented. The new algorithm not only captures more cellular information from the protein sequence but also greatly decreases the computational complexity. For the dataset CE2059, the overall classification accuracy of the model with sub-polypeptide composition is 0.1% higher than that of the model with traditional polypeptide composition, while its computation time is only 11.75% of the latter's. Compared with existing models, the overall classification accuracy increases by about 1.02 to 25.16 percentage points in the Jackknife test.
     5. Finally, the relationship between the performance of a classification model and the characteristics of the training dataset is briefly discussed.
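The sub-alphabet idea in item 4 can be sketched as follows, under stated assumptions. The abstract does not list the groups used in the thesis, so the six-group partition below is hypothetical. Reading AAC_S6P2 as "amino acid composition plus dipeptide composition over a 6-letter sub-alphabet" is also our interpretation of the name. The point of the sketch is the dimensionality: 20 + 6*6 = 56 features instead of 20 + 20*20 = 420.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical 6-group sub-alphabet (illustrative only, not taken from the thesis).
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # conformationally special
}
TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}
SUB_PAIRS = ["".join(p) for p in product(GROUPS, repeat=2)]  # 36 sub-dipeptides


def amino_acid_composition(seq):
    """20-dimensional frequency vector of the standard amino acids."""
    residues = [aa for aa in seq if aa in TO_GROUP]
    n = len(residues) or 1
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def sub_dipeptide_composition(seq):
    """36-dimensional frequency vector of adjacent pairs over the sub-alphabet."""
    reduced = [TO_GROUP[aa] for aa in seq if aa in TO_GROUP]
    counts = dict.fromkeys(SUB_PAIRS, 0)
    for first, second in zip(reduced, reduced[1:]):
        counts[first + second] += 1
    total = max(len(reduced) - 1, 1)
    return [counts[p] / total for p in SUB_PAIRS]


def combined_features(seq):
    """Concatenate both compositions into one 56-dimensional feature vector."""
    return amino_acid_composition(seq) + sub_dipeptide_composition(seq)


features = combined_features("MKTAYIAKQR")  # made-up example sequence
print(len(features))
```

Mapping residues to groups before counting pairs is what keeps the feature space small. Longer sub-polypeptides would grow as powers of 6 rather than powers of 20.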
    [187] Anfinsen C B,Haber E,Sela M,White F H.The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain.Proceedings of the National Academy of Sciences,1961(47):1309~1314
    [188] 徐克学编.生物数学.北京:科学出版社,1999.
    [189] Laxton R R.The measure of diversity.Journal of Theoretical Biology,1978(71):51~67
    [190] Wang G , Dunbrack R L J . PISCES: a protein sequence culling server.Bioinformatics,2003(19):1589~1591
    [191] 阎隆飞,孙之荣编.蛋白质分子结构.北京:清华大学出版社,2000.
    [192] Klotz I M,Darnall D W,Langerman N R.Quaternary structure of proteins.In: Neurath H, Hill R L, editors. The protein[M], 3rd ediction, New York: Academic Presess,1975(1):226~411
    [193] Einstein E,Schachman H K.Determing the roles of subunits in protein func-tion.In: Creighton T E, editor. Protein function: A practical approach. London: IRL Press,1989:135~176
    [194] Price N C.Assembly of multi-subunit structure.New York:Oxford University Press,1994
    [195] Zhang S W,Pan Q,Zhang H C,Wu Y H,Shi J Y.Support vector mathines for predicting protein homo-oligomers by incorporating pseudo-amino acid compo-sition.Internet Electronic Journal of Molecular Design,2003(2):392~402
    [196] Zhang S W,Pan Q,Zhang H C,Zhang Y L,Wang H Y.Classification of proteinquaternary structure with support vector machine.Bioinformatics,2003(19):2390~2396
    [197] 施建宇,潘泉,张绍武,程咏梅.基于氨基酸组成分布的蛋白质同源寡聚体分类研究.生物物理学报,2006(22):49~56
    [198] 夏玉凤,赵倩,于静娟,敖光明.PF40 的亚细胞定位研究.生物化学与生物物理进展,2005(32):1020~1025
    [199] 孙景春,徐晋麟,李亦学,石铁流.大规模蛋白质相互作用数据的分析与应用.科学通报,2005(50):2055~2060
    [200] Schwikowski B,Uetz P,Fields S.A network of protein-protein interactions in yeast.Nature Biotechnology,2000(18):1257~1261
    [201] Chen Y,Xu D.Computational analyses of high-throughput protein-protein in-teraction data.Current Protein and Peptide Science,2003(4):159~181
    [202] Cai Y D,Liu X J,Xu X B,Chou K C.Support vector machines for prediction of protein subcellular location.Molecular Cell Biology Research Communica-tions,2000(4):230~233
    [203] Hua S,Sun Z.Support vector machine approach for protein subcellular local-ization prediction.Bioinformatics,2001(17):721~728
    [204] Cui Q,Jiang T,Liu B,Ma S.Esub8: a novel tool to predict protein subcellular localizations in eukaryotic organisms.BMC Bioinformatics,2004(5):66
    [205] Gao Y,Shao S,Xiao X,Ding Y,Huang Y,Huang Z,Chou K C.Using pseudo amino acid composition to predict protein subcellular location: Approached with Lyapunov index, Bessel function, and Chebyshev filter.Amino Acids,2005(28):373~376
    [206] Xiao X,Shao S,Ding Y,Huang Z,Huang Y,Chou K C.Using complexity measure factor to predict protein subcellular location.Amino Acids,2005(28):57~61
    [207] Chou K C,Cai Y D.Prediction and classification of protein subcellular loca-tion-sequence-order effect and pseudo amino acid composition.Journal of Cel-lular Biochemistry,2003(90):1250~1260
    [208] Feng Z P,Zhang C T.Prediction of the subcellular location of prokaryotic proteins based on the hydrophobicity index of amino acids.Internet Journal of Biological Macromolecules,2001(28):255~261
    [209] Feng Z P.Prediction of the subcellular location of prokaryotic proteins based on a new representation of the amino acid composition.Biopolymers,2001(58):491~499
    [210] Xie D.LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST.Nucleic Acids Research,2005(33):W105~110
    [211] Nielsen H,Brunak S,von Heijne G.Machine learning approaches for the prediction of signal peptides and other protein sorting signals.Protein Engi-neering,1999(12):3~9
    [212] Jacobson M D,Weil M,Raff M C.Programmed cell death in animal devel-opment.Cell Molecular and Life Science,1997(88):347~354
    [213] Kaufmann S H,Hengartner M O.Programmed cell death: alive and well in the new millennium.Trends in Cell Biology,2001(11):526~534
    [214] He P A,Wang J.Numerical characterization of DNA primary sequence.Internet Electronic Journal of Molecular Design,2002(1):668~674
    [215] 贺平安.DNA 序列及蛋白质序列的分析与比较:学位论文.大连:大连理工大学,2002.
    [216] 林钧材,杨康成编.生物化学.沈阳:辽宁科学技术出版社,1996.
    [217] Chou K C,Cai Y D.Using GO-PseAA predictor to identify membrane proteins and their types.Biochemical and Biophysical Research Communications,2005(327):845~847
    [218] Cai Y D,Chou K C.Predicting membrane protein type by functional domain composition and pseudo amino acid composition.Journal of Theoretical Biol-ogy,2005
    [219] Chou K C,Cai Y D.Prediction of membrane protein types by incorporating amphipathic effects.Journal of Chemical Information and Modeling,2005(45):407~413
    [220] Maetschke S,Towsey M,Boden M.BLOMAP: An encoding of amino acids which improves signal peptide cleavage site prediction.Proceedings of 3rd Asia-Pacific Bioinformatics Conference,Singapore,2005:17~21
    [221] Wang M,Yang J,Liu G P,Xu Z J,Chou K C.Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition.Protein Engineering, Design and Selection,2004(17):509~516
