Research on Feature Representation and Classification Algorithms for Protein Subcellular Localization
Abstract
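Proteomics is an important research direction in the post-genomic era. It seeks to interpret the roles that proteins play in the cell and to reveal the interactions among proteins in the cellular environment and their functions. Determining a protein's subcellular localization is an important step toward functional annotation, but determining it by biological experiment is slow and costly, so new and more efficient methods are urgently needed.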
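     Building on the theory and methods of modern statistical pattern recognition, this thesis studies feature representation, classification algorithms, multi-class classification strategies, and imbalanced-data handling for subcellular localization prediction. The main contributions are as follows: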
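     1. A moment descriptor feature representation is proposed. The classification performance of three multi-class strategies for support vector machines (SVMs) is compared in terms of prediction accuracy, number of support vectors, and training and testing time. The representation analyzes amino acid composition from a statistical standpoint and adds information on amino acid order and position. The amino acid coordinate mean and coordinate variance represent, respectively, the expected position and the dispersion of each amino acid's occurrences in the protein sequence. Experiments on two typical databases show that the moment descriptor represents the positional distribution of amino acid residues in a protein sequence more effectively.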
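     2. An amino acid composition distribution feature representation is proposed. An imbalance index is defined and used to study how dataset imbalance affects SVM classification, and a training method based on weighted penalty coefficients is proposed. The representation divides a protein sequence equally into several segments and computes the amino acid composition of each subsequence. It therefore captures the amino acid content of every subsequence and also reflects how the subsequences interact in the spatial structure. Experiments show two results. (1) The segment-wise composition feature shows that the combined information of the local subsequences exceeds that of the whole sequence, and it represents the relationships among protein subsequences more effectively. (2) The weighted-penalty training method mitigates the negative effect of data imbalance on classification.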
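     3. To handle the non-stationarity of protein physicochemical signals, a multi-scale energy feature representation based on amino acid residue indices is proposed. The method uses an amino acid residue index to map the symbolic protein sequence to a numerical signal. It then applies a wavelet transform based on multiresolution analysis to perform a Mallat pyramid decomposition of the signal and computes the root-mean-square energy of the signal at each scale. These energies are assembled into a vector that represents the subcellular localization features. Experiments show that the method represents the characteristics of protein physicochemical signals more effectively and has lower computational complexity.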
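     4. To handle the inconsistency among multiple subcellular localization features and their high dimensionality, a prediction method based on a multiple classifier system is proposed. The method uses a multiple classifier system to aggregate several kinds of features. This fuses complementary pattern information and reduces the uncertainty of individual classifiers. It also makes classifier models easier to build on high-dimensional features and reduces the associated computational burden. Experiments show that the combined system predicts better than any single classifier and that the method is more effective and robust than other methods.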
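The moment descriptor in contribution 1 can be sketched as follows. This is a minimal illustration, not the thesis's implementation. The abstract does not state how positions are normalized, so the sketch makes three assumptions: 1-based positions divided by the sequence length, the population variance, and zeros for residues that do not occur. The function name `moment_descriptor` is hypothetical.

```python
# Minimal sketch of a moment-descriptor (MD) feature vector.
# Assumptions (not from the thesis): 1-based positions scaled by sequence
# length, population variance, zeros for absent residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def moment_descriptor(seq: str) -> list[float]:
    """Return AAC (20) + AAM (20) + AAV (20) = 60 features."""
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    aac, aam, aav = [], [], []
    for aa in AMINO_ACIDS:
        # Normalized positions of this residue; non-standard letters are
        # skipped here but still count toward the sequence length.
        pos = [(i + 1) / length for i, ch in enumerate(seq) if ch == aa]
        n = len(pos)
        aac.append(n / length)                             # composition
        if n == 0:
            aam.append(0.0)
            aav.append(0.0)
            continue
        mean = sum(pos) / n                                # expected position
        aam.append(mean)
        aav.append(sum((p - mean) ** 2 for p in pos) / n)  # dispersion
    return aac + aam + aav
```

For example, in `moment_descriptor("ACCA")` both A and C have composition 0.5 and coordinate mean 0.625. The coordinate variance is 0.140625 for A, which sits at the ends, and 0.015625 for C, which sits in the middle. Plain composition cannot capture this positional difference.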
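Contribution 1 also compares three SVM multi-class strategies on training and testing time. The abstract does not name them. This sketch assumes one-vs-rest, one-vs-one and the decision directed acyclic graph (DAG), a common trio. It counts the binary classifiers each strategy trains and evaluates per test sample. It also shows DAG evaluation with a hypothetical pairwise decision function `pairwise(i, j)`.

```python
def binary_classifier_counts(k: int) -> dict[str, tuple[int, int]]:
    """(trained, evaluated per test sample) binary SVMs for k classes."""
    pairs = k * (k - 1) // 2
    return {
        "one_vs_rest": (k, k),
        "one_vs_one": (pairs, pairs),
        "dag": (pairs, k - 1),
    }


def dag_predict(classes, pairwise):
    """Decision-DAG evaluation: test the first vs. the last remaining class
    and drop the loser until one class is left (k - 1 evaluations).

    `pairwise(i, j)` is a hypothetical trained binary classifier that must
    return either i or j.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        if pairwise(first, last) == first:
            remaining.pop()       # `last` loses
        else:
            remaining.pop(0)      # `first` loses
    return remaining[0]
```

One-vs-one and DAG train the same k(k-1)/2 machines. DAG, however, evaluates only k-1 of them per test sample, which is where its testing-time advantage comes from.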
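The segment-wise composition of contribution 2 can be sketched as below. The abstract gives neither the number of segments nor the rule for sequences whose length is not divisible by it. This sketch therefore assumes three segments by default and integer boundaries, so segment lengths differ by at most one residue. The function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str) -> list[float]:
    """Amino acid composition (AAC) of one non-empty segment."""
    n = len(segment)
    return [segment.count(aa) / n for aa in AMINO_ACIDS]


def aacd(seq: str, n_segments: int = 3) -> list[float]:
    """Concatenate the AAC of n_segments near-equal consecutive segments."""
    seq = seq.upper()
    length = len(seq)
    if n_segments < 1 or length < n_segments:
        raise ValueError("need at least one residue per segment")
    # Integer boundaries; segment lengths differ by at most one residue.
    bounds = [j * length // n_segments for j in range(n_segments + 1)]
    features: list[float] = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(composition(seq[start:end]))
    return features
```

With `n_segments=1` this reduces to ordinary AAC. In `aacd("AAAACCCCGGGG")` all A, C and G content lands in the first, second and third 20-dimensional blocks respectively. Whole-sequence AAC cannot tell this ordering apart from "CCCCAAAAGGGG".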
As one of the most important research areas of the post-genome era, proteomics aims to understand the potential roles of proteins, elucidate their interactions in a cellular context, and further make the corresponding functional annotation. Determining the subcellular location of proteins is essential to their functional annotation. However, biological experiments on protein subcellular localization are slow and costly and can hardly meet the demand. More effective methods are therefore needed.
     Based on the modern theories and methods of statistical pattern recognition, this thesis studies feature representation, classification algorithms, multi-class classification, and the handling of imbalanced datasets for the prediction of protein subcellular localization. The main contributions are as follows:
     1. A feature representation, the moment descriptor (MD), is proposed. The performance of three multi-class approaches for support vector machines (SVMs) is analyzed in terms of recognition rate, number of support vectors, and training and testing time. From a statistical viewpoint, the method analyzes amino acid composition (AAC) and takes the positions of amino acids in the protein sequence into account. It uses the amino acid coordinate mean (AAM) and coordinate variance (AAV) to represent, respectively, the expectation and variance of each amino acid's positions in a protein sequence. Experiments on two classical databases show that MD represents the positional information of amino acid residues in a protein sequence more effectively.
     2. A feature representation, amino acid composition distribution (AACD), is proposed. An imbalance index and a training algorithm with weighted penalty coefficients are presented to analyze the prediction performance of SVMs on imbalanced datasets. The method divides a protein sequence equally into multiple segments and calculates the AAC of each segment in turn. In this way it captures the AAC of each segment and also reflects the interactions among segments. The experiments show that the information of all segments is more useful than that of the whole sequence. They also show that AACD effectively represents the interactions among the segments of a protein sequence. In addition, the proposed training algorithm mitigates the negative effect of class imbalance.
     3. A feature representation, multi-scale energy (MSE), is proposed to handle the non-stationarity of protein physicochemical signals. The method converts a protein sequence into a digital signal by mapping each residue to its numerical value in an amino acid index. Using a wavelet transform based on multiresolution analysis, the signal is decomposed with Mallat's algorithm. The root-mean-square energy factors are then calculated and concatenated into a feature vector that represents the approximation and detail information of the signal. Experiments show that MSE represents the physicochemical properties of proteins more effectively and has lower computational complexity than other methods.
     4. Based on a multiple classifier system (MCS), a novel method for predicting protein subcellular localization is introduced to handle high feature dimensionality and disagreement among multiple features. The method aggregates multiple groups of features, fuses the complementary information of patterns, and decreases the uncertainty of individual classifiers. It also avoids the difficulty of designing a classifier for high-dimensional vectors and the associated computational burden. The experimental results show that the method outperforms every individual classifier and is more effective and robust than other methods.
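Contribution 2 also defines an imbalance index and weights the SVM penalty coefficient per class. The abstract gives neither formula. The sketch below assumes the ratio of the largest to the smallest class size as the index. For the weights it assumes inverse class frequency, the same heuristic scikit-learn uses for `class_weight="balanced"`. It is an illustration, not the thesis's exact scheme, and the function names are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels) -> float:
    """Largest class size divided by smallest (one possible imbalance index)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def class_penalties(labels, C: float = 1.0) -> dict:
    """Per-class SVM penalty C_i = C * n / (k * n_i) (assumed weighting).

    Rarer classes get a larger penalty, and every class contributes the
    same total weight n_i * C_i = C * n / k to the slack-penalty term.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: C * n / (k * cnt) for cls, cnt in counts.items()}
```

In scikit-learn, `SVC(C=1.0, class_weight=class_penalties(y))` applies these values as multipliers, because `SVC` sets the penalty of class i to `class_weight[i] * C`.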
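The multi-scale energy of contribution 3 can be sketched with the Haar wavelet, chosen here only because its filters fit in two lines. The abstract names neither the wavelet, the decomposition depth, nor the amino acid index. In this sketch the Kyte-Doolittle hydropathy scale stands in for the index, and odd-length signals are padded by repeating the last sample. All function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy scale, used here only as an example amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def haar_step(x: list[float]) -> tuple[list[float], list[float]]:
    """One level of Mallat's pyramid with the orthonormal Haar filters."""
    if len(x) % 2:
        x = x + [x[-1]]  # assumed padding: repeat the last sample
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def rms(coeffs: list[float]) -> float:
    """Root-mean-square energy of one coefficient band."""
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))


def multiscale_energy(seq: str, levels: int = 3,
                      index: dict = KYTE_DOOLITTLE) -> list[float]:
    """[RMS(d1), ..., RMS(d_levels), RMS(a_levels)] of the mapped signal."""
    signal = [index[aa] for aa in seq.upper() if aa in index]
    if len(signal) < 2 ** levels:
        raise ValueError("sequence too short for the requested levels")
    features = []
    approx = signal
    for _ in range(levels):
        approx, detail = haar_step(approx)  # re-decompose the approximation
        features.append(rms(detail))
    features.append(rms(approx))
    return features
```

The Haar transform is orthonormal. When the signal length is a power of two, the squared RMS values weighted by their coefficient counts therefore add up to the signal energy, so the vector splits that energy scale by scale. For other wavelets, PyWavelets' `pywt.wavedec(signal, "db4", level=3)` returns the approximation and detail coefficient arrays.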
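Contribution 4 combines classifiers trained on different feature sets, for example one per representation above. The abstract does not say which combination rule is used. The sketch shows two standard ones: majority voting over crisp labels, and the mean rule over class posteriors. Class names such as `nuc` and `cyt` are placeholders.

```python
from collections import Counter


def majority_vote(predictions: list[str]) -> str:
    """Label predicted by most classifiers; ties go to the earliest classifier."""
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)


def mean_rule(posteriors: list[dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Average the class posteriors of all classifiers and take the argmax."""
    classes = posteriors[0].keys()
    avg = {c: sum(p[c] for p in posteriors) / len(posteriors) for c in classes}
    return max(avg, key=avg.get), avg
```

The two rules can disagree. Take three classifiers whose posteriors for `nuc` are 0.45, 0.45 and 0.95. They vote `cyt`, `cyt`, `nuc`, so majority voting picks `cyt`. The mean rule averages `nuc` to about 0.62 and picks `nuc`, because it keeps the third classifier's confidence.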