Research on Protein Subcellular Localization Prediction Based on Machine Learning Methods
Abstract
With the explosive growth of biological data, experimental methods for collecting and analyzing this information can no longer keep pace with the needs of research. It is now widely recognized that intelligent data processing techniques can greatly reduce the time and cost of this work. Protein sequence information is one of the main research focuses in this field. This thesis applies machine learning methods to protein subcellular localization prediction and protein structural class prediction. The main contributions are as follows:
     1. An improved selective ensemble of Elman neural networks is proposed for predicting the subcellular localization of Gram-negative bacterial proteins. First, the Elman network is used as the base classifier. Second, several different training algorithms are applied to the Elman networks to increase the diversity of the base classifiers. Finally, the GASEN algorithm selects a suitable subset of networks for the ensemble, so that the selected networks complement and coordinate with each other. Protein sequences are represented by their amino acid composition. Experimental results show that the method performs well in all three evaluation settings: the self-consistency test, the jackknife (leave-one-out) test, and the independent data set test.
     2. A novel prediction system, ELM-PCA, is designed for protein subcellular localization prediction. It determines in advance the parameter that reflects the sequence-order effect in the traditional pseudo amino acid composition (PseAAC) model. First, the parameter λ is set to its maximum value so that as much sequence-order information as possible is included. Second, principal component analysis (PCA) is used to extract the essential features. Finally, an Elman network is used as the classifier. Experimental results show that ELM-PCA outperforms existing prediction systems. In addition, PCA and PseAAC are combined into a new protein representation model, PPseAAC; experiments with several common machine learning algorithms show that the new model is superior to the original PseAAC model.
     3. An improved locally linear embedding (LLE) algorithm is proposed for protein structural class prediction. It overcomes the singularity that often arises when the traditional LLE algorithm solves for the optimal reconstruction weights. The improved algorithm is based on the conjugate gradient method, which converges in a finite number of steps and does not require any matrix inversion. The improved LLE algorithm is then applied to protein structural class prediction, using a simple k-nearest neighbor (k-NN) classifier, with the PseAAC parameter λ set greater than the sequence length L. Experimental results show that the proposed method performs well in the jackknife test.
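The selection step of contribution 1 can be illustrated with a simplified, GASEN-style sketch. It assumes that each trained Elman network has already produced class-label predictions on a held-out validation set (the networks themselves are not shown). A small genetic algorithm evolves one weight per network, using weighted-vote validation accuracy as the fitness, and the networks whose evolved weight exceeds a preset threshold (here 1/N, an assumption) are kept. The function names (`gasen_select`, `fitness`, `weighted_vote`), the GA operators (tournament selection, blend crossover, Gaussian mutation) and all parameter values are illustrative assumptions, not the thesis's implementation.

```python
import random
from collections import defaultdict


def weighted_vote(preds, weights, sample):
    """Label with the largest total weight for one validation sample."""
    scores = defaultdict(float)
    for clf_preds, w in zip(preds, weights):
        scores[clf_preds[sample]] += w
    return max(scores, key=scores.get)


def fitness(preds, weights, labels):
    """Weighted-vote accuracy on the validation set (higher is better)."""
    hits = sum(weighted_vote(preds, weights, s) == y for s, y in enumerate(labels))
    return hits / len(labels)


def normalize(w):
    """Rescale weights to sum to 1 (uniform if they are all zero)."""
    total = sum(w)
    return [t / total for t in w] if total > 0 else [1.0 / len(w)] * len(w)


def _tournament(rng, scored, size=3):
    """Return the fittest of `size` randomly drawn (fitness, weights) pairs."""
    return max(rng.sample(scored, size), key=lambda t: t[0])[1]


def gasen_select(preds, labels, pop_size=30, generations=40, seed=0):
    """Evolve one weight per network; keep networks whose weight exceeds 1/N."""
    rng = random.Random(seed)
    n = len(preds)
    pop = [normalize([rng.random() for _ in range(n)]) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(preds, w, labels), w) for w in pop),
                        key=lambda t: t[0], reverse=True)
        new_pop = [w for _, w in scored[:2]]  # elitism: keep the two best vectors
        while len(new_pop) < pop_size:
            a, b, mix = _tournament(rng, scored), _tournament(rng, scored), rng.random()
            child = [mix * s + (1 - mix) * t for s, t in zip(a, b)]  # blend crossover
            child = [min(1.0, max(0.0, c + rng.gauss(0.0, 0.05))) for c in child]  # mutation
            new_pop.append(normalize(child))
        pop = new_pop
    best = max(pop, key=lambda w: fitness(preds, w, labels))
    selected = [i for i, w in enumerate(best) if w > 1.0 / n]
    return best, selected


# Hypothetical validation-set predictions of four trained networks (labels 0-2).
labels = [0, 1, 2, 0, 1, 2]
preds = [
    [0, 1, 2, 0, 1, 2],  # network 0: always correct
    [0, 1, 2, 0, 1, 2],  # network 1: always correct
    [1, 2, 0, 1, 2, 0],  # network 2: always wrong
    [1, 2, 0, 1, 2, 0],  # network 3: wrong in the same way as network 2
]
weights, selected = gasen_select(preds, labels)
print(selected, fitness(preds, weights, labels))
```

The toy data make networks 2 and 3 agree on the same wrong answers, so an unweighted vote over all four networks ties 2 to 2 on every sample; this is the situation in which evolving weights and discarding low-weight members can help.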
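The feature side of the ELM-PCA system in contribution 2 (a maximal λ followed by PCA) can be sketched in plain Python. This sketch assumes Chou's original (type-1) PseAAC form with a single physicochemical property: the Kyte-Doolittle hydropathy scale is swapped in for the hydrophobicity, hydrophilicity and side-chain mass values of the original model, the weight factor w = 0.05 is an assumed default, and "maximal λ" is taken to mean one less than the shortest sequence in the set, since the standard definition requires λ < L. PCA is computed with power iteration and deflation to avoid external libraries. The Elman classifier is not shown, and `pseaac` and `pca` are hypothetical helper names.

```python
import math
import random

# Kyte-Doolittle hydropathy scale, used here as a stand-in for the properties
# of Chou's original PseAAC (an assumption of this sketch).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
AMINO = sorted(KD)
_MEAN = sum(KD.values()) / 20
_STD = math.sqrt(sum((v - _MEAN) ** 2 for v in KD.values()) / 20)
H = {a: (v - _MEAN) / _STD for a, v in KD.items()}  # standardized over the 20 residues


def pseaac(seq, lam, w=0.05):
    """20 composition terms followed by lam sequence-order correlation factors."""
    n = len(seq)
    if not 0 <= lam < n:
        raise ValueError("standard PseAAC needs 0 <= lam < len(seq)")
    freq = [seq.count(a) / n for a in AMINO]
    theta = [sum((H[seq[i + j]] - H[seq[i]]) ** 2 for i in range(n - j)) / (n - j)
             for j in range(1, lam + 1)]
    denom = sum(freq) + w * sum(theta)
    return [f / denom for f in freq] + [w * t / denom for t in theta]


def pca(rows, k, iters=300, seed=0):
    """Scores of the centred rows on their top-k principal components."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    mean = [sum(r[c] for r in rows) / n for c in range(d)]
    X = [[r[c] - mean[c] for c in range(d)] for r in rows]
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)] for a in range(d)]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start, not all-ones
        s = math.sqrt(sum(t * t for t in v))
        v = [t / s for t in v]
        for _ in range(iters):  # power iteration for the leading eigenvector of C
            u = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in u))
            if norm == 0.0:
                break
            v = [t / norm for t in u]
        ev = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        C = [[C[a][b] - ev * v[a] * v[b] for b in range(d)] for a in range(d)]  # deflation
        comps.append(v)
    return [[sum(x[c] * v[c] for c in range(d)) for v in comps] for x in X]


# Toy sequences made up for illustration, not taken from the thesis's data sets.
seqs = ["MKVLAAGIVALLLAAGCSS", "MSTNPKPQRKTKRNTNRRPQDVKFPGG", "MAHHHHHHVDDDDKMLGRPL"]
lam = min(len(s) for s in seqs) - 1  # largest lam valid for every sequence
X = [pseaac(s, lam) for s in seqs]
Z = pca(X, k=2)
print(len(X[0]), [[round(z, 4) for z in row] for row in Z])
```

A random starting vector is used for power iteration because every PseAAC vector sums to 1, so the centred data have no variance along the all-ones direction and an all-ones start would be annihilated by the covariance matrix.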
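The weight-solving step of contribution 3 can be sketched as follows. The abstract does not give the exact formulation, so this sketch assumes one natural choice: the reconstruction weights of a point x minimise wᵀGw subject to Σw = 1, where G is the local Gram matrix of the differences between x and its K neighbours, and conjugate gradient is run inside the constraint plane by projecting every residual onto vectors whose entries sum to zero. In exact arithmetic this terminates in at most K - 1 steps and never inverts G, so it also works when G is singular (for example when K exceeds the feature dimension), which is the case where standard LLE adds a small regulariser to G before solving. The names `lle_weights_cg` and `local_gram` and the tolerance are hypothetical, and the k-NN classification of the resulting embedding is not shown.

```python
def dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def matvec(G, v):
    return [dot(row, v) for row in G]


def project(v):
    """Remove the all-ones component so the entries of v sum to zero."""
    m = sum(v) / len(v)
    return [t - m for t in v]


def local_gram(x, neighbors):
    """G[j][k] = (x - eta_j) . (x - eta_k) for the K neighbours eta_j of x."""
    Z = [[xi - ni for xi, ni in zip(x, eta)] for eta in neighbors]
    return [[dot(zj, zk) for zk in Z] for zj in Z]


def lle_weights_cg(x, neighbors, tol=1e-12):
    """Minimise w^T G w subject to sum(w) == 1 by projected conjugate gradient.

    No inverse of G is formed and no regulariser is added, so a singular G
    (for example more neighbours than feature dimensions) is handled directly.
    """
    G = local_gram(x, neighbors)
    K = len(neighbors)
    w = [1.0 / K] * K                        # feasible start: weights already sum to 1
    r = [-t for t in project(matvec(G, w))]  # projected negative gradient of w^T G w / 2
    p = r[:]
    rr = rr0 = dot(r, r)
    for _ in range(K - 1):                   # at most K - 1 steps in exact arithmetic
        if rr <= tol * rr0:
            break
        Gp = matvec(G, p)
        pGp = dot(p, Gp)
        if pGp <= 0.0:
            break
        alpha = rr / pGp
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, project(Gp))]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return w


# Three neighbours in two dimensions: K > D, so G is singular.
x = [0.0, 0.0]
neighbors = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
w = lle_weights_cg(x, neighbors)
print([round(t, 6) for t in w])  # exact reconstruction is x = 0.5*eta_1 + 0.5*eta_2
```

Starting from the uniform weights keeps every iterate on the constraint plane, and because the reduced system is consistent even when G is singular, the iteration reaches a minimiser rather than diverging.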