Research on Several Bioinformatics Problems Based on Ensemble Learning Algorithms
Abstract
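In the late 20th century, rapid advances in the genomics of humans and other species, together with progress in biological technology, produced an astonishing growth in biological information. This greatly enriched the data resources of the life sciences and gave rise to a new interdisciplinary field, bioinformatics, whose aim is to reveal the biological meaning contained in experimental data through its acquisition, processing, storage, retrieval, and analysis. Data mining techniques, which discover potentially useful knowledge in data, are playing an increasingly important role in bioinformatics research and have already yielded substantial results. This dissertation applies ensemble learning methods to several problems in bioinformatics. The main work is divided into four parts: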
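     1. Predicting protein structure and functional localization with ensemble learning algorithms. As biotechnology advances, more and more protein sequences are being determined, so theoretical and computational methods for studying protein structure and functional localization are of great importance. Starting from the primary sequence, protein sequences were encoded by amino acid composition, and two ensemble learning algorithms, AdaBoost and Bagging, were used to predict protein structural class, membrane protein type, and protein subcellular location. During modeling, three weak learning algorithms (RandomForest, KNN, and C4.5) served as base classifiers, and the modeling parameters were optimized using results from 10-fold cross-validation. The results show that: (1) AdaBoost-RandomForest predicts protein structural class well; on the two benchmark datasets used, its leave-one-out accuracies reached 94.18% and 85.9%, better than previously reported results; (2) AdaBoost-C4.5 reached leave-one-out accuracies of 91.80% and 80.80% for the subcellular location of prokaryotic and eukaryotic proteins, respectively, better than previously reported results; (3) Bagging-KNN reached a leave-one-out accuracy of 84.42% for membrane protein type, better than previously reported results. Corresponding online prediction systems were also developed from these models.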
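     2. Studying the biological functions of small molecules with ensemble learning algorithms. Understanding the biological functions of small molecules helps explain life processes in molecular biology and helps clarify disease mechanisms in medicine. Because discovering these functions experimentally consumes large amounts of labor, material, and money, and involves a degree of blind search and risk, studying the problem with ensemble learning methods has practical value. We first studied the prediction of the metabolic pathway class of small molecules, proposing a small-molecule encoding based on functional-group composition and building the model with AdaBoost-C4.5; the cross-validation accuracy reached 74.05% and the accuracy on an independent test set reached 75.11%. We then studied the prediction of interactions between small molecules and enzymes, again modeling with AdaBoost-C4.5; the cross-validation accuracy reached 81.76% and the accuracy on an independent test set reached 83.35%. These results indicate that ensemble learning algorithms can be used to study the biological functions of small molecules and that the resulting models predict well. Corresponding online prediction systems were also developed from the metabolic pathway model and the small molecule-enzyme interaction model.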
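     3. Predicting the toxic action mechanisms of phenolic compounds with the AdaBoost ensemble learning algorithm. We collected 274 phenolic compounds from the literature, calculated 45 molecular descriptors, and used the CFS (Correlation-based Feature Subset) algorithm, which is based on mutual information gain, to select 9 of them. Using these 9 descriptors, we built AdaBoost models with C4.5, RandomTree, RandomForest, and KNN as base classifiers; after optimization and validation, C4.5 was chosen as the base classifier. Finally, the predictive performance was compared with that of SVM and KNN. The results show that AdaBoost predicts the toxic action mechanisms of phenolic compounds well, with accuracies of 96.3% in cross-validation and 92.8% on an independent test set. A corresponding online prediction system was built from this work.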
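     4. Predicting HIV-1 protease cleavage sites with a combined mRMR-KNN method. First, octapeptides were encoded using the 531 amino acid residue indices in AAindex, and the mRMR feature selection method was then used to obtain 500 features. On this basis, an improved Wrapper search method yielded a subset of 364 features. Finally, the nearest neighbor method (KNN) was used to build a model predicting HIV-1 protease cleavage sites, with accuracies of 91.3% in the leave-one-out test and 87.3% on an independent test set. A biological analysis of the 500 features showed that: (1) the P1 and P2' sites contribute most to the substrate specificity of HIV-1 protease; (2) the amino acid residues at the P1 site are mainly hydrophobic, whereas the residues at the P2' site are determined mainly by secondary structure. Both conclusions agree with earlier experimental findings in the literature. This work shows that mRMR combined with the improved Wrapper method can select features effectively on biological datasets; models built on this basis not only give satisfactory predictions but also use features with biological meaning. mRMR is therefore a promising feature selection method for bioinformatics.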
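The encoding and evaluation steps named in the parts above (amino acid composition, KNN, and the leave-one-out test) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the Euclidean distance, the default `k=1`, and the toy sequences in the usage note are all assumptions made for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of residue frequencies."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k nearest training vectors (Euclidean, assumed)."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(vectors, labels, k=1):
    """Leave-one-out test: predict each sample from all the others."""
    correct = 0
    for i in range(len(vectors)):
        rest_x = vectors[:i] + vectors[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if knn_predict(rest_x, rest_y, vectors[i], k) == labels[i]:
            correct += 1
    return correct / len(vectors)
```

A usage example on invented toy sequences would be `leave_one_out_accuracy([amino_acid_composition(s) for s in ["AAAAL", "AAALL", "GGGGP", "GGGPP"]], ["a", "a", "b", "b"])`. An ensemble such as Bagging would wrap `knn_predict` by voting over models trained on bootstrap resamples, which this sketch omits.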
In the late 20th century, with the rapid development of bioscience techniques, human genomics, and the genomics of other species, biological information grew at a surprising speed, which greatly enriched bioinformation resources and led to the birth of bioinformatics. In bioinformatics, researchers try to discover comprehensive biological knowledge by capturing, managing, storing, retrieving, and analyzing biological information. Data mining technology is used to extract potentially useful information from databases and is playing an increasingly important role in the study of bioinformatics. In this dissertation, ensemble learning methods were used to investigate several topics in bioinformatics. The main work consists of the following four parts:
     1. Using ensemble learning algorithms to predict protein structure and function types. With the success of the Human Genome Project, the number of protein sequences entering the data banks is increasing rapidly. The structures and functions of these proteins could be determined experimentally, but doing so is very time-consuming and nearly impracticable at this scale. Scientists have therefore been seeking theoretical or computational methods for predicting the structures and functions of proteins. In this dissertation, AdaBoost and Bagging were employed to classify or predict protein structures and functional locations based on the amino acid composition of the sequence. During modeling, four different weak machine learning methods were used to build models, and the modeling parameters were optimized based on the cross-validation results of the models. The results show that: (1) the best models, with prediction accuracies of 94.18% and 85.90%, were obtained using AdaBoost-RandomForest in leave-one-out cross-validation on two standard protein structure datasets, respectively; (2) the best models, with prediction accuracies of 91.80% and 80.80%, were obtained using AdaBoost-C4.5 in leave-one-out cross-validation for the subcellular location of prokaryotic and eukaryotic proteins, respectively; (3) the best model, with an accuracy of 84.42%, was obtained using Bagging-KNN in leave-one-out cross-validation for membrane proteins. All prediction accuracies obtained with the ensemble learning methods are better than previously reported results. Based on the models for predicting subcellular location and membrane protein type, two corresponding online web servers were established.
     2. Using ensemble learning algorithms to predict the metabolic pathways of small molecules and small molecule-enzyme interactions. First, based on the AdaBoost method and using functional-group composition as features, a novel approach is proposed to quickly map small chemical molecules to the metabolic pathways they may belong to. The model reached accuracies of 74.05% in the 10-fold cross-validation test and 75.11% in the independent-set test. Second, building on this work, enzymes were encoded by amino acid physicochemical properties, giving 160 features in total. These features were input into an AdaBoost classifier to predict interaction. The overall prediction accuracies, tested by 10-fold cross-validation and on an independent set, are 81.76% and 83.35%, respectively. Based on the models for predicting small-molecule metabolic pathways and small molecule-enzyme interactions, two corresponding online web servers were built.
     3. AdaBoost was employed to investigate the toxic action mechanisms of phenols based on molecular descriptors. A total of 274 phenols were collected from different references, and 45 descriptors were calculated. First, 9 descriptors were selected using the CFS (Correlation-based Feature Subset) method. Then C4.5, RandomTree, RandomForest, and K-nearest neighbors (KNN) were employed as base classifiers of AdaBoost to build models, and C4.5 was selected. Finally, the performance of AdaBoost was compared with that of the support vector machine (SVM) and KNN, which are among the most common algorithms used for SAR analysis. AdaBoost performed better than SVM and KNN in predicting the mechanism of toxicity of phenols from molecular descriptors. It can be concluded that AdaBoost has the potential to improve the performance of SAR analysis. We also developed an online web server for predicting the ecotoxicity mechanisms of phenols.
     4. Knowledge of the polyprotein cleavage sites of HIV protease will refine our understanding of its specificity, and the information thus acquired is useful for designing specific and efficient HIV protease inhibitors. Recently, a number of classifier creation and combination methods have been proposed to approach the HIV-1 protease specificity problem. The search for suitable inhibitors of HIV protease will be greatly expedited if an accurate, robust, and rapid method can be found for predicting the cleavage sites in proteins by HIV protease. In this work, HIV-1 protease was selected as the subject of study. Two hundred ninety-nine oligopeptides were chosen for the training set, and another sixty-three oligopeptides were taken as a test set. The peptides are represented by features constructed from AAindex. The mRMR (Maximum Relevance, Minimum Redundancy) method, combined with Incremental Feature Selection (IFS) and Feature Forward Search (FFS), was applied to find the two important cleavage sites and to select 364 important biochemical features by the jackknife test. Using KNN (K-nearest neighbors) with the selected features, a prediction model was obtained with accuracies of 91.3% in the jackknife cross-validation test and 87.3% in the independent-set test. It is expected that this feature selection scheme can serve as a useful auxiliary technique for finding effective inhibitors of HIV protease.
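The mRMR ranking step mentioned in part 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the dissertation: it assumes features have already been discretized, uses the common "relevance minus mean redundancy" form of the criterion, and omits the IFS and FFS stages. The names `mutual_information` and `mrmr_rank` are hypothetical.

```python
from collections import Counter
import math


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_score(j, columns, relevance, selected):
    """Relevance to the labels minus mean redundancy with chosen features."""
    if not selected:
        return relevance[j]
    redundancy = sum(
        mutual_information(columns[j], columns[s]) for s in selected
    ) / len(selected)
    return relevance[j] - redundancy


def mrmr_rank(columns, labels):
    """Greedy mRMR ordering; returns feature indices, best first."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining = list(range(len(columns)))
    selected = []
    while remaining:
        best = max(
            remaining,
            key=lambda j: mrmr_score(j, columns, relevance, selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

In an incremental feature selection loop, one would then evaluate a classifier (for example, KNN under the jackknife test) on the top 1, 2, 3, ... ranked features and keep the prefix with the best accuracy. A duplicated feature is pushed down the ranking because its redundancy with the copy already selected cancels its relevance.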
引文
【1】Dal P A, Dovier A, Will S. Introduction to the special issue on bioinformatics and constraints. Constraints[J]. 2008, 13: 1-2.
    【2】Kumar K. Introduction to bioinformatics. Journal of the Royal Statistical Society Series a-Statistics in Society[J]. 2008, 171: 761-762.
    【3】Xu Y, Markstein P. Computational Systems Bioinformatics Conference 2007. Introduction. J Bioinform Comput Biol[J]. 2008, 6: v-vi.
    【4】Akalin PK. Introduction to bioinformatics. Molecular Nutrition & Food Research[J]. 2006, 50: 610-619.
    【5】Broom M. Statistical methods in bioinformatics: an introduction. Journal of the Royal Statistical Society Series a-Statistics in Society[J]. 2006, 169: 170-170.
    【6】Jones DT, Sternberg MJE, Thornton JM. Introduction. Bioinformatics: from molecules to systems. Philosophical Transactions of the Royal Society B-Biological Sciences[J]. 2006, 361: 389-391.
    【7】Magoulas G, Dounias G. Introduction to the special issue on intelligent technologies in medicine and bioinformatics. Computers in Biology and Medicine[J]. 2006, 36: 1045-1048.
    【8】Broom M. Introduction to mathematical methods in bioinformatics. Journal of the Royal Statistical Society Series a-Statistics in Society[J]. 2005, 168: 461-462.
    【9】张阳德.生物信息学[M].北京:科学出版社,2004:20-30。
    【10】张成岗,贺福初.生物信息学方法与实践[M].北京:科学出版社, 2002:50-60。
    【11】王哲.生物信息学概论[M].北京:第四军医大学出版,26-29:2002。
    【12】Benson DA, Boguski MS, Lipman DJ, Ostell J. GenBank. Nucleic Acids Research[J]. 1997, 25: 1-6.
    【13】Stoesser G, Sterk P, Tuli MA, Stoehr PJ, Cameron GN. The EMBL Nucleotide Sequence Database. Nucleic Acids Research[J]. 1997, 25: 7-13.
    【14】Tateno Y, Gojobori T. DNA Data Bank of Japan in the age of information biology. Nucleic Acids Research[J]. 1997, 25: 14-17.
    【15】Tateno Y, Gojobori T. [Genome biology and DNA data bank]. Tanpakushitsu Kakusan Koso[J]. 1997, 42: 3052-3061.
    【16】Bairoch A, Apweller R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Research[J]. 1997, 25: 31-36.
    【17】Sidman KE, George DG, Barker WC, Hunt LT. The Protein Identification Resource (Pir). Nucleic Acids Research[J]. 1988, 16: 1869-1871.
    【18】张春霆.生物信息学的现状与展望.世界科技研究与发展[J]. 2000, 6: 17-20.
    【19】Lu G, Ni J. Highlighting computations in bioscience and bioinformatics: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB07). BMC Bioinformatics[J]. 2008, 9 Suppl 6: S1.
    【20】Wang J. Computational biology of genome expression and regulation - A review of microarray bioinformatics. Journal of Environmental Pathology Toxicology and Oncology[J]. 2008, 27: 157-179.
    【21】Saeys Y, Inza I, Larranaga P. A review of feature selection techniques in bioinformatics. Bioinformatics[J]. 2007, 23: 2507-2517.
    【22】Deng YP, Ni J, Zhang CY. Development of computations in bioscience and bioinformatics and its application: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB06). Bmc Bioinformatics[J]. 2006, 7
    【23】Ezziane Z. Applications of artificial intelligence in bioinformatics: A review. Expert Systems with Applications[J]. 2006, 30: 2-10.
    【24】Pal SK, Bandyopadhyay S, Ray SS. Evolutionary computation in bioinformatics: A review. Ieee Transactions on Systems Man and Cybernetics Part C-Applications and Reviews[J]. 2006, 36: 601-615.
    【25】Teufel A, Krupp M, Weinmann A, Galle PR. Current bioinformatics tools in genomic biomedical research (Review). International Journal of Molecular Medicine[J]. 2006, 17: 967-973.
    【26】Liew AWC, Yan H, Yang MS. Pattern recognition techniques for the emerging field of bioinformatics: A review. Pattern Recognition[J]. 2005, 38: 2055-2073.
    【27】Baldi P, Brunk S. Bioinformatics: The Machine Learning Approach[M].London:MIT Press, 2001:55-66.
    【28】Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol[J]. 1990, 215: 403-410.
    【29】Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research[J]. 1997, 25: 3389-3402.
    【30】Pearson WR, Lipman DJ. Improved Tools for Biological Sequence Comparison. Proceedings of the National Academy of Sciences of the United States of America[J]. 1988, 85: 2444-2448.
    【31】Pearson WR. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol[J]. 2000, 132: 185-219.
    【32】Thompson JD, Higgins DG, Gibson TJ. Clustal-W - Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Research[J]. 1994, 22: 4673-4680.
    【33】陈凯,朱钰.机器学习及其相关算法综述.统计与信息论坛[J]. 2007, 22: 105-112.
    【34】张晓龙,杨艳霞.机器学习在生物信息学中的应用.武汉科技大学学报(自然科学版)[J]. 2005, 28: 201-204.
    【35】Bansal AK. Bioinformatics in microbial biotechnology - a mini review. Microbial Cell Factories[J]. 2005, 4
    【36】Cai YD, Liu XJ, Xu XB, Chou KC. Support vector machines for the classification and prediction of beta-turn types. Journal of Peptide Science[J]. 2002, 8: 297-301.
    【37】Crum BA, Fraser T R. On the connection between chemical constitution and physiological action[M]. Edinburgh:Transactions of the Royal Society of Edinburgh, 1868: 151-203.
    【38】Balandin AA. Structural algebra in chemistry. Acta Physicochim[J]. 1940, 12: 447-479.
    【39】Meyer E. Vielseitige maschinelle Suchm?glichkeiten nach Strukturformeln, Teilstrukturen und Stoffklassen. Angewandte Chemie[J]. 1970, 82: 605-611.
    【40】Rouvray DH. Isomer enumeration methods. Chemical Society Reviews[J]. 1974, 3: 355-372.
    【41】Wiener H. Structural determination of paraffin boiling points. Journal of American Chemical Society[J]. 1947, 69: 17-20.
    【42】Hansch C, Maloney PP, Fujita T, Muir RM. Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature[J]. 1962, 194: 178-180.
    【43】Hansch C, Muir RM, Fujita T, Maloney PP, Geiger F, Streich M. The correlation of biological activity of plant growth regulators and chloromycetin derivatives with Hammett constants and partition coefficients. Journal of American Chemical Society[J]. 1963, 85: 2817-2824.
    【44】Fujita T, Iwasa J, Hansch C. A new substituent constant,π, derived from partition coefficients. Journal of American Chemical Society[J]. 1964, 86: 5175-5180.
    【45】Free SM, Wilson JM. A mathematical contribution to structure-activity studies Journal of Medicinal Chemistry[J]. 1964, 7: 395-399.
    【46】Hansch C, Kurup A, Garg R, Gao H. Chem-bioinformatics and QSAR: A review of QSAR lacking positive hydrophobic terms. Chemical Reviews[J]. 2001, 101: 619-672.
    【47】Randic M. On characterization of molecular branching. Journal of American Chemical Society[J]. 1975, 97: 6609-6615.
    【48】Kier LB, Hall LH. Molecular Connectivity in Chemistry and Durg Research[M]. New York: Academic Press, 1976:100-153.
    【49】Kier LB, Hall LH. Molecular Connectivity in Sturcture-Activity Analysis[M].Letchworth England: Research Studies Press,1986:200-300.
    【50】Balaban AT. From chemical topology to 3D geometry. Journal of Chemical Information and Computer Sciences[J]. 1997, 37: 645-650.
    【51】Singh PP, Srivastava HK, Pasha FA. DFT-based QSAR study of testosterone and its derivatives. Bioorganic & Medicinal Chemistry[J]. 2004, 12: 171-177.
    【52】Liu WQ, Yi PG, Tang ZL. QSPR models for various properties of polymethacrylates based on quantum chemical descriptors. Qsar & Combinatorial Science[J]. 2006, 25: 936-943.
    【53】Cash GG. Prediction of physicochemical properties from Euclidean distance methods based on electrotopological state indices. Chemosphere[J]. 1999, 39: 2583-2591.
    【54】Rogers D, Hopfinger AJ. Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. Journal of Chemical Information and Computer Sciences[J]. 1994, 34: 854-866.
    【55】Luke BT. Evolutionary programming applied to the development of quantitative structure-activity relationships and quantitative structure-property relationships. Journal of Chemical Information and Computer Sciences[J]. 1994, 34: 1279-1287.
    【56】Aoyama T, Suzuki Y, Ichikawa H. Neural Networks Applied to Pharmaceutical Problems .3. Neural Networks Applied to Quantitative Structure Activity Relationship Analysis. Journal of Medicinal Chemistry[J]. 1990, 33: 2583-2590.
    【57】Agatonovic-Kustrin S, Beresford R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. Journal of Pharmaceutical and Biomedical Analysis[J]. 2000, 22: 717-727.
    【58】Burbidge R, Trotter M, Buxton B, Holden S. Drug design by machine learning: support vector machines for pharmaceutical data analysis. Computers & Chemistry[J]. 2001, 26: 5-14.
    【59】Yang SS, Lu WC, Chen NY, Hu QN. Support vector regression based QSPR for the prediction of some physicochemical properties of alkyl benzenes. Journal of Molecular Structure-Theochem[J]. 2005, 719: 119-127.
    【60】Crippen GM. Distance Geometry and Conformational Calculations, in Chemometric Research Studies[M].Chichester:Wiley, 1987:100-143.
    【61】Hopfinger AJ. A QSAR investigation of dihydrofolate reductase inhibition by Baker triazines based upon molecular shape analysis. Journal of American Chemical Society[J]. 1980, 102: 7196-7206.
    【62】Cramer RD, Patterson DE, Bunce JD. Comparative molecular field analysis (CoMFA). Effect of shape on binding of steroids to carrier proteins. Journal of American Chemical Society[J]. 1988, 110: 5959-5967.
    【63】史忠植.知识发现[M].北京:清华大学出版社, 2002: 1-265。
    【64】Sreerama KM. Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Mining and Knowledge Discovery[J]. 1998, 2: 345-389.
    【65】Hong JR. AE1: Extension matrix approximate method for the general covering problem. International Journal of Computer & Information Science[J]. 1985, 14: 421-437.
    【66】桂现才,彭宏,王小华. C4.5算法在保险客户流失分析中的应用.计算机工程与应用[J]. 2005, 17: 197-200。
    【67】胡学钢,李楠.基于属性重要度的随机决策树学习算法.合肥工业大学学报(自然科学版)[J]. 2007, 30: 681-685。
    【68】张华伟,王明文,甘丽新.基于随机森林的文本分类模型研究.山东大学学报(理学版)[J]. 2006, 41: 139-143。
    【69】Pearson WR. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol[J]. 2000, 132: 185-219.
    【70】Hansen LK, Slamon P. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence[J]. 1990, 12: 933-1001.
    【71】Sehapire RE. The strength of weak learnability. Machine Learning[J]. 1990, 5: 197-227.
    【72】Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J.Comput. System Sc.[J]. 1997, 55: 119-139.
    【73】Zhou ZH, Wu JX, Tang W. Ensembling neural networks: many could be better than all. Artificial Intelligence[J]. 2002, 137: 239-263.
    【74】Zhou ZH, Wu JX, Tang W, Chen ZQ. Combining regression estimators: GA-based selective neural network ensemble. International Journal of Computational Intelligence and Applications[J]. 2001, 1: 341-356.
    【75】Schapire RE, Freund Y, Bartlett P, Lee WS. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics[J]. 1998, 26: 1651-1686.
    【76】Freund Y. Boosting a weak algorithm by majority. Information and computation[J]. 1995, 121: 256-285.
    【77】Freund Y, Schapire RE. Large margin classification using the perceptron algorithm. Mach Learn[J]. 1999, 37: 277-296.
    【78】Freund Y, Iyer R, Schapire RE, Singer Y. An efficient boosting algorithm for combining preferences. J Mach Learn Res[J]. 2004, 4: 933-969.
    【79】Freund Y, Mansour Y, Schapire RE. Generalization bounds for averaged classifiers. Annals of Statistics[J]. 2004, 32: 1698-1722.
    【80】Freund Y, Schapire RE. Additive logistic regression: A statistical view of boosting - Discussion. Annals of Statistics[J]. 2000, 28: 391-393.
    【81】Schapire RE, Singer Y. Improved boosting algorithms using confidence-rated predictions. Mach Learn[J]. 1999, 37: 297–336.
    【82】Schapire RE. The Boosting Approach to Machine Learning : An Overview. MSRI Workshop on Nonlinear Estimation and classification[R].NJ: AT&T Labs ,2002:1-15.
    【83】Duffy N, Helmbold D. A geometric approach to leveraging weak learners. Theoretical Computer Science[J]. 2002, 284: 67-108.
    【84】Breiman L. Bagging predictors. Mach Learn[J]. 1996, 24: 123-140.
    【85】Vapnik V. Statistical learning theory[M]. New York, Wiley-Interscience, 1998:50-80.
    【86】Vapnik VNT. The nature of statistical learning theory[M].New York: Springer, 1995:77-89.
    【87】Burges CJC. A tutorial on Support Vector Machines for pattern recognition. Data Min Knowl Discov[J]. 1998, 2: 121-167.
    【88】王骏,王炜.物理学进展[J]. 1997, 17: 289。
    【89】靳利霞.蛋白质结构预测研究[M].大连:大连理工大学出版社,2002:10-15。
    【90】Chou KC. Progress in protein structural class prediction and its impact to bioinformatics and proteomics. Current Protein & Peptide Science. 2005, 6(5):423-36.
    【91】Cai YD, Li YX, Chou KC. Using neural networks for prediction of domain structural classes. Biochimica Et Biophysica Acta-Protein Structure and Molecular Enzymology[J]. 2000, 1476: 1-2.
    【92】Cai YD, Zhou GP. Prediction of protein structural classes by neural network. Biochimie[J]. 2000: 783~785.
    【93】Cai YD, Liu XJ, Xu XB, Chou KC. Support vector machines for prediction of protein domain structural class. J Theor Biol[J]. 2003, 221: 115-120.
    【94】Jin L X, Fang WW, Tang HW. Prediction of protein structural classes by a new measure of information discrepancy. Comput Biol Chem[J]. 2003, 27: 373~380.
    【95】Feng KY, Cai YD, Chou KC. Boosting classifier for predicting protein domain structural class. Biochem Biophys Res Communi[J]. 2005, 334: 213-217.
    【96】Bu WS, Feng ZP, Zhang ZD, Zhang CT. Prediction of protein(domain) structural classes based on amino acid index. European Journal of Biochemistry[J]. 1999, 266: 1043-1049.
    【97】Chou KC, Maggiora GM. Domain structural class prediction. Protein Engineering[J]. 1998, 11: 523-538.
    【98】Cai YD, Chou KC. Prediction of protein structural classes by neural network method. Internet Electronic Journal of Molecular Design[J]. 2002, 1: 332-338.
    【99】Cai YD, Liu XJ, Zhou GP. Support Vector Machines for predicting protein structural class. Bioinformatics[J]. 2001, 2: 1471-2105.
    【100】Shen HB, Yang J, Liu XJ, Chou KC. Using supervised fuzzy clustering to predict protein structural classes. Biochem Biophys Res Communi[J]. 2005, 334: 577-581.
    【101】寿天德.现代生物学导论[M].合肥:中国科学技术大学出版社, 1998:56-60。
    【102】徐晋麟,徐沁,陈淳.现代遗传学原理[M].北京:科学出版社, 2001:78-88。
    【103】韩贻仁.分子细胞生物学[M].北京:科学出版社, 2002:15-20。
    【104】Chou KC, Shen HB. Recent progress in protein subcellular location prediction. Analytical Biochemistry[J]. 2007, 370: 1-16.
    【105】李凤敏.蛋白质亚细胞定位的序列分析和理论预测算法研究[M].呼和浩特:内蒙古大学出版社,2004:50-60.
    【106】Nakashima H, Nishikawa K. Discrimination of intracellular and extracellular Proteins using amino acid composition and residue-pair frequencies. J Mol Biol [J]. 1994, 238: 54-61.
    【107】Cedano J, Aloy P, Perez-Pons J, Querol E. Relation between amino acid composition and cellular location of proteins. J. Mol. Biol[J]. 1997, 266: 594-600.
    【108】Reinhardt A, Hubbard T. Using neural networks for prediction of the subcellular localization of proteins. Nucleic Acids Research[J]. 1998, 26: 2230-2236.
    【109】Yuan Z. Prediction of protein subcellular localizations using Markov chain models. FEBS Letters[J]. 1999, 451: 23-26.
    【110】Cai YD, Chou KC. Using neural networks for prediction of subcellular location of prokaryotic and eukalyotic proteins. Mol Cell Biol Res Commun[J]. 2000, 4: 172-173.
    【111】Hua SJ, Sun ZR. Support vector machine approach for protein subcellular localization prediction. Bioinformatics[J]. 2001, 17: 721-728.
    【112】Cui Q, Jiang T, Liu B, Ma S. Esub8: A novel tool to predict protein subcellular localizations in eukaryotic organisms. BMC Bioinformatics[J]. 2004, 5: 66-72.
    【113】Cai YD, Liu XJ, Xu XB, Chou KC. Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect. Journal of Cellular Biochemistry[J]. 2002, 84: 343-348.
    【114】Chou KC. Prediction of protein cellular attributes using pseudo-aminoacid composition. Proteins: Struct Funct. Genet[J]. 2001, 43: 246-255.
    【115】Park KJ, Kanehisa M. Prediction of Protein Subcellular Locations by Support Vector Machines Using Compositions of Amino Acids and Amino Acid Pairs. Bioinformatics[J]. 2003, 19: 1656-1663.
    【116】Huang Y, Li YD. Prediction of protein subcellular locations using fuzzy k-NN method. Bioinformatics[J]. 2004, 20: 21-28.
    【117】Yu CS, Lin CJ, Hwang JK. Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci[J]. 2004, 13: 1402-1406.
    【118】Andrade MA, O'Donoghue SI, Rost B. Adaptation of protein surfaecs to subcellular location. J Mol Biol[J]. 1998, 276: 517-525.
    【119】Pan YX, Zhang ZZ, Guo ZM, Feng GY, Huang ZD, He L. Application of pseudo amino acid composition for predicting protein subcellular location: Stochastic signal processing approach. J Protein Chem[J]. 2003, 22: 395-402.
    【120】Pan YX, Li DW, Duan Y, et al. Predicting Protein Subcellular Location Using Digital Signal Processing. Acta Biochimica et. Biophysica Sinica[J]. 2005, 37: 88-96.
    【121】Gao Y, Shao S, Xiao X, et al. Using pseudo amino acid composition to predict protein subcellular location: Approached with Lyapunov index, Bessel function, and Chebyshev filter. Amino Acids[J]. 2005, 28: 373-376.
    【122】Liu H, Yang J, Wang M, Xue L, Chou KC. Using Fourier spectrum analysis and pseudo amino acid composition for prediction of membrane protein types. Protein Journal[J]. 2005, 24: 385-389.
    【123】Xiao X, Shao S, Ding Y, Huang Z, Huang Y, Chou KC. Using complexity measure factor to predict protein subcellular location. Amino Acids[J]. 2005, 28: 57-61.
    【124】Xiao X, Shao S, Ding Y, Huang Z, Chou KC. Using cellular automata images and pseudo amino acid composition to predict protein subcellular location. Amino Acids[J]. 2006, 30: 49-54.
    【125】Emanuelsson O, Nielsen H, Brunak S, von Heijne G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol Biol.[J]. 2000, 300: 1005-1016.
    【126】Nielsen H, Engelbrecht J, Brunak S, von Heijne G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Engineering[J]. 1997, 10: 1-6.
    【127】Emanuelsson O, Nielsen H, von Heijne G. a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci[J]. 1999, 8: 978-984.
    【128】Claros MG, Vineens P. Computational method to predict mitoehondrially imported proteins and their targeting sequenecs. Eur. J. Biochem[J]. 1996, 241: 779-778.
    【129】Bannai H, Tamada Y, Maruyama O, Nakai K, Miyano S. Extensive feature detection of N-terminal protein sorting signals. Bioinformatics[J]. 2002, 18: 298-305.
    【130】Kawashima S, Ogata H, Kanehisa M. AAindex: Amino Acid Index Database. Nucleic Acids Research[J]. 1999, 27: 368-369.
    【131】Mareotte EM, Xenarios I, van Der Bliek A, Eisenberg D. Localizing proteins in the cell from their phylogenetic profiles. Proc NatlAcad Sci. USA[J]. 2000, 97: 12115-12120.
    【132】Schultz J, Copley RR, Doerks T, Ponting CP, Bork P. SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res.[J]. 2000, 28: 231-234.
    【133】Mott R, Schultz J, Bork P, Ponting CP. Predicting protein cellular localization using a domain projection method. Genome Res[J]. 2002, 12: 1168-1174.
    【134】Cokol M, Nair R, Rost B, , l(5): . Finding nuclear localization signals. EMBO Rep[J]. 2000, 1: 411-415.
    【135】Cai YD, Chou KC. Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition. Biochem Biophys Res Communi[J]. 2003, 305: 407-411.
    【136】Lu Z, Szafron D, Greiner R, et al. Predicting subcellular localization of proteins using maehine-learned classifiers. Bioinformatics[J]. 2004, 20: 547-556.
    【137】Nair R, Rost B. Sequences conserved for subcellular localization. Protein Science[J]. 2002, 11: 2836-2847.
    【138】Scott MS, Thomas DY, Hallett MT. Predicting subcellular localization via protein motif co-occurrence. Genome Research[J]. 2004, 14: 1957-1966.
    【139】Nakai K, Kanehisa M. A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells. Genomics[J]. 1992, 14: 897-911.
    【140】Horton P, Nakai K. Better prediction of protein cellular localization sites with the k nearest neighbors classifier. Proc Int Conf Intell Syst Mol Biol[J]. 1997, 5: 147-152.
    【141】Gardy JL, Spencer C, Wang K, et al. PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Research[J]. 2003, 31: 3613-3617.
    【142】Bhasin M, Raghava GPS. ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Research[J]. 2004, 32: W414-W419.
    【143】Drawid A, Gerstein M. A Bayesian system integrating expression data with sequence patterns for localizing proteins: Comprehensive application to the yeast genome. J Mol Biol[J]. 2000, 301: 1059-1075.
    【144】Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer ELL. The Pfam protein families database. Nucleic Acids Research[J]. 2000, 28: 263-266.
    【145】Guda C, Fahy E, Subramaniam S. MITOPRED: a genome-scale method for prediction of nucleus-encoded mitochondrial proteins. Bioinformatics[J]. 2004, 20: 1785-1794.
    【146】Hoglund A, Donnes P, Blum T, Adolph HW, Kohlbacher O. MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition. Bioinformatics[J]. 2006, 22: 1158-1165.
    【147】Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucl. Acids Res[J]. 2000, 28: 45-48.
    【148】Reinhardt A, Hubbard T, :. Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res.[J]. 1998, 9: 2230–2236.
    【149】杨福愉,张旭家.生物膜蛋白三维结构研究的现状与展望.生物化学与生物物理进展[J]. 2000, 27: 965-975。
    【150】陶慰,孙李惟,姜涌明.蛋白质分子基础[M].北京:北京高等教育出版社, 1995:31-33。
    【151】Alberts B, Bray D, Lewis J, et al. Molecular Biology of the Cell[M].New York:Garland Publishing, 1994:74-99.
    【152】Lodish H, Baltimore D, A.Berk, S.L.Zipursky, P.Matsudaira, J.Darnell. Molecular Cell Biology[M]. New York: Scientific American Books, 1995:88-90.
    【153】Chou KC, Elrod DW. Prediction of membrane protein types and subcellular locations. Proteins Struct Funct Genet[J]. 1999, 34: 137-153.
    【154】Chou KC. Prediction of protein cellular attributes using pseudo-amino acid composition Proteins Struct Funct Genet[J]. 2001, 43: 246-255.
    【155】Chou PY. Amino acid composition of four classes of proteins[C]. Second Chemical Congress of the North American Continent.Las Vegas:Wiley,1980:111-146.
    【156】Chou PY. Prediction of protein structural classes from amino acid composition[M]. New York: Plenum Press,1989:45-55.
    【157】Nakashima H, Nishikawa K, Ooi T. The folding type of a protein is relevant to the amino acid composition Journal of Biochemistry[J]. 1986, 99: 152–162.
    【158】Cai YD, Zhou GP, Chou KC. Support vector machines for predicting membrane protein types by using functional domain composition. Biophysical Journal[J]. 2003, 84: 3257-3263.
    【159】Cai YD, Ricardo PW, Jen CH, Chou KC. Application of SVM to predict membrane protein types. J Theor Biol[J]. 2004, 226: 373-376.
    【160】Mahalanobis PC. On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India[J]. 1936, 2: 49–55.
    【161】Chou KC. A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space. Proteins[J]. 1995, 21: 319-344.
    【162】Cai YD, Liu XJ, Chou KC. Artificial neural network model for predicting membrane protein types. J Biomol Struct Dyn.[J]. 2001, 18: 607-610.
    【163】Klein P. Prediction of protein structural class by discriminant analysis. Biochim Biophys Acta[J]. 1986, 874: 205-215.
    【164】Zhang CT, Chou KC. An optimization approach to predicting protein structural class from amino acid composition. Protein Sci[J]. 1992, 1: 401-408.
    【165】Zhou GF, Zhang CT. A weighting method for predicting protein structural class from amino acid composition. Eur J Biochem[J]. 1992, 210: 747-749.
    【166】Zhang CT, Chou KC, Maggiora K. Predicting protein structural classes from amino acid composition: application of fuzzy clustering. Protein Engineering[J]. 1995,
    【167】Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucl Acids Res[J]. 2000, 28: 45-48.
    【168】Chou K, Zhang C. Prediction of protein structural classes. Critical Reviews in Biochemistry and Molecular Biology[J]. 1995, 30: 275-349.
    【169】Chen C, Tian YX, Zou XY, Cai PX, Mo JY. Using pseudo-amino acid composition and support vector machine to predict protein structural class. J Theor Biol[J]. 2006, 243: 444-448.
    【170】Chen C, Zhou XB, Tian YX, Zou XY, Cai PX. Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network. Analytical Biochemistry[J]. 2006, 357: 116-121.
    【171】Chen J, Liu H, Yang J, Chou KC. Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids[J]. 2007, 33: 423-428.
    【172】Chou KC, Shen HB. Hum-PLoc: A novel ensemble classifier for predicting human protein subcellular localization. Biochemical and Biophysical Research Communications[J]. 2006, 347: 150-157.
    【173】Chou KC, Shen HB. Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-Nearest Neighbor classifiers. Journal of Proteome Research[J]. 2006, 5: 1888-1897.
    【174】Chou KC, Shen HB. Euk-mPLoc: A fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. J Proteome Res[J]. 2007, 6: 1728-1734.
    【175】Chou KC, Shen HB. Large-scale plant protein subcellular location prediction. Journal of Cellular Biochemistry[J]. 2007, 100: 665-678.
    【176】Chou KC, Shen HB. MemType-2L: A Web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. Biochem Biophys Res Communi[J]. 2007, 360: 339-345.
    【177】Chou KC, Shen HB. Signal-CF: A subsite-coupled and window-fusing approach for predicting signal peptides. Biochem Biophys Res Communi[J]. 2007, 357: 633-640.
    【178】Diao Y, Ma D, Wen Z, Yin J, Xiang J, Li M. Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel-Ziv complexity. Amino Acids[J]. 2008, 34: 111-117.
    【179】Ding YS, Zhang TL, Chou KC. Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein Pept Lett[J]. 2007, 14: 811-815.
    【180】Du PF, Li YD. Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinformatics[J]. 2006, 7: 518.
    【181】Fang Y, Guo Y, Feng Y, Li M. Predicting DNA-binding proteins: approached from Chou's pseudo amino acid composition and other specific sequence features. Amino Acids[J]. 2008, 34: 103-109.
    【182】Guo YZ, Li M, Lu M, et al. Classifying G protein-coupled receptors and nuclear receptors on the basis of protein power spectrum from fast Fourier transform. Amino Acids[J]. 2006, 30: 397-402.
    【183】Kedarisetti KD, Kurgan L, Dick S. Classifier ensembles for protein structural class prediction with varying homology. Biochem Biophys Res Communi[J]. 2006, 348: 981-988.
    【184】Li FM, Li QZ. Using pseudo amino acid composition to predict protein subnuclear location with improved hybrid approach. Amino Acids[J]. 2008, 34: 119-125.
    【185】Lin H, Li QZ. Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant. Biochem Biophys Res Communi[J]. 2007, 354: 548-551.
    【186】Lin H, Li QZ. Using pseudo amino acid composition to predict protein structural class: Approached by incorporating 400 dipeptide components. J Comput Chem[J]. 2007, 28: 1463-1466.
    【187】Liu DQ, Liu H, Shen HB, Yang J, Chou KC. Predicting secretory protein signal sequence cleavage sites by fusing the marks of global alignments. Amino Acids[J]. 2007, 32: 493-496.
    【188】Mondal S, Bhavna R, Babu RM, Ramakumar S. Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification. J Theor Biol[J]. 2006, 243: 252-260.
    【189】Niu B, Cai YD, Lu WC, Li GZ, Chou KC. Predicting protein structural class with AdaBoost learner. Protein Pept Lett[J]. 2006, 13: 489-492.
    【190】Shen HB, Chou KC. Using ensemble classifier to identify membrane protein types. Amino Acids[J]. 2007, 32: 483-488.
    【191】Shen HB, Chou KC. Hum-mPLoc: An ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem Biophys Res Communi[J]. 2007, 355: 1006-1011.
    【192】Shen HB, Chou KC. Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins. Protein Engineering Design & Selection[J]. 2007, 20: 39-46.
    【193】Shen HB, Chou KC. Virus-PLoc: A fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. Biopolymers[J]. 2007, 85: 233-240.
    【194】Shen HB, Yang J, Chou KC. Euk-PLoc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction. Amino Acids[J]. 2007, 33: 57-67.
    【195】Shi JY, Zhang SW, Pan Q, Cheng YM, Xie J. Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids[J]. 2007, 33: 69-74.
    【196】Sun XD, Huang RB. Prediction of protein structural classes using support vector machines. Amino Acids[J]. 2006, 30: 469-475.
    【197】Tan F, Feng X, Fang Z, Li M, Guo Y, Jiang L. Prediction of mitochondrial proteins based on genetic algorithm - partial least squares and support vector machine. Amino Acids[J]. 2007, 33: 669-675.
    【198】Wang M, Yang J, Chou KC. Using string kernel to predict signal peptide cleavage site based on subsite coupling model. Amino Acids[J]. 2005, 28: 395-402.
    【199】Wen Z, Li M, Li Y, Guo Y, Wang K. Delaunay triangulation with partial least squares projection to latent structures: a model for G-protein coupled receptors classification and fast structure recognition. Amino Acids[J]. 2007, 32: 277-283.
    【200】Xiao X, Chou K-C. Digital coding of amino acids based on hydrophobic index. Protein Pept Lett[J]. 2007, 14: 871-875.
    【201】Xiao X, Shao S, Ding Y, Huang Z, Chen X, Chou KC. Using cellular automata to generate image representation for biological sequences. Amino Acids[J]. 2005, 28: 29-35.
    【202】Zhang SW, Pan Q, Zhang HC, Shao ZC, Shi JY. Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion. Amino Acids[J]. 2006, 30: 461-468.
    【203】Zhang TL, Ding YS. Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes. Amino Acids[J]. 2007, 33: 623-629.
    【204】Zhou GP. An intriguing controversy over protein structural class prediction. J Protein Chem[J]. 1998, 17: 729-738.
    【205】Zhou GP, Assa-Munt N. Some insights into protein structural class prediction. Proteins-Structure Function and Genetics[J]. 2001, 44: 57-59.
    【206】Zhou GP, Doctor K. Subcellular location prediction of apoptosis proteins. Proteins Struct Funct Genet[J]. 2003, 50: 44-48.
    【207】Zhou XB, Chen C, Li ZC, Zou XY. Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J Theor Biol[J]. 2007, 248: 546-551.
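    【208】Chen Zhongqiang, Zhu Yisheng, Li Yixue. Evaluation of methods for predicting transmembrane regions of membrane proteins. Acta Biochimica et Biophysica Sinica[J]. 2002, 34: 285-290.
    【209】Liu Qi, Zhu Yisheng, Li Yixue, et al. Research progress in the prediction of transmembrane protein topology. Foreign Medical Sciences: Biomedical Engineering[J]. 2001, 24: 197-201.
    【210】Li Ping, Li Yanda, Sun Zhirong, et al. Sequence analysis and prediction of membrane protein transmembrane segments based on wavelet analysis. 2000, 9: 577-585.
    【211】Zhao Guoping. Bioinformatics[M]. Beijing: Science Press, 2002: 41-43.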
    【212】Cristianini N, Shawe-Taylor J. An introduction to support vector machines[M].Cambridge: Cambridge University Press, 2000:88-113.
    【213】Zhou GP, Assa-Munt N. Some insights into protein structural class prediction. Proteins: Structure, Function, and Genetics[J]. 2001, 44: 57-59.
    【214】Cai YD. Is it a paradox or misinterpretation? Proteins: Structure, Function, and Genetics[J]. 2001, 43: 336-338.
    【215】Metzler DE. Biochemistry: The Chemical Reactions of Living Cells[M].London: Academic press, 1977:26-55.
    【216】Burkart MD. Metabolic engineering--a genetic toolbox for small molecule organic synthesis. Org Biomol Chem[J]. 2003, 1: 1-4.
    【217】Marchand-Geneste N, Watson KA, Alsberg BK, King RD. New Approach to Pharmacophore Mapping and QSAR Analysis Using Inductive Logic Programming. Application to Thermolysin Inhibitors and Glycogen Phosphorylase b Inhibitors[C]. Washington: ACS Publications, 2002: 399-409.
    【218】Boros LG, Boros TF. Use of metabolic pathway flux information in anticancer drug design. Ernst Schering Found Symp Proc[J]. 2007: 189-203.
    【219】Wishart DS, Tzur D, Knox C, et al. HMDB: the human metabolome database. Nucleic Acids Res[J]. 2007, 35: D521-D526.
    【220】Nicholson JK, Holmes E, Lindon JC, Wilson ID. The challenges of modeling mammalian biocomplexity. Nat Biotechnol[J]. 2004, 22: 1268-1274.
    【221】Teichmann SA, Rison SCG, Thornton JM, Riley M, Gough J, Chothia C. Small-molecule metabolism: an enzyme mosaic. Trends in Biotechnology[J]. 2001, 19: 482-486.
    【222】de Atauri P, Sorribas A, Cascante M. Analysis and prediction of the effect of uncertain boundary values in modeling a metabolic pathway. Biotechnology and Bioengineering[J]. 2000, 68: 18-30.
    【223】Girgis RR, Javitch JA, Lieberman JA. Antipsychotic drug mechanisms: links between therapeutic effects, metabolic side effects and the insulin signaling pathway. Molecular Psychiatry[J]. 2008, 13: 918-929.
    【224】Moreno-Sanchez R, Encalada R, Marin-Hernandez A, Saavedra E. Experimental validation of metabolic pathway modeling - An illustration with glycolytic segments from Entamoeba histolytica. Febs Journal[J]. 2008, 275: 3454-3469.
    【225】Pireddu L, Szafron D, Lu P, Greiner R. The Path-A metabolic pathway prediction web server. Nucleic Acids Research[J]. 2006, 34: W714-W719.
    【226】Anishetty S, Pulimi M, Pennathur G. Potential drug targets in Mycobacterium tuberculosis through metabolic pathway analysis. Computational Biology and Chemistry[J]. 2005, 29: 368-378.
    【227】Goto S, Nishioka T, Kanehisa M. LIGAND: Chemical Database for Enzyme Reactions. Bioinformatics[J]. 1998, 14: 591-599.
    【228】Kanehisa M, Goto S, Hattori M, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res[J]. 2006, 34: D354-357.
    【229】Cai YD, Chou KC. Artificial neural network model for predicting alpha-turn types. Analytical Biochemistry[J]. 1999, 268: 407-409.
    【230】Cai YD, Feng KY, Lu WC, Chou KC. Using LogitBoost classifier to predict protein structural classes. Journal of Theoretical Biology[J]. 2006, 238: 172-176.
    【231】Cai YD, Chou KC. Predicting subcellular localization of proteins in a hybridization space. Bioinformatics[J]. 2004, 20: 1151-1156.
    【232】Cai YD, Liu XJ, Chou KC. Artificial neural network model for predicting protein subcellular location. Computers & Chemistry[J]. 2002, 26: 179-182.
    【233】Chou KC, Cai YD. Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochemical and Biophysical Research Communications[J]. 2004, 320: 1236-1239.
    【234】Chou KC, Cai YD. Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo-amino acid composition. Journal of Cellular Biochemistry[J]. 2004, 91: 1197-1203.
    【235】Chou KC, Elrod DW. Prediction of cellular location of proteins. Abstracts of Papers of the American Chemical Society[J]. 1998, 216: 208-211.
    【236】Chou KC, Elrod DW. Protein subcellular location prediction. Protein Engineering[J]. 1999, 12: 107-118.
    【237】Chou KC, Shen HB. Large-scale predictions of gram-negative bacterial protein subcellular locations. J Proteome Res[J]. 2006, 5: 3420-3428.
    【238】Chou KC, Cai YD. Predicting protein-protein interactions from sequences in a hybridization space. J Proteome Res[J]. 2006, 5: 316-322.
    【239】Chou K-C, Shen H-B. Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms. Nat Protoc[J]. 2008, 3: 153-162.
    【240】Jia PL, Qian ZL, Zeng ZB, Cai YD, Li YX. Prediction of subcellular protein localization based on functional domain composition. Biochemical and Biophysical Research Communications[J]. 2007, 357: 366-370.
    【241】Niu B, Jin YH, Feng KY, Lu WC, Cai YD, Li GZ. Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins. Molecular Diversity[J]. 2008, 12: 41-45.
    【242】Cai YD, Chou KC. Predicting membrane protein type by functional domain composition and pseudo-amino acid composition. Journal of Theoretical Biology[J]. 2006, 238: 395-400.
    【243】Marchand-Geneste N, Watson KA, Alsberg BK, King RD. New Approach to Pharmacophore Mapping and QSAR Analysis Using Inductive Logic Programming. Application to Thermolysin Inhibitors and Glycogen Phosphorylase b Inhibitors[M]. Washington: ACS Publications, 2002: 399-409.
    【244】Brooksbank C, Cameron G, Thornton J. The European Bioinformatics Institute's data resources: towards systems biology. Nucleic Acids Res[J]. 2005, 33: 46–53.
    【245】Teichmann SA, Rison SCG, Thornton JM, Riley M, Gough J, Chothia C. Small-molecule metabolism: an enzyme mosaic. Trends in Biotech[J]. 2001, 19: 482-486.
    【246】Caspi R, Foerster H, Fulcher CA, et al. MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res[J]. 2006, 34: D511-D516.
    【247】Wishart DS, Tzur D, Knox C, et al. HMDB: the human metabolome database. Nucleic Acids Res[J]. 2007, 35: D521-D526.
    【248】Wheeler DL, Barrett T, Benson DA, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res[J]. 2007, 35: 5–12.
    【249】Kuhn M, von Mering C, Campillos M, Jensen LJ, Bork P. STITCH: interaction networks of chemicals and proteins. Nucleic Acids Res[J]. 2008, 36: 684-688.
    【250】Creighton TE. Proteins: Structures and Molecular Properties[M]. New York: Freeman, 1993: 88-111.
    【251】Mucchielli-Giorgi MH, Hazout S, Tuffery P. PredAcc: prediction of solvent accessibility. Bioinformatics[J]. 1999, 15: 176-177.
    【252】Tusnady GE, Simon I. Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Bio[J]. 1998, 283: 489-506.
    【253】Chou KC, Cai YD. Predicting protein structural class by functional domain composition. Biochem Biophys Res Communi[J]. 2004, 321: 1007-1009.
    【254】Dubchak I, Muchnik I, Mayor C, Dralyuk I, Kim SH. Recognition of a protein fold in the context of the SCOP classification. Proteins Struct Funct Genet[J]. 1999, 35: 401-407.
    【255】Chothia C, Finkelstein AV. The classification and origins of protein folding patterns. Annu. Rev. Biochem.[J]. 1990, 59: 1007–1039.
    【256】Frishman D, Argos P. Seventy-five percent accuracy in protein secondary structure prediction. Proteins Struct Funct Genet[J]. 1997, 27: 329–335.
    【257】Dubchak I, Muchnik I, Mayor C, Dralyuk I, Kim SH. Recognition of a protein fold in the context of the structural classification of proteins (SCOP) classification. Proteins[J]. 1999, 35: 401-407.
    【258】Mucchielli-Giorgi MH, Hazout S, Tuffery P. PredAcc: prediction of solvent accessibility. Bioinformatics[J]. 1999, 15: 176-177.
    【259】Bender ML, Brubacher LJ. Catalysis and enzyme action[M]. New York: McGraw-Hill, 1973: 211-255.
    【260】Hermann D. Bioorganic Chemistry. A chemical approach to enzyme action[M]. New York: Springer, 2005: 88-139.
    【261】Michael P, Andrew W. Organic and Bio-organic Mechanisms[M].Harlow: Addison Wesley Longman, 1997:155-188.
    【262】Bugg T. An Introduction to Enzyme and Coenzyme Chemistry [M].Oxford: Blackwell Publishing, 1997:44-66.
    【263】Kurt W, Helmut G. Persistent organic pollutants (POPs) in Antarctic fish: levels, patterns, changes. Chemosphere[J]. 2003, 53: 667-678.
    【264】Blaney FE, Naylor D, Woods J. Mambas - a Real-Time Graphics Environment for Qsar. Journal of Molecular Graphics[J]. 1993, 11: 157-165.
    【265】Netzeva TI, Pavan M, Worth AP. Review of (quantitative) structure-activity relationships for acute aquatic toxicity. Qsar & Combinatorial Science[J]. 2008, 27: 77-90.
    【266】Schultz TW, Cronin MTD, Walker JD, Aptula AO. Quantitative structure-activity relationships (QSARs) in toxicology: a historical perspective. Journal of Molecular Structure-Theochem[J]. 2003, 622: 1-22.
    【267】Seward JR, Cronin MTD, Schultz TW. Structure-toxicity analyses of Tetrahymena pyriformis exposed to pyridines - An examination into extension of surface-response domains. Sar and Qsar in Environmental Research[J]. 2001, 11: 489-512.
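    【268】Gao Wei, Wang Chao. A study of the toxicity of phenol and its derivatives. Journal of Suzhou Institute of Urban Construction and Environmental Protection[J]. 1998, 11: 38.
    【269】Dora R. Linear Solvation Energy Relationships for Toxicity of Selected Organic Chemicals to Daphnia pulex and Daphnia magna[C]. Proceedings of QSAR. Knoxville: Springer, 1988: 22-26.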
    【270】Rice-Evans C, Packer L. Flavonoids in health and disease[M].New York:Marcel Dekker, 1998:71-119.
    【271】Russom CL, Bradbury SP, Broderius SJ, Hammermeister DE, Drummond RA. Predicting modes of toxic action from chemical structure: acute toxicity in the fathead minnow (Pimephales promelas). Environ. Toxicol. Chem.[J]. 1997, 16: 948-967.
    【272】Aptula AO, Netzeva TI, Valkova IV, et al. Multivariate discrimination between modes of toxic action of phenols. Quantitative Structure-Activity Relationships[J]. 2002, 21: 12-22.
    【273】Bearden AP, Schultz TW. Structure-activity relationships for Pimephales and Tetrahymena: A mechanism of action approach. Environ. Toxicol. Chem.[J]. 1997, 16: 1311-1317.
    【274】Schultz TW, Applehans FM, Riggin GW. Structure-Activity-Relationships of Selected Pyridines .3. Log Kow Analysis. Ecotoxicology and Environmental Safety[J]. 1987, 13: 76-83.
    【275】Schultz T, Lin D, Wilke T, Arnold L. Practical applications of quantitative structure-activity relationships (QSAR) in environmental chemistry and toxicology[M]. Dordrecht, Netherlands: Kluwer Academic, 1990: 156-189.
    【276】Schultz T, Sinks G, Cronin M. Quantitative Structure–Activity Relationships in environmental sciences[M].Pensacola:SETAC Press, 1997: 329-342.
    【277】McKim JM, Bradbury SP, Niemi GJ. Fish Acute Toxicity Syndromes and Their Use in the Qsar Approach to Hazard Assessment. Environmental Health Perspectives[J]. 1987, 71: 171-186.
    【278】Veith GD, Broderius SJ. Rules for Distinguishing Toxicants That Cause Type-I and Type-Ii Narcosis Syndromes. Environmental Health Perspectives[J]. 1990, 87: 207-211.
    【279】Ren S. Determining the mechanisms of toxic action of phenols to Tetrahymena pyriformis. Environmental Toxicology[J]. 2002, 17: 119-127.
    【280】Flaten GR, Grung B, Kvalheim AM. Multi-way exploration of regular environmental monitoring surveys. Chemometrics and Intelligent Laboratory Systems[J]. 2005, 77: 104-114.
    【281】Geladi P, Hadjiiski L, Hopke P. Multiple regression for environmental data: nonlinearities and prediction bias. Chemometrics and Intelligent Laboratory Systems[J]. 1999, 47: 165-173.
    【282】Gerencer M, Burek V. Identification of HIV-1 protease cleavage site in human C1-inhibitor. Virus Research[J]. 2004, 105: 97-100.
    【283】LoPresti CA, O'Brien RF. Characterizing environmental pressures along the US/Mexico border: An application of the Toxics Release Inventory in environmetrics. Chemometrics and Intelligent Laboratory Systems[J]. 1997, 37: 95-111.
    【284】Lundstedt-Enkel K, Gabrielsson J, Olsman H, Seifert E, Pettersen J, Lek PM, et al. Different multivariate approaches to material discovery, process development, PAT and environmental process monitoring. Chemometrics and Intelligent Laboratory Systems[J]. 2006, 84: 201-207.
    【285】Paakkunainen M, Reinikainen SP, Minkkinen P. Estimation of the variance of sampling of process analytical and environmental emissions measurements. Chemometrics and Intelligent Laboratory Systems[J]. 2007, 88: 26-34.
    【286】Romanenko SV, Larina LN, Larin SL. A non-statistical approach in systematic error estimation at some metal ions determination in environmental objects by stripping voltammetry. Chemometrics and Intelligent Laboratory Systems[J]. 2007, 88: 11-17.
    【287】Cronin MTD, Aptula AO, Duffy JC, et al. Comparative assessment of methods to develop QSARs for the prediction of the toxicity of phenols to Tetrahymena pyriformis. Chemosphere[J]. 2002, 49: 1201-1221.
    【288】Ren S. Ecotoxicity prediction using mechanism- and non-mechanism-based QSARs: a preliminary study. Chemosphere[J]. 2003, 53: 1053-1065.
    【289】Ren SJ. Modeling the toxicity of aromatic compounds to Tetrahymena pyriformis: The response surface methodology with nonlinear methods. Journal of Chemical Information and Computer Sciences[J]. 2003, 43: 1679-1687.
    【290】Wang XD, Yu JZ, Wang Y, Wang LS. Mechanism-based quantitative structure-activity relationships for the inhibition of substituted phenols on germination rate of Cucumis sativus. Chemosphere[J]. 2002, 46: 241-250.
    【291】Bearden AP, Schultz TW. Comparison of Tetrahymena and Pimephales toxicity based on mechanism of action. Sar and Qsar in Environmental Research[J]. 1998, 9: 127-153.
    【292】Zhao YH, Cronin MTD, Dearden JC. Quantitative structure-activity relationships of chemicals acting by non-polar narcosis - Theoretical considerations. Quantitative Structure-Activity Relationships[J]. 1998, 17: 131-138.
    【293】Schultz TW, Cronin MTD. Quantitative structure - Activity relationships for weak acid respiratory uncouplers to Vibrio fisheri. Environ. Toxicol. Chem.[J]. 1997, 16: 357-360.
    【294】Dash M, Liu H. Feature selection for classification. Intelligent Data Analysis[J]. 1997, 1: 131-156.
    【295】Anderssen E, Dyrstad K, Westad F, Martens H. Reducing over-optimism in variable selection by cross-model validation. Chemometrics and Intelligent Laboratory Systems[J]. 2006, 84: 69-74.
    【296】Gidskehaug L, Anderssen E, Alsberg BK. Cross model validated feature selection based on gene clusters. Chemometrics and Intelligent Laboratory Systems[J]. 2006, 84: 172-176.
    【297】Peng HC, Long FH, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell[J]. 2005, 27: 1226-1238.
    【298】Kohavi R, John G. Wrappers for feature subset selection. Artif Intell[J]. 1997, 97(1-2): 273-324.
    【299】Hall MA. Practical feature subset selection for machine learning[C]. Proceedings of the Twenty first Australian Computer Science Conference. Perth:Springer,1998:212-256.
    【300】Korf RE, Chickering DM. Best-first minimax search. Artificial Intelligence[J]. 1996, 84: 299-337.
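    【301】Xu Xiaojie, Hou Tingjun. Computer-Aided Drug Molecular Design[M]. Beijing: Chemical Industry Press, 2004: 476-488.
    【302】Appelt K. Crystal structures of HIV-1 protease-inhibitors complexes. Persp Drug Discov Des[J]. 1993, 1: 1993.
    【303】Chen Kaixian, Jiang Hualiang, Ji Ruyun. Computer-Aided Drug Design: Principles, Methods and Applications[M]. Beijing: Chemical Industry Press, 2000: 361-365.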
    【304】Coffin J. Nature[J]. 1986, 321: 10.
    【305】Kramer RA, Schaber MD, Skalka AM. HTLV-III gag protein is processed in yeast cells by the virus pol-protease. Science[J]. 1986, 231: 1580-1584.
    【306】Toh H, Ono M, Saigo K, Miyata T. Retroviral protease-like sequence in the yeast. Nature[J]. 1985, 315: 691.
    【307】Beck ZQ, Hervio L, Dawson PE, Elder JE, Madison EL. Identification of efficiently cleaved substrates for HIV-1 protease using a phage display library and use in inhibitor development. Virology[J]. 2000, 75: 9502-9508.
    【308】Schechter I, Berger A. On the size of the active site in proteases. Biochem Biophys Res Commun[J]. 1967, 27: 157-162.
    【309】Poorman RA, Tomasselli AG, Heinrikson RL, Kézdy FJ. A cumulative specificity model for proteases from human immunodeficiency virus types 1 and 2, inferred from statistical analysis of an extended substrate data base. J. Biol. Chem.[J]. 1991, 266: 14554-14561.
    【310】Cai YD, Chou KC. Artificial neural network model for predicting HIV protease cleavage sites in protein. Adv. Eng. Software[J]. 1998a, 29: 119-128.
    【311】Cai YD, Yu H, Chou KC. Using neural network for prediction of HIV protease cleavage sites in proteins. J. Protein Chem[J]. 1998b, 17: 607-615.
    【312】Narayanan A, Wu X, Yang ZR. Mining viral protease data to extract cleavage knowledge. Bioinformatics[J]. 2002, 18: 5-13.
    【313】Rögnvaldsson T, You L. Why neural networks should not be used for HIV-1 protease cleavage site prediction. Bioinformatics[J]. 2004, 20: 1702-1709.
    【314】Cai YD, Liu XJ, Xu XB, Chou KC. Support vector machines for predicting HIV protease cleavage sites in protein. J Comput Chem[J]. 2002, 23: 267-274.
    【315】Chou KC. Prediction of human immunodeficiency virus protease cleavage sites in proteins. Analytical Biochemistry[J]. 1996, 233: 1-14.
    【316】Kim HL, Oh B, Kimm K, Koh I. Prediction of phosphorylation sites using SVMs. Bioinformatics[J]. 2004, 20: 3179-3184.
    【317】Kawashima S, Ogata H, Kanehisa M. AAindex: Amino Acid Index Database. Nucleic Acids Res[J]. 1999, 27: 368-369.
    【318】Kawashima S, Kanehisa M. AAindex: amino acid index database. Nucleic Acids Res[J]. 2000, 28: 374.
    【319】Ding C, Peng HC. Minimum Redundancy Feature Selection from Microarray Gene Expression Data. Proc. Second IEEE Computational Systems Bioinformatics Conf[J]. 2003: 523-528.
    【320】Ding YS, Zhang TL, Chou KC. Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein and Peptide Letters[J]. 2007, 14: 811-815.
    【321】John G, Kohavi R, Pfleger K. Irrelevant features and the subset selection problem[C]. The Eleventh International Conference on Machine Learning. NJ: Morgan Kaufmann, 1994: 121-129.
    【322】Aha DW, Bankert RL. Feature selection for case-based classification of cloud types[C]. Working notes of the AAAI94 workshop on case-based reasoning. Seattle: Lawrence Erlbaum, 1994: 106-112.
    【323】Provan GM, Singh M. Learning Bayesian networks using feature selection[C]. Proc. 5th Intern. Workshop on AI and Statistics. New York, NY: Springer Verlag, 1995: 450-456.
    【324】Inza I, Larrañaga P, Sierra B. Feature subset selection by Bayesian networks based on optimization. Artificial Intelligence[J]. 2001, 123: 157-184.
    【325】Kohavi R, John G. Wrappers for feature subset selection. Artificial Intelligence[J]. 1997, 97: 273-324.
    【326】Nakai K, Kidera A, Kanehisa M. Cluster analysis of amino acid indices for prediction of protein structure and function. Protein Eng. [J]. 1988, 2: 93-100.
    【327】Jaskolski M, Tomasselli AG, Sawyer TK, Staples DG, Schneider RL, Wlodawer A. Structure at 2.5-A resolution of chemically synthesized human immunodeficiency virus type 1 protease complexed with a hydroxyethylene-based inhibitor. Biochem.[J]. 1991, 30: 1600-1609.
    【328】Wlodawer A, Gustchina A. Structural and biochemical studies of retroviral proteases. Biochim Biophys Acta[J]. 2000, 7: 16-34.
    【329】Ridky TW, Cameron CE, Cameron J, et al. Human immunodeficiency virus, type 1 protease substrate specificity is limited by interactions between substrate amino acids bound in adjacent enzyme subsites. J. Biol. Chem. [J]. 1996, 271: 4709-4717.