Identification and Prediction of Biological Activities and Toxicities of Some Small Organic Molecules
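Abstract
In recent years, advances in genomics for humans and other species, in information technology, and in biological detection methods have made biological information resources increasingly abundant, and bioinformatics has emerged as a new interdisciplinary field. Theoretical researchers can process and store experimentally obtained data, analyze them with machine learning methods, and uncover hidden rules and patterns, thereby deepening understanding of the subject and revealing the biological meaning contained in the data. This dissertation adopts this approach to the identification and prediction of the biological activities and toxicities of some small organic molecules. The main work falls into three parts:
     Part I: Prediction of the biological functions of small molecules based on an ensemble learning algorithm
     Determining the biological functions of small molecules accurately and efficiently is a challenge, and research on predicting these functions is of considerable importance. In this part we apply an ensemble learning algorithm to the problem. We build models with the AdaBoost-C4.5 algorithm, encode small molecules by their functional group composition, and predict the metabolic pathway type of small molecules. Studying the biological functions of small molecules can help us understand disease mechanisms and life phenomena. The model built in this part shows good predictive performance: the cross-validation prediction accuracy is 73.71%, and the prediction accuracy on the independent test set is 73.8%. Based on the prediction model, we developed an online service for predicting the metabolic pathway type of small molecules; the web interface is at http://chemdata.shu.edu.cn/pathway/.
     Part II: Prediction of interactions between enzymes and small molecules in metabolic processes based on ensemble learning algorithms
     Information about interactions between enzymes and small molecules is very important for understanding their metabolic roles and other biological processes. Here we combine different classifiers, including AdaBoost, Bagging and KNN, and use a multi-classifier voting system to predict interactions between enzymes and small molecules in metabolic processes. The study shows that the voting system gives better predictions than any single classifier. The prediction accuracies on the training dataset and the independent test set were 82.8% and 84.8%, respectively. For interacting enzyme-small molecule pairs (the positive samples) in the independent test set, the prediction accuracy was 75.5%, 4 percentage points higher than previously reported in the literature. The prediction method proposed in this work has been deployed on a web server at http://chemdata.shu.edu.cn/small-enz/.
     Part III: Structure-activity relationship study of narcotic toxicity based on support vector regression
     In this part, we used support vector regression (SVR), multiple linear regression (MLR), partial least squares (PLS) and back-propagation artificial neural networks (BP-ANN) to study the quantitative structure-activity relationship of the toxicity of 39 narcotics. Molecular descriptors that support effective modeling were selected from a number of parameters computed by quantum chemistry. The root-mean-square errors of the resulting SVR, MLR, PLS and BP-ANN models were 0.283, 0.385, 0.392 and 0.466, respectively. The results show that the SVR model has higher prediction accuracy than the MLR, PLS and BP-ANN models. Support vector machines are expected to become a useful chemometric tool in structure-toxicity relationship research.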
With the development of genomics, information technology, and biological detection methods, the amount of biological information is increasing rapidly. These resources have given rise to a new interdisciplinary field, bioinformatics. Researchers explore biological knowledge by capturing, managing, depositing, retrieving and analyzing biological information. Data mining is used to extract potentially useful information from databases and plays an increasingly important role in bioinformatics. In this dissertation, ensemble learning methods are used to investigate the identification and prediction of the biological activities and toxicities of some small organic molecules. The main contributions of the dissertation can be summarized as follows.
     I. Prediction of biological function of small molecules based on ensemble learning algorithm
     Studies on the biological functions of small molecules can help us understand biological phenomena in molecular biology and disease mechanisms in medicine. Discovering these functions experimentally requires a great deal of manpower, materials and financial resources. In this study, an ensemble learning approach is proposed. Based on the AdaBoost method with functional group composition as the molecular encoding, it quickly maps small chemical molecules to the metabolic pathways to which they may belong. The model reached an accuracy of 73.71% in a 10-fold cross-validation test and 73.8% on an independent test set. We conclude that the proposed approach is promising for mapping unknown molecules to their possible metabolic pathways. Based on the models for predicting small molecules' metabolic pathways, an online predictor developed in our laboratory is available at http://chemdata.shu.edu.cn/pathway.
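     The boosting idea described above can be illustrated with a minimal sketch. It is not the dissertation's model: it assumes a binary task instead of multi-class pathway prediction, uses one-level decision stumps in place of C4.5 trees, and runs on made-up functional group counts. The feature order (hydroxyl, carboxyl, amino) and the labels are hypothetical.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """Decision stump: vote `polarity` if the feature exceeds the threshold."""
    return polarity if row[feature] > threshold else -polarity

def train_stump(X, y, weights):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                error = sum(
                    w for row, label, w in zip(X, y, weights)
                    if stump_predict(row, feature, threshold, polarity) != label
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, n_rounds=10):
    """Binary AdaBoost with labels in {+1, -1}; returns a list of weighted stumps."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        error, feature, threshold, polarity = train_stump(X, y, weights)
        if error >= 0.5:  # weak learner is no better than chance: stop
            break
        error = max(error, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, feature, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, feature, threshold, polarity))
            for row, label, w in zip(X, y, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted sum of stump votes."""
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: counts of (hydroxyl, carboxyl, amino) groups per molecule.
# Label +1 / -1 stands for two made-up pathway classes.
X_train = [[5, 0, 0], [4, 0, 0], [6, 1, 0], [0, 1, 1], [1, 1, 1], [0, 2, 1]]
y_train = [1, 1, 1, -1, -1, -1]

model = adaboost_fit(X_train, y_train)
print([adaboost_predict(model, row) for row in X_train])
```

     On data this small a single stump already separates the classes, so the later rounds add nothing; with real descriptors the reweighting step is what lets later learners focus on the molecules earlier ones got wrong.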
     II. Prediction of interaction between enzymes and small molecules in metabolic pathways with integrated multiple classifiers
     Information about interactions between enzymes and small molecules is important for understanding various metabolic bioprocesses. We applied a majority voting system that combines several classifiers, including AdaBoost, Bagging and KNN, to predict the interaction between enzymes and small molecules in metabolic pathways. The advantage of this strategy is that a predictor based on majority voting usually gives more reliable results than any single classifier. The prediction accuracies on the training dataset and the independent testing dataset were 82.8% and 84.8%, respectively. The prediction accuracy for the networking couples (interacting enzyme-small molecule pairs, i.e. the positive samples) in the independent testing dataset was 75.5%, about 4 percentage points higher than that reported in a previous study. An implementation of the proposed prediction method is available at http://chemdata.shu.edu.cn/small-enz.
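     Majority voting itself is simple to state in code. The sketch below assumes each base classifier has already produced a 0/1 prediction per enzyme-small molecule pair; the classifier outputs and the ground truth are invented for illustration and are not results from the dissertation.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label that most classifiers voted for.

    With an odd number of classifiers and two labels there are no ties;
    otherwise Counter.most_common breaks ties by first occurrence.
    """
    return Counter(votes).most_common(1)[0][0]

def vote_all(per_classifier):
    """Combine per-classifier prediction lists into one prediction per sample."""
    return [majority_vote(votes) for votes in zip(*per_classifier.values())]

def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical outputs: 1 = interacting pair, 0 = non-interacting pair.
outputs = {
    "adaboost": [1, 0, 1, 0],
    "bagging":  [1, 1, 0, 0],
    "knn":      [0, 1, 1, 0],
}
truth = [1, 1, 1, 0]

combined = vote_all(outputs)
for name, preds in outputs.items():
    print(name, accuracy(preds, truth))
print("vote", accuracy(combined, truth))
```

     The toy data is constructed so that each classifier errs on a different sample, which is the situation in which voting helps most; when base classifiers make the same mistakes, the vote inherits them.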
     III. Quantitative structure-toxicity relationship based on support vector regression for narcotics toxicities
     The quantitative structure-toxicity relationship of 39 narcotics was studied using support vector regression (SVR), multiple linear regression (MLR), partial least squares (PLS), and back-propagation artificial neural network (BP-ANN). The molecular descriptors contributing to toxicity were selected from various features obtained using quantum chemistry methods. The root-mean-square errors of the SVR, MLR, PLS and BP-ANN models were 0.283, 0.385, 0.392 and 0.466, respectively. The results indicate that the prediction accuracy of the SVR model is higher than that of the MLR, PLS and BP-ANN models. SVR is expected to be a useful chemometric tool in structure-toxicity relationship research.
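     For readers unfamiliar with the comparison metric, the sketch below shows how a root-mean-square error is computed and how the four reported values rank. It also shows the epsilon-insensitive loss of the standard epsilon-SVR formulation; the abstract does not say which SVR variant or parameters were used, so that function and its `epsilon` value are assumptions for illustration only.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss of standard epsilon-SVR: errors smaller than epsilon cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# RMSE values as reported in the abstract (lower is better).
reported_rmse = {"SVR": 0.283, "MLR": 0.385, "PLS": 0.392, "BP-ANN": 0.466}
ranking = sorted(reported_rmse, key=reported_rmse.get)
best_model = ranking[0]
print(ranking)

# Made-up observed/predicted toxicities, only to exercise the functions.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1))
```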
引文
【1】.张春霆,生物信息学的现状与展望[J].世界科技研究与发展. 2000, 6: 17-20.
    【2】.Broom M. Statistical methods in bioinformatics: an introduction. Journal of the Royal Statistical Society Series a-Statistics in Society. 2006, 169:170-3.
    【3】. Jones DT, Sternberg MJE, Thornton JM. Introduction. Bioinformatics: from molecules to systems. Philosophical Transactions of the Royal Society B-Biological Sciences. 2006, 361(1467):389-91.
    【4】.Magoulas G, Dounias G. Introduction to the special issue on intelligent technologies in medicine and bioinformatics. Computers in Biology and Medicine. 2006, 36(10):1045-8.
    【5】. Broom M. Introduction to mathematical methods in bioinformatics. Journal of the Royal Statistical Society Series a-Statistics in Society. 2005, 168:461-2.
    【6】.张阳德,生物信息学,2004,北京:科学出版社.
    【7】.张成岗,贺福初,生物信息学方法与实践,2002,北京:科学出版社.
    【8】.王哲,生物信息学概论,2002,北京:第四军医大学出版社.
    【9】.Benson DA, Boguski MS, Lipman DJ, Ostell J. GenBank. Nucleic Acids Research. 1997, 25(1):1-6.
    【10】.Stoesser G, Sterk P, Tuli MA, Stoehr PJ, Cameron GN. The EMBL Nucleotide Sequence Database.Nucleic Acids Research. 1997, 25(1):7-13.
    【11】.Tateno Y, Gojobori T. DNA Data Bank of Japan in the age of information biology. Nucleic Acids Research. 1997, 25(1):14-7.
    【12】.Tateno Y, Gojobori T. [Genome biology and DNA data bank]. Tanpakushitsu Kakusan Koso. 1997, 42(17 Suppl):3052-61.
    【13】.Bairoch A, Apweller R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Research. 1997, 25(1):31-6.
    【14】.Sidman KE, George DG, Barker WC, Hunt LT. The Protein Identification Resource (Pir). Nucleic Acids Research. 1988, 16(5):1869-71.
    【15】.桂现才,彭宏,王小华,C4.5算法在保险客户流失分析中的应用[J].计算机工程与应用, 2005, 17: 197-200.
    【16】.Lu G, Ni J. Highlighting computations in bioscience and bioinformatics: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB07). BMC Bioinformatics. 2008, 9 Suppl 6:S1.
    【17】.Wang J. Computational biology of genome expression and regulation - A review of microarray bioinformatics. Journal of Environmental Pathology Toxicology and Oncology. 2008,27(3):157-79.
    【18】.Saeys Y, Inza I, Larranaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007, 23:2507-17.
    【19】.Deng YP, Ni J, Zhang CY. Development of computations in bioscience and bioinformatics and its application: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB06). Bmc Bioinformatics. 2006, 7.
    【20】.Ezziane Z. Applications of artificial intelligence in bioinformatics: A review. Expert Systems with Applications. 2006, 30(1):2-10.
    【21】.Pal SK, Bandyopadhyay S, Ray SS. Evolutionary computation in bioinformatics: A review. Ieee Transactions on Systems Man and Cybernetics Part C-Applications and Reviews. 2006, 36:601-15.
    【22】.Teufel A, Krupp M, Weinmann A, Galle PR. Current bioinformatics tools in genomic biomedical research (Review). International Journal of Molecular Medicine. 2006, 17(6):967-73.
    【23】.Liew AWC, Yan H, Yang MS. Pattern recognition techniques for the emerging field of bioinformatics: A review. Pattern Recognition. 2005, 38(11):2055-73.
    【24】.Baldi P, Brunk S. Bioinformatics: The Machine Learning Approach. 2001. London: MIT Press;
    【25】.Etzold T and Argos P, SRS—an indexing and retrieval tool for flat file data libraries. Comput Applic Biosci 1993, 9 :. 49–57
    【26】.Schuler GD, Epstein JA, Ohkawa H, Kans JA. Entrez: molecular biology database and retrieval system, Methods Enzymol. 1996, 266:141-62
    【27】.Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990, 215(3):403-10.
    【28】.Pearson WR. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol. 2000, 132:185-219.
    【29】.Thompson JD, Higgins DG, Gibson TJ. Clustal -W - Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Research. 1994, 22(22):4673-80.
    【30】.陈凯,朱钰,机器学习及其相关算法综述.统计与信息论坛,2007, 22: 105-12.
    【31】.张晓龙,杨艳霞,机器学习在生物信息学中的应用.武汉科技大学学报(自然科学版),2005, 28:201-4.
    【32】.Bansal AK. Bioinformatics in microbial biotechnology - a mini review. Microbial Cell Factories. 2005, 4.
    【33】. Cai YD, Feng KY, Li YX, et al. Support Vector Machine for predicting alpha-turn types [J]. Peptides, 2003, 24(4): 629-630.
    【34】.徐光宪, 21世纪的化学是研究泛分子的科学,中国科学基金,2002:70-76.
    【35】.Crum-Brown A, Fraser T, R. . On the connection between chemical constitution and physiological action. 1868-1869: Transactions of the Royal Society of Edinburgh;
    【36】.Hansch C, Maloney PP, Fujita T, Muir RM. Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature. 1962, 194(4824):178-80.
    【37】.Hansch C, Muir RM, Fujita T, Maloney PP, Geiger F, Streich M. The correlation of biological activity of plant growth regulators and chloromycetin derivatives with Hammett constants and partition coefficients. Journal of American Chemical Society. 1963, 85(18):2817-24.
    【38】.Fujita T, Iwasa J, Hansch C. A new substituent constant,π, derived from partition coefficients. Journal of American Chemical Society. 1964, 86(23):5175-80.
    【39】.Free SM, Wilson JM. A mathematical contribution to structure-activity studies Journal of Medicinal Chemistry. 1964, 7(4):395-9.
    【40】.Hansch C, Kurup A, Garg R, Gao H. Chem-bioinformatics and QSAR: A review of QSAR lacking positive hydrophobic terms. Chemical Reviews. 2001, 101(3):619-72.
    【41】.Crippen GM. Distance Geometry and Conformational Calculations, in Chemometric Research Studies. 1987. Chichester: Wiley;
    【42】.Hopfinger AJ. A QSAR investigation of dihydrofolate reductase inhibition by Baker triazines based upon molecular shape analysis. Journal of American Chemical Society. 1980, 102(24):7196-206.
    【43】.Cramer RD, Patterson DE, Bunce JD. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. Journal of American Chemical Society. 1988, 110(18):5959-67.
    【44】.史忠植,知识发现,2002.北京:清华大学出版社;
    【45】.Sreerama KM. Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Mining and Knowledge Discovery. 1998, 2:345-89.
    【46】.Hong JR. AE1: Extension matrix approximate method for the general covering problem. International Journal of Computer & Information Science. 1985, 14(6):421-37.
    【47】.胡学钢,李楠,基于属性重要度的随机决策树学习算法,合肥工业大学学报(自然科学版),2007, 30:681-5.
    【48】.钮冰,基于集成学习算法的若干生物信息学问题研究,上海大学博士学位论文,2009:23-24.
    【49】.袁友浪,应用机器学习算法预报蛋白质与核酸的相互作用,上海大学硕士学位论文,2009:10-11.
    【50】.张华伟,王明文,甘丽新,基于随机森林的文本分类模型研究,山东大学学报(理学版),2006,41:139-43.
    【51】.Domine D., Devillers J., Chastrette M., Karcher W.. Non-linear mapping for structure-activity and structure-property modeling. Journal of Chemomatrics 1993, 7: 227-242
    【52】.Wang ZY, Jenq-Hwang, Kowalski Bruce R., ChemNets: Theory and Application, Analytical Chemistry, 1995, 67(9):1497-1504
    【53】.Ruffini R. et al., nUsing neural network for springback minimization in a channel forming process, SAE Trans. J. Mater. Manufacture, 1998, 107, 65
    【54】.Fukunaga K.. Introduction to statistical pattern recognition. Academic. New York; 1972
    【55】.陈念贻,钦佩,陈瑞亮,陆文聪,《模式识别在化学化工中的应用》,北京:科学出版社,2000
    【56】.Chen NY, Lu WC, Chemometric Methods Applied to Industrial Optimization and Materials Optimal Design, Chemometrics and intelligent laboratory systems, 1999, 45, 329-333
    【57】.Chen Nianyi, Lu Wencong, Software Package“Materials Designer”and its Application in Materials Research, IPMM’99, Hawaii, USA, July, 1999
    【58】.Lu WC, Yan LC, Chen NY, Pattern Recognition and ANNS Applied to the Formobility of Complex Idide, Journal of Molecular Science, 1995,11(1): 33
    【59】.刘亮,包新华,冯建星,陆文聪,陈念贻,α-唑基-α-芳氧烷基频哪酮(芳乙酮)及其醇式衍生物抗真菌活性的分子筛选,计算机与应用化学,2002,19(4) : 465
    【60】.陆文聪,包新华,吴兰,孔杰,阎立诚,陈念贻,二元溴化物系(MBr-M’Br2)中间化合物形成规律的逐级投影法研究,计算机与应用化学,.2002,19(4) : 474
    【61】.陆文聪,冯建星,陈念贻,二种过渡元素和一种非过渡元素间形成三元金属间化合物的规律,计算机与应用化学,2000,17(1) : 43
    【62】.陆文聪,阎立诚,陈念贻, PVPEC-PTC和V-PTC材料优化设计专家系统,计算机与应用化学,1996,13(1): 39
    【63】.Vapnik Vladimir N., The Nature of Statistical Learning Theory. Berlin, Springer, 1995
    【64】.Wan, Vincent; Campbell, William M., Support vector machines for speaker verification and identification, Neural Networks for Signal Processing - Proceedings of the IEEE Workshop 2, 2000:775-784
    【65】.Thorsten Joachims, Learning to Classify Text Using Support Vector Machines. Dissertation, Universitaet Dortmund, February 2001.
    【66】.Burbidge R, Trotter M, Buxton B, Holden S, Drug design by machine learning: support vector machines for pharmaceutical data analysis, Computer and Chemistry, 2001, 26 (1): 5-14
    【67】.Trotter MWB, Buxton BF, Holden SB, Support vector machines in combinatorial chemistry, Measurement and Control, 2001, 34(8): 235-239
    【68】.Van Gestel T, Suykens JAK, Baestaens DE, Lambrechts A, Lanckriet G, Vandaele B, De Moor B, Vandewalle J, Financial time series prediction using least squares support vector machines within the evidence framework, IEEE Transactions on Neural Networks, 2001, 12(4): 809-821
    【69】.V.Vapnik著,张学工译:统计学习理论的本质,北京,清华大学出版社, 2000
    【70】.袁亚湘,孙文瑜,《最优化理论与方法》,北京,科学出版社, 1999.
    【71】.Keerthi SS, Shevade SK, Bhattacharyya C, Murthy KRK, Improvement to Platt’s SMO Algorithm for SVM Classifier Design, Technical Report CD-99-14 Dept. of Mechanical and Production Engineering National University of Singapore, 1999.
    【72】.Platt JC, Fast Training of Support Vector Machines Using Sequential Minimal Optimization, Advances in Kernel Methods: Support Vector Machines (Edited by Scholkopf B, Burges C, Smola A), Cambridge MA, MIT Press 1998: 41-64.
    【73】.Smola Alex J, Scholkopf Bernhard, A Tutorial on Support Vector Regression, NeuroCOLT2 Technical Report Series NC2-TR-1998-030 (http//www.neurocolt.com), 1998.
    【74】.陶卿,曹进德,孙德敏,基于支持向量机分类的回归方法,软件学报, 2002, 13(5): 1024-1027.
    【75】.叶晨洲,数据挖掘算法泛化能力与软件平台的研究与应用,上海交通大学博士学位论文, 2002: 71-75.
    【76】.陈全,赵文辉,李洁,江雨燕,选择性集成学习算法的研究,计算机技术与发展,Vol.20,No.2,Feb.2010
    【77】.Zhou ZH, Wu JX, Tang W. Ensembling neural networks: many could be better than all. Artificial Intelligence. 2002, 137(1-2):239-63.
    【78】.Zhou ZH, Wu JX, Tang W, Chen ZQ. Combining regression estimators: GA-based selective neural network ensemble. International Journal of Computational Intelligence and Applications. 2001, 1(4):341-56.
    【79】.Hansen LK, Slamon P. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1990, 12(10):933-1001.
    【80】.Schapire RE. The strength of weak learnability. Machine Learning. 1990, 5(2):197-227.
    【81】.Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput System Sc. 1997, 55(1):119-39.
    【82】.Schapire RE. The Boosting Approach to Machine Learning : An Overview. MSRI Workshop on Nonlinear Estimation and classification; 2002.
    【83】.Schapire RE, Freund Y, Bartlett P, Lee WS. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics. 1998, 26(5):1651-86.
    【84】.Freund Y. Boosting a weak algorithm by majority. Information and computation. 1995,121(2):256-85.
    【85】.Freund Y, Schapire RE. Large margin classification using the perceptron algorithm. Mach Learn. 1999, 37(3):277-296.
    【86】.马冉冉,集成学习算法研究,山东科技大学硕士学位论文, 2010:26-31.
    【87】.付忠良,赵向辉,苗青,姚宇,基于属性组合的集成学习算法,计算机应用,Vol.30,No.2,Feb.2010
    【88】.陆文聪,李国正,刘亮,包新华,《化学数据挖掘方法与应用》,上海,化学工业出版社, 2011
    【89】.Dash MH L. Feature selection for classification. Intelligent Data Analysis. 1997, 1:131–56.
    【90】.Paakkunainen M, Reinikainen SP, Minkkinen P. Estimation of the variance of sampling of process analytical and environmental emissions measurements. Chemometrics and Intelligent Laboratory Systems. 2007, 88(1):26-34.
    【91】.Anderssen E, Dyrstad K, Westad F, Martens H. Reducing over-optimism in variable selection by cross-model validation. Chemometrics and Intelligent Laboratory Systems. 2006, 84(1-2):69-74.
    【92】.Gidskehaug L, Anderssen E, Alsberg BK. Cross model validated feature selection based on gene clusters. Chemometrics and Intelligent Laboratory Systems. 2006, 84(1-2):172-6.
    【93】.Peng HC, Long FH, Ding C. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE T PATTERN ANAL. 2005, 27(8):1226-38.
    【94】.Kohavi R, John G. Wrapper for Feature Subset Selection. Artif Intell. 1997, 1-2:273-324.
    【95】.Hall MA. Practical feature subset selection for machine learning. Proceedings of the Twenty first Australian Computer Science Conference: Springer; 1998.
    【96】.Burkart MD. Metabolic engineering-a genetic toolbox for small molecule organic synthesis. Org Biomol Chem. 2003, 1(1):1-4.
    【97】.Boros LG, Boros TF. Use of metabolic pathway flux information in anticancer drug design. Ernst Schering Found Symp Proc. 2007, (4):189-203.
    【98】.de Atauri P, Sorribas A, Cascante M. Analysis and prediction of the effect of uncertain boundary values in modeling a metabolic pathway. Biotechnology and Bioengineering. 2000, 68(1):18-30.
    【99】.Girgis RR, Javitch JA, Lieberman JA. Antipsychotic drug mechanisms: links between therapeutic effects, metabolic side effects and the insulin signaling pathway. Molecular Psychiatry. 2008, 13(10):918-29.
    【100】.Moreno-Sanchez R, Encalada R, Marin-Hernandez A, Saavedra E. Experimental validation of metabolic pathway modeling - An illustration with glycolytic segments from Entamoeba histolytica. Febs Journal. 2008, 275(13):3454-69.
    【101】.Pireddu L, Szafron D, Lu P, Greiner R. The Path-A metabolic pathway prediction web server.Nucleic Acids Research. 2006, 34:W714-W9.
    【102】.Anishetty S, Pulimi M, Pennathur G. Potential drug targets in Mycobacterium tuberculosis through metabolic pathway analysis. Computational Biology and Chemistry. 2005, 29(5):368-78.
    【103】.Goto S, Nishioka T, Kanehisa M. LIGAND: Chemical Database for Enzyme Reactions. Bioinformatics. 1998, 14:591-9.
    【104】.Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006, 34(Database issue):D354-7.
    【105】.Chou KC. A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space. Proteins. 1995, 21:319-344.
    【106】.Chou K, Zhang C. Prediction of protein structural classes. Critical Reviews in Biochemistry and Molecular Biology. 1995, 30:275-349.
    【107】.Chen C, Tian YX, Zou XY, Cai PX, Mo JY. Using pseudo-amino acid composition and support vector machine to predict protein structural class. Journal of Theoretical Biology. 2006, 243(3):444-448.
    【108】.Sun XD, Huang RB. Prediction of protein structural classes using support vector machines. Amino Acids. 2006, 30(4):469-75.
    【109】.Zhou GP, Assa-Munt N. Some insights into protein structural class prediction. Proteins Struct Funct Genet. 2001, 44(1):57-9.
    【110】.Cai YD, Chou KC. Artificial neural network model for predicting alpha-turn types. Analytical Biochemistry. 1999, 268(2):407-9.
    【111】.Cai YD, Feng KY, Lu WC, Chou KC. Using LogitBoost classifier to predict protein structural classes. Journal of Theoretical Biology. 2006, 238(1):172-6.
    【112】.Cai YD, Liu XJ, Xu XB, Chou KC. Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect. Journal of Cellular Biochemistry. 2002, 84(2):343-8.
    【113】.Gao Y, Shao S, Xiao X, Ding Y, Huang Y, Huang Z, et al. Using pseudo amino acid composition to predict protein subcellular location: Approached with Lyapunov index, Bessel function, and Chebyshev filter. Amino Acids. 2005, 28(4):373-376.
    【114】.Cai YD, Chou KC. Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition. Biochem Biophys Res Communi. 2003, 305(2):407-11.
    【115】.Chou KC, Shen HB. Large-scale plant protein subcellular location prediction. Journal of Cellular Biochemistry. 2007, 100(3):665-78.
    【116】.Shen HB, Chou KC. Gpos-PLoc: an ensemble classifier for predicting subcellular localization ofGram-positive bacterial proteins. Protein Engineering Design & Selection. 2007, 20(1):39-46.
    【117】.Shen HB, Chou KC. Virus-PLoc: A fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. Biopolymers. 2007, 85(3):233-40.
    【118】.Shi JY, Zhang SW, Pan Q, Cheng YM, Xie J. Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids. 2007, 33(1):69-74.
    【119】.Cai YD, Chou KC. Predicting subcellular localization of proteins in a hybridization space. Bioinformatics. 2004, 20(7):1151-6.
    【120】.Cai YD, Liu XJ, Chou KC. Artificial neural network model for predicting protein subcellular location. Computers & Chemistry. 2002, 26(2):179-82.
    【121】.Chou KC, Cai YD. Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochemical and Biophysical Research Communications. 2004, 320(4):1236-1239.
    【122】.Chou KC, Cai YD. Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo-amino acid composition. Journal of Cellular Biochemistry. 2004, 91(6):1197-203.
    【123】.Chou KC, Elrod DW. Prediction of cellular location of proteins. Abstracts of Papers of the American Chemical Society. 1998, 216:U208-U.
    【124】.Chou KC, Elrod DW. Protein subcellular location prediction. Protein Engineering. 1999, 12(2):107-18.
    【125】.Chou KC, Shen HB. Large-scale predictions of gram-negative bacterial protein subcellular locations. J Proteome Res. 2006, 5(12):3420-8.
    【126】.Chou KC, Cai YD. Predicting protein-protein interactions from sequences in a hybridization space. J Proteome Res. 2006, 5(2):316-22.
    【127】.Chou KC, Shen HB. Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms. Nat Protoc. 2008, 3(2):153-62.
    【128】.Jia PL, Qian ZL, Zeng ZB, Cai YD, Li YX. Prediction of subcellular protein localization based on functional domain composition. Biochemical and Biophysical Research Communications. 2007, 357(2):366-370.
    【129】.Niu B, Jin YH, Feng KY, Lu WC, Cai YD, Li GZ. Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins. Molecular Diversity. 2008, 12(1):41-5.
    【130】.Cai YD, Zhou GP, Chou KC. Support vector machines for predicting membrane protein types by using functional domain composition. Biophys J. 2003, 84(5):3257-63.
    【131】.Cai YD, Ricardo PW, Jen CH, Chou KC. Application of SVM to predict membrane protein types. J Theor Biol. 2004, 226(4):373-6.
    【132】.Cai YD, Liu XJ, Chou KC. Artificial neural network model for predicting membrane protein types. Journal of Biomolecular Structure & Dynamics. 2001, 18(4):607-10.
    【133】.Cai YD, Chou KC. Predicting membrane protein type by functional domain composition and pseudo-amino acid composition. Journal of Theoretical Biology. 2006, 238(2):395-400.
    【134】.Marchand-Geneste N, Watson KA, Alsberg BK, King RD. New Approach to Pharmacophore Mapping and QSAR Analysis Using Inductive Logic Programming. Application to Thermolysin Inhibitors and Glycogen Phosphorylase b Inhibitors. 2002. p. 399-409.
    【135】.Niu B, Jin YH, Lu L, Fen KY,Gu L, He ZS, Lu WC, Li YX, Cai Y, Prediction of interaction between small molecule and enzyme using AdaBoost. Mol. Divers. 2009, 13, (3), 313-320.
    【136】.Brooksbank C, Cameron G, Thornton J, The European Bioinformatics Institute's data resources: towards systems biology. Nucleic. Acids. Res. 2005, 33, 46–53.
    【137】.Sarah AT, Stuart CGR, Janet MT, Monica R, Julian G. , Cyrus C, Small-molecule metabolism: an enzyme mosaic. Trends Biotechnol. 2001, 19, (12), 482-486.
    【138】.Chou KC, Cai YD, A novel approach to predict active sites of enzyme molecules.Proteins 2004, 55, (1), 77-82.
    【139】.Chou KC, Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 2005, 21, (1), 10-19.
    【140】.Chou KC, Cai YD, Zhong WZ, Predicting networking couples for metabolic pathways of Arabidopsis. Experimental and Clinical Sciences International online journal for advances in science 2006, 5, 55-65.
    【141】.Cai YD, Muldoon M, Metabolic Pathway Modeling by Using the Nearest Neighbor Algorithm. MIMS EPrint 2007, (110), 1-21.
    【142】. Chen L, ZQ, Feng KY, Cai YD, Prediction of interactiveness between small molecules and enzymes by combining gene ontology and compound similarity. J. Comput. Chem. 2010, 31, (8), 1766-1776.
    【143】.Chen L, Lu L, Feng K, Li W, Song J, Zheng L, Yuan Y, Zeng Z, Lu W, Cai Y, Multiple classifier integration for the prediction of protein structural classes. J. Comput. Chem. 2009, 10.1002/jcc.21230
    【144】.Josef K, Mohamad H, Robert PWD, Jiri M, On Combining Classifiers. IEEE Computer Society 1998, 20, (3), 226-239.
    【145】.Ruta D, Gabrys B, Classifier selection for majority voting. Information Fusion 2005, 6, (1), 63-81.
    【146】.Rahman AFR, Alam H, Fairhurst MC, Multiple classifier combination for character recognition: Revisiting the majority voting system and its variations. Document Analysis System V, Proceedings 2002, 2423, 167-178.
    【147】.Chou KC, Shen HB, A New Method for Predicting the Subcellular Localization of EukaryoticProteins with Both Single and Multiple Sites: Euk-mPLoc 2.0. 2010, 5, e9931.
    【148】.Chou KC, Shen HB, Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization. PLoS ONE, 2010, 5, e11335.
    【149】.Chou KC, Shen HB, Euk-mPLoc: A fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. J. Proteome Res. 2007, 6, (5), 1728-1734.
    【150】.Chou KC, Shen HB, Hum-PLoc: A novel ensemble classifier for predicting human protein subcellular localization. Biochem. Bioph. Res. Co 2006, 348, (4), 1479-1479.
    【151】. Chou KC, Shen HB, Predicting protein subcellular location by fusing multiple classifiers. J. Cell Biochem. 2006, 99, (2), 517-527.
    【152】.Nanni L, Lumini A, Genetic programming for creating Chou's pseudo amino acid based features for submitochondria localization. Amino Acids 2008, 34, (4), 653-660.
    【153】.Nanni L, Mazzara S, Pattini, L.; Lumini, A., Protein classification combining surface analysis and primary structure. Protein Eng. Des. Sel. 2009, 22, (4), 267-272.
    【154】. Nanni L, Lumini A, Using ensemble of classifiers in Bioinformatics. Nova publisher: 2008; Vol. Machine Learning Research Progress.
    【155】.Schapire RE, SingerY, Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 1999, 37, 297–336.
    【156】.Niu B, Cai YD, Lu WC, Li GZ, Chou KC, Predicting protein structural class with AdaBoost learner. Protein Peptide. Lett. 2006, 13, (5), 489-492.
    【157】.Breiman L, Bagging predictors. Mach. Learn. 1996, 24, (2), 123-140.
    【158】.Dong LH, Yuan Y, Cai YD, Using bagging classifier to predict protein domain structural class. J. Biomol. Struct. Dyn. 2006, 24, (3), 239-242.
    【159】.Cristianini, N.; Shawe-Taylor, J., An introduction to support vector machines. Cambridge University Press: Cambridge, UK, 2000.
    【160】.Ding YS, Zhang TL, Chou KC, Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein Peptide. Lett. 2007, 14, (8), 811-815.
    【161】.Wang M, Yang J, Liu GP, Xu ZJ, Chou KC, Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition. Protein Eng. Des. Sel. 2004, 17, (6), 509-516.
    【162】.Zhou XB, Chen C, Li ZC, Zou XY, Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J. Theor. Biol. 2007, 248, 546-551.
    【163】.Shen HB, Chou KC, Predicting protein subnuclear location with optimized evidence-theoreticK-nearest classifier and pseudo amino acid composition. Biochem. Bioph. Res. Co 2005, 337, (3), 752-756.
    【164】.Shen HB, Yang J, Chou KC, Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition. J. Theor. Biol. 2006, 240, (1), 9-13.
    【165】.Quinlan R. , C4. 5: Programs for Machine Learning Morgan Kaufmann Publishers: San Mateo, CA, 1993.
    【166】.Freund Y, Iyer R, Schapire RE, Singer Y, An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res. 2004, 4, (6), 933-969.
    【167】.Ian H. Witten EF, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition). Morgan Kaufmann: San Francisco, 2005; p 525.
    【168】.Chou KC, Shen HB, Recent progress in protein subcellular location prediction. Anal. Biochem. 2007, 370, (1), 1-16.
    【169】.Chou KC, Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology. Current Proteomics 2009, 6, (4), 262-274.
    【170】.Chou KC, Prediction of protein cellular attributes using pseudo-amino acid composition Proteins 2001, 43, 246-255.
    【171】.Chou KC, Cai YD, Prediction and classification of protein subcellular location - Sequence-order effect and pseudo amino acid composition. J. Cell Biochem. 2003, 90, (6), 1250-1260.
    【172】.Shen HB, Chou KC, Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types. Biochem. Bioph. Res. Co 2005, 334, (1), 288-292.
    【173】.Wang SQ, Yang J, Chou KC, Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition. J. Theor. Biol. 2006, 242, (4), 941-946.
    【174】.Xiao X, Shao SH, Huang ZD, Chou KC, Using pseudo amino acid composition to predict protein structural classes: Approached with complexity measure factor. J. Comput. Chem. 2006, 27, (4), 478-482.
    【175】.Liu H, Yang J, Wang M, Xue L, Chou KC, Using Fourier spectrum analysis and pseudo amino acid composition for prediction of membrane protein types. Protein J. 2005, 24, (6), 385-389.
    【176】.Xiao X, Shao S, Ding Y, Huang Z, Chou KC, Using cellular automata images and pseudo amino acid composition to predict protein subcellular location. Amino Acids 2006, 30, (1), 49-54.
    【177】.Mundra P, Kumar M, Kumar KK, Jayaraman VK, Kulkarni BD, Using pseudo amino acid composition to predict protein subnuclear localization: Approached with PSSM. Pattern. Recogn. Lett. 2007, 28, (13), 1610-1615.
    【178】.Zhang SW, Pan Q, Zhang HC, Shao ZC, Shi JY, Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion. Amino Acids 2006, 30, (4), 461-468.
    【179】.Lin H, Li QZ, Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant. Biochem. Bioph. Res. Co 2007, 354, (2), 548-551.
    【180】.Lin H, Li QZ, Using pseudo amino acid composition to predict protein structural class: Approached by incorporating 400 dipeptide components. J. Comput. Chem. 2007, 28, (9), 1463-1466.
    【181】.Chen YL, Li QZ, Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition. J. Theor. Biol. 2007, 248, 377-381.
    【182】.Liu H, Wang M, Chou KC, Low-frequency Fourier spectrum for predicting membrane protein types. Biochem. Bioph. Res. Co 2005, 336, (3), 737-739.
    【183】.Shen HB, Chou KC, Ensemble classifier for protein fold pattern recognition. Bioinformatics 2006, 22, (14), 1717-1722.
    【184】.Shen HB, Yang J, Chou KC, Euk-PLoc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction. Amino Acids 2007, 33, (1), 57-67.
    【185】.Chou KC, Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem. Bioph. Res. Co 2000, 278, (2), 477-483.
    【186】.Xiao X, Shao SH, Ding YS, Huang ZD, Chen XJ, Chou KC, An application of gene comparative image for predicting the effect on replication ratio by HBV virus gene missense mutation. J. Theor. Biol. 2005, 235, (4), 555-565.
    【187】.Chou KC, Cai YD, Predicting protein quaternary structure by pseudo amino acid composition. Proteins 2003, 53, (2), 282-289.
    【188】.Chou KC, Cai YD, Predicting protein structural class by functional domain composition. Biochem. Bioph. Res. Co 2004, 321, (4), 1007-1009.
    【189】.Bugg T, An Introduction to Enzyme and Coenzyme Chemistry. Blackwell Publishing: Oxford, 1997.
    【190】.Zhang TL, Ding YS, Chou KC, Prediction of protein subcellular location using hydrophobic patterns of amino acid sequence. Comput. Biol. Chem. 2006, 30, (5), 367-371.
    【191】.Xiao X, Chou KC, Digital coding of amino acids based on hydrophobic index. Protein Pept. Lett. 2007, 14, (9), 871-875.
    【192】.Du QS, Li DP, He WZ, Chou KC, Heuristic molecular lipophilicity potential (HMLP): Lipophilicity and hydrophilicity of amino acid side chains. J. Comput. Chem. 2006, 27, (6), 685-692.
    【193】.Tusnady GE, Simon I, Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol. 1998, 283, 489-506.
    【194】.Mucchielli-Giorgi MH, Hazout S, Tuffery P, PredAcc: prediction of solvent accessibility. Bioinformatics 1999, 15, (2), 176-177.
    【195】.Creighton TE, Proteins: Structures and Molecular Properties, 2nd ed.; Freeman: New York, 1993.
    【196】.Gangardiwala A, Polikar R, Dynamically weighted majority voting for incremental learning and comparison of three boosting based approaches. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (IJCNN '05), 2005; vol. 2, pp 1131-1136.
    【197】.Zhou GP, An intriguing controversy over protein structural class prediction. J. Protein Chem. 1998, 17, (8), 729-738.
    【198】.Cai YD, He JF, Li XL, Feng KY, Lu L, Feng KR, Kong XY, Lu WC, Prediction of Protein Subcellular Locations with Feature Selection and Analysis. Protein Pept. Lett. 2010, 17, (4), 464-472.
    【199】.Esmaeili M, Mohabatkar H, Mohsenzadeh S, Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses. J. Theor. Biol. 2010, 263, (2), 203-209.
    【200】.Chou KC, Shen HB, Review: recent advances in developing web-servers for predicting protein attributes. Natural Science, 2009, 2, 63-92 (openly accessible at http://www.scirp.org/journal/NS/).
    【201】.Basak SC, Grunwald GD, Gute BD, Balasubramanian K. J. Chem. Inf. Comput. Sci., 2000, 40: 885.
    【202】.Basak SC, Bertelsen S, Grunwald GD. Toxicol. Lett., 1995, 79: 239.
    【203】.Baker JR, Mihelcic JR, Sabljic A. Reliable QSAR for estimating Koc for persistent organic pollutants: correlation with molecular connectivity indices. Chemosphere, 2001, 45: 213-221.
    【204】.Sabljic A. QSAR models for estimating properties of persistent organic pollutants required in evaluation of their environmental fate and risk. Chemosphere, 2001, 43: 363-375.
    【205】.Gramatica P, Pilutti P, Papa E. Validated QSAR prediction of OH tropospheric degradation of VOCs: splitting into training-test sets and consensus modeling. Journal of Chemical Information and Computer Sciences, 2004, 44 (5): 1794-1802.
    【206】.Netzeva TI, Dearden JC, Edwards R, Worgan ADP, Cronin MTD. QSAR analysis of the toxicity of aromatic compounds to Chlorella vulgaris in a novel short-term assay. Journal of Chemical Information and Computer Sciences, 2004, 44 (1): 258-265.
    【207】.Medven Z, Gusten H, Sabljic A. Comparative QSAR study on hydroxyl radical reactivity with unsaturated hydrocarbons: PLS versus MLR. Journal of Chemometrics, 1996, 10: 135-147.
    【208】.Luco JM, Ferretti FH. QSAR based on multiple linear regression and PLS methods for the anti-HIV activity of a large group of HEPT derivatives. Journal of Chemical Information and Computer Sciences, 1997, 37(2): 392-401.
    【209】.Tuppurainen K, Ruuskanen J. Electronic eigenvalue (EEVA): a new QSAR/QSPR descriptor for electronic substituent effects based on molecular orbital energies. A QSAR approach to the Ah receptor binding affinity of polychlorinated biphenyls (PCBs), dibenzo-p-dioxins (PCDDs) and dibenzofurans (PCDFs). Chemosphere, 2000, 41: 843-848.
    【210】.Stanton DT. On the physical interpretation of QSAR models. Journal of Chemical Information and Computer Sciences, 2003, 43 (5): 1423-1433.
    【211】.Burden FR, Rosewarne BS, Winkler DA. Predicting maximum bioactivity by effective inversion of neural networks using genetic algorithms. Chemometrics and Intelligent Laboratory Systems, 1997, 38: 127-137.
    【212】.Kovesdi I, Dominguez-Rodriguez MF, Orfi L, Naray-Szabo G, Varro A, Papp JG, Matyus P. Application of neural networks in structure-activity relationships. Medicinal Research Reviews, 1999, 19(3): 249-269.
    【213】.Jalali-Heravi M, Parastar F. Use of artificial neural networks in a QSAR study of anti-HIV activity for a large group of HEPT derivatives. Journal of Chemical Information and Computer Sciences, 2000, 40(1): 147-154.
    【214】.Yao XJ, Panaye A, Doucet JP, Zhang RS, Chen HF, Liu MC, Hu ZD, Fan BT. Comparative study of QSAR/QSPR correlations using support vector machines, radial basis function neural networks, and multiple linear regression. Journal of Chemical Information and Computer Sciences, 2004, 44: 1257-1266.
    【215】.Yang SS, Lu WC, Ji XB, Chen NY. Support vector classification for SAR of 5-HT3 receptor antagonists. Journal of Shanghai University (English Edition), 2006, 10 (4): 366-370.
    【216】.Luan F, Ma WP, Zhang XY, Zhang HX, Liu MC, Hu ZD, Fan BT. Quantitative structure-activity relationship models for prediction of sensory irritants (logRD50) of volatile organic chemicals. Chemosphere, 2006, 63: 1142-1153.
    【217】.Agrawal VK, Khadikar PV. QSAR study on narcotic mechanism of action and toxicity: a molecular connectivity approach to Vibrio fischeri toxicity testing. Bioorganic & Medicinal Chemistry, 2002, 10: 3517-3522.