A Study of Several Important Problems in QSAR/QSPR Modeling Based on the OECD Principles
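Abstract
Abstract: Guided by the requirements of the Organisation for Economic Co-operation and Development (OECD) principles, this dissertation studies several important problems in quantitative structure-activity/property relationship (QSAR/QSPR) modeling. In addition, it makes a preliminary exploration of bioactivity annotation for large-scale molecular structure databases.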
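     Chapter 1 first describes the content of the OECD principles and their importance as guidance for QSAR/QSPR research. Then, based on the requirements of the OECD principles, it identifies several important problems in QSAR/QSPR modeling that need to be studied: methods for improving the accuracy and robustness of QSAR/QSPR models, methods for defining the applicability domain of a model, and model interpretation.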
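     Chapter 2 studies methods for improving the accuracy and robustness of linear QSAR/QSPR models. We modified uncorrelated linear discriminant analysis (ULDA) to obtain a new modeling method, and we also proposed a new variable selection method. The new modeling method combined with the variable selection method (MULDA-RFE) was applied to QSAR/QSPR modeling and prediction for five data sets of ADMET-related properties and one data set of factor Xa inhibition. The results show that, relative to the original algorithm, the new method improves both prediction accuracy and robustness. Compared with a series of linear and nonlinear models from the literature, the new method gives predictions that are better than or comparable to those models, which indicates that it is an effective QSAR/QSPR modeling method. Moreover, MULDA-RFE is a linear modeling method, which gives it advantages in algorithmic unambiguity and model interpretability.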
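     Chapter 3 takes the retention indices of flavor compounds on stationary phases of different polarity as the QSAR/QSPR modeling target. It studies how to improve the prediction accuracy and robustness of partial least squares (PLS) linear models, and gives a preliminary analysis of the main structural features that affect the retention behavior of flavor compounds on stationary phases of different polarity. The conclusions are as follows. Introducing the Monte Carlo (MC) outlier detection method and the random frog variable selection method greatly reduced the standard deviation error of prediction (SDEP) of the model, and the R2 and Q2 values of the model improved considerably. This indicates that outlier detection and variable selection substantially improved the prediction accuracy and robustness of the model. The statistical distribution of the resampling prediction errors further supports the effectiveness of this set of QSAR/QSPR modeling methods.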
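     Chapter 4 gives a fairly comprehensive study and discussion of three important problems: the accuracy and robustness of QSAR/QSPR models, methods for defining the applicability domain, and model interpretation. The modeling targets were four important sets of bioactivity and toxicity data. In the study of accuracy and robustness, we compared several representative descriptors and modeling methods. The results show that molecular fingerprint descriptors such as MACCS and PubChem, when combined with an appropriate modeling method, give accuracy and robustness comparable to the computed Dragon structural descriptors; among the modeling methods, support vector machine (SVM) and random forest (RF) stand out for accuracy and robustness. In the study of applicability domain definition, we proposed a new method based on the predicted probability of the model and compared it with the commonly used approach based on molecular structural similarity. The results show that the proposed method outperforms the similarity-based method; in addition, of the two probability-based methods, Prob-SVM is slightly better than Prob-RF. In the study of model interpretation, we used the important molecular descriptors obtained from variable selection to analyze and interpret the structure-activity relationships of each model. The results show that an appropriate variable selection method greatly facilitates model interpretation, while using molecular fingerprints as structural descriptors makes it possible to mine structural information related to molecular activity more directly; substructure-type descriptors play an important role in predicting many kinds of activity.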
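     In Chapter 5, we made a preliminary exploration of bioactivity annotation for large-scale molecular structure databases. We used the PASS program to annotate the bioactivities of nearly one million compounds, then checked and compared the annotation results through structural similarity searching. In addition, we mined useful information reflected in the annotations, such as biochemotypes (privileged scaffolds). Based on this work, we draw some preliminary conclusions and outlooks. We point out the importance of bioactivity annotation; however, our preliminary results from annotating a large database show that bioactivity annotation of large databases is a major challenge with much room for improvement. Further study is needed on the accuracy of annotation, its non-black-box character, the balance between efficiency and accuracy, and the ontological definition of bioactivities and biochemotypes.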
ABSTRACT: This dissertation studies several key problems in QSAR/QSPR (quantitative structure-activity/property relationship) modeling according to the requirements of the OECD (Organisation for Economic Co-operation and Development) principles. In addition, a study toward automated bioactivity annotation of large compound libraries is carried out.
     In Chapter 1, we discuss the importance of the OECD principles for QSAR/QSPR model validation. Based on the five OECD principles, we identify several key problems in QSAR/QSPR modeling that need to be studied: how to improve the accuracy and robustness of QSAR/QSPR models, how to define their applicability domain, and how to interpret them.
     In Chapter 2, we study methods for improving the accuracy and robustness of QSAR/QSPR models. We propose an M-ULDA (Modified Uncorrelated Linear Discriminant Analysis) algorithm coupled with RFE (Recursive Feature Elimination) for feature selection as a QSAR modeling method. Six data sets, covering ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) properties and factor Xa inhibition, were used to evaluate the new method. The results indicate that the new method is more accurate and more robust than the original method. Comparison with other linear and nonlinear QSAR/QSPR methods shows that the new method provides comparable or better predictive accuracy. In addition, as a linear method, it is easier to interpret than the nonlinear methods.
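     The recursive feature elimination loop mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the M-ULDA algorithm, so a ridge-regularized least-squares linear discriminant on standardized descriptors stands in for it, and the function names (`solve`, `discriminant_weights`, `rfe`) are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def discriminant_weights(X, y, cols, ridge=1e-6):
    """Stand-in linear scorer (NOT M-ULDA): least-squares discriminant on standardized columns."""
    n = len(X)
    Z = []  # one standardized column per selected descriptor
    for j in cols:
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5 or 1.0  # guard constant columns
        Z.append([(v - mu) / sd for v in col])
    t = [1.0 if label == 1 else -1.0 for label in y]  # class labels coded as +1 / -1
    t_mean = sum(t) / n
    t = [v - t_mean for v in t]
    p = len(cols)
    A = [[sum(a * b for a, b in zip(Z[i], Z[k])) + (ridge if i == k else 0.0)
          for k in range(p)] for i in range(p)]
    rhs = [sum(a * b for a, b in zip(Z[i], t)) for i in range(p)]
    return solve(A, rhs)


def rfe(X, y, n_keep):
    """Refit the model and drop the descriptor with the smallest |weight| until n_keep remain."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        w = discriminant_weights(X, y, cols)
        worst = min(range(len(cols)), key=lambda i: abs(w[i]))
        del cols[worst]
    return cols
```

     Because the model is refitted after every removal, the ranking can change as correlated descriptors drop out, which is what distinguishes RFE from a one-shot filter ranking.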
     In Chapter 3, we focus on improving the accuracy and robustness of PLS (Partial Least Squares) models. We introduce the MC (Monte Carlo) outlier detection method and the random frog variable selection method, both recently developed by our laboratory, into QSAR models that predict the retention indices of 237 flavor compounds on four stationary phases of different polarity. The important structural features related to the retention behavior of the flavor compounds on these stationary phases are also explored. The SDEP (Standard Deviation Error of Prediction) and Q2 results show that the accuracy and robustness of the PLS model are significantly improved by outlier detection and variable selection. This conclusion is further confirmed by the results of a Monte Carlo test.
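     For readers unfamiliar with the two statistics, the sketch below shows how SDEP and Q2 are commonly computed from cross-validated predictions. It rests on assumptions: the abstract does not state the exact formulas used, a one-variable least-squares line stands in for the PLS model, and leave-one-out resampling stands in for the Monte Carlo procedure. All function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept (stand-in for a PLS model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


def sdep(y, y_cv):
    """Standard deviation error of prediction: sqrt(PRESS / n)."""
    press = sum((a - b) ** 2 for a, b in zip(y, y_cv))
    return math.sqrt(press / len(y))


def q2(y, y_cv):
    """Cross-validated Q2: 1 - PRESS / total sum of squares about the mean."""
    mean_y = sum(y) / len(y)
    press = sum((a - b) ** 2 for a, b in zip(y, y_cv))
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - press / tss
```

     A lower SDEP and a Q2 closer to 1 both indicate better predictive ability; a Q2 near 0 means the model predicts no better than the mean of the observed values.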
     In Chapter 4, we carry out a comprehensive study of the accuracy and robustness of QSAR/QSPR models, their applicability domain, and their interpretation. Four important sets of bioactivity and toxicity data were used. For accuracy and robustness, we compared the performance of different types of molecular descriptors and modeling methods. The results indicate that fingerprint-type molecular descriptors such as MACCS and PubChem did not reduce the accuracy or robustness of the models compared with the theoretical Dragon descriptors. Among the modeling methods studied in this chapter, SVM (Support Vector Machine) and RF (Random Forest) are superior in the accuracy and stability of their predictions. For the applicability domain, we propose a novel definition based on predictive probability and compare it with a commonly used method based on molecular similarity. The assessment indicates that the new method is superior to the similarity-based method, so it appears reasonable to define the applicability domain of QSAR/QSPR models in this way. Furthermore, we found that the method based on SVM probability is better than that based on RF probability. For model interpretation, we focused on the effect of variable selection and the use of molecular fingerprints. We conclude that both are very helpful for model interpretation, since they can reveal the important substructures related to the activity or property.
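     One plausible reading of a probability-based applicability domain is sketched below. This is an assumption, not the definition used in the dissertation, which the abstract does not spell out: a compound is treated as inside the domain when the classifier's predicted class probability is far enough from 0.5. The threshold value and function names are hypothetical, and the probabilities would come from a probabilistic classifier such as an SVM or random forest.

```python
def split_by_confidence(probs, threshold=0.7):
    """Split sample indices into inside/outside the domain by prediction confidence.

    probs[i] is the predicted probability that sample i belongs to class 1.
    Confidence is max(p, 1 - p); the 0.7 cut-off is an illustrative assumption.
    """
    inside, outside = [], []
    for i, p in enumerate(probs):
        confidence = max(p, 1.0 - p)
        (inside if confidence >= threshold else outside).append(i)
    return inside, outside


def accuracy(probs, labels, idx):
    """Fraction of the samples in idx whose predicted class (p >= 0.5) matches the label."""
    if not idx:
        return None
    hits = sum((probs[i] >= 0.5) == (labels[i] == 1) for i in idx)
    return hits / len(idx)
```

     A domain definition is useful when accuracy inside the domain is clearly higher than accuracy outside it; comparing the two values for a probability-based split and for a similarity-based split is one way to assess the two approaches.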
     Chapter 5 describes a process for automatically annotating the biochemotypes of compounds in a library, and thus for identifying bioactivity-related chemotypes (biochemotypes) in a large compound library. The process consists of two steps: (1) predicting all possible bioactivities for each compound in the library, and (2) deriving possible biochemotypes from the predictions. A commercially available compound library (CACL) of about one million compounds (982,889) was tested using this process. The chapter demonstrates the importance and feasibility of automatically annotating biochemotypes for large compound libraries. Moreover, we suggest ways in which a systematic bioactivity prediction program should be improved. First, a balance between automated bioactivity annotation and data quality has to be found: the annotation process is very fast with the PASS program, but it is equally important that accuracy not be sacrificed. Second, an ideal systematic bioactivity prediction tool must indicate privileged structures and be trainable by users. Third, the definition of bioactivities (biochemotype ontology) needs to be better developed in the future.
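     Chapter 5 also checks the predicted annotations by structural similarity searching. A minimal sketch of such a search is given below, assuming that each compound is represented by the set of on-bit positions of a binary fingerprint and that similarity is measured with the Tanimoto coefficient. The fingerprint type, the threshold, and the function names are assumptions for illustration; the abstract does not specify them.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_compounds(query, library, threshold=0.7):
    """Return (name, similarity) pairs at or above the threshold, most similar first.

    `library` maps a compound name to its fingerprint (a set of on-bit indices).
    """
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

     Compounds retrieved this way that have known activities can then be compared with the activities predicted for the query compound.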
引文
[1]Martin Y C, Kofron J L, Traphagen L M. Do structurally similar molecules have similar biological activity?[J]. Journal of Medicinal Chemistry,2002, 45(19):4350-4358.
    [2]Esposito E, Hopfinger A, Madura J. Methods for applying the quantitative structure-activity relationship paradigm[M]//BAJORATH J. Chemoinformatics. Humana Press,2004:131-213.
    [3]王连生.分子结构、性质与活性[M].北京:化学工业出版社,1997.
    [4]梁逸曾,俞汝勤.化学计量学[M].1.北京:高等教育出版社,2003.
    [5]王鹏.定量构效关系及研究方法[M].哈尔滨:哈尔滨工业大学出版社,2004.
    [6]Hansch C, Muir R M, Fujita T, et al. The correlation of biological activity of plant growth regulators and chloromycetin derivatives with Hammett constants and partition coefficients [J]. Journal of the American Chemical Society,1963, 85(18):2817-2824.
    [7]Free S M, Wilson J W. A Mathematical contribution to structure-activity studies[J]. Journal of Medicinal Chemistry,1964,7(4):395-399.
    [8]Puzyn T, Leszczynski J, Cronin M T. Recent advances in QSAR studies: methods and applications[M]. Dordrecht: Springer Netherlands,2010.
    [9]Development O F E C. Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models[R]. OECD Environment Health and Safety Publications, Series on Testing and Assessment, No.69, ENV/JM/MONO(2007)2, OECD Paris, France,2007.
    [10]Worth A P, Van Leeuwen C J, Hartung T. The prospects for using (Q)SARs in a changing political environment--high expectations and a key role for the european commission's joint research centre[J]. SAR and QSAR in Environmental Research,2004,15(5-6):331-343.
    [11]Weaver S, Gleeson M P. The importance of the domain of applicability in QSAR modeling[J]. Journal of Molecular Graphics and Modelling,2008, 26(8):1315-1326.
    [12]Tetko I V, Bruneau P, Mewes H, et al. Can we estimate the accuracy of ADME-Tox predictions?[J]. Drug Discovery Today,2006,11(15-16):700-707.
    [13]Dragos H, Gilles M, Alexandre V. Predicting the predictability:a unified approach to the applicability domain problem of QSAR models[J]. Journal of Chemical Information and Modeling,2009,49(7):1762-1776.
    [14]Todeschini R, Consonni V. Handbook of molecular descriptors:Handbook of molecular descriptors[M]. New York: John Wiley & Sons,2008.
    [15]Karelson M. Molecular descriptors in QSAR/QSPR[M]. New York: John Wiley & Sons,2000.
    [16]Katritzky A R, Gordeeva E V. Traditional topological indexes vs electronic, geometrical, and combined molecular descriptors in QSAR/QSPR research[J]. Journal of Chemical Information and Computer Sciences,1993,33(6):835-857.
    [17]Estrada E, Molina E. Novel local (fragment-based) topological molecular descriptors for QSPR/QSAR and molecular design[J]. Journal of Molecular Graphics and Modelling,2001,20(1):54-64.
    [18]Karelson M, Lobanov V S, Katritzky A R. Quantum-chemical descriptors in QSAR/QSPR studies[J]. Chemical Reviews,1996,96(3):1027-1044.
    [19]Klopman G, Li J, Wang S, et al. Computer automated log P calculations based on an extended group contribution approach[J]. Journal of Chemical Information and Computer Sciences,1994,34(4):752-781.
    [20]Ghose A K, Viswanadhan V N, Wendoloski J J. Prediction of hydrophobic (lipophilic) properties of small organic molecules using fragmental methods: an analysis of ALOGP and CLOGP methods[J]. The Journal of Physical Chemistry A,1998,102(21):3762-3772.
    [21]Consonni V, Mauri A. Dragon Professional 5.4[EB/OL]. http://www.talete.mi.it.
    [22]Helguera A M, Combes R D, Gonzalez M P E R, et al. Applications of 2D descriptors in drug design: a DRAGON tale[J]. Current Topics in Medicinal Chemistry,2008,8(18):1628-1655.
    [23]Cheng F, Ikenaga Y, Zhou Y, et al. In silico assessment of chemical biodegradability[J]. Journal of Chemical Information and Modeling,2012, 52(3):655-669.
    [24]Bender A, Jenkins J L, Scheiber J, et al. How similar are similarity searching methods? a principal component analysis of molecular descriptor space[J]. Journal of Chemical Information and Modeling,2009,49(1):108-119.
    [25]Bocker A. Toward an improved clustering of large data sets using maximum common substructures and topological fingerprints [J]. Journal of Chemical Information and Modeling,2008,48(11):2097-2107.
    [26]Luco J M, Ferretti F H. QSAR based on multiple linear regression and PLS methods for the Anti-HIV activity of a large group of HEPT derivatives [J]. Journal of Chemical Information and Computer Sciences,1997,37(2):392-401.
    [27]Asikainen A H, Ruuskanen J, Tuppurainen K A. Performance of (consensus) kNN QSAR for predicting estrogenic activity in a large diverse set of organic compounds[J]. SAR and QSAR in Environmental Research,2004,15(1):19-32.
    [28]Golmohammadi H, Dashtbozorgi Z, Acree Jr. W E. Quantitative structure-activity relationship prediction of blood-to-brain partitioning behavior using support vector machine[J]. European Journal of Pharmaceutical Sciences, 2012,47(2):421-429.
    [29]Manallack D T, Ellis D D, Livingstone D J. Analysis of linear and nonlinear QSAR data using neural networks[J]. Journal of Medicinal Chemistry,1994, 37(22):3758-3767.
    [30]Svetnik V, Liaw A, Tong C, et al. Application of Breiman's random forest to modeling structure-activity relationships of pharmaceutical molecules [J]. Multiple Classifier Systems,2004,3077:334-343.
    [31]Merkwirth C, Mauser H, Schulz-Gasch T, et al. Ensemble methods for classification in cheminformatics[J]. Journal of Chemical Information and Computer Sciences,2004,44(6):1971-1978.
    [32]Kaiser K L E, Esterby S R. Regression and cluster analysis of the acute toxicity of 267 chemicals to six species of biota and the octanol/water partition coefficient[J]. Science of The Total Environment,1991,109-110(0):499-514.
    [33]Wold S, Dunn W J. Multivariate quantitative structure-activity relationships (QSAR):conditions for their applicability [J]. Journal of Chemical Information and Computer Sciences,1983,23(1):6-13.
    [34]Walczak B, Massart D L. Robust principal components regression as a detection tool for outliers[J]. Chemometrics and Intelligent Laboratory Systems, 1995,27(1):41-54.
    [35]Wold S, Sjostrom M, Eriksson L. PLS-regression: a basic tool of chemometrics[J]. Chemometrics and Intelligent Laboratory Systems,2001, 58(2):109-130.
    [36]Wold S, Trygg J, Berglund A, et al. Some recent developments in PLS modeling[J]. Chemometrics and Intelligent Laboratory Systems,2001, 58(2):131-150.
    [37]Griep M I, Wakeling I N, Vankeerberghen P, et al. Comparison of semirobust and robust partial least squares procedures [J]. Chemometrics and Intelligent Laboratory Systems,1995,29(1):37-50.
    [38]Breiman L, Friedman J, Olshen R, et al. Classification and regression trees[M]. Belmont, CA: Wadsworth,1984.
    [39]Worth A P, Cronin M T D. The use of discriminant analysis, logistic regression and classification tree analysis in the development of classification models for human health effects[J]. Journal of Molecular Structure:THEOCHEM,2003, 622(1-2):97-111.
    [40]Deconinck E, Hancock T, Coomans D, et al. Classification of drugs in absorption classes using the classification and regression trees (CART) methodology [J]. Journal of Pharmaceutical and Biomedical Analysis,2005, 39(1-2):91-103.
    [41]Panaye A, Doucet J P, Devillers J, et al. Decision trees versus support vector machine for classification of androgen receptor ligands1[J]. SAR and QSAR in Environmental Research,2008,19(1-2):129-151.
    [42]Polishchuk P G, Muratov E N, Artemenko A G, et al. Application of random forest approach to QSAR prediction of aquatic toxicity[J]. Journal of Chemical Information and Modeling,2009,49(11):2481-2488.
    [43]Cao D, Xu Q, Liang Y, et al. The boosting: a new idea of building models[J]. Chemometrics and Intelligent Laboratory Systems,2010,100(1):1-11.
    [44]Hopfield J J. Neural networks and physical systems with emergent collective computational abilities[J]. Proceedings of the National Academy of Sciences, 1982,79(8):2554-2558.
    [45]潘忠孝,陈玲然.神经网络及其在化学中的应用[M].潘忠孝,陈玲然,译.合肥:中国科学技术大学出版社,2000.
    [46]Vracko M, Bandelj V, Barbieri P, et al. Validation of counter propagation neural network models for predictive toxicology according to the OECD principles:a case study[J]. SAR and QSAR in Environmental Research,2006, 17(3):265-284.
    [47]Chen H. Quantitative predictions of gas chromatography retention indexes with support vector machines, radial basis neural networks and multiple linear regression[J]. Analytica Chimica Acta,2008,609(1):24-36.
    [48]Vapnik V. The nature of statistical learning theory[M]. New York: Springer-Verlag,1995.
    [49]Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machines[J]. Machine Learning,2002,46(1-3):389-422.
    [50]Warmuth M K, Liao J, Ratsch G, et al. Active learning with support vector machines in the drug discovery process [J]. Journal of Chemical Information and Computer Sciences,2003,43(2):667-673.
    [51]Burges C C. A tutorial on support vector machines for pattern recognition[J]. Data Mining and Knowledge Discovery,1998,2(2):121-167.
    [52]Heikamp K, Hu X, Yan A, et al. Prediction of activity cliffs using support vector machines[J]. Journal of Chemical Information and Modeling,2012, 52(9):2354-2365.
    [53]Dietterich T. Ensemble methods in machine learning[J]. Multiple Classifier Systems,2000,1857:1-15.
    [54]Bauer E, Kohavi R. An empirical comparison of voting classification algorithms: Bagging, Boosting, and Variants[J]. Machine Learning,1999, 36(1-2):105-139.
    [55]Breiman L. Bagging predictors [J]. Machine Learning,1996,24(2):123-140.
    [56]Tin K H. The random subspace method for constructing decision forests[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on,1998, 20(8):832-844.
    [57]Breiman L. Random Forests[J]. Machine Learning,2001,45(1):5-32.
    [58]Cao D, Hu Q, Xu Q, et al. In silico classification of human maximum recommended daily dose based on modified random forest and substructure fingerprint[J]. Analytica Chimica Acta,2011,692(1-2):50-56.
    [59]Freund Y. Boosting a weak learning algorithm by majority[J]. Information and Computation,1995,121(2):256-285.
    [60]Ojha P K, Roy K. Comparative QSARs for antimalarial endochins:importance of descriptor-thinning and noise reduction prior to feature selection[J]. Chemometrics and Intelligent Laboratory Systems,2011,109(2):146-161.
    [61]张丽新.高维数据的特征选择及基于特征选择的集成学习研究[D].清华大学,2004.
    [62]尹建新,计智伟,胡珉.特征选择算法综述[J].电子设计工程, 2011,(09):46-51.
    [63]Goldberg D, Holland J. Genetic algorithms and machine learning[J]. Machine Learning,1988,3(2-3):95-99.
    [64]Huan L, Lei Y. Toward integrating feature selection algorithms for classification and clustering[J]. Knowledge and Data Engineering, IEEE Transactions on,2005,17(4):491-502.
    [65]Kohavi R, John G H. Wrappers for feature subset selection[J]. Artificial Intelligence,1997,97(1-2):273-324.
    [66]孙优贤,毛勇,周晓波.特征选择算法研究综述[J].模式识别与人工智能,2007,(02):211-218.
    [67]Guyon I, Elisseeff A. An introduction to variable and feature selection[J]. J. Mach. Learn. Res.,2003,3:1157-1182.
    [68]Dash M, Choi K, Scheuermann P, et al. Feature selection for clustering-a filter solution:Data Mining,2002. ICDM 2003. Proceedings.2002 IEEE International Conference on,2002[C].
    [69]Shen Q, Jiang J, Tao J, et al. Modified ant colony optimization algorithm for variable selection in QSAR modeling:QSAR studies of cyclooxygenase inhibitors[J]. Journal of Chemical Information and Modeling,2005, 45(4):1024-1029.
    [70]Cho S J, Hermsmeier M A. Genetic algorithm guided selection: variable selection and subset selection[J]. J Chem Inf Comput Sci,2002,42(4):927-936.
    [71]Xue Y, Li Z R, Yap C W, et al. Effect of molecular descriptor feature selection in support vector machine classification of pharmacokinetic and toxicological properties of chemical agents[J]. Journal of Chemical Information and Computer Sciences,2004,44(5):1630-1638.
    [72]Louw N, Steel S J. Variable selection in kernel Fisher discriminant analysis by means of recursive feature elimination[J]. Computational Statistics & Data Analysis,2006,51(3):2043-2055.
    [73]Schultz T W, Cronin M T D. Response-surface analyses for toxicity to tetrahymena pyriformis:reactive carbonyl-containing aliphatic chemicals[J]. Journal of Chemical Information and Computer Sciences,1999,39(2):304-309.
    [74]Schultz T W, Netzeva T I, Roberts D W, et al. Structure-toxicity relationships for the effects to tetrahymena pyriformis of aliphatic, carbonyl-containing, a,(3-unsaturated chemicals[J]. Chemical Research in Toxicology,2005, 18(2):330-341.
    [75]Schultz T W, Cronin M T D, Netzeva T I, et al. Structure-toxicity relationships for aliphatic chemicals evaluated with tetrahymena pyriformis[J]. Chemical Research in Toxicology,2002,15(12):1602-1609.
    [76]Netzeva T I, Worth A P, Aldenberg T, et al. Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships[J]. Alternatives to Laboratory Animals,2005,33:1-19.
    [77]Jaworska J, Jeliazkova N N, Aldenberg T. QSAR applicability domain estimation by projection of the training set in descriptor space: a review[J]. Alternatives to Laboratory Animals,2005,33:445-459.
    [78]Eriksson L, Jaworska J, Worth A P, et al. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs[J]. Environ Health Perspect,2003, 111(10):1361-1375.
    [79]Gramatica P, Pilutti P, Papa E. Validation QSAR prediction of OH tropospheric degradation of VOCs:splitting into training-test sets and consensus modeling[J]. Journal of Chemical Information and Computer Sciences,2004,44(5):1794-1802.
    [80]Sheridan R P, Feuston B P, Maiorov V N, et al. Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR[J]. Journal of Chemical Information and Computer Sciences,2004,44(6):1912-1928.
    [81]Tong W, Hong H, Fang H, et al. Decision Forest: combining the predictions of multiple independent decision tree models [J]. Journal of Chemical Information and Computer Sciences,2003,43(2):525-531.
    [82]Svetnik V, Wang T, Tong C, et al. Boosting: An ensemble learning tool for compound classification and QSAR modeling[J]. Journal of Chemical Information and Modeling,2005,45(3):786-799.
    [83]Svetnik V, Liaw A, Tong C, et al. Random Forest: A classification and regression tool for compound classification and QSAR modeling[J]. Journal of Chemical Information and Computer Sciences,2003,43(6):1947-1958.
    [84]Sheridan R P. Three useful dimensions for domain applicability in QSAR models using random forest[J]. Journal of Chemical Information and Modeling, 2012,52(3):814-823.
    [85]Tetko I V, Sushko I, Pandey A K, et al. Critical assessment of QSAR models of environmental toxicity against tetrahymena pyriformis:focusing on applicability domain and overfitting by variable selection[J]. Journal of Chemical Information and Modeling,2008,48(9):1733-1746.
    [86]Novotarskyi S, Sushko I, Korner R, et al. A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition[J]. Journal of Chemical Information and Modeling,2011,51(6):1271-1280.
    [87]Guha R, Jurs P C. Determining the validity of a QSAR model-a classification approach[J]. Journal of Chemical Information and Modeling,2004, 45(1):65-73.
    [88]Hawkins D M. The problem of overfitting[J]. Journal of Chemical Information and Computer Sciences,2003,44(1):1-12.
    [89]Roy K, Mitra I, Kar S, et al. Comparative studies on some metrics for external validation of QSPR models[J]. Journal of Chemical Information and Modeling, 2011,52(2):396-408.
    [90]Golbraikh A, Tropsha A. Beware of q2![J]. Journal of Molecular Graphics and Modelling,2002,20(4):269-276.
    [91]Hawkins D M, Basak S C, Mills D. Assessing model fit by cross-validation[J]. Journal of Chemical Information and Computer Sciences,2003,43(2):579-586.
    [92]Gramatica P. Principles of QSAR models validation:internal and external[J]. QSAR & Combinatorial Science,2007,26(5):694-701.
    [93]Marini F, Roncaglioni A, Novic M. Variable selection and interpretation in structure-affinity correlation modeling of estrogen receptor binders [J]. Journal of Chemical Information and Modeling,2005,45(6):1507-1519.
    [94]Cronin M T D, Dearden J C, Duffy J C, et al. The importance of hydrophobicity and electrophilicity descriptors in mechanistically-based QSARs for toxicological endpoints[J]. SAR and QSAR in Environmental Research,2002,13(1):167-176.
    [95]Benigni R. Structure-activity relationship studies of chemical mutagens and carcinogens:mechanistic investigations and prediction approaches [J]. Chemical Reviews,2005,105(5):1767-1800.
    [96]Hansch C, Bonavida B, Jazirehi A R, et al. Quantitative structure-activity relationships of phenolic compounds causing apoptosis[J]. Bioorganic & Medicinal Chemistry,2003,11(4):617-620.
    [97]Debnath A K, Debnath G, Shusterman A J, et al. A QSAR investigation of the role of hydrophobicity in regulating mutagenicity in the ames test: 1. mutagenicity of aromatic and heteroaromatic amines in salmonella typhimurium TA98 and TA100[J]. Environmental and Molecular Mutagenesis, 1992,19(1):37-52.
    [98]Raevsky O A, Dearden J C. Creation of predictive models of aquatic toxicity of environmental pollutants with different mechanisms of action on the basis of molecular similarity and HYBOT descriptors [J]. SAR and QSAR in Environmental Research,2004,15(5-6):433-448.
    [99]Hou T J, Xu X J. ADME evaluation in drug discovery.3. modeling blood-brain barrier partitioning using simple molecular descriptors [J]. Journal of Chemical Information and Computer Sciences,2003,43(6):2137-2152.
    [100]Gough J D, Hall L H. Modeling the toxicity of amide herbicides using the electrotopological state [J]. Environmental Toxicology and Chemistry,1999, 18(5):1069-1075.
    [101]van de Waterbeemd H, Gifford E. ADMET in silico modelling: towards prediction paradise?[J]. Nature Reviews Drug Discovery,2003,2(3):192-204.
    [102]Grossman I. ADME pharmacogenetics: current practices and future outlook[J]. Expert Opinion on Drug Metabolism & Toxicology,2009,5(5):449-462.
    [103]Hilmer S N. ADME-tox issues for the elderly[J]. Expert Opinion on Drug Metabolism & Toxicology,2008,4(10):1321-1331.
    [104]Kassel D B. Applications of high-throughput ADME in drug discovery[J]. Current Opinion in Chemical Biology,2004,8(3):339-345.
    [105]Selick H E, Beresford A P, Tarbit M H. The emerging importance of predictive ADME simulation in drug discovery[J]. Drug Discovery Today,2002, 7(2):109-116.
    [106]Penzotti J E, Landrum G A, Putta S. Building predictive ADMET models for early decisions in drug discovery[J]. Current opinion in drug discovery & development,2004,7(1):49-61.
    [107]Duda R O, Hart P E, Stork D G. Pattern Classification[M].2nd. New York: Wiley,2001.
    [108]Jin Z, Yang J, Hu Z, et al. Face recognition based on the uncorrelated discriminant transformation[J]. Pattern Recognition,2001,34(7):1405-1416.
    [109]Ye J, Li T, Xiong T, et al. Using uncorrelated discriminant analysis for tissue classification with gene expression data[J]. IEEE/ACM Trans. Comput. Biol. Bioinformatics,2004, 1(4):181-190.
    [110]Chen X, Liang Y Z, Yuan D L, et al. A modified uncorrelated linear discriminant analysis model coupled with recursive feature elimination for the prediction of bioactivity[J]. SAR and QSAR in Environmental Research,2009, 20(1-2):1-26.
    [111]Li H, Yap C W, Ung C Y, et al. Effect of selection of molecular descriptors on the prediction of blood-brain barrier penetrating and nonpenetrating agents by statistical learning methods[J]. Journal of Chemical Information and Modeling, 2005,45(5):1376-1384.
    [112]Arodz T, Yuen D A, Dudek A Z. Ensemble of linear models for predicting drug properties[J]. Journal of Chemical Information and Modeling,2005, 46(1):416-423.
    [113]Hou T, Wang J, Zhang W, et al. ADME evaluation in drug discovery.7. prediction of oral absorption by correlation and classification[J]. Journal of Chemical Information and Modeling,2006,47(1):208-218.
    [114]Abraham M H, Zhao Y H, Le J, et al. On the mechanism of human intestinal absorption[J]. European Journal of Medicinal Chemistry,2002,37(7):595-605.
    [115]Wang Y, Li Y, Yang S, et al. Classification of substrates and inhibitors of p-glycoprotein using unsupervised machine learning approach[J]. Journal of Chemical Information and Modeling,2005,45(3):750-757.
    [116]Kaiser D, Terfloth L, Kopp S, et al. Self-organizing maps for identification of new inhibitors of p-glycoprotein[J]. Journal of Medicinal Chemistry,2007, 50(7):1698-1702.
    [117]Klopman G, Shi L M, Ramu A. Quantitative structure-activity relationship of multidrug resistance reversal agents[J]. Molecular Pharmacology,1997, 52(2):323-334.
    [118]Sun H. A naive bayes classifier for prediction of multidrug resistance reversal activity on the basis of atom typing[J]. Journal of Medicinal Chemistry,2005, 48(12):4031-4039.
    [119]Suomalainen P, Johans C, Soderlund T, et al. Surface activity profiling of drugs applied to the prediction of blood-brain barrier permeability [J]. Journal of Medicinal Chemistry,2004,47(7):1783-1788.
    [120]Fontaine F, Pastor M, Zamora I, et al. Anchor-GRIND:filling the gap between standard 3D QSAR and the GRid-INdependent descriptors[J]. Journal of Medicinal Chemistry,2005,48(7):2687-2694.
    [121]Lipinski C A. Drug-like properties and the causes of poor solubility and poor permeability[J]. Journal of Pharmacological and Toxicological Methods,2000, 44(1):235-249.
    [122]Lipinski C A. Lead-and drug-like compounds: the rule-of-five revolution[J]. Drug Discovery Today:Technologies,2004,1(4):337-341.
    [123]Kovats E. Gas-chromatographische charakterisierung organischer Verbindungen. teil 1:Retentionsindices aliphatischer halogenide, alkohole, aldehyde und Ketone[J]. Helvetica Chimica Acta,1958,41(7):1915-1932.
    [124]van Den Dool H, Dec. Kratz P. A generalization of the retention index system including linear temperature programmed gas-liquid partition chromatography[J]. Journal of Chromatography A,1963,11:463-471.
    [125]Isidorov V A, Szczepaniak L. Gas chromatographic retention indices of biologically and environmentally important organic compounds on capillary columns with low-polar stationary phases[J]. Journal of Chromatography A, 2009,1216(51):8998-9007.
    [126]Lu C, Guo W, Yin C. Quantitative structure-retention relationship study of the gas chromatographic retention indices of saturated esters on different stationary phases using novel topological indices [J]. Analytica Chimica Acta,2006, 561(1-2):96-102.
    [127]Garkani-Nejad Z, Karlovits M, Demuth W, et al. Prediction of gas chromatographic retention indices of a diverse set of toxicologically relevant compounds[J]. Journal of Chromatography A,2004,1028(2):287-295.
    [128]Heberger K. Quantitative structure-(chromatographic) retention relationships[J]. Journal of Chromatography A,2007,1158(1-2):273-305.
    [129]Ren B. Atom-type-based AI topological descriptors for quantitative structure-retention index correlations of aldehydes and ketones[J]. Chemometrics and Intelligent Laboratory Systems,2003,66(1):29-39.
    [130]Sutter J M, Peterson T A, Jurs P C. Prediction of gas chromatographic retention indices of alkylbenzenes[J]. Analytica Chimica Acta,1997,342(2-3):113-122.
    [131]Farkas O, Zenkevich I G, Stout F, et al. Prediction of retention indices for identification of fatty acid methyl esters[J]. Journal of Chromatography A, 2008,1198-1199:188-195.
    [132]Yan J, Cao D, Guo F, et al. Comparison of quantitative structure-retention relationship models on four stationary phases with different polarity for a diverse set of flavor compounds[J]. Journal of Chromatography A,2012, 1223:118-125.
    [133]Chen X, Li H D, Guo F Q, et al. QSRR study on flavor compounds of diverse structures on different columns with the help of new chemometric methods[J]. Chromatographia,2013,76(5-6):241-253.
    [134]Avila M, Zougagh M, Escarpa A, et al. Determination of alkenylbenzenes and related flavour compounds in food samples by on-column preconcentration-capillary liquid chromatography [J]. Journal of Chromatography A,2009,1216(43):7179-7185.
    [135]Saison D, De Schutter D P, Uyttenhove B, et al. Contribution of staling compounds to the aged flavour of lager beer by studying their flavour thresholds[J]. Food Chemistry,2009,114(4):1206-1215.
    [136]Rocha S M, Coelho E, Zrostlikova J, et al. Comprehensive two-dimensional gas chromatography with time-of-flight mass spectrometry of monoterpenoids as a powerful tool for grape origin traceability[J]. Journal of Chromatography A,2007,1161(1-2):292-299.
    [137]Ryan D, Shellie R, Tranchida P, et al. Analysis of roasted coffee bean volatiles by using comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry [J]. Journal of Chromatography A,2004,1054(1-2):57-65.
    [138]Tian F, Yang L, Lv F, et al. Predicting liquid chromatographic retention times of peptides from the Drosophila melanogaster proteome by machine learning approaches [J]. Analytica Chimica Acta,2009,644(1-2):10-16.
    [139]Skrbic B, Onjia A. Prediction of the Lee retention indices of polycyclic aromatic hydrocarbons by artificial neural network[J]. Journal of Chromatography A,2006,1108(2):279-284.
    [140]Luan F, Xue C, Zhang R, et al. Prediction of retention time of a variety of volatile organic compounds based on the heuristic method and support vector machine[J]. Analytica Chimica Acta,2005,537(1-2):101-110.
    [141]Cao D S, Liang Y Z, Xu Q S, et al. A new strategy of outlier detection for QSAR/QSPR[J]. Journal of Computational Chemistry,2010,31(3):592-602.
    [142]Li H D, Xu Q S, Liang Y Z. Random frog: An efficient reversible jump Markov Chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification[J]. Analytica Chimica Acta,2012,740:20-26.
    [143]Egan W J, Morgan S L. Outlier detection in multivariate analytical chemical data[J]. Analytical Chemistry,1998,70(11):2372-2379.
    [144]Huber P. Robust statistics[M]. New York:Wiley,1981.
    [145]Xu Q, Liang Y, Du Y. Monte Carlo cross-validation for selecting a model and estimating the prediction error in multivariate calibration[J]. Journal of Chemometrics,2004,18(2):112-120.
    [146]Li H D, Liang Y Z, Xu Q S, et al. Model population analysis for variable selection[J]. Journal of Chemometrics,2010,24(7-8):418-423.
    [147]Li H D, Liang Y Z, Cao D S, et al. Model-population analysis and its applications in chemical and biological modeling[J]. TrAC Trends in Analytical Chemistry,2012,38:154-162.
    [148]Green P J. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination[J]. Biometrika,1995,82(4):711-732.
    [149]Lin X Y. Perfumery (调香术)[M]. 2nd ed. Chemical Industry Press,2008. (in Chinese)
    [150]Cheng F, Yu Y, Shen J, et al. Classification of cytochrome P450 inhibitors and noninhibitors using combined classifiers[J]. Journal of Chemical Information and Modeling,2011,51(5):996-1011.
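    [151]Zhu L Q, Lou J S. Current status of research on cytochrome P450 and drug metabolism[J]. Chinese Journal of Clinical Pharmacology and Therapeutics,2004,(10):1081-1086. (in Chinese)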
    [152]Xu C, Cheng F, Chen L, et al. In silico prediction of chemical Ames mutagenicity[J]. Journal of Chemical Information and Modeling,2012, 52(11):2840-2847.
    [153]Hansen K, Mika S, Schroeter T, et al. Benchmark data set for in silico prediction of Ames mutagenicity[J]. Journal of Chemical Information and Modeling,2009,49(9):2077-2081.
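    [154]Huang Z H. Research progress on coagulation factor Xa inhibitors[J]. Chinese Journal of New Drugs and Clinical Remedies,2012,(09):505-510. (in Chinese)
    [155]Wang T C, Zhou J P, Zhang H B. Research progress on antituberculosis drugs[J]. Journal of China Pharmaceutical University,2010,(04):299-305. (in Chinese)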
    [156]Mitchison T J. Towards a pharmacological genetics[J]. Chemistry & Biology,1994,1:3-6.
    [157]Shogren-Knaak M A, Alaimo P J, Shokat K M. Recent advances in chemical approaches to the study of biological systems[J]. Annual Review of Cell and Developmental Biology,2001,17(1):405-433.
    [158]Strausberg R L, Schreiber S L. From knowing to controlling: a path from genomics to drugs using small molecule probes[J]. Science,2003, 300(5617):294-295.
    [159]Schreiber S L. Stuart Schreiber: biology from a chemist's perspective[J]. Drug Discovery Today,2004,9(7):299-303.
    [160]Bleicher K H. Chemogenomics: bridging a drug discovery gap[J]. Current Medicinal Chemistry,2002,9(23):2077-2084.
    [161]Dunstan C N, Salafranca M N, Adhikari S, et al. Identification of two rat genes orthologous to the human interleukin-8 receptors [J]. Journal of Biological Chemistry,1996,271(51):32770-32776.
    [162]Horvath D, Mao B. Neighborhood behavior. Fuzzy molecular descriptors and their influence on the relationship between structural similarity and property similarity [J]. QSAR & Combinatorial Science,2003,22(5):498-509.
    [163]Root D E, Flaherty S P, Kelley B P, et al. Biological mechanism profiling using an annotated compound library[J]. Chemistry & Biology,2003,10(9):881-892.
    [164]Chembank[EB/OL]. http://chembank.broad.harvard.edu/.
    [165]Schneider P, Schneider G. Collection of bioactive reference compounds for focused library design[J]. QSAR & Combinatorial Science,2003, 22(7):713-718.
    [166]Goto S, Okuno Y, Hattori M, et al. LIGAND: database of chemical compounds and reactions in biological pathways[J]. Nucleic Acids Research,2002,30(1):402-404.
    [167]Lipinski C A, Lombardo F, Dominy B W, et al. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings [J]. Advanced Drug Delivery Reviews, 1997,23(1-3):3-25.
    [168]PASS[EB/OL]. http://www.ibmh.msk.su/PASS/.
    [169]Geronikaki A, Lagunin A, Poroikov V, et al. Computer aided prediction of biological activity spectra: Evaluating versus known and predicting of new activities for thiazole derivatives[J]. SAR and QSAR in Environmental Research,2002,13(3-4):457-471.
    [170]Stepanchikova A V, Lagunin A A, Filimonov D A, et al. Prediction of biological activity spectra for substances: evaluation on the diverse sets of drug-like structures[J]. Current Medicinal Chemistry,2003,10(3):225-233.
    [171]Poroikov V V, Filimonov D A, Ihlenfeldt W, et al. PASS biological activity spectrum predictions in the enhanced open NCI database browser[J]. Journal of Chemical Information and Computer Sciences,2003,43(1):228-236.
    [172]Xu J. A new approach to finding natural chemical structure classes[J]. Journal of Medicinal Chemistry,2002,45(24):5311-5320.
    [173]Grandy D K, Marchionni M A, Makam H, et al. Cloning of the cDNA and gene for a human D2 dopamine receptor [J]. Proceedings of the National Academy of Sciences,1989,86(24):9762-9766.
    [174]Xu J, Hagler A. Chemoinformatics and drug discovery[J]. Molecules,2002,7(8):566-600.
    [175]Ashburner M, Ball C A, Blake J A, et al. Gene Ontology: tool for the unification of biology[J]. Nature Genetics,2000,25(1):25-29.
    [176]Soldatova L N, King R D. Are the current ontologies in biology good ontologies?[J]. Nature Biotechnology,2005,23(9):1095-1098.
    [177]Chandrasekaran B, Josephson J R, Benjamins V R. What are ontologies, and why do we need them?[J]. Intelligent Systems and their Applications, IEEE, 1999,14(1):20-26.
    [178]Bodenreider O, Stevens R. Bio-ontologies: current trends and future directions[J]. Briefings in Bioinformatics,2006,7(3):256-274.
