y-Randomization and Its Variants in QSPR/QSAR
  • Authors: Christoph Rücker; Gerta Rücker; Markus Meringer
  • Journal: Journal of Chemical Information and Modeling
  • Published: November 2007
  • Volume: 47
  • Issue: 6
  • Pages: 2345-2357
  • ISSN: 1549-960X
Abstract
y-Randomization is a tool used in the validation of QSPR/QSAR models, whereby the performance of the original model in data description (r²) is compared to that of models built for a permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random r² values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's r² to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, because the values of all descriptors in the pool for all compounds have not been published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress is also reported toward the aim of obtaining the mean highest r² of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.
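
The double test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not name the descriptor selection method, so greedy forward selection is assumed here; the pseudodescriptors are assumed to be standard normal random numbers; the data are synthetic; and the function names (r2_mlr, forward_select_r2, mean_highest_random_r2) are hypothetical.

```python
import numpy as np


def r2_mlr(X, y):
    """Return r^2 of an ordinary least-squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def forward_select_r2(X, y, n_select):
    """Greedy forward selection of n_select columns (assumed selection
    method, not taken from the paper); return the r^2 of the final model."""
    chosen = []
    best_r2 = 0.0
    for _ in range(n_select):
        best_r2, best_j = -np.inf, None
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            r2 = r2_mlr(X[:, chosen + [j]], y)
            if r2 > best_r2:
                best_r2, best_j = r2, j
        chosen.append(best_j)
    return best_r2


def mean_highest_random_r2(X, y, n_select, n_trials, rng,
                           pseudo_descriptors=False):
    """Mean of the highest r^2 reached in n_trials random experiments.

    pseudo_descriptors=False: y-randomization (permuted y, original X).
    pseudo_descriptors=True:  original y, random-number pseudodescriptors
                              with the same shape as X.
    """
    scores = []
    for _ in range(n_trials):
        if pseudo_descriptors:
            X_trial, y_trial = rng.standard_normal(X.shape), y
        else:
            X_trial, y_trial = X, rng.permutation(y)
        scores.append(forward_select_r2(X_trial, y_trial, n_select))
    return float(np.mean(scores))


# Synthetic example: an intercorrelated descriptor pool (shared factor
# plus noise) and a response that depends on two of the descriptors.
rng = np.random.default_rng(0)
n_obs, n_pool, n_select = 30, 15, 2
common = rng.standard_normal((n_obs, 1))
X = common + 0.5 * rng.standard_normal((n_obs, n_pool))
y = X[:, 0] - X[:, 1] + 0.3 * rng.standard_normal(n_obs)

r2_orig = forward_select_r2(X, y, n_select)
r2_yrand = mean_highest_random_r2(X, y, n_select, 50, rng)
r2_pseudo = mean_highest_random_r2(X, y, n_select, 50, rng,
                                   pseudo_descriptors=True)
print(f"original model r2:                {r2_orig:.3f}")
print(f"mean highest r2, y-randomization: {r2_yrand:.3f}")
print(f"mean highest r2, pseudodescriptors: {r2_pseudo:.3f}")
```

In this sketch the original model's r² is compared against both random reference values, as the abstract proposes. The pseudodescriptor variant needs only the number of observations, the pool size, and the number of selected descriptors, which is why it remains usable when the descriptor values themselves are unpublished.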
