摘要
A kernel version of k-nearest neighbor algorithm (k-NN) has been developed to model the complex relationship between molecular descriptors and bioactivities of compounds. Kernel k-NN is to perform the original k-NN algorithm by mapping the training samples in the input space into a high-dimensional feature space. It can be easily constructed by calculating the distance between samples in the feature space, directly deriving from the simple calculation of the kernel used. The developed kernel k-NN is very flexible to deal with complex nonlinear relationship, more importantly; it can also conveniently cope with some non-vectorial data only by the definition of different kernels. The results obtained from several real SAR datasets indicated that the performance of kernel k-NN is comparable to support vector machine methods. It can be regarded as an alternative modeling technique for several chemical problems including the study of structure-activity relationship (SAR). The source codes implementing kernel k-NN in R language are freely available at .