Abstract
We describe a novel method for ligand-based virtual screening that uses Self-Organizing Maps (SOM) as a novelty detection device. Novelty detection (or one-class classification) refers to identifying patterns that do not belong to the space covered by a given data set. In ligand-based virtual screening, chemical structures perceived as novel lie outside the known activity space and can therefore be discarded from further investigation. In this context, a "novel structure" is a compound that is unlikely to share the activity of the query structures, whereas compounds not perceived as novel are suspected to share it. Many databases now contain active structures, but access to compounds that have been found to be inactive in a biological assay is limited. This work addresses the problem via novelty detection, which does not require proven inactive compounds. The structures are described by spatial autocorrelation functions weighted by atomic physicochemical properties. Different methods for selecting a subset of targets from a larger set are discussed. A comparison with similarity search based on Daylight fingerprints followed by data fusion is presented; the two methods complement each other to a large extent. In a retrospective screening of the WOMBAT database, novelty detection with SOM gave enrichment factors between 105 and 462 when the 100 top-ranked structures were considered, an improvement of 25% to 100% over the similarity search based on Daylight fingerprints. Novelty detection with SOM is applicable (1) to improve the retrieval of potentially active compounds, also in concert with other virtual screening methods; (2) as a library design tool for discarding a large number of compounds that are unlikely to possess a given biological activity; and (3) for selecting a small number of potentially active compounds from a large data set.
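The novelty-detection idea can be illustrated with a minimal sketch. The abstract does not specify the SOM topology, training schedule or novelty criterion, so the following are assumptions. The sketch uses a small rectangular Kohonen map trained on descriptor vectors of the query actives. The novelty score is the quantization error, i.e. the Euclidean distance to the best-matching neuron. A compound is called novel when its error exceeds the 95th percentile of the training errors. All function names (`train_som`, `novelty_scores`, `novelty_threshold`, `enrichment_factor`) are hypothetical, and the toy 3-D vectors merely stand in for the spatial autocorrelation descriptors. The enrichment factor follows its usual definition: the hit rate among the top n ranked compounds divided by the hit rate of the whole database.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Return (row, col, distance) of the neuron whose weights are closest to x."""
    best = None
    for r, c, w in nodes:
        d = math.dist(w, x)
        if best is None or d < best[2]:
            best = (r, c, d)
    return best


def train_som(data, rows=4, cols=4, epochs=200, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM with the classic online Kohonen rule (assumed setup)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise each neuron's weights with a copy of a random training vector.
    nodes = [(r, c, list(rng.choice(data))) for r in range(rows) for c in range(cols)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking Gaussian neighbourhood
            br, bc, _ = best_matching_unit(nodes, x)
            for r, c, w in nodes:
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull neuron towards x
            step += 1
    return nodes


def novelty_scores(nodes, candidates):
    """Quantization error (distance to best-matching unit) per candidate."""
    return [best_matching_unit(nodes, x)[2] for x in candidates]


def novelty_threshold(nodes, training, quantile=0.95):
    """Assumed criterion: a high quantile of the training quantization errors."""
    errs = sorted(novelty_scores(nodes, training))
    return errs[min(len(errs) - 1, int(quantile * len(errs)))]


def enrichment_factor(ranked_labels, n_top):
    """EF = (actives in top n / n) / (all actives / all compounds)."""
    hits = sum(ranked_labels[:n_top])
    return (hits / n_top) / (sum(ranked_labels) / len(ranked_labels))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy descriptors: actives cluster near (1, 1, 1), decoys are spread out.
    queries = [[1 + rng.gauss(0, 0.1) for _ in range(3)] for _ in range(30)]
    actives = [[1 + rng.gauss(0, 0.1) for _ in range(3)] for _ in range(10)]
    decoys = [[rng.uniform(-3, 3) for _ in range(3)] for _ in range(90)]
    library = actives + decoys
    labels = [1] * len(actives) + [0] * len(decoys)

    som = train_som(queries)
    thr = novelty_threshold(som, queries)
    scores = novelty_scores(som, library)
    # Rank from least to most novel; scores above thr are treated as "novel".
    order = sorted(range(len(library)), key=lambda i: scores[i])
    ranked_labels = [labels[i] for i in order]
    kept = sum(s <= thr for s in scores)
    print("kept as not novel:", kept, "of", len(library))
    print("EF(top 10):", enrichment_factor(ranked_labels, 10))
```

Ranking by quantization error gives the ordered list used for top-n enrichment, while the threshold gives the keep/discard decision relevant to library filtering; both choices are illustrative rather than the authors' published settings.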