Abstract
In a recent publication we described the application of an unsupervised learning method using self-organizing maps to the separation of three tribes and seven subtribes of the plant family Asteraceae based on a set of sesquiterpene lactones (STLs) isolated from individual species. In the present work, two different structure representations, atom counts (2D) and radial distribution function (RDF) codes (3D), and two supervised classification methods, counterpropagation neural networks and k-nearest neighbors (k-NN), were used to predict the tribe in which a given STL occurs. The data set was extended from 144 to 921 STLs, and the number of Asteraceae tribes was increased from three to seven. The k-NN classifier with k = 1 showed the best performance, and the RDF code outperformed the atom counts. The quality of the obtained model was assessed with two test sets, which exemplified two possible applications: (1) finding a plant source for a desired compound and (2) using the chemical profile (STLs) of a plant species to (a) study the relationship between the current taxonomic classification and the plant's chemistry and (b) assign the species to a tribe by majority vote. In addition, the problem of defining the applicability domain of the models was assessed by means of two different approaches: principal component analysis combined with Hotelling's T² statistic and an a posteriori probability-based rule.
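As an illustration only, the combination of a k = 1 nearest-neighbor classifier with a majority vote over the STLs of one species could be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the Euclidean distance, the two-dimensional toy descriptors, and the placeholder labels `tribe_A` and `tribe_B` are all assumptions made for the example, and real atom-count or RDF descriptors would have many more components.

```python
import math
from collections import Counter


def nearest_tribe(query, train_vectors, train_tribes):
    """1-NN: return the tribe label of the training descriptor closest to `query`.

    Euclidean distance is an assumption; the abstract does not name the metric.
    """
    best = min(range(len(train_vectors)),
               key=lambda i: math.dist(query, train_vectors[i]))
    return train_tribes[best]


def assign_species(stl_vectors, train_vectors, train_tribes):
    """Predict a tribe for each STL of a species, then take the majority vote.

    Ties are resolved in favor of the label seen first (Counter behavior).
    """
    votes = Counter(nearest_tribe(v, train_vectors, train_tribes)
                    for v in stl_vectors)
    return votes.most_common(1)[0][0], votes


# Hypothetical toy descriptors (not data from the paper).
train_vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_tribes = ["tribe_A", "tribe_A", "tribe_B", "tribe_B"]

# Three STLs assumed to be isolated from one species.
species_profile = [(0.2, 0.1), (5.5, 5.0), (0.1, 0.9)]

tribe, votes = assign_species(species_profile, train_vectors, train_tribes)
print(tribe, dict(votes))
```

In this toy case two of the three STLs fall nearest to `tribe_A` compounds, so the species is assigned to `tribe_A` even though one compound votes for `tribe_B`.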