The species-by-species models performed better than the RIVPACS model according to the dissimilarity measure BC, the area under the curve (AUC) and the proportion of true positives. In contrast, the taxonomic completeness index (O/E), commonly used for freshwater assessments, indicated that the RIVPACS model performed better. However, we believe that O/E overestimates model performance because the index omits false negative errors (i.e. errors where species are wrongly predicted as absent).
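The point about O/E omitting false negatives can be illustrated with a minimal sketch. It rests on assumptions not stated in the text: that O/E is computed only over taxa whose predicted probability of occurrence reaches a threshold (0.5 here), that BC is a Bray-Curtis-type dissimilarity taken over all taxa, and that the probabilities and presence/absence vectors below are made-up illustrative values. The function names are hypothetical.

```python
# Illustrative sketch only; the threshold, the BC formulation and all data
# values are assumptions, not taken from the study.

def o_over_e(observed, predicted, threshold=0.5):
    """Taxonomic completeness: observed / expected richness, restricted to
    taxa whose predicted probability is at least `threshold` (assumed)."""
    kept = [i for i, p in enumerate(predicted) if p >= threshold]
    expected = sum(predicted[i] for i in kept)
    if expected == 0:
        raise ValueError("no taxa reach the probability threshold")
    return sum(observed[i] for i in kept) / expected


def bray_curtis(observed, predicted):
    """Bray-Curtis-type dissimilarity between observed presence/absence and
    predicted probabilities, computed over all taxa (assumed formulation)."""
    numerator = sum(abs(o - p) for o, p in zip(observed, predicted))
    denominator = sum(o + p for o, p in zip(observed, predicted))
    return numerator / denominator


# Hypothetical predicted probabilities for four taxa at one site.
predicted = [0.9, 0.8, 0.1, 0.1]

# Site A: only the two taxa predicted as present are found.
site_a = [1, 1, 0, 0]
# Site B: the two taxa predicted as absent are found as well (false negatives).
site_b = [1, 1, 1, 1]

# O/E is identical for both sites because the low-probability taxa are ignored,
# whereas the dissimilarity increases when the false negatives occur.
print(o_over_e(site_a, predicted), o_over_e(site_b, predicted))
print(bray_curtis(site_a, predicted), bray_curtis(site_b, predicted))
```

Under these assumptions, the two hypothetical sites receive the same O/E score even though the predictions miss two taxa at the second site, while the dissimilarity measure separates them.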
No support was found for our hypothesis that rare species would be better modelled by the RIVPACS model. Indeed, the RIVPACS model predicted common species significantly better than the species-by-species models, whilst the species-by-species models predicted rare species better than the RIVPACS model.
Both modelling methods were able to separate impaired sites (acidified and eutrophic) from reference sites.
We suggest that classification-then-modelling be evaluated using data sets containing more possible biological interactions, e.g. data sets that include phytoplankton, zooplankton and fish. We also suggest that AUC be used as a complement to taxonomic completeness when evaluating models for reference condition taxa composition.