Abstract
This paper studies, for the first time, the problem of clustering large-scale spatio-textual data, which has many practical applications such as location-based data cleaning. We develop a modified version of the k-means clustering algorithm for spatio-textual data that uses the expected pairwise distance. Experiments show that our algorithm is fast enough to handle a massive spatio-textual dataset and is also fairly effective in terms of clustering quality.
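To make the idea of assigning objects by expected pairwise distance concrete, the following is a minimal illustrative sketch and not the algorithm developed in this paper. It rests on several assumptions that the abstract does not state: each object is a dictionary with a 2-D location `loc` and a keyword set `words`; the object-to-object distance is a weighted sum of Euclidean distance and Jaccard distance, controlled by the hypothetical parameters `alpha` and `scale`; and initial seeds are chosen by a simple farthest-first rule. The expected pairwise distance of an object to a cluster is taken here as its mean distance to the cluster's current members.

```python
import math


def combined_distance(a, b, alpha=0.5, scale=1.0):
    """Assumed metric: weighted sum of spatial and textual distance."""
    spatial = math.dist(a["loc"], b["loc"]) / scale
    union = a["words"] | b["words"]
    # Jaccard distance between keyword sets (0.0 if both sets are empty).
    textual = 1.0 - len(a["words"] & b["words"]) / len(union) if union else 0.0
    return alpha * spatial + (1.0 - alpha) * textual


def expected_pairwise_distance(obj, members, alpha=0.5, scale=1.0):
    """Mean distance from obj to every current member of a cluster."""
    total = sum(combined_distance(obj, m, alpha, scale) for m in members)
    return total / len(members)


def cluster(objects, k, alpha=0.5, scale=1.0, iterations=10):
    """k-means-style loop that replaces centroids with cluster members."""
    # Farthest-first seeding (an assumption, chosen to be deterministic).
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(objects)) if i not in seeds]
        seeds.append(max(
            candidates,
            key=lambda i: min(
                combined_distance(objects[i], objects[s], alpha, scale)
                for s in seeds
            ),
        ))

    # Initial assignment: each object joins its nearest seed.
    labels = [
        min(range(k),
            key=lambda c: combined_distance(o, objects[seeds[c]], alpha, scale))
        for o in objects
    ]

    for _ in range(iterations):
        groups = [[o for o, lab in zip(objects, labels) if lab == c]
                  for c in range(k)]
        # Reassign each object to the non-empty cluster with the smallest
        # expected pairwise distance.
        new_labels = [
            min((c for c in range(k) if groups[c]),
                key=lambda c: expected_pairwise_distance(
                    o, groups[c], alpha, scale))
            for o in objects
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

This naive version compares every object with every cluster member in each pass, so its cost grows quadratically with the number of objects. It only illustrates the assignment rule and would not scale to a massive dataset without further optimization.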