We modify hashing strategies to cluster high-dimensional documents.
We estimate the Jaccard similarity by counting bucket collisions between documents.
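To make the collision-counting idea concrete, here is a minimal Python sketch, assuming standard MinHash over character shingles. Each random hash function sends a document to one bucket (its minimum hashed value), and the fraction of hash functions on which two documents share a bucket estimates their Jaccard similarity. The shingle size `k`, the number of hash functions, the `(a * x + b) mod p` hash family, and all function names are illustrative assumptions. This is not the modified scheme described in the text.

```python
import random
import zlib

# Hypothetical helper: character k-shingles used as the document's token set.
def shingles(text, k=3):
    return {text[i:i + k] for i in range(len(text) - k + 1)}

# Assumed hash family: h(x) = (a * x + b) mod p, applied to a CRC32 of each token.
# Every document must use the same seed so that all share the same hash functions.
def minhash_signature(tokens, num_hashes=200, seed=0):
    rng = random.Random(seed)
    prime = (1 << 61) - 1
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(num_hashes)]
    base = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    # One bucket per hash function: the minimum hashed value over all tokens.
    return [min((a * x + b) % prime for x in base) for a, b in params]

def estimate_jaccard(sig_a, sig_b):
    # Fraction of hash functions under which both documents land in the same bucket.
    collisions = sum(x == y for x, y in zip(sig_a, sig_b))
    return collisions / len(sig_a)

def exact_jaccard(a, b):
    return len(a & b) / len(a | b)

doc_a = shingles("the quick brown fox jumps over the lazy dog")
doc_b = shingles("the quick brown fox leaps over the lazy cat")
sig_a = minhash_signature(doc_a)
sig_b = minhash_signature(doc_b)
print(exact_jaccard(doc_a, doc_b), estimate_jaccard(sig_a, sig_b))
```

The estimate's variance shrinks as `num_hashes` grows, which trades signature size for accuracy.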
We introduce a penalized Hamming function to approximate the cosine similarity.
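For orientation, the sketch below shows the standard random-hyperplane (SimHash) relation between Hamming distance and cosine similarity. A random hyperplane separates two vectors with probability θ/π, so with `n` sign bits and Hamming distance `h`, cos(π·h/n) estimates their cosine. This is only the unpenalized baseline. The text does not specify the penalty term, so it is not reproduced here. The number of bits, the seed, and all function names are hypothetical.

```python
import math
import random

def random_hyperplanes(dim, num_bits=256, seed=0):
    # Gaussian normal vectors give uniformly random hyperplane orientations.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def sign_bits(vec, planes):
    # One bit per hyperplane: the side of the hyperplane on which the vector falls.
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes]

def hamming(bits_a, bits_b):
    return sum(x != y for x, y in zip(bits_a, bits_b))

def cosine_from_hamming(bits_a, bits_b):
    # Unpenalized estimate: P[bit differs] = theta / pi, so theta ~ pi * h / n.
    theta = math.pi * hamming(bits_a, bits_b) / len(bits_a)
    return math.cos(theta)

def exact_cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

planes = random_hyperplanes(dim=4)
u, v = [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]
print(exact_cosine(u, v), cosine_from_hamming(sign_bits(u, planes), sign_bits(v, planes)))
```

A penalized variant would adjust how `hamming` feeds into the angle estimate. Its exact form is the subject of the work itself and is not assumed here.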
Both strategies improve the quality of the detected clusters.