Hashing-based clustering in high dimensional data
Abstract

We modify hashing strategies to cluster high-dimensional documents.

We estimate the Jaccard similarity by counting bucket collisions between documents.
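
As an illustration of how collision counting can estimate the Jaccard similarity, here is a minimal sketch using standard MinHash. It is not taken from the paper: the function names, the use of `hashlib.blake2b` as the hash family, and the choice of one bucket per hash function are all assumptions made for this example, and documents are assumed to be non-empty token collections.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed (illustrative choice)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=128):
    """One minimum hash value per seed; each position acts as a bucket key."""
    unique = set(tokens)  # assumed non-empty
    return [min(_hash(t, seed) for t in unique) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two documents collide."""
    collisions = sum(a == b for a, b in zip(sig_a, sig_b))
    return collisions / len(sig_a)


def exact_jaccard(a, b):
    """Reference value: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


doc_a = "hashing based clustering of high dimensional documents".split()
doc_b = "hashing based clustering of large text collections".split()
sig_a = minhash_signature(doc_a, num_hashes=256)
sig_b = minhash_signature(doc_b, num_hashes=256)
print(exact_jaccard(doc_a, doc_b), estimate_jaccard(sig_a, sig_b))
```

For a single hash function, the probability that two documents share the same minimum equals their Jaccard similarity, so the collision fraction over many hash functions converges to it. How the paper groups hashes into buckets may differ from this sketch.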

We introduce a penalized Hamming function to approximate the cosine similarity.
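
The penalized Hamming function itself is not defined in this abstract, so it is not reproduced here. For background only, the sketch below shows the standard, unpenalized link between Hamming distance and cosine similarity through random-hyperplane signatures (SimHash). All names and parameters are assumptions made for this example.

```python
import math
import random


def random_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits Gaussian normal vectors, one per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """Bit i is 1 when vec lies on the non-negative side of hyperplane i."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]


def hamming(sig_a, sig_b):
    """Number of positions where the two bit signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def estimate_cosine(sig_a, sig_b):
    """The fraction of differing bits estimates angle / pi; take its cosine."""
    theta = math.pi * hamming(sig_a, sig_b) / len(sig_a)
    return math.cos(theta)


def exact_cosine(u, v):
    """Reference value: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


planes = random_hyperplanes(dim=5, num_bits=512, seed=0)
u = [1.0, 0.0, 0.0, 0.0, 0.0]
v = [1.0, 1.0, 0.0, 0.0, 0.0]
print(exact_cosine(u, v), estimate_cosine(signature(u, planes), signature(v, planes)))
```

Two vectors separated by angle θ fall on opposite sides of a random hyperplane with probability θ/π, which is why the normalized Hamming distance between signatures recovers the angle. A penalized variant would presumably reweight this distance, but the form of that penalty is not stated here.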

Both strategies improve the quality of the detected clusters.
