Tight lower bound instances for k-means++ in two dimensions


Authors: Anup Bhattacharya^a (anupb@cse.iitd.ernet.in); Ragesh Jaiswal^a,1 (rjaiswal@cse.iitd.ac.in, rjaiswal@cse.iitd.ernet.in); Nir Ailon^b,2 (nailon@cs.technion.ac.il)
Keywords: k-means++; Lower bounds
Journal: Theoretical Computer Science
Publication year: 2016
Publication date: 27 June 2016
Volume: 634
Issue: Complete
Pages: 55-66
Full-text size: 404 K

Abstract

The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial k centers when using Lloyd's algorithm for the k-means problem. Brunsch and Röglin [9] conjectured that k-means++ behaves well on datasets of small dimension. More specifically, they conjectured that the k-means++ seeding algorithm gives an $O(\log d)$ approximation with high probability for any d-dimensional dataset. In this work, we refute this conjecture by giving two-dimensional datasets on which the k-means++ seeding algorithm achieves an $O(\log k)$ approximation ratio with probability exponentially small in k. This solves open problems posed by Mahajan et al. [12] and by Brunsch and Röglin [9].
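
For readers unfamiliar with the seeding step the abstract refers to, the following is a minimal sketch of the standard k-means++ procedure (often called D² sampling): the first center is chosen uniformly at random, and each later center is chosen with probability proportional to its squared distance from the nearest center picked so far. The function names `kmeanspp_seed`, `sq_dist`, and `kmeans_cost` are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's lower-bound instances.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (illustrative)."""
    rng = rng or random.Random()
    # First center: uniform at random over the dataset.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    # A small two-dimensional toy dataset (hypothetical, for illustration).
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    seeds = kmeanspp_seed(data, 2, random.Random(0))
    print(seeds, kmeans_cost(data, seeds))
```

The approximation ratio discussed in the abstract compares `kmeans_cost` of the sampled centers against the cost of an optimal set of k centers; computing that optimum is outside the scope of this sketch.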
