Nearest Neighbors Graph Construction: Peer Sampling to the Rescue
详细信息    查看全文
  • 关键词:K ; Nearest Neighbors ; Clustering ; Sampling ; Randomness
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2016
  • 出版时间:2016
  • 年:2016
  • 卷:9944
  • 期:1
  • 页码:48-62
  • 全文大小:2,569 KB
  • 参考文献:1.Jester dataset. http://​grouplens.​org/​datasets/​jester/​
    2.Jester joke recommender. http://​shadow.​ieor.​berkeley.​edu/​humor/​
    3.Movielens dataset. http://​grouplens.​org/​datasets/​movielens/​
    4.Agrawal, D., Das, S., El Abbadi, A.: Big data, cloud computing: current state and future opportunities. In: Proceedings of the 14th International Conference on Extending Database Technology, EDBT/ICDT 2011, pp. 530–533. ACM (2011)
    5.Amato, G., Falchi, F.: KNN based image classification relying on local feature similarity. In: Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP 2010, pp. 101–108. ACM (2010)
    6.Bai, X., Guerraoui, R., Kermarrec, A.-M., Leroy, V.: Collaborative personalized top-k processing. ACM Trans. Database Syst. 36(4), 26 (2011)CrossRef
    7.Bertier, M., Frey, D., Guerraoui, R., Kermarrec, A.-M., Leroy, V.: The gossple anonymous social network. In: Gupta, I., Mascolo, C. (eds.) Middleware 2010. LNCS, vol. 6452, pp. 191–211. Springer, Heidelberg (2010)CrossRef
    8.Boiman, O., Shechtman, E., Irani, M.: In defense of nearest-neighbor based image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (2008)
    9.Boutet, A., Frey, D., Guerraoui, R., Jegou, A., Kermarrec, A.-M.: WhatsUp: a decentralized instant news recommender. In: Proceedings of the 27th IEEE International Symposium on Parallel Distributed Processing, IPDPS 2013, pp. 741–752 (2013)
    10.Chen, J., Fang, H.-R., Saad, Y.: Fast approximate KNN graph construction for high dimensional data via recursive Lanczos bisection. J. Mach. Learn. Res. 10, 1989–2012 (2009)MathSciNet MATH
    11.Connor, M., Kumar, P.: Fast construction of k-nearest neighbor graphs for point clouds. IEEE Trans. Vis. Comput. Graph. 16(4), 599–608 (2010)CrossRef
    12.Desrosiers, C., Karypis, G.: A comprehensive survey of neighborhood-based recommendation methods. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 107–144. Springer, Heidelberg (2011)CrossRef
    13.Dong, W., Moses, C., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: Proceedings of the 20th International Conference on World Wide Web, WWW 2011, pp. 577–586. ACM (2011)
    14.Giakkoupis, G., Kermarrec, A.-M., Woelfel, P.: Gossip protocols for renaming and sorting. In: Afek, Y. (ed.) DISC 2013. LNCS, vol. 8205, pp. 194–208. Springer, Heidelberg (2013)CrossRef
    15.Guo, G., Wang, H., Bell, D.J., Bi, Y., Greer, K.: KNN model-based approach in classification. In: Meersman, R., Schmidt, D.C. (eds.) CoopIS 2003, DOA 2003, and ODBASE 2003. LNCS, vol. 2888, pp. 986–996. Springer, Heidelberg (2003)CrossRef
    16.Hajebi, K., Abbasi-Yadkori, Y., Shahbazi, H., Zhang, H.: Fast approximate nearest-neighbor search with k-nearest neighbor graph. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, pp. 1312–1317. AAAI Press (2011)
    17.Huiskes, M.J., Lew, M.S.: The MIR flickr retrieval evaluation. In: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, MIR 2008, pp. 39–43. ACM (2008)
    18.Jelasity, M., Montresor, A., Babaoglu, O.: T-Man: gossip-based fast overlay topology construction. Comput. Netw.: Int. J. Comput. Telecommun. Netw. 53(13), 2321–2339 (2009)CrossRef MATH
    19.Jelasity, M., Voulgaris, S., Guerraoui, R., Kermarrec, A.-M., van Steen, M.: Gossip-based peer sampling. ACM Trans. Comput. Syst. 25(3) (2007)
    20.Olman, V., Mao, F., Wu, H., Xu, Y.: Parallel clustering algorithm for large data sets with applications in bioinformatics. The IEEE/ACM Trans. Comput. Biol. Bioinform. 6(2), 344–352 (2009)CrossRef
    21.Ormándi, R., Hegedűs, I., Jelasity, M.: Overlay management for fully distributed user-based collaborative filtering. In: D’Ambra, P., Guarracino, M., Talia, D. (eds.) Euro-Par 2010, Part I. LNCS, vol. 6271, pp. 446–457. Springer, Heidelberg (2010)CrossRef
    22.Pan, R., Dolog, P., Xu, G.: KNN-based clustering for improving social recommender systems. In: Cao, L., Zeng, Y., Symeonidis, A.L., Gorodetsky, V.I., Yu, P.S., Singh, M.P. (eds.) ADMI. LNCS, vol. 7607, pp. 115–125. Springer, Heidelberg (2013)CrossRef
    23.Paredes, R., Chávez, E., Figueroa, K., Navarro, G.: Practical construction of k-nearest neighbor graphs in metric spaces. In: Àlvarez, C., Serna, M. (eds.) WEA 2006. LNCS, vol. 4007, pp. 85–97. Springer, Heidelberg (2006)CrossRef
    24.Voulgaris, S., van Steen, M.: VICINITY: a pinch of randomness brings out the structure. In: Eyers, D., Schwan, K. (eds.) Middleware 2013. LNCS, vol. 8275, pp. 21–40. Springer, Heidelberg (2013)CrossRef
    25.Wang, J., Wang, J., Zeng, G., Tu, Z., Gan, R., Li, S.: Scalable k-NN graph construction for visual descriptors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 1106–1113 (2012)
    26.Zhong, R., Li, G., Tan, K.-L., Zhou, L.: G-tree: an efficient index for knn search on road networks. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013, pp. 39–48. ACM (2013)
  • 作者单位:Yahya Benkaouz (15)
    Mohammed Erradi (15)
    Anne-Marie Kermarrec (16)

    15. Networking and Distributed Systems Research Group, ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
    16. INRIA Rennes, Rennes, France
  • 丛书名:Networked Systems
  • ISBN:978-3-319-46140-3
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
  • 卷排序:9944
文摘
In this paper, we propose an efficient KNN service, called KPS (KNN-Peer-Sampling). The KPS service can be used in various contexts e.g. recommendation systems, information retrieval and data mining. KPS borrows concepts from P2P gossip-based clustering protocols to provide a localized and efficient KNN computation in large-scale systems. KPS is a sampling-based iterative approach, combining randomness, to provide serendipity and avoid local minimum, and clustering, to ensure fast convergence. We compare KPS against the state of the art KNN centralized computation algorithm NNDescent, on multiple datasets. The experiments confirm the efficiency of KPS over NNDescent: KPS improves significantly on the computational cost while converging quickly to a close to optimal KNN graph. For instance, the cost, expressed in number of pairwise similarity computations, is reduced by \(\approx \)23 % and \(\approx \)49 % to construct high quality KNN graphs for Jester and MovieLens datasets, respectively. In addition, the randomized nature of KPS ensures eventual convergence, not always achieved with NNDescent.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700