AKNN-Qalsh:PostgreSQL系统高维空间近似最近邻检索插件
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:AKNN-Qalsh:an approximate KNN search extension for high-dimensional data in PostgreSQL
  • 作者:张楚涵 ; 张家侨 ; 冯剑琳
  • 英文作者:ZHANG Chuhan;ZHANG Jiaqiao;FENG Jianlin;School of Data and Computer Science, Sun Yat-sen University;
  • 关键词:高维数据 ; 特征向量 ; 最近邻检索 ; 位置敏感哈希 ; PostgreSQL插件
  • 英文关键词:high-dimensional data;;feature vector;;nearest neighbor search;;Locality-Sensitive Hashing;;PostgreSQL extension
  • 中文刊名:ZSDZ
  • 英文刊名:Acta Scientiarum Naturalium Universitatis Sunyatseni
  • 机构:中山大学数据科学与计算机学院;
  • 出版日期:2019-05-15
  • 出版单位:中山大学学报(自然科学版)
  • 年:2019
  • 期:v.58;No.263
  • 基金:国家自然科学基金(61772563)
  • 语种:中文;
  • 页:ZSDZ201903010
  • 页数:7
  • CN:03
  • ISSN:44-1241/N
  • 分类号:85-91
摘要
复杂数据对象(如图片、文本)通常被表示成高维特征向量。PostgreSQL系统现有的最近邻检索方法KNN-Gist基于树状索引实现,无法高效支持高维数据的最近邻检索。引入的PostgreSQL系统高维空间近似最近邻检索插件:AKNN-Qalsh,基于位置敏感哈希机制实现,支持大规模、高维数据对象的近似最近邻检索。通过在五个真实数据集上的密集实验,验证了该插件的有效性。
        Complex data objects(such as pictures, text) are usually represented as high-dimensional feature vectors. The existing nearest neighbor search method KNN-Gist in PostgreSQL is based on the tree-structured index and cannot efficiently support the nearest neighbor search of high-dimensional data. The PostgreSQL system high-dimensional approximate nearest neighbor search extension: AKNN-Qalsh is introduced, which is based on the Locality-Sensitive Hashing scheme and supports approximate nearest neighbor search of large-scale, high-dimensional data objects. The effectiveness of the extension via extensive experiments on five real data sets is demonstrated.
引文
[1] The PostgreSQL Global Development Group.PostgreSQL 9.1 Press Kit [EB/OL].(2011-09-12) [2018-5-14].https://www.postgresql.org/about/press/presskit91/.
    [2] KOROTKOV A.ImgSmlr [CP/OL].(2018-02-24) [2018-5-14].https://github.com/ postgrespro/ imgsmlr.
    [3] LOWE D G.Distinctive image features from scale-invariant keypoints [J].International journal of computer vision,2004,60(2):91-110.
    [4] PostgreSQL 9.6.9 Documentation.Appendix F.Additional Supplied Modules [EB/OL].[2018-5-14].https://www.postgresql.org/docs/9.6/static/pgtrgm.html.
    [5] WEBER R,SCHEK H J,BLOTT S.A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces [C]//VLDB.1998,98:194-205.
    [6] INDYK P,MOTWANI R.Approximate nearest neighbors:towards removing the curse of dimensionality [C]//Proceedings of the thirtieth annual ACM symposium on Theory of computing.ACM,1998:604-613.
    [7] PANIGRAHY R.Entropy based nearest neighbor search in high dimensions [C]//Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm.Society for Industrial and Applied Mathematics,2006:1186-1195.
    [8] LV Q,JOSEPHSON W,WANG Z,et al.Multi-probe LSH:efficient indexing for high-dimensional similarity search [C]//Proceedings of the 33rd international conference on Very large data bases.VLDB Endowment,2007:950-961.
    [9] TAO Y,YI K,SHENG C,et al.Efficient and accurate nearest neighbor and closest pair search in high-dimensional space [J].ACM TODS,2010,35(3):20.
    [10] GAN J,FENG J,FANG Q,et al.Locality-sensitive hashing scheme based on dynamic collision counting [C]//Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data.ACM,2012:541-552.
    [11] HUANG Q,FENG J,ZHANG Y,et al.Query-aware locality-sensitive hashing for approximate nearest neighbor search [J].Proceedings of the VLDB Endowment,2015,9(1):1-12.
    [12] HUANG Q,FENG J,FANG Q,et al.Query-aware locality-sensitive hashing scheme for lp norm [J].The VLDB Journal,2017,26(5):683-708.
    [13] LEHMAN P L.Efficient locking for concurrent operations on B-trees [J].ACM TODS,1981,6(4):650-670.
    (1)http://yann.lecun.com/exdb/mnist
    (2)http://corpus-texmex.irisa.fr/
    (3)http://labelme.csail.mit.edu/Release3.0/
    (4)http://archive.ics.uci.edu/ml/datasets/P53+Mutants

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700