Residual Enhanced Image Descriptor
Detailed Information
  • Title: Residual Enhanced Image Descriptor (残差增强的图像描述符)
  • Authors: Wei Benchang (魏本昌); Zheng Li (郑丽); Guan Tao (管涛)
  • Affiliations: School of Electrical and Information Engineering, Hubei University of Automotive Technology; School of Computer Science & Technology, Huazhong University of Science and Technology
  • Keywords: image descriptor; hierarchical visual codebook; L2-normalization; product quantization
  • Journal: Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), journal code JSJF
  • Publication date: 2019-06-15
  • Year: 2019
  • Volume: 31
  • Issue: 06
  • Pages: 173-179 (7 pages)
  • CN: 11-2925/TP
  • Funding: National Natural Science Foundation of China (61272202); Doctoral Fund of Hubei University of Automotive Technology (BK201603)
  • Language: Chinese
  • Record ID: JSJF201906020
Abstract
For the global image descriptor VLAD (vector of locally aggregated descriptors), enlarging the visual codebook improves search accuracy but also increases memory usage. To resolve this trade-off between search quality and storage, a residual-enhanced global image descriptor called EVLAD is proposed, built on a two-layer hierarchical visual codebook. In the offline codebook-generation stage, the first-layer visual codebook is learned with K-means in the local-descriptor space, and the second-layer visual sub-codebooks are then generated non-uniformly according to a quantization-residual-minimization criterion. In the online generation stage, residual generation and accumulation are associated with different codebook layers: for each local descriptor, the residual obtained by subtracting its nearest fine-grained second-layer visual word is accumulated into the subvector corresponding to its first-layer visual word, and EVLAD is the concatenation of all subvectors. To suppress burstiness in the feature space, each subvector and the concatenated result are L2-normalized. Experimental results show that EVLAD outperforms VLAD and its other improved variants in accuracy.
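The two-stage construction summarized above can be illustrated with a minimal sketch in Python/NumPy. This is not the authors' implementation: the codebook sizes k1 and k2, the fixed (rather than non-uniform) number of sub-words per first-layer cell, and all function and variable names are assumptions made only for this example.

import numpy as np
from sklearn.cluster import KMeans

def train_hierarchical_codebook(train_desc, k1=64, k2=8):
    # Offline stage (sketch): learn the first-layer codebook with K-means,
    # then learn one finer sub-codebook per first-layer cell from the
    # descriptors assigned to that cell. The paper allocates sub-words
    # non-uniformly by residual minimization; a fixed k2 per cell is an
    # assumption made here to keep the example short.
    km1 = KMeans(n_clusters=k1, n_init=10).fit(train_desc)
    centers1 = km1.cluster_centers_
    sub_codebooks = []
    for i in range(k1):
        cell = train_desc[km1.labels_ == i]
        if len(cell) == 0:                           # degenerate cell: reuse the layer-1 word
            sub_codebooks.append(centers1[i:i + 1])
            continue
        k = max(1, min(k2, len(cell)))               # keep K-means feasible on small cells
        sub_codebooks.append(KMeans(n_clusters=k, n_init=10).fit(cell).cluster_centers_)
    return centers1, sub_codebooks

def evlad(local_desc, centers1, sub_codebooks):
    # Online stage (sketch): for each local descriptor, find its nearest
    # first-layer word, subtract the nearest word of that cell's sub-codebook
    # (a finer residual than plain VLAD), and accumulate the result in the
    # first-layer bin. Each subvector and the concatenation are L2-normalized.
    k1, d = centers1.shape
    agg = np.zeros((k1, d))
    nn1 = np.argmin(((local_desc[:, None, :] - centers1[None]) ** 2).sum(-1), axis=1)
    for x, i in zip(local_desc, nn1):
        sub = sub_codebooks[i]
        j = np.argmin(((x - sub) ** 2).sum(-1))      # nearest second-layer visual word
        agg[i] += x - sub[j]                         # residual to the second-layer word
    norms = np.linalg.norm(agg, axis=1, keepdims=True)
    agg = agg / np.maximum(norms, 1e-12)             # per-subvector L2-normalization
    v = agg.ravel()
    return v / max(np.linalg.norm(v), 1e-12)         # global L2-normalization

As the construction suggests, the resulting vector keeps the k1 x d dimensionality of a VLAD built on the first-layer codebook alone, while the residuals it accumulates are measured against the finer second-layer words; this is how the method aims to gain accuracy without the memory growth that a single large codebook would entail.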
References
[1]Lowe D G.Object recognition from local scale-invariant features[C]//Proceedings of the 7th International Conference on Computer Vision.Los Alamitos:IEEE Computer Society Press,1999,2:1150-1157
    [2]Lowe D G.Distinctive image features from scale-invariant keypoints[J].International Journal of Computer Vision,2004,60(2):91-110
    [3]Zhuang Liansheng,Fei Chi,Zhong Hangshi,et al.D-SIFT:an extended SIFT feature in the DCT domain[J].Journal of Computer-Aided Design & Computer Graphics,2015,27(10):1859-1864(in Chinese)
    [4]Bay H,Tuytelaars T,van Gool L.SURF:speeded up robust features[C]//Proceedings of the European Conference on Computer Vision.Heidelberg:Springer,2008:404-417
    [5]Tola E,Lepetit V,Fua P.A fast local descriptor for dense matching[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2008:1-8
    [6]Tola E,Lepetit V,Fua P.Daisy:an efficient dense descriptor applied to wide-baseline stereo[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,32(5):815-830
    [7]Fan B,Wu F C,Hu Z Y.Aggregating gradient distributions into intensity orders:a novel local image descriptor[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2011:2377-2384
    [8]Rublee E,Rabaud V,Konolige K,et al.ORB:an efficient alternative to SIFT or SURF[C]//Proceedings of the IEEE International Conference on Computer Vision.Los Alamitos:IEEE Computer Society Press,2011:2564-2571
    [9]Leutenegger S,Chli M,Siegwart R Y.BRISK:binary robust invariant scalable keypoints[C]//Proceedings of the IEEE International Conference on Computer Vision.Los Alamitos:IEEE Computer Society Press,2011:2548-2555
    [10]Alahi A,Ortiz R,Vandergheynst P.FREAK:fast retina keypoint[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2012:510-517
    [11]Yuan Hongliang,Wu Fukun,Zheng Changwen.Adaptive sampling guided by SURE and sample color histogram reconstruction[J].Journal of Computer-Aided Design & Computer Graphics,2016,28(4):533-539(in Chinese)
    [12]Oliva A,Torralba A.Modeling the shape of the scene:a holistic representation of the spatial envelope[J].International Journal of Computer Vision,2001,42(3):145-175
    [13]Hu Dameng,Huang Weiguo,Yang Jianyu,et al.Improved shape matching algorithm based on discrete curve evolution[J].Journal of Computer-Aided Design & Computer Graphics,2015,27(10):1865-1897(in Chinese)
    [14]Sivic J,Zisserman A.Video Google:a text retrieval approach to object matching in videos[C]//Proceedings of the 9th IEEE International Conference on Computer Vision.Los Alamitos:IEEE Computer Society Press,2003:1470-1477
    [15]Yang J C,Yu K,Gong Y H,et al.Linear spatial pyramid matching using sparse coding for image classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2009:1794-1801
    [16]Perronnin F,Dance C.Fisher kernels on visual vocabularies for image categorization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2007:1-8
    [17]Perronnin F,Liu Y,Sánchez J,et al.Large-scale image retrieval with compressed Fisher vectors[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2010:3384-3391
    [18]Jégou H,Douze M,Schmid C,et al.Aggregating local descriptors into a compact image representation[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2010:3304-3311
    [19]Jégou H,Perronnin F,Douze M,et al.Aggregating local image descriptors into compact codes[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2012,34(9):1704-1716
    [20]Arandjelovic R,Gronat P,Torii A,et al.NetVLAD:CNN architecture for weakly supervised place recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2018,40(6):1437-1451
    [21]Duta I C,Uijlings J R R,Ionescu B,et al.Efficient human action recognition using histograms of motion gradients and VLAD with descriptor shape information[J].Multimedia Tools and Applications,2017,76(21):22445-22472
    [22]Jégou H,Douze M,Schmid C.Product quantization for nearest neighbor search[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2011,33(1):117-128
    [23]Xu D,Tsang I W,Zhang Y.Online product quantization[J].IEEE Transactions on Knowledge and Data Engineering,2018,30(11):2185-2198
    [24]Hwang Y,Han B,Ahn H K.A fast nearest neighbor search algorithm by nonlinear embedding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2012:3053-3060
    [25]Ai L F,Yu J Q,Guan T.Spherical soft assignment:improving image representation in content-based image retrieval[C]//Proceedings of the Pacific-Rim Conference on Multimedia.Heidelberg:Springer,2012:801-810
    [26]Arandjelovic R,Zisserman A.All about VLAD[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos:IEEE Computer Society Press,2013:1578-1585
    (1)http://lear.inrialpes.fr/people/jegou/data.php#holidays
    (2)http://bigimbaz.inrialpes.fr/herve/ukbench_descriptors
