Image Annotation by Propagating Labels from Semantic Neighbourhoods
  • Authors: Yashaswi Verma; C. V. Jawahar
  • Keywords: Image annotation; Nearest neighbour; Metric learning; Cross-media analysis
  • Journal: International Journal of Computer Vision
  • Publication date: January 2017
  • Year: 2017
  • Volume: 121
  • Issue: 1
  • Pages: 126-148
  • Journal category: Computer Science
  • Journal subjects: Computer Imaging, Vision, Pattern Recognition and Graphics; Artificial Intelligence (incl. Robotics); Image Processing and Computer Vision; Pattern Recognition
  • Publisher: Springer US
  • ISSN: 1573-1405
Abstract
Automatic image annotation aims at predicting a set of semantic labels for an image. Because of the large annotation vocabulary, there are large variations in the number of images corresponding to different labels ("class-imbalance"). Additionally, due to the limitations of human annotation, several images are not annotated with all the relevant labels ("incomplete-labelling"). These two issues affect the performance of most existing image annotation models. In this work, we propose the 2-pass k-nearest neighbour (2PKNN) algorithm, a two-step variant of the classical k-nearest neighbour algorithm that addresses these issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities, thus combining the benefits of both. We also propose a metric learning framework over 2PKNN, formulated in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm to multi-label data. In addition to the features provided by Guillaumin et al. (2009), which are used by almost all recent image annotation methods, we benchmark new features, including features extracted from a generic convolutional neural network model and features computed using modern encoding techniques. We also learn linear and kernelized cross-modal embeddings over different feature combinations to reduce the semantic gap between visual features and textual labels. Extensive evaluations on four image annotation datasets (Corel-5K, ESP-Game, IAPR-TC12 and MIRFlickr-25K) demonstrate that our method achieves promising results and establishes a new state-of-the-art on the prevailing image annotation datasets.
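
For illustration, the two-pass neighbour selection described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes precomputed feature vectors, a plain Euclidean distance, and an exponential distance weighting, and the function and parameter names (two_pass_knn, K1) are illustrative.

```python
import numpy as np

def two_pass_knn(test_feat, train_feats, train_labels, vocab_size, K1=5):
    """Sketch of a 2PKNN-style predictor.

    test_feat    : (D,) feature vector of the test image
    train_feats  : (N, D) array of training image features
    train_labels : list of label-index sets, one per training image
    Returns an array of scores, one per label in the vocabulary.
    """
    # Distances from the test image to every training image.
    dists = np.linalg.norm(train_feats - test_feat, axis=1)

    # Pass 1 ("image-to-label"): for each label, keep the K1 nearest
    # training images annotated with that label. This gives a label-wise
    # balanced semantic neighbourhood, countering class imbalance.
    neighbourhood = set()
    for label in range(vocab_size):
        idx = [i for i, labs in enumerate(train_labels) if label in labs]
        idx.sort(key=lambda i: dists[i])
        neighbourhood.update(idx[:K1])

    # Pass 2 ("image-to-image"): weighted label propagation from the
    # selected neighbours; closer neighbours contribute more.
    scores = np.zeros(vocab_size)
    for i in neighbourhood:
        w = np.exp(-dists[i])
        for label in train_labels[i]:
            scores[label] += w
    return scores
```

In the paper the distance is additionally learned in a large-margin metric learning step; the sketch above uses an unlearned Euclidean distance purely to show the two-pass structure.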
