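Research and Implementation on Image Annotation Using Probability Modeling

Original title: 基于概率建模图像标注算法的研究及实现
Author: Ding Lei (丁雷)
Degree level: Master's
Discipline: Computer Science and Technology
Keywords: Image Annotation; Probability Modeling; Relevance Models
Degree year: 2010
Advisor: Xu De (须德)
Discipline code: 081202
Degree-granting institution: Beijing Jiaotong University (北京交通大学)
Thesis submission date: 2010-06-17
Chair of the defense committee: Wang Ning (王宁)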

Abstract

Automatic image annotation is a challenging approach to removing the burden of manual annotation: it tries to build a bridge between high-level semantic features and low-level visual features. With the continuing development of machine learning theory, researchers have designed many learning models for automatic annotation, which fall broadly into two categories: annotation based on probabilistic modeling and annotation based on classifiers.
This thesis first studies two representative annotation algorithms based on probabilistic modeling, the Co-occurrence Model and the Translation Model. The Co-occurrence Model divides each image into regions on a regular grid and annotates images according to the co-occurrence probability of keywords and image regions, that is, the probability of observing a keyword jointly with a region. The Translation Model improves on the Co-occurrence Model by introducing a new concept for describing images, the visual token (blob). Blobs are obtained by clustering image features, so each image contains a set of blobs, and image annotation can be viewed as translating from a vocabulary of blobs to a vocabulary of keywords. Combining the ideas of these two models, the thesis designs an improved relevance model. Given an annotated training set, segmenting the images and clustering the resulting regions yields a blob vocabulary, so each training image has a dual representation as a set of blobs and a set of keywords. Given a test image, a generative language modeling approach is adopted: it assumes that there exists an underlying probability distribution, referred to as the relevance model, over all keywords and blobs that could appear in the image, so that annotation amounts to random sampling from this distribution. The joint distribution is estimated approximately from the training set, and the keywords with the highest probability are selected as the annotation of the image. The improved relevance model makes effective use of large sets of annotated training images and achieves better annotation results. Experiments on the Corel dataset confirm the effectiveness of the model.
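
The estimation step described in the abstract can be illustrated with a minimal sketch. It is not the improved model of the thesis, whose details are not given here. It assumes a generic relevance-model estimate in which each training image J contributes P(J) · P(w|J) · ∏ P(b_i|J) to the joint probability of a keyword w and the test image's blobs, with linearly smoothed counts. The function name `annotate`, the smoothing weights `alpha` and `beta`, and the uniform prior over training images are all assumptions made for illustration.

```python
from collections import Counter


def annotate(train, test_blobs, alpha=0.1, beta=0.9, top_k=2):
    """Rank keywords for a test image by a smoothed joint probability.

    train: list of (keywords, blobs) pairs, one per annotated training image.
    test_blobs: list of blob ids observed in the test image.
    alpha, beta: smoothing weights (illustrative values, not from the thesis).
    Returns up to top_k (keyword, probability) pairs, most probable first.
    """
    # Collection-wide counts, used as the smoothing background.
    word_bg = Counter(w for words, _ in train for w in words)
    blob_bg = Counter(b for _, blobs in train for b in blobs)
    total = sum(word_bg.values()) + sum(blob_bg.values())

    prior = 1.0 / len(train)  # assumed uniform P(J)
    scores = Counter()
    for words, blobs in train:
        word_counts, blob_counts = Counter(words), Counter(blobs)
        size = len(words) + len(blobs)
        # Smoothed likelihood of the test blobs under this training image.
        blob_lik = 1.0
        for b in test_blobs:
            blob_lik *= (1 - beta) * blob_counts[b] / size + beta * blob_bg[b] / total
        # Accumulate this image's contribution to every keyword's score.
        for w in word_bg:
            p_w = (1 - alpha) * word_counts[w] / size + alpha * word_bg[w] / total
            scores[w] += prior * p_w * blob_lik

    z = sum(scores.values())
    if z == 0:  # a test blob never seen in training zeroes every score
        return []
    # Normalizing over keywords turns the joint scores into P(w | test blobs).
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, s / z) for w, s in ranked[:top_k]]
```

Calling `annotate(train, [1, 2])` on a small list of `(keywords, blobs)` pairs returns the top-ranked keywords with their normalized probabilities. In this sketch a blob id that never occurs in the training set produces an empty result, which a fuller implementation would avoid with a different smoothing scheme.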
