Automatic Image Annotation and Fast Similarity Search in Image Retrieval

Original title: 图像检索中自动标注与快速相似搜索技术研究
Author: Wang Bin (王斌)
Degree level: Doctoral
Discipline: Signal and Information Processing
Keywords: automatic image annotation ; annotation refinement ; duplicate image detection ; similar image search
Degree year: 2007
Advisors: Li Mingjing (李明镜) ; Yu Nenghai (俞能海)
Discipline code: 081002
Degree-granting institution: University of Science and Technology of China
Submission date: 2007-04-01

Abstract

The rapid development of imaging technology has made digital cameras, camera phones, and other imaging devices increasingly popular, and the number of available images is growing explosively. At the same time, the Internet has greatly facilitated communication between people and made the exchange and delivery of digital images cheap and convenient. This abundance creates a problem for end users: it is hard to find what they really need in such a huge amount of data. As a result, image retrieval and search technologies have received wide attention.
     Present image retrieval usually depends on annotation, that is, the textual description of an image. As the number of images grows rapidly, manually labeling all images has become infeasible because of its cost. Automatic image annotation has therefore become an active research topic in recent years. The most difficult problems are the "semantic gap" and the efficiency problems caused by the huge number of images.
     Content-based image retrieval (CBIR) is also important in many application areas, such as fingerprint retrieval and medical image retrieval, and automatic image annotation algorithms often need to perform CBIR themselves. The key problem in CBIR is to find, quickly and accurately, the set of images that are similar or nearly identical to the query image. Because images carry a large amount of data and are usually represented as high-dimensional vectors, both indexing and search are difficult. When the number of images reaches millions or even hundreds of millions, fast similar image search becomes a very challenging problem.
     This dissertation focuses on automatic image annotation and fast similar image search. The main content and contributions are as follows:
     1. A review of automatic image annotation algorithms, with emphasis on the widely studied relevance-based models, generative models, and label propagation methods. Traditional annotation algorithms mainly study the relationship between images and words. A more recent line of work uses the statistical and semantic correlations between words to refine existing annotations, and representative work of this kind is also reviewed.
     2. An analysis of the goal of automatic image annotation and of the information available to it, followed by a unified annotation framework. The framework extends the traditional annotation problem to include two sub-problems: basic image annotation and annotation refinement. It gives a clear account of many existing annotation methods and helps in understanding the annotation problem.
     3. Several improved image annotation algorithms based on the proposed framework. They improve the computation of image relations, the computation of word relations, and the learning process, respectively. Experiments show that the proposed algorithms are effective, which also supports the validity of the unified framework.
     4. A study of fast similar image search, which is the core problem of CBIR and is often needed in automatic image annotation to find images similar to the one being annotated. To simplify the problem, we first consider fast duplicate image detection in a large-scale image collection. For this task we propose an efficient image representation and indexing method that has low computational complexity, requires little storage, and achieves high detection accuracy.
     5. An extension of the duplicate image detection method to similar image search. We represent and index multiple kinds of image features and use a machine learning method, AdaBoost, to combine the concise representations of these features. This makes it possible to search for similar images quickly in a large image database with good performance.
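
The general idea behind items 4 and 5, a compact image representation paired with an index for fast lookup, can be illustrated with a small sketch. The abstract does not specify the dissertation's actual representation or index, so everything below is an assumption made for illustration: the one-bit-per-dimension quantization, the prefix-keyed hash table, and the names `signature`, `hamming`, and `DuplicateIndex` are all hypothetical, and the feature vectors are made-up values.

```python
from collections import defaultdict


def signature(features):
    """Quantize a feature vector into a compact binary code.

    Assumption for illustration: one bit per dimension, set when the
    value exceeds the mean of the vector.
    """
    mean = sum(features) / len(features)
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > mean else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


class DuplicateIndex:
    """Hash-table index keyed on the leading bits of each code (hypothetical)."""

    def __init__(self, total_bits, prefix_bits):
        self.shift = total_bits - prefix_bits
        self.buckets = defaultdict(list)

    def add(self, image_id, features):
        code = signature(features)
        self.buckets[code >> self.shift].append((image_id, code))

    def query(self, features, max_distance=1):
        """Return ids in the query's bucket whose codes differ by at most max_distance bits."""
        code = signature(features)
        candidates = self.buckets.get(code >> self.shift, [])
        return [i for i, c in candidates if hamming(code, c) <= max_distance]


index = DuplicateIndex(total_bits=8, prefix_bits=4)
index.add("a", [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4])
index.add("c", [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6])
# Query with a slightly perturbed copy of image "a".
print(index.query([0.85, 0.15, 0.8, 0.2, 0.7, 0.3, 0.55, 0.45]))
```

Only the bucket that shares the query's leading bits is scanned, which keeps lookups cheap but misses near-duplicates whose leading bits differ. Combining several such codes from different features, as item 5 describes with AdaBoost, is outside the scope of this sketch.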

