Research on Semantic Analysis and Automatic Annotation of Web Images

Author: Xu Hongtao (许红涛)
Degree level: Doctoral
Discipline: Computer Software and Theory
Keywords: automatic Web image semantic annotation; regression model; conditional random field; semantic co-occurrence (semantic correlation); Web multimedia information retrieval
Degree year: 2009
Advisor: Shi Bole (施伯乐)
Discipline code: 081202
Degree-granting institution: Fudan University
Thesis submission date: 2009-04-15

Abstract

Web images are usually associated with several kinds of information, such as the visual features of the image itself (color, texture, shape, etc.) and the associated text, and the semantics of a Web image are related to this information to a greater or lesser degree. There is a large "semantic gap" between the low-level visual feature space and the high-level semantic concept space of images, so automatic annotation methods based on visual content perform far below expectations. The text associated with a Web image lies much closer to its semantic space, which makes associated text an important means of revealing the semantics of Web images. However, the distribution of a Web image's semantics over its associated texts is complex and variable: different images and different annotation keywords usually correspond to different semantic distributions. Most existing methods either treat all associated text as a whole or estimate a single fixed semantic distribution model in advance from prior knowledge or heuristics, so the performance of automatic Web image semantic annotation still needs to be improved.
     This thesis focuses on the complexity and individual variability of how Web image semantics are distributed over the associated texts. Applying the idea of adaptive learning, it proposes several automatic Web image semantic annotation methods with good performance. It also applies automatic annotation to Web multimedia information search and makes an initial attempt at search that combines text and images.
     The main contributions of this thesis are as follows:
     1. An automatic Web image semantic annotation method based on adaptive learning of the position weights of associated texts. A basis expansion method is used to take into account the contribution of the higher-order structural relations among different types of associated texts, and a novel piecewise penalized weighted regression model is proposed to adaptively model the distribution of a Web image's semantics over its associated texts. Experiments on a real-world benchmark show that the method greatly improves annotation performance.
     2. An automatic Web image semantic annotation method based on an adaptive model. Building on the position-weight method above, this method further takes into account the contribution of visual features and prior knowledge to predicting the semantics of Web images. To incorporate prior knowledge, a constrained piecewise penalized weighted regression model is proposed to adaptively model the distribution of a Web image's semantics over its associated texts. Experiments on a real-world benchmark show that the method greatly improves annotation performance.
     3. An automatic Web image semantic annotation method based on the conditional random field model. The conditional random field model provides a unified annotation framework that integrates the different types of information associated with Web images, so that each contributes to predicting image semantics. In particular, the manually assigned tags on Flickr are used to learn the semantic co-occurrence between annotation keywords. Experiments on a real-world benchmark show that the method, together with the Flickr-tag-based co-occurrence matrix, greatly improves annotation performance and outperforms the state-of-the-art Web image annotation method.
     4. An annotation-based Web multimedia information retrieval prototype system. Building on a conventional search engine and automatic Web image semantic annotation, the prototype system PictureBook uses Web search result clustering, multi-document summarization, and Web image semantic annotation to combine Web page search with image search. It returns results that combine text and images, which makes it easier for users to acquire knowledge. Users can also interactively examine the effect of the combined text and image results on Web information searching and knowledge acquisition.
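
As an illustration of the third method above, the following is a minimal sketch of how a keyword co-occurrence matrix might be estimated from user-assigned photo tags. The abstract does not state which estimator the thesis uses; the Jaccard coefficient, the function name `cooccurrence_scores`, and the toy tag lists are all assumptions made for this example, and no Flickr API is called.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(tag_lists, vocabulary):
    """Estimate symmetric keyword co-occurrence scores from tagged photos.

    tag_lists:  iterable of tag collections, one per photo (hypothetical input).
    vocabulary: the annotation keywords to keep.
    Returns a dict mapping (a, b) to the Jaccard coefficient of a and b.
    The Jaccard coefficient is an assumed measure, not the thesis's own.
    """
    vocab = set(vocabulary)
    single = Counter()  # number of photos tagged with each keyword
    pair = Counter()    # number of photos tagged with both keywords
    for tags in tag_lists:
        present = sorted(vocab & set(tags))
        single.update(present)
        pair.update(combinations(present, 2))

    scores = {}
    for (a, b), both in pair.items():
        union = single[a] + single[b] - both
        scores[(a, b)] = scores[(b, a)] = both / union
    return scores


# Toy data standing in for tags collected from a photo-sharing site.
photos = [
    ["beach", "sea", "sunset"],
    ["beach", "sea"],
    ["sea", "boat"],
    ["sunset", "sky"],
]
scores = cooccurrence_scores(photos, ["beach", "sea", "sunset", "boat", "sky"])
print(scores[("beach", "sea")])
```

Pairs that never appear together, such as ("boat", "sky") in the toy data, are simply absent from the result. In a conditional random field, scores of this kind could plausibly serve as pairwise features between candidate annotation keywords, although the abstract does not describe how the thesis uses them.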
