Research on Image Classification and Image Semantic Annotation

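English title: The Study on Image Classification and Image Annotation
Author: Zhang Lei
Degree level: Master's
Discipline: Computer Systems Architecture
Keywords (Chinese): content-based image retrieval; image classification; image semantics; image annotation; texture classification; support vector machine; MPEG-7
Keywords (English): CBIR; Image Classification; Image Semantic; Image Annotation; Texture Classification; SVM; MPEG-7
Degree year: 2008
Supervisor: Ma Jun
Discipline code: 081201
Degree-granting institution: Shandong University
Thesis submission date: 2008-04-05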

Abstract

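With the development of multimedia technology and the spread of the Internet, it has become ever easier to obtain all kinds of multimedia information, and images are the most numerous kind. How to retrieve the images one needs from a large-scale image database effectively and quickly has become an issue of growing concern.
     Content-Based Image Retrieval (CBIR) represents the content of an image by its low-level visual features (color, texture, shape, etc.). Because a "semantic gap" separates these low-level visual features from the semantic meaning of an image, traditional CBIR cannot satisfy users who want to retrieve images by semantics. Classifying or annotating an image collection by semantics in advance can greatly improve the performance of a CBIR system. This thesis studies semantic image classification and automatic semantic annotation based on low-level visual features. Its main contributions are as follows:
     1. A rotation-invariant texture classification algorithm based on the Gabor transform and the Support Vector Machine (SVM) is proposed. To ensure that the classifier knows nothing about the features of rotated images, the training set and the test set are drawn from the upper half and the lower half, respectively, of images at different rotation angles, which makes the experiment a genuine test of rotation invariance. Experiments on the Brodatz and UIUCTex datasets show that the method is effective and practical: classification accuracy reaches 100% on some classes, and both classification accuracy and time complexity are better than those of the kNN (k-Nearest Neighbors) algorithm.
     2. An SVM-based image classification algorithm that combines MPEG-7 visual descriptors is proposed. Because the image collection contains multiple semantic categories, a multi-class SVM classifier is built using a multi-class classification strategy. Image features are extracted from the images with the MPEG-7 Experimentation Model software. The experiments use several color and texture descriptors and compare, on the Corel 1K image set, the classification accuracy and time complexity of each descriptor combined with the SVM classifier. The experiments also show that a well-chosen combination of several visual descriptors achieves higher classification accuracy.
     3. An automatic image semantic annotation algorithm based on SVM classifiers is proposed. The image features are global features based on MPEG-7 color and texture descriptors. Each annotation word corresponds to a binary SVM classifier; for multiple semantic words, a multi-class classifier is built with a multi-class classification strategy, which establishes the association between low-level image features and semantic words. The SVM output is expressed as a posterior probability so that the likelihood of an image belonging to each word class can be compared directly. The experiments are conducted on the Corel 5000 dataset. All semantic words are first stemmed with the Porter stemming algorithm, and words with too few images are discarded, leaving 82 words for building the classifiers. Two strategies for selecting annotation words are applied and their results compared. The annotation results are evaluated with precision and recall computed both per image and per word, which gives a more objective and complete assessment.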
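The train/test protocol of contribution 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: it assumes each texture is given as a dictionary mapping a rotation angle (0 meaning unrotated) to an already-rotated grayscale image stored as a list of pixel rows, and that subimages are non-overlapping square tiles of a hypothetical side length `size`. The helper names `tiles` and `rotation_split` are hypothetical, and Gabor feature extraction and the SVM itself are left out.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale image stored as a list of pixel rows


def tiles(region: Image, size: int) -> List[Image]:
    """Cut a region into non-overlapping size x size subimages; leftover edges are dropped."""
    if not region:
        return []
    out = []
    for r in range(0, len(region) - size + 1, size):
        for c in range(0, len(region[0]) - size + 1, size):
            out.append([row[c:c + size] for row in region[r:r + size]])
    return out


def rotation_split(rotated: Dict[int, Image],
                   size: int) -> Tuple[List[Image], List[Tuple[int, Image]]]:
    """Hypothetical split: training tiles come from the top half of the
    unrotated (0-degree) image only; test tiles come from the bottom half
    of every rotated copy and are tagged with their angle."""
    base = rotated[0]
    train = tiles(base[: len(base) // 2], size)
    test: List[Tuple[int, Image]] = []
    for angle, img in sorted(rotated.items()):
        if angle == 0:
            continue  # the classifier never sees a rotated view during training
        test.extend((angle, t) for t in tiles(img[len(img) // 2:], size))
    return train, test
```

Each tile would then be turned into a Gabor feature vector, and only the training tiles would be used to fit the SVM.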
With the development of multimedia technology and the popularization of the Internet, people can acquire multimedia information in large amounts. How to retrieve images from an image database precisely and efficiently has become an important issue in the field of image retrieval.
     Content-Based Image Retrieval (CBIR) uses visual features such as color, texture and shape as retrieval features. Because of the semantic gap between low-level image features and human understanding of images, CBIR cannot deliver satisfactory retrieval results. Classifying images into reasonable categories using low-level features, or annotating images, will greatly improve the performance of CBIR systems. This thesis studies image classification and image annotation. Its main contributions are as follows:
     1. We propose a rotation-invariant texture classification method using the Gabor transform and the Support Vector Machine (SVM). To make sure the classifier knows nothing about the characteristics of rotated images, the training set is built from subimages taken from the top half of the non-rotated image, and the subimages from the bottom half of the rotated images form the test set. The method is tested on the Brodatz and UIUCTex datasets, and the experimental results demonstrate that it is effective and efficient. Classification accuracy reaches 100% for some classes.
     2. We propose an image classification method based on MPEG-7 color and texture descriptors, using an SVM as the classifier. Because the image dataset contains several classes, the approach constructs a multi-class SVM with a multi-class classification strategy. Image features are extracted with the MPEG-7 Experimentation Model software. The experiment on Corel 1K uses several color and texture descriptors and reports the classification accuracy and time complexity of each. The results show that properly fusing the MPEG-7 descriptors achieves higher accuracy.
     3. We propose an image annotation method using MPEG-7 descriptors and SVM. The image features are global features based on MPEG-7 color and texture descriptors. The method builds a binary SVM for each word. Because there are usually many words, the method constructs a multi-class SVM with a multi-class classification strategy, so that this multi-class SVM establishes a mapping from images to words. The output of the SVM classifier is converted to posterior probability form so that probability estimates can be obtained. In the experiment on the Corel 5000 dataset, the method first applies the Porter stemming algorithm. After eliminating words with too few images, 82 words are used to build the SVM classifiers. Mean per-word precision and recall, as well as mean per-image precision and recall, are used to evaluate annotation effectiveness.
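The probability output, word selection and evaluation steps of contribution 3 can be sketched in plain Python. This sketch assumes each word's binary SVM has already produced a decision value `f` for the image and that per-word sigmoid parameters `a` and `b` have been fitted beforehand; the abstract only says the outputs are converted to posterior probabilities, and Platt's sigmoid is used here as a common way of doing that, not as the confirmed method. The top-k rule in `annotate` is a hypothetical stand-in for the two word-selection strategies, which the abstract does not specify. The per-image and per-word precision/recall follow the metrics named in the abstract; scoring a never-predicted word as precision 0 is an assumed convention.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Annotations = Dict[str, Set[str]]  # image id -> set of words


def platt_probability(f: float, a: float, b: float) -> float:
    """Platt sigmoid P(word | image) = 1 / (1 + exp(a*f + b)); a, b assumed pre-fitted."""
    z = a * f + b
    if z >= 0:  # numerically stable branch: exp(-z) <= 1
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def annotate(decisions: Dict[str, float],
             params: Dict[str, Tuple[float, float]], k: int) -> Set[str]:
    """Hypothetical selection rule: keep the k words with the highest posterior."""
    probs = {w: platt_probability(f, *params[w]) for w, f in decisions.items()}
    ranked = sorted(probs, key=lambda w: (-probs[w], w))  # ties broken alphabetically
    return set(ranked[:k])


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def per_image_pr(pred: Annotations, truth: Annotations) -> Tuple[float, float]:
    """Mean over images of |pred & truth| / |pred| and |pred & truth| / |truth|."""
    precisions, recalls = [], []
    for img, gt in truth.items():
        p = pred.get(img, set())
        hit = len(p & gt)
        precisions.append(hit / len(p) if p else 0.0)
        recalls.append(hit / len(gt) if gt else 0.0)
    return _mean(precisions), _mean(recalls)


def per_word_pr(pred: Annotations, truth: Annotations,
                vocab: Iterable[str]) -> Tuple[float, float]:
    """Mean over words of correct/predicted and correct/relevant image counts."""
    precisions, recalls = [], []
    for w in vocab:
        predicted = {i for i, ws in pred.items() if w in ws}
        relevant = {i for i, ws in truth.items() if w in ws}
        correct = len(predicted & relevant)
        # assumed convention: a word that is never predicted scores precision 0
        precisions.append(correct / len(predicted) if predicted else 0.0)
        recalls.append(correct / len(relevant) if relevant else 0.0)
    return _mean(precisions), _mean(recalls)
```

For example, with ground truth {img1: sky, water; img2: sky, tree} and predictions {img1: sky, tree; img2: sky}, mean per-image precision is 0.75 and mean per-image recall is 0.5, while mean per-word precision and recall over {sky, tree, water} are both 1/3. The two views differ because a rare word that is never predicted correctly pulls the per-word average down but barely affects the per-image average.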
