Research on Technologies of Image Semantic Annotation and Retrieval

Original title: 图像的语义化标注和检索关键技术研究
Author: Li Qianqian (李倩倩)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: image annotation; semantic knowledge base; MPEG-7; semantic-based image retrieval; path-based retrieval
Degree year: 2008
Advisor: Yang Aimin (阳爱民)
Discipline code: 081203
Degree-granting institution: Hunan University of Technology (湖南工业大学)
Thesis submission date: 2008-05-20

Abstract

How to organize, manage, and make full use of multimedia information resources effectively, and how to query and retrieve the required multimedia information quickly and efficiently, are active research topics in media information processing and understanding. For unstructured image data, neither traditional text-based retrieval nor content-based image retrieval (CBIR) solves the problems that arise in retrieval based on high-level semantics. In practice, users want to query images by their high-level concept features. This thesis aims to bridge the "semantic gap" between perceptible visual features and high-level concept features, and it designs and implements a prototype system for semantic-based image annotation and retrieval. Its contributions fall into three areas:
     1. Building a domain-specific image semantic knowledge base and a new MPEG-7 description scheme
     Because an ontology describes domain knowledge in a normalized form, it is used to build a domain-specific image semantic knowledge base at the level of image objects and the relationships between them. The MPEG-7 Multimedia Description Schemes are extended to obtain a new description scheme, which provides an extended description mechanism that can be converted to and from standard MPEG-7 descriptions.
     2. Acquiring and annotating image semantics
     An image segmentation algorithm extracts the objects contained in an image, and the final object regions are obtained by region merging in interaction with the user. The image content is then described according to the image semantic ontology, and the semantic content is organized using the MPEG-7 standard together with a graphical representation, which yields the final representation of the image semantics.
     3. Semantic-based image retrieval
     Based on the graphical image semantic annotation model, a path-based indexing mechanism is built to search the image semantic annotation documents. In the implementation, the Lucene search engine is used to build the index, and XPath is used to operate on the XML documents, which together realize semantic-based image retrieval.
     In the experiments, the proposed method is compared with a full-text retrieval method, and recall and precision are computed for the query results of both methods. The results indicate that the proposed image semantic annotation and retrieval system can effectively support semantic-based image retrieval.
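As a rough illustration of the retrieval and evaluation steps described in the abstract, the sketch below runs a path query over a few XML annotation documents and compares it with a crude keyword match using precision and recall. Everything in it is an assumption made for illustration: the XML element and attribute names are a simplified stand-in rather than the MPEG-7 schema or the thesis's extended scheme, the three annotated images and the relevance judgment are invented toy data, substring matching stands in for a real full-text engine such as Lucene, and the query uses the XPath subset supported by Python's standard `xml.etree.ElementTree`. It does not reproduce the thesis's system or its experimental results.

```python
import xml.etree.ElementTree as ET

# Toy annotation documents (hypothetical schema, not MPEG-7).
# Each Object carries a concept; a Relation links it to another concept.
ANNOTATIONS = {
    "img1.jpg": (
        '<Image>'
        '<Object concept="horse"><Relation type="on" target="grass"/></Object>'
        '<Object concept="grass"/>'
        '</Image>'
    ),
    "img2.jpg": (
        '<Image>'
        '<Object concept="horse"><Relation type="beside" target="tree"/></Object>'
        '<Object concept="tree"/>'
        '</Image>'
    ),
    "img3.jpg": (
        '<Image>'
        '<Object concept="dog"><Relation type="on" target="grass"/></Object>'
        '<Object concept="horse"><Relation type="beside" target="tree"/></Object>'
        '<Object concept="grass"/><Object concept="tree"/>'
        '</Image>'
    ),
}

# Path query for "a horse that is on grass".
QUERY = "./Object[@concept='horse']/Relation[@type='on'][@target='grass']"


def path_search(xpath):
    """Return the images whose annotation contains a node matching the path."""
    hits = set()
    for name, xml_text in ANNOTATIONS.items():
        root = ET.fromstring(xml_text)
        if root.find(xpath) is not None:
            hits.add(name)
    return hits


def full_text_search(*terms):
    """Crude keyword baseline: every term must occur somewhere in the document."""
    return {
        name
        for name, xml_text in ANNOTATIONS.items()
        if all(term in xml_text for term in terms)
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {"img1.jpg"}  # assumed ground truth for the toy query
    for label, result in [
        ("path-based", path_search(QUERY)),
        ("full-text", full_text_search("horse", "grass")),
    ]:
        p, r = precision_recall(result, relevant)
        print(f"{label}: {sorted(result)} precision={p:.2f} recall={r:.2f}")
```

On this toy data the keyword baseline also returns img3.jpg, where a horse and grass both appear but are not related, so its precision is lower than that of the path query. This is the kind of difference a path-based index over object relationships is meant to capture.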

