Research on Image Understanding Algorithm Based on Semantic-Binding Hierarchical Visual Vocabulary

Original title: 基于语义绑定的分层视觉词汇库的图像理解算法研究
Author: Fu Guanglei (傅光磊)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: Image Understanding ; Image Semantics ; Hierarchical Semantic Model ; Semantic Binding Hierarchical Visual Vocabulary ; Scale-Invariant Feature Transform
Degree year: 2010
Advisors: Jiang Xinghao (蒋兴浩) ; Sun Tanfeng (孙锬锋)
Discipline code: 081001
Degree-granting institution: Shanghai Jiao Tong University
Submission date: 2010-12-01

Abstract

With the continued development of Internet and multimedia technology, digital images have permeated every aspect of social life. Computer science has advanced rapidly at the same time, with steady improvements in the functionality and performance of hardware and software. Against this background, image understanding has become one of the active research topics in computer vision in recent years. Image understanding means designing and implementing models and algorithms that let a computer recognize the semantics and content of an input image, so that the computer can grasp what the image conveys much as human vision does. Its applications are broad, spanning medicine, security control, and military technology, and the field is attracting growing attention as application requirements deepen and the range of applications widens.

After summarizing and analyzing recent research on image understanding in China and abroad, this thesis first proposes the concept of a Hierarchical Semantic Model. By analyzing the image semantics contained in a semantic space, the Hierarchical Semantic Model organizes them into a model in which semantics on adjacent layers are linked. Along with the model, the thesis defines the relationships between image semantics and the attributes of each semantic.

Building on the Hierarchical Semantic Model, the thesis then proposes the Semantic Binding Hierarchical Visual Vocabulary (SBHV), describes how it is constructed, and discusses the related details. SBHV is a visual vocabulary based on SIFT (Scale-Invariant Feature Transform) image features and built on the template of the Hierarchical Semantic Model. It is composed of hierarchically organized sub-vocabularies, each of which is bound to one specific image semantic. The thesis then compares SBHV with the visual vocabulary produced by the traditional BOVW (Bag of Visual Words) approach.

Finally, the thesis applies the Hierarchical Semantic Model and SBHV to two concrete image understanding problems: 1) semantic-based image content recognition; 2) content-based image retrieval. It describes how the proposed model and algorithm are used to build solutions to both problems. Simulation experiments on the two problems, together with performance comparisons against traditional models and algorithms, are used to demonstrate the novelty and effectiveness of the proposed approach.
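As a rough illustration of the semantic-binding idea described in the abstract, the sketch below attaches one sub-vocabulary to every node of a small semantic hierarchy. It is not the thesis's algorithm: the abstract does not specify the clustering procedure or how parent and child vocabularies relate, so the use of plain k-means, the rule that a parent pools its children's descriptors, the toy 4-dimensional descriptors (real SIFT descriptors are 128-dimensional), and all names (`SemanticNode`, `build_vocabulary`, `histogram`) are assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid (visual word) closest to descriptor p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (assumed clustering step)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


@dataclass
class SemanticNode:
    """One image semantic in the hierarchy, with its bound sub-vocabulary."""
    name: str
    children: list = field(default_factory=list)
    words: list = field(default_factory=list)


def build_vocabulary(node, descriptors_by_label, k):
    """Bind a sub-vocabulary to every node.

    Assumption: a parent clusters the pooled descriptors of its subtree.
    """
    pooled = list(descriptors_by_label.get(node.name, []))
    for child in node.children:
        pooled += build_vocabulary(child, descriptors_by_label, k)
    node.words = kmeans(pooled, min(k, len(pooled))) if pooled else []
    return pooled


def histogram(node, descriptors):
    """Bag-of-visual-words histogram of descriptors over one sub-vocabulary."""
    counts = [0] * len(node.words)
    for d in descriptors:
        counts[nearest(d, node.words)] += 1
    return counts


# Toy stand-ins for SIFT descriptors, grouped by semantic label.
rng = random.Random(1)


def toy(center, n=30):
    return [[c + rng.gauss(0, 0.1) for c in center] for _ in range(n)]


data = {"sky": toy([0, 0, 1, 1]), "grass": toy([1, 1, 0, 0])}
root = SemanticNode("outdoor", [SemanticNode("sky"), SemanticNode("grass")])
build_vocabulary(root, data, k=3)
hist = histogram(root, data["sky"])
```

In a flat BOVW vocabulary there would be a single `words` list for the whole collection; here each semantic keeps its own list, so an image can be described by one histogram per node of the hierarchy.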

