Image Categorization System Based on the "Bag-of-Words" Model

English title: Bag-of-Visual Words for Image Categorization
Author: 周鸽 (Zhou Ge)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: image categorization; bag-of-words model; BOW; SOM-SD; Evolving SOM-SD; FCM-S; FCM-HS
Degree year: 2011
Supervisor: 王加俊 (Wang Jiajun)
Discipline code: 081001
Degree-granting institution: Soochow University (苏州大学)
Submission date: 2011-04-01

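Abstract

As a key technique behind image retrieval, image recognition and image filtering, content-based image categorization has become an important research direction in pattern recognition. Its goal is to classify images according to their semantic content. The "bag-of-words" model has achieved considerable success in content-based image categorization and is therefore attracting growing attention. However, when constructing the visual vocabulary, many current methods simply cluster low-level features without considering the spatial relationships between image regions, which leaves the vocabulary insufficiently accurate and stable. This thesis adopts and improves several algorithms that incorporate spatial information into visual vocabulary construction. Its main contributions are as follows: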
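     First, an evolving SOM-SD algorithm is proposed to accelerate the conventional SOM-SD (self-organizing map for structured data) neural network and apply it to image categorization. The main strength of conventional SOM-SD is that it processes structured data effectively and can distinguish highly similar objects. However, because it incorporates spatial information, SOM-SD is computationally very expensive, which limits its use on large image databases. While preserving SOM-SD's ability to handle structured data, this thesis improves computational efficiency with a hierarchical, evolving strategy. Experiments show that evolving SOM-SD clearly outperforms conventional algorithms that ignore structural information in categorization accuracy, and that it runs far faster than conventional SOM-SD.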
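The structured-data encoding that SOM-SD relies on can be illustrated with a minimal sketch, assuming the general formulation of Hagenbuchner, Sperduti and Tsoi: each node of a structure (here assumed to be an image region with child regions) is presented to the map as its own feature label concatenated with the map coordinates of its children's winning neurons, so children must be mapped before their parents. The class name `SOMSD`, the fixed `max_children`, the (-1, -1) padding for missing children and the `mu_label`/`mu_coord` weights are illustrative assumptions. The hierarchical, evolving acceleration proposed in the thesis is not reproduced here; this is only the baseline idea it speeds up.

```python
import numpy as np


class SOMSD:
    """Minimal SOM-SD sketch (hypothetical class): SOM inputs are a node label
    plus the grid coordinates of each child's winning neuron."""

    def __init__(self, rows, cols, label_dim, max_children,
                 mu_label=1.0, mu_coord=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.max_children = max_children
        dim = label_dim + 2 * max_children
        self.weights = rng.random((rows * cols, dim))
        # (row, col) position of every neuron on the 2-D map
        self.grid = np.array([(r, c) for r in range(rows) for c in range(cols)],
                             dtype=float)
        # assumed weighting: mu_label on the label part, mu_coord on child coordinates
        self.mu = np.concatenate([np.full(label_dim, mu_label),
                                  np.full(2 * max_children, mu_coord)])

    def encode(self, label, child_coords):
        """Label followed by children's winner coordinates; missing children
        are padded with (-1, -1) (assumed convention)."""
        pad = [(-1.0, -1.0)] * (self.max_children - len(child_coords))
        coords = np.asarray(list(child_coords) + pad, dtype=float).ravel()
        return np.concatenate([np.asarray(label, dtype=float), coords])

    def bmu(self, x):
        """Index of the best-matching unit under the weighted squared distance."""
        return int(np.argmin((self.mu * (self.weights - x) ** 2).sum(axis=1)))

    def map_structure(self, labels, children, order):
        """Map every node; `order` must list children before their parents.
        Returns {node: (input_vector, winner_index)}."""
        coords, result = {}, {}
        for v in order:
            x = self.encode(labels[v], [coords[c] for c in children[v]])
            w = self.bmu(x)
            coords[v] = tuple(self.grid[w])
            result[v] = (x, w)
        return result

    def train_step(self, x, lr, sigma):
        """Standard SOM update with a Gaussian neighbourhood around the BMU."""
        w = self.bmu(x)
        d2 = ((self.grid - self.grid[w]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        self.weights += lr * h[:, None] * (x - self.weights)


# Usage: two leaf regions (0, 1) under one parent region (2).
som = SOMSD(rows=3, cols=3, label_dim=2, max_children=2)
labels = {0: [0.1, 0.2], 1: [0.9, 0.8], 2: [0.5, 0.5]}
children = {0: [], 1: [], 2: [0, 1]}
order = [0, 1, 2]
for epoch in range(20):
    # re-encode the structure with the current map, then update on each node vector
    for x, _ in som.map_structure(labels, children, order).values():
        som.train_step(x, lr=0.3, sigma=1.0)
mapped = som.map_structure(labels, children, order)
```

Because a parent's input contains where its children landed on the map, two regions with similar colour or texture but different sub-structure can win different neurons; that extra context is also why every node needs its own best-matching-unit search, the cost the evolving variant targets.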
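     Second, a spatially constrained hierarchical fuzzy C-means algorithm (FCM-HS) is proposed, derived from FCM-S (fuzzy C-means with spatial constraints). Compared with k-means, it reduces the influence of noise on the visual words and makes the clustering more robust; compared with FCM-S, it is more computationally efficient. Experiments under identical conditions show clear improvements in both the robustness and the computational efficiency of image categorization.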
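The spatial constraint in FCM-S can be sketched as a penalty added to the standard fuzzy C-means objective. The sketch below uses the simplified FCM_S1 form of Chen and Zhang, which replaces the neighbourhood sum with a precomputed neighbourhood mean x̄_k: J = Σ_i Σ_k u_ik^m (‖x_k − v_i‖² + α‖x̄_k − v_i‖²). Minimising J gives v_i = Σ_k u_ik^m (x_k + α x̄_k) / ((1 + α) Σ_k u_ik^m) and u_ik ∝ (‖x_k − v_i‖² + α‖x̄_k − v_i‖²)^(−1/(m−1)). Treating each x_k as an image-region descriptor and x̄_k as the mean descriptor of its spatially neighbouring regions is an assumption about how the thesis applies FCM-S; the function name `fcm_s1` is hypothetical, and the hierarchical FCM-HS speed-up is not shown.

```python
import numpy as np


def fcm_s1(X, X_bar, c, m=2.0, alpha=1.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy C-means with a spatial term (FCM_S1-style sketch).

    X      : (N, d) descriptors, one per image region
    X_bar  : (N, d) mean descriptor of each region's spatial neighbours
    Returns U (c, N) fuzzy memberships and V (c, d) cluster centres.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # each region's memberships sum to 1
    for _ in range(max_iter):
        Um = U ** m
        # v_i = sum_k u_ik^m (x_k + alpha*xbar_k) / ((1 + alpha) * sum_k u_ik^m)
        V = (Um @ (X + alpha * X_bar)) / ((1.0 + alpha) * Um.sum(axis=1, keepdims=True))
        # D_ik = ||x_k - v_i||^2 + alpha * ||xbar_k - v_i||^2, shape (c, N)
        D = (((X[None] - V[:, None]) ** 2).sum(axis=2)
             + alpha * ((X_bar[None] - V[:, None]) ** 2).sum(axis=2))
        D = np.maximum(D, 1e-12)  # avoid division by zero
        U_new = D ** (-1.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


# Usage on toy data: two well-separated groups of region descriptors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
X_bar = X.copy()  # toy assumption: each region's neighbours share its descriptor
U, V = fcm_s1(X, X_bar, c=2)
hard = U.argmax(axis=0)  # visual-word index per region
```

Setting `alpha=0` recovers ordinary fuzzy C-means; a larger `alpha` pulls each region's memberships toward those implied by its neighbours, which is how a spatial term can damp isolated noisy regions compared with hard k-means assignment.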
Content-based image categorization, as a key technique for image retrieval, image recognition and image filtering, has become one of the most important research areas in pattern recognition. It aims to classify images into different semantic categories. The bag-of-visual-words model, which has achieved considerable success in image classification, is attracting more and more attention. However, most existing approaches construct a visual vocabulary by simply clustering image regions represented by low-level visual features, without making good use of the spatial context of those regions. This thesis adopts and improves several methods that take spatial context into account. The main contributions of this thesis are as follows:
     Firstly, a new algorithm called evolving SOM-SD is proposed to accelerate the conventional SOM-SD for image classification. The most important advantage of the conventional SOM-SD is that it can handle structured data and distinguish similar objects. However, it is unsuitable for large databases because incorporating spatial information makes it extremely computationally intensive. We resolve this problem, while keeping the ability to handle structured data, by using a hierarchical, evolving strategy. Experimental results demonstrate that the proposed method performs better than methods that do not consider spatial context and runs much faster than the conventional SOM-SD algorithm.
     Secondly, an algorithm called FCM-HS is proposed as an improvement of FCM-S (FCM with spatial constraints). It is more robust to noise than the k-means algorithm and more efficient than the FCM-S algorithm. Experiments in the same environment showed significant improvements in the robustness and efficiency of image classification.
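For context, the baseline bag-of-visual-words pipeline that both contributions modify at the vocabulary-construction step can be sketched as: cluster local descriptors from training images into k visual words, then describe each image by the normalised histogram of its descriptors' nearest words. This sketch uses plain k-means (Lloyd's algorithm) with farthest-first initialisation, an illustrative choice rather than the thesis's setup; the function names are hypothetical, and descriptor extraction (for example SIFT) is assumed to have been done already.

```python
import numpy as np


def build_vocabulary(descriptors, k, max_iter=50, seed=0):
    """Cluster local descriptors into k visual words with plain k-means."""
    rng = np.random.default_rng(seed)
    # farthest-first initialisation: start from a random descriptor, then
    # repeatedly add the descriptor farthest from all chosen centres
    centres = [descriptors[rng.integers(len(descriptors))]]
    for _ in range(1, k):
        d2 = np.min([((descriptors - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(descriptors[d2.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(max_iter):
        d2 = ((descriptors[:, None] - centres[None]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # recompute each centre as the mean of its members (keep it if empty)
        new = np.array([descriptors[assign == j].mean(axis=0) if np.any(assign == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres


def bow_histogram(descriptors, vocabulary):
    """Quantise each descriptor to its nearest visual word; L1-normalised counts."""
    d2 = ((descriptors[:, None] - vocabulary[None]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)


# Usage on toy descriptors drawn from two groups.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 0.1, (30, 4)), rng.normal(3.0, 0.1, (30, 4))])
vocab = build_vocabulary(train, k=2)
h = bow_histogram(train[:30], vocab)  # an "image" made only of first-group descriptors
```

Each descriptor here is clustered in isolation, which is exactly the limitation the abstract points to: no spatial relationship between regions enters `build_vocabulary`. The histograms would then be passed to a classifier.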
