Research on Image Semantic Classification Method Based on Salient Regions

Original title: 基于显著区域的图像语义分类方法研究
Author: Liang Shangsong (梁上松)
Degree level: Master's
Discipline: Computer Architecture
Keywords: image classification; salient region; bag of visual words; EMD; high-level semantics
Degree year: 2011
Advisor: He Dongjian (何东健)
Discipline code: 081201
Degree-granting institution: Northwest A&F University (西北农林科技大学)
Thesis submission date: 2011-05-01

Abstract

With the widespread use of digital image acquisition devices, the number of digital images is growing exponentially. How to organize, classify, and retrieve massive collections of digital images quickly and efficiently has become a valuable research topic. Building on image salient regions, this thesis studies how to construct a bag of visual words for an image, how to measure the similarity between images, and how to classify images semantically, and it evaluates the resulting classification algorithm.
     The main research content is as follows:
     (1) Global image features carry too much redundant information and struggle to represent the main category information of an image. To address this, we propose a method for constructing an image's bag of visual words. First, the Harris-Laplace region detector extracts the salient regions of the image, and a feature descriptor turns each salient region into a feature vector. Next, the affinity propagation clustering algorithm clusters these feature vectors. Finally, each cluster center (exemplar) is taken as a visual word of the image, and the ratio of the number of feature vectors in that cluster to the total number of feature vectors in the image is taken as the word's frequency; together, the visual words and their frequencies form the image's bag of visual words. Experimental results show that the resulting bag of visual words captures the main content of an image.
     (2) To measure the similarity between two images more reasonably, we propose an EMD (Earth Mover's Distance) dissimilarity measure based on the bag of visual words. The method treats each visual word as a histogram bin and its frequency as the value of that bin. First, a distance matrix is built that stores the Euclidean distances between the visual words of the two images. Next, a flow between the two bags of visual words is found subject to the EMD constraints. Finally, the dissimilarity between the two images is computed from this flow. Experimental results show that the measure reflects the similarity between two images reasonably well and has a positive effect on image semantic classification.
     (3) For image semantic classification, we propose a multi-descriptor, multi-nearest-neighbor classification method. First, the Harris-Laplace detector extracts the salient regions of every image, and each of several image feature descriptors turns those regions into feature vectors. Next, affinity propagation clusters the feature vectors to form a bag of visual words per descriptor, and the bag-of-visual-words EMD measure finds, for an unlabeled image, its nearest neighbor in each category. Finally, the results from all descriptors are combined, and the unlabeled image is classified according to its similarity to the corresponding nearest neighbors.
     (4) The algorithms were implemented in Matlab, Java, and C++ and tested on two well-known image databases, 1000-image and Caltech-101 Object. On 1000-image, combining multiple image feature descriptors raises mean classification accuracy by 5% to 30% compared with using a single descriptor. On Caltech-101 Object, the multi-descriptor, multi-nearest-neighbor method outperforms some state-of-the-art algorithms when few labeled images are available, and performs on par with them when many labeled images are available.
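
As a rough illustration of items (1) to (3), the Python sketch below builds a visual-word signature from cluster assignments, computes the EMD between two signatures, and classifies an unlabeled image. It is not the thesis implementation, which used Matlab, Java and C++. Several things are assumed: salient-region detection and affinity propagation have already been run, so the exemplar index of every feature vector (`exemplar_of`) is given; all function and parameter names are hypothetical; the EMD is solved with a small successive-shortest-path min-cost-flow routine chosen only to keep the example self-contained; and the rule for combining descriptors (sum each class's nearest-neighbor distance over the descriptors, pick the smallest total) is a guess, because the abstract does not state it.

```python
import math


def build_signature(vectors, exemplar_of):
    """Return [(visual_word, frequency), ...] for one image.

    exemplar_of[i] is the index of the exemplar (cluster center) that
    feature vector i was assigned to; frequency = cluster size / total.
    """
    counts = {}
    for e in exemplar_of:
        counts[e] = counts.get(e, 0) + 1
    total = len(vectors)
    return [(tuple(vectors[e]), c / total) for e, c in sorted(counts.items())]


def emd(sig_p, sig_q, tol=1e-12):
    """Earth Mover's Distance with Euclidean ground distance.

    Assumption: solved as a min-cost flow by successive shortest paths,
    which is fine for small signatures but is not tuned for speed.
    """
    m, n = len(sig_p), len(sig_q)
    supply = [w for _, w in sig_p]
    demand = [w for _, w in sig_q]
    cost = [[math.dist(u, v) for v, _ in sig_q] for u, _ in sig_p]
    flow = [[0.0] * n for _ in range(m)]
    moved = work = 0.0
    while True:
        sinks = [j for j in range(n) if demand[j] > tol]
        if not sinks or all(s <= tol for s in supply):
            break
        # Bellman-Ford over the residual graph (reverse edges undo flow).
        dp = [0.0 if s > tol else math.inf for s in supply]
        dq = [math.inf] * n
        par_p, par_q = [None] * m, [None] * n
        for _ in range(m + n):
            changed = False
            for i in range(m):
                for j in range(n):
                    if dp[i] + cost[i][j] < dq[j] - tol:
                        dq[j], par_q[j], changed = dp[i] + cost[i][j], i, True
                    if flow[i][j] > tol and dq[j] - cost[i][j] < dp[i] - tol:
                        dp[i], par_p[i], changed = dq[j] - cost[i][j], j, True
            if not changed:
                break
        end = min(sinks, key=lambda k: dq[k])
        # Walk the cheapest path back to a word that still has supply.
        path, amount, j = [], demand[end], end
        while True:
            i = par_q[j]
            path.append((i, j, 1))
            if par_p[i] is None:
                amount = min(amount, supply[i])
                break
            j = par_p[i]
            path.append((i, j, -1))
            amount = min(amount, flow[i][j])
        for i, j, sign in path:
            flow[i][j] += sign * amount
            work += sign * amount * cost[i][j]
        supply[path[-1][0]] -= amount
        demand[end] -= amount
        moved += amount
    return work / moved if moved > 0 else 0.0


def classify(query_sigs, labeled, dist=emd):
    """query_sigs: {descriptor: signature}; labeled: [(label, {descriptor: signature})].

    Assumed combination rule: for each descriptor take the nearest neighbor
    in every class, sum those distances per class, return the smallest.
    """
    score = {}
    for desc, q in query_sigs.items():
        nearest = {}
        for label, sigs in labeled:
            d = dist(q, sigs[desc])
            nearest[label] = min(d, nearest.get(label, math.inf))
        for label, d in nearest.items():
            score[label] = score.get(label, 0.0) + d
    return min(score, key=score.get)
```

For example, `emd([((0.0,), 0.5), ((1.0,), 0.5)], [((0.0,), 0.5), ((2.0,), 0.5)])` leaves half of the mass in place, moves the other half a distance of 1, and returns 0.5.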
