The Research on Memory Indexing and Integration of Clustering Technology in Web Image Retrieval

Original title (Chinese): Web图像搜索中的内存索引与融合聚类技术研究
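Author: Luo Feng (罗锋)
Degree level: Master's
Discipline: Computer Software and Theory
Keywords: Web image retrieval; memory indexing; multi-modal; integration of clustering
Degree year: 2008
Advisor: Li Shengli (李胜利)
Discipline code: 081202
Degree-granting institution: Huazhong University of Science and Technology
Thesis submission date: 2008-05-01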

Abstract

With advances in computer technology and growing network bandwidth, image resources on the Web have become increasingly abundant. Most of these images are embedded in Web pages, and together they constitute a huge "Web image database". Web image retrieval aims to help users quickly find the information they need in this complex Web environment. The current bottlenecks of Web image retrieval are how to improve retrieval efficiency and how to annotate image semantics accurately. Text-Based Image Retrieval (TBIR) is the main approach used by current commercial image search engines; its main problem is that it retrieves images only indirectly, through their associated text, and ignores the content of the images themselves. Content-Based Image Retrieval (CBIR) is the mainstream approach in academic image retrieval research; its main challenge is the "semantic gap", i.e. the low-level visual features of an image cannot effectively describe its high-level semantics.
     Building on an approximate matching algorithm for the Earth Mover’s Distance (EMD), we propose a memory indexing method for Web images. The method reduces each high-dimensional image feature to a one-dimensional weighted average center and builds a balanced binary search tree over these centers as the index. Keeping the index resident in memory effectively reduces disk I/O overhead and significantly improves retrieval speed. We also improve the system's retrieval mode and propose a global retrieval mode. This mode first performs a range search based on K-Nearest Neighbor (KNN) to filter out the many cluster centers that cannot affect the query result, which reduces the number of matching operations. It then uses EMD matching to find the K cluster centers most similar to the sample image, and obtains better results in less time than the hierarchical retrieval mode.
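The filter-then-match idea in the paragraph above can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes each signature is a list of `(position, weight)` pairs with scalar positions and total weight 1, and it uses a sorted list with `bisect` as a stand-in for the balanced binary search tree, since the Python standard library has none. The names `weighted_center`, `emd_1d` and `CenterIndex` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def weighted_center(signature):
    """Collapse a signature [(position, weight), ...] to its weighted average."""
    total = sum(w for _, w in signature)
    return sum(x * w for x, w in signature) / total


def emd_1d(sig_a, sig_b):
    """Exact EMD between two 1-D signatures of equal total weight.

    In one dimension the EMD equals the area between the two cumulative
    weight curves, so one sweep over the sorted positions is enough.
    """
    events = sorted([(x, w) for x, w in sig_a] + [(x, -w) for x, w in sig_b])
    work, carried, prev = 0.0, 0.0, events[0][0]
    for x, delta in events:
        work += abs(carried) * (x - prev)  # mass still in transit times distance moved
        carried += delta
        prev = x
    return work


class CenterIndex:
    """In-memory index ordered by weighted center (sorted list, not a real tree)."""

    def __init__(self, signatures):
        self.signatures = signatures  # assumed shape: {key: signature}
        self.entries = sorted((weighted_center(s), k) for k, s in signatures.items())
        self.centers = [c for c, _ in self.entries]

    def range_candidates(self, query, radius):
        """Cheap filter: keep keys whose center lies within radius of the query's."""
        c = weighted_center(query)
        lo = bisect_left(self.centers, c - radius)
        hi = bisect_right(self.centers, c + radius)
        return [k for _, k in self.entries[lo:hi]]

    def search(self, query, radius, k):
        """Rank only the surviving candidates with the exact EMD."""
        candidates = self.range_candidates(query, radius)
        ranked = sorted(candidates, key=lambda key: emd_1d(query, self.signatures[key]))
        return ranked[:k]


signatures = {
    "a": [(0.0, 0.5), (2.0, 0.5)],
    "b": [(1.0, 1.0)],
    "c": [(10.0, 1.0)],
}
index = CenterIndex(signatures)
print(index.search([(1.0, 1.0)], radius=0.5, k=1))  # prints ['b']
```

In this 1-D setting with equal total weights, the distance between two weighted centers never exceeds the EMD between the signatures, so the range filter cannot discard an item whose EMD to the query is within `radius`. How the thesis chooses the radius and maps high-dimensional features to one dimension is not stated in the abstract.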
     To address the multi-modal nature of Web images, we propose an integration clustering method based on both image content and the textual information associated with images. The key idea is to use the textual information and the content features of Web images simultaneously during clustering, so that the two kinds of data interact and reinforce each other, narrowing the "semantic gap" and establishing associations between textual keywords and content features. This method noticeably improves the accuracy of image semantic annotation, so that similar Web images are grouped into the same cluster as far as possible, which in turn improves retrieval accuracy.
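One simple way to use text and content features together during clustering is to cluster on a fused distance. The Python sketch below is only an illustration under stated assumptions, not the method of the thesis: it combines a Euclidean distance on visual feature vectors with a Jaccard distance on keyword sets through a hypothetical weight `alpha`, and it runs a basic k-medoids loop from caller-supplied initial medoids. The visual features are assumed to be scaled to a range comparable with the text distance.

```python
import math


def visual_distance(u, v):
    """Euclidean distance between two visual feature vectors."""
    return math.dist(u, v)


def text_distance(a, b):
    """Jaccard distance between two keyword sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def fused_distance(x, y, alpha=0.5):
    """Weighted sum of the two modalities; alpha is an assumed tuning knob."""
    return (alpha * visual_distance(x["visual"], y["visual"])
            + (1 - alpha) * text_distance(x["words"], y["words"]))


def k_medoids(items, medoids, alpha=0.5, max_iter=20):
    """Cluster item indices around medoids using the fused distance."""
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, item in enumerate(items):
            nearest = min(medoids, key=lambda m: fused_distance(item, items[m], alpha))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members,
                key=lambda c: sum(fused_distance(items[c], items[j], alpha)
                                  for j in members)) if members else m
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return list(clusters.values())


images = [
    {"visual": (0.0, 0.0), "words": {"cat", "pet"}},
    {"visual": (0.1, 0.0), "words": {"cat", "kitten"}},
    {"visual": (1.0, 1.0), "words": {"car", "road"}},
    {"visual": (0.9, 1.0), "words": {"car", "wheel"}},
]
print(k_medoids(images, medoids=[0, 2]))  # prints [[0, 1], [2, 3]]
```

With `alpha=1.0` the loop degenerates to content-only clustering and with `alpha=0.0` to text-only clustering, which makes it easy to compare single-modality and fused results on the same data.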
     Tests and analysis on the VAST (VisuAl & SemanTic image search) system show that the memory indexing method for Web images reduces retrieval time to about 1/3 of the original while preserving the system's precision. Integration clustering also achieves good retrieval results: its precision relative to sequential retrieval reaches 98.1%.
