Research on Content-based Multimedia Visual Information Retrieval

Chinese title: 基于内容的多媒体视觉信息搜索研究
Author: Zhao Yinghai (赵英海)
Degree level: Doctoral
Discipline: Signal and Information Processing
Keywords: visual content retrieval ; multi-label concept detection ; specific concept detection ; tag ranking ; color layout ; multimedia visual information ; machine learning
Degree year: 2010
Supervisor: Wu Xiuqing (吴秀清)
Discipline code: 081002
Degree-granting institution: University of Science and Technology of China
Thesis submission date: 2010-05-01

Abstract

With the rapid development of electronic and multimedia technologies, multimedia visual content, as a vivid and engaging form of knowledge representation, has become increasingly influential in recent years. Meanwhile, advances in Internet and large-capacity data storage technologies have greatly facilitated the storage and dissemination of such content. How to organize, represent, and search this enormous volume of visual content reasonably and effectively has become a central problem in the information retrieval community.
     Starting from the overall framework of content-based multimedia visual information retrieval and taking visual content analysis as the main thread, this thesis studies four topics: multi-label semantic concept detection, specific concept detection, tag ranking by semantic relevance to visual content, and interactive color layout search. The main contributions are as follows:
     1. For multi-label concept detection in visual content, we propose a transductive semi-supervised learning method based on a sparse graph structure. Conventional methods assume that semantic concepts are mutually independent and therefore ignore the correlation among them. Our method uses sparse signal representation to mine both the visual similarity among samples and the distribution correlation among concepts, and then combines the concept correlation with the consistency assumption of semi-supervised learning under a hidden Markov random field model to perform multi-label transductive learning. The method mitigates the shortage of training samples, and the sparse representation captures concept correlation in a principled way, which improves annotation performance and reduces model complexity. The method is compared with 6 related methods on the TRECVID 2005 dataset, and the experimental results confirm its effectiveness.
     2. For the detection of a specific concept (ships) in visual content, we propose a ship detection method for optical remote sensing images based on local Contrast-Box filtering on a feature plane of gray-level standard deviation. Using the local standard deviation of intensity as the detection feature gives a unified description of ship targets of both dark and bright polarity, removes the effect of variations in the mean brightness of the sea background, and reduces the problem size. Locally adaptive Contrast-Box filtering then locates candidate targets on the feature plane and, by exploiting the spatial structure of the targets, suppresses false alarms caused by clouds, waves, and ship wakes.
     3. For the noisy tags attached to community-shared multimedia visual content on the web, we propose a tag ranking method based on the semantic relevance between tags and visual content. Using Bayes' theorem, the method gives a probabilistic definition of this relevance that accounts for both the prior probability that a tag is related to visual information at all and the likelihood that the tag is related to the specific visual content. Moreover, because different low-level features exhibit semantic gaps of different sizes when representing different semantic content, global and local visual features are fused to estimate the relevance probability more accurately for tags of different semantics. The method is unsupervised: it mines web data automatically and requires neither pre-supplied training samples nor a separate training stage. Experiments on a relatively large-scale Flickr dataset show that the method correctly distinguishes tags related to the visual content from tags related to contextual information.
     4. As a complement to keyword-based visual content search, we propose an interactive search technique that exploits color layout information. Combining the two search modes helps users obtain results that are not only semantically relevant but also match their intended color layout. Color layout is represented by a newly designed binary feature with a very small storage footprint. The definition of color layout consistency considers the absolute, relative, and contextual spatial distribution consistency among the colors of interest, and the consistency can be computed online with bit-wise operations. The interactive interface offers several flexible ways to select colors and express their spatial arrangement. Experiments on web image search data analyze and validate the approach in terms of parameter settings, time and space complexity, comparison with related methods, and a user study.
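
As an illustration of the detection feature in contribution 2, the following minimal Python sketch computes a plane of local intensity standard deviations and checks two of the properties claimed for it: a bright ship on a dark sea and a dark ship on a bright sea produce the same response, and uniform sea blocks produce zero regardless of their mean brightness. The non-overlapping 4x4 blocks, the synthetic 8x8 scenes, and the names `local_std_plane` and `make_scene` are assumptions made for this example. The abstract does not specify the window size or implementation used in the thesis, and the Contrast-Box filtering stage is not shown.

```python
from statistics import pstdev


def local_std_plane(image, block=4):
    """Map a grayscale image (list of rows) to per-block standard deviations.

    Non-overlapping block x block tiles are an assumption of this sketch,
    not a detail taken from the thesis.
    """
    rows, cols = len(image), len(image[0])
    plane = []
    for r in range(0, rows - block + 1, block):
        plane_row = []
        for c in range(0, cols - block + 1, block):
            tile = [image[r + i][c + j]
                    for i in range(block) for j in range(block)]
            plane_row.append(pstdev(tile))  # population standard deviation
        plane.append(plane_row)
    return plane


def make_scene(background, target, size=8):
    """Hypothetical test scene: flat sea with a 2x2 target in the top-left block."""
    img = [[background] * size for _ in range(size)]
    for i in range(1, 3):
        for j in range(1, 3):
            img[i][j] = target
    return img


# Bright ship on a dark sea, and dark ship on a bright sea (same contrast of 100).
bright_ship = local_std_plane(make_scene(background=60, target=160))
dark_ship = local_std_plane(make_scene(background=180, target=80))

# The target block responds identically for both polarities,
# while empty sea blocks give zero whatever their mean brightness.
print(bright_ship[0][0], dark_ship[0][0])
print(bright_ship[1][1], dark_ship[1][1])
```

Because each block collapses to a single value, the 8x8 scenes become 2x2 feature planes, which shows, under the block-size assumption above, how such a feature can also shrink the search space for the later filtering stage.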
