Research on Several Problems in Machine-Learning-Based Image Retrieval
Abstract
Over the past decade, digital images have proliferated rapidly owing to the rising popularity of digital cameras, camera phones and mobile PCs with built-in cameras. Meanwhile, with the development of Internet technologies, especially Web 2.0, sharing and spreading images has become ever easier. How to organize and manage these massive image collections quickly and effectively has become a hot topic in both academia and industry. As research has progressed, machine learning techniques have been widely applied to image retrieval, including image annotation, image classification, user feedback modeling, image re-ranking and image dataset acquisition.
     This thesis investigates three specific image retrieval problems within a machine learning framework: image annotation, image re-ranking and object detection. The main contributions of this thesis are as follows:
     1. Image annotation aims to assign textual labels to an image based on its visual content. An image annotation approach that incorporates word correlations into multi-class support vector machines (SVMs) is proposed. First, each image is segmented into five fixed-size blocks instead of using time-consuming object segmentation. Every keyword of a training image is manually assigned to the corresponding block, and word correlations are computed from a co-occurrence matrix. Then, MPEG-7 visual descriptors are applied to these blocks to represent visual features, and the mRMR (minimal-redundancy maximum-relevance) method is used to reduce the feature dimension. A block-feature-based multi-class SVM classifier is trained for 80 semantic concepts from the Corel 5000 dataset. Finally, the probabilistic outputs of the SVM and the word correlations are integrated to obtain the final annotation keywords. Experiments on the Corel 5000 dataset demonstrate that this approach is effective and efficient.
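The final step — integrating SVM posterior probabilities with word co-occurrence correlations — can be sketched as below. This is an illustrative simplification, not the thesis' exact integration rule: the function name, the linear weighting `alpha`, and the assumption of a row-normalized co-occurrence matrix are all hypothetical.

```python
import numpy as np

def refine_annotations(svm_probs, cooccur, alpha=0.6, top_k=3):
    """Combine per-word SVM posterior probabilities with word
    co-occurrence correlations to pick final annotation keywords.

    svm_probs: (W,) posterior probability of each vocabulary word
    cooccur:   (W, W) row-normalized word co-occurrence matrix
    alpha:     weight between visual evidence and word correlation
    """
    # Propagate evidence: words that frequently co-occur with
    # visually likely words receive a score boost.
    correlated = cooccur @ svm_probs
    score = alpha * svm_probs + (1 - alpha) * correlated
    # Return the indices of the top_k highest-scoring words.
    return np.argsort(score)[::-1][:top_k]
```

With this fusion, a keyword with weak visual evidence can still be selected if it strongly co-occurs with a confidently detected keyword.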
     2. Image re-ranking improves user satisfaction by reordering images based on multimodal cues extracted from the initial search results (including image content and data associations), auxiliary domain knowledge, user feedback, etc. Even though current commercial search engines have made noticeable progress in retrieving semantically relevant images, the results lack visual diversity because the visual content of the images is rarely analyzed. Some researchers improve visual diversity with purely cluster-based methods, but at the risk of ranking irrelevant images at the top.
     An image re-ranking approach that takes both semantic relevance and visual diversity into consideration is proposed. It is a hybrid method that combines the reciprocal election algorithm of R. van Leuken et al. with the greedy search algorithm of T. Deselaers et al. to capture the benefits of both. First, each image casts votes for other images according to visual similarity, and the images with the highest vote counts are selected as candidate representatives. Then a bounded greedy selection algorithm picks the most novel and relevant candidate as each cluster representative. The approach fuses different visual features to calculate image similarity, including color, texture and especially topic features. pLSA and LDA are evaluated and compared as dimensionality reduction approaches for web image re-ranking, and the benefits of integrating topic distribution features are discussed. This thesis is the first to introduce the harmonic mean of cluster recall and NDCG as a criterion for evaluating re-ranking performance. Extensive experiments and comparisons with state-of-the-art methods demonstrate that re-ranking the initial results returned by the Google and Bing search engines with this approach is a practical way to improve user satisfaction in terms of cluster recall, F1 score and the harmonic mean of NDCG and cluster recall.
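The proposed evaluation criterion — the harmonic mean of cluster recall and NDCG — can be computed as follows. This is a minimal sketch assuming graded relevance labels and known cluster (subtopic) assignments for the ranked images; the function names are hypothetical, not the thesis' own code.

```python
import math

def ndcg_at_k(relevances, k):
    """Normalized discounted cumulative gain over the top-k results.
    relevances: graded relevance labels in ranked order."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

def cluster_recall_at_k(clusters, k, n_clusters):
    """Fraction of distinct visual clusters (subtopics) covered in the top-k.
    clusters: cluster id of each image in ranked order."""
    return len(set(clusters[:k])) / n_clusters

def harmonic_mean(a, b):
    """Harmonic mean, rewarding rankings that balance relevance
    (NDCG) and diversity (cluster recall) simultaneously."""
    return 2 * a * b / (a + b) if a + b > 0 else 0.0
```

Because the harmonic mean is dominated by the smaller of its two arguments, a ranking must score well on both relevance and diversity to score well overall.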
     3. Object detection systems aim to decide not only whether an image contains a specific object but also where that object is. Most state-of-the-art object detection systems combine multiple features with supervised machine learning. For these supervised methods to work well, large amounts of labeled training data are needed, but labeling images for object detection is very time-consuming and requires substantial human effort. Some researchers gather object images by exploiting web images or semi-supervised techniques; however, because these collections contain no object size or position information, they can generally be used only for object classification, not detection.
     This thesis is the first to propose acquiring training data for object detection from the notes data in Flickr, via active learning. The motivation is to provide a high-quality training dataset for object localization with minimal human effort, which is achieved by mining Flickr notes. Notes are user-defined regions of interest (bounding boxes) in an image; a note's metadata includes the position, size and text of its bounding box. First, a text mining method gathers semantically related images for a specific class. Then a handful of high-quality images are selected manually as the seed (initial training) set. Finally, the training set is expanded by an incremental active learning framework, which requires significantly less manual supervision than standard methods. Experimental results on the PASCAL VOC 2007 and NUS-WIDE datasets show that the training data acquired by this approach can complement or even substitute for conventional training data for object localization.
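The incremental expansion of the seed set can be sketched as an uncertainty-driven active learning loop. This is an illustrative sketch, not the thesis' algorithm: the nearest-centroid classifier, the margin-based uncertainty measure, and all function names are stand-in assumptions.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Trivial stand-in classifier: one centroid per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_confidence(model, X):
    """Confidence = margin between the two closest class centroids."""
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    order = np.sort(d, axis=1)
    margin = order[:, 1] - order[:, 0] if len(classes) > 1 else order[:, 0]
    return classes[np.argmin(d, axis=1)], margin

def expand_training_set(seed_X, seed_y, pool_X, pool_y_oracle,
                        rounds=3, batch=2):
    """Incremental active-learning expansion: repeatedly train on the
    current set, query the least-confident pool samples, and add their
    oracle-verified labels -- a sketch of the seed-set growth idea."""
    X, y = seed_X.copy(), seed_y.copy()
    pool = list(range(len(pool_X)))
    for _ in range(rounds):
        if not pool:
            break
        model = nearest_centroid_fit(X, y)
        _, margin = predict_confidence(model, pool_X[pool])
        # The least-confident (smallest-margin) samples are queried first.
        picked = [pool[i] for i in np.argsort(margin)[:batch]]
        X = np.vstack([X, pool_X[picked]])
        y = np.concatenate([y, pool_y_oracle[picked]])
        pool = [i for i in pool if i not in picked]
    return X, y
```

In the thesis' setting the "pool" would be note-annotated Flickr images and the oracle check a lightweight manual verification, so human effort is concentrated on the samples the current model is least sure about.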
References
[1]Niblack, C.W., R. Barber, W. Equitz, M.D. Flickner, E.H. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin, QBIC project:querying images by content, using color, texture, and shape. Proceedings of SPIE, 1993.173(1993).
    [2]Pentland, A., R.W. Picard, and S. Sclaroff, Photobook:Content-based manipulation of image databases. International Journal of Computer Vision, 1996.18(3):p.233-254.
    [3]Smith, J.R. and S.F. Chang, VisualSEEk:a fully automated content-based image query system. Proceedings of the fourth ACM international conference on Multimedia,1997:p.87-98.
    [4]Huang, T.S., S. Mehrotra, and K. Ramchandran, Multimedia analysis and retrieval system (MARS) project. Proc of 33rd Annual Clinic on Library Application of Data Processing-Digital Image Access and Retrieval,1996. 14(1):p.212-225.
    [5]Wang, J.Z., J. Li, and G. Wiederhold, SIMPLIcity:Semantics-Sensitive Integrated Matching for Picture Libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence,2001:p.947-963.
    [6]Blei, D.M., A.Y. Ng, and M.I. Jordan, Latent Dirichlet allocation. Journal of Machine Learning Research,2003.3(5):p.993-1022.
    [7]Chang, E., K. Goh, G. Sychay, and G. Wu, CBSA:content-based soft annotation for multimodal image retrieval using Bayes point machines. IEEE Transactions on Circuits and Systems for Video Technology,2003. 13(1):p.26-38.
    [8]Cusano, C., G. Ciocca, and R. Schettini, Image annotation using SVM, in Society of Photo-Optical Instrumentation Engineers(SPIE) Conference. 2003. p.330-338.
    [9]Tsai, C.F., K. McGarry, and J. Tait, CLAIRE:A modular support vector image indexing and classification system. ACM Transactions on Information Systems(TOIS),2006.24:p.353-379.
    [10]Duygulu, P., K. Barnard, N. de Freitas, and D. Forsyth, Object recognition as machine translation:Learning a lexicon for a fixed image vocabulary. Proceedings of the 7th European Conference on Computer Vision-Part IV, 2002:p.97-112.
    [11]Blei, D. and M. Jordan, Modeling annotated data. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval,2003:p.127-134.
    [12]Jeon, J., V. Lavrenko, and R. Manmatha, Automatic image annotation and retrieval using cross-media relevance models. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval,2003:p.119-126.
    [13]Torralba, A., R. Fergus, and W. Freeman,80 million tiny images:a large dataset for non-parametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence,2008.30(11):p.1958-1970.
    [14]Wang, C., L. Zhang, and H. Zhang, Learning to reduce the semantic gap in web image retrieval and annotation, in Proceedings of the 31st annual international ACM SIGIR,2008, p.355-362.
    [15]van Leuken, R., L. Garcia, X. Olivares, and R. van Zwol, Visual diversification of image search results, in Proceedings of the 18th international conference on World wide web.2009. p.341-350.
    [16]Varma, M. and D. Ray, Learning the discriminative power-invariance trade-off, in ICCV.2007.
    [17]Kumar, A. and C. Sminchisescu, Support kernel machines for object recognition, in ICCV.2007.
    [18]Lampert, C., M. Blaschko, and T. Hofmann, Beyond sliding windows: Object localization by efficient subwindow search, in CVPR.2008.
    [19]Deselaers, T. and V. Ferrari, Global and Efficient Self-Similarity for Object Classification and Detection, in CVPR.2010.
    [20]Harzallah, H., F. Jurie, and C. Schmid, Combining efficient object localization and image classification, in ICCV.2009.
    [21]Vedaldi, A., V. Gulshan, M. Varma, and A. Zisserman, Multiple kernels for object detection, in ICCV.2009.
    [22]Everingham, M., L. Van Gool, C. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision,2010.88(2):p.303-338.
    [23]Tang, J. and P.H. Lewis, A Study of Quality Issues for Image Auto-Annotation with the Corel Data-Set. IEEE Transactions on Circuits and Systems for Video Technology,2007.17(3):p.1.
    [24]Wong, R. and C. Leung, Automatic Semantic Annotation of Real-World Web Images. IEEE Transactions on Pattern Analysis and Machine Intelligence,2008.30(11):p.1933-1944.
    [25]Goh, K., E. Chang, and B. Li, Using one-class and two-class SVMs for multiclass image annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence,2005.17(10):p.1333-1346.
    [26]Fan, J., Y. Gao, H. Luo, and G. Xu, Automatic image annotation by using concept-sensitive salient objects for image content representation, in Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval.2004.
    [27]Chen, Y. and J. Wang, Image categorization by learning and reasoning with regions. The Journal of Machine Learning Research,2004.5:p.913-939.
    [28]Qi, X. and Y. Han, Incorporating multiple SVMs for automatic image annotation. Pattern Recognition,2007.40(2):p.728-741.
    [29]Carneiro, G., A. Chan, P. Moreno, and N. Vasconcelos, Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence,2007.29(3):p.394.
    [30]Monay, F. and D. Gatica-Perez, On image auto-annotation with latent space models, in Proceedings of the eleventh ACM international conference on Multimedia.2003.
    [31]Zhou, X., M. Wang, Q. Zhang, J. Zhang, and B. Shi, Automatic image annotation by an iterative approach:incorporating keyword correlations and region matching, in Proceedings of the 6th ACM international conference on Image and video retrieval.2007, p.25-32.
    [32]Jin, Y., L. Khan, L. Wang, and M. Awad, Image annotations by combining multiple evidence & wordNet, in Proceedings of the 13th annual ACM international conference on Multimedia.2005. p.706-715.
    [33]Fellbaum, C., WordNet:An electronic lexical database.1998:The MIT press.
    [34]Paramita, M., M. Sanderson, and P. Clough, Diversity in photo retrieval: overview of the ImageCLEFPhoto task 2009. CLEF working notes,2009.
    [35]Ferecatu, M. and H. Sahbi, TELECOM ParisTech at ImageCLEFphoto 2008: Bi-modal text and image retrieval with diversity enhancement, in Working Notes for the CLEF 2008 workshop.2008.
    [36]Tang, J., T. Arni, M. Sanderson, and P. Clough, Building a diversity featured search system by fusing existing tools, in CLEF Workshop,2008, Springer.
    [37]Ben-Haim, N., B. Babenko, and S. Belongie, Improving Web-based Image Search via Content Based Clustering, in Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop.2006.
    [38]Google image swirl. Available from:http://image-swirl.googlelabs.com.
    [39]Song, K., Y. Tian, W. Gao, and T. Huang, Diversifying the image retrieval results, in Proceedings of the 14th annual ACM international conference on Multimedia.2006. p.707-710
    [40]Jing, Y. and S. Baluja, VisualRank:Applying PageRank to large-scale image search. IEEE Transactions on Pattern Analysis and Machine Intelligence,2008:p.1877-1890.
    [41]Popescu, A., P. Moëllic, I. Kanellos, and R. Landais, Lightweight web image reranking, in Proceedings of the 17th ACM international conference on Multimedia.2009, p.657-660.
    [42]Deselaers, T., T. Gass, P. Dreuw, and H. Ney, Jointly Optimising Relevance and Diversity in Image Retrieval, in CIVR.2009.
    [43]Lampert, C. and M. Blaschko, A Multiple Kernel Learning Approach to Joint Multi-class Object Detection. Lecture Notes In Computer Science, 2008.5096:p.31-40.
    [44]Lampert, C., M. Blaschko, and T. Hofmann, Efficient Subwindow Search: A Branch and Bound Framework for Object Localization. IEEE Transactions on Pattern Analysis and Machine Intelligence,2009.31(12):p.2129-2142.
    [45]Berg, T. and D. Forsyth, Animals on the web, in CVPR.2006.
    [46]Schroff, F., A. Criminisi, and A. Zisserman, Harvesting image databases from the web, in ICCV.2007.
    [47]Deng, J., W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, Imagenet: A large-scale hierarchical image database, in CVPR.2009.
    [48]Fergus, R., L. Fei-Fei, P. Perona, and A. Zisserman, Learning object categories from google's image search, in ICCV.2005. p.1816-1823.
    [49]Tang, J., S. Yan, R. Hong, G. Qi, and T. Chua, Inferring semantic concepts from community-contributed images and noisy tags, in ACM Multimedia. 2009, p.223-232.
    [50]Zhu, S., G. Wang, C. Ngo, and Y. Jiang, On the Sampling of Web Images for Learning Visual Concept Classifiers, in CIVR.2010.
    [51]Gammeter, S., L. Bossard, T. Quack, and L. Van Gool, I know what you did last summer:object-level auto-annotation of holiday snaps, in ICCV. 2009.
    [52]Li, L. and L. Fei-Fei, Optimol:automatic online picture collection via incremental model learning. International Journal of Computer Vision,2009. 88(2):p.147-168.
    [53]Guillaumin, M., J. Verbeek, C. Schmid, I. LEAR, and L. Kuntzmann, Multimodal semi-supervised learning for image classification, in CVPR. 2010.
    [54]Deselaers, T., B. Alexe, and V. Ferrari, Localizing objects while learning their appearance, in ECCV.2010.
    [55]Fei-Fei, L., R. Fergus, and P. Perona, A Bayesian approach to unsupervised one-shot learning of object categories, in ICCV.2003.
    [56]Russell, B., A. Torralba, K. Murphy, and W. Freeman, LabelMe:a database and web-based tool for image annotation. International Journal of Computer Vision,2008.77(1):p.157-173.
    [57]Von Ahn, L. and L. Dabbish, Labeling images with a computer game, in CHI04.2004, ACM. p.319-326.
    [58]Huiskes, M., B. Thomee, and M. Lew, New trends and ideas in visual concept detection:the MIR flickr retrieval evaluation initiative, in MIR. 2010, ACM. p.527-536.
    [59]Quack, T., B. Leibe, and L. Van Gool, World-scale mining of objects and events from community photo collections, in CIVR.2008. p.47-56.
    [60]Tong, S. and E. Chang, Support vector machine active learning for image retrieval, in ACM Multimedia.2001. p.107-118.
    [61]Collins, B., J. Deng, K. Li, and L. Fei-Fei, Towards scalable dataset construction:An active learning approach, in ECCV.2008. p.86-98.
    [62]Ertekin, S., J. Huang, L. Bottou, and L. Giles, Learning on the border: active learning in imbalanced data classification, in CIKM.2007, ACM. p. 127-136.
    [63]Holub, A., P. Perona, and M. Burl, Entropy-based active learning for object recognition, in Computer Vision and Pattern Recognition Workshops. 2008.
    [64]Jain, P. and A. Kapoor, Active learning for large multi-class problems, in CVPR.2009. p.762-769.
    [65]Chang, E., S. Tong, K. Goh, and C. Chang, Support vector machine concept-dependent active learning for image retrieval. IEEE Transactions on Multimedia,2005.2.
    [66]Hoi, S., R. Jin, J. Zhu, and M. Lyu, Semi-supervised SVM batch mode active learning with applications to image retrieval. ACM Transactions on Information Systems (TOIS),2009.27(3):p.1-29.
    [67]Joshi, A., F. Porikli, and N. Papanikolopoulos, Multi-class active learning for image classification, in CVPR.2009.
    [68]Kapoor, A., K. Grauman, R. Urtasun, and T. Darrell, Active Learning with Gaussian Processes for Object Categorization, in ICCV.2007.
    [69]Cohn, D., L. Atlas, and R. Ladner, Improving generalization with active learning. Machine learning,1994.15(2):p.201-221.
    [70]Huang, T., What are the 7 millennium problems in multimedia information retrieval?, in Proceedings of the 1st ACM international conference on Multimedia information retrieval.2008.
    [71]Manjunath, B.S., J.R. Ohm, V.V. Vasudevan, and A. Yamada, Color and Texture Descriptors. IEEE Transactions on Circuits and Systems for Video Technology,2001.11(6):p.703.
    [72]Buturovic, A., MPEG 7 Color Structure Descriptor. VizIR Project, http://vizir.ims.tuwien.ac.at,2005.
    [73]Tamura, H., S. Mori, and T. Yamawaki, Textural features corresponding to visual perception. IEEE Transactions on Systems, Man and Cybernetics, 1978,8(6):p.460-473.
    [74]Lowe, D., Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision,2004.60(2):p.91-110.
    [75]Mikolajczyk, K. and C. Schmid, A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence,2005:p.1615-1630.
    [76]Bosch, A., A. Zisserman, and X. Munoz, Scene classification via pLSA, in ECCV.2006.
    [77]Van De Sande, K., T. Gevers, and C. Snoek, Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.32(9):p.1582-1596.
    [78]Burghouts, G. and J. Geusebroek, Performance evaluation of local colour invariants. Computer Vision and Image Understanding,2009.113(1):p. 48-62.
    [79]Dalal, N., B. Triggs, I. Rhone-Alps, and F. Montbonnot, Histograms of oriented gradients for human detection, in CVPR.2005. p.886-893.
    [80]Sivic, J. and A. Zisserman, Video Google:a text retrieval approach to object matching in videos, in Ninth IEEE International Conference on Computer Vision.2003. p.1470-1477.
    [81]Harris, C. and M. Stephens, A combined corner and edge detector, in Alvey vision conference.1988.
    [82]Nister, D. and H. Stewenius. Scalable recognition with a vocabulary tree. in CVPR.2006.
    [83]Yan, K. and R. Sukthankar, PCA-SIFT:A more distinctive representation for local image descriptors, in CVPR.2004. p.506-513.
    [84]Bay, H., A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-up robust features(SURF). Computer Vision and Image Understanding,2008.110(3): p.346-359.
    [85]Yang, J., Y. Jiang, A. Hauptmann, and C. Ngo, Evaluating bag-of-visual-words representations in scene classification, in Proceedings of the international workshop on Workshop on multimedia information retrieval.2007, ACM. p.206-213.
    [86]Wu, L., S.C.H. Hoi, and N. Yu, Semantics-preserving bag-of-words models and applications. IEEE Transactions on Image Processing.19(7):p. 1908-1920.
    [87]van Gemert, J., J. Geusebroek, C. Veenman, and A. Smeulders, Kernel codebooks for scene categorization, in ECCV.2008.
    [88]Lazebnik, S., C. Schmid, and J. Ponce, Beyond bags of features:Spatial pyramid matching for recognizing natural scene categories, in CVPR.2006.
    [89]Grauman, K. and T. Darrell, The Pyramid Match Kernel:Discriminative Classification with Sets of Image Features, in Proceedings of the Tenth IEEE International Conference on Computer Vision-Volume 2.2005.
    [90]Vapnik, V., The nature of statistical learning theory.2000:Springer.
    [91]Kecman, V., Learning and Soft Computing:Support Vector Machines. Neural Networks, and Fuzzy Logic Models, MIT Press, Cambridge, MA, 2001:p.121-191.
    [92]Deng, N. and Y. Tian, New Methods in Data Mining: Support Vector Machines (in Chinese).2004:Science Press.
    [93]Zhang, J., M. Marszałek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories:A Comprehensive Study. International Journal of Computer Vision,2007. 73(2):p.213-238.
    [94]Platt, J.C., Probabilistic Outputs for Support Vector Machines and Comparison to Regularized Likelihood Methods. Advances in Large Margin Classifiers,1999:p.61-74.
    [95]Cortes, C. and V. Vapnik, Support-vector networks. Machine learning, 1995.20(3):p.273-297.
    [96]Krebel, U.H.G., Pairwise classification and support vector machines, in Advances in kernel methods.1999, MIT Press, p.255-268.
    [97]Platt, J.C., N. Cristianini, and J. Shawe-Taylor, Large margin DAGs for multiclass classification. Advances in neural information processing systems, 2000.12(3):p.547-553.
    [98]Chang, C.C. and C.J. Lin, LIBSVM:a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm,2001.80:p. 604-611.
    [99]Friedman, J., Another approach to polychotomous classification. Dept. Statistics, Stanford Univ., Tech. Rep,1996.
    [100]Hastie, T. and R. Tibshirani, Classification by pairwise coupling. The annals of statistics,1998.26(2):p.451-471.
    [101]Wu, T.F., C.J. Lin, and R.C. Weng, Probability estimates for multi-class classification by pairwise coupling. The Journal of Machine Learning Research,2004.5:p.975-1005.
    [102]Duan, H., Research on Incremental Learning Algorithms for Support Vector Machines (in Chinese). PhD thesis, Shanghai Jiao Tong University,2008.
    [103]Yeh, T. and T. Darrell, Dynamic visual category learning, in CVPR.2008.
    [104]Kembhavi, A., B. Siddiquie, R. Miezianko, S. McCloskey, and L. Davis, Incremental Multiple Kernel Learning for Object Recognition, in ICCV. 2009.
    [105]Ikizler-Cinbis, N., R. Cinbis, and S. Sclaroff, Learning actions from the web, in ICCV.2009.
    [106]Fern, A. and R. Givan, Online ensemble learning:An empirical study. Machine learning,2003.53(1):p.71-109.
    [107]Settles, B., Active Learning Literature Survey.2009, University of Wisconsin--Madison.
    [108]Zhu, X., Semi-supervised learning literature survey. Computer Science Technical Report 1530, University of Wisconsin-Madison,2005.
    [109]Bordes, A., S. Ertekin, J. Weston, and L. Bottou, Fast kernel classifiers with online and active learning. The Journal of Machine Learning Research, 2005.6:p.1579-1619.
    [110]Datta, R., D. Joshi, J. Li, and J.Z. Wang, Image Retrieval:Ideas, Influences, and Trends of the New Age. ACM Computing Surveys,2007.40.
    [111]Peng, H., F. Long, and C. Ding, Feature Selection Based on Mutual Information:Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence,2005:p.1226-1238.
    [112]Rahman, M., B. Desai, and P. Bhattacharya, A Feature Level Fusion in Similarity Matching to Content-Based Image Retrieval, in 9th International Conference on Information Fusion.2006.
    [113]Stricker, M. and A. Dimai, Spectral covariance and fuzzy regions for image indexing. Machine vision and applications,1997.10(2):p.66-73.
    [114]Wang, J., J. Li, and G. Wiederhold, SIMPLIcity:Semantics-sensitive integrated matching for picture libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence,2001:p.947-963.
    [115]Eidenberger, H., How good are the visual MPEG-7 features? SPIE & IEEE Visual Communications and Image Processing Conference, Lugano, Switzerland,2003.
    [116]Jeannin, S., Mpeg-7 Visual part of eXperimentation Model Version 9.0. ISO/IEC JTC1/SC29/WG11,2001.3914.
    [117]Chen, Y. and C. Lin, Combining SVMs with Various Feature Selection Strategies. Studies In Fuzziness And Soft Computing,2006.207:p.315.
    [118]Kononenko, I., Estimating attributes:Analysis and extensions of RELIEF, in Machine Learning:ECML-94.1994, Springer, p.171-182.
    [119]Hall, M.A. and U.o.W.D.o.C. Science, Correlation-based feature selection for discrete and numeric class machine learning, in ICML.2000, p.359-366.
    [120]Yu, L. and H. Liu, Feature selection for high-dimensional data:A fast correlation-based filter solution, in ICML.2003.
    [121]Witten, I.H. and E. Frank, Data Mining:Practical machine learning tools and techniques.2005:Morgan Kaufmann Pub.
    [122]Burges, C.J.C., A Tutorial on Support Vector Machines for Pattern Recognition. Data mining and knowledge discovery,1998.2(2):p.121-167.
    [123]Segata, N. and E. Blanzieri, Fast local support vector machines for large datasets, in Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition.2009, Springer.
    [124]Morsillo, N., C. Pal, and R. Nelson, Mining the web for visual concepts, in Proceedings of the 9th International Workshop on Multimedia Data Mining:held in conjunction with the ACM SIGKDD 2008.2008.
    [125]Hsu, W., L. Kennedy, and S. Chang, Video search reranking through random walk over document-level context graph, in Proceedings of the 15th international conference on Multimedia.2007, ACM.
    [126]Hofmann, T., Probabilistic latent semantic indexing, in Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval.1999. p.50-57.
    [127]Quelhas, P., F. Monay, J. Odobez, D. Gatica-Perez, and T. Tuytelaars, A thousand words in a scene. IEEE Transactions on Pattern Analysis and Machine Intelligence,2007.29(9):p.1575-1589.
    [128]Sivic, J., B. Russell, A. Efros, A. Zisserman, and W. Freeman, Discovering object categories in image collections, in Proc. ICCV.2005.
    [129]Sivic, J., B. Russell, A. Zisserman, W. Freeman, and A. Efros, Unsupervised discovery of visual object class hierarchies, in CVPR.2008.
    [130]Monay, F. and D. Gatica-Perez, Modeling Semantic Aspects for Cross-Media Image Indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence,2007:p.1802-1817.
    [131]Griffiths, T. and M. Steyvers, Finding scientific topics. Proceedings of the National Academy of Sciences,2004.101(Suppl 1):p.5228.
    [132]Zhai, C.X., W.W. Cohen, and J. Lafferty, Beyond independent relevance: methods and evaluation metrics for subtopic retrieval, in Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval.2003.
    [133]Järvelin, K. and J. Kekäläinen, Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS),2002.20(4): p.422-446.
    [134]Schmitz, P., Leveraging community annotations for image adaptation to small presentation formats, in ACM Multimedia.2006.
    [135]Chua, T., J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, Nus-wide:A real-world web image database from national university of Singapore, in CIVR.2009.
    [136]Huiskes, M. and M. Lew, The MIR Flickr retrieval evaluation, in ACM International Conference on Multimedia Information Retrieval (MIR'08). 2008.
    [137]Ntoulas, A., M. Najork, M. Manasse, and D. Fetterly, Detecting spam web pages through content analysis, in Proceedings of the 15th international conference on World Wide Web.2006, ACM. p.83-92.
    [138]Turney, P., Mining the Web for Synonyms:PMI-IR versus LSA on TOEFL, in ECML.2001, Springer-Verlag. p.491-502.
    [139]Zhu, Q., S. Avidan, M. Yeh, and K. Cheng, Fast human detection using a cascade of histograms of oriented gradients, in CVPR.2006.
    [140]Everingham, M., L. Van Gool, C. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes Challenge (VOC2007) Results. Available from: http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
