Research on Reranking of Video Search Results
Abstract
Video data on the Internet has grown explosively and become widely distributed in recent years, making video search a focus and hotspot of current video research. Owing to the success of text search, today's popular large-scale video search engines, such as Google, Yahoo!, Live, and Baidu, still mainly rely on the text surrounding video data and realize video search and ranking with text-search techniques. However, video content and the complex meaning it carries are usually difficult to describe and express completely with language. To overcome this shortcoming of plain text-based search, reranking of video search results has gradually attracted the attention of many researchers. Reranking is the process of reordering the initial search results, on top of the original ranking, by mining the intrinsic relations within the data or by drawing on external knowledge and manual intervention, with the goal of improving search quality and the user's search experience.
     This thesis first proposes a novel query-independent learning framework, and then studies the key problems of video search reranking in three stages: self-reranking (mining relevant knowledge only from the results themselves), query-example-based reranking (using query examples provided by the user), and CrowdReranking (using knowledge mined from the results of external search engines). These three stages cover most existing frameworks and approaches to visual reranking. The thesis studies video reranking methods in depth; the main work and contributions are summarized as follows:
     (1) For the query-independent learning framework, the thesis proposes to learn relevance relations over "query-shot" pairs. Unlike the conventional query-dependent learning framework, the trained model is not tied to any particular query, so training samples can be shared across all queries, which makes the approach better suited to practical applications. Under this query-independent framework, various machine learning methods can be extended and applied; accordingly, a fully supervised query-independent learning method based on SVM and a semi-supervised query-independent learning method based on a multi-graph model are further proposed. Extensive experiments confirm that query-independent learning clearly outperforms conventional query-dependent learning, and it is also more practical in terms of computational cost.
     (2) For self-reranking, the thesis proposes a typicality-based video search reranking method. Conventional learning-based reranking methods usually care only about the relevance or diversity of the training samples and neglect their typicality. This thesis argues that typicality should be considered alongside relevance and diversity. The typicality of a video/image is first defined according to the probability distribution of the samples; sample selection is then cast as an optimization problem that accounts for both sample typicality and the initial search results; finally, a reranking model is built with an SVM on the selected highly typical samples. Experiments show that the model generalizes well and is robust.
     (3) For query-example-based reranking, the thesis proposes a fully supervised video reranking method driven by query examples. Conventional fully supervised reranking methods typically convert reranking into a binary classification problem and order the samples purely by classification confidence. The thesis argues that reranking is in fact an optimization problem: a ranked list is globally optimal when any two samples in it are ordered correctly, rather than when each sample is simply classified as relevant or not. Under this framework, two reranking algorithms are further proposed, namely straight reranking and insertion reranking. Experiments confirm that the new methods improve the initial search results considerably and also compare favorably with several classic reranking methods.
     (4) CrowdReranking is a new stage of the reranking problem proposed in this thesis; it aims to mine relevant visual patterns from the Internet and exploit them for reranking. Based on an extensive survey, CrowdReranking is the first attempt to apply crowdsourced data from the Internet to search result reranking, and it differs markedly from conventional self-reranking and query-example-based reranking. A set of visual words is first constructed from the result images returned by multiple search engines; two kinds of visual patterns (salient and concurrent) are then mined over these visual words; finally, based on the mined patterns, reranking is formulated as an optimization problem with a closed-form solution. Experiments show that CrowdReranking improves the initial search results consistently and outperforms conventional reranking methods by a clear margin.
The explosive growth and widespread accessibility of community-contributed multimedia content on the Internet have led to a surge of research activity in video search. Due to the great success of text search, most popular video search engines, such as Google, Yahoo!, Live, and Baidu, build upon text search techniques by using the text information associated with video data. This kind of video search approach has proven unsatisfactory, as it often entirely ignores the visual content and human perception of the search results. To address this issue, video search reranking has received increasing attention in recent years. It is defined as reordering video shots based on multimodal cues to improve search precision.
     In this thesis, we first propose a novel query-independent learning based video search framework; we then investigate the key problems of video search reranking in three paradigms: self-reranking, which only uses the initial search results; query-example-based reranking, which leverages user-provided query examples; and CrowdReranking, which mines relevant visual patterns from the search results of external search engines. These three paradigms cover most existing reranking frameworks and approaches. Accordingly, this thesis conducts an in-depth study of video search reranking and makes the following contributions:
     (1) We first propose a novel query-independent learning (QIL) framework for video search that learns relevance from query-shot pairs. Unlike the conventional query-dependent learning framework, the trained model is not tied to any particular query, so training samples can be shared across all queries, making the framework more general and better suited to real-world search applications. Under this framework, various machine learning techniques can be applied; we therefore further propose an SVM-based (Support Vector Machine) fully supervised query-independent learning approach and a multi-graph-based semi-supervised query-independent learning approach.
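     A minimal sketch of the supervised query-independent variant is given below, assuming each query-shot pair has already been mapped to a fixed-length relevance feature vector; the particular features (text-match, concept-match, and visual-similarity scores) and the toy data are illustrative assumptions, not the thesis's exact pipeline.

    import numpy as np
    from sklearn.svm import SVC

    # Illustrative query-independent training data: each row describes one
    # (query, shot) pair with relevance features, so pairs drawn from
    # different queries can share a single model.
    X_train = np.array([[0.9, 0.7, 0.8],   # a relevant query-shot pair
                        [0.2, 0.1, 0.3],   # an irrelevant pair
                        [0.8, 0.6, 0.5],
                        [0.1, 0.3, 0.2]])
    y_train = np.array([1, 0, 1, 0])       # relevance labels shared across queries

    model = SVC(kernel="rbf")
    model.fit(X_train, y_train)

    # At search time, build the same pair features for a new query's shots
    # and rerank the shots by the model's relevance score.
    X_new = np.array([[0.70, 0.5, 0.6], [0.30, 0.2, 0.4], [0.85, 0.9, 0.7]])
    scores = model.decision_function(X_new)
    print(np.argsort(-scores))             # shot indices in reranked order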
     (2) For self-reranking, we propose typicality-based video search reranking. Conventional learning-based approaches to video search reranking only consider the relevance or diversity of the selected examples for building the reranking model, while video typicality is usually neglected. In this thesis, we propose to select the most typical samples to build the reranking model, considering that typicality indicates the representativeness of each sample, so that a more robust reranking model can be learned. We first define the typicality score of an image/video based on the sample distribution, and then formulate example selection as an optimization scheme that takes into account both image typicality and the ranking order in the initial search results. Based on the selected examples, we build the reranking model using an SVM.
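     A rough sketch of the typicality idea, under simple assumptions: typicality is scored with a kernel density estimate over the top-ranked results, the most typical (and initially high-ranked) shots are taken as pseudo-positive examples, and an SVM trained on them rescores the whole list. The density estimator, the rank-discount weight, and the random features are illustrative stand-ins, not the optimization formulated in the thesis.

    import numpy as np
    from sklearn.neighbors import KernelDensity
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 16))        # visual features of the top-100 shots
    init_rank = np.arange(100)                # 0 = best initial position

    # Typicality: density of each sample among the top-ranked results.
    kde = KernelDensity(bandwidth=1.0).fit(feats[:50])
    typicality = np.exp(kde.score_samples(feats))

    # Combine typicality with the initial ranking (illustrative weighting) and
    # pick the most typical high-ranked shots as pseudo-positive examples.
    score = typicality * np.exp(-init_rank / 50.0)
    pos = np.argsort(-score)[:10]             # pseudo-positives
    neg = np.argsort(score)[:10]              # pseudo-negatives from the tail

    X = np.vstack([feats[pos], feats[neg]])
    y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    svm = SVC(kernel="rbf").fit(X, y)

    reranked = np.argsort(-svm.decision_function(feats))
    print(reranked[:10])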
     (3) For query-example-based reranking, we present a novel supervised approach to video search reranking with several query examples. Conventional supervised reranking approaches empirically convert reranking into a classification problem in which each document is judged relevant or not, followed by reordering the documents according to the classification confidence scores. We argue that reranking is essentially an optimization problem in which the ranked list is globally optimal if any two arbitrary documents from the list are correctly ranked in terms of relevance, rather than a matter of simply classifying each document as relevant or not. Under this framework, we further propose two effective algorithms, called straight reranking and insertion reranking, to solve the problem more practically.
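     The pairwise view can be illustrated with a small sketch: a comparator decides which of two shots should rank higher (here a hypothetical score based on similarity to the query examples), and insertion reranking places each shot into the output list by pairwise comparisons, much like insertion sort. The comparator, features, and data below are assumptions for illustration only; the thesis defines its algorithms over a learned pairwise relevance model.

    import numpy as np

    rng = np.random.default_rng(1)
    query_examples = rng.normal(size=(3, 8))   # user-provided query examples
    shots = rng.normal(size=(20, 8))           # visual features of the ranked shots

    def pair_score(a, b):
        """Positive if shot a should rank above shot b (hypothetical comparator:
        higher mean similarity to the query examples wins)."""
        return (query_examples @ a).mean() - (query_examples @ b).mean()

    def insertion_rerank(order):
        """Insertion reranking: insert each shot at the first position in the
        output list where it wins the pairwise comparison."""
        result = []
        for idx in order:
            pos = len(result)
            for j, kept in enumerate(result):
                if pair_score(shots[idx], shots[kept]) > 0:
                    pos = j
                    break
            result.insert(pos, idx)
        return result

    initial_order = list(range(20))            # initial text-based ranking
    print(insertion_rerank(initial_order))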
     (4) For CrowdReranking, we propose a new paradigm for visual search reranking, characterized by mining relevant visual patterns from the image search results of multiple search engines available on the Internet. To the best of our knowledge, CrowdReranking represents the first attempt to leverage crowdsourcing knowledge for visual reranking, which distinguishes it greatly from existing self-reranking and query-example-based reranking. We first construct a set of visual words based on local image patches collected from multiple image search engines. We then explicitly detect two kinds of visual patterns, i.e., salient and concurrent patterns, among the visual words. Finally, we formalize reranking as an optimization problem on the basis of the mined visual patterns and propose a closed-form solution.
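     A compact sketch of such a pipeline, under stated assumptions: visual words are formed by k-means over local descriptors pooled from several engines' results, a salient pattern is approximated by word frequency across engines (the concurrent, co-occurrence pattern is omitted here for brevity), and reranked scores come from a regularized least-squares objective with the closed form r = (I + λL)⁻¹ p, in the spirit of manifold ranking. All of these concrete choices are illustrative stand-ins, not the thesis's exact formulation.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)

    # Local descriptors of result images collected from several search engines.
    engine_patches = [rng.normal(size=(200, 32)) for _ in range(3)]
    all_patches = np.vstack(engine_patches)

    # Visual words by k-means over the pooled patches (illustrative choice).
    k = 50
    kmeans = KMeans(n_clusters=k, n_init=4, random_state=0).fit(all_patches)

    # Salient pattern: how often each visual word appears across the engines.
    salient = np.zeros(k)
    for patches in engine_patches:
        salient += np.bincount(kmeans.predict(patches), minlength=k)
    salient /= salient.sum()

    # Shots to be reranked: each gets a bag-of-words histogram and a prior
    # score p measuring how well it matches the mined salient pattern.
    shot_patches = [rng.normal(size=(30, 32)) for _ in range(15)]
    H = np.array([np.bincount(kmeans.predict(p), minlength=k) for p in shot_patches],
                 dtype=float)
    H /= H.sum(axis=1, keepdims=True)
    p = H @ salient

    # Reranking as optimization: min_r ||r - p||^2 + lam * r^T L r, where L is
    # the Laplacian of a shot similarity graph; closed form r = (I + lam*L)^-1 p.
    W = H @ H.T                                # shot-to-shot similarity (illustrative)
    L = np.diag(W.sum(axis=1)) - W
    lam = 0.5
    r = np.linalg.solve(np.eye(len(p)) + lam * L, p)
    print(np.argsort(-r)[:5])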
