Content-Based Refinement of Video Search Results

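English title: Content Based Refinement on Video Search Result
Author: Zhang Lu (张鹿)
Degree level: Master's
Discipline: Pattern Recognition and Intelligent Systems
Chinese keywords: 视频搜索结果优化 ; 重排序 ; 基于内容的视频搜索 ; 粒子群优化 ; 视频质量评估 ; 拷贝检测 ; 早停止策略
English keywords: refinement on video search result ; visual reranking ; content based video search ; particle swarm optimization ; video quality assessment ; video copy detection ; early-stopping mechanism
Degree year: 2010
Advisor: Zhou Heqin (周荷琴)
Discipline code: 081104
Degree-granting institution: University of Science and Technology of China
Submission date: 2010-05-01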

Abstract

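With the spread of digital recording devices and the development of multimedia technology, the volume of video data on the Internet is growing rapidly. How to store, organize, and manage Internet video effectively has become a leading research topic in the video field. At present, the major video search engines, such as Google Video Search, Yahoo Video Search, Bing Video Search, and Baidu Video Search, all provide their service with text-based strategies. The basic principle is to collect the text surrounding each video on the Web and process it with a text-search framework, which then drives video retrieval and ranking. Text-based video search, however, has two shortcomings: for a large share of videos, the content does not agree with the surrounding text; and the complex audiovisual information a video carries cannot be fully expressed in words.
     To address these shortcomings, the research community has proposed refining the search results. Video search result refinement takes the text-search result as its starting point, mines the visual content of the videos, and reorders the original result list to obtain a better one. Its fundamental aim is to improve the quality of the search results and the user experience during search.
     This thesis studies content-based refinement of video search results at three levels: relevance reranking, video quality assessment, and video copy detection. Together, these three levels form a relatively complete framework for refining video search results:
     (1) To improve the overall relevance of the original results, this thesis studies video search reranking based on an adaptive particle swarm algorithm and analyzes the nature of the reranking problem in depth. Unlike traditional reranking methods, it defines reranking as a process of swarm evolution and makes full use of the knowledge learned by every individual during evolution to guide the direction in which the swarm evolves.
     (2) To control the quality of the original search results, this thesis studies content-based video quality assessment and analyzes several key factors that affect the overall visual quality of a video. Unlike traditional video quality assessment systems based on signal-degradation theory, the assessment system presented here does not need the source video as a reference.
     (3) To remove the many duplicate videos in the search results and reduce redundancy, this thesis studies video copy detection based on combined spatiotemporal features. It analyzes the main types of video copies on the Internet and points out that a detection algorithm must be efficient before it can be used in practice. The algorithm adopts a two-level matching framework that combines coarse matching with precise matching, and adds an early-stopping strategy at each level, which greatly increases detection speed while preserving detection accuracy.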
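     The two-level matching with early stopping described in (3) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not specify the features, distance, or thresholds, so the per-frame feature vectors, the mean-absolute-difference distance, the names `frame_distance`, `mean_distance_below`, and `is_copy`, and all parameter values are assumptions made here. The two videos are also assumed to be temporally aligned.

```python
from typing import Sequence

# Hypothetical representation: one small feature vector per sampled frame.
Signature = Sequence[Sequence[float]]


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two per-frame feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_distance_below(query: Signature, candidate: Signature,
                        step: int, threshold: float) -> bool:
    """Check whether the mean distance over every `step`-th frame is <= threshold.

    Early stopping: distances are non-negative, so once the running sum
    exceeds threshold * (number of compared frames), the mean can no longer
    fall below the threshold and the remaining frames are skipped.
    """
    n = min(len(query), len(candidate))
    indices = range(0, n, step)
    if not indices:
        return False
    budget = threshold * len(indices)
    total = 0.0
    for i in indices:
        total += frame_distance(query[i], candidate[i])
        if total > budget:
            return False  # early stop: this candidate is already rejected
    return True


def is_copy(query: Signature, candidate: Signature,
            coarse_step: int = 4, coarse_threshold: float = 0.3,
            fine_threshold: float = 0.1) -> bool:
    """Two-level matching: a cheap coarse pass, then a frame-by-frame pass."""
    # Level 1: sparse sampling with a loose threshold rejects most non-copies.
    if not mean_distance_below(query, candidate, coarse_step, coarse_threshold):
        return False
    # Level 2: every frame with a strict threshold confirms the match.
    return mean_distance_below(query, candidate, 1, fine_threshold)
```

     In this sketch the coarse level touches only one frame in four, and both levels abandon a candidate as soon as rejection is certain, so most of the work is spent on candidates that are likely copies.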

Abstract (author's English version)

With the rapid development of multimedia technology and easy access to digital recording devices, the Internet is witnessing an explosion of online video data. Online video storage, organization, and management have become one of the most active topics in the video domain. Currently, the most popular video search engines in the world, including Google Video, Yahoo Video, Bing Video, and Baidu Video, tend to use a query-by-keyword scheme, annotating videos with their surrounding text and mining that text for video search. The main assumption behind this scheme is that information retrieval theory from the web-page domain can be applied directly to the video domain. However, this mechanism has two disadvantages: a large number of online videos present content that contradicts the surrounding text; and video data carries too much information to be described by text alone.
     To overcome the disadvantages of text-based video search, the research community has proposed applying content-based refinement to video search results. Content-based refinement starts from the original text search result and refines the ranked list by mining and analyzing content information. Its main purpose is to provide a better user experience and improve overall satisfaction with video search.
     This dissertation conducts video search refinement at three levels: relevance-oriented reranking, video quality assessment, and copy detection, which together form an integrated system for optimizing text search results:
     (1) This dissertation studies visual reranking via adaptive particle swarm optimization to improve the overall relevance of the search results. In contrast to traditional reranking methods, this approach models reranking as an evolutionary optimization process based on swarm intelligence. The essence of reranking is also studied.
     (2) This dissertation studies content-based video quality assessment to control the overall quality of the returned list. Several key factors that largely determine visual quality are studied one by one. Unlike full-reference quality assessment and signal-theory-based assessment, our approach uses no reference video in the assessment process.
     (3) To reduce redundancy in the search results, this dissertation studies content-based video copy detection to detect similar videos effectively. It examines the main categories of video copies on the Internet and indicates that detection efficiency is the key to practical video search refinement. We apply coarse matching and fine matching in a two-level matching implementation, and adopt an early-stopping mechanism in each matching stage. Experiments indicate that our approach maintains detection precision while greatly improving detection efficiency.
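     To make the idea in (1) concrete, reranking can be posed as a search over score vectors carried out by a particle swarm. The sketch below is only an illustration under assumptions that are not taken from the thesis: it uses a plain (non-adaptive) particle swarm with fixed coefficients, and a hypothetical objective that keeps scores close to the initial text scores while giving visually similar videos similar scores. The names `cost` and `pso_rerank`, the weight `lam`, and all parameter values are chosen here for illustration.

```python
import random


def cost(scores, text_scores, similarity, lam=1.0):
    """Hypothetical objective (lower is better): fidelity to the text scores
    plus a penalty when visually similar videos receive different scores."""
    n = len(scores)
    fidelity = sum((scores[i] - text_scores[i]) ** 2 for i in range(n))
    smoothness = sum(
        similarity[i][j] * (scores[i] - scores[j]) ** 2
        for i in range(n) for j in range(i + 1, n)
    )
    return fidelity + lam * smoothness


def pso_rerank(text_scores, similarity, n_particles=20, n_iters=100,
               inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Return (ranking, best_scores, best_cost) found by a basic particle swarm."""
    rng = random.Random(seed)
    n = len(text_scores)
    # The first particle starts at the text scores, so the swarm's best
    # cost can never be worse than that of the initial text ranking.
    positions = [list(text_scores)] + [
        [rng.random() for _ in range(n)] for _ in range(n_particles - 1)
    ]
    velocities = [[0.0] * n for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]
    best_cost = [cost(p, text_scores, similarity) for p in positions]
    g = min(range(n_particles), key=best_cost.__getitem__)
    global_pos, global_cost = best_pos[g][:], best_cost[g]

    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                # Each particle is pulled toward its own best position and
                # toward the best position found by the whole swarm.
                velocities[k][d] = (
                    inertia * velocities[k][d]
                    + c1 * r1 * (best_pos[k][d] - positions[k][d])
                    + c2 * r2 * (global_pos[d] - positions[k][d])
                )
                positions[k][d] += velocities[k][d]
            c = cost(positions[k], text_scores, similarity)
            if c < best_cost[k]:
                best_pos[k], best_cost[k] = positions[k][:], c
                if c < global_cost:
                    global_pos, global_cost = positions[k][:], c

    # The reranked list orders videos by descending optimized score.
    ranking = sorted(range(n), key=lambda i: -global_pos[i])
    return ranking, global_pos, global_cost
```

     Here every particle's personal best feeds the velocity update, which is one simple way to let knowledge learned by each individual guide the swarm. An adaptive variant would additionally adjust coefficients such as `inertia` during the run.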

