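Research on Content-Based Video Copy Detection Algorithms (基于内容的视频拷贝检测算法研究)

English title: The Research on Content-Based Video Copy Detection Algorithm
Author: Zhou Zhili (周志立)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: Video Copy Detection; Double Blocking; Spatiotemporal; Compressed Domain; Visual Perception; DCT Coefficient
Degree year: 2010
Advisor: Yang Gaobo (杨高波)
Discipline code: 081001
Degree-granting institution: Hunan University (湖南大学)
Thesis submission date: 2010-03-15
Chair of the defense committee: Sun Xingming (孙星明)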

Abstract

With the development of digital video technology, the volume of digital video data has grown explosively, creating an urgent need for the retrieval, management, and copyright protection of digital video. Content-based video copy detection (hereafter, video copy detection) plays an important role in video information management, filtering, and copyright protection. It can quickly and effectively locate copies of a digital video for copyright protection, filter and rank the results of video retrieval, or monitor the copyright and broadcast status of a video. In recent years, video copy detection has become an emerging research frontier.
     This thesis analyzes the problems of existing video copy detection algorithms, studies the technology in both the pixel domain and the compressed domain, and proposes two algorithms with good detection performance. The main work is as follows:
     (1) The thesis introduces the system structure of video copy detection and the types of copied video; reviews the state of the art, research directions, and research focus of existing algorithms; and describes the key techniques of video copy detection: shot detection, key-frame extraction, video feature extraction, and feature matching.
     (2) To resolve the conflict between the number of blocks and robustness to added borders (letter-box and pillar-box conversions), the thesis proposes a spatiotemporal method based on double blocking. Each frame of the query and test video sequences is partitioned in two modes, 2×4 blocks and 4×2 blocks. In each mode, ordinal feature vectors are extracted and the feature similarity is computed, and the larger of the two similarities is taken as the spatial feature similarity. This is then combined with the temporal feature similarity to obtain the video similarity, which is used for video matching. Experimental results show that the algorithm is robust to added borders and to a variety of other copy operations while maintaining high detection precision.
     (3) The thesis proposes a compressed-domain video copy detection algorithm that incorporates visual perception. For fast detection, video features are extracted directly from the DCT domain and indexed for two-level matching. Feature extraction takes the characteristics of human visual perception into account: because the sensitivity of the human eye differs across spatial regions and across frequencies, perception-based ordinal features of the DC and AC coefficients of I-frames are extracted, which makes the algorithm more resistant to copy attacks. Experimental results show that, compared with existing related algorithms, the algorithm achieves high detection precision, strong robustness to various copy attacks, and a significantly faster retrieval speed.
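
The spatial step of the second contribution can be illustrated with a short sketch. It is not the thesis implementation: the abstract does not say how per-frame similarities are aggregated, how the ordinal distance is normalized, or how spatial and temporal similarities are combined. The sketch assumes grayscale frames as NumPy arrays, block-mean intensities as the ranked quantity, an L1 rank distance normalized by its maximum, averaging over frames, and a weighted sum with a hypothetical weight `alpha`. All function names are illustrative.

```python
import numpy as np

def block_means(frame, rows, cols):
    """Mean intensity of each block in a rows x cols partition of a grayscale frame."""
    h, w = frame.shape
    # Crop so the frame divides evenly into blocks (an assumption of this sketch).
    frame = frame[: h - h % rows, : w - w % cols]
    bh, bw = frame.shape[0] // rows, frame.shape[1] // cols
    return frame.reshape(rows, bh, cols, bw).mean(axis=(1, 3)).ravel()

def ordinal_signature(frame, rows, cols):
    """Rank of each block's mean intensity (0 = darkest block)."""
    means = block_means(frame, rows, cols)
    return np.argsort(np.argsort(means, kind="stable"), kind="stable")

def ordinal_similarity(sig_a, sig_b):
    """1 minus the L1 rank distance, normalized by its largest possible value."""
    n = sig_a.size
    max_dist = (n * n) // 2  # largest L1 distance between two permutations of n items
    return 1.0 - np.abs(sig_a - sig_b).sum() / max_dist

def spatial_similarity(query_frames, test_frames):
    """Larger of the mean per-frame similarities under the 2x4 and 4x2 partitions."""
    per_mode = []
    for rows, cols in ((2, 4), (4, 2)):
        sims = [
            ordinal_similarity(ordinal_signature(q, rows, cols),
                               ordinal_signature(t, rows, cols))
            for q, t in zip(query_frames, test_frames)
        ]
        per_mode.append(float(np.mean(sims)))
    return max(per_mode)

def video_similarity(spatial_sim, temporal_sim, alpha=0.5):
    """Hypothetical weighted combination; the abstract does not give the actual rule."""
    return alpha * spatial_sim + (1.0 - alpha) * temporal_sim
```

Taking the maximum over the two partitions reflects the idea in the abstract that one layout may be less disturbed by a given border type than the other; the temporal similarity is passed in as a number because the abstract does not define how it is computed.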
