The Research and Implementation of Key Frame Extraction Methods in a Content-Based Video Retrieval System

Original title: 基于内容的视频检索系统中关键帧提取方法的研究与实现
Author: Tao Dan (陶丹)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: content-based; video retrieval; MPEG-2; DCT; shot segmentation; key frame; histogram; block method; improved block-based IDC key frame extraction method
Degree year: 2004
Supervisor: Shen Xuanjing (申铉京)
Discipline code: 081203
Degree-granting institution: Jilin University
Submission date: 2004-05-01

Abstract

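With the development of computer and multimedia technology and the steady growth in demand for information, multimedia information has become the main form of data in all kinds of information systems. The range of media that computers can process is expanding rapidly. Databases and other information systems are expected not only to store images, video, and audio and retrieve them by keyword, but also to analyze the semantics of multimedia content so that retrieval can reach a deeper level. Content-based multimedia information retrieval emerged in response.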
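    Content-based multimedia retrieval covers image, video, and audio retrieval, and content-based video retrieval is a particularly important research area within it. Content-based video retrieval is the process of finding, in a large-scale video database, the images that satisfy a given description of visual features, based on the scenes, shots, frames, and moving objects in the video data and on the color, texture, shape, and other features of the image data. Current work on content-based video retrieval concentrates on shot boundary detection, key frame extraction, and story reconstruction for video data, building on the recognition and description of image color, texture, shape, and spatial relationships. Content-based video retrieval overcomes the limitations of traditional expression-based retrieval: it extracts informational cues from the image and video content and builds indexes from these content features, which makes it a form of approximate matching. During retrieval it uses similarity matching with stepwise refinement, progressively narrowing the set of query results until the target is located.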
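    This thesis studies key frame extraction in content-based video retrieval systems. It surveys the state of research in related fields in China and abroad and examines existing content-based video processing and retrieval methods in detail. Taking MPEG-2 compressed video files as the example, it analyzes in detail the video sequence and the hierarchical structure of video data defined by this international compression standard, and devotes considerable work to obtaining MPEG-2 compressed video sequences. Finally, in response to the traditional key frame extraction methods, it proposes an improved block-based IDC key frame extraction method.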
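    This thesis completes the following work:
    Analysis of video sequences.
    Acquisition of image sequences.
    Acquisition and analysis of luminance histograms.
    Analysis, comparison, and testing of traditional key frame extraction methods for video retrieval.
    Proposal of an improved key frame extraction method, with the corresponding experimental data.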
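    Image sequence acquisition provides the data for key frame analysis and processing, so analyzing the video sequence structure of an MPEG-2 compressed video stream and obtaining the image sequence from it lays the foundation for the subsequent work. A luminance histogram reflects the statistics of the luminance component of an image frame; this thesis extracts the Y (luminance) component from the YUV color model to obtain the luminance histogram and analyze it qualitatively. Key frame detection and extraction is the core of this thesis. Finally, after experimental analysis and performance comparison of the traditional key frame extraction methods for video retrieval (the absolute distance method, the Euclidean distance method, the chi-square (χ²) test method, the double-threshold comparison method, the sub-block partitioning method, the I-frame DC coefficient method, and others), the thesis proposes an improved block-based IDC key frame extraction method. The method operates on compressed video files and achieves good detection results.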
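    The luminance histogram and three of the histogram-based frame differences named above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes each frame's Y plane is already decoded into a list of rows of 8-bit values, the function names are hypothetical, and the chi-square form shown is one common variant.

```python
from math import sqrt

def luma_histogram(y_plane, bins=256):
    """Count how many pixels fall on each 8-bit luminance (Y) level."""
    hist = [0] * bins
    for row in y_plane:
        for y in row:
            hist[y] += 1
    return hist

def absolute_distance(h1, h2):
    """Sum of absolute bin-by-bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def euclidean_distance(h1, h2):
    """Square root of the summed squared bin differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def chi_square_distance(h1, h2):
    """One common chi-square form: sum of (a - b)^2 / (a + b) over non-empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

# Two tiny 2x4 "frames" of Y values (made-up data for illustration).
frame_a = [[16, 16, 128, 128], [16, 16, 128, 128]]
frame_b = [[16, 16, 16, 16], [235, 235, 235, 235]]
h_a, h_b = luma_histogram(frame_a), luma_histogram(frame_b)

print(absolute_distance(h_a, h_b))
print(euclidean_distance(h_a, h_b))
print(chi_square_distance(h_a, h_b))
```

    In a detector built on these measures, a frame whose distance from the previous frame (or from the last key frame) exceeds a chosen threshold would be treated as a candidate shot boundary or key frame; the threshold itself has to be tuned per data set.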
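    To test the retrieval efficiency of the improved key frame extraction algorithm, we use recall, precision, and retrieval time as measures. To make the test material broad and representative, we selected video clips of several types, including cartoons, film clips, advertisements, and science and education films, to test the key frame detection results.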
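    Recall and precision for key frame detection can be computed as in the sketch below. It assumes the usual definitions (recall = correct detections / actual key frames, precision = correct detections / all detections) and a hypothetical `tolerance` parameter that lets a detection count as correct when it lies within a few frames of a labeled key frame; the thesis may define a match differently.

```python
def recall_precision(detected, ground_truth, tolerance=0):
    """Greedily match detected frame indices to labeled ones within +/- tolerance frames."""
    unmatched = sorted(ground_truth)
    hits = 0
    for d in sorted(detected):
        for g in unmatched:
            if abs(d - g) <= tolerance:
                hits += 1
                unmatched.remove(g)  # each labeled key frame can be matched only once
                break
    recall = hits / len(ground_truth) if ground_truth else 0.0
    precision = hits / len(detected) if detected else 0.0
    return recall, precision

# Made-up example: four labeled key frames, five detections.
labeled = [10, 50, 120, 180]
found = [10, 52, 53, 120, 200]
recall, precision = recall_precision(found, labeled, tolerance=2)
print(recall, precision)
```

    A detector that reports many redundant frames keeps recall high while precision drops, which is the trade-off the abstract describes for the compressed-domain methods.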
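    The experiments show that when compressed video files are fully decoded and key frame detection is performed in the pixel domain, precision is slightly better than with compressed-domain methods, because the detection method then has rich and complete image information. However, the compressed files must be decoded on a large scale, so detection takes longer. Compressed-domain methods, such as the I-frame DC coefficient method and the improved block-based IDC method, are faster than pixel-domain methods, but the image information obtained by partially decoding the compressed video is limited, so their precision is slightly worse. Even so, the experimental results show that compressed-domain methods perform well in recall: the percentage of key frames recalled per unit time is higher than for pixel-domain detection methods.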
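    The improved block-based IDC method proposed in this thesis improves on the traditional sub-block detection method in both recall and retrieval time. It is more sensitive to object motion at the center of the shot, and is well suited to video sequences with intense local motion, such as news documentaries and films. Almost no key frames in a shot are missed, although a small amount of redundancy occurs. Precision, however, still needs further improvement: the method is overly sensitive to changes in the inter-frame difference, and a poorly chosen inter-frame difference threshold easily causes false detections. Overall, the improved block-based IDC method improves retrieval results to some extent.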
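    The abstract does not give the details of the improved block-based IDC method, so the following is only one plausible reading of "block-based" and "more sensitive to motion at the center", not the author's algorithm. It assumes each I frame is represented by its DC image (one DC coefficient per 8x8 block, as a list of rows with at least 3 rows and 3 columns), splits that image into a 3x3 grid of regions, and weights the central region more heavily. The weights, the threshold, and all function names are hypothetical.

```python
def region_means(dc_image, grid=3):
    """Split a DC image into grid x grid regions and return each region's mean DC value.

    Assumes the image has at least `grid` rows and `grid` columns.
    """
    rows, cols = len(dc_image), len(dc_image[0])
    means = []
    for gr in range(grid):
        r0, r1 = gr * rows // grid, (gr + 1) * rows // grid
        row_means = []
        for gc in range(grid):
            c0, c1 = gc * cols // grid, (gc + 1) * cols // grid
            values = [dc_image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row_means.append(sum(values) / len(values))
        means.append(row_means)
    return means

# Hypothetical weights: the central region counts four times as much as any other.
CENTER_WEIGHTS = [[1, 1, 1],
                  [1, 4, 1],
                  [1, 1, 1]]

def weighted_block_difference(dc_a, dc_b, weights=CENTER_WEIGHTS):
    """Weighted mean of the absolute differences between corresponding region means."""
    grid = len(weights)
    means_a, means_b = region_means(dc_a, grid), region_means(dc_b, grid)
    total = sum(sum(row) for row in weights)
    diff = sum(weights[r][c] * abs(means_a[r][c] - means_b[r][c])
               for r in range(grid) for c in range(grid))
    return diff / total

def select_key_frames(dc_images, threshold):
    """Keep the first I frame, then each I frame that differs enough from the last key frame."""
    if not dc_images:
        return []
    keys = [0]
    for i in range(1, len(dc_images)):
        if weighted_block_difference(dc_images[keys[-1]], dc_images[i]) > threshold:
            keys.append(i)
    return keys

def make_frame(changed_rows=(), changed_cols=()):
    """Build a 6x6 made-up DC image: 100 everywhere, 200 in the changed area."""
    return [[200 if r in changed_rows and c in changed_cols else 100
             for c in range(6)] for r in range(6)]

flat = make_frame()
corner = make_frame((0, 1), (0, 1))   # change confined to the top-left region
center = make_frame((2, 3), (2, 3))   # same-sized change in the central region
print(select_key_frames([flat, corner, center], threshold=20))
```

    With these weights a change in the central region produces four times the difference of an equal change in a corner, which also illustrates the precision problem noted above: a threshold set too low turns small central changes into redundant key frames.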
With the development of computer and multimedia technology and the growing demand for information, multimedia information has become the main data source of all kinds of information systems. The range of information media that computers can handle is expanding rapidly. Databases and other information systems are required not only to store images, video, and audio and to retrieve them by keyword, but also to analyze the semantics of multimedia data in order to reach deeper levels of retrieval. Content-based multimedia information retrieval has emerged as a result.
    Content-based multimedia information retrieval includes image, video, and audio retrieval, and content-based video retrieval is one of its most important research fields. Content-based video retrieval is the process of finding images that satisfy a given visual description in a large video database, based on the scenes, shots, frames, and moving objects in the video data and the color, texture, and shape features of the image data. At present, work on content-based video retrieval concentrates on shot segmentation, key frame extraction, and scenario reconstruction, built on identifying and describing the color, texture, shape, and spatial relations of images. Content-based video retrieval overcomes the limitations of traditional expression-based retrieval by using information extracted from the content as clues for building indexes. It is a kind of approximate matching. During retrieval, it applies similarity matching and progressively narrows the range of query results to obtain a more accurate result.
    This thesis focuses on key frame extraction. It surveys the state of research in related fields at home and abroad, and makes a detailed study and analysis of existing methods for content-based video processing and retrieval. Taking MPEG-2 compressed video files as an example, the thesis analyzes in detail the video sequence and the layer structure of the video data in this international standard, and does considerable work on the acquisition of MPEG-2 compressed image sequences. Finally, in response to the traditional key frame extraction methods, the thesis puts forward an improved block-based IDC key frame extraction method.
    This thesis covers the following parts:
    Analysis of video sequences
    Acquisition of image sequences
    Acquisition and analysis of luminance histograms
    Analysis, comparison, and testing of traditional key frame extraction methods
    Proposal of an improved key frame extraction method, with relevant experimental data
    Image sequence acquisition provides the data for the analysis and processing of key frames, so analyzing the MPEG-2 compressed video stream and acquiring the image sequence is the basis of the subsequent work. A luminance histogram reflects the luminance statistics of an image, so we extract the Y value, which represents luminance in the YUV color model, to obtain the luminance histogram of each image and analyze it qualitatively. The detection and extraction of key frames is the crux of the thesis. Finally, the thesis carries out an experimental analysis and performance comparison of the traditional key frame extraction methods, such as the absolute distance method, the Euclidean distance method, the chi-square method, the double-threshold method, the block method, and the I-frame DC coefficient method. It then puts forward an improved block-based IDC key frame extraction method for compressed video sequences, which achieves good detection results.
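    Of the traditional methods listed, the double-threshold method is the least self-explanatory, so a simplified sketch of the idea follows. It assumes the common formulation in which a high threshold flags abrupt cuts and a low threshold opens a candidate gradual transition whose differences are accumulated. For brevity this version sums consecutive-frame differences rather than comparing each frame with the candidate's start frame, and the names `t_high` and `t_low` are hypothetical.

```python
def double_threshold_detect(diffs, t_high, t_low):
    """Classify transitions from consecutive-frame differences.

    diffs[i] is the difference between frame i and frame i + 1.
    Returns (cuts, graduals): cuts holds the first frame index of each new shot,
    graduals holds (start_frame, end_frame) pairs for gradual transitions.
    """
    cuts, graduals = [], []
    start, accumulated = None, 0.0
    for i, d in enumerate(diffs):
        if d >= t_high:
            cuts.append(i + 1)             # abrupt cut: a new shot starts at frame i + 1
            start, accumulated = None, 0.0
        elif d >= t_low:
            if start is None:
                start = i                  # candidate start of a gradual transition
            accumulated += d
        else:
            # The candidate ends; accept it only if the accumulated change is large enough.
            if start is not None and accumulated >= t_high:
                graduals.append((start, i))
            start, accumulated = None, 0.0
    if start is not None and accumulated >= t_high:
        graduals.append((start, len(diffs)))
    return cuts, graduals

# Made-up difference sequence: one abrupt cut, then a slow transition.
diffs = [1, 2, 50, 1, 8, 9, 10, 1, 2]
result = double_threshold_detect(diffs, t_high=25, t_low=5)
print(result)
```

    The low threshold is what lets slow transitions such as fades and dissolves be found even though no single frame-to-frame difference is large.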
    To test the retrieval performance of the improved key frame extraction algorithm, we adopt a testing model with three measures: recall, precision, and retrieval time. To make the video data broad and representative, we select several types of video, such as cartoons, film clips, advertisements, and science and education films, to check the key frame detection results.
    The experiment proves that we make use of key frame extraction methods in pels fi

