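Research on a Content-Based Video Retrieval System

English title: Research in Content-Based Video Retrieval System
Author: Shi Mingruo (师鸣若)
Thesis level: Master's
Discipline: Signal and Information Processing
Chinese keywords: video retrieval ; shot segmentation ; template matching ; text recognition ; frame
English keywords: video retrieval ; video segmentation ; templates matching ; text detection ; frame
Degree year: 2003
Advisors: Hu Tao (胡涛) ; Liu Fang (刘昉)
Discipline code: 081002
Degree-granting institution: Xi'an University of Technology (西安理工大学)
Date submitted: 2003-03-01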

Abstract

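With the large-scale introduction of video information, traditional keyword-based retrieval can no longer meet users' needs, and content-based retrieval has become an active research topic in recent years. This work analyzes the existing theoretical frameworks for video retrieval and improves on traditional character recognition algorithms to achieve real-time text detection and digit recognition against complex backgrounds in video frames. It analyzes the rich semantic information carried by text in video and, building on shot segmentation and scene clustering, forms a feature space for describing shots. On that basis it builds a video retrieval system that uses text information as its primary feature.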
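     Using basketball game video clips as the example, this work covers the following: 1) To handle the "flash" and "sudden jump" phenomena in video, a "two-level shot segmentation algorithm" is proposed, which resolves false detections at shot boundaries. 2) The detected video clips, or the entire video stream, can be played back. 3) Text detection and localization against complex backgrounds is largely solved, and both the speed and the accuracy of digit recognition are improved. 4) Text-region detection is divided into fixed-region and non-fixed-region detection; this thesis uses the "double-threshold window detection" algorithm for fixed regions. 5) An improved template matching method is used to recognize printed characters; building on traditional template matching, the "binarized mask template" and "three-gray-level weighted matching" algorithms are proposed.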
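The abstract names a "two-level shot segmentation algorithm" for flashes and sudden jumps but does not describe it. As a rough illustration of the general idea only, the Python sketch below flags candidate cuts from frame-to-frame gray-histogram differences (level 1), then keeps a candidate only if the frame before it still differs from a frame a few positions later (level 2), so that a one-frame flash is discarded. The function names, the 16-bin histogram, the threshold of 0.5, and the 3-frame lookahead are all assumptions made for this example, not details taken from the thesis.

```python
def gray_histogram(frame, bins=16):
    """Normalized gray-level histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    return [c / len(frame) for c in counts]


def hist_diff(frame_a, frame_b, bins=16):
    """Half the L1 distance between two histograms: 0 = identical, 1 = disjoint."""
    ha = gray_histogram(frame_a, bins)
    hb = gray_histogram(frame_b, bins)
    return 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))


def detect_cuts(frames, threshold=0.5, lookahead=3):
    """Return indices i where a shot is judged to start at frames[i].

    Assumed two-level scheme (hypothetical, for illustration):
      level 1: a large histogram change between frames[i-1] and frames[i];
      level 2: the change must persist, i.e. frames[i-1] must also differ
               from a frame `lookahead` positions later. A brief flash
               returns to the old content and therefore fails this check.
    """
    cuts = []
    i = 1
    while i < len(frames):
        if hist_diff(frames[i - 1], frames[i]) > threshold:        # level 1
            j = min(i + lookahead, len(frames) - 1)
            if hist_diff(frames[i - 1], frames[j]) > threshold:    # level 2
                cuts.append(i)
            # Skip the verified window so one event is handled only once.
            # Limitation: a second cut inside this window would be missed.
            i = j + 1
        else:
            i += 1
    return cuts


# Synthetic sequence: dark shot, one flash frame, dark shot again, then a new shot.
dark, flash, mid = [20] * 64, [230] * 64, [120] * 64
frames = [dark] * 5 + [flash] + [dark] * 5 + [mid] * 6
print(detect_cuts(frames))
```

In this synthetic sequence, level 1 alone would fire at the flash frame, while the two-level check reports only the real shot change at index 11.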
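     The system, the "NBA Basketball Game Video Retrieval System" (NBVRS), was developed with Microsoft Visual C++ 6.0 on Microsoft Windows 2000 Professional. It has a modular, clearly layered design and a friendly interface, achieves high recognition accuracy, and largely meets users' requirements for real-time retrieval. The software also supports video playback. Finally, the algorithms used in the system are verified experimentally, and directions for further research are identified.
     Within the existing video retrieval framework, this work deepens the understanding of the semantic content of video clips and largely achieves the goal of score-driven playback. Although it is still some distance from practical use, it offers a useful reference for other research and has good application prospects.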
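The abstract mentions a "binarized mask template" and "three-gray-level weighted matching" as improvements to template matching, without giving their definitions. The Python sketch below is one plausible reading, offered only as an illustration: the gray patch is quantized into three levels (background, uncertain, stroke), uncertain pixels are down-weighted and treated as neutral evidence, and an optional binary mask excludes pixels from the comparison. The 5x3 digit templates, the thresholds 85 and 170, the 0.5 weight, the assumption of bright text on a dark background, and all function names are hypothetical choices for this example.

```python
# Hypothetical 5x3 binary templates; "1" marks a stroke pixel.
TEMPLATES = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "010", "010", "010", "010"],
    "7": ["111", "001", "001", "001", "001"],
}


def quantize3(patch, low=85, high=170):
    """Map gray values to 0 (background), 1 (uncertain), or 2 (stroke)."""
    return [[0 if v < low else (2 if v >= high else 1) for v in row] for row in patch]


def masked_score(levels, template, mask=None, unsure_weight=0.5):
    """Weighted agreement in [0, 1] between a 3-level patch and a binary template.

    Pixels whose mask entry is "0" are ignored. Confident pixels (levels 0 and 2)
    carry weight 1 and score only when they agree with the template. Uncertain
    pixels carry `unsure_weight` and earn half of it whatever the template says.
    """
    score = total = 0.0
    for r, (level_row, template_row) in enumerate(zip(levels, template)):
        for c, (level, t) in enumerate(zip(level_row, template_row)):
            if mask is not None and mask[r][c] == "0":
                continue
            if level == 1:
                total += unsure_weight
                score += unsure_weight / 2
            else:
                total += 1.0
                if (level == 2) == (t == "1"):
                    score += 1.0
    return score / total if total else 0.0


def recognize(patch, templates=TEMPLATES, mask=None):
    """Return (best_label, best_score) over all templates."""
    levels = quantize3(patch)
    scores = {label: masked_score(levels, tpl, mask) for label, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A "1" whose right-hand column is covered by bright clutter.
cluttered = [[30, 220, 220] for _ in range(5)]
print(recognize(cluttered, mask=["110"] * 5))  # the mask hides the cluttered column
```

The mask is what lets the cluttered patch still match the "1" template cleanly; without it, the clutter column would count as stroke pixels and pull the score toward other digits.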
With the large-scale introduction of video information, traditional keyword-based retrieval can no longer deliver satisfactory results, and content-based video retrieval has become a focus of video research in recent years. This thesis analyzes the current theoretical framework of video retrieval, improves on the traditional algorithms for printed character recognition, and realizes real-time text detection and character recognition against complex backgrounds. It analyzes the rich information carried by text in video, develops a feature space for describing shots based on video segmentation and scene grouping, and establishes a video retrieval system characterized by text information.
    The study, which uses basketball video clips as its example, covers the following areas. First, to handle the "video flash" and "jump" phenomena, a "two-level video segmentation algorithm" is used to resolve false detections of shot boundaries. Second, the detected clips, or the whole video, can be played back. Third, text on complex backgrounds can be detected and located, and both the speed and the accuracy of recognition are improved. Fourth, text detection is divided into "regular detection" and "irregular detection": the "double threshold window detection algorithm" is applied to the former, and the latter proceeds through "crude detection" followed by "delicate detection". Fifth, an improved template matching method is used to recognize printed characters; the "binary mask template matching" and "tri-grayscale matching" algorithms are introduced.
    The "NBA video retrieval system" was developed with Microsoft Visual C++ 6.0 under Microsoft Windows 2000 Professional. The system has a modular design with clearly separated layers and a friendly interface. It achieves a high recognition rate and satisfies users' real-time retrieval needs. Finally, the algorithms are verified experimentally, and directions for future research are pointed out.

