Research on Content-Based Static Summarization Techniques for News Video

English title: Research on Content-based News Video Summary
Author: Ji Xu (纪旭)
Degree level: Master's
Discipline: Signal and Information Processing
Keywords: News Video; Static Video Summary; Key Frame Extraction; Clustering Analysis; Adaptive Threshold
Degree year: 2008
Advisor: Su Yuting (苏育挺)
Discipline code: 081002
Degree-granting institution: Tianjin University
Thesis submission date: 2008-06-01

Abstract

With the rapid development of multimedia and network technologies, the volume of video information keeps growing, and the effective management, browsing, and retrieval of video data has become a pressing problem. Compared with general video data, news video has a distinctive four-level structure: frame, shot, story unit, and whole video. Static summarization of news video has been widely studied because of its applications in video browsing, retrieval, and transmission.
     This thesis first describes the structure of news video and the visual, audio, text, and compressed-domain features commonly used in video processing. It then briefly reviews three structure analysis techniques: shot boundary detection, key frame extraction, and story unit segmentation. Of these, key frame extraction is the core of static video summarization. Next, the three new key frame extraction methods proposed in this thesis are presented in detail: one based on clustering with an adaptive threshold, one based on covariance, and one based on conditional entropy. The first method uses the watershed algorithm and the Otsu algorithm from image segmentation to compute the adaptive threshold. The covariance-based and conditional-entropy-based methods exploit the high correlation between adjacent video frames to minimize the redundancy among the extracted key frames. Experimental results demonstrate the effectiveness and efficiency of the three methods.
     To meet the viewing needs of different audiences, a hierarchical static video summarization method is also proposed, which sets the number of extracted key frames according to subjective human perception. Finally, a static video summarization system based on the COM architecture is implemented.
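The abstract names the ingredients of the adaptive-threshold clustering method but not its details. The following is a minimal sketch of the general idea only, not the thesis's algorithm: it assumes each frame is already reduced to a feature vector (for example a normalized color histogram), derives a threshold from the adjacent-frame distances with Otsu's method, and then clusters frames sequentially. The watershed step is omitted, and the function names `otsu_threshold`, `l1_distance`, and `extract_key_frames` are hypothetical.

```python
from typing import List, Sequence


def otsu_threshold(values: Sequence[float], bins: int = 64) -> float:
    """Return the Otsu threshold that best splits `values` into two classes."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # all values identical: nothing to split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += i * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalized)
        if between > best_var:
            best_var, best_bin = between, i
    # The threshold sits at the upper edge of the best bin.
    return lo + (best_bin + 1) * width


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def extract_key_frames(features: List[List[float]]) -> List[int]:
    """Return one key-frame index per cluster (assumed sequential clustering)."""
    if len(features) < 2:
        return list(range(len(features)))
    # Adaptive threshold: Otsu's method over adjacent-frame distances.
    diffs = [l1_distance(features[i], features[i + 1])
             for i in range(len(features) - 1)]
    threshold = otsu_threshold(diffs)
    clusters = [[0]]
    centroid = list(features[0])
    for i in range(1, len(features)):
        if l1_distance(features[i], centroid) > threshold:
            clusters.append([i])           # frame differs enough: new cluster
            centroid = list(features[i])
        else:
            clusters[-1].append(i)         # same cluster: update running mean
            n = len(clusters[-1])
            centroid = [c + (x - c) / n for c, x in zip(centroid, features[i])]
    key_frames = []
    dim = len(features[0])
    for members in clusters:
        mean = [sum(features[m][d] for m in members) / len(members)
                for d in range(dim)]
        # The key frame is the member closest to the cluster mean.
        key_frames.append(min(members, key=lambda m: l1_distance(features[m], mean)))
    return key_frames
```

On a toy input of five frames with feature `[1.0, 0.0]` followed by five frames with feature `[0.0, 1.0]`, `extract_key_frames` returns `[0, 5]`, one key frame per visually distinct segment. Because the threshold is derived from the data, no fixed cut-off has to be tuned per video, which is the motivation for an adaptive threshold.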

