Research on News Video Content Analysis Based on Multimodality Information

Original title (Chinese): 基于多模态信息的新闻视频内容分析技术研究
Author: Ji Zhong (冀中)
Degree level: Doctoral
Discipline: Signal and Information Processing
Keywords: News Video ; Video Content Analysis ; Anchorperson Shot Detection ; Video Shot Classification ; News Story Segmentation ; Caption Extraction ; Multimodality Information Fusion
Degree year: 2007
Advisor: Zhang Chuntian (张春田)
Discipline code: 081002
Degree-granting institution: Tianjin University (天津大学)
Submission date: 2007-12-01

Abstract

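The rapid growth of video data has made its effective processing, browsing, retrieval and management a pressing practical problem. Video content analysis aims to give structure to unstructured video data, extract its semantic content, and bridge low-level features and high-level semantics, so that applications such as video summarization, indexing and retrieval can be built and users can access video content conveniently.
     This dissertation takes news video as its object of study. It uses multimodal information (audio, captions and visual content) and the effective fusion of that information as its means of investigation, and relies on models from pattern recognition theory as tools. The main contributions fall into three areas:
     (1) A fast anchorperson shot detection algorithm operating in the MPEG compressed domain is proposed. In the preprocessing stage, an improved method for detecting faces from compressed-domain information is introduced. In the shot clustering stage, a new dissimilarity measure is constructed, anchorperson shots are grouped by hierarchical clustering, and fuzzy C-means clustering is used to determine the adaptive threshold needed during clustering. The algorithm increases detection speed while maintaining high detection performance.
     (2) A decision-tree-based shot classification algorithm is proposed that assigns news video shots, in order, to six classes: commercial, "others", still image, anchorperson, reporter and monologue. Commercial, "others" and still-image shots are detected with black-frame, motion, duration and face features. Anchorperson shots are detected by clustering. Reporter and monologue shots are harder to tell apart, so their detection is recast as a text sequence labeling problem and modeled with conditional random fields (CRFs). The algorithm fuses audio, face and contextual information, separates the important shot types in news video, and achieves good classification results.
     (3) A news story segmentation algorithm is proposed that fuses audio, caption and visual information. High-level content features such as caption changes, audio type and shot type are processed jointly: the news shot sequence is converted into several keyword sequences, which turns story segmentation into a text sequence segmentation problem. The algorithm is modeled with CRFs, exploits the context within each sequence and between sequences, and achieves good segmentation performance.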
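     As a rough illustration of the adaptive-threshold idea in contribution (1), the sketch below runs a two-cluster fuzzy C-means on one-dimensional dissimilarity values and takes the midpoint between the two cluster centers as the cut-off. The function names, the fuzzifier m = 2, the initialization at the minimum and maximum values, and the midpoint rule are all assumptions made for illustration; the abstract does not state how the dissertation derives its threshold.

```python
def fuzzy_c_means_1d(values, m=2.0, max_iter=100, tol=1e-9):
    """Two-cluster fuzzy C-means on scalar data; returns the sorted centers."""
    centers = [min(values), max(values)]  # assumed initialization
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Membership update: u_i(x) = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            zeros = [d == 0.0 for d in dists]
            if any(zeros):
                # A point lying exactly on a center belongs fully to it
                share = 1.0 / sum(zeros)
                u = [share if z else 0.0 for z in zeros]
            else:
                u = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                     for d_i in dists]
            memberships.append(u)
        # Center update: mean of the data weighted by membership ** m
        new_centers = []
        for i in range(2):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / total)
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return sorted(centers)


def adaptive_threshold(values):
    """Assumed rule: cut halfway between the 'similar' and 'dissimilar' centers."""
    low, high = fuzzy_c_means_1d(values)
    return (low + high) / 2.0


# Hypothetical pairwise dissimilarities between candidate shots
dissimilarities = [0.10, 0.12, 0.15, 0.11, 0.80, 0.85, 0.90]
threshold = adaptive_threshold(dissimilarities)
```

     Shot pairs whose dissimilarity falls below `threshold` would be merged into the same cluster under this assumed rule, so no fixed threshold has to be tuned per broadcast.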
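     In addition, the dissertation surveys video content analysis techniques, constructs a layered audio classification method based on rules and hidden Markov models (HMMs), implements a fairly complete framework for extracting captions from news video, and finally designs and implements a COM-based video content analysis and summarization system.
     In summary, the dissertation performs content-based analysis of news video from the perspectives of audio, captions, visual information and their effective fusion, and the experimental results confirm the effectiveness of the proposed algorithms.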
Semantic video management, including video browsing, indexing and retrieval, is necessary for the effective use of video repositories. Video content analysis aims to bridge the semantic gap between low-level features and high-level concepts, and to provide an accessible way to organize and manage video data.
     This dissertation concentrates on audio, caption and visual content analysis for news video, and on techniques for fusing this multimodality information with pattern recognition models. The three main contributions are as follows:
     (1) A novel anchorperson shot detection algorithm operating in the MPEG compressed domain is proposed. It introduces an improved compressed-domain face detection method and a new dissimilarity metric for clustering. The algorithm is both effective and computationally efficient.
     (2) A new video shot classification method based on a decision tree is proposed. Six semantic types are categorized: Commercial, Others, Still Image, Anchorperson, Reporter and Monologue. The first three types are identified with black-frame, motion activity, shot duration and face features. Anchorperson shots are detected by clustering. Reporter and monologue shots are distinguished with a conditional random field (CRF) model, in which detection is cast as a sequence labeling problem over audio, face, motion and temporal information. Experimental results demonstrate the effectiveness of the method.
     (3) A novel news story segmentation method is proposed that fuses multimodality information from the results of audio classification, caption extraction and video shot classification. The video shot sequence is transformed into several keyword sequences, so that news story segmentation is treated as a sequence segmentation problem. A CRF model is employed to fuse the context information within and between the keyword sequences. Experiments show that the approach is feasible and achieves good segmentation performance.
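     To make the sequence formulation in contribution (3) concrete, the sketch below converts a shot sequence into one keyword sequence per modality and then builds context-window features for each shot, which is the kind of input a linear-chain CRF toolkit typically expects. The field names (`shot_type`, `audio_type`, `caption_change`), the keyword values, the window size and the `PAD` token are hypothetical; the abstract does not give the dissertation's actual vocabulary or feature templates, and no CRF training is shown.

```python
from typing import Dict, List

# Hypothetical per-shot annotations; field names and values are illustrative only.
Shot = Dict[str, str]

MODALITIES = ("shot_type", "audio_type", "caption_change")


def to_keyword_sequences(shots: List[Shot]) -> Dict[str, List[str]]:
    """Turn a shot sequence into one keyword sequence per modality."""
    return {name: [shot[name] for shot in shots] for name in MODALITIES}


def context_features(seqs: Dict[str, List[str]], i: int,
                     window: int = 1) -> Dict[str, str]:
    """Features for shot i: every modality's keyword at offsets -window..+window."""
    n = len(seqs[MODALITIES[0]])
    feats = {}
    for name, seq in seqs.items():
        for off in range(-window, window + 1):
            j = i + off
            # Positions outside the programme are padded
            feats[f"{name}[{off:+d}]"] = seq[j] if 0 <= j < n else "PAD"
    return feats


shots = [
    {"shot_type": "Anchorperson", "audio_type": "speech", "caption_change": "yes"},
    {"shot_type": "Reporter", "audio_type": "speech", "caption_change": "no"},
    {"shot_type": "Others", "audio_type": "music", "caption_change": "no"},
]
seqs = to_keyword_sequences(shots)
features = [context_features(seqs, i) for i in range(len(shots))]
```

     Each entry of `features` describes one shot together with its neighbours in all three sequences; a CRF trained on such features with boundary / non-boundary labels could then use context within and across the keyword sequences.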
     In addition, the dissertation surveys video content analysis techniques, develops a layered audio classification method based on rules and hidden Markov models (HMMs), designs and implements a caption extraction framework for news video, and designs and implements a COM-based video content analysis and abstraction system.
     Overall, the dissertation provides an in-depth investigation into semantic concept detection and multimodality information fusion.
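     The ordered, tree-structured classification in contribution (2) can be read as a cascade in which each test removes one shot type before the next test runs. The sketch below shows only that control flow. Every field name, threshold and individual rule is a placeholder invented for illustration, since the abstract names the features (black frames, motion, duration, face, clustering) but not the decision rules; the final reporter/monologue decision is passed in as an argument because the dissertation obtains it from a CRF.

```python
from dataclasses import dataclass


@dataclass
class ShotFeatures:
    # Hypothetical stand-ins for the features named in the abstract
    between_black_frames: bool   # shot lies in a block delimited by black frames
    motion_activity: float       # normalized motion measure in [0, 1]
    duration_s: float            # shot length in seconds
    has_face: bool               # a face was detected in the shot
    in_anchor_cluster: bool      # shot fell into the anchorperson cluster


def classify_shot(f: ShotFeatures, reporter_or_monologue: str = "Reporter",
                  still_motion_max: float = 0.05,
                  min_duration_s: float = 2.0) -> str:
    """Cascade in the order given by the abstract; all rules are assumed."""
    if f.between_black_frames:
        return "Commercial"
    if not f.has_face and (f.motion_activity > still_motion_max
                           or f.duration_s < min_duration_s):
        return "Others"
    if not f.has_face:
        return "Still Image"
    if f.in_anchor_cluster:
        return "Anchorperson"
    # Left to a sequence model in the dissertation; supplied by the caller here
    return reporter_or_monologue


example = classify_shot(ShotFeatures(False, 0.10, 20.0, True, True))
```

     Ordering the tests this way means the cheap features discard commercials and background shots first, so the costlier clustering and sequence labeling steps only see shots that contain a face.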

