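Research on Video Structure Analysis and Automatic Cataloging Techniques for Film and Television Video

English title: Research on Video Structure Analysis and Automatic Cataloging Technique
Author: Gao Guangyu (高广宇)
Thesis level: Doctoral
Discipline: Computer Science and Technology
Chinese keywords: structure analysis ; shot boundary detection ; scene detection ; scene recognition ; face recognition ; actor identification
English keywords: video structure analysis ; shot boundary detection ; scene detection ; video scene recognition ; face recognition and character identification
Degree year: 2013
Supervisor: Ma Huadong (马华东)
Discipline code: 0812
Degree-granting institution: Beijing University of Posts and Telecommunications (北京邮电大学)
Submission date: 2013-04-12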

Abstract

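Broadcast television and online film and TV resources have long been an important part of people's daily cultural life. The richness and diversity of film and television content, together with the high-dimensional spatio-temporal structure of its feature data, have led people to ask how such massive video collections can be organized and managed effectively, and how content of interest can be located as quickly as possible. However, most multimedia content still has to be segmented by structure and semantics and cataloged manually before it can be indexed, classified, stored, or retrieved efficiently. As the volume of film and television video grows explosively, this labor- and resource-intensive approach to structural segmentation, semantic segmentation, and cataloging is increasingly unable to meet the needs of program production and use. Automatic parsing, understanding, and cataloging of film and television video structure would save substantial resources and improve the efficiency of television program production.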
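     To address this problem, this thesis studies video structure analysis and automatic cataloging techniques, starting from an analysis of video structure and content. Video structure analysis targets the large scale of video data: it analyzes the structure of the data and decomposes it into a number of independent logical units. Video content analysis then yields the basic semantic information usable for automatic cataloging, so that film and television video can be parsed and cataloged automatically. Specifically, the thesis studies structure analysis and cataloging from the perspectives of shot boundary detection, scene detection, scene recognition, and actor identification, and proposes a method for each. The research questions are: how to parse video structure reasonably and efficiently in complex, highly variable video; and how to accurately and effectively extract the basic semantic information used for automatic cataloging, including scene categories and actor names. The main contributions of this thesis are as follows: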
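     (1) To detect shot boundaries accurately and efficiently, we propose a fast shot boundary detection method that combines mutual information computed over a focus region with frame-skipping detection, accelerating detection in both time and space while preserving accuracy. The method describes inter-frame differences effectively and reduces the processing of redundant information. Specifically, focus-region mutual information reduces the number of pixels processed within each frame (space), adaptive frame skipping reduces the number of frames processed (time), and analysis of the corner distribution of frames removes false detections. Experimental results show that the method effectively reduces the time spent on shot boundary detection while maintaining accuracy and detecting more gradual transitions.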
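The focus-region mutual information named in contribution (1) can be illustrated with a minimal sketch. It assumes 8-bit grayscale frames stored as nested lists, a central crop standing in for the focus region, and a 16-bin quantization. The names `focus_region`, `mutual_information`, `keep`, and `bins` are hypothetical and not taken from the thesis, whose actual focus-region definition may differ.

```python
import math
from collections import Counter

def focus_region(frame, keep=0.5):
    """Return the central crop covering `keep` of each dimension (assumed FR)."""
    h, w = len(frame), len(frame[0])
    dh, dw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    return [row[dw:w - dw] for row in frame[dh:h - dh]]

def mutual_information(a, b, bins=16):
    """Mutual information (bits) between co-located gray levels of two regions."""
    step = 256 // bins
    pairs = [(pa // step, pb // step)
             for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    n = len(pairs)
    joint = Counter(pairs)                      # joint histogram of bin pairs
    hist_a = Counter(x for x, _ in pairs)       # marginal histogram, frame a
    hist_b = Counter(y for _, y in pairs)       # marginal histogram, frame b
    # sum over p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
    return sum((c / n) * math.log2(c * n / (hist_a[x] * hist_b[y]))
               for (x, y), c in joint.items())

# Toy frames: a gradient image and an all-black image.
frame = [[(i * 8 + j) * 4 for j in range(8)] for i in range(8)]
black = [[0] * 8 for _ in range(8)]
mi_same_shot = mutual_information(focus_region(frame), focus_region(frame))
mi_across_cut = mutual_information(focus_region(frame), focus_region(black))
```

Frames inside one shot share most of their content and give high mutual information, while a sharp drop between neighboring frames suggests a candidate boundary. Cropping first means only a fraction of the pixels enter the histogram, which is the spatial saving the paragraph describes.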
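     (2) For scene detection and recognition in film and television video, we first propose a scene detection algorithm based on kernel canonical correlation analysis (KCCA) and feature fusion. Because scenes in films and TV dramas are complex and hard to localize, the algorithm considers audio and visual information jointly, fuses the features with KCCA to obtain new, more robust features, and then detects scene boundaries accurately by graph partitioning. Second, for the scene segments produced by scene detection, we obtain representative local feature patches by removing noisy regions and extracting panoramic key frames, and we use the latent Dirichlet allocation (LDA) topic model to model and classify scene categories, assigning each scene segment to one of five specific scene categories.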
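     (3) For automatic actor identification and name labeling in films and TV dramas, we propose an effective solution: a training set of actor faces is built by combining cast-list information with web search; after face tracks are obtained through face detection and video tracking, each face track is identified with the Kernel Multi-task Joint Sparse Representation and Classification (KMTJSRC) algorithm; finally, a conditional random field (CRF) model labels the sequence of face tracks more accurately and effectively.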
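     (4) Design and implementation of a prototype automatic cataloging system based on video structure and basic semantic content. To verify the effectiveness of the proposed algorithms, we designed and implemented a prototype system for automatic cataloging of film and television video. Built on the video structure analysis and the basic semantic extraction algorithms proposed in this thesis, the prototype provides more efficient and reasonable automatic cataloging, and it has been applied and validated in practical use. Extensive experiments show that the proposed methods parse video structure accurately and efficiently and obtain basic semantic content such as scene categories and actor names as catalog entries, substantially reducing manual workload and improving the efficiency of video cataloging in the media industry.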

Abstract (English version)

With the rapid development of multimedia technology and computing power, people face a vast ocean of digital information. At the same time, the richness and diversity of this content, together with the high-dimensional structure of its data in time and space, raise the questions of how to organize such massive data effectively and how to find content of interest as quickly as possible. Movies and TV series on broadcast television and the Internet have already become an important part of people's daily life. Yet most of the processing needed to manage, organize, browse, and index these videos effectively, including structure analysis, semantic extraction, and cataloging, still requires human assistance. With so many movie and television resources to be processed, video segmentation and semantic annotation consume a great deal of manpower and material resources. Moreover, as these resources grow exponentially, manual methods can no longer meet the needs of producers and users. This motivates us to introduce several video processing methods for the structure analysis, understanding, and automatic cataloging of movies and television series. These methods can save considerable money, time, and other resources, and improve the efficiency of broadcast program production.
     In this thesis, we study video structure analysis and automatic cataloging. To handle large-scale video data, video structure analysis first segments the video into several independent logical units. Then, a series of semantic content analysis techniques extracts important semantic information for automatic video parsing and cataloging. More specifically, we propose a series of methods for shot boundary detection, video scene detection, video scene recognition, and movie character identification. With these methods, we mainly address two questions: 1) how to analyze video structure efficiently in complex video; 2) how to accurately extract the basic semantic content for automatic video cataloging. The main contributions of this thesis are as follows:
     (1) Video structure analysis needs a fast and efficient shot boundary detection algorithm, especially for real-time video processing applications. Extensive prior work has achieved accurate shot boundary detection at a high computational cost. We therefore propose a fast shot boundary detection method based on a Focus Region (FR) and adaptive skip search. The method reduces computation both pixel-wise and frame-wise while still giving satisfactory accuracy: it speeds up computation substantially by shrinking both the detection region and the detection scope. Color histograms and mutual information are used together to measure the difference between frames, and the corner distribution of frames is used to exclude most false boundaries.
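The frame-wise saving from skip search can be sketched as follows. This is an illustration under stated assumptions, not the thesis's algorithm: the stride-doubling policy, the bisection step, and the names `find_cuts`, `diff`, `min_step`, and `max_step` are all hypothetical, and only hard cuts are handled.

```python
def find_cuts(diff, n_frames, threshold, min_step=2, max_step=16):
    """Locate hard cuts while comparing far fewer frame pairs than a full scan.

    diff(i, j) is any frame-difference measure (hypothetical callback).
    Returns indices k such that a cut lies between frame k-1 and frame k.
    """
    cuts, i, step = [], 0, min_step
    while i < n_frames - 1:
        j = min(i + step, n_frames - 1)
        if diff(i, j) < threshold:
            # Nothing changed across the gap: jump ahead and widen the stride.
            i, step = j, min(step * 2, max_step)
            continue
        # A change lies in (i, j]: bisect to find the first differing frame.
        lo, hi = i, j
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if diff(lo, mid) >= threshold:
                hi = mid
            else:
                lo = mid
        cuts.append(hi)
        i, step = hi, min_step       # restart with a cautious stride

    return cuts

# Toy video: three shots of 10, 7, and 13 frames, identified by a shot id.
shots = [0] * 10 + [1] * 7 + [2] * 13
calls = []

def toy_diff(i, j):
    calls.append((i, j))             # record every comparison made
    return 1.0 if shots[i] != shots[j] else 0.0

cuts = find_cuts(toy_diff, len(shots), threshold=0.5)
```

The design trades recall for speed: a very short shot that sits entirely inside one skipped gap, flanked by similar content, would be missed, which is why the stride is capped by `max_step` and reset after each detected boundary.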
     (2) Scene detection is a fundamental step for efficient access to and browsing of movies and TV series. First, we propose to segment a movie into scenes using fused visual and audio features: 1) the movie is segmented into shots, and key frames are then extracted; 2) because feature movies are often filmed in open, dynamic environments with moving cameras and have continuously changing content, we focus on extracting the association between visual and audio features; 3) these features are fused by Kernel Canonical Correlation Analysis (KCCA) for scene detection; 4) spatio-temporally coherent shots form a similarity graph, which is partitioned to generate the scene boundaries. Second, scene recognition is the problem of recognizing semantic scene labels (e.g., bedroom, street); many existing methods focus on static images and do not give satisfactory results on videos. We propose a robust movie scene recognition approach that uses panoramic frames and representative feature patches, and that also uses the correlations between video clips to improve the final recognition performance.
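Step 4), partitioning a shot similarity graph into scenes, can be sketched with a normalized-cut criterion over contiguous splits. The abstract does not state which partition criterion the thesis uses, so the normalized cut here is an assumption, as are the toy 2-D shot features (standing in for the KCCA-fused features), the temporal decay `tau`, and the function names.

```python
import math

def shot_similarity(feats, tau=4.0):
    """Edge weight = feature similarity damped by temporal distance between shots."""
    n = len(feats)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(feats[i], feats[j]))
            w[i][j] = math.exp(-d2) * math.exp(-abs(i - j) / tau)
    return w

def best_scene_split(w):
    """Return (k, score) minimizing the normalized cut of shots [0,k) vs [k,n)."""
    n = len(w)
    total = [sum(row) for row in w]          # association of each shot with all shots
    best_k, best_score = None, float("inf")
    for k in range(1, n):
        cut = sum(w[i][j] for i in range(k) for j in range(k, n))
        assoc_a, assoc_b = sum(total[:k]), sum(total[k:])
        score = cut / assoc_a + cut / assoc_b
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy example: four shots with one appearance, then five with another.
feats = [[0.0, 0.0]] * 4 + [[3.0, 3.0]] * 5
k, score = best_scene_split(shot_similarity(feats))
```

Restricting candidate cuts to contiguous splits keeps scenes temporally connected; applying the split recursively to each side, with a stopping threshold on the score, would yield more than two scenes.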
     (3) Automatically identifying characters in movies has attracted researchers' interest and led to several significant applications. However, it remains a challenging problem because of the large variation in character appearance and the weakness and ambiguity of the available annotation. In this thesis, we investigate this problem under the supervision of the actor-character name correspondence provided by the movie cast. Our proposed framework, Cast2Face, has the following features: (i) the assigned names are restricted to the set of character names in the cast; (ii) for each character, we use the corresponding actor name and the movie name as keywords to retrieve a group of face images from Google image search, which form the gallery set; (iii) each probe face track in the movie is identified as one of the actors by a robust kernel multi-task joint sparse representation and classification method; and (iv) a Conditional Random Field (CRF) model that takes the constraints between face tracks into account refines the final labeling. Finally, the actor name assigned to a face track is mapped back to the character name using the cast.
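Steps (iv) and the final cast lookup can be illustrated with a deliberately simplified sketch: a linear chain over consecutive face tracks decoded with the Viterbi algorithm. The abstract does not give the CRF's structure or potentials, so the chain topology, the "same label" pairwise bonus, the per-track scores, and the cast entries below are all hypothetical placeholders.

```python
def viterbi(unary, pairwise):
    """Highest-scoring labeling of a chain of face tracks.

    unary[t][y]: score of actor y for track t (e.g. a classifier log-score).
    pairwise(t, a, b): score for labels a and b on tracks t-1 and t.
    """
    labels = list(unary[0])
    score = dict(unary[0])
    back = []
    for t in range(1, len(unary)):
        new_score, ptr = {}, {}
        for b in labels:
            prev = max(labels, key=lambda a: score[a] + pairwise(t, a, b))
            new_score[b] = score[prev] + pairwise(t, prev, b) + unary[t][b]
            ptr[b] = prev
        score = new_score
        back.append(ptr)
    path = [max(labels, key=score.get)]
    for ptr in reversed(back):               # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical cast list: actor name -> character name.
cast = {"Actor A": "Character X", "Actor B": "Character Y"}

# Hypothetical per-track scores; the middle track is ambiguous on its own.
unary = [
    {"Actor A": 0.0, "Actor B": -2.0},
    {"Actor A": -1.0, "Actor B": -0.8},
    {"Actor A": 0.0, "Actor B": -2.0},
]

def same_actor_bonus(t, a, b):
    return 0.0 if a == b else -1.0           # assumed constraint between neighbors

actors = viterbi(unary, same_actor_bonus)
characters = [cast[a] for a in actors]       # map actor names back to characters
```

Decoded independently, the middle track would go to "Actor B"; with the pairwise term, its neighbors pull it to "Actor A". This shows how constraints between tracks can correct an individual classification, though a real model would likely connect tracks by visual similarity or co-occurrence rather than simple adjacency.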
     (4) Finally, to verify the effectiveness of the proposed methods, we design an automatic video cataloging system based on video structure analysis and semantic content extraction. Extensive experiments show that the proposed methods handle video structure and content analysis accurately and efficiently and provide more intelligent cataloging content. The methods and the system provide substantial assistance to broadcast producers and end users.

引文

[Wang04]王佳梅.广播电视音像资料编目规范.国家广播电影电视总局,GY/T 202.1-2004.
    [Lienhart99] Lienhart R. Comparison of automatic shot boundary detection algorithm. In Proceedings of SPIE Storage and Retrieval for Image and Video Databases VII, USA, Jan.1999, pp.290-301.
    [Baeza-Yates99] Baeza-Yates R., Riberiro B. Modern information retrieval. Addison-Wesley, ACM Press,1999.
    [Bimbo99]Bimbo A. D. Visual information retrieval. Morgan Kaufmann, San Francisco, 1999.
    [Lew01]Lew M. S. Principles of Visual Information Retrieval, Springer Verlag,2001.
    [Smoliar94] Smoliar S. W., Zhang H. J. Content-based video indexing and retrieval. IEEE Multimedia, vol.1, no.2,1994, pp.62-72.
    [Zhuang02]庄越挺,潘云鹤,吴飞.网上多媒体信息分析与检索.清华大学出版社,2002.
    [Dirnitrova02] Dirnitrova N., Zhang H. J., Shahraray B.et al. Application of video-content analysis and retrieval. IEEE Multimedia, vol.9, no.4,2002, pp.42-55.
    [Zhang03]章毓晋.基于内容的视觉信息检索.科学出版社,2003.
    [Flickner95] Flickner M., Sawhney H., Niblack W. et al. Query by image and video content: the OBIC system. IEEE Computer, vol.28, no.9,1995, pp.23-32.
    [Smith97]Smith J. R., Chang S. F. Visually searching the web for content. IEEE Multimedia Magazine, vol.4, no.3,1997, pp.12-20.
    [Eactlar96] Wactlar H., Hauptmann A., Witbrock M. Informedia:News-on-demand experiments in speech recognition. In Proceedings of ARPA Speech Recognition Workshop, Feb.1996, pp.18-21.
    [Mostefaoui02] Mostefaoui A., Kosch H., Brunie L. Semantic based prefetching in news-on-demand video servers. Multimedia Tools and Applications, vol.18, no.2, 2002, pp.159-179.
    [Otsuka06]Otsuka I., Radhakrishnan R., Siracusa M. et al. An enhanced video summarization system using audio features for a personal video recorder. IEEE Transactions on Consumer Electronics, vol.52, no.1,2006, pp.168-172.
    [Liu05]Liu H. Y, Zhang H. A content-based broadcasted sports video retrieval system using multiple modalities:SportBR. In Proceedings of the Fifth International Conference on Computer and Information Technology,2005, pp.652-656.
    [Chen05]Chen C. Y., Wang J. C., Wang J. F.et al. An efficient news video browsing system for wireless network application. In Proceedings of International Conference on Wireless Networks, Communications and Mobile Computing (WiCOM),2005, pp. 1377-1381.
    [Liu03]Liu H. Y, Zhou D. R. A content-based news video browsing and retrieval system. In Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis,2003, pp.793-798.
    [Wijesekera00] Wijesekera D., Barbara D. Mining cinematic knowledge work in progress. In Proceedings of International Workshop on Multimedia Data Mining (MDM/KDD), 2000, pp.98-103.
    [Liu10]Liu A., Yang Z. X. Watching, thinking, reacting:a human-centered framework for movie content analysis. International Journal of Digital Content Technology and its Applications (JDCTA), vol.4, no.5,2010, pp.23-37.
    [Zhang09]Zhang S. L., Tian Q., Huang Q. M. et al. Utilizing affective analysis for efficient movie browsing. In Proceedings of IEEE International Conference on Image Processing,2009, pp.1853-1856.
    [Lehane07]Lehane B., O'Connor N. E., Lee H. et al. Indexing of fictional video content for event detection and summarization. EURASIP Journal on Image and Video Processing, vol.2007, no.2,2007, pp.1-15.
    [Yeo96]Yeo B. L. Efficient processing of compressed images and video. PhD. Thesis, Princeton University,1996.
    [Aner01]Aner A., Kender J. R. A unified memory-based approach to cut, dissolve, key frame and scene analysis. In Proceedings of International Conference on Image Processing, vol.3,2001, pp.370-373.
    [Ye01]叶楠,李捷,郑志航.一种MPEG域上的快速缓变效果场景分割算法.上海交通大学学报,vol.35,no.1,2001,pp.34-36.
    [Jin00]金红,周源华.用Hausdorff距离进行视频镜头边界检测.电视技术,vol.221,2000,pp.12-19.
    [Cao08]曹建荣,蔡安妮.压缩域中基于支持向量机的镜头边界检测算法.电子学报,Vol.36,no.1,2008,pp.203-208
    [Jun00]Jun S. B., Yoon K., Lee H. Y. Dissolve transition detection algorithm using spatio-temporal distribution of MPEG macro-block types. In Proceedings of ACM International Conference on Multimedia,2000, pp.391-394.
    [Manda199]Mandal M. K., Idris F., Panchanathan S. Image and video indexing in the compressed domain:a critical review. Image and Vision Computing, vol.17, no.7, 1999, pp.513-529.
    [Liu04]Liu Y., Wang W. Q., Gao W. et al. A novel compressed domain shot segmentation algorithm on H.264/AVC. In Proceedings of International Conference on Image Processing,2004, pp.2235-2238.
    [Kim05]Kim S. M., Byun J. W., Won C. S. A scene change detection in H.264/AVC compression domain. In Proceedings of Pacific Rim Conference on Multimedia,2005, pp.1072-1082.
    [Bita06]Bita D., Mahmoud R. H., Mohammad K. A. A novel fade detection algorithm on H.264/AVC compressed domain. Advances in Image and Video Technology,2006, pp. 1159-1167.
    [Koprinska01] Koprinska I., Carrato S. Temporal video segmentation:A survey. Signal Processing:Image Communication, vol.16, no.5,2001, pp.477-500.
    [Yeo95]Yeo B. L., Liu B. Rapid scene analysis on compressed video. IEEE Transactions on Circuits and Systems for Video Technology, vol.5, no.6,1995, pp.533-544.
    [Su05]Su C. W., Liao H. Y., Tyan H. R. et al. A motion-tolerant dissolve detection algorithm. IEEE Transactions on Multimedia, vol.7, no.6,2005, pp.1106-1113.
    [Ren01]Ren W., Sharma M., Singh S. Automated video segmentation. In Proceedings of International Conference on Information, Communication, and Signal Processing. Singapore,2001.
    [Heng01]Heng W. J., Ngan K. N. An object-based shot boundary detection using edge tracing and tracking. Journal of Visual Communication and Image Representation, vol. 12, no.3,2001, pp.217-239.
    [Huang08]Huang, X. D., Ma H. D., Yuan H. D. A hidden markov model approach to parsing mtv video shot. In Proceedings of the 2008 Congress on Image and Signal Processing, vol.2,2008, pp.276-280.
    [Mas03]Mas J., Fernandez G. Video shot boundary detection based on color histogram. In Proceedings of the TREC Video Retrieval Evaluation Conference (TRECVID2003), 2003.
    [Yuan05]Yuan J. H., Li J. M., Lin F. Z. et al. A unified shot boundary detection framework based on graph partition model. In Proceedings of the 13th annual ACM international conference on Multimedia,2005, pp.539-542.
    [Zuzana06]Cernekova Z., Pitas I., Nikou C. Information theory-based shot cut or fade detection and video summarization. IEEE Transactions on Circuits and Systems for Video Technology, vol.16, no.1,2006, pp.82-91.
    [Chiu08]Chiu S. T., Lin G. S., Chang M. K. An effective shot boundary detection algorithm for movies and sports. In Proceedings of International Conference on Innovative Computer Information and Control,2008, pp.173-176.
    [Zhang2003]张毓晋.基于内容的视觉信息检索.北京：科学出版社,2003
    [Lienhart99a] Lienhart R., Pfeiffer S., Effelsberg W. Scene determination based on video and audio features. In Proceedings of IEEE International Conference on Multimedia Computing and Systems,1999, pp.685-690.
    [Rui99]Rui Y, Huang T. S, Mehrotra S. Constructing table-of-content for videos. Multimedia Systems, vol.7, no.5,1999, pp.359-368.
    [Lin01]Lin T., Zhang H. J., Shi Q. Y. Video content representation for shot retrieval and scene extraction. International Journal of Image and Graphics, vol.1, no.3,2001, pp. 507-526.
    [Nge01]Nge C. H., Pong T. C, Zhang H. J. On clustering and retrieval of video shots. In Proceedings of ACM International Conference on Multimedia,2001, pp.51-60.
    [Rasheed05] Rasheed Z., Shah M. Detection and representation of scenes in videos. IEEE Transactions on Multimedia, vol.7, no.6,2005, pp.1097-1105.
    [Wang06]Wang J. Q., Duan L. Y., Lu H. G. et al. A mid-level scene change representation via audiovisual alignment. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol.2,2006, pp.409-412.
    [Truong02]Truong B. T., Venkatesh S., Dorai C. Neighborhood coherence and edge based approaches to film scene extraction. In Proceedings of International Conference on Pattern Recognition,2002, pp.350-353.
    [Rasheed03] Rasheed Z., Shah M. Scene detection in Hollywood movies and TV shows. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2003, pp.343-350.
    [Chena08]Chena L. H., Lai Y. C, Liao H. Y. M. Movie scene segmentation using background information. Pattern Recognition, vol.41, no.3,2008, pp.1056-1065.
    [Li05]Li. F. F., Piepro P. Bayesian Hierarchical model for learning natural scene categories. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.524-531.
    [Oliva01]Oliva A., Torralba A. Modeling the shape of the scene:a holistic representation of the spatial envelope. International Journal of Computer Vision, vol.42, no.3,2001, pp.145-175.
    [Lazebnik06] Lazebnik S., Schmid C, Ponce J. Beyond Bags of Features:Spatial Pyramid Matching for Recognizing Natural Scene Categories. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2006, pp. 2169-2178.
    [Xiao10]Xiao J. X., Hays J. et al. SUN database:Large-scale scene recognition from abbey to zoo. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,2010, pp.3485-3492.
    [Marszalek209] Marszalek M., Latev I., Schmid C. Actions in context. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,2009, pp.2929-2936.
    [Wu11]Wu J. X., Rehg J. M. CENTRIST:A Visual Descriptor for Scene Categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, no.8,2011, pp.1489-1501.
    [Ando07]Ando R., Shinoda K., Mochizuki T. A robust scene recognition system for baseball broadcast using data. In Proceedings of the ACM International Conference on Image and Video Retrieval,2007, pp.186-193.
    [Huang05]Huang J. C., Liu Z., Wang Y. Joint scene classification and segmentation based on hidden Markov model. IEEE Transactions on Multimedia, vol.7, no.3,2007, pp. 538-550.
    [Engels10]Engels C., Deschacht K., Becker J. H. et al. Automatic annotation of unique locations from video and text. In Proceedings of British Machine Vision Conference, 2010, pp.115.1-115.11.
    [Satoh99]Satoh S., Nakamura Y., Kanade T. Name-it:naming and detecting faces in news videos. IEEE Multimedia, vol.6, no.1,1999, pp.22-35.
    [Chen04]Chen M. Y., Hauptmann A. Searching for a specific person in broadcast news video. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing,2004, pp.1036-1039.
    [Ozkan06]Ozkan D., Duygulu P. Finding people frequently appearing in news. In Proceedings of International Conference on Image and Video Retrieval,2006, pp. 173-182.
    [Yang04]Yang J., Hauptmaim A., Chen M. Y. Finding person x:correlating names with visual appearances. In Proceedings of International Conference on Image and Video Retrieval,2004. pp.270-278.
    [Everingham06] Everingham M., Sivic J., Zisserman A. "Hello! My name is... Buffy"-automatic naming of characters in TV video. In Proceedings of British Machine Vision Conference (BMVC),2006, pp.899-908.
    [Sivic09]Sivic J., Everingham M., Zisserman A. "Who are you?"-learning person specific classifiers from video. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,2009, pp.1145-1152.
    [Everingham09] Everingham M., Sivic J., Zisserman A. Taking the bite out of automatic naming of characters in TV videos. Image and Vision Computing, vol.27, no.5,2009, pp.545-559.
    [Liu07]Liu Z., Wang Y. Major cast detection in video using both speaker and face information. IEEE Transactions on Multimedia, vol.9, no.1,2007, pp.89-101.
    [Tapaswil2] Tapaswi M., Bauml M., Stiefelhagen R. "Knock! Knock! who is it?" probabilistic person identification in TV series. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,2012, pp.2658-2665.
    [Arandjelovic06] Arandjelovic O., Cipolla R. Automatic cast listing in feature length films with anisotropic manifold space. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition,2006, pp.1513-1520.
    [Gao07]Gao Y, Wang T., Li J. G. et al. Cast indexing for videos by ncuts and page ranking. In Proceedings of International Conference on Image and Video Retrieval,2007, pp. 441-447.
    [Cotsaces05] Cotsaces C., Gavrielides M. A., Ioannis P. A survey of recent work in video shot boundary detections. In Proceedings of Workshop on Audio-Visual Content and Information Visualization in Digital Libraries (AVIVDiLib),2005, pp.4-6.
    [Ren01]Ren W., Sharma M., Singh S. Automated video segmentation. In Proceedings of International Conference on Information, Communication, and Signal Processing, 2001.
    [Yeo95]Yeo B. L., Liu B. Rapid scene analysis on compressed video. IEEE Transactions on Circuits and Systems for Video Technology, vol.5, no.6,1995, pp.533-544.
    [Lalor01]Lalor G. C., Zhang C. Multivariate outlier detection and remediation in geochemical databases. The Science of the Total Environment, vol.28, no.1,2001, pp.99-109.
    [Zhang93]Zhang H. J., Kankanhalli A., Smoliar S. W. Automatic partitioning of full-motion video. Multimedia systems, vol.1, no.1,1993, pp.10-28.
    [Akutsu92]Akutsu A., Tonomura Y., Hashimoto H. et al. Video indexing using motion vectors. In Proceedings of SPIE Visual Communications and Image Processing,1992, pp.1522-1530.
    [Shahrary95] Shahrary B. Scene change detection and content-based sampling of video sequences. In Proceedings of SPIE Digital Video Compression:Algorithms and Technologies,1995, pp.2-13.
    [Y0006]Yoo H. W., Ryoo H. J., Jang D. S. Gradual shot boundary detection using localized edge blocks. Multimedia Tools and Applications, vol.28, no.3,2006, pp. 283-300.
    [Henga01]Henga W. J., Ngan K. N. An object-based shot boundary detection using edge tracing and tracking. Visual Communication and Image Representation, vol.12, no.3, 2001, pp.217-239.
    [Zhu09]Zhu S. H., Liu Y. C. Automatic scene detection for advanced story retrieval. Expert Systems with Applications, vol.36, no.3, Part 2,2009, pp.5976-5986.
    [Tapu11]Tapu R., Zaharia T. A complete framework for temporal video segmentation. In Proceedings of IEEE International Conference on Consumer Electronics,2011, pp. 156-160.
    [Adjeroh09] Adjeroh D., Lee M. C., Banda N. et al. Adaptive edge-oriented shot boundary detection. EURASIP Journal on Image and Video Processing, vol.2009, no.5,2009, pp.1-14.
    [韩06]韩冰.基于智能软计算的视频镜头边界检测算法研究[博士学位论文].西安电子科技大学,2006.
    [耿06]耿玉亮.基于内容的视频结构化技术的研究[博士学位论文].北京邮电大学,2006.
    [Swain91]Swain M. J., Ballard D. H. Color Indexing. International Journal of Computer Vision, vol.7, no.1,1991, pp.11-32.
    [Huang01]Huang C. L., Liao B. Y. A robust scene-change detection method for video segmentation. IEEE Transactions Circuits and Systems for Video Technology, vol.11, no.12,2001, pp.1281-1288.
    [Mas03]Mas J., Fernandez G. Video shot boundary detection based on color histogram. In Proceedings of the TREC Video Retrieval Evaluation Conference (TRECVID2003), 2003.
    [Yeung95]Yeung M., Liu B. Efficient matching and clustering of video shots. In Proceedings of International Conference on Image Processing,1995, pp.338-341.
    [Han02]Han S. H., Yoon K. J., Kweon I. S. A new technique for shot detection and key frames selection in histogram space. In 12th Workshop on Image Proceeding and Image Understanding,2000.
    [HanjalicO2] Hanjalic A. Shot-boundary detection:Unraveled and resolved? IEEE Transactions on Circuits and Systems for Video Technology, vol.12, no.2,2002, pp. 90-105.
    [Yuan05]Yuan J. H., Li J. M., Lin F. Z. et al. A unified shot boundary detection framework based on graph partition model. In Proceedings of the 13th annual ACM International Conference on Multimedia,2005, pp.539-542.
    [Huang08]Huang C. R., Lee H. P., Chen C. S. Shot change detection via local keypoint matching. IEEE Transactions on Multimedia, vol.10, no.6,2008, pp.1097-1108.
    [Cernekova06] Cernekova Z., Pitas I., Nikou C. Information theory-based shot cut or fade detection and video summarization. IEEE Transactions on Circuits and Systems for Video Technology, vol.16, no.1,2006, pp.82-91.
    [Xia07]Xia D. Y., Deng X. F., Zeng Q. N. Shot boundary detection based on difference sequences of mutual information. In Proceedings of the Fourth International Conference on Image and Graphics,2007, pp.389-394.
    [Wu98]Wu M., Wolf W., Liu B. An algorithm for wipe detection, In Proceedings of International Conference on Image Processing,1998, pp.893-897.
    [Pei99]Pei S. C., Chou Y. Z. Efficient mpeg compressed video analysis using macroblock type information. IEEE Transactions on Multimedia, vol.1, no.4,1999, pp.321-333.
    [PeiO2]Pei S. C., Chou Y. Z. Effective wipe detection in mpeg compressed video using macro block type information. IEEE Transactions on Multimedia, vol.4, no.3,2002, pp.309-319.
    [Lienhart1] Lienhart R. W. Reliable dissolve detection. Storage and Retrieval for Media Databases,2001, pp.219-230.
    [Su05]Su C. W., Liao H. Y., Tyan H. R. et al. A motion-tolerant dissolve detection algorithm. IEEE Transactions on Multimedia, vol.7, no.6,2005, pp.1106-1113.
    [Zhao07]Zhao Z. C., Zeng X., Liu T. et al. Bupt at trecvid 2007:Shot boundary detection. In Proceedings of the 2007 TREC Video Retrieval Evaluation (TRECVID),2007.
    [Hu11]Hu W. M., Xie N. H., Li L. et al. A survey on visual content-based video indexing and retrieval. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol.41, no.6,2011, pp.797-819.
    [Koprinska01] Koprinska I., Carrato S. Temporal video segmentation:a survey. Signal Processing:Image Communication, vol.16, no.5,2001, pp.477-500.
    [Lefevre03] Lefevre S., Holler J., Vincent N. A review of real-time segmentation of uncompressed video sequences for content-based search and retrieval. Real-Time Imaging, vol.9, no.1,2003, pp.73-98.
    [Ling08]Ling X., Ouyang Y. X., Huan L. et al. A method for fast shot boundary detection based on svm. In Proceedings of Congress on Image and Signal Processing,2008, pp. 445-449.
    [Li09]Li Y. N., Lu Z. M., Niu X. M. Fast video shot boundary detection framework employing pre-processing techniques. IET Image Processing, vol.3, no.3,2009, pp. 121-134.
    [Danisman06] Danisman T., Alpkocak A. Dokuz Eylul university video shot boundary detection at TRECVid 2006. In Proceedings of the TREC Video Retrieval Evaluation (TRECVID),2006.
    [Chiu08]Chiu S. T., Lin G. S., Chang M. K. An effective shot boundary detection algorithm for movies and sports. In Proceedings of International Conference on Innovative Computer Information and Control,2008, pp.173-176
    [Xiong98]Xiong W. and Lee J. C. M. Efficient scene change detection and camera motion annotation for video classification. Computer vision and image understanding, vol.71, no.2,1998, pp.166-181.
    [Cover91]Cover T. M., Thomas J. A. Elements of Information Theory. Wiley-Interscience Publication, New York,1991.
    [Derpanis04] Derpanis K. G. The harris corner detector. New York,2004.
    [Qin10]Qin T. F., Gu J. Y., Chen H. T. et al. A fast shot-boundary detection based on k-step slipped window. In Proceedings of IEEE International Conference on Network Infrastructure and Digital Content,2010, pp.190-195.
    [Zhang03]张毓晋.基于内容的视觉信息检索.北京：科学出版社,2003.
    [Yang02]杨娜,罗航哉,薛向阳.一种用于电视新闻节目的播音员镜头检测算法.软件学报,vol.13,no.8,2002,pp.1559-1567.
    [Jiang03]姜帆,章毓晋.新闻视频的场景分段索引及摘要生成.计算机学报,vol.26,no.7,2003,pp.859-865.
    [Tjondronegoro04] Tjondronegoro D., Chen Y., Pham B. Classification of self-consumable highlights for soccer video summaries. In Proceedings of IEEE International Conference on Multimedia & Expo, Jun,2004, pp.579-582.
    [Xu03]Xu M., Duan L. Y., Xu C. S. et al. A fusion scheme of visual and auditory modalities for event detection in sports video. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr.,2003, pp. Ⅲ-189-192.
    [Rasheed03] Rasheed Z., Shah M. Scene detection in Hollywood movies and TV shows. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,2003, pp.343-350.
    [Chasanis09] Chasanis V. T., Likas A. C., Galatsanos N. P. Scene detection in videos using shot clustering and sequence alignment. IEEE Transactions on Multimedia, vol.11, no. 1,2009, pp.89-100.
    [Hanjalic 99] Hanjalic A., Lagendijk R.L., Biemond J. Automated high-level movie segmentation for advanced video-retireval systems. IEEE Trans, on Circuits and Systems of Video Technology, vol.9, no.4,1999, pp.580-588.
    [Tavanapong2004] Tavanapong W., Zhou J. Shot clustering techniques for story browsing. IEEE Transactions on Multimedia, vol.6, no.4,2004, pp.517-527.
    [Yeung98]Yeung M., Yeo B., Liu B. Segmentation of videos by clustering and graph analysis. Computer Vision and Image Understanding, vol.71, no.1998, pp.94-109.
    [Rasheed05] Rasheed Z., Shah M. Detection and representation of scenes in videos. IEEE Transactions on Multimedia, vol.7, no.6,2005, pp.1097-1105.
    [Zhai05]Zhai Y, Shah M. A general framework for temporal video scene segmentation. In Proceedings of IEEE International Conference on Computer Vision (ICCV),2005, pp. 1111-1116.
    [Tan02]Tan Y. P., Lu H. Model-based clustering and analysis of video scenes. In Proceedings of IEEE International Conference on Image Processing,2002, pp. 617-620.
    [Gupta07]Gupta L., Pathangay V., Patra A. et al. Indoor versus outdoor scene classification using probabilistic neural network. EURASIP Journal on Applied Signal Processing, vol.2007, no.1,2007, pp.1-10.
    [Zhou08]Zhou X., Zhuang X. D., Tang H. et al. A novel Gaussianized vector representation for natural scene categorization. In Proceedings of International Conference on Pattern Recognition (ICPR),2008, pp.1-4.
    [Greene09]Greene M. R., Oliva A. Recognition of natural scenes from global properties: seeing the forest without representing the trees. Cognitive Psychology, vol.58, no.2, 2009, pp.137-176.
    [Li05]Li F. F., Piepro P. Bayesian hierarchical model for learning natural scene categories. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp.524-531.
    [Lazebnik06] Lazebnik S., Schmid C., Ponce J. Beyond bags of features:spatial pyramid matching for recognizing natural scene categories. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,2006, pp.2169-2178.
    [Wu11]Wu J. X., Rehg J. M. CENTRIST:a visual descriptor for scene categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, no.8,2011, pp.1489-1501.
    [Xiao10]Xiao J. X., Hays J., Ehinger K. A. et al. SUN database:large-scale scene recognition from abbey to zoo. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,2010, pp.3485-3492.
    [Ando07]Ando R., Shinoda K., Mochizuki T. A robust scene recognition system for baseball broadcast using data. In Proceedings of the ACM International Conference on Image and Video Retrieval,2007, pp.186-193.
    [Huang05]Huang J. C, Liu Z., Wang Y. Joint scene classification and segmentation based on hidden Markov model. IEEE Transactions on Multimedia, vol.7, no.3,2007, pp. 538-550.
    [Schaffalitzky03] Schaffalitzky F., Zisserman A. Automated Location Matching in Movies. Computer Vision and Image Understanding, vol.92, no.2,2003, pp.236-264.
    [Marszalek09] Marszalek M., Laptev I., Schmid C. Actions in context. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition,2009, pp.2929-2936.
    [Engels10]Engels C., Deschacht K., Becker J. H. et al. Automatic annotation of unique locations from video and text. In Proceedings of British Machine Vision Conference, 2010, pp.115.1-115.11.
    [Wang06]Wang J., Duan L., Lu H. et al. A midlevel scene change representation via audiovisual alignment. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol.2,2006, pp.409-412.
    [Constantinidis01] Constantinidis A. S., Fairhurst M. C., Rahman A. A new multi-expert decision combination algorithm and its application to the detection of circumscribed masses in digital mammograms. Pattern Recognition, vol.34, no.8,2001, pp. 1528-1537.
    [Jing03]Jing X. Y., Zhang D., and Yang J. Y. Face recognition based on a group decision-making combination approach. Pattern Recognition, vol.36, no.7,2003, pp. 1675-1678.
    [Yang02]Yang J., Yang J. Y. Generalized K-L transform based combined feature extraction. Pattern Recognition, vol.35, no.1,2002, pp.295-297.
    [Yang03]Yang J., Yang J. Y., Zhang D. et al. Feature fusion:parallel strategy vs. serial strategy. Pattern Recognition, vol.36, no.6,2003, pp.1369-1381.
    [Rasheed03] Rasheed Z., Shah M. Scene detection in Hollywood movies and TV shows. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2003, pp.343-348.
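    [Sun05]Sun Q. S., Zeng S. G., Wang P. A., Xia D. S. The theory of canonical correlation analysis and its application to feature fusion (in Chinese). Chinese Journal of Computers, vol.28, no.9,2005, pp.1524-1533.
    [Zhang08]Zhang J. M., Yang L. R., Wang L. M. Facial expression recognition based on feature fusion with canonical correlation analysis (in Chinese). Journal of Computer Applications, vol.28, no.3,2008, pp.643-649.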
    [Rasheed03] Rasheed Z., Shah M. A graph theoretic approach for scene detection in produced videos. In Proceedings of ACM Multimedia Information Retrieval Workshop,2003.
    [Shi00]Shi J. B., Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, no.8, Aug.2000, pp.888-905.
    [Ghanem12]Ghanem B., Zhang T. Z., Ahuja N. Robust video registration applied to field-sports video analysis. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),2012.
    [Brown06]Brown M., Lowe D. G. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, vol.74,2006, pp.59-73.
    [Wagner11]Wagner A., Wright J., Ganesh A. et al. Towards a practical face recognition system:robust alignment and illumination by sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, no.2,2012, pp. 372-386.
    [Zhang11]Zhang Z. D., Liang X. et al. TILT:transform invariant low-rank textures. In Proceedings of Asian Conference on Computer Vision,2011, pp.314-328.
    [Mikolajczyk04] Mikolajczyk K., Schmid C. Scale & affine invariant interest point detectors. International Journal of Computer Vision, vol.60, no.1,2004, pp.63-86.
    [Lowe99]Lowe D. G. Object recognition from local scale-invariant features. In Proceedings of International Conference on Computer Vision,1999, pp.1150-1157.
    [Bourdev09] Bourdev L., Malik J. Poselets:body part detectors trained using 3D human pose annotations. In Proceedings of IEEE International Conference on Computer Vision,2009, pp.1365-1372.
    [Blei03]Blei D. M., Ng A. Y., Jordan M. I. Latent dirichlet allocation. The Journal of Machine Learning Research, vol.3,2003, pp.993-1022.
    [Gelman95]Gelman A., Carlin J. B., Stern H. S. et al. Bayesian data analysis. Chapman Hall/CRC,1995.
    [Zhao07]Zhao W. L., Ngo K., Tan H. K. et al. Near duplicate keyframe identification with interest point matching and pattern learning. IEEE Transactions on Multimedia, vol.9, no.5,2007, pp.1037-1048.
    [Kyperountas07] Kyperountas M., Kotropoulos C., Pitas I. Enhanced eigen-audio frames for audiovisual scene change detection. IEEE Transactions on Multimedia, vol.9, no.4,2007, pp.785-797.
    [Gao12]Gao G. Y., Ma H. D. Multi-modality movie scene detection using kernel canonical correlation analysis. In Proceedings of International Conference on Pattern Recognition,2012, pp.3074-3077.
    [Arandjelovic05] Arandjelovic O., Zisserman A. Automatic face recognition for film character retrieval in feature-length films. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2005, pp.860-867.
    [Everingham05] Everingham M., Zisserman A. Identifying individuals in video by combining generative and discriminative head models. In Proceedings of IEEE International Conference on Computer Vision,2005, pp.1103-1110.
    [Everingham06] Everingham M., Sivic J., Zisserman A. 'Hello! My name is... Buffy' - automatic naming of characters in TV video. In Proceedings of British Machine Vision Conference,2006, pp.889-908.
    [Zhang09]Zhang Y. F., Xu C., Lu H. et al. Character identification in feature-length films using global face-name matching. IEEE Transactions on Multimedia, vol.11, no.9, 2009, pp.1276-1288.
    [Viola01]Viola P., Jones M. Rapid object detection using a boosted cascade of simple features. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2001, pp.511-518.
    [Satoh99]Satoh S., Nakamura Y., Kanade T. Name-it:naming and detecting faces in news videos. IEEE Multimedia, vol.6, no.1,1999, pp.22-35.
    [Chen04]Chen M. Y., Hauptmann A. Searching for a specific person in broadcast news video. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing,2004, pp.1036-1039.
    [Berg04]Berg A., Edwards J., Maire M. et al. Names and faces in the news. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, pp. II-848-II-852.
    [Ozkan06]Ozkan D., Duygulu P. Finding people frequently appearing in news. In Proceedings of International Conference on Image and Video Retrieval,2006, pp. 173-182.
    [Yang04]Yang J., Hauptmann A., Chen M. Y. Finding person x:correlating names with visual appearances. In Proceedings of International Conference on Image and Video Retrieval,2004, pp.270-278.
    [Liu07]Liu Z., Wang Y. Major cast detection in video using both speaker and face information. IEEE Transactions on Multimedia, vol.9, no.1,2007, pp.89-101.
    [Kanak03]Kanak A., Erzin E., Yemez Y. et al. Joint audio-video processing for biometric speaker identification. In Proceedings of International Conference on Multimedia and Expo,2003, pp.561-564.
    [Li04]Li Y., Narayanan S., Kuo C.-C. J. Adaptive speaker identification with audiovisual cues for movie content analysis. Pattern Recognition Letters, vol.25, no.7,2004, pp. 777-791.
    [Kwon05]Kwon S., Narayanan S. Unsupervised speaker indexing using generic models. IEEE Transactions on Speech and Audio Processing, vol.13, no.5,2005, pp. 1004-1013.
    [Tapaswi12] Tapaswi M., Bauml M., Stiefelhagen R. Knock knock who is it? probabilistic person identification in TV series. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,2012, pp.2658-2665.
    [Everingham09] Everingham M., Sivic J., Zisserman A. Taking the bite out of automated naming of characters in TV video. Image and Vision Computing, vol.27, no.5,2009, pp.545-559.
    [Olshausen97] Olshausen B. A., Field D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, vol.37,1997, pp.3311-3325.
    [Serre06]Serre T. Learning a dictionary of shape-components in visual cortex: Comparison with neurons, humans and machines. Ph.D. dissertation, MIT,2006.
    [Donoho06]Donoho D. For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, vol.59, no.6,2006, pp.797-829.
    [Candes06]Candes E., Tao T. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, vol.52, no.12, 2006, pp.5406-5425.
    [Zhao06]Zhao P., Yu B. On model selection consistency of lasso. Journal of Machine Learning Research, vol.7,2006, pp.2541-2567.
    [Amaldi98]Amaldi E., Kann V. On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoretical Computer Science, vol.209,1998, pp.237-260.
    [Yuan10]Yuan X. T., Yan S. C. Visual classification with multi-task joint sparse representation. In IEEE International Conference on Computer Vision and Pattern Recognition,2010, pp.3493-3500.
    [Obozinski09] Obozinski G., Taskar B., Jordan M. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, vol.20, no.2,2010, pp.231-252.
    [Nesterov07] Nesterov Y. Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/076,2007.
    [Tseng08]Tseng P. On accelerated proximal gradient methods for convex-concave optimization, submitted to SIAM Journal on Optimization,2008.
    [Schmidt09] Schmidt M., Berg E., Friedlander M. et al. Optimizing costly functions with simple constraints:A limited-memory projected quasi-newton algorithm. In Proceedings of the Conference on Artificial Intelligence and Statistics,2009, pp.456-463.
    [Gehler09]Gehler P., Nowozin S. On feature combination for multiclass object classification. In Proceedings of International Conference on Computer Vision,2009, pp. 221-228.
    [Lafferty01] Lafferty J., McCallum A., Pereira F. Conditional random fields:probabilistic models for segmenting and labeling sequence data. In Proceedings of 18th International Conference on Machine Learning,2001, pp.282-289.
    [MacLean00] MacLean J., Tsotsos J. Fast pattern recognition using gradient-descent search in an image pyramid. In Proceedings of International Conference on Pattern Recognition, vol.2,2000, pp.873-877.
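    [Sun02]Sun D. M., Qiu Z. D. A non-rigid image matching algorithm using thin-plate spline functions (in Chinese). Acta Electronica Sinica, vol.30, no.8,2002, pp.1104-1107.
    [Boykov01]Boykov Y., Veksler O., Zabih R. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, no.11,2001, pp.1222-1239.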
    [Sutton98]Sutton R. S., Barto A. G. Reinforcement learning:an introduction. The MIT Press, Cambridge, MA,1998.
