Research on Video Semantic Analysis and Retrieval Techniques Based on Audiovisual Information


Author: Yan Lelin (闫乐林)
Degree level: Doctoral
Discipline: Communication and Information Systems
Keywords: Video Retrieval; Multimodality Information Fusion; Unascertained Measure; Emotion Type; Racquet Sports Video; Conditional Random Fields; News Story Segmentation
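Degree year: 2012
Advisor: Wen Xiangming (温向明)
Discipline code: 081001
Degree-granting institution: Beijing University of Posts and Telecommunications
Thesis submission date: 2012-05-04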

Abstract

With the rapid development of computer technology, video compression, and the Internet, people now have access to an unprecedented wealth of information resources. Video is intuitive and information-rich, so it plays an increasingly important role in databases, and its volume is growing at a remarkable rate. However, because video has a complicated structure, diverse content, and spatio-temporal multidimensionality, organizing video data effectively and letting users quickly retrieve and browse the material they need has become an urgent research problem in the field. Content-based video retrieval emerged against this background. It draws on image processing, artificial intelligence, pattern recognition, and computer vision, and it analyzes video features and video objects in order to extract the high-level semantic information embedded in video and to build a usable retrieval system. Research on video semantic analysis and retrieval therefore has broad prospects and practical significance.
     This thesis studies three video genres: film and TV drama video, racquet sports video, and news video. It fuses the visual and audio modalities of the video to develop semantic analysis and retrieval methods tailored to each genre. Film and TV drama video is analyzed from the perspective of emotion: low-level visual and audio emotion features are examined, and an unascertained measure model is built and applied to recognize the emotion type of each video scene. For racquet sports video, visual and audio features are integrated to detect and retrieve highlights. For news video, a story segmentation method based on the conditional random field model is presented and analyzed systematically. The main contributions of the thesis are as follows:
     (1) Based on unascertained mathematics, a new algorithm for recognizing the affective content of video is proposed by relating low-level video features to the high-level emotion perceived in a video scene. First, scene brightness, shot cut rate, and color efficacy are selected as low-level visual features because they help distinguish different types of human emotion. Several audio emotion features are likewise selected and analyzed so that they can be used together with the visual features. The data extraction method for each emotion feature is described, and visual and audio emotion feature vectors are constructed accordingly. Second, after the unascertained object space and index space are defined, three unascertained measure functions are formed to quantify the components of the visual and audio emotion feature vectors, and the unascertained measure matrices are then built. Finally, information entropy is used to determine the weights of the emotion feature vectors and their components, and the emotion type of the video scene is determined by the credible degree criterion. The experimental results indicate that the proposed algorithm is feasible and effective.
     (2) A new audiovisual integration scheme is presented for retrieving highlights from racquet sports video. First, the shots in a racquet sports video are classified into two types, Court View Shot and Non-Court View Shot, using frame-image edge features and a support vector machine (SVM) classifier. Next, the inherent relationship between highlights and audio characteristics in racquet sports video is analyzed, and an SVM classifier model is used to detect ball-hitting sounds and applause in the audio stream. Finally, rules are established to determine the highlights by integrating the shot semantic information with the audio events embedded in the video, and the rally segments are ranked by their degree of excitement.
     (3) A news story segmentation method based on conditional random fields (CRFs) is presented. First, the typical characteristics of audio content and structure in news video are analyzed. By combining rule-based classification with hidden Markov model (HMM) classification, the audio data are organized into a hierarchical structure and subdivided into six semantic categories: male anchor voice, female anchor voice, alternating anchor speech, on-site sound, separator music, and valid silence. Next, according to the organizational characteristics of news video content, the shots are classified into five semantic categories: anchor shot, static image shot, interview shot, advertisement shot, and other shot. Audio semantic features are used to assist in detecting and recognizing these shot types. After audio events and semantic shots have been classified, the results are converted into keyword sequences, and a CRF model is built on those sequences to segment news story scenes and to perform semantic recognition and retrieval in news video.
     (4) An experimental platform for content-based video semantic recognition and retrieval was designed and preliminarily implemented to validate the performance of the algorithms proposed above.
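Contribution (1) combines an unascertained measure matrix, entropy-based weights, and a credible degree criterion. The abstract does not state the formulas, so the following is only a minimal sketch that assumes the form commonly used with unascertained measure models: each feature receives the weight v_j / sum(v), where v_j = 1 + (1/ln K) * sum_k(mu_jk * ln mu_jk), and a scene is assigned to the first class at which the cumulative composite measure reaches a threshold. The measure matrix, the number of classes, the assumption that the classes are ordered, and the threshold 0.6 are all hypothetical.

```python
import math


def entropy_weights(mu):
    """Weight each feature by how sharply its measure row separates the classes.

    mu[j][k] is the measure that feature j assigns to class k. Every row is
    assumed to be non-negative and to sum to 1, with at least two classes.
    """
    num_classes = len(mu[0])
    peaks = []
    for row in mu:
        # sum(p * ln p) is the negative Shannon entropy of the row (0 * ln 0 := 0).
        neg_entropy = sum(p * math.log(p) for p in row if p > 0)
        # 1 for a one-hot row, 0 for a uniform row; clamp rounding noise.
        peaks.append(max(0.0, 1.0 + neg_entropy / math.log(num_classes)))
    total = sum(peaks)
    if total == 0:  # every feature is uninformative: fall back to equal weights
        return [1.0 / len(mu)] * len(mu)
    return [v / total for v in peaks]


def composite_measure(mu, weights):
    """Weighted sum of the feature rows: one measure value per class."""
    num_classes = len(mu[0])
    return [sum(w * row[k] for w, row in zip(weights, mu))
            for k in range(num_classes)]


def classify_by_confidence(measure, lam=0.6):
    """Return the index of the first class whose cumulative measure reaches lam.

    Assumes the classes are listed in a meaningful order (an assumption here).
    """
    cumulative = 0.0
    for index, value in enumerate(measure):
        cumulative += value
        if cumulative >= lam:
            return index
    return len(measure) - 1  # guard against rounding when lam is close to 1


# Hypothetical measure matrix: 3 features (rows) by 3 emotion classes (columns).
MU = [
    [0.7, 0.3, 0.0],
    [0.2, 0.6, 0.2],
    [1.0, 0.0, 0.0],
]
WEIGHTS = entropy_weights(MU)
COMPOSITE = composite_measure(MU, WEIGHTS)
PREDICTED_CLASS = classify_by_confidence(COMPOSITE, lam=0.6)
```

In this form, a feature whose measures are spread evenly over the classes contributes almost nothing, while a feature that points clearly at one class dominates the composite measure.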
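Contribution (2) fuses shot classification with audio event detection through a set of extraction rules, which the abstract does not spell out. The sketch below assumes one plausible rule for illustration only: a Court View Shot counts as a highlight if it contains at least `min_hits` ball-hitting sounds and applause begins during the shot or within `max_gap` seconds after it, and highlights are ranked by applause duration. The `Shot` and `AudioEvent` types, both thresholds, and the ranking score are hypothetical. The classifier outputs are taken as given rather than computed.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    start: float      # seconds
    end: float        # seconds
    court_view: bool  # output of the (assumed) shot classifier


@dataclass
class AudioEvent:
    start: float  # seconds
    end: float    # seconds
    label: str    # "hit" or "applause", output of the (assumed) audio classifier


def find_highlights(shots, events, min_hits=3, max_gap=2.0):
    """Return (shot, hit_count, applause_seconds) tuples, most exciting first."""
    highlights = []
    for shot in shots:
        if not shot.court_view:
            continue  # only court-view shots can contain a rally
        hits = sum(1 for e in events
                   if e.label == "hit" and shot.start <= e.start < shot.end)
        applause = sum(e.end - e.start for e in events
                       if e.label == "applause"
                       and shot.start <= e.start <= shot.end + max_gap)
        if hits >= min_hits and applause > 0:
            highlights.append((shot, hits, applause))
    # Longer applause ranks higher; the hit count breaks ties.
    highlights.sort(key=lambda h: (h[2], h[1]), reverse=True)
    return highlights


# Hypothetical classifier outputs for a short clip.
SHOTS = [
    Shot(0.0, 10.0, True),
    Shot(10.0, 15.0, False),
    Shot(15.0, 30.0, True),
    Shot(30.0, 40.0, True),
]
EVENTS = (
    [AudioEvent(t, t + 0.1, "hit") for t in (1.0, 3.0, 5.0)]
    + [AudioEvent(10.5, 12.5, "applause")]
    + [AudioEvent(t, t + 0.1, "hit") for t in (16.0, 18.0, 20.0, 22.0)]
    + [AudioEvent(29.0, 34.0, "applause")]
    + [AudioEvent(t, t + 0.1, "hit") for t in (31.0, 33.0)]
)
RANKED = find_highlights(SHOTS, EVENTS)
```

Keeping the rule separate from the two classifiers means the thresholds can be tuned without retraining either model.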
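Contribution (3) labels a keyword sequence with a linear-chain CRF. As a rough illustration of the decoding step only, the sketch below runs Viterbi search over hand-set weights. The B/I/O labeling scheme (story begin, inside story, outside any story), the four keywords, and every weight value are assumptions made for this example. In a trained CRF the weights would be learned from labeled news video, and the thesis may use a different label set and feature design.

```python
def viterbi(observations, labels, emit, trans, default=-5.0):
    """Return the highest-scoring label sequence for a linear-chain model.

    emit[(label, observation)] and trans[(previous_label, label)] are
    log-potential weights; missing pairs score `default`.
    """
    if not observations:
        return []
    # best[t][y]: score of the best path that ends in label y at position t.
    best = [{y: emit.get((y, observations[0]), default) for y in labels}]
    back = []  # back[t - 1][y]: best previous label when position t has label y
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + trans.get((p, y), default))
            scores[y] = (best[-1][prev] + trans.get((prev, y), default)
                         + emit.get((y, obs), default))
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


LABELS = ["B", "I", "O"]  # hypothetical: story begin, inside story, outside
EMIT = {  # hypothetical weights linking a label to a shot keyword
    ("B", "anchor"): 2.0, ("I", "anchor"): 0.0,
    ("I", "report"): 2.0, ("I", "static"): 2.0,
    ("O", "ad"): 2.0,
}
TRANS = {  # hypothetical weights for label-to-label transitions
    ("B", "I"): 1.0, ("I", "I"): 1.0, ("I", "B"): 1.0,
    ("I", "O"): 0.0, ("O", "O"): 1.0, ("O", "B"): 1.0,
    ("B", "B"): -1.0, ("B", "O"): -1.0, ("O", "I"): -1.0,
}
KEYWORDS = ["anchor", "report", "report", "anchor", "static",
            "ad", "ad", "anchor", "report"]
TAGS = viterbi(KEYWORDS, LABELS, EMIT, TRANS)
# Each "B" tag marks the first shot of a new story.
STORY_STARTS = [i for i, tag in enumerate(TAGS) if tag == "B"]
```

The positions tagged "B" give the story boundaries, and runs of "O" mark material such as advertisements that belongs to no story.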

