Study of Content-Based Video Clip Retrieval

Original title: 基于内容的视频片段检索技术研究
Author: Zhao Yaqin (赵亚琴)
Thesis level: Doctoral
Discipline: Control Science and Engineering
Keywords: video clip retrieval; similarity model; visual feature; semantic information; news topic caption text; commercial video; racquet sports video; news video
Degree year: 2007
Advisor: Zhou Xianzhong (周献中)
Discipline code: 081101
Degree-granting institution: Nanjing University of Science and Technology
Thesis submission date: 2006-09-01

Abstract

In recent years, the rapid growth of video resources has made retrieving clips of interest from large volumes of video data a focus of video retrieval research. The expectation is that, as with text retrieval, a user will submit query video content and the system will automatically return results that satisfy the query. Content-based video retrieval emerged to meet this need. It draws on image processing, pattern recognition, computer vision, and image understanding, and has broad application prospects.
     This thesis studies clip retrieval methods for three video genres: commercial video, racquet sports video, and news video.
     A new retrieval method for commercial video clips is proposed. Starting from the definition of the similar shot set of each shot in the query clip, a method is given for computing the number of one-to-one corresponding similar shots between two clips, which excludes pseudo-similar clips, and a clip matching function is then defined on that basis. A sliding shot window automatically segments multiple similar clips from a continuous video stream according to their matching degree. The similar shot sets are mapped to a similar shot matrix, whose properties are used to compute the factors that affect clip similarity, yielding a ranking model for the similar clips. Compared with other methods serving the same purpose, the proposed method improves retrieval speed while maintaining high retrieval precision.
     A method for retrieving rally highlights from racquet sports video is proposed that fuses visual, motion, and audio features. Reference key frames for the Global View Shots class are selected automatically from motion and dominant color features, so that shots can be classified as Global View Shots or Non-Global View Shots. An SVM identifies the important audio types of each shot, including ball hits and applause. The audio information is then applied to the shot classes to establish rules for detecting events of interest. Finally, replayed rally clips and a model of rally excitement are used to retrieve the rally highlights that most viewers would find exciting.
     A news video clip retrieval method based on low-level visual features and high-level semantic information is proposed, where the semantic information is obtained from topic caption text. A method for detecting topic caption text in news video is presented, followed by a method for segmenting news video into stories using topic caption text and silent segments; the segmented story clips are annotated with their topic caption text. A similarity model for news clips is built from the low-level visual features and the semantic information, and relevance feedback is applied to retrieve the news story clips that best meet the user's needs: clips with a related topic, similar visual content, or both.
     A content-based video clip retrieval experimental platform was designed and preliminarily implemented to validate the performance of the above methods on the three video genres.
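The commercial clip matching idea in the abstract (similar shot sets, a one-to-one similar shot count, and a sliding shot window) can be illustrated with a minimal Python sketch. The abstract does not give the shot features, thresholds, or the exact matching function, so everything concrete here is an assumption made for illustration: shots are represented by normalized histograms compared with histogram intersection, the window has the same number of shots as the query, the one-to-one count is computed as a maximum bipartite matching, and all function names and default values are hypothetical. The ranking model based on the similar shot matrix is not covered.

```python
from typing import Dict, List, Sequence, Set, Tuple

Histogram = Sequence[float]  # assumed shot feature: a normalized histogram


def shot_similarity(a: Histogram, b: Histogram) -> float:
    """Histogram intersection (an assumed similarity measure)."""
    return sum(min(x, y) for x, y in zip(a, b))


def similar_shot_sets(query: List[Histogram], window: List[Histogram],
                      threshold: float) -> List[List[int]]:
    """For each query shot, list the window shots that are similar to it."""
    return [[j for j, w in enumerate(window)
             if shot_similarity(q, w) >= threshold]
            for q in query]


def one_to_one_count(sets: List[List[int]]) -> int:
    """Largest number of query shots that can be paired with distinct
    window shots (maximum bipartite matching via augmenting paths)."""
    owner: Dict[int, int] = {}  # window shot index -> query shot index

    def try_assign(q: int, seen: Set[int]) -> bool:
        for j in sets[q]:
            if j in seen:
                continue
            seen.add(j)
            # Take a free window shot, or re-route its current owner.
            if j not in owner or try_assign(owner[j], seen):
                owner[j] = q
                return True
        return False

    return sum(try_assign(q, set()) for q in range(len(sets)))


def match_score(query: List[Histogram], window: List[Histogram],
                threshold: float = 0.8) -> float:
    """Assumed matching function: fraction of query shots matched one-to-one."""
    sets = similar_shot_sets(query, window, threshold)
    return one_to_one_count(sets) / len(query)


def find_similar_clips(query: List[Histogram], stream: List[Histogram],
                       threshold: float = 0.8,
                       min_score: float = 0.6) -> List[Tuple[int, float]]:
    """Slide a window of len(query) shots over the stream and return
    (start index, score) for every window scoring at least min_score."""
    n = len(query)
    hits = []
    for start in range(len(stream) - n + 1):
        score = match_score(query, stream[start:start + n], threshold)
        if score >= min_score:
            hits.append((start, score))
    return hits
```

The one-to-one count is what rejects pseudo-similar clips in this sketch: if a single window shot resembles several query shots, it can be paired with only one of them, so a window that repeats one familiar shot scores low even though many query shots have a non-empty similar shot set. Shot order is ignored here, and overlapping hits are not merged.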
