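Text Extraction on Video and Its Application

Original title (Chinese): 视频中的文本提取及其应用
Author: Lu Bing (陆兵)
Thesis level: Master's
Discipline: Computer Application Technology
Keywords: Video Retrieval ; Video Text Localization ; Projection ; Support Vector Machine ; Advertisement Detection
Degree year: 2007
Advisor: Li Shijin (李士进)
Discipline code: 081203
Degree-granting institution: Hohai University (河海大学)
Thesis submission date: 2007-05-01
Chair of the defense committee: Cao Jing (曹敬)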

Abstract

Text is an important source of content information in video, and detecting and recognizing it plays a major role in video analysis and understanding because text gives a concise, direct description of what is shown. Text can serve as a content label and index for a video segment: in news video, superimposed captions usually give the names of the people involved and a summary of the event, so the recognized text can describe the story and be used as an index in a video retrieval system. The recognized text can also be used to estimate the importance of a video. Detecting where text appears and its exact position, and segmenting it from a complex, changing background, is the foundation of video text analysis.
     A text information extraction system consists of six parts: text detection, text localization, text tracking, text extraction, text enhancement, and OCR recognition. This thesis focuses on text localization and proposes a method that combines projection analysis of edges with support vector machine (SVM) learning. Localization has two steps. First, projection analysis extracts the candidate text regions. Second, an SVM classifies the candidates as true or false text regions, and the false ones are discarded. The method has one more step than a purely edge-based method, but the precision of the detected text regions improves considerably. Compared with typical learning-based methods, it computes features only for the candidate regions rather than for the whole image, which reduces computation time. The SVM classifier uses wavelet, corner, and scan-line features together with the position of the center of gravity of the edge points within a region. Experiments show that the combined method performs better than either a purely edge-based or a purely learning-based method.
     Finally, the method is applied to text detection in advertisement video, as part of an advertisement detection system, using multi-resolution analysis to localize advertisement text. A comparison shows that text in news appears at relatively fixed positions and that each television station has its own fixed caption format, whereas text in advertisements varies in size and font. This difference can be exploited to locate the start of an advertisement segment more precisely. Experimental results show that the method localizes advertisement text well.
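
The first localization step described in the abstract (projection analysis over an edge map) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not name the edge operator, the thresholds, or how the projection profiles are segmented, so the simple difference-based edge map, the function names (`edge_map`, `runs_above`, `candidate_text_boxes`), and all parameter values below are hypothetical. The second step, SVM verification of the candidates, is omitted.

```python
import numpy as np


def edge_map(gray, thresh=40.0):
    """Binary edge map from absolute neighbor differences (assumed edge operator)."""
    g = gray.astype(np.float64)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:] = np.abs(g[:, 1:] - g[:, :-1])  # horizontal intensity change
    gy[1:, :] = np.abs(g[1:, :] - g[:-1, :])  # vertical intensity change
    return (np.maximum(gx, gy) > thresh).astype(np.uint8)


def runs_above(profile, min_value, min_length):
    """Return [start, end) index pairs where profile >= min_value
    for at least min_length consecutive samples."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v >= min_value and start is None:
            start = i
        elif v < min_value and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_length:
        runs.append((start, len(profile)))
    return runs


def candidate_text_boxes(gray, edge_thresh=40.0, row_frac=0.05,
                         col_min=1, min_h=4, min_w=8):
    """Candidate regions as (top, bottom, left, right), end-exclusive.

    Rows are grouped by the horizontal projection of the edge map, then each
    row band is split by its vertical projection. All thresholds are
    illustrative assumptions.
    """
    edges = edge_map(gray, edge_thresh)
    width = edges.shape[1]
    boxes = []
    row_profile = edges.sum(axis=1)  # edge pixels per row
    for top, bottom in runs_above(row_profile, row_frac * width, min_h):
        col_profile = edges[top:bottom].sum(axis=0)  # edge pixels per column
        for left, right in runs_above(col_profile, col_min, min_w):
            boxes.append((top, bottom, left, right))
    return boxes


if __name__ == "__main__":
    # Synthetic frame: a striped block stands in for a dense text line.
    frame = np.zeros((60, 100), dtype=np.uint8)
    frame[20:30, 30:70:2] = 255
    print(candidate_text_boxes(frame))
```

In a full pipeline, each returned box would be passed to a classifier that rejects false candidates. Because features are computed only inside these boxes and not over the whole frame, the cost of the verification stage scales with the number of candidates, which is the time saving the abstract refers to.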

