Research on Text Localization Methods Based on Connected Components


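English title: Connected Component Based Approach for Text Localization in Images
Author: 嵇新浩
Degree level: Master's
Discipline: Electronics and Communication Engineering
Keywords (translated from Chinese): text localization; character recognition; connected component analysis; scene text; minimum spanning tree; BP neural network
Keywords (author's English): Text location; character recognition; component analysis; scene text; Minimum Spanning Tree; BP Neural Network
Degree year: 2007
Advisor: 石旭刚
Discipline code: 081001
Degree-granting institution: Zhejiang University of Technology (浙江工业大学)
Submission date: 2007-04-01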

Abstract

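Text localization refers to detecting or locating the regions that contain characters in images with complex backgrounds. An effective text localization method can extend existing optical character recognition (OCR) technology to a much wider range of practical applications, for example content-based image and video retrieval and license plate localization and recognition. How to extract text regions from complex-background images both effectively and quickly has become an active research problem in document analysis and recognition.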
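     This thesis surveys the main existing text localization methods, analyzes their strengths and weaknesses, and proposes a text region localization method based on connected components and a neural network. The method localizes text regions effectively and quickly, and it is robust to character size, color, and font.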
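     The connected-component-based text region localization method consists of five main parts. First, an improved Niblack segmentation method is applied to the input image. Next, connectivity analysis of the segmentation result yields a set of candidate connected components. Then a variety of effective features are extracted from the connected components, and a cascade of threshold classifiers uses these features to decide whether each connected component is a character connected component.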
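     This thesis introduces BP neural network classification into the recognition of character connected components, using the extracted connected component features as the input to the BP neural network. The network is trained on manually collected character connected component samples. Once trained, it can effectively classify connected components that the cascade of threshold classifiers cannot, which improves the accuracy of character localization.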
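     This thesis also proposes a minimum-spanning-tree-based method for merging character connected components into text regions. It assumes that character connected components in the same text region have similar features (such as similar color and size) and lie close to one another. The weight of the edge between two character connected components is built from the similarity of their features and their positional relationship. Traversing all character connected components turns the set into an undirected weighted graph. Applying a minimum spanning tree and a threshold then yields the individual text regions in the image.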

Abstract (author's English version)

Text localization refers to detecting and locating character regions in images with complex backgrounds. Effective text localization in such images can extend OCR technology to applications such as content-based image and video retrieval and license plate localization and recognition. Locating text in complex-background images has become an active research topic in document analysis and recognition.
     In this thesis, we survey existing text localization methods, categorize them, and discuss their advantages and disadvantages. We then propose a text localization algorithm based on connected components (CCs) and a neural network. The method detects text regions in images effectively and is robust to variations in character size, color, and font.
     The CC-based text localization method consists of four steps. First, the input image is segmented with an improved Niblack method. Second, CC analysis is applied to the segmentation result to obtain the set of candidate character CCs.
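The first two steps can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the thesis's improvement to Niblack, so the code uses the standard Niblack rule (local mean plus k times local standard deviation), and the window size, the value of k, the dark-text-on-bright-background polarity, and the 4-connectivity are all choices made here, not taken from the thesis.

```python
from collections import deque
from math import sqrt

def niblack_binarize(gray, window=3, k=-0.2):
    """Standard Niblack: a pixel is foreground (1) if it is darker than
    local_mean + k * local_std computed over a window clipped to the image."""
    h, w = len(gray), len(gray[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            out[y][x] = 1 if gray[y][x] < mean + k * std else 0
    return out

def connected_components(mask):
    """Collect 4-connected foreground regions with breadth-first search.
    Each component is returned as a list of (row, col) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components

# Toy image (made up): bright background (200) with two dark blobs (20).
image = [[200] * 7 for _ in range(5)]
for y, x in [(1, 1), (1, 2), (2, 1), (3, 5)]:
    image[y][x] = 20

mask = niblack_binarize(image)
components = connected_components(mask)
```

On real images the per-pixel window statistics would normally be computed with integral images or a library routine rather than the nested loops used here for clarity.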
     Third, a variety of features are extracted from each CC. Finally, a cascade of threshold classifiers uses these features to classify CCs as character CCs or non-character CCs.
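A cascade of threshold classifiers can be pictured as an ordered list of cheap tests, where a CC is rejected at the first test it fails. The sketch below is hypothetical throughout: the abstract does not list the features or thresholds used in the thesis, so the features (area, aspect ratio, fill ratio) and every threshold value are placeholders chosen for illustration.

```python
def component_features(pixels):
    """Compute simple geometric features from a list of (row, col) pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    area = len(pixels)
    return {
        "width": width,
        "height": height,
        "area": area,
        "aspect": width / height,          # bounding-box width over height
        "fill": area / (width * height),   # fraction of the bounding box covered
    }

# Hypothetical cascade: (stage name, test that a character CC should pass).
CASCADE = [
    ("area", lambda f: f["area"] >= 8),
    ("aspect", lambda f: 0.1 <= f["aspect"] <= 10.0),
    ("fill", lambda f: 0.1 <= f["fill"] <= 0.95),
]

def classify(features, cascade=CASCADE):
    """Return (is_character, rejecting_stage). Stops at the first failed stage."""
    for name, passes in cascade:
        if not passes(features):
            return False, name
    return True, None

# An "L"-shaped blob: a 5-pixel vertical stroke plus a 3-pixel foot.
l_shape = [(y, 0) for y in range(5)] + [(4, x) for x in range(1, 4)]
# A solid 4x4 block, which fills its bounding box completely.
block = [(y, x) for y in range(4) for x in range(4)]
# A 1x30 horizontal line, far too elongated for a single character.
line = [(0, x) for x in range(30)]

results = {
    "l_shape": classify(component_features(l_shape)),
    "block": classify(component_features(block)),
    "line": classify(component_features(line)),
}
```

Ordering the stages from cheapest to most expensive lets most non-character CCs be discarded before any costly feature is evaluated, which is the usual motivation for a cascade.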
     A BP neural network is also introduced for CC classification. The CC features serve as the network input. Training samples are collected by hand and fed to the network to train its parameters. The trained network can classify CCs that the cascade of threshold classifiers cannot, which improves the precision of text localization.
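To make the BP (backpropagation) step concrete, here is a minimal sketch of a one-hidden-layer network trained by backpropagation on CC feature vectors. The network size, learning rate, cross-entropy loss, and the six two-dimensional feature vectors are all assumptions made for this example. The abstract does not give the thesis's architecture, features, or training data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyBPNet:
    """One hidden layer, sigmoid units, single sigmoid output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        # Error signal at the output (cross-entropy loss with a sigmoid output).
        delta_out = out - target
        # Propagate the error back through the hidden layer before updating w2.
        delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(self.w2, hidden)]
        for j, h in enumerate(hidden):
            self.w2[j] -= lr * delta_out * h
        self.b2 -= lr * delta_out
        for j, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * delta_hidden[j] * xi
            self.b1[j] -= lr * delta_hidden[j]

def mean_loss(net, samples):
    """Average cross-entropy over (features, label) pairs."""
    eps = 1e-12
    total = 0.0
    for x, t in samples:
        p = net.predict(x)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(samples)

# Made-up, hand-labelled samples: [scaled aspect ratio, fill ratio] -> 1 for character.
samples = [
    ([0.20, 0.30], 1), ([0.30, 0.40], 1), ([0.25, 0.35], 1),
    ([0.90, 0.90], 0), ([0.80, 0.95], 0), ([0.85, 0.80], 0),
]

net = TinyBPNet(n_in=2, n_hidden=3)
loss_before = mean_loss(net, samples)
for _ in range(2000):
    for x, t in samples:
        net.train_step(x, t, lr=0.5)
loss_after = mean_loss(net, samples)
```

In a pipeline like the one described, such a network would only need to be evaluated on the CCs that the threshold cascade could not decide, so its extra cost applies to a small share of the candidates.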
     We also use a minimum spanning tree (MST) to merge character CCs into text regions. We assume that character CCs in the same text region have similar size and color and lie close to one another. The weight of the edge between two CCs is defined from their distance and similarity. Computing this weight for every pair in the set of character CCs yields a weighted graph. The MST of this graph is then split into subsets by applying a threshold to the edge weights, and each subset is a text region.
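The grouping step can be sketched as: build a complete weighted graph over the character CCs, compute its minimum spanning tree, then cut every tree edge whose weight exceeds a threshold. The edge weight below (centroid distance plus a penalty for height difference) is an assumption for illustration. The thesis also uses color similarity, which is left out here, and its actual weight formula and threshold are not given in the abstract.

```python
from itertools import combinations
from math import hypot

def edge_weight(a, b, size_penalty=10.0):
    """Hypothetical weight: centroid distance plus a penalty for differing heights."""
    dist = hypot(a["cx"] - b["cx"], a["cy"] - b["cy"])
    size_diff = abs(a["height"] - b["height"]) / max(a["height"], b["height"])
    return dist + size_penalty * size_diff

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm. edges is a list of (weight, i, j) tuples."""
    parent = list(range(n))
    tree = []
    for w, i, j in sorted(edges):
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    return tree

def group_text_regions(ccs, threshold):
    """Cut MST edges heavier than threshold; the remaining subtrees are the regions."""
    n = len(ccs)
    edges = [(edge_weight(ccs[i], ccs[j]), i, j) for i, j in combinations(range(n), 2)]
    tree = minimum_spanning_tree(n, edges)
    parent = list(range(n))
    for w, i, j in tree:
        if w <= threshold:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())

# Made-up character CCs: one line of three small characters, one line of two larger ones.
ccs = [
    {"cx": 10, "cy": 10, "height": 12},
    {"cx": 22, "cy": 10, "height": 12},
    {"cx": 34, "cy": 10, "height": 12},
    {"cx": 10, "cy": 100, "height": 20},
    {"cx": 24, "cy": 100, "height": 20},
]

regions = group_text_regions(ccs, threshold=20.0)
```

Cutting the MST at a weight threshold gives the same partition as single-linkage clustering at that threshold, so the choice of threshold directly controls how far apart two characters may be while still belonging to the same text region.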

