Off-line Handwritten Chinese Character Recognition System
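Abstract
Off-line handwritten Chinese character recognition is a highly challenging problem in pattern recognition, with major potential applications in mail sorting, bank check recognition, statistical report processing, and automatic entry of handwritten manuscripts. However, handwritten Chinese characters vary greatly from writer to writer, and the spatial relationships between adjacent characters are complex and varied. As a result, progress in off-line handwritten Chinese character recognition has been markedly slower and more difficult than in other character recognition tasks.
     This system is aimed mainly at automatic entry of handwritten manuscripts. The main work is as follows:
     1. Preprocessing: basic image smoothing is implemented, and the binarization strategy depends on the paper background. Character images on blank paper are binarized with an iterative optimal-threshold algorithm, while character images on ruled manuscript paper are binarized with a double-threshold method.
     2. The main thinning methods proposed over the years for handwritten Chinese characters are reviewed and summarized, and an improved thinning algorithm is proposed, tailored to the system's main use in character entry.
     3. Several major statistical feature and stroke-structure feature extraction methods are introduced. A new stroke-segment feature extraction algorithm is adopted for handwritten Chinese characters, and a new character segmentation algorithm based on stroke structure is also proposed.
     4. In the recognition stage, an improved two-layer serial classifier structure is used, which shortens recognition time by 50% compared with a single-layer classifier.
     The training and test samples cover about 2000 characters from the level-1 and level-2 Chinese character sets, with 6 writing styles per character. The training samples are divided into two classes. The first class is hand-printed characters, whose strokes are well separated and essentially horizontal or vertical. The second class is neatly written ordinary characters, with a small amount of joined strokes and shapes kept as regular as possible. With a different recognition method applied to each class, the recognition accuracy is 90% for the first class and 85% for the second.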
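The iterative optimal-threshold binarization named in item 1 is not spelled out in this abstract. As a rough illustration only, the sketch below uses the standard iterative selection scheme (start from the mean gray level, split the pixels into two groups, move the threshold to the midpoint of the two group means, repeat). The function names `iterative_threshold` and `binarize`, the stopping tolerance `eps`, and the iteration cap are assumptions made for this example, and the thesis may use a different variant.

```python
from typing import List, Sequence


def iterative_threshold(pixels: Sequence[int], eps: float = 0.5,
                        max_iter: int = 100) -> float:
    """Pick a global threshold by iterative selection (illustrative sketch)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    t = sum(pixels) / len(pixels)  # initial guess: global mean gray level
    for _ in range(max_iter):
        low = [p for p in pixels if p <= t]   # darker group (candidate ink)
        high = [p for p in pixels if p > t]   # brighter group (candidate paper)
        if not low or not high:
            return t  # all pixels on one side: nothing left to separate
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t
    return t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map dark pixels (<= threshold) to 1 (ink) and the rest to 0 (paper)."""
    flat = [p for row in image for p in row]
    t = iterative_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny hypothetical gray image: a dark diagonal stroke on bright paper.
    sample = [
        [250, 245, 30],
        [240, 20, 235],
        [25, 248, 252],
    ]
    for row in binarize(sample):
        print(row)
```

A single global threshold of this kind suits a uniform blank background. It does not model the printed grid of manuscript paper, which is presumably why the abstract uses a separate double-threshold method for that case.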
Off-line handwritten Chinese character recognition is a challenging problem in the field of pattern recognition. It has important applications in many areas, such as mail sorting, check recognition, report form processing and automatic input of handwritten manuscripts. However, because Chinese characters vary so much from writer to writer, research on off-line handwritten Chinese character recognition has progressed noticeably more slowly than research on other kinds of character recognition.
     The system is mainly used for automatic input of handwritten manuscripts. The main work is as follows:
     1. In the preprocessing step, smoothing, binarization and normalization are implemented. In particular, different binarization methods are used for different paper backgrounds: an iterative optimal-threshold method is used for character images on a blank background, and a double-threshold method is used for images with frame lines.
     2. Extensive research and testing were carried out on Chinese character segmentation, and a segmentation method based on fuzzy rule judgment was chosen because the recognition system is intended for Chinese character auto-input.
     3. Several statistical and structural feature extraction methods are introduced, and two new ones are used for the two kinds of characters: elastic meshing directional features are proposed for hand-printed characters, and stroke features of thinned Chinese characters are used for neatly written general characters. In addition, an element tracing method and a cross-point stroke segment combination method are proposed to improve inflection point extraction and stroke segment combination.
     4. In the recognition stage, two pre-classification methods are used, one for each of the two kinds of features above. An error equilibrium distance is used to match elastic meshing directional features, and a new stroke matching method is used for the other kind of feature. In addition, an improved classifier structure is used, which reduces recognition time by 50%.
     The training and testing samples cover about 2000 different Chinese characters, with 6 samples per character. They are divided into two classes according to writing style. The test results show a recognition accuracy of 90% for the first class and 85% for the second class.
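The abstract does not describe the internals of the improved classifier in item 4. The sketch below only illustrates the general coarse-to-fine idea behind serial classifier structures: a cheap first stage shortlists candidate classes, and the costlier second stage compares the query against the shortlist alone. The names `two_stage_classify`, `sq_dist`, `templates` and `keep`, and the use of squared Euclidean distance in both stages, are assumptions for illustration, not the thesis's method.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_stage_classify(
    coarse_query: Vector,
    fine_query: Vector,
    templates: Dict[str, Tuple[Vector, Vector]],
    keep: int = 2,
) -> str:
    """Classify with a coarse shortlist followed by a fine comparison.

    `templates` maps each class label to (coarse_feature, fine_feature).
    """
    # Stage 1: rank every class by the low-dimensional coarse feature
    # and keep only the `keep` closest classes.
    ranked = sorted(
        templates,
        key=lambda label: sq_dist(coarse_query, templates[label][0]),
    )
    candidates = ranked[:keep]
    # Stage 2: run the more expensive fine comparison on the shortlist only.
    return min(
        candidates,
        key=lambda label: sq_dist(fine_query, templates[label][1]),
    )


if __name__ == "__main__":
    # Hypothetical toy templates: (coarse feature, fine feature) per class.
    toy_templates = {
        "A": ([1.0], [1.0, 0.0, 0.0, 0.0]),
        "B": ([2.0], [1.0, 1.0, 0.0, 0.0]),
        "C": ([3.0], [1.0, 1.0, 1.0, 0.0]),
    }
    print(two_stage_classify([2.2], [0.9, 1.1, 0.1, 0.0], toy_templates))
```

In an arrangement like this, the time saving comes from the second stage touching only `keep` classes instead of the full character set. The trade-off is that a character dropped by the coarse stage can no longer be recovered by the fine stage.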
