Handwritten Chinese Character Recognition Based on Mathematical Morphology
Abstract
Chinese character recognition is an important branch of pattern recognition and the most difficult problem in the field of text recognition. It draws on pattern recognition, image processing, digital signal processing and other disciplines, and it has both practical value and theoretical significance in Chinese information processing, office automation, artificial intelligence and related fields. Offline handwritten Chinese character recognition is inherently complex, which makes working systems hard to build. No fully mature product exists yet and the technology is still developing, so it has become an active research topic in China and abroad.
     Feature extraction is the most important stage of Chinese character recognition: quickly extracting stable, reliable features that fully describe the character pattern is the key to recognizing handwritten Chinese characters. Mathematical morphology is a mathematical method for analyzing geometric shape and structure. It is built on set algebra and uses set theory to describe the structure of sets quantitatively.
     This work proposes a new feature extraction method for Chinese characters based on mathematical morphology. It adapts the basic morphological operations and applies them directly to the original image of a hand-printed Chinese character, extracting stable and effective stroke-direction features. The method rests on a complete mathematical foundation and needs no preprocessing or thinning of the original character image, which greatly reduces system overhead and gives it a speed advantage over other feature extraction algorithms. The stroke segments extracted in each direction are fairly clear, and the horizontal and vertical segments come close to the "level horizontals, upright verticals" appearance of printed characters. On this basis, the intersections of the horizontal and vertical segments and the statistical error of those segments serve as the criteria for coarse classification. A stroke-segment code sequence is then built from the starting positions of the segments and matched against the code sequences of standard printed characters to obtain the recognition result. Experiments show that the method is effective.
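One way to picture the directional stroke extraction described in the abstract is a morphological opening with line-shaped structuring elements: an opening keeps only the ink that a line of the given direction and length fits inside, so a horizontal element keeps horizontal segments and a vertical element keeps vertical ones. The abstract does not state which operators, element sizes or modifications the method actually uses, so the sketch below is an illustration under assumptions, not the author's algorithm. The names `line_element`, `erode`, `dilate` and `opening`, the 7x7 test image and the element length of 3 are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # binary image: 1 = ink, 0 = background
Offsets = List[Tuple[int, int]]  # structuring element as (row, col) offsets


def line_element(length: int, direction: str) -> Offsets:
    """Line-shaped structuring element centred on the origin (assumed shape)."""
    half = length // 2
    steps = range(-half, length - half)
    if direction == "horizontal":
        return [(0, k) for k in steps]
    if direction == "vertical":
        return [(k, 0) for k in steps]
    if direction == "diagonal":        # top-left to bottom-right
        return [(k, k) for k in steps]
    if direction == "anti_diagonal":   # top-right to bottom-left
        return [(k, -k) for k in steps]
    raise ValueError(f"unknown direction: {direction}")


def erode(img: Image, se: Offsets) -> Image:
    """Keep a pixel only if the whole element, placed there, lies on ink."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                     for dr, dc in se))
             for c in range(w)] for r in range(h)]


def dilate(img: Image, se: Offsets) -> Image:
    """Set a pixel if the element placed on some ink pixel covers it."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= r - dr < h and 0 <= c - dc < w and img[r - dr][c - dc]
                     for dr, dc in se))
             for c in range(w)] for r in range(h)]


def opening(img: Image, se: Offsets) -> Image:
    """Erosion followed by dilation with the same element."""
    return dilate(erode(img, se), se)


# Toy character: a cross with one horizontal and one vertical stroke.
cross = [[0] * 7 for _ in range(7)]
for k in range(1, 6):
    cross[3][k] = 1   # horizontal stroke on row 3
    cross[k][3] = 1   # vertical stroke on column 3

horizontal = opening(cross, line_element(3, "horizontal"))
vertical = opening(cross, line_element(3, "vertical"))

# Pixels shared by a horizontal and a vertical segment, i.e. stroke crossings.
crossings = [(r, c) for r in range(7) for c in range(7)
             if horizontal[r][c] and vertical[r][c]]
print(crossings)  # [(3, 3)]
```

On real handwriting the strokes are several pixels thick, so the element would have to be longer than the stroke width for the opening to separate directions. The crossing count computed here is only meant to suggest how segment intersections could feed a coarse classification step.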
