Research on Computer Recognition of Handwritten Chinese Characters
Abstract
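Computer recognition of handwritten Chinese characters is one of the hardest problems in pattern recognition. In our research and application work on the projects "Computer Chinese Handwriting Identification" and "Chinese Handwriting Sort in Internet", we often need to pick common characters out of selected manuscripts for examination. Picking the required characters out of long handwritten manuscripts is tedious, labor-intensive, and error-prone. To improve the efficiency of the identification software and make it automated and intelligent, the handwritten Chinese characters in a manuscript need to be located and recognized automatically by computer. Handwritten Chinese character recognition is still an unsolved problem, the available literature is limited, and solving it completely in the short term is unlikely. This project, however, addresses only a subset of commonly used characters, which differs from large-character-set recognition in the conventional sense and makes a successful implementation feasible.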
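     The main research topics of this thesis are: the principles and methods of character recognition, preprocessing of Chinese character images, classification algorithms for Chinese character recognition, the application of neural networks to Chinese character recognition, and the design and development of a recognition system for commonly used Chinese characters.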
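     The part on principles and methods introduces the two general approaches used in character recognition: statistical decision methods based on mathematical features, and syntactic analysis methods based on structural features. Preprocessing of the character images covers smoothing and noise removal, binarization, skew correction, line and character segmentation, normalization, and thinning of the manuscript to be recognized. The classification algorithms comprise coarse classification and fine classification; each stage uses two complementary feature extraction algorithms, with a correspondingly different recognition strategy. The part on neural networks covers the BP (back-propagation) neural network and its improved algorithms, and the design of the BP network needed for Chinese character recognition, namely a 64-20-4 structure for the input, hidden, and output layers. The design was simulated and verified in Matlab 6.5.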
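Two of the preprocessing steps listed above, binarization and normalization, can be illustrated with a minimal Python sketch. The thesis itself used Delphi 6.0 and Matlab 6.5, so the language, the fixed threshold of 128, the nearest-neighbour resampling, the 8 x 8 target size, and all function names below are assumptions chosen for illustration, not the author's implementation.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of gray levels: 0 is black ink, 255 is white paper


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) if it is darker than the threshold, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def bounding_box(binary: Image) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) of the ink pixels, all inclusive."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        raise ValueError("image contains no ink pixels")
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return rows[0], rows[-1], cols[0], cols[-1]


def normalize(binary: Image, size: int = 8) -> Image:
    """Crop to the ink bounding box, then resample to size x size
    by nearest-neighbour lookup (an assumed, deliberately simple method)."""
    top, bottom, left, right = bounding_box(binary)
    height, width = bottom - top + 1, right - left + 1
    return [
        [binary[top + (r * height) // size][left + (c * width) // size]
         for c in range(size)]
        for r in range(size)
    ]


if __name__ == "__main__":
    gray = [
        [255, 255, 255, 255],
        [255,   0, 255, 255],
        [255,  20,  30, 255],
        [255, 255, 255, 255],
    ]
    for row in normalize(binarize(gray), size=4):
        print(row)
```

Flattening an 8 x 8 normalized image would give 64 values, which happens to match the 64 input nodes mentioned above, but the abstract does not say how the network inputs are actually derived.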
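     Building on recent results in Chinese character recognition, this project designed and developed a recognition system with a three-level recognition strategy. The first level uses the traditional peripheral feature method and the projection transform coefficient method to coarsely classify the candidate characters. The second level uses stroke density features and elastic sector-mesh features based on four-direction stroke decomposition for fine classification. The third level uses the widely used BP neural network algorithm to confirm the result and produce the final output.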
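The control flow of such a three-level strategy can be sketched as a cascade in which each level narrows the candidate list left by the previous one. The sketch below is an assumption-laden illustration: it uses nearest-template matching with squared Euclidean distance for the first two levels and an arbitrary scoring callback in place of the BP network, and every name and parameter (`recognize`, `keep_coarse`, `keep_fine`, `confirm`) is hypothetical. The actual features and matching rules are not given in the abstract.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]


def distance(a: Features, b: Features) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(sample: Features, templates: Dict[str, Features],
            labels: List[str], keep: int) -> List[str]:
    """Return the `keep` labels whose templates lie closest to the sample."""
    return sorted(labels, key=lambda lab: distance(sample, templates[lab]))[:keep]


def recognize(coarse: Features, fine: Features,
              coarse_templates: Dict[str, Features],
              fine_templates: Dict[str, Features],
              confirm: Callable[[str], float],
              keep_coarse: int = 50, keep_fine: int = 10) -> List[str]:
    """Run the three levels and return the final candidates, best first."""
    # Level 1: coarse classification prunes the full character set.
    candidates = nearest(coarse, coarse_templates,
                         list(coarse_templates), keep_coarse)
    # Level 2: fine classification re-ranks only the surviving candidates.
    candidates = nearest(fine, fine_templates, candidates, keep_fine)
    # Level 3: a confirmation score (a stand-in for the BP network output)
    # orders the remaining candidates, highest score first.
    return sorted(candidates, key=confirm, reverse=True)
```

The point of the cascade is cost: cheap features are compared against every character, while the more expensive stages only see a short list.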
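     The system was developed in Delphi 6.0. For reasonably neat, regular handwriting, its recognition rate exceeds 98% (within the top 10 candidates), which is a satisfactory result.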
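A recognition rate quoted "within 10 candidates" is usually read as a top-N rate: a sample counts as correct if the true character appears anywhere in the first N candidates the system returns. Assuming that reading, the metric can be computed as in the sketch below; the function name and the sample data are illustrative only and are not taken from the thesis.

```python
from typing import List, Sequence


def top_n_rate(truths: Sequence[str], ranked: Sequence[List[str]],
               n: int = 10) -> float:
    """Fraction of samples whose true label is among the first n candidates."""
    if len(truths) != len(ranked):
        raise ValueError("truths and ranked must have the same length")
    if not truths:
        raise ValueError("no samples given")
    hits = sum(1 for truth, cands in zip(truths, ranked) if truth in cands[:n])
    return hits / len(truths)


if __name__ == "__main__":
    # Made-up example: four samples with their ranked candidate lists.
    truths = ["永", "和", "中", "国"]
    ranked = [["永", "水"], ["知", "和"], ["申", "甲"], ["国"]]
    print(top_n_rate(truths, ranked, n=10))  # true label found for 3 of 4 samples
    print(top_n_rate(truths, ranked, n=1))   # only first candidates are counted
```

A top-10 rate is always at least as high as the top-1 rate on the same data, so the two figures should not be compared directly.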
Computer recognition of handwritten Chinese characters is regarded as one of the most difficult problems in pattern recognition. In our projects "Computer Chinese Handwriting Identification" and "Chinese Handwriting Sort in Internet", we often need to pick particular handwritten characters out of a manuscript for examination. This work is tedious, laborious, and error-prone. To make the software more automated and intelligent, it needs to pick out these characters automatically, which is a handwritten character recognition task. The general problem is too large and difficult to solve completely. In our research, however, we only need to pick a small set of particular characters out of the manuscript, which makes a successful solution possible.
    The main research topics of this thesis are: the basic theory and methods of character recognition, preprocessing of the character image, the classification algorithms, neural networks, and the design of a recognition system for commonly used characters.
    The thesis introduces the two basic approaches in optical character recognition (OCR): the statistical decision approach, based on mathematical features of the character, and the structural decomposition approach, based on structural features of the character. It then describes six preprocessing steps: noise removal, binarization, skew correction, segmentation, normalization, and thinning. The classification algorithms comprise coarse classification and fine classification, each using two different feature extraction methods with corresponding recognition algorithms. Next, we studied the neural network algorithm and its improved variants and designed a BP neural network applicable to handwritten Chinese character recognition, with 64 input nodes, 20 hidden nodes, and 4 output nodes. We used Matlab to train and simulate the designed network. Finally, we built software that combines all of the theory and methods listed above to validate the approach.
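    A network with 64 input, 20 hidden, and 4 output nodes trained by back-propagation can be sketched in a few dozen lines. The thesis used Matlab for this step; the pure-Python version below is only an illustration under stated assumptions: sigmoid activations in both layers, plain gradient descent on squared error without the improved variants the thesis mentions, uniform random initial weights, and a class name (`BPNetwork`) invented here. The abstract does not say what the 4 outputs encode.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class BPNetwork:
    """Three-layer feed-forward network trained by plain back-propagation."""

    def __init__(self, n_in: int = 64, n_hidden: int = 20, n_out: int = 4,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each row holds the weights of one neuron; the last entry is its bias.
        self.w_hidden: Matrix = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_hidden)]
        self.w_out: Matrix = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
            for _ in range(n_out)]

    @staticmethod
    def _layer(weights: Matrix, inputs: Vector) -> Vector:
        extended = inputs + [1.0]  # the constant 1.0 feeds the bias weight
        return [sigmoid(sum(w * x for w, x in zip(row, extended)))
                for row in weights]

    def forward(self, x: Vector) -> Tuple[Vector, Vector]:
        """Return (hidden activations, output activations)."""
        hidden = self._layer(self.w_hidden, x)
        return hidden, self._layer(self.w_out, hidden)

    def train_step(self, x: Vector, target: Vector, lr: float = 0.1) -> float:
        """Apply one gradient step; return the squared error before it."""
        hidden, output = self.forward(x)
        # Output deltas: error times the sigmoid derivative y * (1 - y).
        d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, output)]
        # Hidden deltas: output deltas propagated back through w_out,
        # computed before any weight is changed.
        d_hidden = [h * (1.0 - h) * sum(d_out[k] * self.w_out[k][j]
                                        for k in range(len(d_out)))
                    for j, h in enumerate(hidden)]
        for k, row in enumerate(self.w_out):
            for j, h in enumerate(hidden + [1.0]):
                row[j] += lr * d_out[k] * h
        for j, row in enumerate(self.w_hidden):
            for i, xi in enumerate(x + [1.0]):
                row[i] += lr * d_hidden[j] * xi
        return sum((t - y) ** 2 for t, y in zip(target, output))
```

    Calling `train_step` repeatedly on the same sample should drive its error down; a real training run would loop over many labeled character samples instead.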
    
    
    In this project, a new handwritten character recognition system with three levels has been designed. The first level uses traditional peripheral features for coarse classification. The second level uses elastic mesh features for fine classification. The third level uses a neural network to produce the final output.
    The software was developed in Delphi 6.0. In our experiments, its recognition rate reached 98% (within 10 candidates), a satisfying and encouraging result.
