Research and Implementation of Feature Extraction for Off-line Handwritten Chinese Character Recognition
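Abstract
Chinese character recognition is the automatic identification by computer of Chinese characters printed or handwritten on paper; as a discipline it belongs to pattern recognition and artificial intelligence. It draws on pattern recognition, image processing, artificial intelligence, formal languages and automata, fuzzy mathematics, combinatorics, information theory, and Chinese information processing, as well as linguistics, psychology, and bionics, which makes it a comprehensive, interdisciplinary technology.
     Chinese character recognition is a very difficult pattern recognition problem. Objectively, Chinese characters form a special pattern set: the number of pattern classes is large, the structures are highly complex, and some patterns closely resemble one another. Print quality and noise, together with the irregular shapes produced by casual handwriting, make recognition harder still.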
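     First, preprocessing plays an important role in handwritten Chinese character recognition. This thesis discusses preprocessing methods for handwritten Chinese characters, implements conventional binarization and smoothing algorithms, and implements a nonlinear normalization method based on the principle of density equalization over the effective region of the image. Compared with several other methods, it more effectively reduces the variation among samples of the same character and thereby improves the recognition rate for handwritten Chinese characters.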
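As a rough illustration of density-equalizing normalization, the sketch below remaps the rows and columns of a binary character image so that accumulated density is spread evenly over the output grid. It is not the thesis's algorithm: the abstract does not define the density measure or the "effective region", so this sketch assumes the effective region is the bounding box of the ink pixels and uses the ink-pixel count per row or column plus a constant `alpha` as a stand-in for density. The function names are hypothetical.

```python
from bisect import bisect_left


def bounding_box(img):
    """Return (top, bottom, left, right) of the ink pixels (value 1).

    Assumes the image contains at least one ink pixel.
    """
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return rows[0], rows[-1], cols[0], cols[-1]


def cumulative(density):
    """Normalized cumulative sum of a density projection (last value ~1.0)."""
    total = sum(density)
    acc, out = 0.0, []
    for d in density:
        acc += d
        out.append(acc / total)
    return out


def normalize_by_density(img, size, alpha=1.0):
    """Resample a binary image to size x size with density-equalized axes.

    Assumption: density = ink count per row/column + alpha. Rows and columns
    that hold more ink receive proportionally more output rows and columns.
    """
    top, bottom, left, right = bounding_box(img)
    box = [row[left:right + 1] for row in img[top:bottom + 1]]

    row_density = [sum(r) + alpha for r in box]
    col_density = [sum(r[x] for r in box) + alpha for x in range(len(box[0]))]
    row_cdf = cumulative(row_density)
    col_cdf = cumulative(col_density)

    def source_index(cdf, k):
        # Backward mapping: find the first source index whose cumulative
        # density reaches the centre of output cell k.
        return min(bisect_left(cdf, (k + 0.5) / size), len(cdf) - 1)

    ys = [source_index(row_cdf, i) for i in range(size)]
    xs = [source_index(col_cdf, j) for j in range(size)]
    return [[box[y][x] for x in xs] for y in ys]
```

Calling `normalize_by_density(img, 64)` on a list-of-lists binary image returns a 64 x 64 image with the blank margins removed. A larger `alpha` moves the mapping closer to plain linear scaling.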
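     For feature extraction, this thesis proposes a fuzzy sub-stroke extraction method that addresses the instability of extracted sub-strokes caused by the arbitrariness of unconstrained handwriting. The fuzzy sub-stroke attribute features of the character's edge points are computed for the horizontal (横), vertical (竖), left-falling (撇), and right-falling (捺) directions and combined with a fuzzy grid to produce fuzzy sub-stroke statistical features.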
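The idea of combining fuzzy direction memberships with a fuzzy grid can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's formulation: the abstract does not give the membership functions, so triangular memberships are assumed for both the stroke directions and the grid cells. It also does not say how the edge direction is estimated, so each edge point is taken as a given `(x, y, theta)` triple, with `theta` in degrees measured counterclockwise with the y-axis pointing up. All names and the angle assigned to each stroke type are hypothetical.

```python
# Assumed prototype orientations (degrees, modulo 180) for the four sub-strokes.
DIRECTIONS = {
    "horizontal": 0.0,       # 横
    "left_falling": 45.0,    # 撇 (assumed orientation)
    "vertical": 90.0,        # 竖
    "right_falling": 135.0,  # 捺 (assumed orientation)
}


def angular_distance(a, b):
    """Distance between two orientations, treating theta and theta+180 as equal."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def stroke_memberships(theta):
    """Triangular membership of an edge orientation in each sub-stroke class."""
    return {
        name: max(0.0, 1.0 - angular_distance(theta, centre) / 45.0)
        for name, centre in DIRECTIONS.items()
    }


def axis_weights(p, n_cells, extent):
    """Triangular membership of coordinate p in each grid cell along one axis.

    Neighbouring cells overlap, so a point near a cell border contributes
    to both cells instead of flipping between them.
    """
    w = extent / n_cells
    return [max(0.0, 1.0 - abs(p - (i + 0.5) * w) / w) for i in range(n_cells)]


def fuzzy_substroke_features(edge_points, extent, n_cells=2):
    """Accumulate direction memberships of edge points into a fuzzy grid.

    Returns an n_cells x n_cells grid of dicts, one value per sub-stroke class.
    """
    feats = [[{name: 0.0 for name in DIRECTIONS} for _ in range(n_cells)]
             for _ in range(n_cells)]
    for x, y, theta in edge_points:
        wx = axis_weights(x, n_cells, extent)
        wy = axis_weights(y, n_cells, extent)
        mu = stroke_memberships(theta)
        for r in range(n_cells):
            for c in range(n_cells):
                cell_weight = wy[r] * wx[c]
                if cell_weight == 0.0:
                    continue
                for name, m in mu.items():
                    feats[r][c][name] += cell_weight * m
    return feats
```

Because both memberships are graded, a small change in an edge point's position or orientation changes the feature values only slightly, which is the kind of stability the paragraph above asks for under unconstrained handwriting.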
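     In addition, a feature extraction method based on sub-blocks and their related fuzzy features is proposed. The method takes into account both the distribution of strokes in a character and the correlations in its topological structure, imitating the way people perceive Chinese characters. It is well suited to recognizing handwritten characters with widely varying writing styles, strong arbitrariness, and large structural deformation.
     Finally, this thesis introduces a machine exam-marking system, including its application environment, main functions, and the main technologies used. The author was mainly responsible for processing the filled-in answer areas, and carried out recognition experiments on names using the methods proposed in this thesis.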
Chinese character recognition is the automatic recognition by computer of Chinese characters printed or written on paper. It belongs to pattern recognition and artificial intelligence. It draws on pattern recognition, image processing, artificial intelligence, formal languages and automata, fuzzy mathematics, combinatorics, information theory, and Chinese information processing, as well as linguistics, psychology, and bionics. It is a comprehensive technology.
     Chinese character recognition is a very difficult kind of pattern recognition. Chinese characters are a special pattern set with many classes and complicated structures, and some patterns are very much alike. Poor print quality, noise, and the irregular shapes of handwritten characters make recognition even more difficult.
     First, preprocessing plays an important role in handwritten Chinese character recognition. In the preprocessing step, conventional thresholding and smoothing are implemented. In addition, a modified nonlinear normalization method based on density equalization over the effective area is implemented; compared with other normalization methods, it narrows the differences within the same class. As a result, the recognition rate is improved.
     A fuzzy sub-stroke extraction method is proposed to resolve the instability caused by unconstrained writing. First, the attribute features of each boundary point are calculated with respect to the four fuzzy sub-strokes: horizontal, vertical, left diagonal, and right diagonal. Then a fuzzy mesh is combined with the fuzzy sub-stroke attribute features of the boundary points to obtain the fuzzy sub-stroke statistical features of a Chinese character.
     In addition, an approach using block features and their related fuzzy features, based on an elastic mesh, is presented. The method imitates the mechanism by which people recognize Chinese characters. It is an effective method, especially for recognizing handwritten Chinese characters written in different styles and with great distortion.
     Finally, a practical automated exam-marking system developed by us is introduced, including its application environment, primary functions, and the main technologies used. My main job was to process the filled-in answer areas and to use the methods presented in this thesis to recognize the name characters.
