Research on Handwritten Chinese Character Recognition Based on Binary-Tree Multi-Level SVM Classification
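Abstract
Chinese character recognition has long been regarded as a pattern recognition problem of both theoretical significance and practical value, and as the ultimate goal of character recognition research. Offline handwritten Chinese character recognition is currently an active research topic in pattern recognition. The support vector machine (SVM) is a learning method designed for prediction from limited samples. Built on the principle of structural risk minimization, it is a relatively new structured learning method that handles well the construction of high-dimensional models from a limited number of samples. Applying SVM theory to offline handwritten Chinese character recognition therefore has considerable theoretical and practical value.
     The main work of this thesis is as follows:
     1) Partitioning characters by complexity and structure. A pixel-density method divides characters into simple and complex characters; a method combining horizontal and vertical projection histograms with connected components divides them into single-component and compound (non-single-component) characters.
     2) Construction of the binary-tree SVM. To address the complex multi-class problem in offline handwritten Chinese character recognition, a binary-tree-structured SVM model for classifying handwritten characters is built on binary tree and SVM theory and used for coarse classification. Implemented with an SVM toolbox, it successfully classifies several character types (simple, complex, single-component, compound, and so on).
     3) Handwritten Chinese character recognition algorithm. Image features are extracted with a combination of feature extraction methods, chosen according to the characteristics of each character class, and a "one-against-rest" SVM performs fine classification within each class.
     Experimental results show that combining binary-tree SVM coarse classification with "one-against-rest" SVM fine classification makes full use of the SVM's strength in two-class problems, compared with a single SVM method, and effectively improves classification accuracy and speed for the complex multi-class problem of offline handwritten Chinese character recognition.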
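The measurements named in item 1 (pixel density, horizontal and vertical projection histograms, connected components) can be illustrated with a small sketch. The abstract does not give the thesis's thresholds or decision rules, so the function names, the 0.3 density threshold, and the "more than one component plus a gap in a projection" rule below are all assumptions made for illustration, not the author's procedure.

```python
from collections import deque

def pixel_density(img):
    """Fraction of foreground (1) pixels in a binary image given as rows of 0/1."""
    total = sum(len(row) for row in img)
    return sum(map(sum, img)) / total

def projections(img):
    """Horizontal (per-row) and vertical (per-column) foreground counts."""
    horizontal = [sum(row) for row in img]
    vertical = [sum(col) for col in zip(*img)]
    return horizontal, vertical

def count_components(img):
    """Number of 8-connected foreground components, found by breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count

def has_gap(profile):
    """True if an empty run separates two non-empty runs in a projection profile."""
    nonzero = [i for i, v in enumerate(profile) if v > 0]
    return any(b - a > 1 for a, b in zip(nonzero, nonzero[1:]))

def coarse_labels(img, density_threshold=0.3):
    """Assumed rule: threshold the density; call a character compound when it
    has several components and at least one projection shows a gap."""
    complexity = "complex" if pixel_density(img) > density_threshold else "simple"
    horizontal, vertical = projections(img)
    separable = count_components(img) > 1 and (has_gap(horizontal) or has_gap(vertical))
    structure = "compound" if separable else "single-component"
    return complexity, structure

# Toy images (hypothetical): two separated strokes, and one short stroke.
two_part = [[1, 0, 0, 1],
            [1, 0, 0, 1],
            [1, 0, 0, 1]]
one_part = [[0, 1, 0],
            [0, 1, 0],
            [0, 0, 0]]
```

In the thesis these quantities feed an SVM-based classifier; the fixed thresholds here only make the measurements concrete on toy input.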
The study of Chinese character recognition is regarded not only as a direction of important theoretical significance and practical value in the pattern recognition field, but also as the final goal of character recognition research. Chinese character recognition is one aspect of the pattern recognition field. The Support Vector Machine (SVM) is a learning method designed for small-sample prediction and is based on Statistical Learning Theory. It handles well the construction of high-dimensional models from small sample sets. Applying SVM theory to offline handwritten Chinese character recognition therefore has considerable theoretical and practical value.
     The primary contents of this thesis are:
     1) Chinese characters are divided by complexity and structure. A method based on pixel density divides characters into simple and complex characters. A method combining horizontal and vertical projections with connected components divides characters into inseparable (single-component) and separable characters.
     2) Binary-tree SVM. The problems of complex patterns and multi-class classification in offline handwritten Chinese character recognition are addressed, and a classification and recognition method combining a binary-tree SVM with a "one-against-rest" SVM is presented. A binary-tree SVM multi-class classifier performs the coarse classification. An SVM toolbox is used for the implementation. With this classifier structure, Chinese character images of various types are classified successfully.
     3) Handwritten Chinese character recognition. A feature extraction scheme combining six methods is proposed. Because different classes of characters have different characteristics, a different feature extraction method is adopted for each class. A "one-against-rest" SVM then performs the recognition.
     The experimental results indicate that the method combining a binary-tree SVM with a "one-against-rest" SVM fully exploits the SVM's strength in two-class classification, compared with a single SVM algorithm. The generalization ability is greatly improved. The new method yields higher precision and speeds up SVM multi-class classification.
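The two-stage structure described above, a binary tree of two-class decisions for coarse classification followed by "one-against-rest" scoring within the chosen group, can be sketched as follows. This is only an illustration of the control flow: each decision function would be a trained SVM in practice, whereas the fixed linear functions, the tree layout, the two-element feature vectors, and the labels `class_a` and `class_b` below are hypothetical placeholders.

```python
def linear_score(weights, bias, x):
    """Decision value w.x + b, the form a linear-kernel SVM would produce."""
    return sum(w * v for w, v in zip(weights, x)) + bias

class Node:
    """Internal node of the binary tree: one two-class decision function."""
    def __init__(self, weights, bias, negative, positive):
        self.weights, self.bias = weights, bias
        self.negative, self.positive = negative, positive  # child Node or leaf name

def route(node, x):
    """Descend the tree until a leaf (a coarse group name) is reached."""
    while isinstance(node, Node):
        score = linear_score(node.weights, node.bias, x)
        node = node.positive if score >= 0 else node.negative
    return node

def one_vs_rest(scorers, x):
    """Pick the label whose 'this class vs. all others' scorer is largest."""
    return max(scorers, key=lambda label: linear_score(*scorers[label], x))

def recognize(tree, fine_classifiers, coarse_x, fine_x):
    """Coarse group from the binary tree, then fine label inside that group."""
    group = route(tree, coarse_x)
    return group, one_vs_rest(fine_classifiers[group], fine_x)

# Hypothetical coarse features: (pixel density, separability flag).
simple_branch = Node((0.0, 1.0), -0.5, "simple/single", "simple/compound")
complex_branch = Node((0.0, 1.0), -0.5, "complex/single", "complex/compound")
tree = Node((1.0, 0.0), -0.3, simple_branch, complex_branch)

# Hypothetical fine scorers: label -> (weights, bias); reused for every group.
toy_scorers = {"class_a": ((1.0, 0.0), 0.0), "class_b": ((0.0, 1.0), 0.0)}
fine_classifiers = {
    leaf: toy_scorers
    for leaf in ("simple/single", "simple/compound",
                 "complex/single", "complex/compound")
}

result = recognize(tree, fine_classifiers, (0.5, 1.0), (0.2, 0.9))
```

With four coarse groups, a sample passes through two tree nodes before the one-against-rest stage, so only the scorers of one group are evaluated for fine classification.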
