Research on and Application of an Improved Support Vector Machine in Handwritten Chinese Character Recognition
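Abstract
Pattern recognition is an artificial-intelligence information-processing technique. In recent years it has been widely applied to the recognition of text, fingerprints, remote-sensing images, and similar data. The process consists broadly of three stages: preprocessing, feature extraction, and recognition. Preprocessing is the preparatory stage. The acquired image is normalized through operations such as binarization, smoothing, and thinning, which makes the subsequent recognition steps easier. Feature extraction represents the recognition features of the input object as a point in a feature space, that is, as a feature vector. Recognition performs the final classification: a classifier assigns the extracted feature vector to a class, and a decision function produces the final classification result.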
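The first two stages can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes a grayscale image stored as a list of rows of 0-255 integers, with dark ink on a light background. The sketch binarizes the image with Otsu's global threshold, which is one common choice; the abstract does not say which binarization method the thesis uses. It then extracts a simple grid-density feature vector in which each component is the ink ratio of one cell. Smoothing and thinning are omitted. The function names `otsu_threshold`, `binarize`, and `grid_density_features` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumed: rows of gray levels in 0..255


def otsu_threshold(img: Image) -> int:
    """Return the gray level t that maximizes the between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with gray level <= t
    sum0 = 0  # sum of the gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to Otsu's criterion
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image) -> List[List[int]]:
    """Map ink (gray level <= threshold) to 1 and background to 0."""
    t = otsu_threshold(img)
    return [[1 if v <= t else 0 for v in row] for row in img]


def grid_density_features(binary: List[List[int]], grid: int = 2) -> List[float]:
    """Split the image into grid x grid cells; each feature is one cell's ink ratio."""
    h, w = len(binary), len(binary[0])
    feats = []
    for gi in range(grid):
        for gj in range(grid):
            r0, r1 = gi * h // grid, (gi + 1) * h // grid
            c0, c1 = gj * w // grid, (gj + 1) * w // grid
            cells = [binary[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats
```

Each image thus becomes one point in a `grid * grid`-dimensional feature space. That vector is the form of input the recognition stage's classifier consumes.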
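     This thesis focuses on a classifier that has been widely used in the recognition stage in recent years: the support vector machine (SVM). The SVM is a machine learning method developed from statistical learning theory. It shows distinctive advantages on small-sample, nonlinear, and high-dimensional pattern recognition problems. The traditional SVM, however, still has several open problems. First, there is no fixed criterion for choosing the SVM kernel function and its parameters. Second, an SVM by itself solves only two-class problems and cannot directly handle the multi-class problems that arise in practice.

     The genetic algorithm (GA) is a search-and-optimization algorithm that departs from the search strategy of traditional optimization methods. It imitates biological evolution in nature and searches the objective space randomly through artificial evolution. A GA requires no knowledge of the problem being solved. It only needs to evaluate each individual the algorithm produces, and it solves the problem by acting on the individuals' genes to find better individuals. Because of this evolutionary search, a GA can seek the most suitable SVM kernel parameters over many generations. This largely addresses the lack of a fixed criterion for SVM parameters. In addition, SVMs are combined in a normal tree hierarchy that performs a sequence of two-class classifications, which achieves multi-class classification.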
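The GA-based parameter search can be sketched as a small real-coded genetic algorithm. Several details here are assumptions rather than the thesis's design:

- Each individual encodes (log2 C, log2 gamma). C is the SVM penalty parameter and gamma is the width of an RBF kernel K(x, z) = exp(-gamma * ||x - z||^2). The abstract mentions only kernel parameters, so pairing gamma with C is an assumption.
- The default ranges follow the grid commonly used with LIBSVM.
- The operators are tournament selection, arithmetic crossover, Gaussian mutation, and elitism.

The function `ga_search` is hypothetical, and `fitness` is supplied by the caller.

```python
import random
from typing import Callable, List, Tuple

Genes = List[float]  # [log2_c, log2_gamma]; the encoding is an assumption


def ga_search(
    fitness: Callable[[float, float], float],
    bounds: Tuple[Tuple[float, float], ...] = ((-5.0, 15.0), (-15.0, 3.0)),
    pop_size: int = 30,
    generations: int = 40,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[Tuple[float, float], float]:
    """Maximize fitness(log2_c, log2_gamma); return ((log2_c, log2_gamma), best fitness)."""
    rng = random.Random(seed)

    def score(g: Genes) -> float:
        return fitness(g[0], g[1])

    def clip(g: Genes) -> Genes:
        return [min(max(x, lo), hi) for x, (lo, hi) in zip(g, bounds)]

    def tournament(pop: List[Genes], fits: List[float]) -> Genes:
        # Binary tournament: the fitter of two randomly drawn individuals wins.
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if fits[i] >= fits[j] else pop[j]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fits = [score(g) for g in pop]
    for _ in range(generations):
        best = max(range(pop_size), key=lambda k: fits[k])
        new_pop = [pop[best][:]]  # elitism: keep the best individual unchanged
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, fits), tournament(pop, fits)
            if rng.random() < crossover_rate:
                a = rng.random()  # arithmetic crossover: random point between parents
                child = [a * x + (1 - a) * y for x, y in zip(p1, p2)]
            else:
                child = p1[:]
            # Gaussian mutation with a step of 10% of each gene's range.
            child = [x + rng.gauss(0.0, 0.1 * (hi - lo)) if rng.random() < mutation_rate else x
                     for x, (lo, hi) in zip(child, bounds)]
            new_pop.append(clip(child))
        # Reuse the elite's stored fitness; evaluate only the new children.
        fits = [fits[best]] + [score(g) for g in new_pop[1:]]
        pop = new_pop
    best = max(range(pop_size), key=lambda k: fits[k])
    return (pop[best][0], pop[best][1]), fits[best]
```

In practice, `fitness` would train an SVM with C = 2**log2_c and gamma = 2**log2_gamma and return, for example, its k-fold cross-validation accuracy. scikit-learn's `SVC(kernel="rbf", C=..., gamma=...)` combined with `cross_val_score` is one way to compute this. Each such evaluation is expensive, so the sketch reuses the elite individual's stored fitness instead of re-evaluating it.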
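     Chinese character recognition is the automatic identification, by computer, of Chinese characters that are printed or handwritten on paper. As a discipline it belongs to pattern recognition and artificial intelligence. Information technology is advancing rapidly, and handwritten text increasingly needs to be entered into computer systems for processing. This makes handwritten character recognition a pressing problem.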
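     This thesis improves the support vector machine by combining it with a genetic algorithm and a normal binary tree, forming GA-SVMs. It applies this improved SVM to handwritten Chinese character recognition and uses it to build a handwritten Chinese character recognition system. GA-SVMs overcome the traditional SVM's drawback of undetermined parameters and can quickly search for the optimal SVM. This raises classification accuracy to some extent. GA-SVMs also overcome the traditional SVM's limitation to two-class recognition. Experiments show that GA-SVMs perform well in overall recognition capability and results, and that they are a clear improvement over the traditional SVM.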
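The tree-structured multi-class scheme can be sketched under one assumption: that the "normal binary tree" is a balanced tree. In such a tree, each internal node splits its set of classes into two halves and trains one two-class classifier to separate them. The abstract does not state how the thesis groups classes at each node, so this sketch simply halves the class list in the order given. To keep the example self-contained, a nearest-centroid rule stands in for the GA-tuned SVM at each node. The names `build_tree`, `classify`, and `nearest_centroid` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Decision = Callable[[Vector], float]
# A trainer takes samples with targets in {+1, -1} and returns a decision function
# whose sign selects the +1 side (in GA-SVMs this would be a tuned two-class SVM).
BinaryTrainer = Callable[[List[Vector], List[int]], Decision]


@dataclass
class Node:
    label: Optional[int] = None        # class label, set only on leaves
    decide: Optional[Decision] = None  # two-class decision function on internal nodes
    left: Optional["Node"] = None      # subtree for the +1 group of classes
    right: Optional["Node"] = None     # subtree for the -1 group of classes


def build_tree(xs: List[Vector], ys: List[int], classes: List[int],
               train: BinaryTrainer) -> Node:
    """Recursively split `classes` in half; xs/ys must contain only those classes."""
    if len(classes) == 1:
        return Node(label=classes[0])
    half = len(classes) // 2
    left_cls, right_cls = classes[:half], classes[half:]
    decide = train(xs, [1 if y in left_cls else -1 for y in ys])

    def subset(group: List[int]):
        pairs = [(x, y) for x, y in zip(xs, ys) if y in group]
        return [x for x, _ in pairs], [y for _, y in pairs]

    lx, ly = subset(left_cls)
    rx, ry = subset(right_cls)
    return Node(decide=decide,
                left=build_tree(lx, ly, left_cls, train),
                right=build_tree(rx, ry, right_cls, train))


def classify(node: Node, x: Vector) -> int:
    """Walk from the root to a leaf, taking one two-class decision per level."""
    while node.label is None:
        node = node.left if node.decide(x) >= 0 else node.right
    return node.label


def nearest_centroid(xs: List[Vector], targets: List[int]) -> Decision:
    """Stand-in two-class trainer: positive output means closer to the +1 centroid."""
    def centroid(pts: List[Vector]) -> List[float]:
        return [sum(col) / len(pts) for col in zip(*pts)]

    pos = centroid([x for x, t in zip(xs, targets) if t == 1])
    neg = centroid([x for x, t in zip(xs, targets) if t == -1])

    def decide(x: Vector) -> float:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return d_neg - d_pos
    return decide
```

A tree over K classes has K - 1 internal nodes, so K - 1 binary classifiers are trained. Classifying a sample evaluates only about log2 K of them, one per level.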