Research on Feature Selection and Classifier Design Methods for Offline Printed Uyghur Character Recognition
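Abstract
Character recognition is an important application area of pattern recognition. At present, research on recognition technology for Arabic script, and for Uyghur script, which is based on the Arabic alphabet, lags behind. Developing Uyghur character recognition technology is of great significance for studying the history and culture, religious beliefs, ancient documents, and written records of the ethnic minorities of western China.
     Based on a detailed analysis of the characteristics of Uyghur script and the difficulties in recognizing it, this thesis studies and experiments with printed Uyghur recognition in terms of document image preprocessing, text segmentation, feature extraction, and classifier design. The main results are as follows:
     1. Preprocessing methods for offline printed Uyghur document images are examined in depth. Binarization, smoothing and denoising, thinning, and normalization are implemented experimentally in preparation for character recognition.
     2. By studying how Uyghur differs from Latin, Chinese, and other scripts, a recognition approach is proposed that first segments text lines, then segments words, and finally recognizes letters; extensive related experiments were carried out. An approach to holistic recognition using hidden Markov models is also proposed, together with an outline of how it might be implemented.
     3. Based on the characteristics of Uyghur writing, several feature extraction methods for binary character images are proposed, such as template features, loop features, connected-region features, additional-stroke features, stroke-density features, and projection-transform coefficient features. These are used as input features for training the BP neural network classifier.
     4. Building on character image preprocessing and feature extraction, a Uyghur character classifier based on the BP neural network model is designed and implemented. The classifier converged in training on the sample set and performed well in Uyghur character recognition experiments, reaching a recognition rate of 98.21% on printed characters.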
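The first two steps of the list (binarization, then splitting the page into text lines) can be illustrated with a minimal sketch. It assumes a fixed global threshold and a horizontal projection profile; the abstract does not say which binarization or segmentation technique was actually used, and the names `binarize`, `horizontal_projection`, and `split_lines` are hypothetical. Python is used purely for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 (black) .. 255 (white)


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map dark pixels (ink) to 1 and light pixels (paper) to 0.

    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def horizontal_projection(binary: Image) -> List[int]:
    """Count the ink pixels in every image row."""
    return [sum(row) for row in binary]


def split_lines(binary: Image) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows containing ink."""
    lines = []
    start = None
    for y, count in enumerate(horizontal_projection(binary)):
        if count > 0 and start is None:
            start = y                      # a text line begins
        elif count == 0 and start is not None:
            lines.append((start, y))       # a blank row ends the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(binary)))
    return lines


# Toy 5x4 "page": two ink rows, one blank row, one more ink row.
page = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 0, 255, 255],
    [255, 255, 255, 255],
    [0, 255, 255, 0],
]
print(split_lines(binarize(page)))  # [(1, 3), (4, 5)]
```

The same profile computed over columns inside one text line would give candidate word boundaries. Splitting a connected Uyghur word into letters is harder and is not attempted here.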
Character recognition is a major application area of pattern recognition. At present, research on recognition technology for Arabic script, and for Uyghur script based on the Arabic alphabet, lags behind, which is a consequence of the characteristics of these scripts. Developing Uyghur character recognition technology is important for studying the history, culture, and religion of the ethnic minorities of western China and for preserving their texts and ancient literature. The research also has some reference value for Arabic character recognition.
     Based on a detailed analysis of the characteristics of Uyghur script and the difficulties in recognizing it, this thesis presents research and experiments on image preprocessing, text segmentation, feature extraction, classification, and other aspects of printed Uyghur recognition. The work focuses on the design and implementation of a BP neural network classifier for Uyghur character recognition. The main results are the following:
     1. Image preprocessing methods for offline printed Uyghur characters are studied in depth. Binarization, smoothing, and normalization of the original image are implemented, laying the foundation for further work.
     2. By comparing the characteristics of Uyghur, English, and Chinese text, a method is proposed that first segments lines, then separates words, and finally identifies letters; a large number of experiments were completed. The thesis also proposes a holistic recognition method using the hidden Markov model (HMM).
     3. According to the characteristics of Uyghur writing, a variety of feature extraction methods are introduced, such as template features, aspect ratio, loops, Euler number, connected regions, and strokes in different directions. The combination of these features provides the input vector to the ANN classifier.
     4. The thesis explores a neural network approach to Uyghur character recognition and describes the specific design process of a BP neural network classifier built with the MATLAB toolbox. The ANN classifier gave good experimental results, with a printed character recognition rate of 98.21%.
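Items 3 and 4 describe a feature vector fed into a BP (backpropagation) neural network. The following is a minimal sketch of that pipeline, not the thesis implementation, which was built with the MATLAB toolbox. It assumes a zone-based reading of the stroke-density feature and a single hidden layer of sigmoid units. The names `density_features` and `BPNetwork`, the layer sizes, the learning rate, and the two toy glyphs are all hypothetical.

```python
import math
import random
from typing import List

Glyph = List[List[int]]  # binary character image, 1 = ink


def density_features(glyph: Glyph, zones: int = 2) -> List[float]:
    """Fraction of ink pixels in each cell of a zones x zones grid."""
    h, w = len(glyph), len(glyph[0])
    feats = []
    for zy in range(zones):
        for zx in range(zones):
            rows = range(zy * h // zones, (zy + 1) * h // zones)
            cols = range(zx * w // zones, (zx + 1) * w // zones)
            ink = sum(glyph[y][x] for y in rows for x in cols)
            feats.append(ink / (len(rows) * len(cols)))
    return feats


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """One hidden layer of sigmoid units trained by plain backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        # Every weight row ends with a bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x: List[float]):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w1]
        out = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
               for row in self.w2]
        return hidden, out

    def train_step(self, x: List[float], target: List[float], lr: float = 0.5):
        hidden, out = self.forward(x)
        # Output deltas for squared error with sigmoid activation.
        d_out = [(o - t) * o * (1 - o) for o, t in zip(out, target)]
        # Hidden deltas: push the output deltas back through w2.
        d_hid = [h * (1 - h) * sum(d_out[k] * self.w2[k][j]
                                   for k in range(len(d_out)))
                 for j, h in enumerate(hidden)]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(hidden + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v

    def loss(self, samples) -> float:
        """Sum of squared output errors over all samples."""
        return sum((o - t) ** 2
                   for x, target in samples
                   for o, t in zip(self.forward(x)[1], target))


# Two toy 4x4 glyphs standing in for character classes (illustrative only).
vertical = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
horizontal = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
samples = [(density_features(vertical), [1.0, 0.0]),
           (density_features(horizontal), [0.0, 1.0])]

net = BPNetwork(n_in=4, n_hidden=3, n_out=2)
loss_before = net.loss(samples)
for _ in range(2000):
    for x, target in samples:
        net.train_step(x, target)
print(density_features(vertical))  # [0.5, 0.0, 0.5, 0.0]
print(loss_before, net.loss(samples))
```

A real classifier would concatenate several of the listed features into one input vector and use one output unit per character class. The 98.21% figure reported above refers to the thesis experiments, not to this toy example.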
