Design and Implementation of HTK-Based Handwritten Uyghur Word Recognition
Abstract
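Online handwritten character recognition occupies an important place in pattern recognition and has considerable research value in both theory and practice. Recognition of English, Chinese, and other widely used scripts has been studied extensively in China and abroad, with substantial results, many of which are already in everyday use. Scripts of ethnic minorities have received comparatively little attention. Research on online handwritten Uyghur recognition has concentrated mainly on individual letters, while word recognition remains at an early exploratory stage. This thesis studies applied techniques and methods for online handwritten Uyghur word recognition.
     The Uyghur script closely resembles the Arabic script. Drawing on the relatively mature recognition methods developed for Arabic, English, and other common scripts, this thesis proposes a recognition method based on hidden Markov models (HMMs) that suits the characteristics of Uyghur words, and implements it with HTK, a toolkit built for speech recognition. First, the thesis analyzes the difficulties of Uyghur character recognition and explains the basic principles of the hidden Markov model, the three main problems it poses, and the algorithms used to solve them. Second, it outlines how the sample database was built and proposes preprocessing and feature extraction methods suited to Uyghur characters; it also studies the handling of delayed strokes, for which a special projection method is chosen. Third, it describes the selection and configuration of the HMMs in detail and explains each of the language models considered. The rule-based language model rules out candidate words that do not conform to the letter composition structure of a word, which raises the recognition rate. Finally, HTK tools are used to initialize and train the letter models and then to recognize and evaluate words; the experiments are compared, and the strengths and weaknesses of the different language models are summarized. The experimental results show that the proposed HTK-based method for handwritten Uyghur word recognition is both feasible and effective.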
Online handwritten character recognition plays an important role in the field of pattern recognition and is of great value in both theory and practical application. At present, recognition research in China and abroad focuses on English, Chinese, and other widely used scripts, and it has produced significant results that are already applied in everyday life. Minority scripts have been studied far less. Work on online handwritten Uyghur recognition has focused on individual letters, whereas word recognition is still at an exploratory stage. This paper proposes techniques and methods for online handwritten Uyghur word recognition.
     Uyghur script is similar to Arabic script. Building on mature methods for Arabic, English, and other commonly used scripts, this paper presents an HMM-based method, implemented with HTK, that fits the characteristics of Uyghur words. First, the paper analyzes the difficulties of Uyghur character recognition and describes the basic principles of the hidden Markov model (HMM), its three fundamental problems, and the algorithms that solve them. Second, it outlines how the sample set was built and proposes preprocessing and feature extraction methods for Uyghur words. It also addresses the handling of delayed strokes, for which a special projection method is adopted. Third, it describes the structure of the HMMs and the different language models in detail. The rule-based language model excludes candidate words that do not conform to the letter structure of Uyghur words, which improves the recognition rate. Finally, the letter HMMs are initialized and trained, and words are recognized and evaluated, all with HTK tools. Experiments compare the language models and summarize their advantages and disadvantages. The results show that the proposed method is practicable and effective.
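The three HMM problems mentioned in the abstract are conventionally named evaluation, decoding, and training. As a minimal sketch of the first two, the Python below implements the forward algorithm and the Viterbi algorithm for a small discrete HMM. Everything specific in it is an assumption made for illustration: the three-state left-to-right topology, the probabilities, and the idea of observations as quantized stroke-direction codes do not come from the thesis, whose models are built and trained with HTK. Training (usually Baum-Welch re-estimation) is omitted.

```python
import math

def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """Evaluation problem: log P(obs | model) via the forward algorithm."""
    n = len(start_p)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans_p[r][s] for r in range(n)) * emit_p[s][o]
            for s in range(n)
        ]
    return math.log(sum(alpha))

def viterbi(obs, start_p, trans_p, emit_p):
    """Decoding problem: most likely state path and its probability."""
    n = len(start_p)
    delta = [start_p[s] * emit_p[s][obs[0]] for s in range(n)]
    backptrs = []  # backptrs[t][s] = best predecessor of state s at step t+1
    for o in obs[1:]:
        new_delta, ptr = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: delta[r] * trans_p[r][s])
            ptr.append(best)
            new_delta.append(delta[best] * trans_p[best][s] * emit_p[s][o])
        delta = new_delta
        backptrs.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    best_prob = delta[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_prob

# Hypothetical left-to-right letter model with 3 states and 3 symbols.
START_P = [1.0, 0.0, 0.0]
TRANS_P = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
EMIT_P = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
OBS = [0, 0, 1, 1, 2]  # made-up sequence of quantized direction codes

if __name__ == "__main__":
    path, prob = viterbi(OBS, START_P, TRANS_P, EMIT_P)
    print("most likely state path:", path)
    print("path probability:", prob)
    print("log-likelihood:", forward_log_likelihood(OBS, START_P, TRANS_P, EMIT_P))
```

In a word recognizer of the kind the abstract describes, letter models like this toy one would be concatenated into word models and the decoder would search over the words the language model allows. HTK performs these steps with its own tools and continuous-density models, so this sketch illustrates only the underlying recursions.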
