Research on Online Handwritten Mongolian Character Recognition Technology
Abstract
Research on Mongolian input methods began in the early 1980s and has concentrated mainly on keyboard input; very little work has addressed Mongolian character recognition. We therefore set out to develop a handwritten Mongolian character recognition system that offers a fast, efficient, and intelligent way to enter Mongolian text. The basic task of online handwriting recognition is to capture the handwritten input signal with a digitizing device, extract features from it, and match those features against a feature library to identify what was written. Handwriting varies greatly from sample to sample, however, which makes exact recognition difficult. Cursive (connected) writing is harder still, because the letters are difficult to segment.
     In recent years, as portable mobile computing devices such as the personal digital assistant (PDA) have become common, handwriting input has found ever wider use, and many online handwriting recognition products now exist for Chinese characters and English. Mongolian is a written language in wide use among the Mongolian people of northern China and in other ethnic minority regions, so research on handwriting recognition for it benefits the development of information technology and science in those regions.
     This thesis discusses online handwriting recognition techniques for Mongolian script. We apply, in turn, preprocessing to remove noise, primitive (letter) segmentation based on the structural characteristics of Mongolian script, extraction of coarse-classification and fine-classification features, and a multiple-classifier design that combines hidden Markov models (HMM) with dynamic time warping (DTW). Using these techniques, we developed an experimental Mongolian character recognition system. Experimental results show that the word recognition rate reaches 90% for trained writers and 83% for words written with constrained handwriting. The system as a whole performs well and stably, and its recognition rate is beginning to reach a practical level.
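The abstract names DTW as one of the two classifiers but gives no detail of how it is used. As a rough illustration only, the sketch below shows generic DTW template matching on pen trajectories: an input stroke is compared with stored templates and assigned the label of the closest one. The function names `dtw_distance` and `classify`, the use of raw (x, y) points as features, and the Euclidean local cost are all assumptions made for this example. They are not taken from the thesis, whose system also uses HMMs and its own coarse and fine features.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two point sequences.

    a, b: sequences of (x, y) pen coordinates (an assumed feature choice).
    The local cost is the Euclidean distance between two points.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(stroke, templates):
    """Return the label of the template nearest to `stroke` under DTW.

    templates: dict mapping a label to one reference point sequence
    (a hypothetical stand-in for a real feature library).
    """
    return min(templates, key=lambda label: dtw_distance(stroke, templates[label]))
```

Because DTW aligns sequences of different lengths, a stroke sampled with four points can still be matched against a three-point template; a real system would normally resample and normalize the trajectory during preprocessing before matching.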