Research on Online Uyghur Handwritten Letter Recognition Based on a Training Mechanism
Abstract
Magnetic pens are compact and comfortable to write with, and they have come into wide use as portable mobile computing devices have spread. As a result, online handwriting recognition has become a hot research topic in pattern recognition. The technology gives users a natural and convenient means of human-computer interaction. In online recognition, a trajectory-capture device such as a writing tablet records what the writer writes, and the system recognizes it in real time, so the writer can easily spot and correct misrecognized characters. Compared with offline recognition, the advantage of online recognition is that dynamic information can be captured while the pen tip is moving.
     Many online handwriting recognition products for Chinese and Latin characters are already on the market, but online recognition of handwritten Uyghur is still at a preliminary research stage. This thesis presents theoretical and experimental research on online recognition of handwritten Uyghur letters, covering trajectory data collection, preprocessing, feature extraction, and classifier design. In the data collection stage, a custom data structure and a corresponding file format are used to store the handwriting samples. In the preprocessing stage, the raw data are first smoothed with a filter; next, to preserve the structural information of Uyghur letters, linear normalization is applied according to how the letters are written; finally, resampling reduces the amount of data, which speeds up the subsequent computation. For feature extraction, gradient direction features that combine structural and statistical features are used, which makes the extraction fairly robust to distortion and deformation of the characters. Classification is performed with a support vector machine. Tests show that, as the number of training samples increases, the recognition rate rises through 90.62%, 92.86%, 94.53%, and 96.09%. The experimental results indicate that gradient direction feature extraction gives good results: the highest classification accuracy is 96.09% and the lowest is above 90%. This work may also serve as a reference for research on similar scripts used in the Xinjiang Uyghur Autonomous Region, such as Kazakh and Kyrgyz.
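The preprocessing and feature steps named in the abstract (smoothing, linear normalization, resampling, direction features) can be sketched roughly as follows. The abstract does not give the thesis's actual algorithms or parameters, so everything here is an assumption for illustration: the function names, the moving-average filter, the aspect-preserving scaling, the 32-point arc-length resampling, and the 8-bin direction histogram, which is only a simplified stand-in for the gradient direction features described above.

```python
import math

def smooth(points, window=3):
    """Moving-average filter over (x, y) pen samples (assumed filter type)."""
    half = window // 2
    out = []
    for i in range(len(points)):
        chunk = points[max(0, i - half):min(len(points), i + half + 1)]
        out.append((sum(p[0] for p in chunk) / len(chunk),
                    sum(p[1] for p in chunk) / len(chunk)))
    return out

def normalize(points, size=1.0):
    """Linear normalization: shift to the origin and scale both axes by the
    same factor, so the shape (structure) of the letter is preserved."""
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    span = max(max(p[0] for p in points) - min_x,
               max(p[1] for p in points) - min_y)
    scale = size / span if span > 0 else 1.0
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]

def resample(points, n=32):
    """Resample a stroke to n points equally spaced along its arc length."""
    cum = [0.0]  # cumulative arc length at each input point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0 or n < 2:
        return [points[0]] * n
    out, j = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = (target - cum[j]) / seg if seg > 0 else 0.0
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def direction_histogram(points, bins=8):
    """Length-weighted histogram of segment directions, normalized to sum 1.
    A simplified stand-in for a gradient direction feature."""
    hist = [0.0] * bins
    step = 2 * math.pi / bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0:
            continue
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        hist[int(angle / step + 0.5) % bins] += length
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def extract_features(stroke, n=32, bins=8):
    """Hypothetical pipeline: smooth, normalize, resample, then histogram."""
    return direction_histogram(resample(normalize(smooth(stroke)), n), bins)
```

Feature vectors produced this way could then be passed to a support vector machine classifier, for example `sklearn.svm.SVC` from scikit-learn, which is a library choice assumed here rather than one stated in the abstract.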
