Research on Off-line Handwritten Nüshu Character Recognition Technology
Abstract
Nüshu is the most gender-conscious script in the world and is of great value as intangible cultural heritage. To date, Nüshu documents have been passed on mainly by hand copying. As the remaining inheritors of Nüshu pass away one after another, collecting and collating Nüshu documents is becoming harder, and Nüshu culture is on the verge of disappearing. To address this problem, this thesis applies off-line handwritten character recognition to the digitization of Nüshu documents, as a contribution to protecting and promoting Nüshu, a precious part of the cultural heritage of the Chinese nation.
     Building on a detailed analysis of existing off-line handwritten character recognition algorithms, this thesis proposes an off-line handwritten Nüshu character recognition scheme tailored to the characteristics of Nüshu. Starting from the design of the scheme, it analyzes in detail the workflow of off-line handwritten Nüshu character recognition, the function of each stage, and the algorithms commonly used. The peripheral direction contribution feature extraction algorithm is applied to Nüshu characters, an improved stroke density feature extraction algorithm and a three-level distance classification algorithm are proposed, and a practical Nüshu recognition system is designed and implemented.
     The main work and features of this thesis are as follows:
     1) For the Nüshu samples, smoothing and binarization algorithms are used to remove the grid-line noise and the background from the sample images. Based on how characters are distributed in the samples, a row-merging segmentation algorithm is used to segment the Nüshu characters. Finally, the segmented characters are normalized to a uniform size.
     2) The characteristics of two stroke density feature extraction algorithms, and their shortcomings when applied to Nüshu characters, are analyzed. The peripheral direction contribution feature extraction algorithm is applied to Nüshu characters, and an improved stroke density feature extraction algorithm is proposed to account for the slant of Nüshu characters.
     3) Existing multi-level distance classifiers are analyzed, and a three-level distance classifier is designed to address the shortcomings of the Euclidean distance in recognition. The first level uses the Manhattan distance, and the second and third levels use the error-balanced distance. The classifier combines the speed of Manhattan distance classification with the ability of error-balanced distance classification to emphasize the stable parts of Nüshu character features and suppress the unstable parts.
     4) Using the improved stroke density feature extraction method, the three-level distance classifier, and the other algorithms proposed in this thesis, an off-line handwritten Nüshu character recognition system is designed and implemented. Simulation experiments are carried out with the system, and the results are analyzed and compared.
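     The three-level distance classifier described in item 3) can be pictured as a coarse-to-fine search over class templates. The following is a minimal sketch under stated assumptions, not the implementation from the thesis. The abstract neither defines the error-balanced distance nor says how the second and third levels differ, so the sketch assumes a deviation-weighted city-block distance as a stand-in, and assumes that the second level looks at a coarse prefix of the feature vector while the third level uses the full vector. The names `balanced`, `classify`, `keep1`, `keep2`, and `coarse` are hypothetical.

```python
def manhattan(x, mean):
    """City-block distance between a feature vector and a class mean."""
    return sum(abs(a - b) for a, b in zip(x, mean))


def balanced(x, mean, spread, eps=1e-6):
    """Assumed stand-in for the error-balanced distance.

    Each dimension's deviation is divided by the class's spread in that
    dimension, so stable dimensions weigh more and unstable ones weigh less.
    """
    return sum(abs(a - m) / (s + eps) for a, m, s in zip(x, mean, spread))


def classify(x, templates, keep1=3, keep2=2, coarse=2):
    """Return the label chosen by a three-stage coarse-to-fine search.

    templates maps each label to a (mean vector, spread vector) pair.
    """
    # Level 1: the cheap Manhattan distance keeps the keep1 nearest classes.
    level1 = sorted(
        templates, key=lambda c: manhattan(x, templates[c][0])
    )[:keep1]
    # Level 2 (assumption): weighted distance on the first `coarse`
    # dimensions narrows the shortlist to keep2 classes.
    level2 = sorted(
        level1,
        key=lambda c: balanced(
            x[:coarse], templates[c][0][:coarse], templates[c][1][:coarse]
        ),
    )[:keep2]
    # Level 3 (assumption): weighted distance on the full vector decides.
    return min(level2, key=lambda c: balanced(x, *templates[c]))
```

     In this sketch the first level only prunes the candidate set, so the more expensive weighted distance is evaluated on a handful of classes rather than on the whole character set. The shortlist sizes `keep1` and `keep2` are illustrative values, not figures from the thesis.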
