In Chinese character information processing, current approaches to the formal description of Chinese character glyphs are mostly based on the structural analysis method used to describe character forms in Chinese character research and Chinese language teaching. These approaches describe a glyph in terms of human perceptual units, namely glyph formation units such as structure types, components, and strokes. They suffer from ambiguity and descriptive gaps in glyph decomposition, structure classification, and the selection of descriptive elements. As a result, they cannot describe every possible glyph skeleton (including wrongly written characters, variant forms of characters in ancient literature, and combined characters), nor can they support automatic glyph comparison. Still less can they meet practical needs that rely on glyph comparison and analysis, such as describing wrongly written characters or quantitatively analyzing misused characters in the teaching and research of Chinese characters, describing and analyzing variant forms in ancient literature, or retrieving rare character glyphs in electronic books.
     For special Chinese characters whose glyph samples cannot be collected in advance, such as wrongly written characters, variant forms in ancient literature, and combined characters, sample-based training is impossible, so comparative computation of their glyphs cannot be supported and their recognition and identification cannot be guaranteed. The statistically generated glyph features adopted by recognition models are also difficult to decompose logically and to map onto the structure types, components, and strokes derived from human cognition. Such features behave like a black box and do not meet the demand for human-oriented comparison and analysis of different types of glyphs.
     To address the core problem, namely the lack of a universally accepted and effective means of formally describing Chinese character glyphs and comparing them automatically, this paper, oriented toward applications in glyph comparison and analysis, offers a new approach to describing glyphs and provides a set of related glyph comparison algorithms and some practical tools. The main innovations are:
     1) A method is proposed to formally describe Chinese characters by a stroke-segment-mesh, which uses a line segment of pre-defined length and direction as the glyph description element (stroke segment). Because this element has suitable granularity, is free of ambiguity, and is standardized, it can describe the glyph skeleton of all Chinese characters (including wrongly written characters, variant forms of characters in ancient literature, and combined characters). Experiments show that, compared with a dot-matrix glyph containing the same number of elements, the stroke-segment-mesh description uses far fewer effective elements and yet achieves higher efficiency. Moreover, the accuracy and reliability of computation improve because the stroke-segment-mesh glyphs of different Chinese characters differ from one another to a greater degree.
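As a rough illustration of the description in item 1), a stroke-segment-mesh glyph can be pictured as a set of unit line segments on a grid, each identified by a grid point and one of a few pre-defined directions. This is only a minimal sketch under stated assumptions: the four directions, the names `DIRECTIONS` and `line`, and the 8-unit grid are hypothetical choices made for the example and are not taken from the thesis.

```python
from typing import FrozenSet, Tuple

# Hypothetical encoding: a stroke segment is (x, y, direction), a line of
# pre-defined length starting at grid point (x, y). The four directions
# below are an assumption for illustration only.
DIRECTIONS = {
    "H": (1, 0),   # horizontal, to the right
    "V": (0, 1),   # vertical, downward
    "D": (1, 1),   # diagonal, down-right
    "A": (-1, 1),  # anti-diagonal, down-left
}

Segment = Tuple[int, int, str]
Glyph = FrozenSet[Segment]


def line(x: int, y: int, direction: str, length: int) -> Glyph:
    """Return `length` consecutive stroke segments starting at (x, y)."""
    dx, dy = DIRECTIONS[direction]
    return frozenset((x + i * dx, y + i * dy, direction) for i in range(length))


# The character 十: one horizontal stroke crossed by one vertical stroke.
shi = line(0, 4, "H", 8) | line(4, 0, "V", 8)

# A wrongly written variant whose vertical stroke stops short. It needs no
# entry in any character set; it is simply a different set of segments.
shi_short = line(0, 4, "H", 8) | line(4, 0, "V", 6)

print(len(shi))                 # 16
print(sorted(shi - shi_short))  # [(4, 6, 'V'), (4, 7, 'V')]
```

Because every glyph is a plain set of standardized elements, non-standard forms can be described with the same vocabulary as standard characters, and the difference between two glyphs falls out of ordinary set operations.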
     2) Based on the stroke-segment-mesh formal description method, a set of glyph comparison algorithms is presented. The algorithm that compares glyphs by stroke segment and its context uses the stroke segment as the comparison unit. Experiments on the GB2312 character set and on some wrongly written characters, variant forms of characters, and combined characters show that its similarity results are little affected by factors such as character structure type and stroke division. The algorithm compares character glyphs without training and achieves high accuracy when the input character is essentially the same size as the character it is compared with. The algorithm that compares glyphs by combinations of stroke segments automatically extracts simple strokes and compound strokes from the stroke-segment-mesh. It adaptively uses either simple strokes alone, or compound strokes together with simple strokes, as the comparison unit. Experiments on the same character set show that the algorithms based on simple strokes and compound strokes also compute glyph similarity without training, and that the result is less sensitive to glyph size and to the varying deformation of inclined strokes. These algorithms achieve high accuracy (nearly 100%) in selecting the first candidate for input glyphs of normal structure. Because they use larger comparison units, they can be applied efficiently to large-scale Chinese character glyph searching. The comparison units map easily onto the units of human cognition, which makes this a "white-box" approach to glyph similarity computation. The method can be applied to an entire Chinese character or to part of one. It can find the differences between characters of non-standard structure and characters of standard structure, and so it meets the needs of glyph-analysis-oriented applications.
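The two kinds of comparison unit in item 2) can be sketched as follows. This is an assumption-laden stand-in, not the algorithms of the thesis: the segment encoding `(x, y, direction)` is hypothetical, `segment_similarity` uses a plain Jaccard overlap and ignores the segment context that the original algorithm takes into account, and `simple_strokes` only merges straight runs of segments (compound strokes are not handled).

```python
from typing import Dict, List, Set, Tuple

Segment = Tuple[int, int, str]  # (x, y, direction), hypothetical encoding
STEP: Dict[str, Tuple[int, int]] = {
    "H": (1, 0), "V": (0, 1), "D": (1, 1), "A": (-1, 1),
}


def segment_similarity(a: Set[Segment], b: Set[Segment]) -> float:
    """Jaccard overlap of two segment sets (a stand-in similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def simple_strokes(glyph: Set[Segment]) -> List[Tuple[int, int, str, int]]:
    """Merge end-to-end segments of one direction into (x, y, dir, length)."""
    strokes = []
    for x, y, d in sorted(glyph):
        dx, dy = STEP[d]
        if (x - dx, y - dy, d) in glyph:
            continue  # this segment is not the start of its run
        length = 1
        while (x + length * dx, y + length * dy, d) in glyph:
            length += 1
        strokes.append((x, y, d, length))
    return strokes


def hline(x: int, y: int, n: int) -> Set[Segment]:
    return {(x + i, y, "H") for i in range(n)}


def vline(x: int, y: int, n: int) -> Set[Segment]:
    return {(x, y + i, "V") for i in range(n)}


shi = hline(0, 4, 8) | vline(4, 0, 8)  # 十
tu = shi | hline(0, 8, 8)              # a simplified 土: 十 plus a bottom stroke

print(simple_strokes(shi))          # [(0, 4, 'H', 8), (4, 0, 'V', 8)]
print(segment_similarity(shi, tu))  # 16 shared segments out of 24
```

Neither function needs training data, and the extracted strokes are units a human reader would recognize, which is the sense in which such a comparison is "white-box".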
     A method for describing and computing the structural relationships of a character, based on a relationship matrix of strokes, is also provided; it can be used to identify the structure types of Chinese characters automatically.
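One possible reading of a relationship matrix of strokes is a table whose entry (i, j) records how stroke i is positioned relative to stroke j. The sketch below is an assumption, not the formulation used in the thesis: the stroke encoding `(x, y, direction, length)`, the bounding-box test, and the label set (`left`, `right`, `above`, `below`, `touch`) are all hypothetical.

```python
from typing import Dict, List, Tuple

Stroke = Tuple[int, int, str, int]  # (x, y, direction, length), hypothetical
STEP: Dict[str, Tuple[int, int]] = {
    "H": (1, 0), "V": (0, 1), "D": (1, 1), "A": (-1, 1),
}


def bbox(s: Stroke) -> Tuple[int, int, int, int]:
    """Bounding box (x_min, y_min, x_max, y_max) of a straight stroke."""
    x, y, d, n = s
    dx, dy = STEP[d]
    x2, y2 = x + n * dx, y + n * dy
    return min(x, x2), min(y, y2), max(x, x2), max(y, y2)


def relation(a: Stroke, b: Stroke) -> str:
    """Coarse position of stroke a relative to stroke b (y grows downward)."""
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    if ax1 < bx0:
        return "left"
    if ax0 > bx1:
        return "right"
    if ay1 < by0:
        return "above"
    if ay0 > by1:
        return "below"
    return "touch"  # bounding boxes overlap or meet


def relation_matrix(strokes: List[Stroke]) -> List[List[str]]:
    """Pairwise relation table; the diagonal is marked 'self'."""
    return [
        ["self" if i == j else relation(a, b) for j, b in enumerate(strokes)]
        for i, a in enumerate(strokes)
    ]


er = [(2, 2, "H", 4), (0, 6, "H", 8)]   # 二: short stroke over a long one
shi = [(0, 4, "H", 8), (4, 0, "V", 8)]  # 十: two crossing strokes

print(relation_matrix(er))   # [['self', 'above'], ['below', 'self']]
print(relation_matrix(shi))  # [['self', 'touch'], ['touch', 'self']]
```

A matrix dominated by above/below entries would suggest a top-bottom arrangement and one dominated by left/right entries a left-right arrangement, which gives an intuition for how such a matrix could feed structure-type identification.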
     3) Given the importance of components in research on the physical structure of Chinese characters, a component description method and an algorithm for automatically detecting components are built on the simple strokes of the stroke-segment-mesh glyph. Experiments show that the algorithm accurately detects the Chinese characters that contain a specific component, regardless of the location and size of the component in the glyph.
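To give a feel for detection that does not depend on where a component sits or how large it is, the sketch below looks for a translated, uniformly scaled copy of a component among a glyph's simple strokes. It is a minimal illustration under stated assumptions and not the detection algorithm of the thesis: the stroke encoding `(x, y, direction, length)` and the function `contains_component` are hypothetical, matching is exact with no tolerance for deformation, and the component must contain at least one stroke.

```python
from fractions import Fraction
from typing import List, Set, Tuple

Stroke = Tuple[int, int, str, int]  # (x, y, direction, length), hypothetical


def contains_component(glyph: Set[Stroke], component: List[Stroke]) -> bool:
    """True if a translated, uniformly scaled copy of `component` occurs
    among the strokes of `glyph` (exact match, no tolerance)."""
    cx, cy, cd, cn = component[0]
    for gx, gy, gd, gn in glyph:
        if gd != cd:
            continue
        # Align the first component stroke with this glyph stroke; that
        # fixes both the scale factor and the translation.
        scale = Fraction(gn, cn)
        mapped = {
            (gx + (x - cx) * scale, gy + (y - cy) * scale, d, n * scale)
            for x, y, d, n in component
        }
        if mapped <= glyph:  # every mapped stroke exists in the glyph
            return True
    return False


# Component 十 described on a small 4-unit grid.
shi_component = [(0, 2, "H", 4), (2, 0, "V", 4)]

# 口 as four strokes, and a simplified 古 as a larger, shifted 十 above 口.
kou = {(3, 9, "H", 6), (3, 13, "H", 6), (3, 9, "V", 4), (9, 9, "V", 4)}
gu = {(2, 4, "H", 8), (6, 0, "V", 8)} | kou

print(contains_component(gu, shi_component))   # True
print(contains_component(kou, shi_component))  # False
```

Exact fractions are used for the scale factor so that a scaled stroke compares equal to an integer grid stroke only when it lands on it precisely.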
     4) This paper also improves the Chinese character structure description system of the "Chinese character information dictionary" and offers an algorithm for calculating the glyph similarity of Chinese characters based on structure description. Experimental results show that the lists of similar characters found by this algorithm are highly consistent in structure and conform to human cognition. The algorithm is therefore suitable for similarity calculation of Chinese characters with definite structure classes.
     5) An application software system, the Toolkit of Chinese Character Glyph Description and Automatic Comparison and Analysis, is designed and implemented. The tool creates a stroke-segment-mesh glyph description through the familiar handwriting and drawing input methods. Any imaginable Chinese character can be entered, including wrongly written characters, variant forms of characters in ancient literature, and combined characters, along with other related information. A stroke-segment-mesh glyph can be converted automatically to a corresponding TrueType font and processed just like a character in the standard Chinese character set. The tool can compare stroke-segment-mesh glyphs, find their similarities and differences in whole or in part, and produce a list of similar characters sorted by similarity. Using this tool, stroke-segment-mesh glyph descriptions have been created for the 20,902 Chinese characters in the GBK character set and for wrongly written characters produced by foreign students at Beijing Language and Culture University. The resulting glyph database has been applied to the analysis of writing errors made by foreign students.
     This work will benefit the standardization of Chinese character glyph description and should find wide application in fields that rely on Chinese character glyph computing, such as the input of Chinese characters outside the standard character set, the construction of digital libraries in China, the research, teaching, and international promotion of Chinese, research into the history of Chinese characters and culture, and information-based social management.