Design and Implementation of Stroke Code Chinese Character Input Method Software
Abstract
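Chinese characters are the core of traditional Chinese culture and the main tool of information exchange. Ancient, complex, and varied, they are two-dimensional square characters; unlike one-dimensional linear scripts such as English, they cannot be typed into a computer directly and require dedicated input method software. Entering Chinese characters is the first stage of Chinese information processing on a computer, and input technology directly affects the development of that field. This thesis focuses on the design and development of Chinese character input method software within the system and proposes a simple, convenient keyboard input method for Chinese characters.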
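     The thesis first compiles statistics on the stroke information of the characters in the national standard two-level character set. These statistics mainly include: the average number of strokes per character and the same average weighted by usage frequency; the average and weighted average number of leading strokes needed to distinguish a character from all others; the number of characters that begin with each stroke type; the number of occurrences of each stroke type in the character set; the characters in the set that share identical strokes; and the frequency of adjacent stroke pairs. Based on these statistics, we adopt the stroke order used when writing a character as its input code, and design the Stroke Code input method together with a keyboard that implements it.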
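     To display character strokes within the input method, the thesis introduces TrueType fonts, which use curve-outline description, analyzes the principles of TrueType glyph description and the TrueType file structure, and builds a TrueType font file for Chinese character strokes with font creation software.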
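     Under Windows, an input method file is in fact a dynamic link library. The thesis therefore analyzes the internal mechanism by which the Windows operating system supports input methods, clarifies the relationship between an input method and the system, and, following the principles of input methods, describes how the input method interface functions work and how applications support input methods.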
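     For the implementation, the thesis first analyzes the run-time flow chart of the Stroke Code input method, divides the program into modules, and concentrates on the module that converts external (input) codes to internal codes. To store the stroke codes of Chinese characters, it proposes an efficient optimized index tree based on an ordered binary tree: the conventional trie index is optimized by merging nodes and by storing tree nodes in a specific variable-length structure, which greatly reduces storage space. Because Chinese characters have a large average stroke count, entering every stroke of a character makes the code too long. To allow incomplete input codes, that is, to solve the fuzzy matching problem between a string pattern containing fuzzy-input symbols and a set of strings, Chapter 5 proposes a fuzzy matching algorithm over a string set and gives its concrete implementation and complexity analysis.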
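     Finally, the thesis proposes a method for secure software registration based on computer hardware information and designs the flow chart of the registration process.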
Chinese characters are the core of traditional Chinese culture and the main tool of information exchange. Western scripts such as English are one-dimensional and linear and can be entered into a computer directly. The ancient and complex Chinese characters, by contrast, are two-dimensional square characters and need special input method software. Entering Chinese characters is the first step of Chinese information processing, and input method technology directly affects the development of that field. With a view to the design and development process of Chinese character input method software, this thesis proposes a simple and convenient keyboard input method for Chinese characters.
    We first compiled statistics on the stroke information of the characters in the Chinese National Standard Code for Information Interchange (GB2312-80): the average number of strokes per character, both unweighted and weighted by usage frequency; the unweighted and weighted average number of leading strokes needed to distinguish a character from all others; the number of characters that begin with each stroke; the number of occurrences of each stroke in the character set; the characters in the set that share the same strokes; and the frequency of adjacent strokes. Based on these statistics, we devised a new keyboard input method for Chinese characters, named "Stroke Code", which uses the stroke sequence of a handwritten character as its input code, and we designed a key arrangement for the method.
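
    As an illustration of the kind of statistics listed above, here is a minimal Python sketch over a toy table. It is not the thesis's code or data. The five-class stroke numbering (1 horizontal, 2 vertical, 3 left-falling, 4 dot or right-falling, 5 turning), the eight sample characters, and the frequency weights are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Assumed stroke classes: 1 horizontal, 2 vertical, 3 left-falling,
# 4 dot/right-falling, 5 turning. Codes follow standard stroke order.
STROKES = {
    "一": "1", "十": "12", "人": "34", "大": "134",
    "木": "1234", "本": "12341", "天": "1134", "夫": "1134",
}

# Hypothetical usage-frequency weights (they sum to 100 here).
FREQ = {"一": 30, "十": 10, "人": 25, "大": 15,
        "木": 5, "本": 8, "天": 5, "夫": 2}


def mean_strokes(strokes):
    """Unweighted average number of strokes per character."""
    return sum(len(code) for code in strokes.values()) / len(strokes)


def weighted_mean_strokes(strokes, freq):
    """Average number of strokes weighted by usage frequency."""
    total = sum(freq[ch] for ch in strokes)
    return sum(len(code) * freq[ch] for ch, code in strokes.items()) / total


def shared_codes(strokes):
    """Group characters whose full stroke sequences are identical."""
    groups = defaultdict(list)
    for ch, code in strokes.items():
        groups[code].append(ch)
    return {code: chars for code, chars in groups.items() if len(chars) > 1}


def first_stroke_counts(strokes):
    """Count how many characters begin with each stroke class."""
    return dict(Counter(code[0] for code in strokes.values()))


if __name__ == "__main__":
    print(mean_strokes(STROKES))
    print(weighted_mean_strokes(STROKES, FREQ))
    print(shared_codes(STROKES))
    print(first_stroke_counts(STROKES))
```

    In this toy table the weighted average is lower than the unweighted one because the frequent characters happen to be short, and 天 and 夫 collide on the same stroke sequence, which is the situation a candidate list has to resolve.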
    To display the strokes of Chinese characters in the input method, we introduced the TrueType font, which uses curve-outline description, and analyzed the principle of this description technique and the structure of TrueType font files. We then created a TrueType font file for Chinese character strokes with a font creation program.
    Under Windows, an input method editor (IME) file is in fact a dynamic link library. By analyzing the internal mechanism through which the Windows operating system (OS) supports IMEs, we clarified the relationship between the IME and the OS. Following the principles of the IME, we described how the IME interface functions work and how applications support the IME.
    To develop the "Stroke Code" Chinese character IME, we first analyzed the flow chart of the IME and divided the program into several modules. We then focused on the module that converts external codes into the internal codes of Chinese characters. We proposed an efficient optimized trie, based on an ordered binary tree, to store the stroke strings of Chinese characters. By merging some trie nodes and storing each node in a special structure, we obtained an optimized trie that uses little memory. So that a character can be entered without typing its complete stroke string, the input code must be allowed to omit some elements. We therefore proposed a fast pattern matching algorithm over a large string set to solve the problem of fuzzy matching between a string pattern and a set of strings. Chapter 5 of the thesis describes the algorithm in detail and analyzes its space and running-time costs.
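
    The idea of matching an incomplete stroke pattern against a stored set of stroke strings can be sketched with a plain trie. The Python sketch below is only an illustration under stated assumptions: it uses an ordinary dictionary-based trie rather than the thesis's optimized tree with merged nodes, it assumes a hypothetical wildcard symbol `?` that stands for exactly one unknown stroke, and the sample stroke codes (1 horizontal, 2 vertical, 3 left-falling, 4 dot or right-falling) are toy data.

```python
class StrokeTrie:
    """Plain trie keyed by stroke digits (not the thesis's optimized tree)."""

    def __init__(self):
        self.children = {}  # stroke digit -> StrokeTrie
        self.chars = []     # characters whose code ends at this node

    def insert(self, code, ch):
        node = self
        for stroke in code:
            node = node.children.setdefault(stroke, StrokeTrie())
        node.chars.append(ch)

    def _collect(self):
        """All characters stored at or below this node, in sorted key order."""
        out = list(self.chars)
        for key in sorted(self.children):
            out.extend(self.children[key]._collect())
        return out

    def match(self, pattern, prefix=False, wildcard="?"):
        """Return characters whose code matches the pattern.

        The assumed wildcard matches exactly one stroke. With prefix=True,
        the pattern only has to match the beginning of the code.
        """
        if not pattern:
            return self._collect() if prefix else list(self.chars)
        head, rest = pattern[0], pattern[1:]
        if head == wildcard:
            # Unknown stroke: follow every branch at this level.
            branches = [self.children[k] for k in sorted(self.children)]
        elif head in self.children:
            branches = [self.children[head]]
        else:
            branches = []
        out = []
        for child in branches:
            out.extend(child.match(rest, prefix, wildcard))
        return out


# Toy stroke table (illustrative data only).
TABLE = {"一": "1", "十": "12", "人": "34", "大": "134",
         "木": "1234", "本": "12341", "天": "1134", "夫": "1134"}

trie = StrokeTrie()
for ch, code in TABLE.items():
    trie.insert(code, ch)

if __name__ == "__main__":
    print(trie.match("1?34"))             # second stroke unknown
    print(trie.match("12", prefix=True))  # only the first two strokes typed
```

    A wildcard forces the search to branch, so the cost of a lookup grows with the number of wildcards and the fan-out at those levels. That cost is the reason a dedicated matching algorithm and a compact node layout matter for a full character set.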
    Finally, we proposed a secure registration method for shareware that uses the PC's hardware information.
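
    The abstract does not describe the registration scheme itself, so the following Python sketch shows only one common way to bind a registration code to hardware, and every detail of it is an assumption. Here the machine identifier is a hash of a hardware string (by default the network adapter address returned by `uuid.getnode()`), and the registration code is a keyed hash of that identifier. The function names and the vendor secret are hypothetical.

```python
import hashlib
import hmac
import uuid


def machine_id(hardware_info=None):
    """Derive a short machine identifier from a hardware string.

    By default the hardware string is the network adapter address
    reported by uuid.getnode(); pass a string explicitly to test.
    """
    if hardware_info is None:
        hardware_info = f"{uuid.getnode():012x}"
    return hashlib.sha256(hardware_info.encode("utf-8")).hexdigest()[:16]


def registration_code(mid, vendor_secret):
    """Vendor side: compute the code to issue for a given machine id."""
    digest = hmac.new(vendor_secret, mid.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:20].upper()


def is_registered(code, vendor_secret, hardware_info=None):
    """Client side: check that the code matches this machine."""
    expected = registration_code(machine_id(hardware_info), vendor_secret)
    return hmac.compare_digest(code.upper(), expected)


if __name__ == "__main__":
    secret = b"demo-secret"  # hypothetical vendor key
    mid = machine_id()
    code = registration_code(mid, secret)
    print(mid, code, is_registered(code, secret))
```

    A code issued for one machine fails the check on another because the machine identifier differs. Note that this sketch keeps the secret inside the client, which can be extracted from the program; a scheme based on public-key signatures avoids shipping the signing key.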
