Research on Several Problems of Chinese Character Keyboard Input and Non-keyboard Input (汉字键盘输入和非键盘输入若干问题研究)

English title: Research on Several Problems of Chinese Character Keyboard Input and Non-keyboard Input
Author: 张建勋
Thesis level: Master's
Discipline: Computer Application Technology
Keywords (Chinese): 汉字输入法; 键盘输入; 联机手写汉字输入; 模糊匹配; 字符串匹配; 笔划特征点
Keywords (English): Chinese Character Input Method; Keyboard Input; OLCCR; Fuzzy Matching; String Matching; Stroke Feature Points
Degree year: 2003
Advisor: 吴建国
Discipline code: 081203
Degree-granting institution: 安徽大学 (Anhui University)
Submission date: 2003-05-05

Abstract

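This thesis studies the implementation of natural Chinese character input methods, covering both keyboard input and online handwritten input, and addresses several problems that arise in implementing them. A natural input method here means one that can be mastered without much learning or training. Starting from the structure of Chinese characters, the thesis classifies character strokes, encodes the characters of the national standard level-2 character set by their strokes, compiles a stroke coding dictionary, and gathers statistics on stroke information. Based on the dictionary and these statistics, it designs a stroke-coded input method and a keyboard that implements it.
     Because Chinese characters have many strokes on average, entering every stroke of a character makes the input code too long. To allow input codes to be entered incompletely, the thesis has to solve the problem of matching a string pattern that contains fuzzy input symbols against a set of strings. Chapter 3 proposes a pattern matching algorithm for very large string sets, gives its concrete implementation and complexity analysis, and proposes an optimized retrieval tree (trie) structure for storing the string set so as to save memory. To speed up the algorithm, it also draws on ideas from KMP pattern matching and finite automaton matching.
     To achieve natural Chinese character input on a keyboard, the thesis proposes a new input method called "stroke simulation", which is particularly suited to the numeric keypads widely used on current information products. Instead of entering strokes directly on the keyboard, the user enters the feature points of each stroke, such as its start point, turning points, and end point, according to the stroke's shape and writing direction. Strokes can be entered continuously with no separator key between them, and strokes can be deleted backwards when an input error occurs. The method can be regarded as a transition from keyboard input to online handwritten input.
     Building on this work, the thesis finally gives a preliminary implementation of online handwritten Chinese character input. Based on the stroke coding dictionary and following the "stroke simulation" idea, it first recognizes strokes and then recognizes characters. Stroke recognition extracts feature points from the coordinate points of the pen trace; the feature points form stroke segments, the segments form strokes, and the stroke sequence is finally used to recognize the character.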

Abstract (English)

This thesis studies the implementation of Chinese character input methods that can be mastered without specialized learning, including keyboard input and OLCCR. After studying the structure of Chinese characters, we classified the strokes of Chinese characters and encoded each character as a stroke string. Our basic work also included compiling a stroke coding dictionary for the Chinese National Standard Code for Information Interchange (GB 2312-80) and compiling statistics on stroke information. On the basis of the coding dictionary and the statistical data, we devised a keyboard input method and designed a key arrangement for it.
    To avoid having to enter the complete stroke string of a Chinese character, input codes must be allowed to omit some elements. We therefore propose a fast pattern matching algorithm on massive string sets to solve the problem of fuzzy matching between a string pattern and a string set. To make the algorithm efficient in space and time, we developed an optimized trie structure to store the string set and introduced ideas from Knuth-Morris-Pratt (KMP) and finite automaton (FA) string matching into the algorithm. The thesis describes the algorithm in detail and analyzes its space and running-time costs.
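
The idea of matching an incomplete stroke code against a dictionary stored in a trie can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the KMP and finite-automaton optimizations are not reproduced, the stroke codes in `DEMO_CODES` (1 = heng, 2 = shu, 3 = pie, 4 = na/dian) are a hypothetical classification, and the use of `*` as the fuzzy input symbol meaning "zero or more omitted strokes" is an assumption.

```python
# Hypothetical stroke codes (assumption): 1=heng, 2=shu, 3=pie, 4=na/dian.
DEMO_CODES = {
    "十": "12", "人": "34", "大": "134", "木": "1234",
    "天": "1134", "土": "121", "王": "1121",
}


class TrieNode:
    def __init__(self):
        self.children = {}  # stroke code -> TrieNode
        self.chars = []     # characters whose full stroke string ends here


def build_trie(dictionary):
    """Insert every (character, stroke string) pair into a trie."""
    root = TrieNode()
    for char, strokes in dictionary.items():
        node = root
        for s in strokes:
            node = node.children.setdefault(s, TrieNode())
        node.chars.append(char)
    return root


def fuzzy_match(root, pattern):
    """Return the characters whose stroke string matches `pattern`.

    '*' (assumed fuzzy symbol) stands for zero or more omitted strokes;
    every other symbol must match one stroke exactly.
    """
    results = set()
    seen = set()  # (node, pattern index) states already explored

    def walk(node, i):
        state = (id(node), i)
        if state in seen:
            return
        seen.add(state)
        if i == len(pattern):
            results.update(node.chars)
            return
        sym = pattern[i]
        if sym == "*":
            walk(node, i + 1)              # '*' absorbs no further stroke
            for child in node.children.values():
                walk(child, i)             # '*' absorbs one more stroke
        elif sym in node.children:
            walk(node.children[sym], i + 1)

    walk(root, 0)
    return sorted(results)


trie = build_trie(DEMO_CODES)
print(fuzzy_match(trie, "1*4"))  # ['大', '天', '木']
```

Sharing common prefixes in the trie is what keeps the dictionary compact; the memoized `(node, index)` states keep the wildcard search from revisiting the same subtree for the same pattern position.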
    To input Chinese characters naturally from a keyboard, we next present a new input method named "stroke simulation". Its main idea is to enter the feature points of a stroke, such as its start point, turning points, and end point, instead of entering the stroke directly from the keyboard. The trail of feature points extracted from a stroke should be able to reproduce the shape of the stroke. The most attractive feature of this method is that users can enter strokes continuously without an extra key to separate consecutive strokes. In addition, the method supports deleting strokes backwards so that users can correct input errors. Besides PCs, the method is suitable for portable devices with numeric keypads, such as mobile phones and electronic dictionaries. It can be viewed as a transitional step towards OLCCR.
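
As a rough illustration of entering feature points on a numeric keypad, the Python sketch below maps the keys 1 to 9 to a 3x3 grid and derives a stroke type from the direction between consecutive keys. The keypad layout, the direction labels, and the function names are all assumptions made for this example, and the sketch classifies one stroke at a time; how the thesis segments continuously entered strokes without a separator key is not described in the abstract and is not attempted here.

```python
# Assumed phone-style layout: 1 2 3 / 4 5 6 / 7 8 9, as (column, row).
KEY_POS = {str(r * 3 + c + 1): (c, r) for r in range(3) for c in range(3)}

# Coarse writing directions (hypothetical labels), rows growing downward.
DIRECTION_NAMES = {
    (1, 0): "heng",   # left to right
    (0, 1): "shu",    # top to bottom
    (-1, 1): "pie",   # towards lower left
    (1, 1): "na",     # towards lower right
}


def sign(v):
    return (v > 0) - (v < 0)


def segment_direction(a, b):
    """Direction label for the segment from key `a` to key `b`."""
    (x0, y0), (x1, y1) = KEY_POS[a], KEY_POS[b]
    return DIRECTION_NAMES.get((sign(x1 - x0), sign(y1 - y0)), "other")


def classify_stroke(keys):
    """Classify one stroke given as start key, turning keys, end key."""
    if len(keys) < 2:
        raise ValueError("a stroke needs at least a start and an end point")
    dirs = [segment_direction(a, b) for a, b in zip(keys, keys[1:])]
    if len(dirs) == 1:
        return dirs[0]
    return "zhe:" + "+".join(dirs)  # a stroke with turning points


print(classify_stroke("46"))   # heng
print(classify_stroke("139"))  # zhe:heng+shu
```

Under these assumptions a backward delete amounts to dropping the most recently classified stroke from the list of strokes entered so far.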
    Our final contribution is a preliminary implementation of OLCCR based on the stroke coding dictionary described above. Our method extracts feature points from the trails of the strokes; the feature points form stroke segments, and the stroke segments form strokes. We recognize handwritten Chinese characters from the stroke information obtained from the original input points.
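
The pipeline from pen trace to feature points to stroke segments can be sketched in Python as follows. The abstract does not say how feature points are extracted, so this example substitutes the well-known Douglas-Peucker polyline simplification; the tolerance `eps`, the angle bins, and the direction labels are assumptions chosen for illustration.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length


def feature_points(points, eps=5.0):
    """Keep start point, turning points, and end point (Douglas-Peucker)."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = max(
        ((i, point_line_distance(points[i], a, b))
         for i in range(1, len(points) - 1)),
        key=lambda t: t[1],
    )
    if dmax <= eps:
        return [a, b]  # the trace is close enough to a straight segment
    left = feature_points(points[: idx + 1], eps)
    right = feature_points(points[idx:], eps)
    return left[:-1] + right  # the split point appears only once


def segment_label(a, b):
    """Coarse direction of a segment; y grows downward as on a screen."""
    angle = math.degrees(math.atan2(b[1] - a[1], b[0] - a[0])) % 360
    if angle < 22.5 or angle >= 337.5:
        return "heng"   # left to right
    if angle < 67.5:
        return "na"     # towards lower right
    if angle < 112.5:
        return "shu"    # top to bottom
    if angle < 157.5:
        return "pie"    # towards lower left
    return "other"


def stroke_segments(points, eps=5.0):
    """Label the segments between consecutive feature points."""
    fps = feature_points(points, eps)
    return [segment_label(a, b) for a, b in zip(fps, fps[1:])]


# A synthetic trace: right along the top, then straight down.
trace = [(x, 0) for x in range(0, 101, 10)] + [(100, y) for y in range(10, 101, 10)]
print(stroke_segments(trace))  # ['heng', 'shu']
```

The resulting label sequence for each stroke could then be mapped to a stroke code and looked up in a stroke coding dictionary, which is the step the abstract describes as recognizing the character from its stroke sequence.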

