A compression algorithm using integrated record information for translation dictionaries


Authors: Kadoya, Y.; Fuketa, M.; Atlam, El-Sayed; Morita, K.; Sumitomo, T.; Aoe, J.
Keywords: Natural language dictionaries; Dictionary search; Individual dictionaries compound words; Trie structure; Natural language processing system; Double-array
Journal: Information Sciences
Publication year: 2004
Journal code: 62_00200255
Category: cp
Published: October 19, 2004
Volume: 165
Issue: 3-4
Pages: 171-186
File size: 234 K

Abstract

A trie structure is a well-known method for retrieving entries from natural language (NL) dictionaries in morphological analysis, machine translation, and similar applications. As a variety of NL processing systems have been developed, several dictionaries stored on the same computer's hard disk often share a great deal of common information. This paper presents a method of merging individual dictionaries into a generalized dictionary. The method reduces the total dictionary size and lets each individual dictionary be used by other applications as well. For key retrieval in the merged dictionary, many keys are long strings, such as compound words and idioms, which take up much space when a huge set of keys is stored in a trie. A fast trie structure called the double-array structure is therefore introduced, and a compression scheme is proposed that replaces long strings with the corresponding leaf node numbers of the trie. Although each integrated record becomes larger, the total number of records decreases sharply because common information is merged. Experimental results for nine dictionaries show that the new method is more efficient than previous ones.
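The abstract names the double-array trie and the idea of replacing long strings with leaf node numbers but gives no implementation details. The Python sketch below is a minimal illustration under our own assumptions, not the authors' method: a static double-array (BASE/CHECK arrays, where child t of node s via character c satisfies t = BASE[s] + code(c) and CHECK[t] = s) built with a simple first-fit base search, a terminal symbol `\x00` appended to every key, and one integrated record per distinct key, keyed by its leaf node number. The names `DoubleArrayTrie`, `leaf_of`, `key_of`, `merge`, and the `min_len` threshold that decides which strings count as "long" are all hypothetical.

```python
# Hypothetical sketch; all names and the min_len rule are illustrative assumptions.
END = "\x00"  # terminal symbol appended to every key (assumed absent from real keys)


class DoubleArrayTrie:
    """Static double-array trie: child t of node s via char c satisfies
    t == base[s] + code[c] and check[t] == s. Index 1 is the root."""

    def __init__(self, keys):
        suffixed = sorted({k + END for k in keys})
        alphabet = sorted({ch for k in suffixed for ch in k})
        self.code = {ch: i + 1 for i, ch in enumerate(alphabet)}  # codes start at 1
        self.char = {i: ch for ch, i in self.code.items()}
        self.max_code = len(alphabet)
        self.base = [0, 0]
        self.check = [0, -1]  # 0 marks a free slot; -1 marks the root as used
        self._build(1, suffixed, 0)

    def _ensure(self, n):
        # Grow both arrays so that index n is valid.
        if n >= len(self.base):
            extra = n + 1 - len(self.base)
            self.base.extend([0] * extra)
            self.check.extend([0] * extra)

    def _build(self, s, keys, depth):
        children = sorted({k[depth] for k in keys})
        b = 1
        while True:  # first-fit search for a base where every child slot is free
            self._ensure(b + self.max_code)
            if all(self.check[b + self.code[c]] == 0 for c in children):
                break
            b += 1
        self.base[s] = b
        for c in children:  # reserve all sibling slots before descending
            self.check[b + self.code[c]] = s
        for c in children:
            if c != END:  # END slots are leaves; their index is the leaf number
                sub = [k for k in keys if k[depth] == c]
                self._build(b + self.code[c], sub, depth + 1)

    def leaf_of(self, key):
        """Return the leaf node number of key, or None if key is absent."""
        s = 1
        for c in key + END:
            if c not in self.code:
                return None
            t = self.base[s] + self.code[c]
            if t >= len(self.check) or self.check[t] != s:
                return None
            s = t
        return s

    def key_of(self, leaf):
        """Recover a key from its leaf number by following check[] up to the root."""
        chars = []
        t = leaf
        while t != 1:
            s = self.check[t]
            chars.append(self.char[t - self.base[s]])
            t = s
        return "".join(reversed(chars[1:]))  # chars[0] is END


def merge(dictionaries, min_len=4):
    """Merge {dict_name: {key: value}} into one trie plus one record per key.
    A string value that is itself a key and at least min_len characters long
    is replaced by its leaf node number (hypothetical compression rule)."""
    trie = DoubleArrayTrie(set().union(*(d.keys() for d in dictionaries.values())))
    records = {}
    for name, entries in dictionaries.items():
        for key, value in entries.items():
            ref = trie.leaf_of(value) if len(value) >= min_len else None
            records.setdefault(trie.leaf_of(key), {})[name] = value if ref is None else ref
    return trie, records


if __name__ == "__main__":
    dicts = {
        "pos": {"car": "noun", "automobile": "noun", "drive": "verb"},
        "synonym": {"car": "automobile", "automobile": "car"},
    }
    trie, records = merge(dicts)
    car = records[trie.leaf_of("car")]
    print(car)  # the synonym entry is an int leaf number, not the string
    print(trie.key_of(car["synonym"]))  # automobile
```

In this toy example, five (dictionary, key) entries collapse into three integrated records, one per distinct key, and the long string "automobile" inside the record for "car" is stored as a leaf number that `key_of` can expand back. The first-fit base search is chosen for brevity; practical double-array builders use faster free-slot management.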
