Automatic lexeme acquisition for a multilingual medical subword thesaurus
Abstract
Purpose

We present a method for the automated acquisition of a multilingual medical lexicon (for Spanish, French and Swedish) to be used within the framework of a medical cross-language text retrieval system.

Methods

For the lexical acquisition process, we incorporate seed lexicons and lists of trusted term translations derived from the UMLS Metathesaurus. The seed lexicons for Spanish, French and Swedish are generated automatically from (previously manually constructed) Portuguese, German and English sources by simple string transformations. Lexical and semantic hypotheses are then validated by processing pairs of term translations. In a final step, we use the cleaned list of “approved” translations to augment the target dictionaries step by step, analyzing co-occurrence patterns in the parallel corpora to identify hypothesized translation equivalents that cannot be derived by simple character substitutions.
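The three acquisition steps (string-transformed seed lexicon, validation against trusted term pairs, co-occurrence-based extension) can be illustrated with a minimal Python sketch. It is not the authors' implementation and uses no UMLS data. Everything in it is a hypothetical stand-in: the toy lexicons, the identifiers `M1` to `M3`, the two Portuguese-to-Spanish rewrite rules, the stopword list, the longest-match segmenter, and the `min_count` threshold used to accept co-occurring pairs.

```python
from collections import Counter

# Hypothetical toy lexicons: subword -> semantic identifier (MID).
EN_LEXICON = {"inflammation": "M1", "itis": "M1", "liver": "M2",
              "hepat": "M2", "heart": "M3", "card": "M3"}
PT_LEXICON = {"inflamação": "M1", "ite": "M1", "fígado": "M2",
              "hepat": "M2", "card": "M3"}
# Hypothetical Portuguese -> Spanish rewrite rules, applied in order.
PT_TO_ES_RULES = [("ção", "ción"), ("ite", "itis")]
# Assumed Spanish function words that carry no semantic identifier.
ES_STOPWORDS = {"de", "del", "el", "la"}


def seed_lexicon(source, rules):
    """Step 1: derive target subwords by plain string substitution."""
    seed = {}
    for subword, mid in source.items():
        for old, new in rules:
            subword = subword.replace(old, new)
        seed[subword] = mid
    return seed


def segment(word, lexicon):
    """Split `word` into lexicon subwords (longest match first, with
    backtracking); return None if no complete segmentation exists."""
    if not word:
        return []
    for end in range(len(word), 0, -1):
        if word[:end] in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [word[:end]] + rest
    return None


def analyze(term, lexicon, stopwords=frozenset()):
    """Return (subwords found, tokens that could not be segmented)."""
    found, unknown = [], []
    for token in term.lower().split():
        if token in stopwords:
            continue
        parts = segment(token, lexicon)
        if parts is None:
            unknown.append(token)
        else:
            found.extend(parts)
    return found, unknown


def validate(pairs, src_lex, seed):
    """Step 2: approve seed subwords whose MIDs match those of a trusted
    translation; set aside pairs that contain unknown target tokens."""
    approved, unresolved = {}, []
    for src_term, tgt_term in pairs:
        src_parts, src_unknown = analyze(src_term, src_lex)
        if src_unknown:
            continue  # source side must be fully covered
        tgt_parts, tgt_unknown = analyze(tgt_term, seed, ES_STOPWORDS)
        if tgt_unknown:
            unresolved.append((src_term, tgt_term))
        elif {src_lex[p] for p in src_parts} == {seed[p] for p in tgt_parts}:
            approved.update((p, seed[p]) for p in tgt_parts)
    return approved, unresolved


def propose(unresolved, src_lex, approved, min_count=2):
    """Step 3: count co-occurrences of unknown target tokens with source
    MIDs that approved subwords leave unexplained; keep frequent pairs."""
    counts = Counter()
    for src_term, tgt_term in unresolved:
        src_parts, _ = analyze(src_term, src_lex)
        tgt_parts, unknown = analyze(tgt_term, approved, ES_STOPWORDS)
        unexplained = ({src_lex[p] for p in src_parts}
                       - {approved[p] for p in tgt_parts})
        counts.update((tok, mid) for tok in unknown for mid in unexplained)
    return {tok: mid for (tok, mid), n in counts.items() if n >= min_count}


# Hypothetical trusted English -> Spanish term pairs (not real UMLS data).
PAIRS = [("hepatitis", "hepatitis"), ("carditis", "carditis"),
         ("inflammation", "inflamación"),
         ("liver inflammation", "inflamación del hígado"),
         ("liver", "hígado"), ("heart", "corazón")]

seed = seed_lexicon(PT_LEXICON, PT_TO_ES_RULES)
approved, unresolved = validate(PAIRS, EN_LEXICON, seed)
new_entries = propose(unresolved, EN_LEXICON, approved)
print(sorted(approved))  # → ['card', 'hepat', 'inflamación', 'itis']
print(new_entries)       # → {'hígado': 'M2'}
```

In this toy run the unattested seed hypothesis `fígado` (a Portuguese form with no Spanish counterpart) is never approved, while `hígado` is proposed because it co-occurs twice with the otherwise unexplained identifier `M2`; `corazón` appears only once and stays below the assumed threshold. The sketch performs a single pass; the step-by-step augmentation described above would presumably feed newly accepted entries back into further rounds.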

Results

An existing multilingual lexicon for the medical domain with about 60,000 entries for English, German, and Portuguese was automatically augmented by more than 17,000 new lexemes for Spanish, French, and Swedish.

Conclusions

Our approach constitutes a promising method for the automated creation of new lexicon entries and their linkage to semantic identifiers.
