On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks
Abstract

We analyze the use of character n-grams both as indexing and translation units for CLIR tasks.

We study their effective application and consistency across languages.

We use an algorithm of our own for parallel text alignment at the subword level.

Tests were performed for seven languages, with English as the target language.

Results confirm the feasibility and consistency of the approach, show that its validity is not tied to a given implementation, and demonstrate remarkable robustness.
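
As a rough illustration of what a character n-gram unit looks like, the sketch below splits each word into overlapping character n-grams. The function name `char_ngrams`, the default size `n=4`, and the choice to keep n-grams within word boundaries are assumptions made for this example; the abstract does not specify these details, and this is not the alignment algorithm the authors describe.

```python
def char_ngrams(text, n=4):
    """Split each whitespace-separated word into overlapping character n-grams.

    Hypothetical helper for illustration only: n=4 and the word-internal
    windowing are assumptions, not settings taken from the paper.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            # Words no longer than n are kept whole as a single unit.
            grams.append(word)
        else:
            # Slide a window of width n over the word, one character at a time.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(char_ngrams("tomatoes", n=4))
# → ['toma', 'omat', 'mato', 'atoe', 'toes']
```

Because such units overlap, two spellings that share a stem (for example, an inflected and an uninflected form) still share several n-grams, which is the usual intuition behind using them as indexing units.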
