Using Dictionary and Lemmatizer to Improve Low Resource English-Malay Statistical Machine Translation System
详细信息    查看全文
文摘
Statistical Machine Translation (SMT) is one of the most popular methods for machine translation. In this work, we carried out English-Malay SMT by acquiring an English-Malay parallel corpus in computer science domain. On the other hand, the training parallel corpus is from a general domain. Thus, there will be a lot of out of vocabulary during translation. We attempt to improve the English-Malay SMT in computer science domain using a dictionary and an English lemmatizer. Our study shows that a combination of approach using bilingual dictionary and English lemmatization improves the BLEU score for English to Malay translation from 12.90 to 15.41.
NGLC 2004-2010.National Geological Library of China All Rights Reserved.
Add:29 Xueyuan Rd,Haidian District,Beijing,PRC. Mail Add: 8324 mailbox 100083
For exchange or info please contact us via email.