Morphological decomposition in Arabic ASR systems

Authors: F. Diehl (fd257@eng.cam.ac.uk); M.J.F. Gales (mjfg@eng.cam.ac.uk); M. Tomalin (mt126@eng.cam.ac.uk); P.C. Woodland (pcw@eng.cam.ac.uk)
Keywords: Automatic Speech Recognition; Arabic; Morphology; Pronunciation probabilities
Journal: Computer Speech & Language
Publication year: 2012
Journal code: 124_08852308
Category: cp
Publication date: August 2012
Volume: 26
Issue: 4
Pages: 229-243
File size: 270 K

Abstract

In recent years, the use of morphological decomposition strategies for Arabic Automatic Speech Recognition (ASR) has become increasingly popular. Systems trained on morphologically decomposed data are often used in combination with standard word-based approaches, and they have been found to yield consistent performance improvements. The present article contributes to this ongoing research endeavour by exploring the use of the ‘Morphological Analysis and Disambiguation for Arabic’ (MADA) tools for this purpose. System integration issues concerning language modelling and dictionary construction, as well as the estimation of pronunciation probabilities, are discussed. In particular, a novel solution for morpheme-to-word conversion is presented that uses an N-gram Statistical Machine Translation (SMT) approach. System performance is investigated within a multi-pass adaptation/combination framework. All the systems described in this paper are evaluated on an Arabic large vocabulary speech recognition task that includes both Broadcast News and Broadcast Conversation test data. It is shown that the use of MADA-based systems, in combination with word-based systems, can reduce the word error rate (WER) by up to 8.1% relative.
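
The abstract does not spell out how decoded morphemes are turned back into words. As a minimal illustration of the problem only (this is not the authors' N-gram SMT method), the Python sketch below assumes a hypothetical convention in which clitics are marked with '+' in Buckwalter-style transliteration: a token ending in '+' is a prefix attached to the next token, and a token starting with '+' is a suffix attached to the previous one. It regroups marked morphemes into word units, prefers surface forms observed in aligned (morpheme sequence, word) training pairs via a simple lookup table, and otherwise falls back to plain concatenation. The function names `group_morphemes`, `learn_table`, `naive_join`, and `morphemes_to_words` and the training pairs are hypothetical.

```python
from collections import Counter, defaultdict


def group_morphemes(tokens):
    """Group '+'-marked morphemes into word-level units.

    Hypothetical convention: 'w+' is a prefix that attaches to the
    following token; '+hm' is a suffix that attaches to the preceding one.
    """
    groups = []
    attach_next = False  # True if the previous token was a prefix
    for tok in tokens:
        if groups and (tok.startswith("+") or attach_next):
            groups[-1].append(tok)  # continue the current word
        else:
            groups.append([tok])  # start a new word
        attach_next = tok.endswith("+")
    return groups


def learn_table(pairs):
    """Map each morpheme tuple to its most frequent surface word.

    `pairs` is an iterable of (morpheme_sequence, word) taken from
    aligned training data (assumed to be available).
    """
    counts = defaultdict(Counter)
    for morphs, word in pairs:
        counts[tuple(morphs)][word] += 1
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}


def naive_join(group):
    """Concatenate morphemes after stripping the '+' markers."""
    return "".join(tok.strip("+") for tok in group)


def morphemes_to_words(tokens, table):
    """Convert a morpheme sequence to words: lookup first, then concatenation."""
    return [table.get(tuple(g), naive_join(g)) for g in group_morphemes(tokens)]


if __name__ == "__main__":
    # Arabic orthography drops the alif when li- precedes the article al-,
    # so naive concatenation of 'l+ Alktb' gives 'lAlktb' rather than 'llktb'.
    table = learn_table([(("l+", "Alktb"), "llktb")])
    print(morphemes_to_words(["l+", "Alktb", "w+", "qAl"], table))
```

The lookup table handles only morpheme sequences seen in training and ignores context, and the plain-concatenation fallback breaks on orthographic changes such as the li- + al- case above. Those gaps are the kind of problem a context-aware approach like the N-gram SMT conversion described in the abstract is meant to address.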