Compact representations and unsupervised training of discriminative language models.
Detailed Information
  • Author: Xu, Puyang
  • Degree: Ph.D.
  • Year: 2013
  • Advisor/Committee: Khudanpur, Sanjeev (advisor); Hermansky, Hynek (committee member); Karakos, Damianos (committee member)
  • Institution: The Johns Hopkins University
  • Department: Electrical and Computer Engineering
  • ISBN: 9781303395161
  • CBH: 3572751
  • Country: USA
  • Language: English
  • File Size: 4977975
  • Pages: 141
Abstract
Statistical language models are a crucial component of automatic speech recognition (ASR) systems: they assign a priori probability to the candidate word sequences under consideration by the system. Conventionally, an LM is trained from a text corpus using standard statistical criteria such as maximum likelihood (ML). Discriminative training of an LM, by contrast, entails using an initial ASR system to identify a set of competing candidate transcriptions for each utterance in a speech corpus, and adjusting the LM parameters to favor the correct transcriptions over the incorrect candidates. A discriminatively trained language model (DLM) is demonstrably complementary to an ML-trained model in improving ASR accuracy. Two important obstacles to the widespread use of DLMs are addressed in this dissertation: having to store a much larger number of parameters than a typical ML-trained model, and requiring transcribed speech to estimate model parameters.

DLMs tend to have many more parameters than ML-trained LMs, mainly because they capture statistical information from an enormous number of incorrect ASR hypotheses in addition to statistics from the correct transcriptions. Their memory footprint is therefore often prohibitively large. Three novel techniques are proposed to represent DLMs compactly: feature randomization that results in parameter sharing, re-parameterization of the DLM as a convolutional neural network, and phone-level rather than word-level parameterization of the DLM. All three techniques reduce the size of the model by orders of magnitude with negligible loss in model performance.

Unsupervised training methods for DLMs, i.e., discriminative training methods that do not require transcribed speech, are also developed by observing that the core requirement of discriminative training is a set of incorrect competitors for each (correct) sentence in a text corpus. A novel approach for simulating competitors is proposed that uses phrasal cohorts: alternative, acoustically confusable phrases that the ASR system is likely to consider for any phrase in the original sentence. Competing candidate transcriptions can thus be generated from text alone, without requiring transcribed speech. The efficacy of this approach is investigated on a range of state-of-the-art ASR systems. It is demonstrated empirically that, depending on the underlying ASR system, unsupervised discriminative training using simulated confusions achieves between 15% and 60% of the improvement obtained by supervised discriminative training of language models.
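The first compression technique mentioned in the abstract, feature randomization with parameter sharing, can be illustrated with a minimal sketch in the spirit of feature hashing: n-gram features are mapped through a hash function into a fixed-size weight table, so colliding features share a weight and the model footprint stays bounded regardless of how many distinct hypotheses are seen. The table size, the MD5-based hash, and the structured-perceptron-style update below are illustrative assumptions, not the dissertation's exact formulation.

```python
import hashlib

TABLE_SIZE = 2 ** 20           # fixed parameter budget (assumed, not from the thesis)
weights = [0.0] * TABLE_SIZE   # all DLM parameters live in this shared table

def ngram_features(words, order=3):
    """Extract all n-gram features up to `order` from a word sequence."""
    feats = []
    for n in range(1, order + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats

def hashed_index(feature):
    """Randomize a feature into the fixed-size table; collisions share a weight."""
    digest = hashlib.md5(feature.encode("utf-8")).hexdigest()
    return int(digest, 16) % TABLE_SIZE

def score(words):
    """Discriminative LM score of a hypothesis under the hashed weights."""
    return sum(weights[hashed_index(f)] for f in ngram_features(words))

def perceptron_update(reference, hypothesis, lr=1.0):
    """Favor the correct transcription over a competing hypothesis whenever
    the competitor scores at least as high (perceptron-style update)."""
    if score(hypothesis) >= score(reference):
        for f in ngram_features(reference):
            weights[hashed_index(f)] += lr
        for f in ngram_features(hypothesis):
            weights[hashed_index(f)] -= lr

# Toy usage: one reference transcription and one competing hypothesis.
perceptron_update("it is easy to recognize speech".split(),
                  "it is easy to wreck a nice beach".split())
```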
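The unsupervised training idea, generating competitors from text via phrasal cohorts, can be sketched in the same spirit. The cohort table below is a hand-written toy stand-in and the sampling scheme is a simplifying assumption; in the dissertation, cohorts are derived from the acoustic confusions the ASR system actually makes. Candidates produced this way would then play the role of the incorrect competitors in a discriminative update such as the perceptron sketch above.

```python
import random

# Hypothetical cohort table: phrase -> acoustically confusable alternatives.
# In practice these cohorts would be estimated from the ASR system's behavior.
COHORTS = {
    "recognize speech": ["wreck a nice beach"],
    "their": ["there", "they're"],
    "to": ["two", "too"],
}

def simulate_competitors(sentence, num_candidates=5, rng=random):
    """Generate pseudo ASR hypotheses for a correct sentence by substituting
    phrases with members of their cohorts -- no transcribed speech needed."""
    candidates = set()
    for _ in range(num_candidates):
        words = sentence.split()
        out, i = [], 0
        while i < len(words):
            replaced = False
            for span in (2, 1):  # try longer phrases first
                phrase = " ".join(words[i:i + span])
                if phrase in COHORTS and rng.random() < 0.5:
                    out.append(rng.choice(COHORTS[phrase]))
                    i += span
                    replaced = True
                    break
            if not replaced:
                out.append(words[i])
                i += 1
        candidate = " ".join(out)
        if candidate != sentence:
            candidates.add(candidate)
    return sorted(candidates)

print(simulate_competitors("it is easy to recognize speech"))
```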
