Abstract
In this work, buried Markov models (BMMs) are introduced. In a BMM, the Markov chain state at time t determines the conditional independence patterns that exist between random variables lying within a local time window surrounding t. This model is motivated by, and can be fully described in terms of, graphical models, a general formalism for describing families of probability distributions. In the paper, it is shown how information-theoretic criterion functions can be used to induce sparse, discriminative, and class-conditional network structures that yield an optimal approximation to the class posterior probability and are therefore useful for classification tasks such as speech recognition. Using a new structure-learning heuristic, the resulting structurally discriminative models are tested on a medium-vocabulary isolated-word speech recognition task. It is demonstrated that discriminatively structured BMMs, when trained in a maximum likelihood setting using EM, can outperform both hidden Markov models (HMMs) and other dynamic Bayesian networks with a similar number of parameters.