Dealing with Unexpected Words in Automatic Recognition of Speech

Author: Hynek Hermansky (1, 2)
Keywords: out-of-vocabulary words; automatic recognition of speech; parallel model of top-down and bottom-up human information extraction
Publication: Lecture Notes in Computer Science
Year: 2011
Volume: 6836
Issue: 1
Pages: 1-15
Full-text size: 324.9 KB
Affiliations: 1. Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, Maryland, USA; 2. Brno University of Technology, Czech Republic
Category: Computer Science
Subjects: Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349

Abstract

Unexpected words attract a listener’s attention. They are information-rich, and getting them right is important for human communication. In the automatic recognition of speech (ASR), words that are not in the machine's expected lexicon are typically replaced by acoustically similar but nevertheless wrong words. The article discusses the reasons for this undesirable behavior, describes some known examples of how human speech perception deals with unexpected words and what they imply, and proposes an alternative ASR architecture that could alleviate some of the problems with unexpected acoustic inputs. Some published experimental results obtained with this alternative architecture are given.
