Noise and Channel Normalized Cepstral Features for Far-speech Recognition

Authors: Michal Borsky (22)
Petr Mizera (22)
Petr Pollak (22)
Keywords: distorted speech; far; speech recognition; cepstral features; spectral subtraction; cepstral mean normalization
Publication: Lecture Notes in Computer Science
Year: 2013
Volume: 8113
Issue: 1
Pages: 249-256
Author affiliations: Michal Borsky (22)
Petr Mizera (22)
Petr Pollak (22)

22. Faculty of Electrical Engineering, Czech Technical University in Prague, K13131 CTU FEE, Technická 2, 166 27, Prague 6, Czech Republic

Abstract

The paper analyses suitable features for the recognition of distorted speech. The aim is to explore the use of a command ASR system when speech is recorded by far-distance microphones and may be affected by strong additive and convolutional noise. The paper analyses the possible contribution of basic spectral subtraction coupled with cepstral mean normalization to minimizing the influence of the distortion present in such a far-talk channel. The results are compared with a reference close-talk speech recognition system and show an improvement in word error rate (WER) for channels with low or medium SNR. Using the combination of these basic techniques, a word error rate reduction (WERR) of 55.6% was obtained for the medium-distance channel and a WERR of 22.5% for the far-distance channel.
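
The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the magnitude spectra and cepstral coefficients have already been computed frame by frame, that the first few frames contain only noise, and that WERR denotes the relative reduction of WER against a baseline. The function names, the `noise_frames` and `floor` parameters, and the example values are all hypothetical.

```python
from typing import List

Frames = List[List[float]]


def spectral_subtraction(mag_frames: Frames, noise_frames: int = 5,
                         floor: float = 0.01) -> Frames:
    """Basic magnitude spectral subtraction (targets additive noise).

    Assumption: the first `noise_frames` frames are speech-free, so their
    average serves as the noise magnitude estimate.
    """
    n = min(noise_frames, len(mag_frames))
    bins = len(mag_frames[0])
    noise = [sum(f[k] for f in mag_frames[:n]) / n for k in range(bins)]
    # Subtract the estimate; keep a small fraction of the original
    # magnitude as a floor so that no bin becomes negative.
    return [[max(f[k] - noise[k], floor * f[k]) for k in range(bins)]
            for f in mag_frames]


def cepstral_mean_normalization(cep_frames: Frames) -> Frames:
    """Subtract the per-coefficient mean over the utterance.

    A stationary channel (convolutional distortion) appears as a roughly
    constant offset in the cepstral domain, which this step removes.
    """
    n = len(cep_frames)
    dims = len(cep_frames[0])
    mean = [sum(f[d] for f in cep_frames) / n for d in range(dims)]
    return [[f[d] - mean[d] for d in range(dims)] for f in cep_frames]


def werr(wer_baseline: float, wer_system: float) -> float:
    """Relative WER reduction in percent (assumed meaning of WERR)."""
    return 100.0 * (wer_baseline - wer_system) / wer_baseline
```

In a full front end, spectral subtraction would run on the short-time spectrum before the filter bank and cepstral transform, and the normalization would then be applied to the resulting cepstral vectors of each utterance.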
