Author affiliations: Michal Borsky (22), Petr Mizera (22), Petr Pollak (22)
22. Faculty of Electrical Engineering, Czech Technical University in Prague, K13131 CTU FEE, Technická 2, 166 27, Prague 6, Czech Republic
Abstract
The paper analyses suitable features for distorted speech recognition. The aim is to explore the use of a command ASR system when speech is recorded with far-distance microphones, possibly with strong additive and convolutional noise. The paper analyses how much basic spectral subtraction coupled with cepstral mean normalization can contribute to minimizing the influence of the distortion present in such a far-talk channel. The results are compared with a reference close-talk speech recognition system and show an improvement in WER for channels with low or medium SNR. Using the combination of these basic techniques, a WERR of 55.6% was obtained for the medium-distance channel and a WERR of 22.5% for the far-distance channel.
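The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the noise estimate taken from the first few frames, the spectral floor value, and the reading of WERR as relative WER reduction against a baseline are all assumptions made for illustration. The steps between the two functions (filter bank, logarithm, cepstral transform) are omitted.

```python
def spectral_subtraction(magnitudes, noise_frames=2, floor=0.01):
    """Basic magnitude spectral subtraction (illustrative sketch).

    magnitudes: list of frames, each a list of magnitude-spectrum bins.
    Assumption: the first `noise_frames` frames contain noise only.
    """
    n_bins = len(magnitudes[0])
    # Average noise magnitude per frequency bin over the leading frames.
    noise = [
        sum(frame[k] for frame in magnitudes[:noise_frames]) / noise_frames
        for k in range(n_bins)
    ]
    # Subtract the noise estimate; keep a small fraction of the original
    # magnitude as a floor so that no bin becomes negative.
    return [
        [max(frame[k] - noise[k], floor * frame[k]) for k in range(n_bins)]
        for frame in magnitudes
    ]


def cepstral_mean_normalization(cepstra):
    """Subtract the per-coefficient mean over all frames of an utterance.

    A stationary convolutional distortion is roughly additive in the
    cepstral domain, so removing the mean suppresses much of it.
    """
    n_frames = len(cepstra)
    n_coefs = len(cepstra[0])
    mean = [sum(f[c] for f in cepstra) / n_frames for c in range(n_coefs)]
    return [[f[c] - mean[c] for c in range(n_coefs)] for f in cepstra]


def werr(wer_baseline, wer_improved):
    """Relative WER reduction in percent (assumed definition of WERR)."""
    return 100.0 * (wer_baseline - wer_improved) / wer_baseline
```

Under this assumed definition, a drop in WER from 40% to 30% corresponds to `werr(40.0, 30.0)`, a relative reduction of one quarter of the baseline error.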