Noise and Channel Normalized Cepstral Features for Far-speech Recognition
  • Authors: Michal Borsky (22)
    Petr Mizera (22)
    Petr Pollak (22)
  • Keywords: distorted speech; far; speech recognition; cepstral features; spectral subtraction; cepstral mean normalization
  • Publication: Lecture Notes in Computer Science
  • Year: 2013
  • Volume: 8113
  • Issue: 1
  • Pages: 249-256
  • Full-text size: 318 KB
  • Author affiliations: Michal Borsky (22)
    Petr Mizera (22)
    Petr Pollak (22)

    22. Faculty of Electrical Engineering, Czech Technical University in Prague, K13131 CTU FEE, Technická 2, 166 27, Prague 6, Czech Republic
Abstract
The paper analyses suitable features for distorted speech recognition. The aim is to explore the use of a command ASR system when speech is recorded with far-distance microphones and may be corrupted by strong additive and convolutional noise. The paper analyses how much basic spectral subtraction coupled with cepstral mean normalization can reduce the influence of the distortion present in such a far-talk channel. The results are compared with a reference close-talk speech recognition system and show an improvement in word error rate (WER) for channels with low or medium SNR. Combining these basic techniques yielded a WER reduction (WERR) of 55.6% for the medium-distance channel and 22.5% for the far-distance channel.
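
The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `floor` parameter, and the definition of WERR as a relative reduction against a baseline WER are assumptions made here for illustration. Spectral subtraction targets additive noise by removing a noise magnitude estimate from each frame; cepstral mean normalization targets the convolutional (channel) distortion, which appears as a roughly constant offset in the cepstral domain.

```python
from typing import List

Frame = List[float]


def spectral_subtraction(mag_frames: List[Frame], noise_mag: Frame,
                         floor: float = 0.01) -> List[Frame]:
    """Subtract a noise magnitude estimate from every frame, bin by bin.

    `floor` is a hypothetical spectral-floor factor: a bin never drops
    below floor * original magnitude, which avoids negative magnitudes.
    """
    return [
        [max(m - n, floor * m) for m, n in zip(frame, noise_mag)]
        for frame in mag_frames
    ]


def cepstral_mean_normalization(cep_frames: List[Frame]) -> List[Frame]:
    """Remove the per-coefficient mean computed over all frames."""
    n_frames = len(cep_frames)
    dim = len(cep_frames[0])
    means = [sum(f[k] for f in cep_frames) / n_frames for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in cep_frames]


def werr(wer_baseline: float, wer_system: float) -> float:
    """Relative WER reduction in percent (assumed definition)."""
    return 100.0 * (wer_baseline - wer_system) / wer_baseline
```

Under the assumed definition, a baseline WER of 40% lowered to 30% gives `werr(40.0, 30.0) == 25.0`. A real front end would apply spectral subtraction to short-time spectra before the filter bank and cepstral transform, then normalize the resulting cepstral coefficients per utterance.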
