A Novel Nested Circular Microphone Array and Subband Processing-Based System for Counting and DOA Estimation of Multiple Simultaneous Speakers
  • Authors: Ali Dehghan Firoozabadi; Hamid Reza Abutalebi
  • Keywords: Simultaneous speakers localization; Speakers number estimation; Nested microphone array; Subband processing; Direction of arrival (DOA); Generalized cross correlation (GCC)
  • Journal: Circuits, Systems, and Signal Processing
  • Publication year: 2016
  • Publication date: February 2016
  • Volume: 35
  • Issue: 2
  • Pages: 573-601
  • Full-text size: 2,516 KB
  • Affiliations: Ali Dehghan Firoozabadi (1)
    Hamid Reza Abutalebi (1)

    1. Electrical and Computer Engineering Department, Yazd University, Pajuhesh St., Safaieh, Postal Box: 89195-741, Yazd, Iran
  • Journal category: Engineering
  • Journal subject: Electronic and Computer Engineering
  • Publisher: Birkhäuser Boston
  • ISSN: 1531-5878
Abstract
This paper addresses the localization of simultaneous speakers, building on generalized cross-correlation (GCC)-based methods for estimating the directions of multiple speakers. To address the shortcomings of GCC-based direction of arrival (DOA) estimation, we apply three modifications to our previous subband processing-based system for localizing simultaneous speakers. First, the DOA estimator is equipped with a front-end block that determines the number of speakers using K-means clustering and the silhouette criterion, and provides this number to the DOA estimator. Second, to eliminate spatial aliasing, we propose a novel nested circular microphone array in which each microphone pair is used only in the subband appropriate to its inter-microphone distance. Third, to overcome the weakness of the GCC-phase transform (GCC-PHAT) in noisy and noisy-reverberant conditions, we propose an SNR estimation block that distinguishes noisy from reverberant conditions, so that the PHAT filter is used in reverberant conditions and the maximum likelihood filter in noisy ones. The proposed method is evaluated on both simulated and real multi-speaker speech data under various environmental conditions and with different numbers of speakers. In terms of DOA accuracy, the evaluations show that the proposed method outperforms the fullband and baseline subband methods.
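
Two ideas from the abstract can be illustrated with a short Python sketch: estimating the time delay between one microphone pair with GCC-PHAT, and limiting a pair to frequencies where spatial aliasing does not occur. This is not the authors' implementation. The function names, the 343 m/s speed of sound, and the half-wavelength rule (spacing no larger than half a wavelength) used to pick a pair's upper frequency are assumptions made for illustration; the abstract does not state how the subbands are assigned.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def max_alias_free_freq(distance_m, c=SPEED_OF_SOUND):
    """Upper frequency (Hz) for a pair spaced distance_m apart,
    assuming the half-wavelength rule d <= lambda / 2."""
    return c / (2.0 * distance_m)


def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 (in seconds) with GCC-PHAT."""
    n = len(x1) + len(x2)  # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    # Restrict the peak search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    lags = np.arange(-max_shift, max_shift + 1)
    cc_window = cc[lags % n]  # negative lags wrap to the end of cc
    return lags[int(np.argmax(cc_window))] / fs


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    x1 = rng.standard_normal(4096)
    delay = 5  # samples
    x2 = np.concatenate((np.zeros(delay), x1[:-delay]))
    tau = gcc_phat(x1, x2, fs, max_tau=0.001)
    print("estimated delay (samples):", round(tau * fs))
    print("upper frequency for 5 cm spacing (Hz):", max_alias_free_freq(0.05))
```

Under these assumptions, a closely spaced pair is usable up to a higher frequency than a widely spaced one, which is the reason a nested array can assign each pair to its own subband. The maximum likelihood weighting mentioned in the abstract would replace the PHAT normalization line and is not shown here.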
