Research on Modeling and Localization Algorithms Based on Microphone Arrays
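Abstract
Microphone arrays can serve as a component of acoustic communication systems, where they pick up acoustic signals. They are used in speech enhancement, beamforming, system identification, dereverberation, speech and speaker recognition, sound source localization and tracking, echo cancellation, and speech separation. Using microphone arrays, this dissertation studies blind multichannel identification and sound source localization and tracking in reverberant and noisy environments.
     To improve the robustness of blind multichannel identification to non-Gaussian noise, a normalized multichannel frequency-domain adaptive least-mean M-estimate algorithm is proposed. Unlike traditional methods, which use the squared error as the cost function, the proposed algorithm builds the cost function for multichannel frequency-domain adaptive filtering from an M-estimator and exploits the M-estimator's insensitivity to non-Gaussian noise to improve performance. Experimental data from a real acoustic environment verify the superiority of the proposed algorithm.
     A robust time delay estimation algorithm is proposed from the perspective of non-mutual information. The joint entropy of multiple random variables is decomposed into two classes of information, the mutual information and the non-mutual information among the variables. The non-mutual information among multiple microphone signals is then extracted to construct a time delay estimator, yielding an algorithm that is robust to room reverberation.
     The multichannel cross-correlation coefficient algorithm is extended from using only spatial information to using spatio-temporal information, giving a time delay estimation algorithm based on multichannel spatio-temporal prediction. Theoretical analysis shows that the proposed algorithm prewhitens the microphone signals and is robust to reverberation. A recursive algorithm is further proposed to reduce its computational complexity. Simulations confirm the superiority of the proposed algorithm and the effectiveness of the recursive algorithm.
     A square array is built from super-cardioid directional microphones, and its frequency response and directivity are analyzed so that it picks up sound over a full 360 degrees. Based on this array structure, a maximum-energy algorithm for automatically tracking the active talker is proposed, and experiments verify the effectiveness of both the array and the algorithm.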
Microphone arrays, which are commonly employed to capture acoustic signals, are an important part of various acoustic communication systems. They are widely applied to speech enhancement, beamforming, system identification, dereverberation, speech and speaker recognition, sound source localization and tracking, acoustic echo cancellation, and speech separation. This dissertation focuses on blind multichannel identification and on source localization and tracking based on microphone arrays.
     Blind multichannel identification (BMCI) estimates the channel impulse responses of an unknown multichannel system from the output signals alone. To improve the resilience of BMCI to non-Gaussian noise, we propose a robust normalized multichannel frequency-domain least-mean M-estimate algorithm. Unlike traditional approaches that use the squared error as the cost function, we use an M-estimator to form the cost function, which is shown to be robust to non-Gaussian noise with a symmetric α-stable distribution. Simulations demonstrate the superiority of the proposed algorithm.
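The contrast between a squared-error cost and an M-estimate cost can be illustrated with a much simpler setting than the one the dissertation treats. The sketch below is an assumption-laden illustration, not the proposed algorithm: it uses a single channel, the time domain, a known input signal (so it is not blind), and Huber's function as one common choice of M-estimator. The names `huber_cost`, `huber_score`, `nlmm_identify`, and the threshold `xi` are hypothetical.

```python
import numpy as np

def huber_cost(e, xi):
    """Huber M-estimate cost: quadratic for |e| <= xi, linear beyond it."""
    a = np.abs(e)
    return np.where(a <= xi, 0.5 * e**2, xi * a - 0.5 * xi**2)

def huber_score(e, xi):
    """Derivative of the Huber cost: the error, clipped to [-xi, xi]."""
    return np.clip(e, -xi, xi)

def nlmm_identify(x, d, num_taps, mu=0.5, xi=1.0, eps=1e-8):
    """Normalized least-mean M-estimate update (illustrative, non-blind).

    Same form as NLMS, except the raw error is replaced by its clipped
    score, so a single impulsive sample cannot move the weights far.
    """
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        x_vec = x[n - num_taps + 1:n + 1][::-1]   # x[n], x[n-1], ...
        e = d[n] - w @ x_vec
        w += mu * huber_score(e, xi) * x_vec / (x_vec @ x_vec + eps)
    return w

# Toy data: a 4-tap channel observed with sparse, large impulses
# (an assumed stand-in for heavy-tailed non-Gaussian noise).
rng = np.random.default_rng(0)
h_true = np.array([1.0, -0.5, 0.25, 0.1])
x = rng.standard_normal(5000)
d = np.convolve(x, h_true)[:len(x)]
impulses = 50.0 * rng.standard_normal(len(x)) * (rng.random(len(x)) < 0.01)

w_robust = nlmm_identify(x, d + impulses, num_taps=4, xi=1.0)
w_plain = nlmm_identify(x, d + impulses, num_taps=4, xi=np.inf)  # plain NLMS
print("M-estimate error:", np.linalg.norm(w_robust - h_true))
print("squared-error error:", np.linalg.norm(w_plain - h_true))
```

Setting `xi` to infinity disables the clipping and recovers the ordinary squared-error update, which makes it easy to compare the two costs on the same data.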
     To localize sound sources in room acoustic environments, the time differences of arrival (TDOAs) between two or more microphone signals must be determined. In this dissertation, we partition the joint entropy of multiple random variables into two classes of information: the mutual information shared by the random variables and the non-mutual information among them. We extract the non-mutual information among an array of microphones to estimate the TDOA. Simulations in reverberant environments confirm the effectiveness of the proposed algorithm.
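For two random variables, the partition of joint entropy described above reduces to a standard identity from information theory. The labeling of the conditional-entropy terms as the "non-mutual" part is an assumed reading of the two-variable case only; the abstract does not give the dissertation's definition for more than two variables.

```latex
H(X_1, X_2)
  = \underbrace{I(X_1; X_2)}_{\text{mutual information}}
  + \underbrace{H(X_1 \mid X_2) + H(X_2 \mid X_1)}_{\text{non-mutual information (assumed reading)}}
```

Under this reading, two microphone signals that are correctly time-aligned share as much information as possible, so the non-mutual part is smallest at the true delay.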
     The multichannel cross-correlation-coefficient (MCCC) algorithm, which extends the traditional cross-correlation method from the two-channel to the multiple-channel case, exploits spatial information among multiple microphones to improve the robustness of time delay estimation. In this dissertation, we propose a multichannel spatio-temporal prediction (MCSTP) algorithm, which can be viewed as a generalization of the MCCC principle from using only spatial information to using both spatial and temporal information. We also propose a recursive version of this new algorithm, which achieves performance similar to that of MCSTP but is computationally more efficient. Experimental results in reverberant and noisy environments demonstrate the advantages of this new method.
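A minimal sketch of the MCCC idea (the baseline, not the proposed MCSTP algorithm) may help: the channels are re-aligned for each candidate delay, and the squared MCCC is taken as one minus the determinant of the normalized spatial correlation matrix. The sketch assumes integer sample delays, a uniform linear array in the far field (channel m is delayed by m times the unit delay), and circular shifts via `np.roll` as a convenience that ignores edge effects. The function names and the toy signal are hypothetical.

```python
import numpy as np

def mccc_squared(signals, lags):
    """Squared MCCC after undoing the per-channel integer lags."""
    aligned = np.stack([np.roll(s, -lag) for s, lag in zip(signals, lags)])
    r_tilde = np.corrcoef(aligned)           # normalized correlation matrix
    return 1.0 - np.linalg.det(r_tilde)      # near 1 when channels line up

def estimate_tdoa(signals, spacing_factors, max_lag):
    """Search the unit delay p that maximizes the squared MCCC.

    spacing_factors[m] * p is the assumed delay of channel m
    relative to channel 0.
    """
    candidates = range(-max_lag, max_lag + 1)
    scores = [
        mccc_squared(signals, [f * p for f in spacing_factors])
        for p in candidates
    ]
    return candidates[int(np.argmax(scores))]

# Toy scene: white-noise source, three microphones, 3 samples of delay
# between neighbours, mild sensor noise, no reverberation.
rng = np.random.default_rng(0)
src = rng.standard_normal(4000)
true_delay = 3
mics = [
    np.roll(src, m * true_delay) + 0.1 * rng.standard_normal(src.size)
    for m in range(3)
]
est = estimate_tdoa(mics, spacing_factors=[0, 1, 2], max_lag=8)
print("estimated unit delay (samples):", est)
```

Because every microphone pair enters the same determinant, a wrong candidate delay has to look plausible for all pairs at once, which is the redundancy the MCCC approach relies on.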
     Finally, we use directional microphones to construct a square array and analyze its frequency response and directionality. We also analyze how to design the array so that it captures sound from all directions. Based on the square array, we propose a source tracking algorithm that tracks the speaker with the maximum speech power. Experimental results in anechoic and general rooms demonstrate the effectiveness of the algorithm.
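The maximum-power tracking idea can be sketched for one frame as picking the microphone whose output carries the most energy. Everything below is an illustrative assumption rather than the dissertation's algorithm: four first-order directional microphones facing 0, 90, 180, and 270 degrees, an idealized super-cardioid pattern of the form a + (1 - a) cos(theta) with a taken as roughly 0.37, a single far-field talker, and no smoothing across frames. The names `supercardioid_gain` and `max_energy_direction` are hypothetical.

```python
import numpy as np

ALPHA = 0.37  # assumed coefficient of an ideal first-order super-cardioid

def supercardioid_gain(theta_rad):
    """Idealized first-order pattern: ALPHA + (1 - ALPHA) * cos(theta)."""
    return ALPHA + (1.0 - ALPHA) * np.cos(theta_rad)

def max_energy_direction(mic_frames, look_dirs_deg):
    """Return the look direction of the microphone with the most frame energy."""
    energies = [float(np.sum(np.square(frame))) for frame in mic_frames]
    return look_dirs_deg[int(np.argmax(energies))]

# Toy frame: one talker at 100 degrees, four microphones on a square.
rng = np.random.default_rng(0)
look_dirs_deg = [0, 90, 180, 270]
talker_deg = 100
speech = rng.standard_normal(1600)  # stand-in for one frame of speech
mic_frames = [
    supercardioid_gain(np.radians(talker_deg - d)) * speech
    + 0.01 * rng.standard_normal(speech.size)
    for d in look_dirs_deg
]
selected = max_energy_direction(mic_frames, look_dirs_deg)
print("selected look direction (degrees):", selected)
```

A practical tracker would likely smooth the decision over several frames so that short pauses or noise bursts do not cause the selection to jump between microphones.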
