Study on Methods of Microphone Array Based Sound Source Localization and Speech Enhancement

Author: Cui Weiwei (崔玮玮)
Degree level: Doctoral
Discipline: Information and Communication Engineering
Keywords: Microphone array ; beamforming ; time delay estimation ; source localization ; speech enhancement
Degree year: 2009
Advisor: Cao Zhigang (曹志刚)
Discipline code: 081001
Degree-granting institution: Tsinghua University
Thesis submission date: 2008-12-01

Abstract

In many speech communication systems, such as hands-free telephones and video conferencing, the speech signal received by a microphone is often corrupted by reverberation and background noise. This not only reduces the intelligibility of the speech but also degrades the overall performance of speech processing systems, so the noisy speech must be enhanced. In complex acoustic environments, single-microphone speech enhancement can no longer meet the requirements, whereas microphone array processing can capture the location of a sound source and apply spatial filtering to the noisy signals, achieving significant noise reduction. Against this background, this dissertation studies microphone array based sound source localization and speech enhancement methods. The main contributions are as follows.
     (1) Surveyed and summarized the various time delay estimation (TDE) techniques. The most commonly used TDE methods are examined in depth with respect to their ability to track stationary and moving sources, their robustness under different reverberation levels and signal-to-noise ratios (SNR), and their computational complexity. Based on simulation results, the dissertation summarizes the advantages, disadvantages, and suitable applications of each algorithm.
     (2) Proposed a dual-microphone source localization method in the 2D plane. By jointly using the time delay and the energy attenuation of the signals received by the array, the method reduces the number of microphones from the 3 required by conventional two-step localization methods to 2, which lowers the device cost. The closed-form solution obtained on this basis facilitates fast processing. Furthermore, under the assumption of Gaussian-distributed measurement noise, the Cramer-Rao lower bound on the variance of the position estimate is derived for the proposed localization model, and the impact of different parameters on localization accuracy is analyzed accordingly.
     (3) Proposed a high-resolution direction of arrival (DOA) estimation method based on search space pre-estimation. The TDE result is used to obtain a candidate search space for the high-resolution DOA estimation. This not only reduces the computational load to less than 1/3 of that of existing algorithms, but also partially eliminates the directions of interfering noise. In a conference room environment, tests of a real localization system composed of 7 microphones show that the maximum DOA estimation error is 4.4° with search space pre-estimation and 11.4° without it.
     (4) Proposed a spectral-domain speech enhancement method based on a first-order differential microphone (FDM) array. The method uses a dual-channel FDM array in combination with single-channel spectral enhancement techniques, so that it obtains estimates of the speech spectrum and the noise spectrum simultaneously and corrects the noise spectrum in real time. Compared with existing dual-channel speech enhancement techniques, the method achieves an output SNR gain of 2 dB to 6 dB and reduces the computational load by 2/3.
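
As an illustration of how a time delay estimate could narrow a DOA search space, in the spirit of contributions (1) and (3), the following is a minimal sketch rather than the dissertation's algorithm. The abstract does not say which TDE method is used; GCC-PHAT is chosen here as an assumption because it is a widely used estimator. The function names `gcc_phat` and `doa_search_window`, the far-field two-microphone geometry, the speed of sound of 343 m/s, and the 10° margin are all hypothetical choices made for this example.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 in seconds using GCC-PHAT.

    A positive result means x2 lags x1.
    """
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs

def doa_search_window(tau, mic_distance, c=343.0, margin_deg=10.0):
    """Map a delay to a far-field angle window (degrees from broadside).

    Assumes two microphones spaced mic_distance metres apart and a plane
    wave; the margin is an illustrative value, not taken from the source.
    """
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    theta = float(np.degrees(np.arcsin(s)))
    return max(theta - margin_deg, -90.0), min(theta + margin_deg, 90.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    source = rng.standard_normal(4096)
    delay = 5                              # samples, synthetic ground truth
    mic1 = source
    mic2 = np.concatenate((np.zeros(delay), source[:-delay]))
    tau = gcc_phat(mic1, mic2, fs, max_tau=0.001)
    print("estimated delay (s):", tau)
    print("DOA search window (deg):", doa_search_window(tau, mic_distance=0.2))
```

A high-resolution DOA estimator would then scan only the returned angle window instead of the full range from -90° to 90°, which is one plausible way the search cost could be reduced.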

