Mixed Near-Field and Far-Field Sound Source Localization Based on Microphone Arrays
Abstract
Sound source localization is the basis and prerequisite for speech recognition and speech enhancement, and it has broad application prospects. With advances in digital signal processing and array signal processing, microphone arrays have come into wide use for sound source localization. Most existing microphone-array localization techniques, however, assume that the sources lie either entirely in the near field or entirely in the far field, and most also assume narrowband sources, whereas real speech is a wideband signal. To address these problems, this paper studies microphone-array sound source localization when near-field and far-field sources are mixed. The main contents are as follows:
     First, it analyzes the characteristics of speech signals, introduces the conventional narrowband and wideband signal processing models, and studies the far-field and near-field models of a uniform linear microphone array.
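As a brief illustration of how the two array models relate (the symbols $d$, $\lambda$, $r$, $\theta$, $m$ and $D$ are introduced here and do not come from the abstract), consider a uniform linear array with spacing $d$ and a source at range $r$ and angle $\theta$ measured from broadside, with sensor $m$ at position $md$. A second-order expansion of the path length gives the usual Fresnel approximation, and the far-field model is its limit for large $r$:

```latex
\begin{aligned}
r_m &= \sqrt{r^2 + (md)^2 - 2\,r\,md\,\sin\theta}
     \;\approx\; r - md\sin\theta + \frac{(md)^2\cos^2\theta}{2r},\\
\psi_m &= \frac{2\pi}{\lambda}\,(r_m - r) \;\approx\; \gamma m + \phi m^2,
\qquad \gamma = -\frac{2\pi d}{\lambda}\sin\theta,
\qquad \phi = \frac{\pi d^2}{\lambda r}\cos^2\theta,\\
\phi &\to 0 \quad \text{as } r \to \infty \quad \text{(far field: phase linear in } m\text{)}.
\end{aligned}
```

A source is commonly treated as near-field when $0.62\sqrt{D^3/\lambda} < r < 2D^2/\lambda$ for array aperture $D$; the sign of $\psi_m$ depends on the phase convention. For wideband speech the same expressions would apply per frequency bin with $\lambda = c/f$.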
     Second, because the microphone array picks up not only the desired speech but also many kinds of noise, the recorded data must be preprocessed. Preprocessing includes pre-filtering, pre-emphasis, normalization, windowing and framing, short-time energy detection, and speech denoising. This paper studies voice activity detection and, to combine the respective strengths of time-domain logarithmic energy and frequency-domain subband spectral entropy, adopts a new logarithmic-energy subband spectral entropy method.
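The following is a minimal sketch of how a per-frame feature could combine logarithmic energy with subband spectral entropy. The abstract does not give the formula, so the combination rule (log energy divided by entropy, inside a square root), the constant `a`, the number of subbands, and all function names are assumptions made for illustration only.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum via a direct DFT (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_energy(frame, a=1.0):
    """Time-domain logarithmic energy, log10(1 + E / a)."""
    energy = sum(x * x for x in frame)
    return math.log10(1.0 + energy / a)


def subband_spectral_entropy(frame, n_subbands=8):
    """Entropy of normalized subband energies (high for flat, noise-like spectra)."""
    spec = power_spectrum(frame)[1:]  # drop the DC bin
    width = len(spec) // n_subbands
    bands = [sum(spec[b * width:(b + 1) * width]) for b in range(n_subbands)]
    total = sum(bands)
    if total <= 0.0:
        return math.log(n_subbands)  # treat an all-zero frame as maximally flat
    probs = [e / total for e in bands]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def energy_entropy_feature(frame, a=1.0, n_subbands=8):
    """Assumed combination: grows with energy and shrinks with entropy."""
    h = subband_spectral_entropy(frame, n_subbands)
    return math.sqrt(1.0 + log_energy(frame, a) / max(h, 1e-6))
```

A frame whose energy is concentrated in one subband scores much higher than a frame with a flat spectrum, so a threshold on `energy_entropy_feature` could serve as a simple speech/non-speech decision; the threshold itself would need tuning on real recordings.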
     Third, it studies the near-field MUSIC algorithm, analyzes the signal model when sources lie in both the far field and the near field, and presents a MUSIC-based algorithm for localizing wideband speech signals in the mixed field. The algorithm first separates the direction of arrival (DOA) from the range of each source and derives a new direction matrix that contains only DOA information. It then applies MUSIC to estimate the DOAs of all sources. Finally, using the estimated DOAs and the range characteristic of far-field sources, it applies MUSIC a second time to localize both the far-field and the near-field sources.
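One common way to obtain a DOA-only quantity, given here as an assumption for illustration and not necessarily the construction used in this paper, relies on a symmetric uniform linear array with sensors $m = -N, \dots, N$, uncorrelated sources, and spatially white noise. With the Fresnel phase $\gamma_k m + \phi_k m^2$ for source $k$ (where $\gamma_k$ depends only on the angle $\theta_k$ and $\phi_k$ on both $\theta_k$ and the range $r_k$), correlating mirrored sensors cancels the range term, and a second MUSIC search over range alone then follows:

```latex
\begin{aligned}
x_m(t) &= \sum_{k=1}^{K} s_k(t)\, e^{\,j(\gamma_k m + \phi_k m^2)} + n_m(t),
\qquad m = -N,\dots,N,\\
E\!\left[x_m(t)\,x_{-m}^{*}(t)\right] &= \sum_{k=1}^{K} \sigma_k^2\, e^{\,j 2\gamma_k m}
\qquad (m \neq 0),\\
P(\hat\theta_k, r) &= \frac{1}{\mathbf{a}^{H}(\hat\theta_k, r)\,\mathbf{U}_n \mathbf{U}_n^{H}\,\mathbf{a}(\hat\theta_k, r)},
\qquad [\mathbf{a}(\theta, r)]_m = e^{\,j(\gamma m + \phi m^2)}.
\end{aligned}
```

Here $\sigma_k^2$ is the power of source $k$ and $\mathbf{U}_n$ spans the noise subspace of the array covariance matrix. The second line is a sum of complex exponentials in $m$ whose frequencies depend only on the angles, so a subspace method such as MUSIC can estimate the $\hat\theta_k$ from it. In the third line, a peak at finite $r$ inside the Fresnel region suggests a near-field source, while a far-field source corresponds to $r \to \infty$, that is $\phi = 0$.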
     Fourth, it studies sound source localization based on sparse decomposition in the near field and in the mixed field. For mixed-field sources, this paper uses the mixed-field signal model to give a method for constructing an atom dictionary suited to a microphone array in the mixed field, and then uses the matching pursuit algorithm to estimate the source directions. Simulation experiments indicate that the algorithm is fairly robust at low signal-to-noise ratios.
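The sparse-decomposition idea can be sketched as follows for a single frequency bin and a single snapshot. The atoms are Fresnel-model steering vectors over a grid of angles and ranges, with an infinite range standing in for the far field. The array geometry, the grid values, and the names `steering_atom`, `matching_pursuit`, and `DICTIONARY` are hypothetical choices for this example and are not taken from the paper.

```python
import cmath
import math


def steering_atom(theta_deg, r, n_half, d, lam):
    """Unit-norm atom for a symmetric ULA with sensors m = -n_half..n_half.

    Uses the Fresnel phase gamma*m + phi*m**2; r = math.inf gives a far-field atom.
    """
    theta = math.radians(theta_deg)
    gamma = -2.0 * math.pi * d * math.sin(theta) / lam
    phi = 0.0 if math.isinf(r) else math.pi * d * d * math.cos(theta) ** 2 / (lam * r)
    scale = 1.0 / math.sqrt(2 * n_half + 1)
    return [scale * cmath.exp(1j * (gamma * m + phi * m * m))
            for m in range(-n_half, n_half + 1)]


def inner(a, b):
    """Complex inner product <a, b> = sum(conj(a_i) * b_i)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def matching_pursuit(y, dictionary, n_iter):
    """Greedy MP: pick the best-correlated atom, subtract its projection, repeat."""
    residual = list(y)
    picks = []
    for _ in range(n_iter):
        best_key, best_coef = None, 0j
        for key, atom in dictionary.items():
            coef = inner(atom, residual)
            if best_key is None or abs(coef) > abs(best_coef):
                best_key, best_coef = key, coef
        atom = dictionary[best_key]
        residual = [r_i - best_coef * a_i for r_i, a_i in zip(residual, atom)]
        picks.append((best_key, best_coef))
    return picks, residual


# Hypothetical setup: 9 microphones, 4 cm spacing, 0.17 m wavelength (~2 kHz in air).
N_HALF, D, LAM = 4, 0.04, 0.17
RANGES = [0.3, 0.5, 0.8, math.inf]  # metres; math.inf marks far-field atoms
DICTIONARY = {(th, r): steering_atom(th, r, N_HALF, D, LAM)
              for th in range(-60, 61, 5) for r in RANGES}
```

Calling `matching_pursuit(y, DICTIONARY, K)` on a snapshot `y` returns up to `K` `(angle, range)` keys with their complex amplitudes. Neighbouring atoms on such a grid are highly correlated, so with several sources or noisy data a greedy pick may land on an adjacent grid point; wideband speech would also require repeating the search per frequency bin and combining the results.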
