Research on Speech Enhancement Techniques of Digital Hearing Aids

Original title: 数字助听器中语音增强技术的研究
Author: 安扣成 (An Koucheng)
Thesis level: Master's
Discipline: Signal and Information Processing
Keywords: Digital Hearing Aids; Speech Enhancement; Voice Activity Detection; A Priori SNR Estimation; Gain Smoothing; Masking Effect
Degree year: 2012
Advisor: 张玲华 (Zhang Linghua)
Discipline code: 081002
Degree-granting institution: 南京邮电大学 (Nanjing University of Posts and Telecommunications)
Thesis submission date: 2012-03-01

Abstract

Speech enhancement aims to remove the noise from a noisy speech signal and recover a cleaner speech signal, and it is important for improving the performance of digital hearing aids. This thesis studies speech enhancement for digital hearing aids. Building on an analysis of speech enhancement based on a priori SNR estimation and of the conventional method based on the auditory masking effect, it proposes improved algorithms. The main work of the thesis is as follows:
     1. Speech enhancement techniques for digital hearing aids are studied, with emphasis on the method based on a priori SNR estimation and the conventional method based on the auditory masking effect. The algorithms are simulated and their performance is analyzed. The analysis reveals two problems. First, the gain function computed from the a priori SNR estimate changes too quickly between adjacent frames, which produces random peaks in the enhanced speech spectrum and is heard as musical noise. Second, when conventional spectral subtraction is used for the preliminary enhancement, the error in the masking threshold is large, the spectral subtraction coefficients derived from it are inaccurate, and the quality of the enhanced speech degrades.
     2. To reduce musical noise, a method that combines adaptive a priori SNR estimation with gain function smoothing is proposed. The gain function is smoothed according to the degree of difference between the noisy speech spectrum and the estimated noise spectrum. Experiments show that the proposed algorithm makes the gain function vary slowly between adjacent frames, removes the random peaks from the enhanced speech spectrum, and effectively suppresses musical noise.
     3. Based on a study of the human auditory masking effect, the conventional masking-based speech enhancement method is improved. The preliminary enhancement stage uses an improved spectral subtraction based on spectral entropy. The masking threshold is then computed from the preliminarily enhanced speech, which gives a more accurate threshold, and this threshold is used to adjust the spectral subtraction parameters dynamically. Experiments show that the method effectively improves the speech enhancement result.
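
The musical-noise problem and the gain smoothing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract gives no formulas, so the sketch assumes the standard decision-directed a priori SNR estimate with a fixed weight (the thesis uses an adaptive one), a Wiener gain, and a hypothetical frame-level difference measure that sets the smoothing factor. The names `dd_wiener_gains`, `smooth_gains`, `total_variation` and the parameters `alpha`, `xi_min`, `beta_max` are assumptions made for illustration.

```python
import random


def dd_wiener_gains(noisy_power, noise_power, alpha=0.98, xi_min=0.003):
    """Wiener gains from the decision-directed a priori SNR estimate.

    noisy_power: list of frames, each a list of |Y(k)|^2 values.
    noise_power: list with the estimated noise power for each bin k.
    """
    prev_clean = [0.0] * len(noise_power)  # |G * Y|^2 from the previous frame
    gains = []
    for frame in noisy_power:
        frame_gains = []
        for k, (y2, n2) in enumerate(zip(frame, noise_power)):
            gamma = y2 / n2  # a posteriori SNR
            xi = alpha * prev_clean[k] / n2 + (1.0 - alpha) * max(gamma - 1.0, 0.0)
            xi = max(xi, xi_min)  # floor on the a priori SNR
            g = xi / (1.0 + xi)  # Wiener gain
            prev_clean[k] = g * g * y2
            frame_gains.append(g)
        gains.append(frame_gains)
    return gains


def smooth_gains(gains, noisy_power, noise_power, beta_max=0.9):
    """Recursively smooth gains over time (assumed rule, not the thesis's).

    Frames whose spectrum is close to the noise estimate are smoothed heavily;
    frames that differ strongly from it (likely speech) are smoothed lightly.
    """
    if not gains:
        return []
    noise_total = sum(noise_power)
    prev = list(gains[0])
    smoothed = []
    for frame_gains, frame in zip(gains, noisy_power):
        excess = max(sum(frame) / noise_total - 1.0, 0.0)
        diff = excess / (1.0 + excess)  # difference measure in [0, 1)
        beta = beta_max * (1.0 - diff)  # smoothing factor for this frame
        prev = [beta * p + (1.0 - beta) * g for p, g in zip(prev, frame_gains)]
        smoothed.append(prev)
    return smoothed


def total_variation(gains):
    """Sum of absolute frame-to-frame gain changes over all bins."""
    return sum(
        abs(cur - prev)
        for prev_frame, cur_frame in zip(gains, gains[1:])
        for prev, cur in zip(prev_frame, cur_frame)
    )


random.seed(0)
n_bins, n_frames = 64, 100
noise_power = [1.0] * n_bins
# Noise-only input: periodogram values of Gaussian noise are exponentially distributed.
noisy_power = [[random.expovariate(1.0) for _ in range(n_bins)] for _ in range(n_frames)]

raw = dd_wiener_gains(noisy_power, noise_power)
smoothed = smooth_gains(raw, noisy_power, noise_power)
print(total_variation(raw), total_variation(smoothed))
```

On noise-only input the raw gains jump whenever a single bin happens to be large, which is the source of the random spectral peaks. The difference measure is computed over the whole frame rather than per bin, so an isolated peak does not reduce the smoothing, while a frame that clearly exceeds the noise estimate is passed through with little smoothing.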
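
The third contribution can be sketched in the same hedged way. The code below shows two building blocks under stated assumptions: the spectral entropy of a frame (a flat, noise-like spectrum has high entropy and a peaky, speech-like spectrum has low entropy) and spectral subtraction whose over-subtraction factor is interpolated from a masking threshold. The masking threshold computation itself is not shown, and the threshold values passed in are placeholders. The function names, the linear interpolation rule, and the values `alpha_min`, `alpha_max`, `floor` are illustrative assumptions, not parameters taken from the thesis.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (in nats) of a power spectrum normalized to sum to 1."""
    total = sum(power_spectrum)
    if total <= 0.0:
        return 0.0
    entropy = 0.0
    for p in power_spectrum:
        if p > 0.0:
            q = p / total
            entropy -= q * math.log(q)
    return entropy


def oversubtraction_factor(threshold, t_min, t_max, alpha_min=1.0, alpha_max=6.0):
    """Map a masking threshold to an over-subtraction factor.

    A low threshold (residual noise is audible) gives strong subtraction;
    a high threshold (residual noise is masked) gives weak subtraction.
    """
    if t_max <= t_min:
        return alpha_max
    t = min(max(threshold, t_min), t_max)  # clamp to [t_min, t_max]
    frac = (t_max - t) / (t_max - t_min)
    return alpha_min + frac * (alpha_max - alpha_min)


def spectral_subtract(noisy_power, noise_power, alphas, floor=0.01):
    """Power spectral subtraction with per-bin factors and a spectral floor."""
    return [
        max(y2 - a * n2, floor * n2)
        for y2, n2, a in zip(noisy_power, noise_power, alphas)
    ]


flat = [1.0] * 8                                      # noise-like frame
peaky = [100.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]    # speech-like frame
print(spectral_entropy(flat), spectral_entropy(peaky))

# Placeholder thresholds for two bins: one at the minimum, one at the maximum.
alphas = [oversubtraction_factor(t, t_min=0.0, t_max=10.0) for t in (0.0, 10.0)]
print(spectral_subtract([10.0, 2.0], [1.0, 1.0], alphas))
```

In a full system the entropy would drive the speech/noise decision used to update the noise estimate during the preliminary enhancement, and the thresholds would come from a psychoacoustic model applied to the preliminarily enhanced speech.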

