Abstract
To address the residual noise left by Non-negative Matrix Factorization (NMF) speech enhancement algorithms in non-stationary environments with low Signal-to-Noise Ratio (SNR), a single-channel speech enhancement algorithm based on perceptual masking and reconstructive NMF (PM-RNMF) is proposed. First, psychoacoustic masking properties are incorporated into NMF speech enhancement. Second, a different masking threshold is applied to each frequency bin to build an adaptive perceptual masking gain function, and these thresholds constrain both the residual noise energy and the speech distortion energy. Finally, the perceptual gain is corrected using the Speech Presence Probability (SPP) and the NMF algorithm is reconstructed, which yields a new objective function. Simulation results show that, across three non-stationary noise environments at different SNRs, PM-RNMF improves the average Perceptual Evaluation of Speech Quality (PESQ) score by 0.767, 0.474 and 0.162 and the average Signal-to-Distortion Ratio (SDR) by 2.785, 1.197 and 0.948 relative to NMF, Reconstructive NMF (RNMF) and Perceptual Masking-Deep Neural Network (PM-DNN), respectively. The experiments also show that PM-RNMF reduces noise more effectively in both the low-frequency and high-frequency ranges.
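For orientation, the following is a minimal sketch of the plain supervised NMF baseline that the abstract compares against, not of PM-RNMF itself. It assumes magnitude spectrograms as input, KL-divergence multiplicative updates, and a Wiener-type gain; the function names `nmf_kl` and `enhance`, the ranks, and the iteration counts are illustrative assumptions and do not come from the paper. The perceptual masking thresholds and the SPP correction described above are not modeled here.

```python
import numpy as np

EPS = 1e-12  # small constant guarding against division by zero


def nmf_kl(V, rank, n_iter=200, W=None, seed=0):
    """Approximate V by W @ H using KL-divergence multiplicative updates.

    V is a non-negative (n_freq, n_frames) magnitude spectrogram.
    If W is supplied, it is kept fixed, `rank` is ignored, and only H is estimated.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    update_W = W is None
    if update_W:
        W = rng.random((n_freq, rank)) + EPS
    H = rng.random((W.shape[1], n_frames)) + EPS
    for _ in range(n_iter):
        R = V / (W @ H + EPS)  # element-wise ratio V / (WH)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + EPS)
        if update_W:
            R = V / (W @ H + EPS)
            W *= (R @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def enhance(V_noisy, W_speech, W_noise, n_iter=200):
    """Estimate the speech magnitude from a noisy spectrogram.

    Assumed baseline: bases trained offline are concatenated and held fixed,
    activations are fitted to the noisy input, and a Wiener-type gain
    S / (S + N) is applied per time-frequency bin.
    """
    W = np.hstack([W_speech, W_noise])
    _, H = nmf_kl(V_noisy, W.shape[1], n_iter=n_iter, W=W)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]  # speech part of the model
    N = W_noise @ H[k:]   # noise part of the model
    gain = S / (S + N + EPS)
    return gain * V_noisy, gain
```

In this sketch, `W_speech` and `W_noise` would be obtained by calling `nmf_kl` on clean-speech and noise-only training spectrograms. A perceptually motivated method such as the one summarized above would replace the fixed Wiener-type gain with a gain constrained by masking thresholds.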