Abstract
In teleconferencing and smart-speaker applications, the microphones are often in the far field of the speech source. Reverberation masks the subsequently arriving direct sound. This degrades the speech quality of the captured signals and the recognition accuracy of automatic speech recognition (ASR) systems. Multi-channel linear prediction (MCLP) is a classical blind dereverberation method, but it suffers from high computational complexity. We propose a simplified Kalman filter update that reduces the complexity of adaptive MCLP dereverberation by diagonalizing the error covariance matrix of the Kalman filter state vector. The proposed algorithm reduces the complexity of the original Kalman filter considerably without significant performance degradation. Compared with an existing block-diagonal simplification, it reduces complexity further while maintaining speech quality.
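The abstract does not spell out the update equations. As a rough illustration of where the savings come from, the Python sketch below compares two Kalman steps for a single STFT frequency bin. The first uses a full N x N error covariance and costs O(N^2) per frame. The second keeps only the diagonal and costs O(N). Here N is the length of the stacked MCLP prediction vector, that is, the number of microphones times the prediction order. Several parts are assumptions for illustration rather than the paper's algorithm:

- the observation model `y = x^T g + s`;
- the random-walk state model with process-noise covariance `phi_w * I`;
- a known desired-signal variance `phi_s`;
- diagonalization by discarding the off-diagonal terms after each update;
- the hypothetical function names `kalman_full`, `kalman_diag`, and `demo`.

The paper's exact simplified update may differ.

```python
"""Hypothetical sketch of one Kalman-filter step for MCLP dereverberation in a
single STFT frequency bin. Assumed model (not taken from the paper):
    y(n) = x(n)^T g(n) + s(n),   g(n) = g(n-1) + w(n),
where y is the reference-microphone STFT coefficient, x stacks the delayed
multichannel past coefficients, g holds the prediction coefficients, s is the
desired early speech with variance phi_s, and w has covariance phi_w * I.
"""
import random


def kalman_full(g, P, x, y, phi_s, phi_w):
    """Standard update with a full N x N error covariance: O(N^2) per frame."""
    n = len(g)
    # Prediction step of the assumed random-walk model: P <- P + phi_w * I.
    P = [[P[i][j] + (phi_w if i == j else 0.0) for j in range(n)] for i in range(n)]
    Px = [sum(P[i][j] * x[j].conjugate() for j in range(n)) for i in range(n)]  # P x^*
    denom = sum(x[i] * Px[i] for i in range(n)).real + phi_s  # x^T P x^* + phi_s
    k = [v / denom for v in Px]  # Kalman gain
    e = y - sum(x[i] * g[i] for i in range(n))  # a-priori error, used as the output
    g = [g[i] + k[i] * e for i in range(n)]
    xP = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]  # x^T P
    P = [[P[i][j] - k[i] * xP[j] for j in range(n)] for i in range(n)]  # (I - k x^T) P
    return g, P, e


def kalman_diag(g, p, x, y, phi_s, phi_w):
    """Diagonalized update: only diag(P) is stored and updated, O(N) per frame."""
    n = len(g)
    p = [pi + phi_w for pi in p]
    denom = sum(p[i] * abs(x[i]) ** 2 for i in range(n)) + phi_s
    k = [p[i] * x[i].conjugate() / denom for i in range(n)]
    e = y - sum(x[i] * g[i] for i in range(n))
    g = [g[i] + k[i] * e for i in range(n)]
    # Diagonal of (I - k x^T) P for diagonal P; off-diagonal terms are discarded.
    p = [p[i] * (1.0 - p[i] * abs(x[i]) ** 2 / denom) for i in range(n)]
    return g, p, e


def demo(frames=3000, seed=0):
    """Toy check: identify a known g from white synthetic data (not dereverberation)."""
    rng = random.Random(seed)
    g_true = [0.5, -0.3 + 0.2j, 0.1j, 0.05]
    n = len(g_true)

    def cgauss(std):
        return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))

    g_f = [0j] * n
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g_d, p = [0j] * n, [1.0] * n
    for _ in range(frames):
        x = [cgauss(0.7071) for _ in range(n)]  # unit-variance complex input
        y = sum(a * b for a, b in zip(x, g_true)) + cgauss(0.007)  # noise var ~1e-4
        g_f, P, _ = kalman_full(g_f, P, x, y, phi_s=1e-4, phi_w=1e-6)
        g_d, p, _ = kalman_diag(g_d, p, x, y, phi_s=1e-4, phi_w=1e-6)

    def max_err(g):
        return max(abs(a - b) for a, b in zip(g, g_true))

    return max_err(g_f), max_err(g_d)


if __name__ == "__main__":
    print(demo())
```

In `demo`, the inputs are white and mutually uncorrelated, so both variants should converge to `g_true`. In real MCLP the stacked past STFT coefficients are correlated. That is where discarding the off-diagonal terms can cost accuracy, so this toy check says nothing about dereverberation quality.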