Joint Bayesian Estimation and Deep Neural Network Based Speech Enhancement Method

English title: Joint Bayesian Estimation and Deep Neural Network Based Speech Enhancement Method
Authors: HUANG Zhang-yi (黄张翼); ZHOU Yi (周翊); SHU Xiao-feng (舒晓峰); LIU Hong-qing (刘宏清)
Affiliation: School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications (重庆邮电大学通信与信息工程学院)
Keywords: speech enhancement; Bayesian estimation; deep neural networks; feature extraction
Journal code: XXWX
Journal: Journal of Chinese Computer Systems (小型微型计算机系统)
Publication date: 2019-01-15
Publisher: Journal of Chinese Computer Systems (小型微型计算机系统)
Year: 2019
Issue: v.40
Funding: National Natural Science Foundation of China (61501072); Natural Science Foundation of the Chongqing Science and Technology Commission (cstc2015jcyjA40027)
Language: Chinese
Page: XXWX201901009
Page count: 5
CN: 01
ISSN: 21-1106/TP
Classification: 42-46

Abstract

Deep learning has become a new trend in speech enhancement research, and the input features are a key factor in enhancement performance. Experiments show that feeding pre-enhanced speech features to the neural network, rather than the raw features, improves its speech enhancement performance. This paper therefore first proposes an improved Bayesian estimator with generalized weighting based on auditory perception under a Chi-distributed prior. The improved estimator is then used as the input feature extractor of a deep neural network, yielding a new network structure that combines the deep neural network with preprocessing by this estimator. Simulation experiments show that, compared with conventional speech enhancement algorithms and with deep neural network based speech enhancement algorithms, the proposed joint algorithm clearly improves all performance metrics in each noise environment.

References

[1] Liu Wen-ju, Nie Shuai, Liang Shan, et al. Deep learning based speech separation technology and its developments[J]. Acta Automatica Sinica, 2016, 42(6): 819-833.
[2] Boll S. Suppression of acoustic noise in speech using spectral subtraction[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27(2): 113-120.
[3] Wiener N. Extrapolation, interpolation, and smoothing of stationary time series[M]. Cambridge, MA: MIT Press, 1949.
[4] Chen J, Benesty J, Huang Y, et al. New insights into the noise reduction Wiener filter[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(4): 1218-1234.
[5] Ephraim Y, Malah D. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, 32(6): 1109-1121.
[6] De Moor B. The singular value decomposition and long and short spaces of noisy matrices[J]. IEEE Transactions on Signal Processing, 1993, 41(9): 2826-2838.
[7] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[8] Xu Y, Du J, Dai L R, et al. An experimental study on speech enhancement based on deep neural networks[J]. IEEE Signal Processing Letters, 2014, 21(1): 65-68.
[9] Nie S, Liang S, Li H, et al. Exploiting spectro-temporal structures using NMF for DNN-based supervised speech separation[C]. Acoustics, Speech and Signal Processing, IEEE International Conference on. IEEE, 2016: 469-473.
[10] Huang P S, Kim M, Hasegawa-Johnson M, et al. Joint optimization of masks and deep recurrent neural networks for monaural source separation[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015, 23(12): 2136-2147.
[11] Han W, Zhang X, Min G, et al. A novel single channel speech enhancement based on joint deep neural network and Wiener filter[C]. Progress in Informatics and Computing, 2015 IEEE International Conference on. IEEE, 2015: 163-167.
[12] Han W, Wu C, Zhang X, et al. Speech enhancement based on improved deep neural networks with MMSE pretreatment features[C]. Signal Processing, 2016 IEEE 13th International Conference on. IEEE, 2016: 1140-1145.
[13] Lotter T, Vary P. Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model[J]. EURASIP Journal on Advances in Signal Processing, 2005, 2005(7): 1-17.
[14] Loizou P C. Speech enhancement: theory and practice[M]. CRC Press, 2013.
