Speech Enhancement Method Based on Binaural Cues Coding Principle

Original title (Chinese): 基于双耳线索编码原理的语音增强方法
Authors: CHEN Nan (陈楠); BAO Chang-chun (鲍长春)
Affiliation: Faculty of Information Technology, Beijing University of Technology
Keywords: speech enhancement; binaural cue coding; codebook driven; deep neural network
Journal code: DZXU
Journal: Acta Electronica Sinica (电子学报)
Publication date: 2019-01-15
Publisher: Acta Electronica Sinica (电子学报)
Year: 2019
Volume and serial number: v.47; No.431
Language: Chinese
File ID: DZXU201901030
Page count: 7
Issue: 01
CN: 11-2087/TN
Pages: 229-235

Abstract

This paper proposes a single-channel speech enhancement method that applies the binaural cue coding (BCC) principle and builds an a priori binaural cue codebook of speech and noise. First, the binaural cues of speech and noise, which serve as prior knowledge, are trained offline into an a priori codebook. Then, online, a weighted codebook mapping (WCBM) algorithm estimates the clean cue parameters. Finally, the noisy speech is enhanced using the BCC principle. In addition, a deep neural network, namely stacked auto-encoders (SAE), replaces the WCBM algorithm for estimating the clean cue parameters, yielding a DNN-based binaural cue speech enhancement algorithm that further improves performance. Objective test results show that the proposed methods outperform the reference methods.
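
The online estimation step can be illustrated with a minimal sketch. The abstract does not specify how the WCBM weights are computed or how the codebook is organized, so everything below is an assumption made for illustration: a paired codebook (one noisy-cue codeword per clean-cue codeword), softmax weights over negative squared distances, and the hypothetical names weighted_codebook_mapping and sharpness. It is not the authors' implementation.

```python
import math


def weighted_codebook_mapping(noisy_cue, noisy_codebook, clean_codebook, sharpness=1.0):
    """Estimate a clean cue vector as a weighted sum of clean codewords.

    Assumed weighting rule (not taken from the paper): a softmax over the
    negative squared Euclidean distance between the observed noisy cue and
    each noisy codeword. `sharpness` is a hypothetical tuning parameter.
    """
    if not noisy_codebook or len(noisy_codebook) != len(clean_codebook):
        raise ValueError("codebooks must be non-empty and have the same number of entries")

    # Squared distance from the observed cue to every noisy codeword.
    dists = [
        sum((x - c) ** 2 for x, c in zip(noisy_cue, codeword))
        for codeword in noisy_codebook
    ]

    # Subtract the minimum distance before exponentiating for numerical stability.
    d_min = min(dists)
    raw = [math.exp(-sharpness * (d - d_min)) for d in dists]
    total = sum(raw)
    weights = [r / total for r in raw]

    # Weighted sum of the paired clean codewords, dimension by dimension.
    dim = len(clean_codebook[0])
    estimate = [
        sum(w * codeword[k] for w, codeword in zip(weights, clean_codebook))
        for k in range(dim)
    ]
    return estimate, weights


# Toy codebooks with two entries each (values are made up for illustration).
toy_noisy_codebook = [[0.0, 0.0], [10.0, 10.0]]
toy_clean_codebook = [[1.0, 1.0], [5.0, 5.0]]
estimate, weights = weighted_codebook_mapping([5.0, 5.0], toy_noisy_codebook, toy_clean_codebook)
```

In this toy call the observed cue is equidistant from both noisy codewords, so the two clean codewords are averaged with equal weight. A cue that lies close to one noisy codeword would instead return an estimate dominated by the paired clean codeword.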

